Methods
SPRITE:
Enter the PDB ID for the protein of interest and exclude 2-residue hits. Once the search is complete, browse the "List of Hits" results. View the "Full details" result alignments; it is also possible to view the alignment between the protein of interest and the matched protein. View hits by each site of the protein using the "Arranged by sites" function. Review alignments with an RMSD below 2.0 Angstroms and determine whether the results are consistent with the established function of the protein. Capture images of the alignments that best match the protein of interest and record their RMSD values.
Chimera:
Enter the PDB ID of the protein of interest and click Fetch to show the structure of the protein. Using the results from SPRITE, load the protein of known function into Chimera using its PDB ID. Hide everything but the subunit of interest, and then align the active site motif (type "match" followed by the atoms of interest). An RMSD value will be shown; use it to judge the quality of the alignment (anything below 2.0 Angstroms is considered high quality). To better visualize the alignment, delete "match" and replace it with "sel". Orient the structures so that they align the way they did in SPRITE. Name and save the file for future use.
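As a rough illustration of what the reported RMSD value measures, the sketch below computes the root-mean-square deviation between two sets of atom positions that are assumed to be superimposed already. The coordinates are hypothetical placeholders, not values from 3B7F or any SPRITE hit, and the function is not Chimera's own implementation.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical, already-superimposed atom positions in angstroms (not real data).
motif_query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
motif_hit = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (0.0, 1.5, 1.0)]

value = rmsd(motif_query, motif_hit)
# Apply the same 2.0 angstrom quality cutoff used in this protocol.
print(f"RMSD = {value:.3f} angstroms, high quality: {value < 2.0}")
```

Every atom pair here is offset by exactly 1 angstrom, so the result is 1.000 angstroms, which would count as a high-quality alignment under the cutoff above.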
Dali:
Open the website and click on PDB search lab, then enter the four-character PDB ID of the assigned protein (3B7F). Next, submit the structure with the chain identifier (never use a chain containing DNA; pick the one representing the sequence the group is interested in). Enter a meaningful description in the job name field, enter an email address, submit the job, and wait for a link to be emailed. Once the link arrives, download the results.
BLAST:
Go to the RCSB webpage, search for the protein by PDB ID, and explore the page for the protein. Select "FASTA sequence", look at the protein sequence, and copy it. Go to the NCBI BLAST search page and paste the protein sequence into the sequence field. Using the sequences that show up in the results, try to identify proteins similar to the query protein.
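Before pasting into BLAST, it can help to check that the copied FASTA text contains the expected header and an unbroken sequence. The following is a minimal sketch of a FASTA reader; the record shown is a made-up placeholder, not the actual 3B7F sequence.

```python
def parse_fasta(text):
    """Return a dict mapping each header line (without '>') to its joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are wrapped in FASTA files, so join them back together.
            records[header] += line
    return records


# Placeholder record: the header and residues are invented for illustration.
example = """>EXAMPLE_1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), seq)
```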
InterPro:
Perform an InterPro search for the sequence of 3B7F. Confirm the protein superfamily identification and the domains found. Identify related proteins in the domain organization. Think about what these related proteins have in common with the protein of interest, and what the functions of the domains are. Repeat the InterPro search using the "View a Structure" link on the main page. Explore the links that InterPro provides and compare these findings with the sequence search.
Molecular Docking with SwissDock:
Select the "docking with AutoDock Vina" tab. Submit a ligand using its SMILES string (found through the PubChem database). Click "prepare ligand", then submit a target (a PDB ID can be used). Define the search space by choosing x, y, and z coordinates for the center of the space being searched. Check the parameters and start docking. To view the results, click on the "here" link provided in the email. A window with an image of the protein will appear, with a table of ranked sets of models and energies below the structure. Multiple ligand poses can be overlaid, and a .zip file can be exported and viewed in Chimera.
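One possible way to choose the x, y, and z center of the search space is to average the coordinates of atoms lining the suspected active site. The sketch below assumes such coordinates have been read off the structure by hand; the numbers are hypothetical and this is not part of SwissDock itself.

```python
def centroid(points):
    """Average x, y, z of a list of (x, y, z) coordinates."""
    n = len(points)
    if n == 0:
        raise ValueError("need at least one point")
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Hypothetical coordinates (angstroms) for atoms lining a putative active site.
site_atoms = [(10.0, 4.0, -2.0), (12.0, 6.0, 0.0), (14.0, 8.0, 2.0)]

center = centroid(site_atoms)
print("search box center:", center)
```

The resulting tuple can be typed into the center fields when defining the search space; the box size still has to be chosen separately.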
Making Buffers and Solutions:
General steps for buffers and solutions: add 80% of the total diH2O to a container, weigh and add the chemicals, adjust the pH as needed, and add diH2O to the total volume.
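The weighing step follows from mass = molarity x volume x molar mass. The sketch below works through one example; the 50 mM NaCl solution is chosen purely for illustration and is not a buffer named in this protocol.

```python
def grams_needed(molarity_mol_per_l, volume_l, molar_mass_g_per_mol):
    """Mass of solute (g) required for a solution of the given molarity and volume."""
    return molarity_mol_per_l * volume_l * molar_mass_g_per_mol


def starting_water_ml(total_volume_ml, fraction=0.80):
    """Volume of diH2O to add first, before dissolving chemicals and adjusting pH."""
    return total_volume_ml * fraction


# Illustrative example only: 1 L of 50 mM NaCl (molar mass 58.44 g/mol).
mass = grams_needed(0.050, 1.0, 58.44)
water = starting_water_ml(1000)

print(f"weigh out {mass:.3f} g NaCl")
print(f"start with {water:.0f} mL diH2O, then bring to 1000 mL after adjusting pH")
```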
Hand Casting Polyacrylamide Gels:
A 4% stacking gel at pH 6.8 and a 10% resolving gel at pH 8.8 were made.
Prepare the resolving and stacking gel solutions without APS or TEMED. Place a comb into the assembled gel sandwich. With a marker, place a mark on the glass plate 1 cm below the teeth of the comb, then remove the comb. Add the APS and TEMED to the resolving gel solution and pour it up to the mark. Using a Pasteur pipet, overlay the monomer solution with water-saturated n-butanol. Allow the gel to polymerize for 45-60 minutes. Once done, pour off the overlay solution and rinse the top of the gel with diH2O. Dry the area above the resolving gel with filter paper before pouring the stacking gel. Place the comb in the cassette and tilt it so that the teeth are at a 10° angle; this prevents air from becoming trapped under the comb. Add APS and TEMED to the stacking gel solution and pour the solution down the spacer until all of the teeth are covered. Realign the comb in the sandwich and add monomer to fill the cassette completely. Allow the gel to polymerize for 30-45 minutes. Store the gels wrapped in a wet paper towel in the 4 °C fridge.
Expression of Proteins from Lactose-Inducible Vectors:
First, make the LB broth by adding 10 g of tryptone, 10 g of NaCl, and 5 g of yeast extract to 1000 mL of Millipore water. Transfer 5 mL of this mixture into an overnight culture tube. Autoclave it for 1.5 hours.
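If a volume other than 1000 mL is needed, the recipe can be scaled linearly. The sketch below uses the per-liter amounts given above; the 500 mL target is just an example volume.

```python
# Grams per 1000 mL, taken from the recipe above.
LB_PER_LITER_G = {"tryptone": 10.0, "NaCl": 10.0, "yeast extract": 5.0}


def scale_recipe(per_liter_g, volume_ml):
    """Scale a per-liter recipe (grams) to the requested volume in mL."""
    factor = volume_ml / 1000.0
    return {name: grams * factor for name, grams in per_liter_g.items()}


half_liter = scale_recipe(LB_PER_LITER_G, 500)
for name, grams in half_liter.items():
    print(f"{name}: {grams} g")
```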
Protein Purification:
500 mL of culture was grown rather than 1 L in an attempt to speed up induction. The overnight cultures were grown on the night of 03/16/2025 for 11 hours with 50 µg/mL of kanamycin. The culture was inoculated at 8 am on 03/17/2025. OD600 was measured through the morning until it reached 0.4-0.8 (an OD of 0.4 was reached around 10 am). The culture was then induced with 1 mM IPTG and left to grow for three hours before centrifuging. To purify the protein, the samples were centrifuged at 5000 x g for 20 minutes. 10 mL of lysis buffer was added and the pellets were resuspended with a pipette (a 50 µL sample was added to a separate centrifuge tube). The cells were sonicated 5 times for 30 seconds each and kept on ice between sonications (a 50 µL sample was added to a new centrifuge tube). The samples were centrifuged for 20 minutes at 15,000 x g (a 50 µL sample was added to a new centrifuge tube). The protein column was set up with 500 µL of Ni-NTA beads, and lysis buffer was run through it to equilibrate it (5 column volumes). The resin was pre-washed with binding buffer. The cell extract was applied to the resin and allowed to enter. All of the supernatant was added (a 50 µL sample was added to a new centrifuge tube). The column was washed with 5 column volumes of buffer (a 50 µL sample was added to a new centrifuge tube). The column was eluted with 8 column volumes of buffer; the fractions were collected in 1 mL volumes and stored in 5 tubes. To store the column, 5 column volumes of water and 1-2 mL of 20% EtOH were added, and the column was capped and stored.
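The column-volume and antibiotic amounts above can be checked with simple arithmetic. This sketch assumes that one column volume equals the 500 µL bead volume and that the kanamycin stock is 50 mg/mL; both are assumptions for illustration, and a different working column volume or stock would change the numbers.

```python
RESIN_BED_ML = 0.5  # assumption: 1 column volume = 500 µL of Ni-NTA beads


def volume_for(column_volumes, bed_ml=RESIN_BED_ML):
    """Buffer volume (mL) corresponding to a number of column volumes."""
    return column_volumes * bed_ml


def stock_volume_ul(final_ug_per_ml, culture_ml, stock_mg_per_ml):
    """Stock volume (µL) needed to reach a final concentration in a culture.

    1 mg/mL equals 1 µg/µL, so dividing µg by mg/mL gives µL directly.
    """
    return final_ug_per_ml * culture_ml / stock_mg_per_ml


# Column steps and their column-volume counts as described in the protocol.
steps = {"equilibration": 5, "wash": 5, "elution": 8, "water rinse": 5}
for step, cv in steps.items():
    print(f"{step}: {volume_for(cv)} mL")

# Kanamycin at 50 µg/mL in a 5 mL overnight culture, hypothetical 50 mg/mL stock.
print("kanamycin stock:", stock_volume_ul(50, 5, 50), "µL")
```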
Protein Concentration:
Obtain cuvettes for
SDS-PAGE:
Protein Activity Assay:
Structural Alignment Through SPRITE, Chimera, Dali, and BLAST
Based on the findings from SPRITE and Chimera, 1XNY_C00 had the lowest RMSD value at 1.875 angstroms. Therefore, it was initially hypothesized that 3B7F is a carboxylase.
Dali returned mainly xyloglucanase matches and no carboxylases, and 1XNY did not appear among the Dali matches, so the carboxylase hypothesis was rejected. It is now hypothesized that 3B7F is a xyloglucanase.
Xyloglucanases break down xyloglucan, a hemicellulose in plant cell walls. The substrates of the (endo-)beta-1,4-xyloglucanases are xyloglucan and water, and a common cofactor is calcium.
The BLAST search shows that glycosyl hydrolases have sequences similar to that of 3B7F.
Using InterPro to Predict Protein Function
Research shows that 4Q7Q is a member of the SGNH Hydrolase protein superfamily. BLAST and InterPro both suggested 4Q7Q’s inclusion in this family, and the conserved residues seen in the SPRITE analysis (Serine, Glycine, Asparagine, and Histidine) line up with those observed throughout this family.D,E Notably, this superfamily is also referred to as the GDSL Hydrolase superfamily.D,E
4Q7Q’s inclusion in this family also supports its SPRITE-derived hypothesized function. Rhamnogalacturonan Acetylesterase, the enzyme with one of the best SPRITE-based alignments to 4Q7Q, is a member of this family.F Proteins in this family are also known for containing a “unique hydrogen bond network that [stabilizes]” the active site.F
As for the protein family, DALI results suggest that 4Q7Q belongs to a subfamily of the greater GDSL/SGNH superfamily. A PDB90% DALI search labels 4Q7Q as part of the “Lipolytic Protein G-D-S-L Family,” which refers to enzymes that hydrolyze lipid substrates.I
Molecular Docking with SwissDock
The primary sequence of 4Q7Q shares several conserved sequences with esterase-like proteins. A GDSI sequence, similar to the GDSL sequence that gives its family and superfamily their name, is shared by 4Q7Q and enzymes like Isoamyl Acetate-Hydrolyzing Esterase. Other noteworthy sequences conserved between esterases and 4Q7Q include GxND and DGxH.
These enzymes also share similar secondary structures. Segments of alpha-helices and beta-sheet strands appear and remain almost entirely conserved across the esterases analyzed. A few conserved coils appear, but less often than the other two secondary structures.
Similar conserved sequences were found between 4Q7Q and lipases. The GDSI, GxND, and DGxH sequences can be seen in lipases like 7BXD.? The same secondary structure segments can also be located in the lipases analyzed.
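The motif comparison described above can also be sketched as a pattern search, treating "x" as any single residue. The sequence below is a made-up toy string used only to show the search; it is not the 4Q7Q or 7BXD sequence.

```python
import re

# Motifs named in the text; "x" (any residue) becomes "." in the regex.
MOTIFS = {"GDSI": "GDSI", "GxND": "G.ND", "DGxH": "DG.H"}


def find_motifs(sequence):
    """Return 1-based start positions of each motif in a protein sequence."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are also reported.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", sequence)]
    return hits


# Toy sequence invented for illustration.
toy = "MAGDSITAAGANDLLDGTHK"

motif_hits = find_motifs(toy)
for name, positions in motif_hits.items():
    print(name, positions)
```

Running the same function on two real sequences and comparing the positions it reports is one way to double-check motifs picked out by eye from an alignment.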
Protein Purification
Right-hand SPRITE analysis revealed that 4Q7Q has residues like those seen in enzymes that act on acetyl-like substrates. Specifically, residues Ser. 30, Gly. 69, Asn. 97, Asp. 251, and His. 254 on the A and B chains of 4Q7Q line up with similarly positioned residues on esterases like Platelet-Activating Factor Acetylhydrolase (PAFA), which exhibited an RMSD of 0.25 angstroms when compared to 4Q7Q.
Other proteins of note with similar motifs are Thioesterase I and Rhamnogalacturonan Acetylesterase, with RMSD values of 0.46 and 0.61 angstroms, respectively. These alignments involve the same active site as the PAFA alignment, suggesting that the acetyl-like substrates 4Q7Q acts on are similar to esters.
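The three RMSD values reported above can be ranked and checked against the 2.0 angstrom cutoff from the Methods. This is only a bookkeeping sketch for the recorded values, not part of SPRITE.

```python
# RMSD values (angstroms) reported above for SPRITE alignments against 4Q7Q.
sprite_hits = {
    "PAFA": 0.25,
    "Thioesterase I": 0.46,
    "Rhamnogalacturonan Acetylesterase": 0.61,
}

CUTOFF = 2.0  # alignment quality threshold used in the Methods

# Sort from best (lowest RMSD) to worst and keep those under the cutoff.
ranked = sorted(sprite_hits.items(), key=lambda item: item[1])
passing = [(name, value) for name, value in ranked if value < CUTOFF]

for name, value in passing:
    print(f"{name}: {value:.2f} angstroms")
```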
PFAM graphics from DALI revealed significant structural equivalence between 4Q7Q, a lipase-like protein, Rhamnogalacturonan Acetylesterase, and Sialate O-acetylesterase.
Protein Concentration
SwissDock analysis showed a preference for larger molecules, specifically fatty acids. Lactide, Ethyl Butyrate, and Triethylene Glycol exhibited noticeably weak binding affinities to the theorized active site of 4Q7Q. These ligands may be ill-suited to act as substrates for 4Q7Q as they are remarkably polar, and lipids—one of the potential categories of substrates for 4Q7Q—are mostly non-polar.
Despite this, these ligands show noticeable hydrophobic interactions with the active site, which implies that 4Q7Q uses hydrophobic regions to help guide substrates into the right orientation for catalysis. This further supports the possibility that 4Q7Q primarily acts on hydrophobic, lipid-based substrates. It may also explain why Methyl Acetate exhibited a relatively weak affinity for 4Q7Q, as its smaller structure prevented hydrophobic interactions.
Protein Analysis