Homology

From Jacobson Lab Wiki

Homology is a top-level PLOP command used to build homology models.

Specifically, this section deals with constructing a homology model from a sequence alignment. It is possible to do some rather sophisticated things with PLOP, including composite alignments (using multiple templates/alignments), incorporation of HET groups into the initial model, and creation of homodimeric (or other oligomeric) initial models. But first, the basics.


 homo[log] single alignment_file pdb_file &
   chain character &
   model integer &
   ofac real &
   conserve yes/no &
   x2p yes/no &
   g2x yes/no &
   sym bio/xtal/none &
   initfile file &
   gapfile file &
   insertfile file &
   x2pfile file &
   g2xfile file &
   sidefile file &
   finalfile file

The 2 mandatory arguments are the locations of the files containing the alignment and the template PDB file.
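When many models are built in a batch, it can be convenient to generate the command text from a script. The following is a minimal Python sketch, not part of PLOP: the helper name `build_homolog_single` is hypothetical, and the only option names it accepts are the ones listed in the synopsis above. It produces the command text only and does not check that the values make sense to PLOP.

```python
def build_homolog_single(alignment_file, pdb_file, **options):
    """Return 'homolog single' command text with '&' line continuations."""
    # Option names copied from the synopsis above; anything else is rejected.
    allowed = ["chain", "model", "ofac", "conserve", "x2p", "g2x", "sym",
               "initfile", "gapfile", "insertfile", "x2pfile", "g2xfile",
               "sidefile", "finalfile"]
    unknown = sorted(set(options) - set(allowed))
    if unknown:
        raise ValueError("unknown option(s): " + ", ".join(unknown))

    lines = ["homolog single %s %s" % (alignment_file, pdb_file)]
    for name in allowed:
        if name in options:
            value = options[name]
            # Python booleans are written as the yes/no values used above.
            if isinstance(value, bool):
                value = "yes" if value else "no"
            lines.append("   %s %s" % (name, value))
    # Every line except the last ends with the continuation character.
    return " &\n".join(lines)


if __name__ == "__main__":
    print(build_homolog_single("target.align", "template.pdb",
                               chain="A", conserve=True, g2x=True))
```

The file names in the usage line are placeholders. Options are always emitted in the order of the synopsis, regardless of the order in which they are passed.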

Template and Alignment

There are a huge number of alignment file formats, and I've just tried to make sure a few key ones are supported, especially Blast/Psi-Blast. Gaps in the alignment can be represented by either dots "." or dashes "-". If your alignment is not working, try removing as much extraneous material as possible. A very simple default format is to start each line containing the target sequence with the word "Targ[et:]" and each line containing the template sequence with the word "Temp[late:]", followed by any amount of whitespace, and then the sequence. Other lines in the alignment file are ignored. As an example,

 Targ:  VLITGLRTRAVNVPLAYPVHTAVGTVG-TAPLVLIDLATSAGVVGHSYLFAYTPVALKSL
 Temp:  PVVTEMQVIPVAGHD-SMLMNLSGAHAPFFTRNIVIIKDNSGHTGVGEIPG-----GEKI
 Targ:  KQLLDDMAAMIVNEPL.APVSLEAMLAKRFCLAGYT...........GLIRMAAAGIDMA
 Temp:  RKTLEDAIPLVVGKTLGEYKNVLTLVRNTF..ADRDAGGRGLQTFDLRTTIHVVTGIEAA
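To check an alignment file before handing it to PLOP, the simple format can be read with a few lines of Python. This is a sketch of the format as described above, not PLOP's own parser: the function name `read_targ_temp` is hypothetical, it handles only the Targ/Temp format (not Blast output), and the requirement that both rows have equal total length is an assumption made here for sanity checking.

```python
def read_targ_temp(text):
    """Collect the target and template rows of a Targ/Temp alignment."""
    target_rows, template_rows = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or a lone word: nothing to read
        tag = fields[0].lower()
        if tag.startswith("targ"):
            target_rows.append(fields[1])
        elif tag.startswith("temp"):
            template_rows.append(fields[1])
        # any other line is ignored, as described above

    # Dots and dashes both mean "gap"; normalise to dashes.
    target = "".join(target_rows).replace(".", "-")
    template = "".join(template_rows).replace(".", "-")
    if len(target) != len(template):
        raise ValueError("target and template rows differ in length")
    return target, template


if __name__ == "__main__":
    example = "Targ:  AC-DE\nTemp:  ACG.E\n"
    print(read_targ_temp(example))
```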

The complete target sequence (or at least as much of it as you wish to model) should be represented in the alignment file. This is the only information the program has about the sequence of the target. The program gets the template sequence from the template PDB file (as well as from the alignment file). The SEQRES records are used to define the template sequence, thus the template PDB file must contain the SEQRES records (real PDB files always have this, as do files generated by PLOP; files generated by other programs do not always contain these portions). The reason that PLOP uses the SEQRES records is that the ATOM lines may not contain every residue, which can create confusion. PLOP tries to be reasonably tolerant of mis-matches between the sequence as found in the alignment file and the sequence from the PDB file (especially missing "tail" residues), but there are limits.
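Since a missing or mismatched SEQRES section is a common cause of trouble, a quick pre-flight check can help. The sketch below is an illustration in Python and does not reproduce PLOP's own matching logic: the function names are hypothetical, nonstandard residue names are mapped to "X", and the test used (the ungapped template row must appear as a contiguous stretch of the SEQRES sequence) is an assumption that is stricter than the tolerance described above.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(pdb_text):
    """Return {chain ID: one-letter sequence} from the SEQRES records."""
    names_by_chain = {}
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chain_id = line[11:12]      # column 12 of the PDB format
            names = line[19:].split()   # residue names start at column 20
            names_by_chain.setdefault(chain_id, []).extend(names)
    return {chain: "".join(THREE_TO_ONE.get(name, "X") for name in names)
            for chain, names in names_by_chain.items()}


def template_matches_seqres(template_row, seqres_sequence):
    """True if the ungapped template row occurs within the SEQRES sequence."""
    ungapped = template_row.replace("-", "").replace(".", "").upper()
    return ungapped in seqres_sequence


if __name__ == "__main__":
    record = "SEQRES %3d %s %4d  %s" % (1, "A", 4, "GLY ALA PRO VAL")
    print(seqres_sequences(record))
```

An empty result from `seqres_sequences` means the file has no SEQRES records at all, which is the case the paragraph above warns about.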

If the template PDB file contains multiple chains or MODEL records, you will want to specify which chain to load, using the "chain" and/or "model" options.
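To see which values are available for "chain" and "model", the template file can be scanned first. This is a small Python sketch with a hypothetical function name; it relies only on the standard PDB column layout (MODEL records, and the chain identifier in column 22 of ATOM/HETATM records) and is not part of PLOP.

```python
def models_and_chains(pdb_text):
    """Return (model serials, chain IDs) in order of first appearance."""
    models, chains = [], []
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            fields = line[5:].split()
            if fields and fields[0] not in models:
                models.append(fields[0])
        elif line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain_id = line[21]         # column 22 of the PDB format
            if chain_id not in chains:
                chains.append(chain_id)
    return models, chains


if __name__ == "__main__":
    atom = "ATOM  %5d %-4s %3s %s%4d"
    demo = "\n".join(["MODEL        1", atom % (1, " N", "GLY", "A", 1), "ENDMDL"])
    print(models_and_chains(demo))
```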

Options

conserve
One key decision to make relates to the side chain conformations of residues that are conserved (i.e., identical) between target and template. For any aligned residues, the initial backbone coordinates are taken from the template. However, particularly at high sequence identity, it may be advantageous to copy not only the backbone conformations but also the side chains from the template, for the conserved residues; the parameter is "conserve yes". This can help, for example, to maintain active site residues in a reasonable conformation. Any and all side chains can be optimized later on. The only disadvantage to maintaining side chains in their template conformations is that it may make it more difficult for the algorithm to close the various chain breaks successfully, because it must navigate around fixed side chains. In practice, this is rarely an issue, but in recalcitrant cases, try "conserve no".
ofac
One other key decision is what level of steric clash is acceptable in the final structure. This is set by the value of an "overlap factor", representing the worst acceptable steric overlap. The default value is 0.75, but in a tough case (for example a very low sequence identity case with many gaps in the alignment), you may need to set it to a lower value to successfully build the model.
x2p
The major job for the homology model construction algorithm is the closing of chain breaks due to gaps in the alignment. However, it can also optionally optimize certain backbone regions in which there are highly nonconservative amino acid substitutions. A particular concern is mutations of any non-Pro residue in the template to Pro in the target. Because backbone conformations for Pro are restricted to a smaller region of the (φ,ψ) conformational space than for other residues, simply imposing the backbone coordinates from a non-Pro residue in the template on a Pro in the target can induce significant energetic strain. (In addition, the peptide bond for Pro can be either cis or trans, whereas the cis conformation is extremely rare for other residues.) By default, these nonconservative substitutions to Pro are optimized, but you can turn this off using "x2p no".
g2x
Conversely, backbone dihedral angles for Gly are less restricted than for other residues due to the lack of a side chain, and thus mutations of a Gly residue in the template to any non-Gly residue in the target can introduce substantial strain energy as well. Currently, the algorithm does NOT perform this optimization by default, but I tend to do so, using "g2x yes".
A final problematic substitution involves any cis-Pro residue in the template that is aligned to a non-Pro residue in the target.
In each of these cases, the algorithm automatically rebuilds the region containing the problematic residue, using the same algorithm used to close the chain breaks. That is, the optimization region is gradually increased around the problematic residue until a satisfactory closed backbone conformation can be found. The algorithm proceeds as follows:
  • close chain breaks due to deletions
  • build in insertions
  • re-optimize regions containing nonconservative substitutions to Pro
  • re-optimize regions containing nonconservative substitutions from Gly
  • build/optimize side chains not taken from template
  • minimize all previously optimized portions of protein
Intermediate files
Because there are many steps, it can be useful to see what has happened at any given stage, and there are a series of optional parameters specifying file names for writing out the structure (superimposed on the template so you can see what's changed) at various stages: initfile (before any optimization), gapfile (after "deletions" closed), insertfile (after insertions built), x2pfile, g2xfile, sidefile (after side chain optimization), and finalfile (after final energy minimization).
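Before deciding on the x2p and g2x settings, it can be useful to know how many positions they would affect. The Python sketch below scans an aligned target/template pair and lists the columns that fall under each option as described above. It is an illustration only: the function name is hypothetical, columns are reported as 0-based alignment positions, and the cis-Pro case cannot be detected this way because it depends on the template structure rather than the sequence.

```python
GAP_CHARS = "-."


def flag_substitutions(target, template):
    """Return (x2p, g2x): alignment columns affected by each option."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    x2p, g2x = [], []
    pairs = zip(target.upper(), template.upper())
    for column, (targ, temp) in enumerate(pairs):
        if targ in GAP_CHARS or temp in GAP_CHARS:
            continue  # gaps are chain breaks, handled by other steps
        if targ == "P" and temp != "P":
            x2p.append(column)  # non-Pro in template, Pro in target
        if temp == "G" and targ != "G":
            g2x.append(column)  # Gly in template, non-Gly in target
    return x2p, g2x


if __name__ == "__main__":
    print(flag_substitutions("APG-P", "GGGAP"))
```

A Gly in the template aligned to a Pro in the target is reported in both lists, since it matches both descriptions.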

Composite Models

Building a "composite" model using 2 or more templates requires a little more work. The first step is to align the relevant templates with whatever tool you like, such as CE. HOWEVER, THERE IS ONE MAJOR CHANGE THAT MUST BE MADE BY HAND TO CE FILES. As discussed above, the homology building code requires that the sequence be specified in the SEQRES section of the PDB file, as in standard PDB format. This is to ensure that there are no problems relating the sequence in the alignment files to the sequence in the PDB files. CE does not write SEQRES records, but it is fairly easy to copy over the relevant sections by hand from the original, unaligned PDB files. The alignments to the templates can just be in separate files. Lastly, you must create a file that indicates which template should be used for the various portions of the model. This file has a very simple format, e.g.,

 Targ: MSANFTDKNGRQSKGVLLLRTLAMPSDTNANGDIFGGWIMSQMDMGGAILAKEIAHGRVV
 Temp: .............11111111111111111111112222222222222222222222111

where the "target" lines just contain the target sequence, and the "template" line specifies which template to use for that residue (the numbers correspond to the order in which the templates are specified in the PLOP command, see below). If you don't want to use either template for a given portion, just use dots. Then, the structure of the composite homology building command is as follows:

 homolog composite &
   temp /home/jacobson/casp5/t132/1bvq.align &
   /home/chemserv/friesner/jacobson/casp5/t132/ce.pdb &
   model 2 &
   chain B &
   temp /home/jacobson/casp5/t132/1mka.align &
   /home/chemserv/friesner/jacobson/casp5/t132/ce.pdb &
   model 1 &
   chain A &
   compfile /home/jacobson/casp5/t132/comp.align

Here, both template structures are in the same (CE) file. CE has a somewhat unusual format in which each structure is given both a different chain name (ABC...) and a different model number (123...). But it works fine if you just specify these as above. The "compfile" is the file which specifies how to build the composite.

The building process proceeds as normal, except that there is now a new type of chain break to close, which is caused by two adjacent residues being taken from 2 different templates (even if the templates are well aligned, it's still best to rebuild these joints).
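To get a feel for how many such joints a given composite file will create, the template assignment can be scanned for places where the template number changes between adjacent residues. The following Python sketch is an illustration, not PLOP code: the function name is hypothetical, it assumes single-digit template numbers as in the example above, and it reports only template-to-template joints, not transitions into or out of dotted (untemplated) regions.

```python
def template_junctions(comp_text):
    """Return (left residue, right residue, left template, right template)
    for each pair of adjacent residues assigned to different templates.
    Residue numbers are 1-based positions in the target sequence."""
    target_rows, assignment_rows = [], []
    for line in comp_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        tag = fields[0].lower()
        if tag.startswith("targ"):
            target_rows.append(fields[1])
        elif tag.startswith("temp"):
            assignment_rows.append(fields[1])

    sequence = "".join(target_rows)
    assignment = "".join(assignment_rows)
    if len(sequence) != len(assignment):
        raise ValueError("target and template rows differ in length")

    junctions = []
    for i in range(1, len(assignment)):
        left, right = assignment[i - 1], assignment[i]
        if left.isdigit() and right.isdigit() and left != right:
            junctions.append((i, i + 1, left, right))
    return junctions


if __name__ == "__main__":
    demo = "Targ: MSANFTDK\nTemp: ..112221\n"
    print(template_junctions(demo))
```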
