Loop prediction

From Jacobson Lab Wiki

The basic command is:

 loop predict residue selection &


Options controlling generation of loop candidate structures

 ofac real

This is the “overlap factor”, which defines what counts as a steric clash. The default value is 0.7. Lower values may be appropriate for low-resolution structures, or if loop prediction with the default value generates no loops.
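To make the overlap factor concrete, the sketch below assumes a common form of the clash test: two atoms clash when their separation is less than ofac times the sum of their van der Waals radii. Both that formula and the radii in VDW_RADIUS are illustrative assumptions, not values taken from PLOP.

```python
import math

# Assumed clash criterion: d(i, j) < ofac * (r_i + r_j).
# Illustrative van der Waals radii in Å (not PLOP's own parameters).
VDW_RADIUS = {"C": 1.7, "N": 1.55, "O": 1.52, "S": 1.8}

def is_clash(elem1, xyz1, elem2, xyz2, ofac=0.7):
    """Return True if two atoms overlap under the assumed ofac criterion."""
    d = math.dist(xyz1, xyz2)
    return d < ofac * (VDW_RADIUS[elem1] + VDW_RADIUS[elem2])

# Two carbons 2.2 Å apart: threshold is 0.7 * 3.4 = 2.38 Å, so this clashes.
print(is_clash("C", (0, 0, 0), "C", (2.2, 0, 0)))            # True
# With ofac 0.6 the threshold drops to 2.04 Å, so the same pair is allowed.
print(is_clash("C", (0, 0, 0), "C", (2.2, 0, 0), ofac=0.6))  # False
```

Under this assumed criterion, lowering ofac shrinks the clash threshold, so more candidate loops survive the build-up.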

 nconf_min integer

This is the minimum number of loops to be generated by the loop build-up algorithm. The default is 2^Nres, where Nres is the number of residues in the loop. It may be necessary to decrease this, particularly for long loops, if the number of loops “blows up” and exceeds the allocated memory (currently set to 250k loops).
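The arithmetic behind the memory warning is easy to check. The sketch below simply evaluates the stated default of 2^Nres against a cap of 250,000 loops, reading “250k” literally; the actual limit is set inside PLOP and may differ between versions.

```python
# Assumption: "250k loops" means 250,000; the real cap lives in PLOP.
MAX_LOOPS = 250_000

def default_nconf_min(nres):
    """Default minimum number of loops for a loop of nres residues: 2 ** nres."""
    return 2 ** nres

def longest_safe_loop(limit=MAX_LOOPS):
    """Largest loop length whose default nconf_min does not exceed the limit."""
    n = 0
    while default_nconf_min(n + 1) <= limit:
        n += 1
    return n

for nres in (8, 12, 17, 18):
    print(nres, default_nconf_min(nres), default_nconf_min(nres) <= MAX_LOOPS)

print(longest_safe_loop())  # 17, since 2**17 = 131072 but 2**18 = 262144
```

So, under this reading, the default minimum alone already exceeds the cap for loops of 18 or more residues, which is where lowering nconf_min becomes necessary.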

 ideal[ize] yes/no

If yes, then impose “ideal” bond lengths and angles during the loop build-up. If no, then use the bond lengths/angles from the input structure.

 mid_loop residue

The loop build-up procedure splits the loop into two parts and then builds up from both sides. The midpoint of the loop is identified automatically, but you can choose it yourself with this option. This is occasionally helpful if one half of the loop is much “floppier” than the other.

 mid_move yes/no

By default, the algorithm moves the “break-point” of the loop if the numbers of conformations on the two sides become grossly imbalanced. This option lets you turn that behavior off.

 cling yes/no

Protein loops very rarely just dangle out in solution (at least when they adopt a well-defined structure at all); typically they form contacts with the body of the protein through their side chains. To improve sampling efficiency, the build-up algorithm by default prevents the loop from adopting conformations in which it travels away from the body of the protein. In some cases you may want to remove that constraint with this option (i.e., “cling no”), e.g., when predicting a floppy loop involved in ligand binding.

 ofac real

After building up loop conformations from both sides, the fragments have to be joined in the middle. The middle residue can end up in highly strained conformations as a result of the closure procedure, and the algorithm weeds these out. This parameter is the maximum angle deviation (in degrees) allowed from the Ramachandran allowed regions for phi/psi, from ideality for the N-Calpha-C bond angle, and from planarity for the peptide bonds. Default: 25 degrees.
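A minimal sketch of the closure filter, assuming an ideal N-Calpha-C angle of 111 degrees and trans peptides (omega = 180 degrees). The Ramachandran part is reduced to a precomputed rama_dev value, a hypothetical input standing for the distance of phi/psi from the nearest allowed region; how PLOP actually measures that distance is not described on this page.

```python
# Assumptions: ideal N-CA-C angle of 111.0 degrees; planar peptides are
# trans (omega = 180). rama_dev is a hypothetical precomputed distance (degrees)
# of phi/psi from the nearest allowed Ramachandran region.
IDEAL_N_CA_C = 111.0

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def closure_ok(rama_dev, n_ca_c, omega, max_dev=25.0):
    """Accept the closing residue only if all three deviations are within max_dev."""
    return (rama_dev <= max_dev
            and abs(n_ca_c - IDEAL_N_CA_C) <= max_dev
            and angle_diff(omega, 180.0) <= max_dev)

print(closure_ok(10.0, 115.0, -170.0))  # True: deviations 10, 4, 10
print(closure_ok(10.0, 140.0, 180.0))   # False: bond angle off by 29 degrees
```

Note that a cis peptide (omega near 0) would be rejected by this sketch; whether PLOP treats cis peptides specially is not covered here.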

Options to specify names of output files

 rmsdfile file

This file contains energies and RMSDs (to the native or to a reference structure loaded using “load native”). Format:

 Rank  Model#  Energy  RMSD1 RMSD2 RMSD3 RMSD4

“Rank” is the energy rank of the loop. Two special loop conformations are listed: “-1” is the minimized starting loop structure, and “0” is the side chain optimized and minimized starting loop structure. The energies of these structures are frequently useful points of comparison (did we generate any loops lower in energy than where we started?). “Model#” is an identifier for the loop structure, reflecting the order in which it was generated by the program; it corresponds to the MODEL record number in the “pdbfile”. The four default RMSD values are the global backbone (RMSD1) and global all-heavy-atom (RMSD2) RMSDs, and the local backbone (RMSD3) and local all-heavy-atom (RMSD4) RMSDs. (Global refers to aligning the body of the protein; local refers to aligning just the loop itself.)
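If you post-process rmsdfile output, a small parser is enough to answer the “did we beat the starting structure?” question. The sketch below assumes whitespace-separated columns in the order listed above, with an optional header line; the function names are hypothetical and the numbers in the example are made up purely to show the layout.

```python
def read_rmsdfile(text):
    """Parse rmsdfile rows (Rank Model# Energy RMSD1..RMSD4) into dicts.

    Assumes whitespace-separated columns; non-numeric lines such as a
    header are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            rank, model = int(fields[0]), int(fields[1])
            energy = float(fields[2])
            rmsds = [float(x) for x in fields[3:]]
        except ValueError:
            continue  # e.g. a header line "Rank Model# Energy ..."
        rows.append({"rank": rank, "model": model, "energy": energy, "rmsd": rmsds})
    return rows

def beat_start(rows):
    """True if any generated loop (rank >= 1) is lower in energy than rank 0."""
    start = next(r["energy"] for r in rows if r["rank"] == 0)
    return any(r["energy"] < start for r in rows if r["rank"] >= 1)

# Made-up values, only to illustrate the format.
example = """Rank Model# Energy RMSD1 RMSD2 RMSD3 RMSD4
-1 -1 -5010.2 0.40 0.90 0.35 0.80
0 0 -5023.7 0.45 1.10 0.38 0.95
1 17 -5031.4 1.20 2.10 0.90 1.70
2 4 -5019.8 3.50 4.20 2.60 3.30
"""
rows = read_rmsdfile(example)
print(beat_start(rows))  # True: rank 1 (-5031.4) is below rank 0 (-5023.7)
```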

 pdbfile file

This file contains the energy minimized loop structures generated in the course of the prediction, organized by MODEL records, as well as the minimized and side chain optimized starting structures (MODELs “-1” and “0”) and, for convenience, the complete starting structure (MODEL “-2”). For all generated loops, only the atoms that actually “move” during the simulation are included, to keep the file size down.
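Because the special structures are stored as ordinary MODEL records numbered -2, -1, and 0, a generic multi-model reader can pull them apart. The sketch below assumes standard MODEL/ENDMDL bracketing and keeps only ATOM/HETATM lines; the function name is hypothetical and the example records are invented.

```python
def split_models(pdb_text):
    """Map MODEL number -> list of ATOM/HETATM lines in that model.

    Assumes each model is bracketed by "MODEL n" and "ENDMDL" records;
    negative model numbers (as used for the starting structures) are fine.
    """
    models = {}
    current = None
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            current = int(line.split()[1])
            models[current] = []
        elif line.startswith("ENDMDL"):
            current = None
        elif current is not None and line.startswith(("ATOM", "HETATM")):
            models[current].append(line)
    return models

# Invented records, only to show the layout.
sample = """MODEL -2
ATOM      1  N   GLY A  49       0.000   0.000   0.000
ENDMDL
MODEL 0
ATOM      2  CA  GLY A  50       1.458   0.000   0.000
ENDMDL
MODEL 17
ATOM      3  CA  GLY A  50       1.502   0.210   0.000
ATOM      4  CA  GLY A  51       4.900   1.100   0.300
ENDMDL
"""
models = split_models(sample)
print(sorted(models))    # [-2, 0, 17]
print(len(models[17]))   # 2
```

Since generated loops contain only the moving atoms, a full model for a given loop would have to be assembled by combining its atoms with the rest of MODEL -2.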

Options to specify side chains to optimize on body of protein

The default behavior of the loop prediction algorithm is to optimize only the loop and the side chains on the loop itself, keeping the remainder of the protein rigid. But for many applications, including homology modeling, this is inadequate, and the side chains on the body of the protein near the loop must be optimized as well (i.e., when you cannot assume that these side chains are in reasonable conformations to begin with). These options allow you to do this:

 sidecut distance

Optimize all side chains within a distance cutoff (in Å) from the initial loop conformation. That is, during the side chain optimization and minimization of loop candidates, these side chains will be optimized along with those on the loop itself.
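The sketch below shows one plausible reading of the cutoff: a residue is selected if any of its atoms lies within the cutoff of any atom of the initial loop. Whether PLOP measures from all atoms, side chain atoms only, or some other reference point is an assumption here, as are the function name and the atom list layout; residue IDs like “A:50” follow the notation used elsewhere on this page.

```python
import math

def residues_within(atoms, loop_residues, cutoff):
    """Residues outside the loop with any atom within cutoff Å of any loop atom.

    atoms: list of (residue_id, (x, y, z)) tuples (hypothetical layout).
    Assumes a minimum atom-atom distance criterion.
    """
    loop_xyz = [xyz for res, xyz in atoms if res in loop_residues]
    selected = set()
    for res, xyz in atoms:
        if res in loop_residues or res in selected:
            continue
        if any(math.dist(xyz, p) <= cutoff for p in loop_xyz):
            selected.add(res)
    return sorted(selected)

atoms = [("A:50", (0.0, 0.0, 0.0)), ("A:51", (1.5, 0.0, 0.0)),
         ("A:30", (4.0, 0.0, 0.0)), ("A:80", (12.0, 0.0, 0.0))]
print(residues_within(atoms, {"A:50", "A:51"}, 5.0))  # ['A:30']
```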

 sideadd [residue selection]

Optimize specific side chains that you specify with the usual side chain selection options.

 sidefrz yes/no

This option determines whether the side chains specified by "sidecut" or "sideadd" are included during the loop build-up or are temporarily deleted. "yes" means the side chains are included and is the default setting. "sidefrz no" is helpful when the side chains surrounding the loop are so far off from where they "should" be that they physically block the loop from adopting the native conformation during the build-up. Note that "sidefrz no" can dramatically increase the sampling space, depending on the loop length and the number of side chains listed.

Options to constrain the loop prediction

For many purposes, it is useful to restrict the sampling during the loop build-up, based on either the Cartesian coordinates or dihedral angles.

 maxcalpha [residue] distance

Constrains the C-alpha atoms from moving more than some distance (in Å) away from their initial positions. You can apply this to either a single residue or all residues: “maxcalpha 5.0” applies a 5 Å constraint to all C-alpha atoms, while “maxcalpha A:50 5.0” applies it only to residue 50 on chain A.
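The maxcalpha check amounts to a per-residue displacement test. A sketch, assuming C-alpha coordinates are available as dicts keyed by residue IDs such as “A:50” (the function name and data layout are hypothetical):

```python
import math

def maxcalpha_ok(initial_ca, trial_ca, limit, residue=None):
    """Check C-alpha displacement against a maxcalpha-style limit (Å).

    initial_ca, trial_ca: hypothetical dicts mapping residue ID -> (x, y, z).
    residue=None mimics "maxcalpha 5.0" (all loop residues);
    residue="A:50" mimics "maxcalpha A:50 5.0" (that residue only).
    """
    targets = initial_ca if residue is None else [residue]
    return all(math.dist(initial_ca[r], trial_ca[r]) <= limit for r in targets)

initial = {"A:50": (0.0, 0.0, 0.0), "A:51": (3.8, 0.0, 0.0)}
trial = {"A:50": (0.0, 6.0, 0.0), "A:51": (3.8, 1.0, 0.0)}
print(maxcalpha_ok(initial, trial, 5.0))          # False: A:50 moved 6 Å
print(maxcalpha_ok(initial, trial, 5.0, "A:51"))  # True: A:51 moved 1 Å
```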


Analogous to “maxcalpha”, except that it applies constraints to the dihedral angles (both phi and psi), in degrees, to one or all residues.
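For dihedral constraints the main subtlety is periodicity: phi = -170 and phi = 175 are 15 degrees apart, not 345. The sketch below assumes that phi and psi must each stay within the limit of their initial values; the function names are hypothetical.

```python
def dihedral_dev(a, b):
    """Smallest absolute difference between two dihedral angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def dihedral_ok(initial, trial, limit):
    """initial, trial: (phi, psi) pairs in degrees for one residue.

    Assumes each angle is checked separately against the same limit.
    """
    return all(dihedral_dev(t, i) <= limit for i, t in zip(initial, trial))

# Both angles wrap across +/-180 and differ by only 15 degrees.
print(dihedral_ok((-170.0, 175.0), (175.0, -170.0), 20.0))  # True
print(dihedral_ok((-170.0, 175.0), (175.0, -170.0), 10.0))  # False
```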

 constrain atom distance x y z

This is similar to “maxcalpha” but more general. It can be applied to any atom in the loop, and it does not depend on the initial structure (the “x y z” parameters specify the center of a sphere, in the Cartesian space of the protein, with a radius “distance”; the atom must be found within that sphere if the loop is to be accepted).

 cross[link] movable_atom1 fixed_atom2 distance 

This is similar to “constrain”, but it figures out x, y, and z for you: the center of the sphere is the position of fixed_atom2 in the structure in memory before sampling. It can be applied to any atom in the loop; movable_atom1 must be found within “distance” of that point if the loop is to be accepted.
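Under the descriptions above, “constrain” and “crosslink” reduce to the same sphere test and differ only in where the center comes from. A sketch under that reading, with hypothetical function names and the boundary treated as inclusive (an assumption):

```python
import math

def constrain_ok(atom_xyz, center, distance):
    """'constrain'-style test: the atom must lie inside a sphere of radius distance."""
    return math.dist(atom_xyz, center) <= distance

def crosslink_ok(movable_xyz, fixed_xyz_before_sampling, distance):
    """'crosslink'-style test: the sphere is centered on fixed_atom2's
    coordinates taken from the structure before sampling."""
    return constrain_ok(movable_xyz, fixed_xyz_before_sampling, distance)

print(constrain_ok((1.0, 2.0, 2.0), (0.0, 0.0, 0.0), 3.0))   # True: exactly 3 Å
print(crosslink_ok((10.0, 0.0, 0.0), (4.0, 0.0, 0.0), 5.0))  # False: 6 Å away
```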

 helix residue constraint

Constrain the dihedrals (phi/psi) for this residue to lie roughly within the alpha-helix portion of the Ramachandran plot. “constraint” is in degrees and specifies how close to an “ideal” helix the residue is required to be.

 strand residue constraint

Analogous to the “helix” option, but for the beta-sheet portion of the Ramachandran plot.
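A sketch of the “helix” and “strand” tests, assuming each of phi and psi must lie within “constraint” degrees of a reference value. The reference values are assumptions (textbook alpha-helix (-57, -47) and antiparallel beta-strand (-139, 135) dihedrals); PLOP may use different references or a different distance measure, and the function names are hypothetical.

```python
# Assumed reference (phi, psi) values in degrees; not taken from PLOP.
IDEAL = {"helix": (-57.0, -47.0), "strand": (-139.0, 135.0)}

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def ss_ok(phi, psi, kind, constraint):
    """True if phi and psi are each within constraint degrees of the reference."""
    ref_phi, ref_psi = IDEAL[kind]
    return (angle_diff(phi, ref_phi) <= constraint
            and angle_diff(psi, ref_psi) <= constraint)

print(ss_ok(-63.0, -41.0, "helix", 15.0))    # True: 6 and 6 degrees off
print(ss_ok(-120.0, 130.0, "helix", 15.0))   # False: far from helical
print(ss_ok(-120.0, 130.0, "strand", 30.0))  # True: 19 and 5 degrees off
```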

Support for nonstandard amino acids and ligands

Amino acids or ligands with nonstandard side chains, and even nonstandard backbones, can be included in the loop prediction. The script plop/PlopRotTemp.py can be used together with Schrödinger's Maestro and PRIME products to prepare these nonstandard residues.

Nested options

The loop prediction algorithm calls the side chain optimization code twice for each loop (see the paper for details), so you can pass options to the side chain code using “side1” or “side2” (for the two optimizations), followed by any of the side chain prediction options. The loop prediction algorithm also uses clustering routines, and you can pass options to them using the “clust” option.
