Basics of Plop

From Jacobson Lab Wiki


Specifying Residues

Residues are specified using the format "chain:number" (e.g., "A:134" to specify residue 134 in chain A). The term “residues” refers to both amino acids as well as ligands (each ion, explicit water, or ligand is considered to be a “residue” internally).

If a chain name is not specified in the PDB file (usually the case only for single-chain proteins), use an underscore as the chain identifier (e.g., _:134). If only one protein chain is loaded, PLOP will accept an incorrect chain identifier (e.g., you can use _ for the chain even though the only chain loaded is named "A").
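The "chain:number" format can be illustrated with a minimal Python sketch. This is not PLOP's own parser; the names `ResidueId` and `parse_residue` are hypothetical, and the sketch assumes a one-character chain identifier and a plain integer residue number.

```python
from typing import NamedTuple


class ResidueId(NamedTuple):
    chain: str   # "_" stands for a blank chain identifier in the PDB file
    number: int


def parse_residue(spec: str) -> ResidueId:
    """Split a "chain:number" specifier such as "A:134" or "_:134"."""
    chain, sep, number = spec.partition(":")
    if not sep or len(chain) != 1:
        raise ValueError(f"expected chain:number, got {spec!r}")
    return ResidueId(chain, int(number))


print(parse_residue("A:134"))
print(parse_residue("_:134"))
```

Keeping the underscore as the stored chain name, rather than converting it to a space, keeps the parsed value identical to what the user typed.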

As a convenience, the first residue of the protein can be written as "beg[inning]" and the last residue as "end".

To specify multiple residues at once, several commands provide the following options:

1. To specify a range, simply specify first and last residues, e.g. "A:123 A:145"

2. To specify all residues of a given amino acid type, use, e.g., "all lys".

3. To specify an entire protein chain, use “chain A”, “chain _”, etc.

4. To specify the entire protein, "all res[idues]" will work, or you can specify the entire range explicitly, e.g., "beg end".

5. To specify all residues with a certain secondary structure, the syntax is "ss helix" (all residues in helices), "ss sheet" (all beta sheet residues), "ss loop" (all loops), "ss tails" (all unstructured residues at the termini), "ss nterm" (only the tail at the N-terminus), and "ss cterm" (C-terminal tail residues).

6. For more complicated combinations, the residues to be included can be read from a file (one residue per line), using the syntax "file res.list", where res.list would contain the list of residues.

7. To specify just a single residue instead of a range in one of these commands, you can use, e.g., "single A:123", or you could still specify it as a range, albeit a range with only one residue, i.e., "A:123 A:123".

8. To specify residues within a certain distance of a particular residue, use, e.g., “within5.0 A:123” (in this case, all residues within 5 Å of residue 123).
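The distance-based selection in item 8 can be sketched in Python as follows. This is an illustration only, not PLOP's implementation: it assumes that a residue counts as "within" the cutoff if any of its atoms lies within that distance of any atom of the reference residue, and that the reference residue itself is left out of the result. The function name and the data layout are hypothetical.

```python
import math


def residues_within(atoms, center, cutoff):
    """Return residues having any atom within `cutoff` of any atom of `center`.

    atoms: list of (residue_id, (x, y, z)) tuples, one entry per atom.
    center: residue id of the reference residue, e.g. "A:123".
    """
    center_xyz = [xyz for res, xyz in atoms if res == center]
    selected = []
    for res, xyz in atoms:
        if res == center or res in selected:
            continue
        if any(math.dist(xyz, c) <= cutoff for c in center_xyz):
            selected.append(res)
    return selected


atoms = [
    ("A:123", (0.0, 0.0, 0.0)),
    ("A:124", (3.0, 0.0, 0.0)),
    ("A:124", (4.0, 0.0, 0.0)),
    ("A:200", (10.0, 0.0, 0.0)),
]
print(residues_within(atoms, "A:123", 5.0))
```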

Specifying Atoms

Occasionally it is necessary to specify a particular atom. The syntax is "chain:residue:atom", i.e., the residue specification with an atom specifier appended. In PDB format, every atom in a residue (or HETATM group) must have a unique "graph" name, e.g., "CB" for the beta-carbon in an amino acid, so these names are used as the atom identifiers.

AS OF VERSION 1-7: THE WAY OF SPECIFYING ATOMS HAS CHANGED. A perennial problem has been the odd way that atom names are written in PDB files, specifically their “justification”. Every atom name occupies a fixed 4-column field of the column-formatted PDB file. To make it completely unambiguous which atom you are referring to, PLOP now requires that you type out all four characters in these columns, substituting an underscore for each space. For example, to specify the charged nitrogen at the end of the side chain of Lys 99 in chain B, the syntax would be


and to specify one of the hydrogens attached to this nitrogen,


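The underscore rule described above can be sketched in Python. This is a minimal illustration, not part of PLOP; the function names are hypothetical. It relies on the PDB convention that the atom name occupies columns 13-16 of an ATOM or HETATM record. For the lysine side-chain nitrogen that field normally reads " NZ " (leading and trailing space). Hydrogen names differ between PDB naming conventions (for instance " HZ1" in some files and "1HZ " in others), so the field should be taken from the file actually being loaded.

```python
def plop_atom_name(pdb_name_field: str) -> str:
    """Convert the 4-character PDB atom-name field to underscore form."""
    if len(pdb_name_field) != 4:
        raise ValueError("the PDB atom-name field is exactly 4 characters wide")
    return pdb_name_field.replace(" ", "_")


def atom_name_field(pdb_line: str) -> str:
    """Extract columns 13-16 (the atom-name field) from an ATOM/HETATM line."""
    return pdb_line[12:16]


def atom_spec(chain: str, resnum: int, pdb_name_field: str) -> str:
    """Build a "chain:residue:atom" specifier."""
    return f"{chain}:{resnum}:{plop_atom_name(pdb_name_field)}"


print(atom_spec("B", 99, " NZ "))
```

Reading the name directly from the coordinate file avoids guessing how a given atom name is justified within its four columns.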
Command Structure

To make commands easier to remember, I try to use simple English words that convey the meaning, such as "sidechain predict". To reduce typing, usually only the first few letters of each word are required, e.g., "side pred". In the following, the optional part of each command word is put in brackets, e.g., "side[chain] pred[ict]".
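The bracket notation can be read as "required prefix plus optional tail". The Python sketch below shows one way to test a typed word against such a pattern. It is an assumption about how the abbreviations behave, not PLOP's actual matching code: it accepts any truncation between the required prefix and the full word, and it is case-sensitive. The function name `matches` is hypothetical.

```python
def matches(word: str, pattern: str) -> bool:
    """Check a typed word against a pattern such as "side[chain]"."""
    required, _, rest = pattern.partition("[")
    full = required + rest.rstrip("]")
    # The word must contain the required prefix and be a prefix of the full word.
    return word.startswith(required) and full.startswith(word)


for typed in ("side", "sidech", "sidechain", "sid", "sidex"):
    print(typed, matches(typed, "side[chain]"))
```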

Most commands have both mandatory and optional arguments. The mandatory arguments come first. In the sidechain prediction example, the mandatory arguments specify which residues' sidechains to optimize, e.g.,

                       side pred _:37 _:43

to optimize the conformations of the side chains on residues 37-43. For this command and most others, a variety of optional arguments specify, e.g., parameters for the optimization; when these are not explicitly given, default values are used, which should be appropriate for many purposes. Although it is not required, the input script is clearer if each option is placed on its own continuation line, e.g.,

                       side pred _:37 _:43 &
                             iter 20 &
                             randomize yes

Optional parameters may be specified in any order. The optional parameters can be of type logical (i.e., yes/no), integer, real, or a file name.
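The continuation-line layout above can be sketched in Python as a small preprocessing step that joins lines ending in "&" into one logical command. This is an illustration under the assumption that a trailing "&" is the only continuation marker; it is not PLOP's own input reader, and the function name is hypothetical.

```python
def join_continuations(script: str) -> list[str]:
    """Join lines ending in "&" with the lines that follow them."""
    commands = []
    pending = ""
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith("&"):
            pending += line[:-1].strip() + " "
        else:
            commands.append((pending + line).strip())
            pending = ""
    if pending.strip():
        commands.append(pending.strip())  # script ended on a continuation
    return commands


script = """
side pred _:37 _:43 &
     iter 20 &
     randomize yes
"""
print(join_continuations(script))
```

After joining, the mandatory arguments are the leading tokens and the optional name/value pairs follow in whatever order they were written.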

Preliminary Commands

Before loading a structure, there are a few basic parameters that are normally specified.

One parameter that PLOP requires in order to operate at all is the location of the "data directory", which contains force field parameters and other information about amino acids, such as rotamer libraries. Generally, this will be in some fixed location, but if you want to experiment with modified energy parameters or try a new rotamer library, it is convenient to create a copy of the data directory which you can then modify and use. The command structure is

                       data[directory] path

where path is the location of the data directory. It is good practice to specify the full path, so that there is no ambiguity, e.g.,

                       datadir /home/chemserv/friesner/jacobson/plop/data/

[Note to people in my group: I compile our local versions of PLOP with the default value for "datadir" set to my copy of the data directory on thales, i.e., /home/jacobson/plop/data. So you don't have to set this parameter at all for most purposes.]

Although not strictly required, it is usually helpful to name the PLOP job; the information written out will often make use of the job name, which makes it easy to keep track of what PLOP job created the files. The command is

                       job[name] name

where name is any character string with no whitespace. This could simply be the PDB identifier for the protein you are working on, e.g.,

                       jobname 2igd

or something more specific, e.g.,

                       jobname 2igd_loop2_run3

The default value of the parameter is "plop_job". The jobname can be changed in the middle of a PLOP script, but usually is not.

One final parameter that may be useful to set is a default location (directory) for files created by PLOP. This can be overridden for specific output files, but serves as a default. The command is

                       out[putdirectory] path

where path is any valid directory. The default is the current working directory.

Nonstandard Residues

Plop supports not only standard amino acids but also a number of nonstandard amino acids by default. Ligands or nonstandard amino acids can be prepared using Schrodinger's hetgrp_ffgen utility bundled with PRIME.

To prepare a ligand, use:

 $SCHRODINGER/utilities/hetgrp_ffgen 2005 [maestro file]

This generates a 3-letter template file corresponding to the residue name. The Maestro file needs to have unique PDB Atom Names for each atom. If you would like to do any sampling with the ligand or nonstandard residue, you'll want to run

 $SCHRODINGER/utilities/python [options] [maestro file] has a number of options for preparing a properly-reordered template file, identifying ligand torsions to sample, generating backbone libraries for rings, etc.

Overlap Factors: A Measure of Steric Clash

Many of the PLOP algorithms rely on the rapid identification of steric clashes to eliminate unreasonable conformations before wasting time on calculating an energy (which would be huge). PLOP also reports steric clashes at various stages, including when loading a PDB structure. As a result, there is a need for a clear, simple-to-compute definition of a steric clash.

The definition we use is the "overlap factor", which is defined as the ratio of the distance between two atom centers to the sum of their van der Waals radii. This definition has the advantage that it is independent of the atom types in question. This number will be less than 1 if the two atom spheres interpenetrate at all. Some interpenetration is normal, for example in a hydrogen bonding arrangement, and does not lead to a huge Lennard-Jones energy. Overlap factors of 0.8 or even 0.75 are common in high-resolution X-ray crystal structures. An overlap factor of 0.7 would be a relatively minor steric clash, while 0.65 or lower would be more severe.

Many of the prediction algorithms allow the specification of the overlap factor used to screen out conformations; there are default values set, but you may wish to override them in specific cases. For example, if performing optimizations on an initial structure with many steric clashes, it may be necessary to use a lower overlap factor to successfully generate any new (side chain/loop/helix) conformations at all.
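The overlap factor defined above is simple to compute. The Python sketch below follows that definition directly; the function names, the example radii of 2.0 Å, and the default threshold of 0.7 are illustrative assumptions, not PLOP's actual code, radii, or defaults.

```python
import math


def overlap_factor(xyz_a, xyz_b, radius_a, radius_b):
    """Distance between atom centers divided by the sum of their vdW radii."""
    return math.dist(xyz_a, xyz_b) / (radius_a + radius_b)


def is_clash(xyz_a, xyz_b, radius_a, radius_b, threshold=0.7):
    """Flag a pair as clashing when its overlap factor falls below the threshold.

    The threshold of 0.7 is a hypothetical choice for this sketch.
    """
    return overlap_factor(xyz_a, xyz_b, radius_a, radius_b) < threshold


# Two atoms with (hypothetical) radii of 2.0 Å each, centers 3.0 Å apart.
print(overlap_factor((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), 2.0, 2.0))
print(is_clash((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), 2.0, 2.0))
```

Lowering `threshold` makes the screen more permissive, which mirrors the advice above for starting structures that already contain many clashes.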
