Geometric Methods in Structural Computational Biology by Lydia Kavraki - HTML preview

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub, Kindle for a complete version.

bonded atoms. Given four consecutive atoms A i − 2 , A i − 1 , A i , and A i + 1 , the dihedral angle is defined as the smallest angle between the planes π 1 and π 2 , as shown in the figure. Variation of

the dihedral angle is a consequence of rotation of the two outer bonds about the central bond.

Figure 24. A Dihedral Angle

π 1 is the plane uniquely defined by the first three atoms A i − 2 , A i − 1 , and A i . Similarly, π 2 is the plane uniquely defined by the last three atoms A i − 1 , and A i , and A i + 1 . The dihedral angle, θ, is defined as the smallest angle between these two planes. You can read more about the angle between two intersecting planes here.

In this module, because bond lengths and bond angles are being ignored as underlying degrees of

freedom of a protein, the only remaining degrees of freedom are the dihedral rotations.

Representing protein conformations with the dihedral angles as the only underlying degrees of

freedom is known as the idealized or rigid geometry model. Ignoring bond lengths and bond

angles greatly reduces the number of degrees of freedom and therefore the computational

complexity of representing and manipulating protein structures. Even more efficient

representations which reduce the number of degrees of freedom even further exist [link], but these are beyond the scope of this introduction.

index-33_1.jpg

Dihedral Representation of Protein Conformations

All amino acids share the same core of one nitrogen, two carbon, and one oxygen atoms. This

shared core makes up the backbone of the protein. There are two freely rotatable backbone

dihedral angles per amino acid residue in the protein chain: the first, designated φ, is a

consequence of the rotation about the bond between N and , and the other, ψ, which is a

consequence of the rotation about the bond between and C . The peptide bond between C of one

residue and N of the adjacent residue is not rotatable.

The number of backbone dihedrals per amino acid is 2, but the number of side chain dihedrals

varies with the length of the side chain. Its value ranges from 0, in the case of glycine, which has

no sidechain, to 5 in the case of arginine.

Figure 25. Dihedral Angles in Arginine

The backbone atoms appear at the bottom of the illustration (the peptide bond is not rotatable). The sidechain dihedrals are conventionally designated by χ and a subscript.

One can generate different three dimensional structures of the same protein by varying the

dihedral angles. There are 2N backbone dihedral DOFs for a protein with N amino acids, and up to

4N side chain dihedrals that one can vary to generate new protein conformations. Changes in

backbone dihedral angles generally have a greater effect on the overall shape of the protein than

changes in side chain dihedral angles. Think about why.

Protein Forward Kinematics

Kinematics is a branch of mechanics concerned with how objects move in the absence of mass

(inertia) and forces. You can imagine that varying the dihedral angles will move a protein's atoms

relative to each other in space. The problem of computing the new spatial locations of the atoms

given a set of dihedral rotations is known as the forward kinematics problem.

The importance of this problem to protein modeling and simulation should be clear: as stated

earlier, the only internal degrees of freedom usually considered for a protein are its dihedral

angles. Thus, moving a protein will be achieved by setting some of its dihedral angles to new

values. For some applications, such as the rendering of an image of the protein and the

computation of its Energy, however, the Cartesian (x,y,z) coordinates for each atom are needed.

These are obtained by forward kinematics.

Mathematical Background: Matrices and Transformations

The math involved in solving forward kinematics requires some background in linear algebra,

specifically in the anatomy and application of transformation matrices. The links provided in this

section should provide enough mathematical background to understand the rest of this module and

eventually write a simple protein manipulation program.

Background on Transformations

Transformation Matrices: The main transformations you will apply to polypeptide chains will

be a combination of translations and rotations. Please see introduction to translations and

introduction to two- and three-dimensional rotations. One special rotation matrix is the

Euler matrix . Please pay particular attention to the different conventions used for defining the Euler matrix. The one adopted for this module is the XYZ convention (there is also the ZXZ

convention). Now that you know what an Euler matrix looks like, you need to get familiar with

rotations about an arbitrary vector or line. Please read more on rotations around an arbitrary

vector .

Homogeneous Transformations: The use of homogenous coordinates and transformations can

simplify some of the calculations involved in using three-dimensional transformations. In

particular, they allow translation, which is not a linear operator in 3D, to become a linear

operator in the 3D subspace (x,y,z,1) of a 4D space. The advantage of this representation is that

translation becomes achievable by multiplying a vector by a matrix, and so becomes

composable. A direct benefit from this is the ability to express, as a matrix, a rotation around

an arbitrary point, not just the origin as in the pure 3D case. See homogenous

transformations .

Quaternions Quaternions are an efficient, robust method of representing three-dimensional

rotations. In particular, they are not subject to the undesirable singularities and numerical

instability of rotations represented by orthonormal matrices and Euler angles. Please visit this

introduction to quaternions to see how they relate to homogenous transformations. In this class quaternions will be used for the optimal structural alignment of two proteins and it is

recommended that the reader familiarizes him/herself with the concept of quaternions as soon

as possible.

A more detailed discussion of spatial descriptions and transformations can be found in chapter 2

of [link]. The most widely used transformations to manipulate protein chains are rotations.

Several representations are possible for rotations:

Euler angles: The orientation of an object is given as three rotations about set axes. For

example, in the ZXZ convention, the angles specify a rotation about the global z-axis, followed

by one about the global x-axis, and finally, one more about the global z-axis. The use of Euler

angles is subject to an undesirable phenomenon called gimbal lock, in which two of the

rotational axes become aligned in such a way that a degree of freedom is lost.

Cardan angles: The orientation is specified as a set of three rotations about axes defined by the

object. The typical example is the pitch-roll-yaw set of rotations for an aircraft. Pitch

corresponds to a rotation about the axis from wingtip to wingtip. Roll corresponds to a rotation

about an axis from the nose to the tail, and yaw corresponds to rotation about a third "vertical"

axis through the center of the plane, and roughly corresponds to a notion of horizontal heading.

This method is also subject to gimbal lock.

Axis-angle representation: It can be proven that any three-dimensional rotation can be

represented as a single rotation about an axis, represented by a unit vector.

Rotation matrices: A rotation matrix is an orthonormal matrix that represents a rotation.

Rotation matrices are discussed later in the module. Applying the matrix to a vector yields the

rotated vector. Given two rotations represented by matrices A and B, the result of applying both

rotations in sequences is given by the matrix product AB.

Unit quaternions: A rotation of angle theta about the axis represented by the unit vector v =

[x, y, z] is represented by a unit quaternion. Quaternions are described in this module.

Forward Kinematics

As stated earlier, a common operation when manipulating proteins in silico is to retrieve the

Cartesian coordinates of each atom in the protein from our knowledge of its dihedral angles and

rotations applied to them. For simplicity, assume we have an anchor atom and we are modeling

the protein backbone only, that is, the protein consists of a serial linkage composed of consecutive

backbone atoms, as shown in Figure 5.

A Simple Approach

The simplest way to represent a protein chain is to store the Cartesian (x,y,z) coordinates of each

atom at all times. These coordinates are relative to some global coordinate frame which is

unimportant, for example that in which the atomic positions were obtained by X-Ray

crystallography and which are typically read from the PDB files. These coordinates can be

changed if so desired. Common changes are to remove the center of mass (thus centering the

protein at the global origin), subtract the position of the anchor atom (to center the protein at this

atom), etc.

index-36_1.jpg

index-36_2.jpg

But it was discussed earlier that the "natural" degrees of freedom for kinematic manipulations are

usually the dihedral angles alone. This means that algorithms that operate on dihedral angles to

achieve their goals will normally require a way to modify the Cartesian coordinates when dihedral

rotations are performed, to reflect the new atomic positions. This can be easily done with rotation

matrices as follows.

Figure 26.

A protein backbone as a serial linkage.

When a rotation of θ degrees around bond i is performed, one can think of all atom positions

starting at i+2 rotating around the axis defined by bond i, and all other atoms (from anchor to

atom i+1 inclusive) remaining stationary. Thus, upon such a rotation, the Cartesian coordinates of

the atoms after the bond need to be updated, and their new values are given by:

Where [x,y,z,1] is the position of a generic atom in homogeneous form, [x',y',z',1] is its position

after the rotation (T is the transpose operator), and R(i,θ) is a 4x4 matrix that encodes a rotation of

θ degrees around an axis coinciding with bond i that passes through atom ai, and is given in

homogeneous form as:

index-37_1.jpg

index-37_2.jpg

index-37_3.jpg

In the above formula, T(x) is a translation by the vector x and R0(axis,θ) is a rotation around an

axis that goes through the origin of the specified coordinate system. As can be seen, this rotation

around an arbitrary point is realized by translating the point to the origin, rotating the target atom

around the axis through the origin, and then translating it back (the composition of these 3

transformations yields a unique 4x4 homogeneous matrix that achieves the same effect). The axis

of rotation can easily be computed from the positions of atoms i and i+1 and must have unit norm.

To perform successive rotations about different bonds, this procedure can be repeated, updating

the Cartesian coordinates for each rotation. Note that the convention used for matrix-vector

multiplication is to multiply column vectors by matrices on the left, so the rightmost

transformation gets applied first, and so on. This is the convention used in most of the literature,

but the alternate convention is possible (multiplying row vectors with matrices on the right; these

matrices are the transpose of the column-vector convention).

Alternatively, if many rotations need to be performed at the same time (and the intermediate

Cartesian coordinates are not needed), these rotations could be sorted by bond number and applied

simultaneously, by noting that rotations can be performed in a cumulative way as the backbone is

traversed from anchor to end atom. The ability to chain rotations around arbitrary vectors in space

(i.e. not through the origin) is one of the main benefits of homogeneous transformations. For

example, if two rotations need to be applied at the same time, one around bond 3 by 30 degrees

and another around bond 7 by 15 degrees, the atoms between bonds 3 and 7 get updated by:

But the atoms after bond 7 are updated by:

In the above, bond n is the unit vector defined along bond n, easily computed by subtracting the

coordinates of atoms n+1 and n, and then dividing by its norm. The chaining of transformations as

explained above is very useful to achieve arbitrary rotations of bonds within a protein. Sections of

the protein (i.e. atoms belonging to certain residues) can be updated when a dihedral rotation is

performed simply by constructing the overall matrix that should affect them.

Denavit-Hartenberg Local Frames

The previous approach, while simple and intuitive, has some shortcomings:

The accumulation of math operations in the rotation matrices is prone to numerical instability.

After only a couple of hundred rotations of a point, each accumulating on the other, the final

position of the point may start differing significantly from its actual, intended position. As a

consequence, the relative position and orientation of atoms in the protein chain will no longer

be in agreement with the protein structure. In particular, bond lengths and angles will begin

stretching and deviating from their physically acceptable values.

The actual values of the Cartesian coordinates are always stored in a particular, arbitrarily

chosen frame of reference. For example, if we wanted to translate the protein, we would need to

modify the Cartesian coordinates stored.

Once a rotation is applied, the method "forgets" the current values of the dihedral angles, which

would need to be re-computed if needed. What is stored is a snapshot of the current Cartesian

coordinates of each atom.

The original definition of Forward Kinematics, however, is a method to obtain the Cartesian

coordinates of each atom from the current values of the internal degrees of freedom (dihedral

angles in our case) at any time. In such an approach, the Cartesian coordinates need not be

recomputed after every change in the dihedral angles; rather, the idea is to store the current values

of the dihedral angles, and to have a procedure to reconstruct the atomic positions when needed.

The advantages of this approach are:

A more compact representation of the variables of the problem, since the dihedral angles

require less space than the (x,y,z) coordinates of each atom (the protein topology requires the

values of the bond lengths and angles anyway, so the total amount of numbers to store is

comparable).

It is not prone to numerical instability since the number of rotations performed to position an

atom is always its sequence number in the chain. (Actually if the chain is thousands of residues

long, some uncertainty could arise in the position of atoms far along the chain, but the relative

position of consecutive atoms can still be kept under control, avoiding bond stretching).

Performing a dihedral rotation consists simply of adding/subtracting the rotation angle from the

stored value for each angle. In particular, simultaneous rotations (i.e. rotating more than one

dihedral angle at a time) which consists of multiplying many 4x4 matrices in the global

method, reduces to modifying the angle values.

There is no explicit global coordinate frame for the protein. It can be positioned arbitrarily by

prepending a position/orientation matrix to the forward kinematics computation.

The only preprocessing step that is necessary to start working with this method, however, is to

perform an initial pass on the protein to extract the initial values of the dihedral angles and the

constant bond lengths and angles, from the Cartesian coordinates available from PDB files, if the

intention is to start from the protein's native state. This is easily done; bond lengths can be

obtained by computing the distance between the bonded atoms, and bond angles by computing the

index-39_1.jpg

index-39_2.jpg

angle between the vectors formed by two consecutive bonds (recall that the dot product of two

vectors yields the product of their lengths times the cosine of the angle between them). Next, we

present the transformations required for the Denavit-Hartenberg method.

Consider three consecutive bonds as in the figure below. Suppose that a local coordinate frame is

attached at the beginning of each bond. For example, local coordinate system x i − 1 , y i − 1 , z i − 1 is centered at atom A i − 1 . Therefore, imagine that the position of each atom in three-dimensional

space is specified in terms of a frame that is centered at the previous atom. Given the frames at

atom A i − 2 , and atom A i − 1 , one can determine how the frames at atoms A i and atom A i − 1 will change in space as a consequence of a rotation around the bond that connects atoms A i − 1 and

A i with the dihedral angle [link]. The correct transformation can be computed in terms of three primitive operations: two rotations and one translation. The two rotations are a rotation around the

dihedral bond by the dihedral angle and a rotation around an axis perpendicular to the bond angle,

by the bond angle. The translation refers to the fact that the origins of the frames are on the

respective centers of the atoms connected by the bond, thus separated by bond lengths.

Figure 27. The Denavit-Hartenberg convention

To describe the position of atom i in terms of the coordinate frame centered at atom i-1, two rotations and a translation are composed.

The order in which to compose these 3 transformations, to obtain the total transformation that

expresses the position of atom i in terms of frame i-1, is the following:

where the rotation axes are the usual x (1,0,0) and z (0,0,1),

not to be confused with the DH Local Frames. The resulting homogenous transformation is shown

below.

index-40_1.jpg

index-40_2.jpg

index-40_3.jpg

Figure 28. Transformation

Homogeneous transformation to express the coordinates of atom i in terms of the frame centered at i-1

Note that θ i is the dihedral angle on bond b i and α i − 1 is the bond angle between bonds b i − 1 and b i . d i is the length of bond b i . For a more detailed derivation of this transformation, please read the included material in required readings. The position of any atom in the molecule can be

determined by chaining matrices of the form given above. For example, suppose that b i , b i − 1 ,

..., b 1 , represents the sequence of bonds on the path from a particular atom a to the anchor atom

a anch . Then, for atom a, its Cartesian coordinates with respect to the frame attached to the anchor

atom is given by:

Figure 29. Equation 1

The coordinates of atom a with respect to the local frame attached to it are 0, 0, 0.

To complete the description, one can allow for rotations or translations of the local frame attached

to the anchor atom with respect to some global frame. Rotations of the anchor atom with respect

to a global frame cause a rigid rotation of the entire polypeptide chain. To do so, one can define

the rotation frame as the Euler matrix defined by the Euler angles of the local frame of the anchor

atom to the global frame. As discussed before, there are many conventions to define the Euler

matrix. One of them, the X-Y-Z convention, defines the Euler matrix as the product of three

rotation matrices: rotation around the z axis by angle α; rotation around the y axis by angle β;

rotation around the x axis by the angle γ. The order of performing these three rotations in the X-Y-

Z conventions is: rotation around x axis first, then around y axis second, and around z axis last.

The resulting Euler matrix according to this convention is given below:

Figure 30. Euler Matrix

α, β, γ are the so-called Euler angles - the angles with respect to each of the Cartesian axes. The convention used here is the XYZ

convention. cα and sα denote cos( α) and sin( α) respectively.

The Euler matrix can be applied last to the accumulating dihedral rotations in order to allow the

anchor atom to move with respect to a global frame. For a more detailed explanation, please read

the included material in required readings.

Required Reading

Zhang-Kavraki 2002 [PDF] Zhang, M. and L. E. Kavraki, "A New Method for Fast and Accurate Derivation of Molecular Conformations". Journal of Chemical Information and

Computer Sciences, 42:64-70, 2002.

Stamati-Shehu-Kavraki. Computing Forward Kinematics for Protein-like linear systems using

Denavit-Hartenberg Local Frames [PDF]

Resources

1. VMD Visual Molecular Dynamics, is an excellent tool for visualization and scripted

manipulation of protein structures that uses Tcl scripting - Humphrey, W., Dalke, A. and

Schulten, K., "VMD - Visual Molecular Dynamics", J. Molec. Graphics, 1996, vol. 14, pp. 33-

38.

2. RasMol is mostly a viewer, but has some built-in tools. - Roger Sayle and E. James Milner-

White. "RasMol: Biomolecular graphics for all", Trends in Biochemical Sciences (TIBS),

September 1995, Vol. 20, No. 9, p. 374.

3. Chimera is a very powerful visualizer that handles huge structures easily. - Pettersen, E.F.,

Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., and Ferrin, T.E.

"UCSF Chimera - A Visualization System for Exploratory Research and Analysis." J. Comput.

Chem. 25(13):1605-1612 (2004).

4. InsightII, Cerius2 and Catalyst are products for simulation, discovery and analysis that

recently became commercial, and can be found here.

5. CHARMM is a simulation package based on the CHARMM force field. Brooks BR,

Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983). "CHARMM: A

program for macromolecular energy, minimization, and dynamics calculations". J Comp Chem

4: 187217.

6. NAMD is another popular simulation package and can be obtained here. James C. Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe

Chipot, Robert D. Skeel, Laxmikant Kale, and Klaus Schulten. Scalable molecular dynamics

with NAMD. Journal of Computational Chemistry, 26:1781-1802, 2005.

7. Amber is one of the most widel