Protein Structure Prediction

A computational approach to the holy grail of biochemistry: structure determination

As a child, armed with a glue stick and a pair of scissors, I would spend hours cutting bright coloured paper into long uniform strips. The ends of each rectangle were secured tightly with a dollop of glue to form rings. Slowly, one ring interlinked with another to create a long paper chain worthy of display in the living room. When fully extended, the careful arrangement of colours revealed itself. Each paper chain was different to the previous one and on occasion less colourful, having to make do with flimsy white A4 paper smuggled out of the printer drawer.

Proteins are large macromolecules responsible for major biological functions. Their structure is built from a combination of twenty naturally occurring molecules called amino acids. Each amino acid is distinguishable by name, three-letter code or one-letter code. Below is an example of a protein sequence, in which each letter corresponds to an amino acid. This sequence codes for the structure beneath it.

Sequence and cartoon model of 3e9s.
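
To make the one-letter and three-letter codes concrete, here is a minimal Python sketch that translates between them. The mapping is the standard one for the twenty amino acids; the function name `to_three_letter` and the short example fragment are made up for illustration and are not taken from the 3e9s sequence.

```python
# Standard one-letter to three-letter codes for the twenty amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(sequence):
    """Spell out a one-letter sequence using three-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


# Hypothetical five-residue fragment, chosen only for illustration.
example = "MKTAY"
print(to_three_letter(example))  # Met-Lys-Thr-Ala-Tyr
```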

Much like the paper decoration, each amino acid can be thought of as a ring able to connect to the next. Once the long chain is formed, it is flexible: it can fold, turn, and loop in different directions. The long ordered list of which molecule follows which doesn't tell us how the overall structure folds. If the structure were stretched out from both ends, untangling each loop, the sequence of amino acids is what we would see. However, to make sense of the sequence and to study the protein's interactions, a picture of its 3D structure must be established.

Assembling one larger structure from many smaller molecules opens up the possibility of many different reactions. Due to their size, proteins rank among the most complex biological systems.

As seen above, proteins travel around in blob-like structures. The chain loops and tangles to ensure it's in the most energetically favourable state. It demands much energy to unravel and untangle the long paper chain, to mount a chair and reach up to each corner of the room to stretch out the decoration. It's much easier to discard the multicoloured ornament on the floor to slowly gather dust. Perhaps this metaphor has reached its end.

Molecular model of 3e9s represented as a molecular surface.

When dealing with new proteins, computers can be the key to unlock their molecular functions. Proteins can be organised into families. Those with common ancestors are classified as homologues.

Comparative modelling uses the amino acid sequence of the protein whose structure we wish to identify. This protein sequence is blasted through a computer program containing entries from all over the world. It scans the database for other proteins to find the closest match to our target. In a matter of minutes, you obtain a second protein sequence, the closest match found, together with its three-dimensional structure.
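
The idea of scanning a database for the closest match can be sketched in a few lines of Python. This is a toy only: it scores similarity with the standard library's `difflib.SequenceMatcher` rather than the scoring a real search program uses, and the three-entry `database`, the template names and the `target` sequence are all invented for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Rough similarity between two sequences, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def closest_match(target, database):
    """Return the (name, sequence) entry most similar to the target."""
    return max(database.items(), key=lambda item: similarity(target, item[1]))


# Hypothetical entries; a real database holds many thousands of proteins.
database = {
    "template_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "template_B": "GSHMLEDPVQWERTYIPLKNMC",
    "template_C": "MKTAYIAKQRNISFVKSHFSRQ",
}
target = "MKTAYIAKQRQISFVKAHFSRQ"

name, sequence = closest_match(target, database)
print(name, round(similarity(target, sequence), 3))
```

Here `template_A` differs from the target by a single letter, so it comes out on top. In practice the matched entry would also come with an experimentally determined structure, which is what makes it useful as a template.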

By aligning both protein sequences, one can pinpoint which letters have changed. These two sequences share common ancestry, so the replacement of one amino acid by another is evidence of evolutionary change. More than that, by locating which area of the 3D model is affected by this mutation, a new structure of the target molecule can be proposed. Of course, this is an experimental process. The mutated letter doesn't simply leave a vacancy for the new one to fill; there is a new piece to the puzzle. The chemical interactions that come with the new amino acid suggest different folds and loops, so the accuracy of the proposed model will vary from region to region.
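
Pinpointing which letters have changed is simple once the two sequences are aligned. The sketch below assumes the alignment has already been done, with `-` marking gaps; the function name `substitutions` and both sequences are hypothetical. It reports each change as the residue number in the template, the template's letter and the target's letter.

```python
def substitutions(aligned_target, aligned_template):
    """List (template_position, template_letter, target_letter) changes.

    Both inputs must already be aligned, with '-' marking gaps.
    Positions count residues of the template, starting at 1.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")

    changes = []
    position = 0
    for target_letter, template_letter in zip(aligned_target, aligned_template):
        if template_letter != "-":
            position += 1  # only real template residues advance the count
        is_gap = target_letter == "-" or template_letter == "-"
        if not is_gap and target_letter != template_letter:
            changes.append((position, template_letter, target_letter))
    return changes


# Hypothetical aligned pair differing at a single position.
template = "MKTAYIAKQRQISFVKSHFSRQ"
target = "MKTAYIAKQRQISFVKAHFSRQ"
print(substitutions(target, template))  # [(17, 'S', 'A')]
```

The reported positions show where to look on the template's 3D model; they do not say how the surrounding fold responds to the new amino acid.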

Nevertheless, this is a good starting point to work from in conjunction with other methods. Obtaining the exact three-dimensional structure of a protein is the holy grail of biochemistry. Several methods are used in combination to gather pieces of information and propose a final structure. Comparative modelling offers a new, elegant way of approaching the problem.


RCSB Protein Data Bank,

S. D. Lam, S. Das, I. Sillitoe, C. Orengo, Acta Crystallogr D Struct Biol., 2017, 73, 628-640.

B. Kuhlman, P. Bradley, Nature Reviews Molecular Cell Biology, 2019, 20, 681-697.
