Protein Structure

What are the structures and chemical properties of all 20 naturally occurring amino acids, and how are they organized into polypeptides?

• Proteins are polypeptides consisting of amino acids covalently linked together by peptide bonds. Peptide bonds are formed by a condensation reaction between the carboxylic acid group of one amino acid and the amine group of a second amino acid.

• Twenty different amino acids are commonly found in proteins. Their stereochemistry is that of L-amino acids with a Cα chiral center, except for glycine, the smallest of the 20 amino acids, which has a Cα that is not chiral.

• Seven amino acids have ionizable R groups (Asp, Glu, His, Cys, Tyr, Lys, Arg), of which four are charged at pH 7; Asp and Glu are negatively charged, whereas Lys and Arg are positively charged. His has a pKa of 6.0 and can be positively charged or neutral, depending on the chemical environment, such as the active site of an enzyme.

• Free amino acids at pH 7 have a positively charged amino group (pKa of 8.0) and a negatively charged carboxyl group (pKa of 3.1). If the R group is also charged at pH 7, then the free amino acid has three ionizable groups (for example, Asp, Glu, Lys, and Arg).

• The isoelectric point (pI) of a molecule containing two or more ionizable groups is defined as the pH at which the molecule carries no net charge, also called the zwitterion state.

• Approximate pI values are calculated as the arithmetic mean of the two pKa values that flank the neutral zwitterion state. With the three pKa values of the amino acid numbered in ascending order (pK₁ < pK₂ < pK₃), a rule of thumb is that if the R group is negatively charged at neutral pH, then pI = (pK₁ + pK₂)/2, whereas if the R group is positively charged at neutral pH, then pI = (pK₂ + pK₃)/2.

• The chemical properties of the amino acid side chains determine the structure and function of proteins. The 20 amino acids can be divided into four subfamilies on the basis of shared chemical properties: charged, hydrophilic, hydrophobic, and aromatic.

• Some amino acids can fit into more than one subfamily when considering other chemical attributes. For example, the charged amino acids are hydrophilic; similarly, the aromatic amino acids have hydrophobic properties.

• Glycine does not fit easily into any of the four subfamilies because its hydrogen side chain is chemically inert; Gly is put into the hydrophobic subfamily because it is not a hydrophilic, charged, or aromatic amino acid.

• Green fluorescent protein (GFP) is an autofluorescent protein from jellyfish that undergoes a spontaneous cyclization reaction involving the adjacent amino acids Ser65, Tyr66, and Gly67, which generates an intrinsic chromophore. Recombinant GFP is used as a fluorescent molecular marker in live cells that is excited by blue light at 470 nm and emits green light at 509 nm.

• The peptide bond is rigid with partial double bond characteristics and defines a flat plane containing six atoms. Rotation around the N–Cα and the Cα–C bonds is defined by the ϕ (phi) and ψ (psi) torsional angles, respectively.

• The allowable ϕ and ψ angles for any two amino acids in a dipeptide can be calculated within the limits of steric interference by using the van der Waals radii of atoms in the respective residues.

• The Ramachandran plot displays the combinations of ϕ and ψ angles that do not result in steric hindrance between adjacent amino acids and is characteristic for each protein structure. The ϕ and ψ angles of dipeptides containing proline or glycine residues are unusual and often not included on Ramachandran plots.

• Genetic mutations in DNA can alter the amino acid sequence of a protein and lead to defects in protein structure and function, resulting in disease.
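The isoelectric point rule of thumb summarized above can be checked numerically. The following is a minimal sketch, not a definitive method: it applies the Henderson-Hasselbalch relation to each ionizable group, finds the pH of zero net charge by bisection, and compares the result with the two-pKa average. The names `IonizableGroup`, `net_charge`, `numeric_pi`, and `rule_of_thumb_pi` are hypothetical helpers introduced for illustration. The backbone pKa values (3.1 and 8.0) are the ones quoted above; the side-chain values used for Asp (3.9) and Lys (10.5) are assumed round numbers, not measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IonizableGroup:
    pka: float
    protonated_charge: int  # +1 for amino-type groups, 0 for carboxyl-type groups


def net_charge(groups, ph):
    """Average net charge at a given pH (Henderson-Hasselbalch for each group)."""
    total = 0.0
    for g in groups:
        fraction_protonated = 1.0 / (1.0 + 10.0 ** (ph - g.pka))
        # Losing the proton lowers the charge of the group by one unit.
        total += g.protonated_charge - (1.0 - fraction_protonated)
    return total


def numeric_pi(groups, lo=0.0, hi=14.0, tol=1e-4):
    """Find the pH of zero net charge by bisection (net charge falls as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(groups, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def rule_of_thumb_pi(pkas, r_group_sign):
    """Mean of the two pKa values that flank the zwitterion.

    r_group_sign is -1 if the R group is negative at pH 7 and +1 if it is
    positive. With only two pKa values, their mean is returned directly.
    """
    pk = sorted(pkas)
    if len(pk) == 2:
        return (pk[0] + pk[1]) / 2.0
    if r_group_sign < 0:
        return (pk[0] + pk[1]) / 2.0   # (pK1 + pK2) / 2
    if r_group_sign > 0:
        return (pk[1] + pk[2]) / 2.0   # (pK2 + pK3) / 2
    raise ValueError("rule of thumb covers only charged R groups")


# Illustrative pKa values (assumptions for this sketch, not measured data).
ASP = [IonizableGroup(3.1, 0), IonizableGroup(3.9, 0), IonizableGroup(8.0, +1)]
LYS = [IonizableGroup(3.1, 0), IonizableGroup(8.0, +1), IonizableGroup(10.5, +1)]

for name, groups, sign in (("Asp", ASP, -1), ("Lys", LYS, +1)):
    rule = rule_of_thumb_pi([g.pka for g in groups], sign)
    print(f"{name}: rule of thumb {rule:.2f}, numeric {numeric_pi(groups):.2f}")
```

The two estimates should agree closely whenever the third pKa lies far from the pI, which is why the averaging shortcut works for amino acids with charged R groups.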

What are the four hierarchical structures of proteins (1°, 2°, 3°, and 4°)?

• The three-dimensional structure of a protein is defined by four hierarchical levels: primary (amino acid sequence), secondary (α helix, β sheet, β turn), tertiary (positions of all atoms within the protein), and quaternary (subunit interactions).

• Alpha (α) helices are stabilized by intrastrand hydrogen bonding between N–H and C=O groups along the polypeptide backbone. Beta (β) sheets are stabilized by interstrand hydrogen bonding between N–H and C=O groups along the polypeptide backbone.

• Many α helices are amphipathic with one side of the helix being hydrophobic and the other side being hydrophilic. An amphipathic α helix has hydrophilic or hydrophobic amino acids positioned every three to four residues along the polypeptide backbone.

• Tertiary structures describe the positions of all the atoms within the polypeptide and contain α helices, β sheets, β turns, and polypeptide loops.

• Ramachandran plots of protein tertiary structure reveal that α helices, β strands and β sheets, and β turns all fall within allowable ϕ and ψ angles for dipeptides that minimize steric hindrance between amino acid R groups.

• Protein domains are regions of tertiary structures that fold independently and may encode discrete functions such as the catalytic activity of an enzyme or the sequence-specific DNA binding function of a transcription factor.

• Protein folds are the structural components of protein domains and are described by the topological arrangement of secondary structures defined by the path of the polypeptide chain.

• Examples of protein folds are four-helix bundles, the Greek key fold, and the Rossmann fold. Some protein folds are very large and encompass an entire protein domain such as the ~200-amino-acid TIM barrel fold in the enzyme triose phosphate isomerase.

• Tertiary structures are stabilized by weak noncovalent interactions. Some tertiary structures are also stabilized by disulfide bridges between cysteine residues or are stabilized by metal ions such as zinc and iron that coordinate with residues in the protein.

• Quaternary structures consist of two or more protein subunits that can be identical or different. Quaternary structures provide increased structural integrity (as demonstrated in fibrous proteins), regulatory functions commonly found in large protein complexes, and increased enzyme efficiency provided by nearby catalytic sites.
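The three-to-four-residue spacing that makes an α helix amphipathic can be illustrated with a small calculation. The sketch below is one possible way to quantify it and is not taken from the text: it computes a mean hydrophobic moment, assuming an ideal α helix with 100° of rotation per residue (3.6 residues per turn) and using the Kyte-Doolittle hydropathy scale. The functions `hydrophobic_moment` and `design_amphipathic` and the Leu/Lys test peptides are hypothetical examples introduced for illustration.

```python
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(sequence, degrees_per_residue=100.0):
    """Mean hydrophobic moment of a sequence modeled as an ideal alpha helix."""
    x = y = 0.0
    for i, residue in enumerate(sequence):
        angle = math.radians(i * degrees_per_residue)
        h = KYTE_DOOLITTLE[residue]
        # Sum the hydropathy vectors as they point outward around the helix axis.
        x += h * math.cos(angle)
        y += h * math.sin(angle)
    return math.hypot(x, y) / len(sequence)


def design_amphipathic(length=18, hydrophobic="L", hydrophilic="K"):
    """Place hydrophobic residues on one half of the helical wheel."""
    return "".join(
        hydrophobic if (i * 100) % 360 < 180 else hydrophilic
        for i in range(length)
    )


amphipathic = design_amphipathic()   # hypothetical Leu/Lys peptide
segregated = "L" * 9 + "K" * 9       # same composition, hydrophobic block first

print(amphipathic)
print(f"amphipathic pattern: {hydrophobic_moment(amphipathic):.2f}")
print(f"segregated pattern:  {hydrophobic_moment(segregated):.2f}")
```

Both peptides have identical amino acid composition, so any difference in the moment comes only from where the hydrophobic residues fall along the sequence. The designed peptide repeats its leucines at three- and four-residue intervals, which places them on a single face of the helix.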

What are the energetics of protein folding and what happens when they misfold?

• High-fidelity protein folding is critical to protein structure and function and is governed by three principles: (1) protein folding must follow a preferred path of energy minimization; (2) the change in Gibbs energy between the folded and unfolded states must be favorable (ΔG < 0) for folding to occur; and (3) mechanisms of in vitro and in vivo folding may be different, because chaperone proteins are often required for in vivo protein folding to occur.

• Protein folding in vitro is a cooperative process, as shown by protein-unfolding curves, which display a sharp transition between the folded and unfolded states as a function of increasing temperature or denaturant concentration. The midpoint of a thermal transition curve, Tm, corresponds to the temperature at which 50% of the molecules are folded and 50% are unfolded.

• Christian Anfinsen demonstrated that RNaseA could be denatured by urea and then refolded under appropriate conditions to regain enzymatic activity. These protein-refolding experiments were the first to show that all of the biochemical information required for protein folding resides in the primary amino acid sequence.

• A protein “folding funnel” illustrates a theoretical framework that describes the varied paths and energetic states in protein folding. The width of the funnel describes the entropy of the polypeptide chain and the height of the funnel represents the overall energy difference between the folded and unfolded states.

• Chaperone proteins function in vivo to assist in de novo protein folding, rescue unfolded proteins, and disrupt nonfunctional protein aggregates. The two major types of ATP-dependent chaperone proteins are the clamp type (for example, Hsp70) and the chamber type (for example, the bacterial GroEL–GroES complex).

• Defects in protein folding have been associated with increased rates of protein degradation (loss of function) and the generation of large protein aggregates (gain of function). Numerous examples of human diseases caused by protein misfolding have been characterized.

• Transmissible spongiform encephalopathies (TSEs) are neurologic diseases caused by the accumulation of misfolded proteins that function as infectious particles in the absence of DNA or RNA. The best-studied misfolded protein of this type is prion protein (PrP).

• There are three types of prion-related diseases in humans: (1) infectious prion-related diseases, as found in kuru and mad cow disease; (2) spontaneous prion-related diseases, such as Creutzfeldt–Jakob disease; and (3) familial prion-related diseases, caused by DNA mutations that are genetically inherited.

• Structural data obtained by cryo-electron microscopy indicate that extended β-sheet structures contribute to the aggregation of misfolded proteins, such as amyloid-β fragment, tau protein, and PrP, and lead to the formation of large protein aggregates that are toxic to neurons in the brain.

• The intrinsically disordered N-terminal region of the normal cellular PrP^C may be responsible for the structural switch from PrP^C to the misfolded PrP^Sc. Fred Cohen and Stanley Prusiner proposed that the N-terminal region changes from a mostly α-helical structure in PrP^C to a stacked β sheet in the PrP^Sc form.
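The link between the free energy criterion (ΔG < 0 for folding) and the sharp unfolding transition centered on Tm can be sketched with a simple two-state model. This is an illustrative simplification under stated assumptions, not the model used in the text: only folded and unfolded states are populated, and the enthalpy of unfolding is treated as independent of temperature (ΔCp = 0), so that ΔG of unfolding equals ΔH(1 − T/Tm). The values of `TM` and `DELTA_H` are hypothetical, and the function names are introduced here for illustration.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def delta_g_unfolding(temp_k, tm_k, delta_h_m):
    """Free energy of unfolding in J/mol, assuming a temperature-independent enthalpy."""
    # At Tm the free energy change is zero, so the entropy term is delta_h_m / tm_k.
    return delta_h_m * (1.0 - temp_k / tm_k)


def fraction_folded(temp_k, tm_k, delta_h_m):
    """Fraction of molecules in the folded state for a two-state equilibrium."""
    k_unfold = math.exp(-delta_g_unfolding(temp_k, tm_k, delta_h_m) / (R * temp_k))
    return 1.0 / (1.0 + k_unfold)


TM = 330.0           # hypothetical melting temperature, K
DELTA_H = 400_000.0  # hypothetical enthalpy of unfolding at Tm, J/mol

for temp in range(310, 351, 5):
    dg = delta_g_unfolding(temp, TM, DELTA_H) / 1000.0
    ff = fraction_folded(temp, TM, DELTA_H)
    print(f"T = {temp} K  dG(unfold) = {dg:6.1f} kJ/mol  folded = {ff:.3f}")
```

Below Tm the free energy of unfolding is positive, which means folding is favorable; above Tm the sign reverses. A larger assumed enthalpy makes the transition steeper, which is one way cooperativity shows up in an unfolding curve.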

Everyday example: CO-4: Jellyfish Green Fluorescent Protein