Structural and Biochemical Proteomics

Lead Principal Investigators: Alexander Iakounine (Yakunin), Alexei Savchenko, Cheryl Arrowsmith, Aled Edwards

The main objective of this project is the structural and biochemical characterization of unknown yeast proteins and domains, with the aim of identifying their biochemical activities and possible molecular mechanisms. Our work focuses on uncharacterized proteins from Saccharomyces cerevisiae.

In all sequenced genomes, a large fraction of predicted genes (up to 50% in some genomes) encode proteins of unknown function. Even in yeast, a major model organism, there are over 1,000 uncharacterized genes (Pena-Castillo, L., and Hughes, T., 2007, Genetics 176:7-14). Many global strategies are already being used to infer function, including analysis of protein interactions, gene expression, and protein localization, as well as gene knock-outs.

This project combines two complementary approaches. Structural proteomics focuses on determining the three-dimensional (3D) structures of annotated and un-annotated proteins, whereas biochemical proteomics is a direct approach to finding the biochemical function of unknown genes by analyzing the biochemical activity of the purified protein. Both approaches have already yielded many unexpected functional inferences, and their combined use for functional annotation (including work in our laboratory) has been especially effective.


Project outline


This activity involves three components: (1) cloning and purification of hypothetical proteins; (2) determination of their 3D structures; and (3) biochemical characterization of proteins (enzymatic activities, ligand binding).

  1. Protein purification. Protocols for rapid, parallel cloning, expression, and purification of large numbers of recombinant proteins have been developed to produce proteins for structural proteomics efforts and protein microarray applications. These protocols employ recombinant expression and affinity purification based on the fusion of a tag, usually a peptide or small protein, to the target protein. Because 6His-tags are small and rarely affect the biological properties of the expressed protein, they remain a popular choice and are used in our work. As an expression host, we use E. coli because of the convenience and economy of working with bacterial cultures. To purify proteins, we use small-scale semi-automated and large-scale parallel manual protocols already developed in our labs. The semi-automated protocol uses an 8-tip liquid handling robot and includes cell lysis, filtration, incubation with Ni-beads, wash steps, and elution (Kuznetsova et al., 2005, FEMS Microbiol. Rev., 29: 263-279). This protocol delivers high throughput: in three hours, 96 proteins can be purified in 100-150 μl aliquots containing 10-100 μg of protein (Fig. 1). To generate milligram quantities of purified proteins for structural studies and in-depth biochemical characterization, we have developed a manual parallel purification protocol, which produces 8-12 purified proteins/person/day with a yield of 2-250 mg for each protein. Our Structural Proteomics lab already has over 2,000 yeast genes cloned into expression vectors for over-expression in E. coli (Table 1). As targets, we select soluble hypothetical proteins and, for structural studies, those that have no homologues in the PDB database (30% sequence identity cut-off). Our aim here is to purify 250 hypothetical yeast proteins for structural studies and biochemical characterization.
  2. Structural Proteomics. Structural proteomics emerged from the simultaneous development of rapid and parallel methodologies in gene cloning, protein purification, and 3D structure determination, and recent results have demonstrated the feasibility and importance of this approach for functional annotation. The classic work by Zarembinski et al. (1998, PNAS, 95: 15189-15193) represents one possible outcome of the structural analysis of an unknown protein, in which a protein-bound ligand or cofactor was discovered. Such information is the most useful for functional annotation because it identifies the nature of the ligand, the ligand-binding site, and the disposition of catalytic residues, from which a catalytic mechanism can be postulated. Other structure-derived information comes from identifying structural homologues in databases, local structural motifs, or putative catalytic sites. Our strategy uses X-ray crystallography and NMR spectroscopy to determine the structures of hypothetical proteins.

    All purified proteins are submitted to crystallization trials and/or tested for amenability to NMR analysis by heteronuclear single quantum coherence (HSQC) spectroscopy. Successfully crystallized proteins are produced as Se-Met or heavy metal (Hg, Pt) derivatives, and their multi-wavelength anomalous diffraction (MAD) datasets are collected at a synchrotron radiation source (Advanced Photon Source, Structural Biology Center, Argonne National Laboratory, USA). Small hypothetical proteins (<23 kDa) with good HSQC spectra are labeled with 13C and 15N, and resonances are assigned using conventional triple resonance techniques. The resulting 3D structures of hypothetical proteins are analyzed for the presence of putative active sites and are compared (DALI search) with the structures of known proteins available in the PDB to identify structural homologues and to generate hypotheses about the possible biochemical function of these proteins. Our Structural Proteomics lab (Savchenko, Arrowsmith, Edwards) has gained significant experience in the structural characterization of unknown proteins and has produced 17 structures of yeast proteins (including 12 structures from this project) (Table 2). Our goal here is to generate structures and hypotheses for at least 25 uncharacterized yeast proteins representing different protein families (Table 3).

  3. Biochemical Proteomics. In a pioneering study that laid the basis of the biochemical proteomics approach, Phizicky and co-workers screened pools of purified yeast proteins for specific enzymatic activities (Martzen et al., 1999, Science, 286: 1153-1155; Phizicky et al., 2003, Nature, 422: 208-215). We have developed a complementary approach based on the use of general enzymatic assays to screen individually purified proteins for enzymatic activity (Kuznetsova et al., 2005, FEMS Microbiol. Rev., 29: 263-279). We have designed 20 general assays (Fig. 2) that have relaxed substrate specificity and are intended to identify the enzyme subclass or sub-subclass (phosphatase, phosphodiesterase, protease, esterase, dehydrogenase, or oxidase) to which the unknown protein belongs.

    Proteins with detectable activity are further characterized using secondary screens with natural substrates (substrate profiling) (Fig. 3). Specifically, phosphatases are screened for activity against a panel of 91 phosphorylated substrates (Table 4), phosphodiesterases against 22 substrates (Table 5), and esterases against 37 substrates (Table 6). The spectrophotometric assays are designed for 96-well plates, can be performed quickly, and require only a few micrograms of protein. We demonstrated the feasibility and merits of this approach for hydrolases and oxidoreductases, two very broad and important classes of enzymes, and identified over 200 new enzymes.

    General and specific enzymatic assays are also used to rapidly test the hypotheses generated by the Structural Proteomics group and other groups about the biochemical function of particular proteins. All proteins with identified enzymatic activity are characterized biochemically (substrate and metal profiles, reaction products, cofactors, inhibitors), and their kinetic parameters (Km, kcat, kcat/Km) are measured to determine whether they are consistent with the in vivo concentrations of the corresponding metabolites. In combination with information produced by other groups of this project (genetic arrays, protein interactions), the biochemical data are used to generate models of the cellular roles of these proteins. In this project, we identified enzymatic activity in over 30 yeast proteins and biochemically characterized all soluble members of the HAD-like phosphatase family (Table 7, Table 8).

    We invite all researchers wishing to check their proteins for catalytic activity to send them to us for screening (conditions and terms of collaboration, Fig. 4).
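
As an illustration of the kinetic parameters mentioned under Biochemical Proteomics (Km, kcat, kcat/Km), the following is a minimal sketch of how they might be estimated from initial-rate measurements. It is not the procedure used in this project: the function names, the units, and the choice of a Hanes-Woolf linear fit (S/v = S/Vmax + Km/Vmax) are assumptions made for illustration, and the data in the example are synthetic.

```python
def fit_michaelis_menten(substrate_conc, rates):
    """Estimate (Km, Vmax) from paired substrate concentrations and initial rates.

    Hypothetical helper: uses a Hanes-Woolf linearization,
    S/v = S/Vmax + Km/Vmax, fitted by ordinary least squares.
    All rates must be non-zero and the concentrations must not all be equal.
    """
    if len(substrate_conc) != len(rates) or len(rates) < 2:
        raise ValueError("need at least two paired (S, v) measurements")

    x = list(substrate_conc)
    y = [s / v for s, v in zip(substrate_conc, rates)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Ordinary least-squares slope and intercept of y against x.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    vmax = 1.0 / slope       # slope = 1 / Vmax
    km = intercept * vmax    # intercept = Km / Vmax
    return km, vmax


def catalytic_parameters(km, vmax, enzyme_conc):
    """Return (kcat, kcat/Km), given Vmax and the total enzyme concentration.

    Vmax and enzyme_conc must share the same concentration unit.
    """
    kcat = vmax / enzyme_conc
    return kcat, kcat / km


if __name__ == "__main__":
    # Synthetic, noise-free data generated with Km = 0.5 mM and Vmax = 2.0 uM/s
    # (illustrative values only, not measurements from this project).
    s_values = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 5.0]           # mM
    v_values = [2.0 * s / (0.5 + s) for s in s_values]        # uM/s

    km, vmax = fit_michaelis_menten(s_values, v_values)
    kcat, efficiency = catalytic_parameters(km, vmax, enzyme_conc=0.01)  # uM enzyme
    print(f"Km = {km:.3f} mM, Vmax = {vmax:.3f} uM/s")
    print(f"kcat = {kcat:.1f} 1/s, kcat/Km = {efficiency:.1f} 1/(s*mM)")
```

A linearized fit keeps the sketch dependency-free, but it weights measurement errors unevenly; for real, noisy assay data, nonlinear regression of the Michaelis-Menten equation is generally preferred.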