The identification and quantitation of S. cerevisiae proteins in proteomics experiments

Part of the document: Proteomic analysis of Saccharomyces cerevisiae KAY446 under very high gravity conditions (pages 43 - 57)

2.2. Literature review of Saccharomyces cerevisiae proteomic analysis

2.2.3. The identification and quantitation of S. cerevisiae proteins in proteomics experiments

2.2.3.1. How do tandem mass spectrometry and protein identification work?

How does tandem MS work?

Proteins from samples are first digested with proteases (trypsin, chymotrypsin, etc.) into peptides before being introduced into the mass spectrometer for analysis. Briefly, a tandem MS instrument operates in the following steps:

i) Ionization

The peptides generated by protein digestion are (normally) not submitted to the mass spectrometer all at once; instead, they are ionized before being transferred to the MS analyzer by magnetic or electric fields. Two techniques are commonly used for peptide ionization: electrospray ionization (ESI) [30] and matrix-assisted laser-desorption ionization (MALDI) [31]. For ESI-MS, peptide solutions are injected (usually by an autosampler) onto a trap column and subsequently eluted onto a nano-HPLC column directly coupled to the mass spectrometer (see Figure 2.5.A). The peptides are then eluted from the column using a solvent gradient of an organic compound (acetonitrile), with elution governed by peptide hydrophobicity. Consequently, very hydrophilic peptides might be poorly retained on the column and elute immediately, whereas very hydrophobic peptides might not be eluted when a standard gradient is applied. The peptides are ionized by ESI at the end of the analytical column, and the peptide ions are then transferred into the vacuum of the mass spectrometer for further analysis.

Figure 2.5. The combination of nano-LC-MS/MS (A) or MALDI-MS (B) [32].

Peptide ionization can also be achieved using MALDI (see Figure 2.5.B). For MALDI, the peptides are mixed with a large excess of ultraviolet-absorbing matrix, which is normally a low molecular weight aromatic acid [31]. On irradiation with a focused laser beam, the excess matrix molecules sublime and transfer the embedded non-volatile molecules into the gas phase. After numerous ion molecule collisions, protonated analyte ions are formed, which are accelerated by electric potentials into the mass spectrometer. The signal intensity in the spectrum is proportional to the peptide concentration.

ii) Inside tandem MS

After ionization by ESI or MALDI, the ionized peptides enter the mass spectrometer through a small orifice into a vacuum environment, where they are guided and manipulated by electric fields. The peptides are then characterized by measuring their mass-to-charge (m/z) ratios. Many types of mass spectrometer are applied in proteomic analysis, including quadrupole MS, time-of-flight (TOF) MS, quadrupole ion trap, MALDI-TOF-TOF, and Orbitrap instruments. In practice, these analyzers can be combined, as in quadrupole-TOF tandem MS/MS. The main function of an MS instrument is to record the signal intensity of the ions at each value on the m/z scale.

In ESI, peptides usually become doubly protonated and are designated (M + 2H)2+, where M is the mass of the peptide and H is the mass of a proton. The total ion chromatogram is shown in Figure 2.6.A. Since MS measures the m/z value, a peptide with a mass of 1342.7932 amu will be recorded at (1342.7932 + (2 x 1.0073))/2 = 672.4039 amu in the mass spectrum (see Figure 2.6.B). However, some peptides might carry a higher charge if they contain more than 15 amino acids or additional basic amino acids; histidine, for example, can also be protonated. Since each peptide appears as an isotope cluster of peaks, this cluster can be used to determine the charge state of the peptide. Adjacent peaks in the cluster differ in mass by 1 amu because one 12C atom is replaced by the heavier 13C isotope. If the difference in m/z between the 12C peak and the first 13C isotope peak is 1 amu, the charge state of that peptide ion is 1. If the difference between the first and second peaks is 0.5 amu (672.4039 and 672.9033 in the inset of Figure 2.6.B), the charge of the peptide is 2. Likewise, if the difference is approximately 0.33 amu, the peptide charge is 3.
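The arithmetic in this paragraph can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, the proton mass is the value used in the text, and the second isotope peak in the usage example is simply placed 0.5 above the first.

```python
PROTON_MASS = 1.0073  # mass of a proton in amu, as used in the text


def mz(neutral_mass: float, charge: int) -> float:
    """Return the m/z of an (M + zH)z+ ion."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def charge_from_isotope_spacing(mz_first: float, mz_second: float) -> int:
    """Infer the charge state from two adjacent isotope peaks.

    Adjacent isotope peaks differ by about 1 amu in mass, so their
    spacing on the m/z axis is about 1/z.
    """
    return round(1.0 / abs(mz_second - mz_first))


# Doubly protonated peptide from the text.
print(round(mz(1342.7932, 2), 4))

# Two isotope peaks 0.5 apart indicate a charge of 2.
print(charge_from_isotope_spacing(672.4039, 672.9039))
```

Rounding 1/spacing to the nearest integer tolerates small measurement errors in the peak positions, which is why a spacing of about 0.33 still resolves to a charge of 3.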

Figure 2.6. Illustration of the total ion intensity (A), the mass spectrum of isolated peptide determined by MS (B), the figure inset illustrates the mass-to-charge values around the ion peptide of interest (see the text for details), and the fragments of the selected peptide (C).


The information on the primary structure of the peptide is obtained by recording the m/z values and intensities of all the peaks in the spectrum. This can be achieved with high quality using tandem MS (MS/MS or MSn), which couples two or more MS stages. In tandem MS, the ionized peptide is selected (isolated) in the first MS stage and then fragmented in a collision cell by collisions with an inert gas (nitrogen); as a result, the peptide is broken into smaller species.

A second MS stage records the mass-to-charge ratios of the fragments derived from the isolated peptide (see Figure 2.6.C). The ion that is fragmented (observed in MS mode) is known as the “precursor ion”, and the ions observed in MS/MS mode are known as “product ions”. During the HPLC run, the instrument cycles through a sequence consisting of one mass spectrum followed by MS/MS spectra of the most abundant peaks within that spectrum.

Many types of fragment ion can be obtained in an MS/MS spectrum, depending on factors such as primary structure, internal energy, and charge state. The nomenclature for the fragmentation of peptide ions (see Figure 2.7) was proposed by Roepstorff and Fohlman [33] and later modified by Johnson et al. [34]. Fragment ions are detected only if they carry at least one charge. If the charge is retained on the N-terminal fragment, the ion is classed as a, b, or c; if the charge is retained on the C-terminal fragment, the ion is classed as x, y, or z. The subscript gives the number of residues in the fragment.

Figure 2.7. The fragmentation of protonated peptide ions, and correlated ions that can be formed in tandem MS/MS.
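As a rough illustration of the b- and y-ion series described above, the following Python sketch computes singly charged fragment m/z values for a short peptide. It assumes unmodified residues and standard monoisotopic residue masses; the function name and the example peptide are hypothetical and are not taken from the cited studies.

```python
# Monoisotopic residue masses (amu) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # monoisotopic mass of H2O
PROTON = 1.00728   # mass of a proton


def fragment_ions(peptide: str) -> dict:
    """Return singly charged b- and y-ion m/z values keyed by ion name.

    b_i holds the first i residues (charge on the N-terminal fragment);
    y_i holds the last i residues plus water (charge on the C-terminal
    fragment). The subscript is the number of residues in the fragment.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(peptide)):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON
    return ions


# Hypothetical tripeptide used only for illustration.
for name, value in sorted(fragment_ions("GAK").items()):
    print(name, round(value, 4))
```

A useful consistency check is that complementary ions (b_i and y_(n-i)) always sum to the peptide mass plus two protons.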

How is protein identification performed?

Due to the large amount of sequence data available, protein identification can be carried out by database searching rather than by more traditional sequence determination. Matching the masses of tryptic (protease-generated) peptides against those predicted from the theoretical digestion of each protein in the database (peptide mass mapping) serves to identify the proteins (see Figure 2.8.A). If a more complex protein mixture is analysed, or peptide mass mapping does not provide a conclusive match, MS/MS spectra of the peptides in the mixture can be used to search the database. Proteins are also often derived from gel bands or spots; when MALDI-TOF is used, these proteins are identified by “mass fingerprinting”, which matches the tryptic peptide masses in the mass spectrum to the calculated tryptic peptide masses for each protein in the database. In many cases, MALDI-TOF fingerprinting offers a good option for protein identification; however, peptide sequencing is a more specific and sensitive identification method. Therefore, for proteins (peptides) sequenced using LC-MS/MS, each MS/MS spectrum is searched against the database using one of the algorithms shown in Figure 2.8.B. The peptide identification is reported as a probability-based score using, for example, the Mascot search engine [35] or a recently modified version of the Sequest algorithm [36].

Figure 2.8. Illustration of the matching of mass spectral data against a protein (peptide) database: (A) for a MALDI-TOF instrument, (B) for an LC-MS/MS instrument.
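Peptide mass mapping as described above can be sketched as follows. This is a simplified illustration under stated assumptions: trypsin is taken to cleave after K or R except before P, missed cleavages and modifications are ignored, and the two-entry database, its sequences, and all function names are hypothetical.

```python
import re

# Monoisotopic residue masses (amu) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O


def tryptic_digest(sequence: str) -> list:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, sequence: str, tol: float = 0.01) -> int:
    """Count observed masses that match a theoretical tryptic peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def best_match(observed, database: dict, tol: float = 0.01) -> str:
    """Return the database entry with the most matching peptide masses."""
    return max(database, key=lambda name: count_matches(observed, database[name], tol))


# Hypothetical toy database; real searches use full proteome databases.
database = {"protein_A": "MKRPGAKTLR", "protein_B": "GASPVKTCLR"}
observed = [peptide_mass(p) for p in tryptic_digest(database["protein_A"])]
print(tryptic_digest(database["protein_A"]))
print(best_match(observed, database))
```

Counting matches is the simplest possible ranking; the search engines discussed in this section convert such matches into probability-based scores instead.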

The first algorithm (known as Peptide Sequence Tags), first applied in PeptideSearch, requires fragmentation spectra that contain at least a short series of easily interpretable fragment ions [37], from which an amino acid tag is formed. The lowest mass in the series gives the distance (in mass units) to one terminus of the peptide, and the highest mass gives the distance to the other terminus. The peptide sequence tag therefore comprises three sections: the amino-terminal mass (m1), a short amino acid sequence (-C-A-), and the carboxyl-terminal mass (m3) (see Figure 2.9.A). This construct can be matched against sequences in a database, where an identified peptide must also comply with the cleavage specificity of the proteolytic enzyme used.
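The three-part tag (m1, short sequence, m3) can be illustrated with a small matching routine. This sketch is not the PeptideSearch implementation: it assumes, for simplicity, that m1 and m3 are plain sums of residue masses (terminal groups and ion-type offsets are ignored), and the function names, tolerance, and example peptides are hypothetical.

```python
# Monoisotopic residue masses (amu) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_sum(segment: str) -> float:
    """Sum of residue masses for a stretch of sequence."""
    return sum(RESIDUE_MASS[aa] for aa in segment)


def matches_tag(peptide: str, m1: float, tag: str, m3: float,
                tol: float = 0.01) -> bool:
    """Check whether a peptide fits the tag (m1, tag sequence, m3).

    The tag sequence must occur in the peptide, flanked by an
    N-terminal stretch of mass m1 and a C-terminal stretch of mass m3.
    """
    start = peptide.find(tag)
    while start != -1:
        prefix = residue_sum(peptide[:start])
        suffix = residue_sum(peptide[start + len(tag):])
        if abs(prefix - m1) <= tol and abs(suffix - m3) <= tol:
            return True
        start = peptide.find(tag, start + 1)
    return False


# Hypothetical tag built from the peptide GT-CA-VK.
m1, m3 = residue_sum("GT"), residue_sum("VK")
print(matches_tag("GTCAVK", m1, "CA", m3))
print(matches_tag("GACAVK", m1, "CA", m3))
```

Because only the flanking masses are tested, the order of the residues outside the tag is not constrained, which is why a short tag can still search a large database quickly.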

The second algorithm, known as the Sequest algorithm, uses a signal-processing technique called autocorrelation to determine mathematically the overlap between the experimental spectrum in question and a theoretical spectrum derived from every sequence in the database [38] (Figure 2.9.B). The overlap is reported as a score. This technique has proven quite robust for spectra with low signal-to-noise ratios, and it is used for low-resolution data.
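The idea of scoring the overlap between an experimental and a theoretical spectrum can be illustrated with a much simpler stand-in: a normalised dot product of binned spectra. This is explicitly not the Sequest scoring function; the bin width, function names, and peak lists below are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width: float = 1.0) -> dict:
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def overlap_score(experimental, theoretical, bin_width: float = 1.0) -> float:
    """Normalised dot product (cosine similarity) of two binned spectra.

    Returns 1.0 for identical spectra and 0.0 when no bins are shared.
    """
    a = bin_spectrum(experimental, bin_width)
    b = bin_spectrum(theoretical, bin_width)
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical (m/z, intensity) peak lists.
experimental = [(147.1, 50.0), (218.1, 100.0), (300.2, 5.0)]
candidate_1 = [(147.1, 1.0), (218.1, 1.0)]   # shares two peaks
candidate_2 = [(175.1, 1.0), (262.2, 1.0)]   # shares none
print(overlap_score(experimental, candidate_1))
print(overlap_score(experimental, candidate_2))
```

Coarse binning is what makes this kind of comparison tolerant of low-resolution data, at the cost of treating nearby but distinct peaks as the same.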

The third example algorithm is implemented in the Mascot search engine [35], which calculates the theoretically predicted fragments for all the peptides in the database. The predicted fragments are matched to the experimental fragments in a top-down approach that starts with the most intense b- and y-ions (see Figure 2.9.C). The probability that the number of fragment matches is random is then calculated, and the negative logarithm of this probability (multiplied by 10) is the identification score.
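The score described here, ten times the negative logarithm of the probability that the match is random, can be written as a one-line function. The base-10 logarithm is assumed, the function name is hypothetical, and the calculation of the probability itself is outside this sketch.

```python
import math


def identification_score(p_random: float) -> float:
    """Return -10 * log10(P), where P is the probability of a random match."""
    if not 0.0 < p_random <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -10.0 * math.log10(p_random)


# A smaller probability of a random match gives a higher score.
for p in (0.05, 0.001):
    print(p, round(identification_score(p), 1))
```

On this scale, each tenfold drop in the probability of a random match adds 10 to the score.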

Figure 2.9. Illustration of algorithm searching for peptide (protein) identification [32].

2.2.3.2. Number of publications

The search engine Scirus (http://www.scirus.com/), which links to BioMed Central, ScienceDirect, MEDLINE/PubMed, PubMed Central, etc., was queried (with the journal source option) using the keyword Saccharomyces cerevisiae combined in turn with iTRAQ, ICAT, SILAC, DIGE, 18O, 15N, and 13C. The numbers of publications as a function of year (from 2000 to 2007) are shown in Figure 2.10. In 2006, there was a boom in publications on proteomic quantitation methods, since publications on all 7 main methods were found in that year. The highest numbers of publications mentioning iTRAQ, SILAC, and 18O were also found in that year, whereas ICAT showed a maximum in 2003. It should be noted, however, that these publications include review articles as well as research articles, since we wished to gauge the relative numbers of the two. For example, 20 articles were found by searching with the keywords “S. cerevisiae and iTRAQ”, but only 9 of these were original research articles that applied the method to S. cerevisiae. Therefore, the application of iTRAQ to the study of the S. cerevisiae proteome is still rare, although this method has many advantages, as summarised in Table 2.1.

[Figure 2.10 plot: number of publications (0 to 25) by year (2000 to 2007) for iTRAQ, 18O, ICAT, SILAC, 15N, 13C, and DIGE.]

Figure 2.10. Number of publications related to the proteomic analysis of S. cerevisiae.

2.2.3.3. Identification of the S. cerevisiae proteomes

S. cerevisiae has been widely used as a model organism in proteomic studies. 2-DE, combined with tandem MS/MS for identification, is a powerful tool for detecting hundreds of proteins. In the early applications of 2-DE to S. cerevisiae, more than 400 proteins were identified in a single study, leading to the construction of yeast gel reference maps [39-43] and providing an overview of global protein changes in response to stress conditions, such as cadmium [44], lithium [45], H2O2 [46], and sorbic acid [47]. However, this technique is time-consuming because spots are analysed one by one, and it is biased against low-abundance proteins, integral membrane proteins, and proteins with extreme pI or MW. These limitations led to the development of new methods, such as shotgun proteomics [25]. Shotgun proteomics has been applied successfully to the study of S. cerevisiae proteomes. The complete genome of S. cerevisiae contains approximately 6,300 genes (from the NCBI database). Recently, 1,504 [48] and 1,484 [25] proteins were found using offline (combining SCX (strong cation exchange) and RP (reversed-phase)) and online (LC-MS) systems, respectively. Moreover, 3,019 proteins were found using a three-dimensional LC-MS/MS system [49].

For the metabolic labeling technique, the use of 2-DE gels coupled with MS/MS is still the preferred choice. Application of [2H]-leucine is a widely used metabolic labeling technique, since this amino acid is abundant in S. cerevisiae, with theoretically 65% of the trypsin digested peptides containing at least one leucine residue [50]. Moreover, the use of this labeled amino acid leads to a reduction in the search space for protein identification [50].

Classical proteomics compares only the abundance of proteins in cells in two different states, which provides no information about the dynamic mechanisms at work when the system changes from one state to another. To overcome this limitation, Bratt et al. [51] applied [2H]-leucine metabolic labeling to determine the dynamics of protein turnover. Metabolic labeling can also be performed with [13C6]-lysine. The combination of this labeled amino acid with data-dependent multiplexed MS/MS for protein identification and characterization in S. cerevisiae was reported by Berger et al. [52].

Additional mixtures of labeled amino acids [2H3]-methionine, [2H3]-serine, [2H2]-tyrosine have also been demonstrated as having utility for S. cerevisiae proteomics [53].

2.2.3.4. Localization of proteins

As mentioned above, one of the main aims of proteomics is the identification of proteins, followed by the determination of their location in the cell. The location of a protein in the cellular microenvironment is very important for understanding its function and interactions. One of the most interesting foci for this type of investigation is the study of mitochondria in S. cerevisiae, with various membrane-associated proteins also being of strong interest. Progress in these areas is described in the following subsections.

Proteomics of mitochondria

Mitochondria play a central role in bioenergetics, apoptosis, and the metabolism of lipids, amino acids, and iron [54]. Many studies have used genomic and systematic functional approaches to investigate S. cerevisiae mitochondria, but the identification and characterization of mitochondrial proteins is not yet complete [55]. How many distinct proteins can be found in a mitochondrion? To answer this question, many studies have been carried out to localize the yeast proteome [56]. In a large-scale analysis of protein localization by immunolocalization of 2,744 epitope-tagged yeast proteins (covering 45% of the theoretical yeast proteome), Kumar et al. determined that 47% of yeast proteins are cytoplasmic, 13% are mitochondrial, 13% are exocytic, and 27% are nuclear/nucleolar associated [57]. From the 13% of proteins located in mitochondria (332 proteins), it can be estimated that yeast mitochondria contain approximately 800 different proteins [57].

Yeast mitochondria have also been studied using protein-protein interactions [58], mass spectrometry (based on 2-DE) of mitochondria [59], and computational prediction of mitochondrial proteins [60]. Recently, studies were carried out to narrow the gap of missing mitochondrial proteins in S. cerevisiae. A reduction of this gap to 10% was reported with the identification of 750 proteins, of which 436 (58.1%) are mitochondrial proteins, 208 (27.7%) have not been localized so far, and 106 (14.1%) are localized in other cellular compartments [54]. The identification of 527 mitochondrial proteins by green fluorescent protein (GFP) tagging has also been reported [61].

The prefractionation of cellular components is a very important step in the study of organelles [62], since it reduces sample complexity, gives access to low-abundance proteins, and provides additional information on protein localization [63].

Many techniques have been used to isolate distinct cell compartments, for example free-flow electrophoresis, gradient centrifugation, and immunoprecipitation. However, the proteomes of entire organelles are too complex to be separated sufficiently by one-dimensional methods. Therefore, many different approaches have been used to compile an inventory of the mitochondrial proteome [54, 55] and so facilitate analysis at the molecular level.

Recently, a total of 851 proteins were identified using multidimensional LC-MS/MS, 1DE-SDS-PAGE combined with nano-LC-MS/MS, and 2-DE-PAGE with MALDI mass fingerprinting [63]. A comparison of these methods was also made: while 2-DE-PAGE has an advantage in the separation of protein isoforms and in quantitative profiling, 1DE-PAGE with nano-LC-MS/MS and multidimensional LC-MS/MS are suitable for efficient protein identification, since they are less biased against distinct classes of proteins [63].

Proteomics of nuclear membrane and plasma membrane

Since the nucleus is separated from the cytosol by a double membrane, the exchange of molecules between the cytosol and the nuclear interior takes place via nuclear pores. Moreover, the nuclear membrane has a complex structure and highly dynamic behaviour. Therefore, an understanding of nuclear membrane structure is very important. The first genome-wide screen to identify inner nuclear membrane proteins was performed by Murthi and Hopper [64].

Due to the relatively poor solubility of proteins during the isoelectric focusing process, most integral membrane proteins are not detected on 2-DE gel maps. With the aim of analyzing yeast plasma membrane proteins, various procedures have been used, including optimization of the purification protocol to reduce contamination by other membranes and cytosolic proteins, and improvement of 2-DE gels by using the cationic detergent cetyl trimethyl ammonium bromide and sodium dodecyl sulphate for the first and second dimensions, respectively [65]. As a result, proteins in 50 spots were identified, among which both known and unknown plasma membrane proteins were discovered [65]. Following that, based on 2-DE gels, Delom et al. used ion-exchange chromatography/lithium dodecyl sulphate-PAGE to investigate plasma membrane proteins in response to the antifungal agent calcofluor, and approximately 90 proteins were identified and clustered [66]. Thus, subcellular fractionation coupled with 2-DE gels was successfully applied to the identification of plasma membrane proteins. Since some proteins are expressed only under specific conditions, it is highly unlikely that all proteins are expressed simultaneously under a single growth condition. As a result, it is very difficult to identify 100% of the proteins of an entire cell organelle by proteome analysis [54].

2.2.3.5. The quantitative proteome of S. cerevisiae

Recently, many studies using metabolic labeling as well as stable isotope tagging have been carried out to determine quantitative changes in protein expression in S. cerevisiae.

[2H10]-leucine was used as a labeled amino acid in synthetic complete medium, and the relative comparison between [1H10] and [2H10] cultures provided reliable data for the identification and quantitation of the S. cerevisiae proteome in wild-type and nonessential-gene null-mutant strains [67]. In terms of the quantitative proteome, comparison between wild-type and mutant strains is necessary to characterise protein function in S. cerevisiae under different states or conditions. In another study, ICAT was applied to characterise proteomic changes between the wild-type S. cerevisiae strain HFY1200 and the mutant strain HFY871 (Upf1 knock-out) [68]. In this study, 1,029 distinct proteins were detected, and most of these proteins did not change significantly in expression as a consequence of the Upf1 knock-out. An increase in the expression of proteins involved in arginine biosynthesis was found [68].
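For the [1H10]/[2H10]-leucine comparison mentioned at the start of this discussion, the light and heavy forms of a peptide appear as a pair of peaks separated by a predictable m/z shift, and their intensity ratio gives the relative abundance. The sketch below illustrates this under the assumption of complete label incorporation; the function names, peptide, and intensities are hypothetical and are not taken from the cited study.

```python
# Monoisotopic masses (amu) of the two hydrogen isotopes.
MASS_1H = 1.007825
MASS_2H = 2.014102


def leucine_label_shift(peptide: str, charge: int = 1,
                        deuteriums: int = 10) -> float:
    """Return the m/z shift between the light and heavy forms of a peptide.

    Each leucine (L) in the heavy form carries `deuteriums` 2H atoms in
    place of 1H, assuming complete incorporation of the label.
    """
    n_leu = peptide.count("L")
    return n_leu * deuteriums * (MASS_2H - MASS_1H) / charge


def heavy_to_light_ratio(light_intensity: float, heavy_intensity: float) -> float:
    """Relative abundance of the heavy-labeled sample versus the light one."""
    return heavy_intensity / light_intensity


# Hypothetical peptide with two leucines, observed as a 2+ ion.
print(round(leucine_label_shift("ALLK", charge=2), 4))
print(heavy_to_light_ratio(light_intensity=2.0e5, heavy_intensity=4.0e5))
```

Peptides without leucine show no shift and therefore cannot be quantified this way, which is one reason the abundance of leucine in the yeast proteome matters for this labeling strategy.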

The most significant industrial application of S. cerevisiae is fermentation, especially for the production of ethanol. To date, most S. cerevisiae proteomic analyses have been based on 2-DE, with many studies investigating protein expression under glucose exhaustion conditions. One of the pioneering studies was performed by Boucherie [69], in which 6 new proteins were found under glucose-limited conditions, and the synthesis of ca. 95% of the proteins synthesised in the log phase was turned off [69]. Similar approaches were used to investigate the response of S. cerevisiae to stress conditions, such as cadmium stress [44], oxidative stress [46], and hyperosmolarity stress [70]. Under cadmium stress, 54 up-regulated and 43 down-regulated proteins were detected; the expression of these proteins was related to the biosynthesis of sulphur amino acids, and the data indicated that glutathione and thioredoxin play important roles in the thiol redox system in response to cadmium stress [44]. In the study of H2O2-mediated oxidative stress, the synthesis of at least 115 proteins was stimulated, while the expression of 52 proteins was repressed. There was a decrease in protein synthesis and an increase in protein degradation pathways [46]. In the study of hyperosmolarity stress (from 0.7 to 1.4 M NaCl), 73 proteins were differentially expressed by more than 3-fold in 1.4 M NaCl; 40% of these proteins were down-regulated, and the expression of these proteins was related to the dissimilation of dihydroxyacetone [70].

As mentioned before, although the more recently developed iTRAQ method has many advantages, its application to the study of the S. cerevisiae proteome is still rare. The first application of iTRAQ in S. cerevisiae was published by Ross et al. (2004) [11]. A proteomic comparison of 3 strains (wild-type, upf1Δ, and xrn1Δ) was performed in this study. A total of 1,217 proteins were identified, of which 685 were identified with > 2
