Simple wins the day

It’s tempting to use the newest and latest technologies for separating proteins and peptides prior to LC-MS/MS, but tried-and-true technologies frequently win the day.

For example, take this recent paper:

A draft map of the human proteome

The authors used relatively simple methods to identify over 17,000 proteins. These represent over 80% of the human protein-coding genes currently thought to exist (this number is surely going to change).

Let’s look at the methods they used.

The first one is a simple GeLC-MS/MS experiment. In essence:

  • Run your proteins on an SDS-PAGE gel
  • Chop up the gel into 10-20 sections (usually 0.5-1 cm per section)
  • Perform an in-gel digestion on every gel section
  • Analyze each gel section by LC-MS/MS

This technique is not new; I remember first doing it in 2001. It usually works great. Some downsides are the tendency to over-alkylate, the small amount of protein you can start with (usually 50-200 µg per sample) and the increased variability of digesting hundreds of gel sections, almost always by hand.

The second technique is an in-solution digestion followed by separating the peptides using high (basic) pH reversed-phase chromatography, usually around pH 10. I first learned of this from the classic Gilar 2005 paper. It also works well, but it can seriously damage your HPLC if the fittings and seals are not made to withstand pH 10 (I learned this the hard way!). Currently I’m experimenting with ERLIC, which does not need pH 10 solvents, but that is a topic for another post.

So you basically (see the pun there?):

  • Digest your sample in solution
  • Separate your peptides by pH 10 reversed-phase HPLC using a big column (2.1-4.6 mm)
  • Collect fractions
  • Combine fractions into a smaller number of fractions
  • Analyze each fraction by regular LC-MS/MS
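
The "combine fractions" step is often done by concatenation: every n-th collected fraction goes into the same pool, so each pooled sample contains early-, mid- and late-eluting peptides. I don't know which pooling scheme the authors used, so treat the sketch below as an assumption. The function name and the numbers (96 collected fractions pooled into 24) are made up for illustration.

```python
def concatenate_fractions(n_collected, n_pooled):
    """Assign collected fractions (numbered from 1) to pooled fractions.

    Fraction i goes into pool ((i - 1) % n_pooled) + 1, so each pool
    mixes fractions from across the whole high-pH gradient.
    """
    pools = {p: [] for p in range(1, n_pooled + 1)}
    for fraction in range(1, n_collected + 1):
        pool = (fraction - 1) % n_pooled + 1
        pools[pool].append(fraction)
    return pools


# Hypothetical example: 96 collected fractions pooled into 24.
pools = concatenate_fractions(96, 24)
print(pools[1])  # [1, 25, 49, 73]
```

Writing the pooling map down like this (and printing it as a bench sheet) is one cheap way to cut down on pipetting mix-ups when you combine fractions by hand.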

With this approach you can start with a much larger amount of material (milligrams), and you can digest the entire sample at once, which decreases variability. However, collecting fractions and then combining them is far more variable and error-prone than you would think, even with an automated fraction collector. HPLCs using large columns can be very reproducible (column ovens help here too), but that doesn’t happen without careful attention to detail and lots of testing to make sure the separation is reproducible and your HPLC is working correctly. All this takes time, effort and a good QC protocol. Unfortunately, those things are rarely published, even though they are critical to the results. This means you have to reinvent the wheel a lot…

Their data analysis is pretty simple as well: Mascot and SEQUEST, plus Percolator and spectral counting. Their q-value cutoff is set at 1%, which is pretty typical. The only thing I would have liked to see is the FDR at the protein level, which can be 10 times or so greater than at the peptide level. I haven’t finished reading the entire paper yet, so maybe it’s buried in there somewhere. I’ll update this when I find out!
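
To see why the protein-level FDR can be so much higher, here is a back-of-the-envelope sketch. It assumes the worst case, where every false peptide hit lands on a different protein that isn't really in the sample, while the correct peptides pile up on a smaller set of real proteins. The numbers are hypothetical and are not taken from the paper.

```python
def worst_case_protein_fdr(n_peptides, peptide_fdr, n_true_proteins):
    """Rough protein-level FDR if each false peptide adds one false protein."""
    false_peptides = n_peptides * peptide_fdr
    # Assumption: false hits scatter, so each one creates a new false protein.
    false_proteins = false_peptides
    return false_proteins / (n_true_proteins + false_proteins)


# Hypothetical: 100,000 accepted peptides at 1% FDR, 10,000 real proteins.
fdr = worst_case_protein_fdr(100_000, 0.01, 10_000)
print(f"protein-level FDR: {fdr:.1%}")
```

With these made-up numbers, a 1% peptide-level FDR turns into roughly 9% at the protein level, which is the kind of inflation I mean.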

This paper also acquired about 2,000 LC-MS/MS runs, which would cost you about 140,000 at my rates. Not too bad, huh? It also averages about 1,000 proteins identified per LC-MS/MS run, which is not too shabby.

Overall, this is an excellent example of how powerful proteomics can be when you have enough resources. I wish I could work on a project like this. Anyone want to submit 2,000 LC-MS/MS samples and have 140,000 to spend? As I wade through the paper, I hope to learn more about how they did quality control over such a large number of samples and how they controlled for sample prep variability, which can be a big part of the total variability seen in a proteomics study. I have a lot of questions about controlling variability on a project this large. How did they control and measure their variability? How do they know that the variability is the same between samples 1-10 and samples 1-1,000? Is the variability different for samples run one week apart versus one month apart? Does normalization always fix that? How can you be sure?
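
One simple way to start answering these questions is to run the same QC standard throughout the project and compare the coefficient of variation (CV) within a batch against the CV across batches. The sketch below uses invented intensities for a single QC peptide in two hypothetical batches run a month apart; it is not how the authors did it, just an illustration of the idea.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


# Invented QC peptide intensities (arbitrary units), grouped by batch.
qc_runs = {
    "week_1": [100.0, 104.0, 98.0, 102.0],
    "week_5": [80.0, 83.0, 78.0, 81.0],
}

within = {batch: cv_percent(values) for batch, values in qc_runs.items()}
pooled = cv_percent([x for values in qc_runs.values() for x in values])

for batch, cv in within.items():
    print(f"{batch}: CV = {cv:.1f}%")
print(f"all runs: CV = {pooled:.1f}%")
```

In this toy data each batch looks tight on its own, but the pooled CV is several times larger because the signal drifted between batches. That is exactly the kind of thing you only catch if you measure it.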

An impressive study. I have lots of questions, but it’s impressive nonetheless.


Just an update (July 16th, 2014): there has recently been some criticism of these two studies.

I think it is right to worry about really large-scale proteomics studies, especially when it is unclear how to correct for multiple testing when you deal with millions and millions of spectra. At the same time, no one really knows what the best way is to handle this amount of data.



