Original Source: This guide is an adaptation of the comprehensive tutorial by the Jesse G. Meyer Lab.

Bottom-Up Proteomics

A comprehensive guide to understanding proteins, mass spectrometry, and the "shotgun" approach to biology.

Proteomics is the large-scale study of protein structure and function. While the genome contains the instructions (DNA), proteins are the actual machines that do the work in a cell. To truly understand biology, we must measure the proteins.

This guide covers the "Bottom-Up" approach (often called Shotgun Proteomics), where we chop proteins into small pieces (peptides), analyze them, and puzzle them back together to identify the original proteins.

Proteomics vs. RNA-Seq

Crucial Concept

Why measure proteins if RNA is easier?

Many new researchers are familiar with RNA-seq (transcriptomics). It is powerful, cheaper than proteomics, and offers deep coverage. However, RNA is just the "recipe." Proteins are the "cake."

RNA-Seq (The Recipe)

Measures mRNA abundance. It tells you what the cell is planning to make.

✅ Pros: Technically easier, captures nearly every gene, standardized analysis.
❌ Cons: The correlation between mRNA and protein levels is often only ~0.5. It does not capture protein degradation or final protein stability.

Proteomics (The Result)

Measures actual protein abundance. It tells you what is actually present and working.

✅ Pros: Measures the functional actors, detects post-translational modifications (PTMs, e.g., phosphorylation), and can reveal protein interactions.
❌ Cons: More complex hardware, higher cost, and "dropout" (missing values) for low-abundance proteins.

The General Workflow

Bottom-up proteomics follows a specific pipeline. Think of it as an assembly line in reverse: taking a parking lot full of cars (the proteins in a cell), shredding them into parts (peptides), and weighing every part to figure out which cars were there.

Extraction → Digestion → Separation (LC) → Ionization → Mass Spec → Data Analysis

1. Protein Extraction

First, proteins must be released from the cell. This involves three key chemical steps:

Lysis: Breaking the cell wall/membrane. This can be done mechanically (bead beating) or chemically (detergents).
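Denaturation: Proteins naturally fold into compact balls. We use urea or SDS to unfold them into extended chains so enzymes can reach the amino acids buried inside.
Reduction & Alkylation: We break the disulfide bonds (sulfur bridges between cysteines) that hold protein shapes together using a reducing agent (e.g., DTT or TCEP), then cap the freed cysteines with an alkylating agent (e.g., iodoacetamide) so the bonds cannot re-form. This keeps the protein unfolded during digestion.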

2. Proteolysis (Digestion)

Mass spectrometers struggle with huge whole proteins. We use molecular scissors (proteases) to chop them into manageable peptides.

The most common enzyme is trypsin. It cuts on the C-terminal side of arginine (R) and lysine (K), usually not when the next residue is proline (P). This specificity is crucial because it allows computer algorithms to predict exactly which peptides each protein should produce, based on protein sequences translated from the genome.
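Because trypsin's rule is so regular, the in-silico digest that search software performs can be sketched in a few lines. This is a minimal sketch assuming the simplified "cut after K or R unless followed by P" rule; the function name `trypsin_digest`, its `missed_cleavages` option, and the example sequence are hypothetical, not taken from any particular tool.

```python
import re


def trypsin_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """Simplified in-silico trypsin digest (hypothetical helper).

    Cuts after K or R unless the next residue is P. Peptides spanning up to
    `missed_cleavages` uncut sites are also returned, to mimic incomplete digestion.
    """
    # Zero-width split: after K/R (lookbehind) and not before P (lookahead).
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join neighbouring pieces to model missed cleavages.
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[i : j + 1]))
    return peptides


print(trypsin_digest("PEPTIDEKAAKPLRSTYR"))
# ['PEPTIDEK', 'AAKPLR', 'STYR']
print(trypsin_digest("PEPTIDEKAAKPLRSTYR", missed_cleavages=1))
# ['PEPTIDEK', 'PEPTIDEKAAKPLR', 'AAKPLR', 'AAKPLRSTYR', 'STYR']
```

The K inside `AAKPLR` is left uncut because a proline follows it.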

3. Liquid Chromatography (LC)

If we injected the whole sample at once, the machine would be overwhelmed. We use High-Performance Liquid Chromatography (HPLC) to separate peptides over time.

Most commonly, we use reversed-phase LC. Hydrophobic ("greasy") peptides stick to the column longer, while hydrophilic ("water-loving") peptides wash out sooner. As the gradient slowly increases the organic solvent, peptides elute into the mass spec spread out over time (e.g., 60 to 120 minutes) rather than all at once.
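For intuition about reversed-phase ordering, a rough proxy is a peptide's average hydrophobicity. The sketch below assumes the Kyte-Doolittle hydropathy scale and the GRAVY (grand average of hydropathy) score; real retention times also depend on peptide length, sequence context, and the column, so dedicated retention-time predictors do much better. The helper names and peptides are illustrative.

```python
# Kyte-Doolittle hydropathy scale: positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def rough_elution_order(peptides: list[str]) -> list[str]:
    """Crude proxy: more hydrophilic (lower GRAVY) peptides elute earlier."""
    return sorted(peptides, key=gravy)


print(rough_elution_order(["LLVFIK", "PEPTIDEK", "STYR", "AAKPLR"]))
# ['STYR', 'PEPTIDEK', 'AAKPLR', 'LLVFIK']
```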

4. Peptide Ionization

To analyze molecules with electric fields, they must be charged (ions) and in the gas phase. The standard method for proteomics is Electrospray Ionization (ESI).

Liquid from the LC is pushed through a tiny needle held at high voltage. This creates a fine mist of charged droplets. As the liquid evaporates, the peptides are left flying as gas-phase ions. This is a "soft" technique that doesn't destroy the peptides.
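ESI usually adds protons, so the same peptide can appear at several charge states, and the instrument records mass-to-charge ratio (m/z) rather than mass: m/z = (M + z × 1.007276) / z for a peptide of neutral mass M carrying z protons. A worked sketch, assuming monoisotopic residue masses and, as an illustrative default, carbamidomethylated cysteines (the +57.02 Da cap left by iodoacetamide alkylation); the helper names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565            # added once per peptide (free N- and C-termini)
PROTON = 1.007276            # charge carrier added during ESI
CARBAMIDOMETHYL = 57.021464  # assumption: fixed modification on every Cys


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass, counting every Cys as alkylated."""
    return (sum(RESIDUE_MASS[aa] for aa in seq) + WATER
            + CARBAMIDOMETHYL * seq.count("C"))


def precursor_mz(seq: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(seq) + charge * PROTON) / charge


for z in (1, 2, 3):
    print(z, round(precursor_mz("PEPTIDEK", z), 4))
# 1 928.4622
# 2 464.7347
# 3 310.1589
```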

5. Mass Spectrometers

The heart of the operation. The mass spec measures the mass-to-charge ratio (m/z) of the ions. It performs two types of scans in rapid succession:

MS1 (Survey Scan)

The machine records the m/z and intensity of all ions entering at that moment. In the standard data-dependent mode, it then picks the most intense ions to analyze further.
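The survey-then-select logic is essentially "sort by intensity and take the top N." A toy sketch, assuming a hypothetical `Ion` record and an illustrative N; real instruments add refinements such as charge-state filters and temporarily excluding recently fragmented m/z values.

```python
from dataclasses import dataclass


@dataclass
class Ion:
    mz: float
    intensity: float


def select_top_n(survey: list[Ion], n: int) -> list[Ion]:
    """Return the n most intense ions from an MS1 survey scan (DDA-style)."""
    return sorted(survey, key=lambda ion: ion.intensity, reverse=True)[:n]


survey = [Ion(464.73, 5e6), Ion(500.27, 1e5), Ion(612.31, 2e7), Ion(733.40, 8e5)]
print([ion.mz for ion in select_top_n(survey, n=2)])
# [612.31, 464.73]
```

The faint ion at 500.27 is never sent to MS2 in this cycle, which is how low-abundance peptides drop out.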

MS2 (Tandem MS)

To identify the peptide sequence, the machine isolates a specific ion, collides it with an inert gas, and breaks it into fragments. By measuring the m/z of the fragments (b-ions and y-ions), we can read the amino acid sequence.
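b-ions contain the peptide's N-terminal end and y-ions its C-terminal end, so each ladder is a running sum of residue masses. A sketch for singly charged ions, assuming unmodified residues and standard monoisotopic masses; `by_ions` is a hypothetical helper.

```python
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(seq: str) -> dict[str, float]:
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    ions = {}
    n = len(seq)
    for i in range(1, n):
        # b_i: first i residues + proton.
        ions[f"b{i}"] = sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON
        # y_i: last i residues + water (intact C-terminus) + proton.
        ions[f"y{i}"] = sum(RESIDUE_MASS[aa] for aa in seq[n - i:]) + WATER + PROTON
    return ions


ions = by_ions("PEPTIDEK")
print(round(ions["y1"], 4), round(ions["b2"], 4))
# 147.1128 227.1026
```

Consecutive steps in a ladder equal single residue masses (here y2 - y1 is the mass of E, about 129.04), which is how the sequence is read off the spectrum.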

6. Data Analysis

A single experiment generates thousands of spectra, far too many to read manually. We use search software (like Spectronaut, MSFragger, or DIA-NN) to compare our experimental spectra against a protein database (a digital dictionary of all known protein sequences for the organism, such as the human proteome).
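At its core, a database search is "filter candidates by precursor mass, then score fragment matches." The toy sketch below assumes unmodified peptides, singly charged fragments, a 10 ppm precursor tolerance, and a simple shared-peak-count score; real engines use far more sophisticated scoring plus statistical error control. The helper names, candidate list, and synthetic spectrum are illustrative only.

```python
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
PROTON = 1.007276


def neutral_mass(seq: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def fragment_mzs(seq: str) -> list[float]:
    """Singly charged b- and y-ion m/z values (unmodified peptide)."""
    ions = []
    for i in range(1, len(seq)):
        ions.append(sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON)           # b_i
        ions.append(sum(RESIDUE_MASS[aa] for aa in seq[-i:]) + WATER + PROTON)  # y_i
    return ions


def search(precursor_mz: float, charge: int, peaks: list[float],
           database: list[str], ppm: float = 10.0, frag_tol: float = 0.02):
    """Return (score, peptide) pairs, best first; score = matched peak count."""
    observed_mass = precursor_mz * charge - charge * PROTON
    hits = []
    for pep in database:
        # Step 1: precursor filter. Skip candidates outside the ppm window.
        if abs(neutral_mass(pep) - observed_mass) / observed_mass * 1e6 > ppm:
            continue
        # Step 2: count observed peaks within frag_tol of any theoretical fragment.
        theoretical = fragment_mzs(pep)
        score = sum(any(abs(p - t) <= frag_tol for t in theoretical) for p in peaks)
        hits.append((score, pep))
    return sorted(hits, reverse=True)


# Synthetic spectrum: every b/y ion of PEPTIDEK (a real spectrum is noisier).
spectrum = fragment_mzs("PEPTIDEK")
print(search(464.7347, 2, spectrum, ["PEPTIDEK", "EPPTIDEK", "STYR"]))
# [(14, 'PEPTIDEK'), (12, 'EPPTIDEK')]
```

PEPTIDEK and EPPTIDEK have identical precursor masses, so only fragment evidence (here the b1 and y7 ions) separates them; STYR never passes the precursor filter.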

If an experimental spectrum matches a theoretical spectrum from the database closely enough, the peptide is identified. We then infer which proteins were present based on the list of identified peptides.
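Protein inference is complicated by peptides that are shared between several proteins. One common strategy is parsimony: report the smallest set of proteins that explains all identified peptides. Finding the true minimum is a set-cover problem, so this sketch uses the usual greedy approximation; the protein names and mappings are hypothetical, and real tools add grouping rules and protein-level error control.

```python
def infer_proteins(peptide_to_proteins: dict[str, set[str]]) -> list[str]:
    """Greedy parsimony: pick proteins until every peptide is explained."""
    protein_to_peptides: dict[str, set[str]] = {}
    for pep, prots in peptide_to_proteins.items():
        for prot in prots:
            protein_to_peptides.setdefault(prot, set()).add(pep)

    # Peptides that map to no protein can never be explained, so ignore them.
    unexplained = {pep for pep, prots in peptide_to_proteins.items() if prots}
    chosen = []
    while unexplained:
        # Protein covering the most unexplained peptides; ties go to the first name alphabetically.
        best = max(sorted(protein_to_peptides),
                   key=lambda prot: len(protein_to_peptides[prot] & unexplained))
        chosen.append(best)
        unexplained -= protein_to_peptides[best]
    return chosen


evidence = {
    "PEPTIDEK": {"ProteinA", "ProteinB"},  # shared peptide
    "AAKPLR": {"ProteinA"},
    "STYR": {"ProteinC"},
}
print(infer_proteins(evidence))
# ['ProteinA', 'ProteinC']
```

ProteinB is not reported because ProteinA already explains its only peptide.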

The Frontier: AI & DIA

Machine Learning Revolution

1. The Shift to DIA (Data-Independent Acquisition)

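In the traditional workflow, data-dependent acquisition (DDA), the mass spec picks the "loudest" (most intense) peptides to analyze and ignores the quiet ones. Because this selection changes from run to run, a peptide measured in one sample is often absent from the next, causing "missing values."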

Enter DIA: Instead of picking peptides one by one, the machine isolates wide m/z windows (e.g., everything from 500-525 m/z), fragments all the ions in each window together, and steps through the windows across the full m/z range.
Result: A digital archive of fragment data for essentially every detectable peptide in the sample, not just the abundant ones.
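A DIA method is essentially a list of isolation windows that the instrument cycles through. A minimal sketch, assuming fixed 25 m/z windows tiling 400-1000 m/z with no overlap (real methods often use variable widths and small overlaps); the helper names are hypothetical.

```python
from typing import Optional


def dia_windows(start: float = 400.0, end: float = 1000.0,
                width: float = 25.0) -> list[tuple[float, float]]:
    """Contiguous, non-overlapping isolation windows covering [start, end)."""
    windows = []
    low = start
    while low < end:
        high = min(low + width, end)
        windows.append((low, high))
        low = high
    return windows


def window_index(mz: float, windows: list[tuple[float, float]]) -> Optional[int]:
    """Index of the window that would co-isolate an ion at this m/z, if any."""
    for i, (low, high) in enumerate(windows):
        if low <= mz < high:
            return i
    return None


windows = dia_windows()
print(len(windows), windows[4])          # 24 (500.0, 525.0)
print(window_index(464.7347, windows))   # 2
```

Every precursor that falls in the same window is fragmented together, which is why each DIA spectrum is a mixture.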

2. AI & Deep Learning

The problem with DIA is that the spectra are messy: each one mixes fragments from many peptides that were isolated together. This is where AI steps in.

Predicting Reality

Tools like Prosit and AlphaPeptDeep use deep learning to predict how a given peptide sequence should look in the mass spec, including its fragment ion intensities and retention time.

Library-Free Search

Software like DIA-NN uses deep-learning predictions of this kind to build in silico spectral libraries and untangle messy DIA data, allowing us to identify thousands of proteins without first building an experimental spectral library from reference runs.
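To decide whether a peptide is really present, DIA software compares observed fragment intensities with the predicted pattern. One common similarity measure is the cosine of the two intensity vectors, often rescaled into a normalized spectral angle (1 for identical patterns, 0 for no shared fragments). This sketch assumes both spectra are given as hypothetical `{fragment label: intensity}` dictionaries with made-up values; it is not DIA-NN's actual scoring, which combines many more features.

```python
import math


def spectral_angle(predicted: dict[str, float], observed: dict[str, float]) -> float:
    """Normalized spectral contrast angle: 1 = identical shape, 0 = no overlap."""
    labels = sorted(set(predicted) | set(observed))
    # Fragments missing from one spectrum count as zero intensity.
    a = [predicted.get(k, 0.0) for k in labels]
    b = [observed.get(k, 0.0) for k in labels]
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    cosine = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding outside [-1, 1]
    return 1 - 2 * math.acos(cosine) / math.pi


predicted = {"y3": 0.3, "y4": 1.0, "y5": 0.6, "b2": 0.1}  # hypothetical prediction
good_match = {"y3": 0.25, "y4": 1.0, "y5": 0.7}
poor_match = {"y3": 1.0, "b2": 0.8}
print(spectral_angle(predicted, good_match) > spectral_angle(predicted, poor_match))  # True
```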

Single-Cell Proteomics (SCP) vs. scRNA-Seq

Emerging Technology

For years, researchers have relied on Single-Cell RNA-seq (scRNA-seq) to understand cellular heterogeneity. While instrumental, scRNA-seq offers limited molecular insights because it only measures the "recipe" (transcripts), not the "result" (proteins). We are now entering the era of Deep Single-Cell Proteomics (SCP).

scRNA-Seq (Established)

Throughput: Can analyze thousands of cells per experiment.
Depth: Comprehensive coverage of the transcriptome.
Limitation: Poor correlation to actual protein levels; misses protein turnover and post-translational modifications.

Deep SCP (New Frontier)

Throughput: Currently ~50 to 120 cells per day.
Depth: Can now quantify ~50% of the expressed proteome (~6,500 proteins) in a single HeLa cell.
Advantage: Measures the functional actors. Capable of detecting phosphosites (signaling) and protein turnover rates.

Why the sudden boom?

Since 2020, proteomics has achieved a >100-fold gain in sensitivity. This is driven by two key innovations:

New Hardware: The Orbitrap Astral mass spectrometer combines high resolution with incredible speed (up to 200 Hz), allowing deep coverage of minuscule samples.
Miniaturization: New liquid-handling robots (like the cellenONE) perform sample prep in sub-microliter volumes to minimize the loss of proteins to plastic surfaces.

Ready to dive deeper?

This page is a high-level summary. For the deep technical details on specific mass analyzers (Orbitrap vs TOF), statistical models for False Discovery Rates, and network analysis, please consult the full tutorial.

Proteomics Overview 2026