Bioinformatics protocols

1. Chromosome 1

2. Chromosome 2

We are developing neXtProt (https://www.nextprot.org/), the reference knowledgebase for C-HPP projects. Contact: lydie.lane at sib.swiss
We have set up data mining workflows to prioritize missing proteins for targeted studies. Contact: lydie.lane at sib.swiss or paula.duek at sib.swiss

3. Chromosome 3

4. Chromosome 4

5. Chromosome 5

  • We have available quantitative LC-MS(/MS) workflows, time alignment methods, and single-stage MS quantification and identification approaches based on TPP and SpectraST.

6. Chromosome 6

7. Chromosome 7

  • Chromosome 7 has created The Proteome Browser Project (TPB), which enables chromosome-based browsing of proteomics LC-MS/MS data. On January 29, 2013, the team posted a status update on TPB, including PowerPoint slides. A MissingProteinPedia is currently being developed.

8. Chromosome 8

9. Chromosome 9

10. Chromosome 10

11. Chromosome 11

12. Chromosome 12

13. Chromosome 13

Reference Databases (version or release date)
  • neXtProt (Newest Release): for protein information
  • Ensembl (Newest Release): for gene information
  • Guide to the Human Proteome (Newest Release) from the Global Proteome Machine database (GPMdb): for MS information.
  • PeptideAtlas (Newest Release): for MS information.
  • Human Protein Atlas (HPA; Newest Release): referenced for information regarding antibody availability and tissue expression.
  • Online Mendelian Inheritance in Man (OMIM): for disease-related information
  • Cancer Gene Census (Newest Release): for oncogene product information
Protein identification
  • The tandem mass spectrometry (MS/MS) spectra were extracted and searched using MASCOT software (version 2.6.0, http://www.matrixscience.com/) against human sequences from neXtProt (Newest Release).
  • The search parameters were:
    1. enzyme specificity: trypsin
    2. maximum of two missed cleavages
    3. carbamidomethyl (C) as fixed modification
    4. acetyl (K), acetyl (protein N-term), and oxidation (M) as variable modifications
    5. peptide mass tolerance of 10 ppm
    6. MS/MS mass tolerance of 1.2 Da
  • PeptideProphet and ProteinProphet were used to estimate the false discovery rate (FDR).
    • We identified proteins using two or more unique peptides with an FDR < 1% at the protein level
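The acceptance rule above can be illustrated with a minimal Python sketch. It is not the team's pipeline: the `ProteinHit` record and the `filter_proteins` function are hypothetical names, and the FDR is estimated here as the mean of (1 − probability) over the accepted hits, which is one common way to use ProteinProphet-style probabilities. Applying the peptide-count filter before the FDR estimate is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    # Hypothetical record; the field names are illustrative only.
    accession: str
    unique_peptides: int
    probability: float  # ProteinProphet-style probability in [0, 1]


def filter_proteins(hits, max_fdr=0.01, min_unique_peptides=2):
    """Return hits with enough unique peptides, keeping the estimated FDR below max_fdr."""
    eligible = [h for h in hits if h.unique_peptides >= min_unique_peptides]
    # Most confident hits first, so the list is cut at the first hit that breaks the FDR limit.
    eligible.sort(key=lambda h: h.probability, reverse=True)

    accepted = []
    expected_errors = 0.0
    for hit in eligible:
        candidate_errors = expected_errors + (1.0 - hit.probability)
        # Estimated FDR of the list if this hit were accepted (assumed estimator).
        if candidate_errors / (len(accepted) + 1) >= max_fdr:
            break
        expected_errors = candidate_errors
        accepted.append(hit)
    return accepted
```

With this sketch, a hit supported by a single unique peptide is discarded regardless of its probability, and the accepted list ends at the first hit that would push the estimated protein-level FDR to 1% or above.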
Protein quantification
  • Proteome Discoverer software (version 1.3; Thermo Fisher) was used for protein identification and quantification.
  • For TMT-labeled peptides, the TMT6 modification (+229 Da) was set as a fixed modification at peptide N-termini and at lysines.
  • Quantification was performed by calculating the ratio between the peak areas of the TMT reporter groups.
    • To eliminate masking of changes in expression due to peptides that are shared between proteins, we calculated the protein ratio using only ratios from the spectra that are distinct to each protein.
    • All quantitative results were normalized using protein medians (minimum protein count: 20).
    • Quantification values were rejected if not all quantification channels were present.
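The quantification rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Proteome Discoverer implementation: the input layout (dicts with `proteins` and `areas` keys), the function names, and the use of the median to summarize spectrum ratios into a protein ratio are all hypothetical choices. The minimum protein count is read here as "normalize only when at least 20 proteins are quantified".

```python
from statistics import median


def protein_ratios(spectra, channels, reference):
    """Compute per-protein reporter ratios (channel / reference) from spectrum-level peak areas.

    spectra: list of dicts with 'proteins' (list of accessions) and
             'areas' (dict mapping channel name to reporter peak area).
    """
    if reference not in channels:
        raise ValueError("reference must be one of the channels")

    per_protein = {}
    for spectrum in spectra:
        # Use only spectra that are distinct to a single protein.
        if len(spectrum["proteins"]) != 1:
            continue
        areas = spectrum["areas"]
        # Reject the spectrum if any channel is missing or has a zero area.
        if any(not areas.get(c) for c in channels):
            continue
        ratios = {c: areas[c] / areas[reference] for c in channels if c != reference}
        per_protein.setdefault(spectrum["proteins"][0], []).append(ratios)

    # Summarize spectrum ratios per protein with the median (an assumption).
    return {
        protein: {c: median(row[c] for row in rows) for c in rows[0]}
        for protein, rows in per_protein.items()
    }


def normalize_on_protein_median(ratios, min_protein_count=20):
    """Divide each channel's ratios by the median protein ratio of that channel."""
    if len(ratios) < min_protein_count:
        return ratios  # too few proteins for a stable median; left unnormalized
    channels = {c for row in ratios.values() for c in row}
    factors = {c: median(row[c] for row in ratios.values() if c in row) for c in channels}
    return {p: {c: v / factors[c] for c, v in row.items()} for p, row in ratios.items()}
```

After normalization the median protein ratio in each channel equals 1, so a systematic loading difference between channels does not appear as a change in expression.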

14. Chromosome 14

We are developing Proline, a data integration framework and software suite for mass spectrometry-based proteomics (software availability: http://proline.profiproteomics.fr/). Proline algorithms focus on result validation (using custom filters and target-decoy analysis), merging and comparison of datasets, and label-free quantification (spectral count and LC-MS analyses).
Contact: christophe.bruley at cea.fr
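For readers unfamiliar with target-decoy validation, the general idea can be sketched in Python as below. This is a generic illustration of the technique and makes no claim about how Proline implements it; the function name, the `(score, is_decoy)` input layout, and the simple decoys/targets estimator are assumptions.

```python
def score_threshold_for_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR (decoys / targets) is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs, where a higher score is better.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = 0
    decoys = 0
    threshold = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all matches scoring at least `score`.
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold
```

Matches scoring at or above the returned cutoff would then be retained. Tied scores are not handled specially in this sketch.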

15. Chromosome 15

16. Chromosome 16

17. Chromosome 17

Provided by Emma (Yue) Zhang.
Peptide sequences were identified using Thermo Proteome Discoverer 1.3 against the human database SP.human.56.5 with full trypsin specificity and up to three internal missed cleavages. The tolerance was 50 ppm for precursor ions and 0.8 Da for product ions. Deamidation of asparagine was set as a dynamic modification and carbamidomethylation of cysteine as a static modification. Peptides were identified with Xcorr scores meeting the following thresholds: ≥3.8 for 3+ and higher charge state ions, ≥2.2 for 2+ ions, and ≥1.9 for 1+ ions.
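The charge-dependent Xcorr thresholds and the precursor tolerance described above can be expressed as a short Python sketch. The function names are hypothetical, and the code only restates the stated cutoffs; it is not part of Proteome Discoverer.

```python
def xcorr_threshold(charge):
    """Minimum Xcorr for a precursor charge state, as listed in the protocol."""
    if charge >= 3:
        return 3.8
    if charge == 2:
        return 2.2
    if charge == 1:
        return 1.9
    raise ValueError("charge must be a positive integer")


def passes_xcorr(charge, xcorr):
    """True if the Xcorr score meets the threshold for this charge state."""
    return xcorr >= xcorr_threshold(charge)


def within_ppm(theoretical_mass, observed_mass, tolerance_ppm=50.0):
    """True if the observed precursor mass lies within the ppm tolerance."""
    error_ppm = abs(observed_mass - theoretical_mass) / theoretical_mass * 1e6
    return error_ppm <= tolerance_ppm
```

For example, a 2+ ion with Xcorr 2.2 passes, while a 3+ ion with Xcorr 3.7 does not.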

18. Chromosome 18

SRM data processing
SRM spectra are processed manually using MassHunter Data Analysis (Agilent) to annotate the peak groups and assign them to peptides. The data processing protocol is available on request. Raw data are also exported and analyzed using a modified version of mQuest/mProphet (to avoid the use of decoy transitions) and by the geometrical analyzer in the proprietary SRM2Prot software. Annotated peak groups are uploaded to Panorama for browsing and to PASSEL for quality assignment and long-term storage.
We provide the following quality ranking of SRM data:
  • «green» data are of highest quality: protein detected using 2 peptides with protein copies range variance ≤ 1 order of magnitude
  • «yellow» for protein detected using 2 peptides with protein copies range variance > 1 order of magnitude
  • «red» for protein detected using 1 peptide
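The ranking above can be sketched as a small Python function. This reflects one reading of the rule: "range variance" is taken as the ratio between the highest and lowest per-peptide copy number estimates, and proteins with more than two peptides are treated like the two-peptide case. Both points, and the function name, are assumptions.

```python
def srm_quality(copy_estimates):
    """Assign a quality color from per-peptide protein copy number estimates.

    copy_estimates: list of positive copy number estimates, one per detected peptide.
    """
    if not copy_estimates:
        raise ValueError("at least one peptide estimate is required")
    if len(copy_estimates) == 1:
        return "red"  # protein detected using a single peptide
    # Within one order of magnitude: highest estimate at most 10x the lowest.
    if max(copy_estimates) <= 10 * min(copy_estimates):
        return "green"
    return "yellow"
```

For example, `srm_quality([1e4, 5e4])` gives "green", while `srm_quality([1e3, 5e4])` gives "yellow".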

Gene-Centric Knowledgebase (kb18.ru) – information collected from C-HPP recommended resources is processed with respect to chromosome 18 genes and presented as a color-coded Web matrix.
SRM Registry (pikb18.ru) – raw data on SRM measurements are stored and displayed on the Web as a temporary staging point before the spectra are transmitted to PASSEL. It currently collates 1568 spectra of endogenous and synthetic peptides.

19. Chromosome 19

20. Chromosome 20

21. Chromosome 21

22. Chromosome 22

23. Chromosome X

24. Chromosome Y

25. Mitochondrial Chromosome