HPP Scientific Terms, Definitions & Abbreviations

Definitions of terms commonly used by HPP researchers, based on past literature or consensus reached by the HPP community (e.g. HPP publications, HPP guidelines, neXtProt, PeptideAtlas, Human Protein Atlas, GPMdb, ProteomeXchange), are available in PDF format here (updated: Aug 30, 2018).

Term Definition Additional Information, Web links and References
HPP The Human Proteome Project (HPP) is an international project organized by the Human Proteome Organization (HUPO) that aims to map, annotate, and functionally characterize the entire human proteome in a systematic way using mass spectrometry complemented by antibody and affinity-based techniques and many other protein methods. The HPP extends and is a direct counterpart to the Human Genome Project. HPP annotation of the human genome gene products adds significant value and insights about human biology. The HPP is composed of two complementary initiatives: The Chromosome-centric HPP (C-HPP) and Biology/Disease HPP (B/D-HPP). The former focuses on the completion of the “parts list” for proteins and their proteoforms whereas the latter aims to make proteomics an integral part of multi-omics research throughout the life sciences and biomedical research communities. Both initiatives are supported by 4 resource pillars: (i) mass spectrometry (MS), (ii) affinity reagents (Ab), (iii) knowledge base (Kb), and (iv) pathology. www.hupo.org

Note: Completion of the HPP will generate a protein-based map of the molecular architecture of human cells and the human body, enhance our understanding of human biology at the cellular level and lay a foundation for the development of novel diagnostic, prognostic, therapeutic, and preventive medical applications. The HPP is governed by the HPP Executive Committee.

• Legrain P, Aebersold R, Archakov A et al., The human proteome project: current state and future direction. Mol Cell Proteomics 2011 Jul;10(7):M111.009993. doi:10.1074/mcp.M111.009993.
• Omenn GS, Lane L, Overall CM, Corrales FJ, Schwenk JM, Paik YK, van Eyk, JE, Liu S, Snyder M, Baker MS, Deutsch EW. Progress on identifying and characterizing the Human Proteome: 2018 Metrics from the HUPO Human Proteome Project. J Proteome Res 2018, doi:10.1021/acs.jproteome.8b00441.
C-HPP The Chromosome-Centric HPP (C-HPP) is an international collaborative initiative of the HPP that aims to map, annotate, and characterize the human proteome on a chromosome-by-chromosome basis. The 25 international teams from 20 countries use various proteomics technologies to study how the proteome is encoded in Chr 1–22, X, Y, and mitochondrial DNA. Currently, the major foci of the C-HPP are to map all remaining missing proteins (PE2,3,4 proteins in neXtProt 2018-01-17 = 2,186) and to characterize the 1,260 PE1 proteins with no function annotated in neXtProt 2018-01-17 (uPE1). www.c-hpp.org

Note: The initial goal of the C-HPP is to identify, for each of the ca. 20,300 human protein-encoding genes, at least one representative protein together with three posttranslational modifications (PTMs) (phosphoryl-, glycosyl-, acetyl-) and an alternative splicing isoform, along with their tissue localization and quantitative studies using MS and/or antibody reagents.

• Paik YK, Jeong SK, Omenn GS et al., The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat Biotechnol 2012 Mar 7;30(3):221-3. doi:10.1038/nbt.2152.
• Paik YK, Omenn GS, Hancock WS et al., Advances in the Chromosome-Centric Human Proteome Project: looking to the future. Expert Rev Proteomics 2017 Dec;14(12):1059-1071. doi:10.1080/14789450.2017.1394189.
B/D-HPP The Biology/Disease HPP (B/D-HPP) is an international collaborative initiative of the HPP that focuses on mapping, annotating, and characterizing the proteome using proteomics technologies in relation to human biology and/or diseases. The B/D-HPP provides a framework for the coordination of 19 initiatives. A popular proteins strategy has been developed to stimulate the use of targeted proteomics by each of the B/D initiatives and throughout the life sciences and biomedical community. Across the C-HPP, the B/D-HPP, and the Resource Pillars, the HPP has 50 international teams. www.hupo.org/B/D-HPP

Note: The goals of B/D-HPP are to conduct experimental studies of specific organs and biofluids in health and disease and to assemble publicly accessible prioritized panels of proteins relevant to biological processes, organs (e.g., cardiovascular, cerebral, hepatic, renal, pulmonary, and intestinal systems) and organelles (e.g. mitochondria). More broadly, it aims to develop standardized methods for protein detection and quantification by proteomics to promote translation into clinical settings.

• Van Eyk JE, Corrales FJ, Aebersold R et al., Highlights of the Biology and Disease-driven Human Proteome Project, 2015-2016. J Proteome Res 2016 Nov 4;15(11):3979-3987. doi:10.1021/acs.jproteome.6b00444.
PE Protein existence (PE) levels indicate the degree of evidence for the existence of a human protein based on curated information. The levels PE1 to PE5 are assigned by UniProtKB/SwissProt and neXtProt as follows.
• PE1: evidence at the protein level (identified by mass spectrometry (MS) according to HPP guidelines, or curated from multiple other experimental protein methods).
• PE2: evidence at the transcript level (detection by RNAseq or presence of expressed sequence tag).
• PE3: inferred by gene homology (assigned membership of a defined protein family).
• PE4: predicted protein (not yet assigned membership of a defined protein family).
• PE5: uncertain or dubious sequences (such as erroneous translation products or pseudogenes). In 2013, the HPP excluded PE5 entries from the search for missing proteins.
Note: www.nextprot.org/about/protein-existence; www.uniprot.org/help/protein_existence; https://hupo.org/Guidelines.
The HPP publishes annual HPP Metrics for worldwide progress identifying and characterizing the Human Proteome, based on the PE levels, e.g., Lane et al and Omenn et al (see HPP box above).

• Lane L, Bairoch A, Beavis RC et al., Metrics for the Human Proteome Project 2013-2014 and strategies for finding missing proteins. J Proteome Res 2014 Jan 3;13(1):15-20. doi:10.1021/pr401144x.
• Deutsch EW, Overall CM, Van Eyk JE et al., Human Proteome Project Mass Spectrometry Data Interpretation Guidelines v2.1. J Proteome Res 2016 Nov 4;15(11):3961-3970. doi:10.1021/acs.jproteome.6b00392.
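The PE levels above can be read as a small lookup scheme. The following is a minimal sketch in Python, assuming each entry is represented only by an accession string and an integer PE level; the names `PELevel`, `is_missing_protein`, `summarize`, and the example accessions are hypothetical and are not part of neXtProt or UniProtKB. It treats PE2, PE3 and PE4 entries as missing proteins and keeps PE5 separate, in line with the 2013 exclusion noted above.

```python
from collections import Counter
from enum import IntEnum


class PELevel(IntEnum):
    """Protein existence levels as summarized in the PE entry above."""
    PROTEIN = 1      # PE1: evidence at the protein level
    TRANSCRIPT = 2   # PE2: evidence at the transcript level
    HOMOLOGY = 3     # PE3: inferred by gene homology
    PREDICTED = 4    # PE4: predicted protein
    UNCERTAIN = 5    # PE5: uncertain or dubious sequence


def is_missing_protein(level: PELevel) -> bool:
    """Missing proteins are the PE2, PE3 and PE4 entries; PE5 is excluded."""
    return level in (PELevel.TRANSCRIPT, PELevel.HOMOLOGY, PELevel.PREDICTED)


def summarize(entries: dict[str, int]) -> Counter:
    """Tally entries into PE1, missing (PE2-PE4) and PE5 bins."""
    tally = Counter()
    for _accession, pe in entries.items():
        level = PELevel(pe)  # raises ValueError for anything outside 1-5
        if level is PELevel.PROTEIN:
            tally["PE1"] += 1
        elif is_missing_protein(level):
            tally["missing"] += 1
        else:
            tally["PE5"] += 1
    return tally


# Hypothetical accessions and levels, for illustration only.
example = {"ENTRY_A": 1, "ENTRY_B": 2, "ENTRY_C": 4, "ENTRY_D": 5, "ENTRY_E": 3}
print(summarize(example))
```

In practice the PE level would come from a neXtProt or UniProtKB export rather than a hand-written dictionary; the tally logic stays the same.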
uPE1 Proteins Uncharacterized PE1 proteins (uPE1s) devoid of any functional annotation in neXtProt or annotated only with broad Gene Ontology Molecular Function/Biological Process terms not linked to any specific function. As of neXtProt release 2018-01-17, there were 1,260 uPE1 proteins. The current list of uPE1 proteins can be retrieved at: https://www.nextprot.org/proteins/search?mode=advanced&queryId=NXQ_00022
* Currently neXtProt excludes these 11 broad Gene Ontology (GO) function terms: GO:0005509, calcium ion binding; GO:0008270, zinc ion binding; GO:0005515, protein binding; GO:0042802, identical protein binding; GO:0051260, protein homooligomerization; GO:0005524, ATP binding; GO:0000287, magnesium ion binding; GO:0003676, nucleic acid binding; GO:0003824, catalytic activity; GO:0007165, signal transduction; GO:0035556, intracellular signal transduction. In the next neXtProt release this query will be refined by adding three terms: GO:0046914, transition metal ion binding; GO:0046872, metal ion binding; and GO:0035556, intracellular signal transduction.

• Paik YK, Omenn GS, Hancock WS et al., Advances in the Chromosome-Centric Human Proteome Project: looking to the future. Expert Rev Proteomics 2017 Dec;14(12):1059-1071. doi:10.1080/14789450.2017.1394189.
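The uPE1 definition above amounts to a set-difference test on an entry's GO annotations. The sketch below illustrates that reading in Python; it is a simplification and not the neXtProt query NXQ_00022 itself. The function name `is_upe1` and the flat "set of GO identifiers" input are assumptions made for illustration.

```python
# The 11 broad GO terms listed above as currently excluded by neXtProt.
BROAD_GO_TERMS = frozenset({
    "GO:0005509",  # calcium ion binding
    "GO:0008270",  # zinc ion binding
    "GO:0005515",  # protein binding
    "GO:0042802",  # identical protein binding
    "GO:0051260",  # protein homooligomerization
    "GO:0005524",  # ATP binding
    "GO:0000287",  # magnesium ion binding
    "GO:0003676",  # nucleic acid binding
    "GO:0003824",  # catalytic activity
    "GO:0007165",  # signal transduction
    "GO:0035556",  # intracellular signal transduction
})


def is_upe1(pe_level: int, go_terms: set[str]) -> bool:
    """Return True for a PE1 entry with no function annotation,
    or with only broad GO terms (hypothetical helper)."""
    if pe_level != 1:
        return False
    # Anything left after removing the broad terms counts as a specific function.
    specific_terms = set(go_terms) - BROAD_GO_TERMS
    return not specific_terms


# A PE1 entry annotated only with "protein binding" is still uncharacterized.
print(is_upe1(1, {"GO:0005515"}))
# Adding a specific term (here GO:0004672, protein kinase activity) removes it from uPE1.
print(is_upe1(1, {"GO:0005515", "GO:0004672"}))
```

Keeping the broad terms in a single constant makes it straightforward to extend the set when a later neXtProt release refines the query.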
Missing Proteins Missing proteins (MPs) are defined as those protein entries that belong to categories PE2,3,4 in neXtProt. They correspond to confidently predicted proteins that lack sufficient experimental data from mass spectrometry or other direct protein methods to qualify as PE1. The annual HPP Metrics papers in the special issues of the Journal of Proteome Research assess progress on identifying MPs and strategies needed to enrich and detect MPs. The current list of MPs, based on neXtProt curation, can be retrieved at: https://www.nextprot.org/proteins/search?mode=advanced&queryId=NXQ_00204.

• Paik YK, Jeong SK, Omenn GS et al., The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat Biotechnol 2012 Mar 7;30(3):221-3. doi:10.1038/nbt.2152.
• Lane L, Bairoch A, Beavis RC et al., Metrics for the Human Proteome Project 2013-2014 and strategies for finding missing proteins. J Proteome Res 2014;13(1):15-20. doi: 10.1021/pr401144x.
• Baker MS, et al., Accelerating the search for the missing proteins in the human proteome. Nat Commun 2017;8:14271. doi:10.1038/ncomms14271.
HPP Guidelines The HPP Mass Spectrometry Data Interpretation Guidelines version 2.1.0 (“the Guidelines”) provide a set of expectations for data interpretation of MS data that is contributed to the HPP. There are broadly two sections, one that applies to all datasets, including data deposition requirements and false discovery rate thresholds, and a second that provides enhanced expectations for evidence of detections of missing proteins or translation products not currently listed in neXtProt. https://hupo.org/HPP-Data-Interpretation-Guidelines

• Deutsch EW, Overall CM, Van Eyk JE, Baker MS, Paik YK, Weintraub ST, Lane L, Martens L, Vandenbrouck Y, Kusebauch U, Hancock WS, Hermjakob H, Aebersold R, Moritz RL, Omenn GS. Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1. J Proteome Res 2016 Nov 4;15(11):3961-3970. doi:10.1021/acs.jproteome.6b00392.
Dark Proteome The dark proteome is a colloquial term that includes missing proteins (PE2 – PE4), uncertain/dubious predicted proteins (PE5), uPE1 proteins, small proteins encoded by smORFs, and any proteins translated from long non-coding RNAs or uncharacterized transcripts, including those arising from non-coding regions of DNA and/or novel alternative splicing.
Proteoforms Alternative protein products from the same gene resulting from genomic sequence alterations, alternative splicing, RNA editing, post-translational modifications of amino acid side chains, and proteolytic processing events. http://repository.topdownproteomics.org/

• Smith LM, Kelleher NL, Consortium for Top Down Proteomics. Proteoform: a single term describing protein complexity. Nat Methods 2013 Mar;10(3):186-7. doi: 10.1038/nmeth.2369.
• LeDuc RD, Schwammle V, Shortreed MR, et al., ProForma: A Standard Proteoform Notation. J Proteome Res 2018 Mar 2;17(3):1321-1325. doi: 10.1021/acs.jproteome.7b00851.
• Aebersold, R, Agar, JN, et al. How Many Proteoforms are there? Nature Chemical Biology 2018;14:206-214. doi: 10.1038/nchembio.2576.
neXt-MP50 A specific two-year C-HPP initiative, announced in September 2016, that aims to accelerate the identification and validation of the existence of 50 currently missing proteins per chromosome team while incorporating progress from the entire international proteomics community.

• Paik YK, Omenn GS, Hancock WS et al., Advances in the Chromosome-Centric Human Proteome Project: looking to the future. Expert Rev Proteomics 2017 Dec;14(12):1059-1071. doi:10.1080/14789450.2017.1394189.
• Omenn GS, Lane L, Lundberg EK, Overall CM, Deutsch EW. Progress on the HUPO Draft Human Proteome: 2017 Metrics of the Human Proteome Project. J Proteome Res. 2017 Dec 1;16(12):4281-4287. doi: 10.1021/acs.jproteome.7b00375. Epub 2017 Oct 9.
neXt-CP50 A specific C-HPP initiative, announced in September 2017, that aims to characterize one or more cellular functions of 50 uPE1 proteins within 3 years by >14 C-HPP working groups.

• Paik YK, Overall CM, Deutsch EW et al., Progress and Future Direction of Chromosome-Centric Human Proteome Project. J Proteome Res 2017 Dec 1;16(12):4253-4258. doi:10.1021/acs.jproteome.7b00734.
Popular Proteins Popular Proteins is a B/D-HPP initiative to define the most-cited proteins in health and disease, as found in PubMed, and thereby stimulate wide use of targeted proteomics in the life sciences/biomedical community. http://tinyurl.com/proteinpurpose

Note: There are two algorithms that can be used to assist B/D-HPP initiatives and those with a particular interest in a disease, state, or organ. The aim is to develop mass spectrometry-based assays that allow easier and more accurate quantification, across all fields of science, of the proteins that are currently most studied. The next expansion will cover popular PTMs and the identification of the proteins and specific amino acid residues most cited in PubMed for particular PTMs.

• Lam MP, Venkatraman V, Xing Y et al., Data-Driven Approach To Determine Popular Proteins for Targeted Proteomics Translation of Six Organ Systems. J Proteome Res 2016 Nov 4;15(11):4126-4134. doi: 10.1021/acs.jproteome.6b00095.
• Yu KH, Lee TM, Wang CS et al., Systematic Protein Prioritization for Targeted Proteomics Studies through Literature Mining. J Proteome Res 2018 Apr 6;17(4):1383-1396. doi:10.1021/acs.jproteome.7b00772.

SOPs for missing proteins

SOPs describing the process for discovery of missing proteins and poorly identified proteins.

  • Human Proteome Project Data Interpretation Guidelines (Version 2.1.0 - July 28, 2016) are available in Word (docx) format here and in PDF format here. The related paper published in the Journal of Proteome Research is available here.
  • Human Proteome Project Data Interpretation Guidelines (Version 2.0.1 - December 7, 2015) are available here. These guidelines provide a checklist for LC-MS/MS (DDA, SRM, DIA) data interpretation and management, and guidelines for extraordinary detection claims (e.g., missing proteins, novel coding elements). The statement for establishing new HPP guidelines (Dec 7, 2015) is available here.

Data sharing


SOPs describing the process for data deposition in ProteomeXchange.

  1. Get informed on new datasets available at ProteomeXchange (Juan Antonio Vizcaíno on February 11, 2013).
  2. Document describing data submission procedure to ProteomeXchange (Juan Antonio Vizcaíno on March 15, 2013).
  3. A tutorial on submission of MS/MS datasets to ProteomeXchange via PRIDE is available in the document px_submission_tutorial2.pdf and at the ProteomeXchange site under the following link (version December 13, 2013). This document is an advanced tutorial and summarizes the steps that the Spanish C-HPP team has established to perform ProteomeXchange submissions. Some of the tools described in this tutorial are part of the Spanish C-HPP workflow, and the PRIDE team is not responsible for maintaining them. For more information, please email Juan Antonio Vizcaíno. Added on September 8, 2013 and modified on February 4, 2014.

Protein ID conversion to gene ID using Gene a la Cart

PowerPoint slides showing a protocol for converting a list of UniProt protein IDs to gene IDs.
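For readers who prefer to script this conversion once a mapping table has been exported, the following is a minimal sketch in Python. It does not call Gene a la Cart or any web service. It assumes a hypothetical two-column, tab-separated export (`accession`, `gene_symbol`) and uses gene symbols as the gene identifier. The function names and column names are illustrative assumptions.

```python
import csv
import io

# Hypothetical two-column mapping export (UniProt accession, gene symbol).
MAPPING_TSV = """accession\tgene_symbol
P04637\tTP53
P38398\tBRCA1
P00533\tEGFR
"""


def load_mapping(handle) -> dict[str, str]:
    """Read a tab-separated accession-to-gene table into a dictionary."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["accession"]: row["gene_symbol"] for row in reader}


def convert(accessions, mapping):
    """Return (converted, unmapped): gene symbols in input order,
    plus the input IDs that had no match in the mapping table."""
    converted, unmapped = [], []
    for acc in accessions:
        gene = mapping.get(acc.strip())
        if gene is None:
            unmapped.append(acc)
        else:
            converted.append(gene)
    return converted, unmapped


mapping = load_mapping(io.StringIO(MAPPING_TSV))
genes, missing = convert(["P04637", "P00533", "X00000"], mapping)
print(genes, missing)
```

Returning the unmapped IDs separately, rather than silently dropping them, makes it easier to spot obsolete or mistyped accessions before downstream analysis.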

HPP Chromosome Browsers

  1. The Proteome Browser at Monash University. The Proteome Browser
  2. Chromosome-Assembled human Proteome browsER at BPRC (Beijing). CAPER
  3. GenomewidePDB at Yonsei University. GenomewidePDB
  4. H-InvDB at Biomedicinal Information Research Center (Japan). H-InvDB

Chromosome-specific knowledgebases

  1. Chromosome 18 Knowledgebase at Institute of Biomedical Chemistry (Andrey Lisitsa, Russia). Chromosome 18 Knowledgebase
  2. All of the phosphorylation site information in GPMDB for human proteins, annotating ENSEMBL v. 70 protein sequences, is available at ftp://ftp.proteomecentral.org/modifications/phosphoryl/. The README.txt explains the file formats used. A fairly strict set of conditions was used to make the list, so it should be considered a "minimum" set of well-founded site assignments rather than an attempt to find as many sites as possible. Any comments on this resource should be addressed to Ron Beavis (University of British Columbia, Canada).

Nomenclature of biological objects

Nomenclature of human biological "objects" is available in Nomenclature_issues_HPP.docx.