The processed forms of proteins inside cells constitute the mature proteome. Verifying the mature proteome involves experimentally determining the N-terminal residues of each mature protein.


The Verified Set is a collection of proteins, curated from the biomedical literature, whose mature N-terminal residues have been sequenced by Edman degradation. This Verified Set subtopic geneset contains all the proteins in the Verified Set. A table of the Verified Set can also be downloaded from the Download Utilities page, or from the Verified Set pop-up window on the GenePage of any gene whose mature protein start has been verified. C-terminal residue sequencing is also possible but is rarely done.

Although Fes is reported to have been N-terminally sequenced, its start has since been extended by 26 aa, and Fes has been removed from the Verified Set because the previously identified start site now seems unlikely (Pettis, 1988). A peptide encoded in the 26-aa N-terminal extension was identified in a shotgun MS experiment, supporting the upstream start site for Fes (Krug, 2013). Although neither of the proposed Fes start sites has a good RBS, the downstream ATG is not conserved, whereas the upstream start site and its amino acid sequence are conserved.

There is no subtopic for verification of type II (lipoprotein signal peptidase) cleavage sites by Edman degradation, because modification of the N-terminal cysteine residue blocks protein sequencing. Most lipoproteins are instead confirmed by the globomycin sensitivity of the cleavage reaction and by 14C labeling of the covalently attached acyl chains; there is usually only one cysteine residue compatible with lipoprotein maturation. Nonetheless, the Verified Set contains two lipoproteins. The translation start site of apo-Lpp was determined by Edman degradation (Inouye, 1977). In an lnt mutant, the N-terminal Cys residue lacked the N-linked acyl chain, which allowed N-terminal protein sequencing verifying the type II cleavage site (Narita, 2011).

Although the N-terminal sequence of FucK was reported to be MLXG, consistent with the annotated start MLSGYIAGAIMK (Tonella, 1998), FucK has been removed from the Verified Set and its start site has been reset to M11 (M11-K12), removing the first 10 amino acids. This change rests on three strong lines of bioinformatic evidence: (1) all protein sequence alignments with FucK orthologs show that the MK- start is conserved but the upstream MLSG- start is not; (2) the conserved start has a very likely RBS (AGGAGCGATT-ATG-AAA); and (3) the reset start eliminates a 12-base overlap with REP204, which was a unique and unlikely arrangement. The source of this likely error is unknown, but the 2D gel spot identified as FucK may need further verification, for example additional N-terminal sequence data unique to FucK, to resolve the discrepancy.
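The FucK discrepancy comes down to comparing an Edman read with candidate start sites. A minimal Python sketch of that comparison follows, assuming that a read is a string of one-letter residue codes in which X marks a cycle where no residue was identified, and that a read supports a start only if every identified residue matches. The function names are hypothetical, only the 12-residue prefix MLSGYIAGAIMK quoted above is used, and initiator-methionine removal is not modeled.

```python
def matches_edman(edman: str, protein: str, start: int = 0) -> bool:
    """Return True if an Edman read agrees with `protein` beginning at 0-based `start`.

    'X' marks a cycle in which no residue was identified and matches anything.
    Initiator-methionine removal is not modeled (a simplifying assumption).
    """
    window = protein[start:start + len(edman)]
    # Every identified cycle must agree with the residue at that position.
    agrees = all(e == "X" or e == p for e, p in zip(edman, window))
    # A read that runs past the available sequence cannot be fully confirmed.
    return agrees and len(window) == len(edman)


def consistent_starts(edman: str, protein: str) -> list[int]:
    """List the 1-based residue positions at which the read could begin."""
    return [i + 1 for i in range(len(protein)) if matches_edman(edman, protein, i)]


annotated_prefix = "MLSGYIAGAIMK"  # original annotated start, first 12 aa only
edman_read = "MLXG"                # reported N-terminal sequence (Tonella, 1998)

print(matches_edman(edman_read, annotated_prefix, 0))   # original M1 start: True
print(matches_edman(edman_read, annotated_prefix, 10))  # reset M11 start (index 10): False
print(consistent_starts(edman_read, annotated_prefix))  # [1]
```

Under these assumptions, MLXG fits only the original M1 start; the reset start begins MK, so the L expected in cycle 2 does not match. This is the conflict that further N-terminal sequence data would need to resolve.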