[MCP] Sequencing Imperfection

Reconstruction of the aBTLA heavy and light chains from protein-template databases with GenoMS.

Database search algorithms are the primary means of identifying peptides from mass spectrometry data. However, these methods can identify only spectra whose peptides are present in the database, so peptides from mutated or alternatively spliced sequences go unidentified. Antibodies, for example, confound standard identification techniques because they are products of somatic hypermutation and large-scale genome rearrangements. A variety of search methods have been developed to allow for sequence variations, but even those tools still require a homologous peptide as a template. Another approach is de novo identification of peptide sequences, which does not require a protein database but may have lower accuracy. In this study, the authors present a novel approach, called GenoMS, that draws on the strengths of both methods. Protein-sequence templates are first identified using a database search tool, and the templates are then used to recruit, align and sequence the regions of the target protein that are either missing or divergent from the database. The authors used the approach to reconstruct the full protein sequence for the antibody raised against the B- and T-cell lymphocyte attenuator molecule (aBTLA) using both protein and genomic templates; in each instance, the sequence was more than 97 percent accurate.

Template Proteogenomics: Sequencing Whole Proteins Using an Imperfect Database

Natalie E. Castellana, Victoria Pham, David Arnott, Jennie R. Lill and Vineet Bafna

Mol. Cell. Proteomics, published online Feb. 17, 2010