ASBMB Today

Feature

Cataloging itty-bitty proteins in large numbers

By Laurel Oldach

Dec. 7, 2022

The gap-free human genome was at last officially completed this year after researchers pieced together the final highly repetitive regions. It includes about 19,400 protein-coding genes in its 3 billion base pairs — but more may be as yet undescribed.

Early bioinformatic annotation limited protein-coding gene status to open reading frames, or ORFs, of 100 amino acids or longer. The reasoning was that the transcriptome contains millions of potential ORFs, and many of the shortest likely appear by chance.
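The logic behind a length cutoff can be illustrated with a minimal sketch. It assumes a deliberately simplified model: forward strand only, ATG start codons only, and uniformly random sequence for the chance estimate. The function names `find_orfs` and `chance_orf_probability` are hypothetical and are not taken from any annotation pipeline mentioned here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end, n_codons) for ATG-initiated ORFs on the forward strand.

    n_codons counts the start codon and every codon up to, but not
    including, the stop codon. Only the first ATG in each open stretch
    is used as the start (a simplifying assumption).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return orfs


def chance_orf_probability(n_codons):
    """Probability that n_codons consecutive random codons contain no stop.

    Assumes all 64 codons are equally likely, so 61 of 64 are not stops.
    """
    return (61 / 64) ** n_codons


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 19 + "TAA" + "GG"  # invented toy sequence
    print(find_orfs(toy, min_codons=16))
    print(find_orfs(toy))  # default 100-codon cutoff
    for n in (16, 100):
        print(n, chance_orf_probability(n))
```

Under this toy model, a run of 100 codons avoids a stop codon by chance less than 1% of the time, while a run of 16 does so nearly half the time. That is the intuition for why short ORFs were treated with suspicion, although real genomes are not uniformly random.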

However, ribosome profiling, a technique for sequencing messenger RNA captured in the act of being translated, has over the years identified thousands of shorter translation products. Many have turned up in unexpected parts of the genome, in regions currently annotated as long noncoding RNAs or as untranslated regions of coding mRNAs.

To study the effects of a microprotein, researchers first have to know it exists. A new project aims to develop a catalog of small open reading frames that will be available through bioinformatics archives such as GenBank.

Research on a few of these translated sequences suggests that some of them play important regulatory roles. Nick Ingolia, co-developer of ribosome profiling techniques, said, “We now have several nice examples from a number of (research) groups” of translated products that currently are not annotated as protein-coding genes. “The whole field is trying to sort out order of magnitude; it could be five or 5,000. It’s probably somewhere in between.”

This year, a team of 35 investigators announced plans to survey the landscape of small translation products. Examining seven recent ribosome profiling studies, they found 7,264 small translation products ranging from 16 to 100 amino acids long. Roughly half of those appeared in multiple data sets. According to Ariel Bazzini, a researcher at the Stowers Institute for Medical Research who, like Ingolia, collaborated on this project, there could be many more: the study used conservative numbers and omitted many high-quality ribosome profiling data sets that could have been surveyed for small translation products.
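Counting how many ORFs recur across studies is a simple set operation. The sketch below assumes each study has been reduced to a set of comparable ORF identifiers; the helper name `replicated_orfs`, the study names and the identifiers are all invented for illustration and do not come from the consortium's data.

```python
from collections import Counter


def replicated_orfs(datasets, min_datasets=2):
    """Return the ORF identifiers found in at least min_datasets studies.

    datasets maps a study name to an iterable of ORF identifiers.
    Each study counts an ORF once, even if it lists it repeatedly.
    """
    counts = Counter(orf for orfs in datasets.values() for orf in set(orfs))
    return {orf for orf, n in counts.items() if n >= min_datasets}


if __name__ == "__main__":
    # Invented toy data, not real study results.
    toy_studies = {
        "study_A": {"orf1", "orf2", "orf3"},
        "study_B": {"orf2", "orf4"},
        "study_C": {"orf2", "orf3", "orf5"},
    }
    all_orfs = set().union(*toy_studies.values())
    shared = replicated_orfs(toy_studies)
    print(sorted(shared), len(shared) / len(all_orfs))
```

In practice the hard part is deciding when two studies describe the same ORF, since coordinates and naming differ between data sets. This sketch sidesteps that by assuming identifiers are already harmonized.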

Having assembled this data set, the team now hopes to start probing evolutionary conservation of these small ORFs and whether their associated proteins appear in cells. Validating these small translation products will not be without challenges. The smaller a protein is, the more difficult it is to detect using mass spectrometry and the harder it is to make alterations such as affinity tagging without dramatically altering the end product.
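One commonly cited reason small proteins are hard to see by mass spectrometry, offered here as background rather than as something the researchers stated, is that typical workflows digest proteins with trypsin and detect the resulting peptides. A short protein yields few peptides of usable length. The sketch below assumes the usual simplified trypsin rule (cut after K or R unless the next residue is P) and an arbitrary detectable window of 7 to 30 residues. The function name and the sequences are hypothetical.

```python
import re


def tryptic_peptides(protein, min_len=7, max_len=30):
    """Return in silico tryptic peptides whose length is within the window.

    Cleavage rule (simplified): cut after K or R, except before P.
    The 7-30 residue window is an illustrative assumption, not a standard.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())
    return [p for p in pieces if min_len <= len(p) <= max_len]


if __name__ == "__main__":
    # Invented toy sequences, not real proteins.
    small = "MAAAAAAK" + "GGGGGGGR" + "PLLLLLLK" + "SS"
    large = small * 10
    print(len(small), tryptic_peptides(small))
    print(len(large), len(tryptic_peptides(large)))
```

With fewer candidate peptides, a single missed or ambiguous peptide can be the difference between detecting a microprotein and missing it entirely.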

The consortium’s next planned step is to confirm that these ORFs really are translated, to understand why, and to determine whether their products are stable and functional in the cell. Bazzini said that some translation products from so-called untranslated regions of mRNAs act irrespective of their own sequence to regulate the abundance of the main coding protein in the transcript. Small ORFs could also be translated at random, or only in disease contexts such as cancer that disrupt many regulatory pathways.

Meanwhile, the short proteins that are already known can be hard to learn about, since their identification is often buried in supplementary data sets. Databases and organizations including Ensembl, the Human Genome Organization, the Human Proteome Organization, UniProt and the Human Protein Atlas are working to standardize nomenclature and annotations for these small proteins so knowledge of their existence can reach beyond functional geneticists.

Laurel Oldach

Laurel Oldach is a former science writer for the ASBMB.
