What more powerful form of study
of mankind could there be than to
read our own instruction book?
– Francis S. Collins
White House press conference, June 26, 2000
Earlier this year, we celebrated the 10th anniversary of a historic moment for humankind: In February 2001, Nature and Science published papers on the first draft version of the human genome. Sequencing of the human genome was completed in the most efficient way available at that time both in terms of time and costs (1). This efficient approach is also characteristic of the other goals articulated in the Human Genome Project and would not have been possible without strong collaborations between different groups, institutes and international consortia (Table 1). According to the third and final five-year plan of the HGP, one-third of the human genome was to be sequenced by the end of 2001 and the entire genome by the end of 2003 (1). However, in June 2000, the International Human Genome Sequencing Consortium announced the completion of a rough-draft sequencing of the entire human genome – an astounding achievement.
Much progress has been made between the pre- and post-genomic eras. Here, I will attempt to touch on some of the most important milestones achieved so far. In order to appreciate fully the evolution of technology and our knowledge, let us compare where we stood before the launch of the HGP with where we stand now.
Advances in technology have made sequencing faster, cheaper, more accurate and easier. Sequencing capacity has increased more than 10¹²-fold (2) since the pre-genomic era, and the cost-effectiveness of sequencing has improved at least 15,000-fold (3). Developments in sequencing technologies have outstripped Moore’s Law and outpaced progress in computational performance. Progress in sequencing has even made it possible for new disciplines like metagenomics to be born.
Human Genome Sequencing Consortium
• The Whitehead Institute/MIT Center for Genome Research, Cambridge, Mass., U.S.
• The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, U.K.
• Washington University School of Medicine Genome Sequencing Center, St. Louis, Mo.
• United States DOE Joint Genome Institute, Walnut Creek, Calif.
• Baylor College of Medicine Human Genome Sequencing Center, Department of Molecular and Human Genetics, Houston, Tex.
• RIKEN Genomic Sciences Center, Yokohama, Japan
• Genoscope and CNRS UMR-8030, Evry, France
• GTC Sequencing Center, Genome Therapeutics Corporation, Waltham, Mass.
• Department of Genome Analysis, Institute of Molecular Biotechnology, Jena, Germany
• Beijing Genomics Institute/Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing, China
• Multimegabase Sequencing Center, The Institute for Systems Biology, Seattle, Wash.
• Stanford Genome Technology Center, Stanford, Calif.
• Stanford Human Genome Center and Department of Genetics, Stanford University School of Medicine, Stanford, Calif.
• University of Washington Genome Center, Seattle, Wash.
• Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
• University of Texas Southwestern Medical Center at Dallas, Dallas, Tex.*
• University of Oklahoma’s Advanced Center for Genome Technology, Dept. of Chemistry and Biochemistry, University of Oklahoma, Norman, Okla.
• Max Planck Institute for Molecular Genetics, Berlin, Germany
• Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, Cold Spring Harbor, N.Y.
• GBF German Research Centre for Biotechnology, Braunschweig, Germany
*Sequencing center is no longer in operation.
Reprinted from http://www.genome.gov/11006939 (Accessed October 2011)
Questions that cannot be answered by human research because of ethical or technological limitations can still be posed and addressed through the study of model organisms, so sequencing the genomes of model organisms has also been of the utmost importance. Understanding how a given species’ genes function is very helpful when attempting to predict how the genes of other species function. Indeed, as Jacques Monod said, “Once we understand the biology of Escherichia coli, we will also understand the biology of an elephant.” The first complete genome sequence of a free-living organism, Haemophilus influenzae (1.8 Mb), was finished in 1995 and marked a new era in the evolution of the biomedical field. Up until then, only a handful of viral and organellar genomes had been sequenced, including bacteriophage ΦX174 (5,386 bp), which was the first DNA-based genome to be sequenced, as well as bacteriophage λ (48,502 bp), cytomegalovirus (229 kb), vaccinia (192 kb), mitochondrion (187 kb), chloroplast (121 kb) and smallpox (186 kb).
At the turn of the millennium, before the sequencing of the human genome, the genomes of four eukaryotes (Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana) and a few dozen prokaryotes had been sequenced. The combined size of the sequenced genomes was less than 500 Mb. At that point, only about five years had passed since the first genome of a free-living organism was completed. Ten years later, we have sequenced more than 250 eukaryotic and 4,000 prokaryotic and viral genomes, with a total size greater than 130 Gb!
The successful sequencing of small genomes gave the HGP several advantages. For example, the improved sequencing techniques that resulted finally made the HGP feasible and brought it to completion ahead of the originally planned deadline. But more importantly, sequencing the genomes of various organisms has allowed us to address questions relevant for both biology and medicine. Although it is important to identify those genes that are conserved, much may also be gleaned by studying gene divergence between species. Comparative genomics, which compares the genomes of different species, whether evolutionarily distant or closely related like humans and Neanderthals, continues to provide helpful information about the structure, function and regulation of genes and how they relate to disease susceptibility and other issues.
When the HGP was first launched, humans were thought to have nearly 100,000 genes. By 2001, it was clear that the actual number was much lower, and it was estimated to be between 30,000 and 40,000 genes. We now know that the actual number is even lower: approximately 20 to 25 percent of the originally predicted number. This finding has sparked a renewed interest in the study of alternative splicing. We now know that, even though many eukaryotic genes operate according to the one gene, one protein scenario, 94 percent of human genes undergo alternative splicing, a very effective tool that allows human genes to produce at least three times as many proteins as there are genes.
We have discovered that less than 10 percent of the human genome encodes proteins and that what we have called “junk DNA” carries out important functions. We have gained an appreciation for the importance of noncoding RNAs, including piRNAs, microRNAs and lincRNAs. With regard to mutations, researchers have identified approximately 4,000 genes that cause genetic diseases. These diseases include not only single-gene Mendelian disorders but also complex diseases such as cancer.
Progress in genome sequencing has not benefitted only researchers; farmers and pet owners have profited as well. Besides making a difference in agriculture, sequencing the genomes of domestic animals enriches our knowledge of conserved evolutionary pathways and genetic mechanisms of disease in those animals and in humans. Also, sequencing the genomes of disease-causing organisms is very important for the medical and veterinary fields.
Many of the achievements of the last decade were inconceivable at its beginning. By the same token, many projects once considered overly ambitious now appear reasonable. For example, for years the thousand-dollar genome project sounded like science fiction; yet now the expectations are even higher, and we look to the day when we can sequence our own personal genomes at an even more affordable price.
There remains a huge amount of work ahead of us. Although it may have seemed that the final sequence of the human genome had been determined in 2006, it is not yet complete. Human DNA fragments are still being sequenced, resequenced and analyzed. The genomic databases will be updated with the revised sequences. And there are many projects to complete, including characterization and cataloging of all transcript variants and epigenomic modifications as well as all intermolecular interactions between DNA, RNA and proteins.
We have made great progress in understanding the molecular mechanisms of diseases and developing diagnostic tools and effective treatments. Thanks to advances in genomic sequencing, we are moving away from the chemotherapy era and toward personalized medicine. But we cannot rest on our laurels, because the more we learn in the post-genomic era, the more we realize how much more there is to know and explore in our instruction book.
- 1. The Human Genome Project Completion: Frequently Asked Questions. http://www.genome.gov/11006943 (Accessed October 2011).
- 2. Mardis, E.R. A decade’s perspective on DNA sequencing technology. (2011) Nature. 470, 198–203.
- 3. Lander, E.S. Initial impact of the sequencing of the human genome. (2011) Nature. 470, 187–197.
Roza Selimyan (firstname.lastname@example.org) is a research scientist at the National Institute on Aging.