With the massive quantities of -omics data being produced today, how should they be validated?
Genomics, transcriptomics, proteomics -- the list of fields with “-omics” as the suffix has ballooned, and so have the excitement and anticipation about what these fields can deliver. When so many biomolecules are tracked at once, scientists can get more detailed and complete pictures of the complex connections between different molecular pathways, cellular and tissue conditions, and pathologies. With these more detailed pictures, researchers can deepen our understanding of biology and even develop novel clinical diagnostic tests or therapeutic treatments to improve public health.
But in the excitement over the promise of -omics technologies, “the issue of validation, an important one, has been a bit neglected,” says James P. Evans at the University of North Carolina at Chapel Hill. He and other researchers, whose expertise ranges from fundamental research to clinical epidemiology, are worried that if data are not properly validated, discoveries from -omics endeavors will be pointless.
The notion of validation is nothing new. “The process of replication is a hallmark of science,” says John Ioannidis of Stanford University. Scientists “don’t just blindly trust results, because trust belongs to dogma.”
But experts say that validation of -omics data is a different beast. “For -omics research, the complexity is so immense that we cannot really afford to just go for discovery without validation,” says Ioannidis. “Validation should be built into the process of discovery.”
Hypothesis-driven research -- when one or two variables are tested against one or two others -- tends to produce a few results, which are relatively easy to validate with simple statistical tests. But -omics data sets contain measurements on thousands, even millions, of molecules. Because of the sheer quantity of data, Keith Baggerly at the University of Texas MD Anderson Cancer Center says, “I no longer believe that we have good intuition about what makes sense.” Because intuition alone cannot grasp what large data sets are revealing, Baggerly says these data sets need to be independently verified and checked in multiple ways.
The need for validation is increasingly urgent, especially because a significant number of -omics studies are aimed at medical applications. “There is plenty of research that focuses on the initial discovery phase but not enough research on replication, validation and translation,” argues Muin Khoury at the Centers for Disease Control and Prevention, who with Ioannidis recently made some recommendations about the validation of -omics data for clinical studies (1).
The experts all brought up two cautionary tales of what can go wrong when -omics data are not scrutinized: Correlogic’s OvaCheck test of 2004 and Anil Potti and Joseph Nevins’ clinical trials at Duke University (see Rough patches article). The Institute of Medicine has reviewed how -omics data should be validated for clinical trials (see http://iom.edu/Activities/Research/OmicsBasedTests.aspx).
Much of the emphasis has been on validating -omics data relevant to clinical applications, because patient safety is of utmost importance. But Ruedi Aebersold at the Swiss Federal Institute of Technology in Zurich points out that validation also has significant repercussions in fundamental research. “True, patients aren’t hurt if someone misassigns a protein in a yeast project,” he says. “But it’s still an enormous waste of resources and effort. It’s generally bad for science if the data are poorly reproducible or misassigned.”