April 2012

Cover story: Valid -omics data

Who’s responsible?
Given the scale of -omics studies, responsibility for ensuring that data are valid falls on everyone, says Omenn. He doesn’t let anyone off the hook: Students, postdoctoral fellows, principal investigators, department heads, institutional review boards, journal editors and funding agencies all have to take their roles seriously to ensure that data are sound.

Discussions of responsibility, however, quickly expose points of contention. To validate data, researchers need access to data collected by others. So what kinds of data should researchers make available? Not every researcher likes the idea of releasing his or her data, notes Robert Chalkley at UCSF. It’s not just the prospect of scrutiny that alarms these researchers but the worry that someone else may find something novel in the data that they missed. That can easily happen with -omics research, because the data sets are so large.

And even if researchers accept the need to release their data, what exactly should they release? Not just raw data, argues Baggerly. He says researchers should also release the algorithms and code behind their bioinformatics tools, as well as the metadata: the information that records which samples belonged to which groups and how the researchers selected those samples. Baggerly explains that, with -omics information, “The data are subject to several different types of pre-processing … In many of these pre-processing steps, any one of several different algorithms could be employed. There is not yet a consensus as to which one is best.” Because there is no consensus, Baggerly argues, researchers have to state explicitly which algorithms they used.

Then comes the big question: Who should bear the responsibility for collecting, housing and providing access to all those data? In Baggerly’s view, journals should host the bioinformatics scripts that researchers used to process their data sets for a given publication, because such scripts take up little server space. But what about raw -omics data files, which can run to gigabytes or even terabytes?

Raw data access
Access to raw data is a thorny subject. One way to illustrate why is to look at proteomics. “Over the years, [raw] data have never left the laboratory in which they were collected,” explains Bradshaw. “It has been clearly the opinion of a lot of people in the proteomics field, and certainly the opinion of the editors of MCP, that these data need to be put somewhere where they can be interrogated by others.”

Websites like PRIDE collect processed proteomics data. But processed data, as Baggerly and Bradshaw are keen to emphasize, are not the same as the raw data spat out by analytical instruments.

So in 2010, MCP made it mandatory for its authors to deposit their raw data files in a repository designed specifically for the purpose. One example of a raw data repository is TRANCHE (https://proteomecommons.org/tranche/), operated by the laboratory of Philip C. Andrews at the University of Michigan.

“For some time, TRANCHE was basically the only show in town,” says Bradshaw. “The problem was that TRANCHE’s funding line eventually was dependent on a [federal] grant, which ultimately was not renewed.”

A genome, organized into chromosomes that are condensed into nucleosomes, is expressed by the action of enhanceosomes, transcriptosomes and spliceosomes as a transcriptome, and with the help of ribosomes the transcriptome is turned into a proteome. Chromosomes, consisting mostly of autosomes, but also X and Y chromosomes, are duplicated by a replisome. Members of the proteome are organized in higher order structures such as peroxisomes, lysosomes, endosomes, etc. Undesirable members of the proteome are attacked by proteasomes (degradomes). Syndromes arise when specific members of the proteome are absent or misbehave. A large number of metabolomes are responsible for providing the required energy and raw materials, while signalosomes or kinomes regulate the creation of order out of chaos. Under certain conditions, constituents of the signalosome activate the apoptosome to organize the return to chaos. Somewhere in all of this MITOCHONDRIA play an absolutely essential role.
Immo Scheffler



