Over the past year, TRANCHE has struggled because it has lacked the funding to hire the software engineers needed to maintain it. Because of TRANCHE’s technical and financial problems, MCP had to put a moratorium on its requirement for depositing raw data there.
The lack of federal support for publicly accessible repositories for raw data has researchers vexed. TRANCHE isn’t the only example; Omenn, Baggerly and others also point to the Sequence Read Archive, a repository for next-generation sequencing data, which had its funding cut off by the National Center for Biotechnology Information at the National Institutes of Health last year because of budget constraints.
“Funding agencies wish to fund the initial discoveries,” says Evans. For research projects that aim to benefit patients, just producing those first discoveries doesn’t cut it. “You have to spend some time and money ensuring that validation can be done,” he explains. “It isn’t as sexy as funding discovery, but I think funding agencies do have a responsibility to encourage and enable validation. Otherwise, we’re never going to really know which of these discoveries will pan out.”
And unlike funding discovery-driven research, points out Aebersold, it’s not going to cost federal agencies millions of dollars to build and maintain repositories for raw data. Creating infrastructure for data deposition is “not cheap but it’s also not astronomical,” he says. “It’s certainly a serious effort, but it’s not something that would bankrupt the NIH.”
The Human Genome Project is a prime example of an effort that benefited from public access to data. The organizers of the federally funded project “demanded that data be uploaded, even at a time when the data were riddled with errors,” says Omenn. “It helped clean up the data, because people weren’t hiding it in their own computers!” Because other researchers were able to examine, test and validate the data, genomics has been able to move on to whole-genome sequencing, genomewide association studies and other endeavors.
When asked to respond to these views of academic researchers, Lawrence Tabak, a co-chair of the NIH Data and Informatics Task Force and the Advisory Committee to the Director, NIH Data and Informatics Working Group, provided a statement. “Data sharing is critically important to the advancement of biomedical research, and NIH is committed to supporting the collection, storage and sharing of biomedical research data. The astonishing increase in the amount of data being generated through NIH-funded research is an indicator of the extraordinary productivity of the research enterprise,” he said. “Yet with this astonishing increase, the agency is facing significant data management challenges. Given how extremely beneficial the availability of large datasets is to advancing medical discoveries, ensuring its continued availability is a high priority for NIH.”
Tabak, who is also the NIH principal deputy director, went on to say that the NIH director has formed an internal working group as well as a working group to the Advisory Committee to the Director to help inform NIH policy on data management. The committee is expected to make its recommendations in June of this year.
But Bradshaw cautions that having access to the raw data won’t be the entire solution to validation. Raw data access is “not a panacea, but it will make it easier to go in and look at what different people collected under different conditions,” he says.