ASBMB recommends clearer guidance on metabolomics data sharing
The American Society for Biochemistry and Molecular Biology sent a letter to the National Cancer Institute on Dec. 30 regarding how to support privacy, reproducibility and harmonization of metabolomics data in alignment with the new National Institutes of Health Data Management and Sharing Policy.
When the NCI published a request for information titled “Soliciting Input on the Use and Reuse of Cancer Metabolomics Data” in October, the ASBMB was eager to ask its members about their experiences and to share their concerns about the NIH’s data-management and -sharing policy.
Briefly, in its letter, the ASBMB told the NCI that (1) -omics research produces large, complex data sets that threaten to burden many scientists under the NIH policy; (2) the diversity of metabolomics research necessitates both standardization and flexibility to be maximally effective; and (3) high variability in sample preparation, data collection, software, metabolite nomenclature and more makes the reuse and integration of metabolomics data extremely difficult.
Compliance must not burden investigators
The NIH data-management and -sharing policy, effective Jan. 25, aims to enable validation, promote data reuse and provide public access to NIH-funded research.
Rick Page of Miami University, chair of the ASBMB Public Affairs Advisory Committee, said that this effort is “noble and laudable but has the potential to require onerous data annotation efforts in order to yield useful publicly shared data.”
His concern arises from the NIH policy’s broad definition of scientific data, which must be of “sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications.”
For scientists who perform metabolomics and -omics research, this could be a tall order.
The society recommended that the NIH issue more guidance on what level of data and information is required for compliance. It also stressed that these clarifications must be “sufficiently flexible” to accommodate the diverse methods used in metabolomics research and their technical limitations.
Standardize, but stay flexible too
For data management and sharing to be maximally effective, there must be some standardization of data formats, nomenclature and metadata. For metabolomics data sets, standardization will be a challenge.
Metabolomics research most commonly is conducted using either mass spectrometry or nuclear magnetic resonance. Both techniques are highly sensitive to experimental parameters, so variations in media, sample preparation, instrumentation and more can significantly affect the results of an experiment. Without sufficient metadata telling other scientists how the data were collected and analyzed, the data sets are of little use.
To ensure data have proper metadata without undue burden, the society recommended that repositories require “a reasonable degree of metadata that is standardized in format and interoperable with international standards.”
Metabolomics involves collecting a snapshot of millions of molecules varying in chemical structures, chimeric states and chemical modifications. Distinguishing one unique molecule from the others is no small task, let alone naming it.
Many different formats and systems exist for naming and identifying molecules, including InChIKey, SMILES, PubChem, ChemSpider, ChEBI and several others, but a single molecule can still carry multiple names, which complicates the deposition and retrieval of metabolomics data.
The ASBMB said that standardization of nomenclature across scientific fields would be beneficial, but interoperability of naming formats should be prioritized.
Additionally, metabolomics is a rapidly evolving field, and standards risk becoming outdated quickly. The society called for the NCI and the NIH to structure data repositories to “accommodate new technologies and incorporate new functionalities with ease.”
Data diversity and complexity hinder metabolomics reuse
Andrew Lane, a professor at the University of Kentucky, agreed with the goal of the NIH’s data-sharing policy, stating that data in metabolomics should be “easily retrievable and understandable to nonexperts.” But he had some concerns about how to successfully translate metabolomics data into biological significance.
Pathway analysis and enrichment software can be “unnecessarily reductive in its assumptions,” Lane said. This type of software is designed to analyze complex data sets and output the metabolic pathways that may be upregulated or downregulated. However, the results may be misleading. To help ensure metabolomics data are shared and reused responsibly, the society recommended that these software packages clearly communicate to users that their outputs require additional validation.
The NCI requested feedback on researchers’ experiences integrating metabolomics data into multiomics studies, to which Lane said: “It is critical that the data are carefully managed and highly interoperable between multiple -omics data streams to ensure the output isn’t misleading or overly reductive.”
He clarified that to do this for each tissue, cancer type and specialized metabolism within an organism, the NIH must be prepared for “horrendous complexity.”
To increase reuse of metabolomics data by nonexperts, the society recommended that repositories be required to provide thorough instructions on how to properly retrieve, process and analyze metabolomics data sets to ensure they are utilized correctly.
Lane explained that the complexity of metabolomics makes standardization and centralized deposition a challenge.
“Developing a system that is effective for everyone is actually very difficult,” Lane said. He applauded the efforts of researchers at the University of California, San Diego, in developing a workable databank system for metabolomics, the Metabolomics Workbench, but noted that some issues related to depositing tracer data remain.
Let’s stay in touch
The society credited the NCI for soliciting input from the scientific community on cancer metabolomics data-management and -sharing but encouraged continued engagement.
“Decisions on these policies must consider both the utility of deposited data and the financial and time costs associated with meeting the final requirements” and should not be rushed, the society wrote.
The society asked the NCI to convene a summit for direct and candid discussions with investigators, journals and industry on setting standards and integrating those standards into research workflows.
Such a summit would help ensure that policymakers strike a balance between delivering on the goals of the new NIH data-management and -sharing policy and implementing it in a way that is amenable to current scientific infrastructure.
The ASBMB and its members also hope to gain clarity on how federal science agencies and research institutes plan to support the infrastructure necessary for effective data-management and -sharing, such as funding for repositories. This type of support is critical for public access to scientific data but has yet to be defined clearly by policymakers.