Data from the 1000 Genomes Project, the world’s largest resource on human genetic variation, are now available on the Amazon Web Services cloud. The collaboration between the National Institutes of Health and AWS to store the data will let any researcher obtain and study them at a fraction of what it would cost the researcher’s own institution to host the information.
“The explosion of biomedical data has already significantly advanced our understanding of health and disease. Now we want to find new and better ways to make the most of these data to speed discovery, innovation and improvements in the nation’s health and economy,” said NIH Director Francis S. Collins at an event at the White House to announce the collaboration.
The four-year-old 1000 Genomes Project now stands at 200 terabytes of data. That is equivalent to the contents of more than 30,000 DVDs. Because of the data set’s massive size, few researchers have sufficient computing power to mine it, so AWS is hosting the information as a free public data set. The data can be accessed through high-performance computing services such as Amazon Elastic Compute Cloud and Amazon Elastic MapReduce. Access to the data itself is free; researchers pay only for the additional AWS resources they use to further process or analyze the data.
“Improving access to data from this important project will accelerate the ability of researchers to understand human genetic variation and its contribution to health and disease,” said National Human Genome Research Institute Director Eric D. Green. NHGRI is a major funder of the 1000 Genomes Project, along with the Wellcome Trust and BGI-Shenzhen. Having the data on the cloud also means users can analyze them much more quickly. The data need not be downloaded first, and analyses can be run over many servers simultaneously.
THERE ARE MULTIPLE WAYS TO ACCESS THE 1000 GENOMES PROJECT DATA
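As one illustration, a researcher could browse the public data set over plain HTTPS before starting any paid compute. The sketch below is a minimal example under stated assumptions, not an official NIH or AWS recipe: it assumes the data set lives in a publicly listable Amazon S3 bucket named 1000genomes (the article does not name the bucket), and the helper names listing_url, parse_listing and fetch_listing are hypothetical. It uses the standard S3 object-listing request rather than Amazon Elastic Compute Cloud or Amazon Elastic MapReduce.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Assumption: the public data set is in an S3 bucket with this name.
BUCKET = "1000genomes"
# XML namespace used by S3 listing responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def listing_url(prefix="", max_keys=10, bucket=BUCKET):
    """Build an anonymous S3 list-objects (version 2) URL for the bucket."""
    query = urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_text):
    """Return (key, size_in_bytes) pairs from an S3 listing response."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


def fetch_listing(prefix="", max_keys=10):
    """Fetch and parse one page of the listing (needs network access)."""
    with urlopen(listing_url(prefix, max_keys)) as response:
        return parse_listing(response.read())
```

Calling fetch_listing() with no arguments would request the first ten object names and sizes, provided the bucket still permits anonymous listing. Only the listing is transferred, so the large sequence files themselves stay on the cloud, where the analysis can run.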
Rajendrani Mukhopadhyay (firstname.lastname@example.org) is the senior science writer for ASBMB Today and the technical editor for the Journal of Biological Chemistry.