Good reads and the power of data

Curling up with a good book is one of life’s great pleasures. Two books that I have greatly enjoyed over my time as president of the American Society for Biochemistry and Molecular Biology are Nate Silver’s “The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t” and Siddhartha Mukherjee’s “The Emperor of All Maladies: A Biography of Cancer.”
Although they are very different, these books share three major features. First, each takes a largely historical approach to analyze progress in its respective field. Second, each addresses the roles of careful data collection and analysis in allowing fields to move past strongly held but often incorrect beliefs. Finally, each emphasizes the importance of understanding mechanism to place empirical observations in a robust context that can be extended. These features, of course, are of central importance in biochemistry and molecular biology and also in science advocacy.

‘The Signal and the Noise’

Silver is best known for his blog FiveThirtyEight, where he successfully predicted the outcomes of recent presidential and senatorial elections by aggregating and analyzing polling data. In his book, Silver describes the history and bases of predictions in a range of areas, including not only politics but also finance, sports, gambling and earthquakes.
I found the section on weather prediction particularly illuminating. It includes a discussion of the discovery by Edward Lorenz at the Massachusetts Institute of Technology of so-called “chaotic” behavior in computer-based simulations of weather. Lorenz was dismayed when apparently identical runs of the simulations, using what appeared to be the same data, produced vastly different results. This happens because these (and many other) simulations are very sensitive to apparently trivial differences in the data on which they are based. Silver goes on to describe how computer simulations, in conjunction with judgments from human meteorologists, have steadily improved the quality of weather predictions since the 1970s.
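The sensitivity that Lorenz observed can be illustrated with a system far simpler than a weather model. The sketch below uses the logistic map, a standard toy example of chaotic behavior that is not taken from Silver’s book. The starting value, the size of the perturbation and the function name are all assumptions chosen for illustration.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), returning every value visited."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two runs whose starting values differ by one part in ten billion
# (both numbers are arbitrary illustrative choices).
run_a = logistic_trajectory(0.2, 60)
run_b = logistic_trajectory(0.2 + 1e-10, 60)

# Absolute gap between the two runs at each step.
gaps = [abs(a - b) for a, b in zip(run_a, run_b)]

for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

The gap starts out negligible and grows roughly exponentially until the two runs bear no resemblance to each other, which is the behavior that makes long-range forecasts so dependent on the precision of the input data.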
Silver’s conclusion that forecasts have improved depends, of course, on tracking the accuracy of predictions. One of the sections that I found most intriguing involves calibration of measures of prediction accuracy, which Silver titles “How to Know if Your Forecasts Are All Wet.” This section highlights the importance of access to many predictions and their subsequent outcomes, and of calibrating predictions to judge how well they do. The predictions of the likelihood of rain by the National Weather Service are remarkably well calibrated; when the NWS forecasts rain with a 20 percent probability, it really does rain approximately 20 percent of the time.
He also presents calibration curves for the Weather Channel and for local television forecasts. Both of these groups have access to the National Weather Service predictions, yet their calibration curves are much worse. This is particularly true for the local television forecasts: They substantially overpredict the probability of rain. This tendency gets at a key point. What is the best measure of the validity of a forecast?
Citing a study by Allan Murphy, Silver notes three possible measures:
  1. the “quality” or “accuracy” (How well does the forecast match the actual outcome?)
  2. the “consistency” or “honesty” (To what extent was the prediction as accurate as it could be?)
  3. the “economic value” (How useful was the prediction in making good policy decisions?)
In this light, it seems that some forecasters decrease accuracy and consistency to increase the economic value of their predictions. They get in less trouble with their audiences if they predict rain and it doesn’t occur (and the event is moved to an indoor venue) than if they don’t predict rain and all of the guests get soaked.
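The idea of calibration discussed above can be made concrete with a small sketch: group forecasts by their stated probability and compare each group with how often it actually rained. The two forecasters and all of their numbers below are invented for illustration and are not data from the National Weather Service, the Weather Channel or Silver’s book; the function names are likewise hypothetical.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Map each stated probability to (observed rain frequency, number of days)."""
    counts = defaultdict(lambda: [0, 0])  # stated prob -> [rainy days, total days]
    for prob, rained in zip(forecasts, outcomes):
        counts[prob][0] += int(rained)
        counts[prob][1] += 1
    return {p: (rainy / total, total) for p, (rainy, total) in sorted(counts.items())}


def make_days(stated, rainy, total):
    """Build `total` (stated probability, rained?) pairs, `rainy` of them wet."""
    return [(stated, i < rainy) for i in range(total)]


# Hypothetical forecaster whose stated probabilities match what happened.
calibrated = make_days(0.2, 4, 20) + make_days(0.5, 10, 20) + make_days(0.8, 16, 20)
# Hypothetical forecaster with a "wet bias": rain is less frequent than stated.
wet_biased = make_days(0.2, 1, 20) + make_days(0.5, 6, 20) + make_days(0.8, 13, 20)

for name, days in (("calibrated", calibrated), ("wet-biased", wet_biased)):
    forecasts, outcomes = zip(*days)
    for stated, (observed, n) in calibration_table(forecasts, outcomes).items():
        print(f"{name}: stated {stated:.0%}, observed {observed:.0%} over {n} days")
```

A well-calibrated forecaster’s stated and observed values agree, while an overpredicting forecaster’s observed frequencies fall consistently below the stated ones. A table like this is only meaningful when many forecasts and outcomes are available for each stated probability.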

‘The Emperor of All Maladies’

This delightful and thought-provoking book by Mukherjee tracks our understanding of cancer and the development of cancer treatments from ancient times through the present “genomic revolution.” Major steps along this path include the appreciation of the nature of cancer as a disease of poorly controlled cell growth, the development of surgical approaches for treatment (including highly intrusive, radical surgeries), the introduction and refinement of chemotherapies based on killing rapidly dividing cells, the elucidation of cancer as a genome-based disease of cell-growth control, and recent advances in the development of specifically targeted anticancer agents.
In this context, I will highlight the development of the radical mastectomy for breast cancer by surgeon William Halsted and the implications of studies of its effectiveness. Moving past surgical treatments that focused primarily on the identifiable tumor, Halsted developed more aggressive surgical approaches that removed considerable additional tissue based on the concept that removing all of the “roots” of a tumor would save more lives than more localized surgeries.
Halsted analyzed the outcomes of radical mastectomy in 1907. Mukherjee writes:
In the summer of 1907, Halsted presented more data to the American Surgical Association in Washington, D.C. He divided his patients into three groups based on whether the cancer had spread before surgery to lymph nodes in the axilla or the neck. When he put up his survival tables, the pattern became apparent. Of the sixty patients with no cancer-afflicted nodes in the axilla or the neck, the substantial number of forty-five had been cured of breast cancer at five years. Of the forty patients with such nodes, only three had survived.
The ultimate survival from breast cancer, in short, had little to do with how extensively a surgeon operated on the breast; it depended on how extensively the cancer had spread before surgery. As George Crile, one of the most fervent critics of radical surgery, later put it, “If the disease was so advanced that one had to get rid of the muscles in order to get rid of the tumor, then it had already spread through the system,” making the whole operation moot.
But if Halsted came to the brink of this realization in 1907, he just as emphatically shied away from it. He relapsed to stale aphorisms. “But even without the proof which we offer, it is, I think, incumbent upon the surgeon to perform in many cases the supraclavicular operation,” he advised in one paper. By now the perpetually changing landscape of breast cancer was beginning to tire him out. Trials, tables, and charts had never been his forte; he was a surgeon, not a bookkeeper.
This passage reveals several points. First, the collection and analysis of the long-term outcomes demonstrated a clear but surprising pattern. These observations had implications both for treatment (more and more radical surgery was not likely to lead to improvements) and for the understanding of cancer (it can be a systemic rather than a localized disease). Second, rather than embracing the insights from the analysis, a leading expert applied his tools to other fields; the data had provided the “wrong” answer.
We must all be mindful of our own prejudices and of our tendencies to see what we want to see in data, or to dismiss as flawed any data and analyses whose conclusions are inconsistent with our goals.
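The pattern in the figures quoted from Mukherjee becomes stark when the counts are converted to fractions. The short sketch below uses only the two groups and the counts given in the quoted passage; the group labels and the function name are assumptions made for illustration.

```python
# Counts as quoted in the passage: (patients doing well at five years, patients in group).
groups = {
    "no affected nodes": (45, 60),
    "affected nodes": (3, 40),
}


def five_year_fraction(survivors, patients):
    """Fraction of a group reported as cured or surviving at five years."""
    return survivors / patients


rates = {label: five_year_fraction(s, n) for label, (s, n) in groups.items()}

for label, rate in rates.items():
    print(f"{label}: {rate:.1%}")

# How many times more likely a good outcome was when the nodes were unaffected.
ratio = rates["no affected nodes"] / rates["affected nodes"]
print(f"ratio between groups: {ratio:.0f}x")
```

The two groups differ by about a factor of ten, a gap that depends on whether the cancer had already spread rather than on how extensive the surgery was.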

The importance of mechanism and rich data sources

Both books highlight the role of mechanistic understanding in driving progress.
Weather forecasting has improved steadily because the basic physical mechanisms governing the flow of air and heat, and related phenomena, are reasonably well understood. Models can therefore be based on these mechanisms, even though considerable simplifications and approximations are necessary to produce manageable models (even with the most powerful supercomputers). In contrast, Silver argues that earthquake prediction remains much more problematic because of limited knowledge of the mechanisms that promote earthquakes or fault stability. Furthermore, earthquakes are (fortunately) relatively rare events, in contrast with weather changes, so only limited data are available to test and calibrate predictions.
Mukherjee tracks the mechanistic understanding of cancer throughout his book, ending with the modern discoveries of cancers as diseases of the genome, in which changes in growth-promoting oncogenes and growth-controlling tumor suppressors lead to uncontrolled growth. This mechanistic understanding has transformed some aspects of cancer treatment and prevention, but much remains to be done.
These mechanistic insights come largely from studies of molecular biology and biochemistry. Progress in both basic science and its applications depends on pushing toward mechanistic rather than merely empirical understanding and on dispassionate and ruthless analysis of data.
As one might expect based on this discussion, I strongly believe that the same principles apply to policy and advocacy. For example, the ASBMB has helped frame discussions of the impact of the sequester through data collection, including surveys, and quantitative analysis of available data. These efforts should continue as we strive to help develop a more sustainable framework for our enterprise.
Jeremy Berg is the associate senior vice-chancellor for science strategy and planning in the health sciences and a professor in the computational and systems biology department at the University of Pittsburgh.