February 2013

On deck chairs and lifeboats

Frequently, when relatively modest actions are proposed in the face of serious adversity, skeptics compare these adjustments to “rearranging the deck chairs on the Titanic.” The implication is that the actions are pointless in the face of the larger challenge. Despite the saying, we know of no actual rearranging of deck chairs on the Titanic; however, we do know that there was a lifeboat policy (women and children first) that had tremendous impact on who survived. Even seemingly modest policies can influence events greatly.

The rearranging the deck chairs analogy has been raised repeatedly regarding the NIH policy of not allowing second (A2) amendments (i.e., second resubmissions) to grant applications, hereafter referred to as the No A2 policy. This policy grew out of the National Institutes of Health’s Enhancing Peer Review initiative (1). The driving issue was that the percentage of funded R01 applications awarded upon initial submission (A0) had dropped from about 60 percent during the NIH budget-doubling period (1999 – 2003) to less than 30 percent in fiscal 2007 (2). Concomitantly, the percentage of grants that were funded in response to A2 applications increased from 10 percent during the doubling to more than 30 percent. The conclusion (supported by many specific anecdotes) was that study sections were queuing applications, providing outstanding scores to A2 applications while downgrading A0 applications, because the latter would have additional opportunities for funding. A consequence of this behavior was that outstanding research projects were being put into a holding pattern while they waited their turn to be given top scores.

The recommendation in the initial Enhancing Peer Review report (3) was that the NIH should “consider all applications as being new.” The goal of this recommendation was to allow study sections to focus on the merits of a proposal without consideration of whether it would have additional chances for submission. Many in the scientific community reacted negatively to this recommendation (4), in part because it included the provision that reviewers would not receive access to comments about earlier submissions for the same or similar projects.

In response to this feedback, NIH leaders elected not to implement this recommendation. Instead, they decided to address the concern that outstanding projects were taking too long to be funded by reducing the number of allowable amendments from two to one (5). While shortening the time to funding for some outstanding applications, the No A2 policy also has the potential to eliminate applications (and applicants) from consideration if they are not successful after the first and final amendment.

This policy change also was met with considerable resistance from the scientific community. A petition containing more than 2,000 signatures (6) was submitted to the NIH expressing concern about the policy's potential impact on investigators whose applications score very well but not quite well enough to be funded at the A1 stage. Key to the petition's argument is how finely the NIH peer-review system can discriminate among applications; the petition claims that peer review cannot distinguish between a fifth percentile application and a 20th percentile application.

Figure 1. Research productivity as a function of percentile score for more than 400 competing renewal R01 NIGMS grants funded in FY06.

Based on an analysis I initiated while I was at the NIH (7), it is possible to evaluate this assertion quantitatively. Using about 400 competing renewal (type 2) R01 grants funded in fiscal 2006, I examined subsequent productivity as a function of the percentile scores given to the applications. To quantify productivity, I used the number of citations from 2007 to 2010 for research papers (as opposed to review papers), corrected for the time dependence of citations.

The data based on individual grants show considerable scatter with a correlation coefficient of r = −0.09, indicating a modest decrease in the productivity metric as the percentile score increases. The scatter may be influenced by many factors, including the limitations of this metric for determining true scientific merit, different publication and citation patterns for different research fields and, of course, limitations of the peer-review system in predicting subsequent productivity. Some of the scatter can be reduced through the use of running averages; i.e., by averaging productivity over grants with similar percentile scores. Using a running average over 10 grants reduces the scatter and produces a correlation coefficient of r = −0.30. Using a running average over 100 grants produces a nearly straight line with a correlation coefficient of r = −0.91. The reduction in the productivity metric from the best-scored grants to the worst-scored grants (in this group, in the 15th to 20th percentile) is about 20 percent.
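The smoothing step described above can be sketched in a few lines. The example below uses synthetic data (a weak linear trend buried in large scatter), not the actual NIGMS grant data, and a larger sample than the real one for statistical stability; the point is only to show how running averages pull a trend out of noisy individual observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the funded grants: percentile scores in the funded
# range with a weak negative productivity trend plus large scatter.
# (Illustrative only -- not the actual NIGMS data; n is larger than the
# real ~400-grant sample so the correlations are stable.)
n = 4000
percentile = np.sort(rng.uniform(1, 20, n))
productivity = 100.0 - percentile + rng.normal(0, 40, n)

def running_average_r(x, y, window):
    """Correlation of y with x after averaging y over `window` consecutive
    grants (x is assumed sorted by percentile score)."""
    kernel = np.ones(window) / window
    y_smooth = np.convolve(y, kernel, mode="valid")
    start = (window - 1) // 2                      # center-align each window
    x_mid = x[start:start + len(y_smooth)]
    return np.corrcoef(x_mid, y_smooth)[0, 1]

r1, r10, r100 = (running_average_r(percentile, productivity, w)
                 for w in (1, 10, 100))
print(f"raw r = {r1:+.2f}, 10-grant r = {r10:+.2f}, 100-grant r = {r100:+.2f}")
```

Averaging over a window suppresses grant-to-grant noise by roughly the square root of the window size, so the same weak trend that is nearly invisible in individual grants produces a strong correlation once 100-grant averages are taken.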

Is there any model for the uncertainty in scoring that accounts for the scatter observed in these data? Two options were considered. In the simpler model, the grants were ranked according to the productivity metric, and these rankings were converted to effective percentile scores within the population. A random adjustment was then made using a normal distribution with a given standard deviation, and the correlations between the rankings determined by these adjusted percentiles and the rankings determined by peer review were calculated. Calculations with standard deviations up to 30 percentile points revealed that this model underestimated the level of scatter in the data, indicating that other sources of scatter are important. The second model adds an additional source of productivity variation, dividing the applications into two classes, a more productive half and a less productive half, with a constant percentile adjustment to account for the difference in expected productivity between the classes. Simulations showed that this model could account for the observed scatter with a standard deviation of 10 percentile points (as shown in Figure 2).

Figure 2. A model with two classes and a scoring uncertainty associated with a standard deviation of 10 percentile points accounts for the observed scatter in productivity.

Thus, the latter model suggests that the uncertainty in scoring a competing renewal application corresponds to a standard deviation of about 10 percentile points. Similar analyses indicate that scores for new (type 1) applications are substantially more uncertain.
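One way to build intuition for a scoring standard deviation of 10 percentile points is to ask how often review would mis-order two applications of known relative merit. The sketch below assumes each score is the true percentile plus independent normal noise; the 15-point gap connects to the petition's fifth-versus-20th-percentile example.

```python
import numpy as np

rng = np.random.default_rng(1)

def misorder_prob(gap, sd=10.0, trials=200_000):
    """Probability that the truly better of two applications separated by
    `gap` percentile points receives the worse score, assuming each score
    is the true percentile plus independent N(0, sd) noise."""
    better = rng.normal(0.0, sd, trials)         # truly better application
    worse = gap + rng.normal(0.0, sd, trials)    # truly worse application
    return float(np.mean(better > worse))        # higher percentile = worse score

p5, p10, p15 = (misorder_prob(g) for g in (5, 10, 15))
print(f"gap  5 points: mis-ordered {p5:.0%} of the time")
print(f"gap 10 points: mis-ordered {p10:.0%} of the time")
print(f"gap 15 points: mis-ordered {p15:.0%} of the time")
```

Under these assumptions, a fifth percentile and a 20th percentile application would be mis-ordered roughly one time in seven: review is noisy within the funded range, but far from uninformative.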

With these estimates available, we can now model the potential impact of the No A2 policy in quantitative terms. Suppose that the overall standard deviation in scoring applications is 10 percentile points and that applications are funded up to the 12th percentile. What percentage of the applications that are truly in the top 12 percent will receive scores worse than the 12th percentile? The results of simulations for standard deviations in scoring ranging from zero to 20 percentile points and funding cutoffs from 8 percent to 20 percent are shown in Figure 3.

These simulations reveal the answer to be 29 percent; that is, 29 percent of the applications that are actually in the top 12 percent of all applications would not be scored well enough to be funded. If we further assume the same scoring behavior applies to the unfunded applications resubmitted as A1 applications, this still leaves 8 percent of the top applications unfunded after two submissions. For a standard deviation of 15 percentile points and a funding cutoff of the 10th percentile — assumptions that are perhaps more likely, given the inclusion of new in addition to competing renewal applications and the current fiscal situation — the percentage of top applications still expected to be unfunded after two submissions is 14 percent.
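The thought experiment above can be sketched as a short Monte Carlo simulation. This is a simplified version under stated assumptions: each score is modeled as the true percentile plus independent normal noise, and an A1 resubmission as a second independent draw, so the results land close to, though not exactly on, the figures quoted above.

```python
import numpy as np

rng = np.random.default_rng(2)

def unfunded_top_fractions(sd=10.0, cutoff=12.0, n=1_000_000):
    """Of the applications truly better than `cutoff`, return the fraction
    scoring worse than `cutoff` on (a) a single submission and (b) both of
    two independent submissions (a simple stand-in for an A1 resubmission).
    Assumes score = true percentile + independent N(0, sd) noise."""
    true = rng.uniform(0.0, 100.0, n)
    top = true[true < cutoff]                     # truly fundable applications
    score_a0 = top + rng.normal(0.0, sd, top.size)
    score_a1 = top + rng.normal(0.0, sd, top.size)
    miss_a0 = float(np.mean(score_a0 > cutoff))
    miss_both = float(np.mean((score_a0 > cutoff) & (score_a1 > cutoff)))
    return miss_a0, miss_both

miss_a0, miss_both = unfunded_top_fractions(sd=10.0, cutoff=12.0)
print(f"sd 10, cutoff 12: {miss_a0:.0%} miss at A0, {miss_both:.0%} miss after A1")

_, miss_both_15 = unfunded_top_fractions(sd=15.0, cutoff=10.0)
print(f"sd 15, cutoff 10: {miss_both_15:.0%} of top applications miss after A1")
```

Even this crude model reproduces the central point: with realistic scoring noise, a meaningful fraction of genuinely top applications remains unfunded after the first and final amendment.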

Figure 3. The results of simulations showing (via a gray scale) the percentage of applications that are actually better than the funding cutoff but are anticipated to score worse than the funding cutoff. The contour line shows the points where 25 percent of the top applications are expected to score worse than the cutoff.

On her Rock Talk blog, NIH Deputy Director for Extramural Research Sally Rockey recently posted some data regarding the impact of the No A2 policy (8). The data demonstrate that the fraction of R01 applications funded as A0 applications has increased. More specifically, the pool of awards that previously would have gone to A2 applications is now about equally divided between A0 and A1 awards. While simple arithmetic dictates that the fraction of A0 and A1 grants had to increase (because funding at the A2 stage is no longer an option), the outcome could have been less favorable: all of the decrease in A2 awards could have shifted into A1 awards rather than being shared with A0 awards. Further, the blog post presents data indicating that the time to funding for new investigators (whose grants were funded) is the same as that for established investigators whose new (as opposed to competing renewal) grants were funded. Nonetheless, based on the comments on Rock Talk and other blogs (9, 10, 11), many members of the scientific community remain concerned about the implications of the policy. Some have stated that going back to allowing A2 applications would increase the amount of meritorious science funded. However, the laws of arithmetic still apply: modifying the No A2 policy would not increase the total number of applications that could be funded.

Another compelling question is what happens to applicants whose applications are not funded at the A1 stage. There are two schools of thought, supported by anecdotes but (as yet) little data. The first school believes that it should be relatively easy for any competent investigator to recraft his or her application so it passes the NIH filter used to determine whether an application is sufficiently different to be counted as new. The NIH has provided some guidance about this filter (12) but, to my knowledge, has not reported the fraction of applications that have been returned because they are deemed too similar to a previous application. This school also believes that some fraction of the new applications being funded at the A0 stage are, in fact, appropriately recrafted proposals that were not funded previously. The second school believes many investigators whose applications are not funded after the A1 submission are dropping out of academic research because they are not able (or do not wish) to develop research projects sufficiently distinct to be considered new. I have no doubt that both events are occurring in specific cases, but data are needed. To that end, I have written to the NIH on behalf of the American Society for Biochemistry and Molecular Biology’s Public Affairs Advisory Committee encouraging the NIH to do such analyses and make the results available.

Remarkably, the journal Nature published an editorial on this subject entitled “An Unhealthy Obsession” (13). The editorial, which acknowledges that the concerns about the impact of the No A2 policy are well-founded, suggests that the U.S. biomedical research community is unwise to continue to push for further reconsideration of the policy when the real issue relates to the historically low pay lines. The editorial is correct that the community must advocate as effectively as possible for growth in the NIH budget. We have been reaching out to ASBMB members to encourage them to contact their members of Congress and have provided (via an email to members) a useful tool that makes it very convenient for them to do so.

Thanks to the nearly 1,400 members who have participated, this effort has resulted in more than 4,400 letters to 349 members of the U.S. House and Senate. We need even greater participation in the future, and there simply is no good reason for any appropriate ASBMB member not to join the effort. With that said, this is not an excuse for not challenging the NIH to develop policies that best support the biomedical research enterprise.

To return to the Titanic analogy, we are certainly in waters full of icebergs in the current climate, and we need to do everything we can to help the captain and crew steer clear of them. That does not mean, however, that we should neglect to urge careful examination of the policies that determine access to the limited number of seats in the lifeboats. The long-term health of the biomedical research enterprise depends on it.

1. http://enhancing-peer-review.nih.gov/
2. http://nexus.od.nih.gov/all/2012/11/28/the-a2-resubmission-policy-continues-a-closer-look-at-recent-data/
3. http://enhancing-peer-review.nih.gov/meetings/NIHPeerReviewReportFINALDRAFT.pdf
4. http://www.faseb.org/portals/0/pdfs/opa/2008/NIHPeerReviewSelfStudy.pdf
5. http://grants.nih.gov/grants/guide/notice-files/NOT-OD-09-003.html
6. http://www.asbmb.org/asbmbtoday/asbmbtoday_article.aspx?id=12158
7. https://loop.nigms.nih.gov/index.php/2011/06/10/productivity-metrics-and-peer-review-scores-continued/
8. http://nexus.od.nih.gov/all/2012/11/28/the-a2-resubmission-policy-continues-a-closer-look-at-recent-data/
9. http://www.nature.com/news/2011/110329/full/471558a.html
10. http://scientopia.org/blogs/drugmonkey/2012/10/16/return-of-the-a2-revision-for-nih-grants/
11. http://writedit.wordpress.com/2012/11/28/a2-nevermore/
12. http://public.csr.nih.gov/aboutcsr/NewsAndPublications/PeerReviewNotes/Documents/PRNMay20125302012.pdf
13. http://www.nature.com/news/an-unhealthy-obsession-1.11953

Jeremy Berg (jberg@pitt.edu) is the associate senior vice-chancellor for science strategy and planning in the health sciences and a professor in the computational and systems biology department at the University of Pittsburgh.
