Does the Journal System Distort Scientific Research?

A big theme of Stephen Buranyi's Guardian article "Is the staggeringly profitable business of scientific publishing bad for science?" is the enormous profits made by the scientific publishing industry, and whether the services publishers provide merit that kind of return. But I am more concerned about whether the journal system we know so well distorts scientific research. Stephen's article is long enough that even the small percentage of it devoted to the possible distortion of scientific research by the journal system adds up to quite a bit. Below are the four key passages, with my comments.

Spectacular—and Often False—Results vs. Often Boring True Results 

Stephen: Journals prize new and spectacular results – after all, they are in the business of selling subscriptions – and scientists, knowing exactly what kind of work gets published, align their submissions accordingly. This produces a steady stream of papers, the importance of which is immediately apparent. But it also means that scientists do not have an accurate map of their field of inquiry. Researchers may end up inadvertently exploring dead ends that their fellow scientists have already run up against, solely because the information about previous failures has never been given space in the pages of the relevant scientific publications. A 2013 study, for example, reported that half of all clinical trials in the US are never published in a journal.

My coauthor Dan Benjamin (see "My Experiences with Gary Becker" and these other posts: 1, 2, 3, 4, 5) is an advocate, along with others, for raising the standard for using the words "statistically significant" to a p-value of .005 or lower instead of .05 or lower. (Results with a higher p-value could only be called "statistically suggestive.")

Part of the argument is evidence that, because so many undesired statistical results remain unpublished (and the truth is often boring enough to be an undesired result), most results that are said to be "statistically significant" at the 5% level are in fact false. By contrast, results said to be statistically significant at the 1% or .5% level might have, say, an 80% probability of actually being true. In an early outline for this advocacy piece, Dan and his coauthors write:

There is some empirical evidence from the recent replication projects in psychology and experimental economics.  In both fields, the replication record is roughly double for initial studies with P < 0.005 relative to initial studies with 0.005 < P < 0.05: 50% versus 24% for psychology (OSC, 2015), and 85% versus 44% for experimental economics (Camerer et al., 2016).  These numbers are based on relatively small samples of studies (93 in psychology, 16 in experimental economics) but are suggestive of the gains in replicability that may occur.

I think a rule reserving the words "statistically significant" for p-values less than .005, with "statistically suggestive" for any higher p-value, would be very helpful—and interestingly, would use the power of the journals to enforce a salutary rule—but the problem is created in the first place by people's preference for something interesting but likely false over something boring but true.
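To see why the threshold matters so much, here is a minimal back-of-the-envelope sketch in Python. The numbers are illustrative assumptions of mine, not figures from Dan's paper: suppose only one in ten tested hypotheses is true and studies have 50% power. Then the arithmetic of false positives implies that half of all p < .05 "discoveries" are false, while p < .005 "discoveries" are true about 91% of the time.

```python
# Back-of-the-envelope: what fraction of "significant" results are true?
# Illustrative assumptions (mine, not from the paper): 1-in-10 prior odds
# that a tested hypothesis is true, and power held fixed at 50% for
# simplicity (in reality power falls as the threshold tightens).

def prob_true_given_significant(prior_odds, power, alpha):
    """P(hypothesis is true | p < alpha) under the assumed prior odds and power."""
    pi_true = prior_odds / (1 + prior_odds)   # prior probability the hypothesis is true
    pi_false = 1 - pi_true
    true_positives = power * pi_true          # true effects that clear the threshold
    false_positives = alpha * pi_false        # null effects that clear it by chance
    return true_positives / (true_positives + false_positives)

for alpha in (0.05, 0.01, 0.005):
    print(f"alpha = {alpha}: P(true | significant) = "
          f"{prob_true_given_significant(0.1, 0.5, alpha):.2f}")
# alpha = 0.05:  0.50
# alpha = 0.01:  0.83
# alpha = 0.005: 0.91
```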

In a related 2015 paper, "Rejection Odds and Rejection Ratios: A Proposal for Statistical Practice in Testing Hypotheses," Dan and his coauthors M.J. Bayarri, James O. Berger, and Thomas M. Sellke argue for the importance of statistical power:

Abstract: Much of science is (rightly or wrongly) driven by hypothesis testing. Even in situations where the hypothesis testing paradigm is correct, the common practice of basing inferences solely on p-values has been under intense criticism for over 50 years. We propose, as an alternative, the use of the odds of a correct rejection of the null hypothesis to incorrect rejection. Both pre-experimental versions (involving the power and Type I error) and post-experimental versions (depending on the actual data) are considered. Implementations are provided that range from depending only on the p-value to consideration of full Bayesian analysis. A surprise is that all implementations — even the full Bayesian analysis — have complete frequentist justification. Versions of our proposal can be implemented that require only minor modifications to existing practices yet overcome some of their most severe shortcomings.
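As I read the abstract, the pre-experimental version of their proposal reduces to a simple formula; the display below is my gloss, not the paper's own notation. The odds that a rejection is correct equal the prior odds on the alternative hypothesis times the "rejection ratio" of power to Type I error:

```latex
% Pre-experimental rejection odds (my gloss of the abstract, not the paper's notation):
% pi_1 / pi_0 = prior odds on the alternative, 1 - beta = power, alpha = Type I error rate.
\[
O_{\mathrm{pre}} \;=\; \frac{\pi_1}{\pi_0} \times \frac{1-\beta}{\alpha}
\]
```

On this reading, a study with 50% power and a .05 threshold has a rejection ratio of only 10, so with prior odds of 1:10 against, "significant" results are no better than a coin flip; that is the same arithmetic as in the sketch above.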

At the 2017 Russell Sage Foundation Summer Institute on Social Science Genomics that Dan organized, I got as swag a tote bag reflecting Dan's views.

The social science genetics team in which Dan is a leading light has done a lot to foster higher statistical power and greater attention to multiple hypothesis testing, in part because the history of genetics research is littered with nonreplicable results of doubtful truth. But genetics is only one of several disciplines facing a "replication crisis": a problem of results that don't replicate and so are of doubtful truth.
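For a concrete sense of what attention to multiple hypothesis testing means in genetics, here is a minimal sketch of the Bonferroni logic behind the conventional genome-wide significance threshold. The round numbers are the standard convention in the field, not anything specific to Dan's team:

```python
# Bonferroni correction: to keep the chance of even one false positive
# across all tests at 5%, divide the significance threshold by the number of tests.
n_tests = 1_000_000    # roughly the effective number of independent variants in a GWAS
family_alpha = 0.05    # tolerated probability of any false positive overall
per_test_alpha = family_alpha / n_tests
print(per_test_alpha)  # 5e-08, the conventional "genome-wide significance" threshold
```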

Not too long ago, I attended a conference where I talked to a young untenured psychologist, who suggested that tenured psychologists harping on the importance of practices that would lead to more replicable results were being self-righteous and hypocritical, because they would cut the very same corners if they didn't have tenure yet. But the point of high scientific standards is not comparative moral preening or accusations of self-righteousness—it is a practical matter: if we don't do it, we don't get to know the truth! To the extent standards weren't imposed in the past, a lot of scientific effort was wasted; the misdirection of scientific effort should stop—even if it means that people need to be given tenure for establishing more boring true results with larger sample sizes instead of claiming flashy false results based on small sample sizes.

Nassim Taleb goes too far when he writes in his Medium post "An Expert Called Lindy":

If you hear advice from a grandmother or elders, odds are that it works at ninety percent. On the other hand, in part because of scientism and academic prostitution, in part because the world is hard, if you read anything by psychologists and behavioral scientists, odds are it works at less than ten percent, unless it is also what has been covered by the grandmother and the classics, in which case why would you need a nerd-psychologist? 

But Carl Sagan has it right when he says "Extraordinary claims require extraordinary evidence." Prosaically, this means at a minimum that surprising and thereby interesting results require small p-values (certainly 1% or below) to be credible. 

Styles of Research: High-Concept Scientists vs. Fabian Scientists

The Wikipedia article "High-concept" gives this definition: 

High-concept is a type of artistic work that can be easily pitched with a succinctly stated premise. It can be contrasted with low-concept, which is more concerned with character development and other subtleties that are not as easily summarized.

This is a useful analogy for understanding this passage in Stephen Buranyi's article:

Stephen: Today, every scientist knows that their career depends on being published, and professional success is especially determined by getting work into the most prestigious journals. The long, slow, nearly directionless work pursued by some of the most influential scientists of the 20th century is no longer a viable career option. Under today’s system, the father of genetic sequencing, Fred Sanger, who published very little in the two decades between his 1958 and 1980 Nobel prizes, may well have found himself out of a job.

Economic historian David Galenson provides an important perspective on "high-concept" research vs. gradually figuring things out, which I will call the "Fabian" style of research after the five-time Roman Consul Quintus Fabius Maximus Verrucosus, who defeated Hannibal with a patient strategy of attrition and harassment. David Galenson's work on the careers of artists has obvious parallels to the careers of scientific researchers. He uses the terms "conceptual artist" and "experimental artist" to refer to individuals with the corresponding work styles, but the phrases "conceptual economist" and "experimental economist" have other meanings that conflict with the style-of-work categories David Galenson is pointing to, so let me use the terminology "high-concept economist" and "Fabian economist." (Being a Fabian economist has no inherent association with being a "Fabian socialist" other than the adjective "Fabian" indicating a gradual, persistent mode of addressing issues.)

So the analogy is

conceptual artist: experimental artist :: high-concept economist: Fabian economist.

The corresponding terminology works for scientists in general:

conceptual artist: experimental artist :: high-concept scientist: Fabian scientist.

David Galenson has written many books, but I have mostly read David's work in the form of National Bureau of Economic Research working papers. You can see his NBER working papers listed here. Let me copy here the abstracts of some papers on this contrast between "high-concept" work styles and "Fabian" work styles. Using this analogy, you will learn a lot if you have the patience to work your way through this long list of abstracts. If you have less patience, read one or two and skip to the end of the list.

From the New Wave to the New Hollywood: The Life Cycles of Important Movie Directors from Godard and Truffaut to Spielberg and Eastwood 
with Joshua Kotin

Two great movie directors were both born in 1930. One of them, Jean-Luc Godard, revolutionized filmmaking during his 30s, and declined in creativity thereafter. In contrast, Clint Eastwood did not direct his first movie until he had passed the age of 40, and did not emerge as an important director until after 60. This dramatic difference in life cycles was not accidental, but was a characteristic example of a pattern that has been identified across the arts: Godard was a conceptual innovator who peaked early, whereas Eastwood was an experimental innovator who improved with experience. This paper examines the goals, methods, and creative life cycles of Godard, Eastwood, and eight other directors who were the most important filmmakers of the second half of the twentieth century. Francis Ford Coppola, Stanley Kubrick, Steven Spielberg, and François Truffaut join Godard in the category of conceptual young geniuses, while Woody Allen, Robert Altman, John Cassavetes, and Martin Scorsese are classed with Eastwood as experimental old masters. In an era in which conceptual innovators have dominated a number of artistic activities, the strong representation of experimental innovators among the greatest film directors is an interesting phenomenon.

From "White Christmas" to Sgt. Pepper: The Conceptual Revolution in Popular Music 

Irving Berlin, Cole Porter, and other songwriters of the Golden Era wrote popular songs that treated common topics clearly and simply. During the mid-1960s Bob Dylan, John Lennon, and Paul McCartney created a new kind of popular music that was personal and often obscure. This shift, which transformed popular music from an experimental into a conceptual art, produced a distinct change in the creative life cycles of songwriters. Golden Era songwriters were generally at their best during their 30s and 40s, whereas since the mid-'60s popular songwriters have consistently done their best work during their 20s. The revolution in popular music occurred at a time when young innovators were making similar transformations in other arts: Jean-Luc Godard and his fellow New Wave directors created a conceptual revolution in film in the early '60s, just as Andy Warhol and other Pop artists made painting a conceptual activity.

Innovators: Architects 

Frank Lloyd Wright, Le Corbusier, and Frank Gehry were experimental architects: all worked visually, and arrived at their designs by discovering forms as they sketched. Their styles evolved gradually over long periods, and all three produced the buildings that are generally considered their greatest masterpieces after the age of 60. In contrast, Maya Lin is a conceptual architect: her designs originate in ideas, and they arrive fully formed. The work that dominates her career, the Vietnam Veterans Memorial, was designed as an assignment for a course she took during her senior year of college. The dominance of a single early work makes Lin's career comparable to those of a number of precocious conceptual innovators in other arts, including the painter Paul Sérusier, the sculptor Meret Oppenheim, the novelist J.D. Salinger, and the poet Allen Ginsberg.

Two Paths to Abstract Art: Kandinsky and Malevich 

Wassily Kandinsky and Kazimir Malevich were both great Russian painters who became pioneers of abstract art during the second decade of the twentieth century. Yet the forms of their art differed radically, as did their artistic methods and goals. Kandinsky, an experimental artist, approached abstraction tentatively and visually, by gradually and progressively concealing forms drawn from nature, whereas Malevich, a conceptual innovator, plunged precipitously into abstraction, by creating symbolic elements that had no representational origins. The conceptual Malevich also made his greatest innovations considerably earlier in his life than the experimental Kandinsky. Interestingly, at the age of 50 Kandinsky wrote an essay that clearly described these two categories of artist, contrasting the facile and protean young virtuoso with the single-minded individual who matured more slowly but was ultimately more original.

Analyzing Artistic Innovation: The Greatest Breakthroughs of the Twentieth Century 
David W. Galenson

... great conceptual innovators, like Picasso, Matisse, and Warhol, made their greatest discoveries abruptly, whereas great experimental innovators, like Mondrian, Kandinsky, and Pollock, made their discoveries more gradually. The finding that artists who innovate early in their lives do so suddenly, while those who innovate late do so more gradually, adds an important dimension to our understanding of human creativity.

Late Bloomers in the Arts and Sciences: Answers and Questions 

Recent research has shown that all the arts have had important practitioners of two different types -- conceptual innovators who make their greatest contributions early in their careers, and experimental innovators who produce their greatest work later in their lives. This contradicts a persistent but mistaken belief that artistic creativity has been dominated by the young. We do not yet have systematic studies of the relative importance of conceptual and experimental innovators in the sciences. But in the absence of such studies, it may be damaging for economic growth to continue to assume that innovations in science are made only by the young.

Wisdom and Creativity in Old Age: Lessons from the Impressionists 

Psychologists have not considered wisdom and creativity to be closely associated. This reflects their failure to recognize that creativity is not exclusively the result of bold discoveries by young conceptual innovators. Important advances can equally be made by older, experimental innovators. Yet we have had no examination of why some experimental artists have remained creative much later in their lives than others. Considering the major artists who worked together during the first decade of Impressionism, this paper compares the attitudes and practices of two important experimental innovators who made significant contributions after the age of 50 with two of their colleagues whose creativity failed to persist past 50. Unlike Pissarro and Renoir, who reacted to adversity in mid-career by attempting to emulate the methods of conceptual artists, Cézanne and Monet adopted elements of other artists' approaches while maintaining their own experimental methods and goals. For both Cézanne and Monet, recognizing how they themselves learned was a key to turning experience into wisdom. Their greatness in old age appears to have been a product of their understanding that although the improvement in their art might be painstaking and slow, over long periods its cumulative effect could be very great.

Conceptual Revolutions in Twentieth-Century Art

Art critics and scholars have acknowledged the breakdown of their explanations and narratives of contemporary art in the face of what they consider the incoherent era of "pluralism" or "postmodernism" that began in the late twentieth century. This failure is in fact a result of their inability to understand the nature of the development of advanced art throughout the entire twentieth century, and particularly the novel behavior of young conceptual innovators in a new market environment. The rise of a competitive market for advanced art in the late nineteenth century freed artists from the constraint of having to satisfy powerful patrons, and gave them unprecedented freedom to innovate. As the rewards for radical and conspicuous innovation increased, conceptual artists could respond to these incentives more quickly and decisively than their experimental counterparts. Early in the twentieth century, the young conceptual genius Pablo Picasso initiated two new practices, by alternating styles at will and inventing a new artistic genre, that became basic elements of the art of a series of later conceptual innovators. By the late twentieth century, extensions of these practices had led to the emergence of important individual artists whose work appeared to have no unified style, and to the balkanization of advanced art, as the dominance of painting gave way before novel uses of old genres and the creation of many new ones. Understanding not only contemporary art, but the art of the past century as a whole, will require art scholars to abandon their outmoded insistence on analyzing art in terms of style, and to recognize the many novel patterns of behavior that have been created over the course of the past century by young conceptual innovators.

I see myself as a Fabian economist. That is true of the way I worked even in getting to my early successes. I see Robert Lucas as a high-concept economist, but Milton Friedman as a Fabian economist. Though I feel less sure here, among my professors in graduate school I see Greg Mankiw as more high-concept than Larry Summers.

Both high-concept scientists and Fabian scientists are important for scientific progress. Stephen Buranyi's point above is that the journal system handicaps Fabian scientists, and so throws the mix of contributions out of balance.  

The Rise in the Power of Referees and Journal Editors and the Disempowerment of Authors, with the Associated Bias Toward Novelty and Ideas that Can Be Neatly Tied Up With a Bow

Who could be against polished perfection and novelty in science? But asking for novelty and polished perfection with all the loose ends tied up has the cost of turning our attention away from old ideas and messy aspects of the truth. When papers pointing to puzzles do get published, they often stimulate a fascinating scientific debate. But how many papers pointing to puzzles, paradoxes, contradictions or serious difficulties never get published? (In trying to square this circle, economists have in recent years commonly written papers with excellent, deep empirical work brilliantly laying out a surprising or puzzling stylized fact, combined with a shallow, unpersuasive model supposedly resolving that puzzle tacked on at the end.)

Much less obviously, an emphasis on polished perfection has shifted the balance of power toward referees and journal editors at the expense of authors. Stephen gives a fascinating historical account of how this happened in biology.

Stephen: In the mid-1970s, though, publishers began to meddle with the practice of science itself, starting down a path that would lock scientists’ careers into the publishing system, and impose the business’s own standards on the direction of research. One journal became the symbol of this transformation.

“At the start of my career, nobody took much notice of where you published, and then everything changed in 1974 with Cell,” Randy Schekman, the Berkeley molecular biologist and Nobel prize winner, told me. ... It was edited by a young biologist named Ben Lewin, who approached his work with an intense, almost literary bent. Lewin prized long, rigorous papers that answered big questions – often representing years of research that would have yielded multiple papers in other venues – and, breaking with the idea that journals were passive instruments to communicate science, he rejected far more papers than he published.

What he created was a venue for scientific blockbusters, and scientists began shaping their work on his terms. “Lewin was clever. He realised scientists are very vain, and wanted to be part of this selective members club; Cell was ‘it’, and you had to get your paper in there,” Schekman said. ...

... Almost overnight, a new currency of prestige had been created in the scientific world. (Garfield later referred to his creation as “like nuclear energy … a mixed blessing”.)

It is difficult to overstate how much power a journal editor now had to shape a scientist’s career and the direction of science itself. “Young people tell me all the time, ‘If I don’t publish in CNS [a common acronym for Cell/Nature/Science, the most prestigious journals in biology], I won’t get a job,’” says Schekman. He compared the pursuit of high-impact publications to an incentive system as rotten as banking bonuses. “They have a very big influence on where science goes,” he said.

And so science became a strange co-production between scientists and journal editors, with the former increasingly pursuing discoveries that would impress the latter. These days, given a choice of projects, a scientist will almost always reject both the prosaic work of confirming or disproving past studies, and the decades-long pursuit of a risky “moonshot”, in favour of a middle ground: a topic that is popular with editors and likely to yield regular publications. “Academics are incentivised to produce research that caters to these demands,” said the biologist and Nobel laureate Sydney Brenner in a 2014 interview, calling the system “corrupt.”

I'll bet there are similar stories that could be told for other disciplines. 

There are many positive aspects of peer review. First, other systems of judging have serious problems as well. Second, tough peer review can often encourage an author to figure things out, understand the issues more deeply, and raise a paper to a higher level than it otherwise would have reached. But there is a dark side to peer review, ably described by Nassim Taleb in "An Expert Called Lindy": 

I have had most of my, sort of, academic career no more than a quarter position. ... one (now sacked) department head, one day came to me and emitted the warning: “As a businessman and author you are judged by other businessmen and authors, here as an academic you are judged by other academics. Life is about peer assessment.”

It took me a while to overcome my disgust –I am still not fully familiar with the way non-risk takers work; they actually don’t realize that others are not like them, what makes people in the real world tick. No, businessmen as risk takers are not subjected to the judgment of other businessmen, only that of their personal accountant ...

You can define a free person precisely as someone whose fate is not centrally or directly dependent on his peer assessment.

And as an essayist, I am not judged by other writers, book editors, and book reviewers, but by readers. Readers? maybe, but wait a minute… not today’s readers. Only those of tomorrow, and the day after tomorrow. ...

Peers devolve honors, memberships in academies, Nobels, invitations to Davos and similar venues, tea with the Queen, requests by rich name-droppers to attend cocktail parties where you only see people who are famous. Believe me, there are rich people whose lives revolve around these things. They usually claim to be trying to save the world, the planet, the children, the mountains, the deserts –all the ingredients of the broadcasting of virtue. ...

The Ritualistic Publishing Game

Nassim Taleb goes further to write:

Academia can become a ritualistic publishing game ...

In some areas ... the ritualistic publishing game gradually maps less and less to real research ... researchers have their own agenda, at variance with what their clients, that is, society and the students, are paying them for. Knowing “economics” doesn’t mean in the academic lingo knowing anything about economics in the sense of the real activity, but the theories produced by economists. And courses in universities, for which hard working parents need to save over decades, easily degenerate into fashion. ...

while Stephen gives a reminder of the extent to which this ritualistic publishing game has been institutionalized:

Stephen: In a sense, it is not any one publisher’s fault that the scientific world seems to bend to the industry’s gravitational pull. When governments including those of China and Mexico offer financial bonuses for publishing in high-impact journals, they are not responding to a demand by any specific publisher, but following the rewards of an enormously complex system that has to accommodate the utopian ideals of science with the commercial goals of the publishers that dominate it. (“We scientists have not given a lot of thought to the water we’re swimming in,” Neal Young told me.)

To add two examples: at the Chicago Fed there is an automatic bonus for publishing in top journals, and at the University of Colorado Boulder, where I now teach, there is a raise formula that mechanically includes publications in top journals. But these mechanical formulas are only a small part of the social reward for publishing papers. So much so that in "Breaking the Chains" I use "publishing papers" as a metaphor for careerism more generally:

For most who go into academia, the salary they will get in academia is lower than they could get outside. So most who go into academia make that choice in part out of the joy of ideas, a burning desire for self-expression, a genuine fascination with learning how the world works, or out of idealism—the hope of making the world a better place through their efforts. But by the time those who are successful make it through the long grind of graduate school, getting a job and getting tenure, many have had that joy of ideas, desire for self-expression, thirst for understanding and idealism snuffed out. For many their work life has become a checklist of duties plus the narrow quest for publications in top journals. This fading away of higher, brighter goals betrays the reasons they chose academia in the first place.  

Conclusion

Society has sacrificed quite a bit to put those of us who are academics, or relatively highly paid public servants with an opportunity to do research, in the positions we are in. We owe society the scientific seriousness to try to figure out how the world works, even at some (often only short-run) sacrifice to our own apparent career success. But in addition to resisting careerist temptations, we should also contemplate how the current journal system creates temptations for us and those around us to do otherwise.