False History and the Number of Genes

Mihaela Pertea and Steven L Salzberg have just published a paper in Genome Biology with an interesting title: Between a chicken and a grape: estimating the number of human genes. Part of their paper covers the history of gene number estimates and it includes the figure shown here.
Figure 2. The trend of human gene number counts together with human genome-related milestones. Individual estimates of the human gene count are shown as blue diamonds. The range of estimates at different times is shown by the two vertical blue dotted lines. Note how this range has narrowed in recent years.
This is really annoying because it perpetuates a myth that needs to be debunked. I've addressed it in an earlier posting [Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome].

Mihaela Pertea and Steven L Salzberg have completely ignored a substantial literature on the subject. First, there are the genetic load arguments of King and Jukes from 1969. They estimated that there had to be fewer than 40,000 genes in our genome. Ohno summarized the estimates in a 1972 paper and came up with an estimate, based on the knowledge of the time, of 30,000 genes (Ohno 1972).

Then there's the substantive literature on expressed sequences from the 1970s. These were mostly hybridization experiments showing that human tissues had a core of about 10,000 genes expressed in the most complex tissues. The estimate was that there were probably no more than double that number of genes in total. Benjamin Lewin was the expert on this subject and his early books (especially Gene Expression II) covered all the bases. By 1983, Lewin was able to conclude in Genes II ...
Given some uncertainties about estimating the numbers of genes present in multiple copies, we might say that the mammalian genome looks to be of the order of 30,000 - 40,000 gene functions.
He published the same estimate in Genes IV in 1990.

Lewin was not alone. Most textbooks contained similar estimates in the 1980s. In Molecular Biology of the Cell by Alberts et al. (1983) the estimate was also 30,000 genes (p. 406). These estimates were not dismissed as unreliable. Quite the contrary. In my circles, the general impression was that humans had to have fewer than 50,000 genes and the number was likely to be less than 30,000.

It's true that Walter Gilbert had "guesstimated" 100,000 genes and it's true that the early estimates from the Human Genome Project used a number like this (based on a false assumption). But that doesn't mean that everyone agreed. Indeed, among those who had really studied the problem, a much lower number was preferred.

During the 1990s, the preliminary results from EST cloning and sequencing started to come in and it looked like there were at least 100,000 genes based on this data. However, this was a controversial estimate precisely because so many people knew that it conflicted with a lot of data. Sure, there were those who believed the EST data over everything else, but they did not represent everyone who was interested in gene numbers. It is very misleading to suggest, as the figure does, that there was a consensus in favor of more than 50,000 genes.

That's false history and it does a great disservice to those who turned out to be correct.


[HatTip: Carl Zimmer]

King, J.L. and Jukes, T.H. (1969) Non-Darwinian evolution. Science 164:788-798.

Ohno, S. (1972) So much "junk" in our genome. Brookhaven Symp. Biol. 23:366-370.