
ENCODE/Junk DNA Fiasco: John Timmer Gets It Right!

John Timmer is the science editor at Ars Technica. Yesterday he published the best analysis of the ENCODE/junk DNA fiasco that any science writer has published so far [Most of what you read was wrong: how press releases rewrote scientific history].

How did he manage to pull this off? It's not much of a secret. He knew what he was writing about and that gives him an unfair advantage over most other science journalists.

Let me show you what I mean. Here's John Timmer's profile on the Ars Technica website.
John is Ars Technica's science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. John has done over a decade's worth of research in genetics and developmental biology at places like Cornell Medical College and the Memorial Sloan-Kettering Cancer Center. He's been a speaker at the annual meeting of the National Association of Science Writers and the Science Online meetings, and he's one of the organizers of the Science Online NYC discussion series. In addition to being Ars' science content wrangler, John still teaches at Cornell and does freelance writing, editing, and programming.
See what I mean? He has a degree in biochemistry and another one in molecular biology. People like that shouldn't be allowed to write about the ENCODE results because they might embarrass the scientists.


Ed Yong Updates His Post on the ENCODE Papers

For decades we've known that less than 2% of the human genome consists of exons and that protein-encoding genes represent more than 20% of the genome. (Introns account for the difference between exons and genes.) [What's in Your Genome?]. There are about 20,500 protein-encoding genes in our genome and about 4,000 genes that encode functional RNAs for a total of about 25,000 genes [Humans Have Only 20,500 Protein-Encoding Genes]. That's a little less than the number predicted by knowledgeable scientists over four decades ago [False History and the Number of Genes]. The definition of "gene" is somewhat open-ended but, at the very least, a gene has to have a function [Must a Gene Have a Function?].

We've known about all kinds of noncoding DNA that's functional, including origins of replication, centromeres, genes for functional RNAs, telomeres, and regulatory DNA. Together these functional parts of the genome make up almost 10% of the total. (Most of the DNA giving rise to introns is junk in the sense that it is not serving any function.) The idea that all noncoding DNA is junk is a myth propagated by scientists (and journalists) who don't know their history.

We've known about the genetic load argument since 1968 and we've known about the C-Value "Paradox" and its consequences since the early 1970s. We've known about pseudogenes and we've known that almost 50% of our genome is littered with dead transposons and bits of transposons. We've known that about 3% of our genome consists of highly repetitive DNA that is not transcribed or expressed in any way. Most of this DNA is not functional and a lot of it is not included in the sequenced human genome [How Much of Our Genome Is Sequenced?]. All of this evidence indicates that most of our genome is junk. This conclusion is consistent with what we know about evolution and it's consistent with what we know about genome sizes and the C-Value "Paradox." It also helps us understand why there's no correlation between genome size and complexity.
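Just to see how those round numbers stack up, here's a back-of-the-envelope tally in Python. The figures are the approximate ones quoted in this post; the intron estimate is my own rough inference from the statement that protein-encoding genes cover a bit more than 20% of the genome while exons cover less than 2%.

```python
# Back-of-the-envelope tally of the human genome using the approximate
# figures quoted in this post. These are illustrative round numbers only.

functional = {
    "exons of protein-coding and RNA genes": 2,   # "less than 2%"
    "regulatory DNA, origins, centromeres, telomeres, etc.": 8,
}                                                 # together "almost 10%"

junk = {
    "dead transposons and transposon fragments": 50,  # "almost 50%"
    "highly repetitive DNA": 3,                       # "about 3%"
    "intron sequences (mostly junk)": 19,             # ~20%+ for genes minus ~2% exons
}

print(f"functional: ~{sum(functional.values())}%")
print(f"junk from these classes alone: ~{sum(junk.values())}%")
print(f"everything else: ~{100 - sum(functional.values()) - sum(junk.values())}%")
```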


More Expert Opinion on Junk DNA from Scientists

The Nature issue containing the latest ENCODE Consortium papers also has a News & Views article called "Genomics: ENCODE explained" (Ecker et al., 2012). Some of these scientists comment on junk DNA.

For example, here's what Joseph Ecker says,
One of the more remarkable findings described in the consortium's 'entrée' paper is that 80% of the genome contains elements linked to biochemical functions, dispatching the widely held view that the human genome is mostly 'junk DNA'. The authors report that the space between genes is filled with enhancers (regulatory DNA elements), promoters (the sites at which DNA's transcription into RNA is initiated) and numerous previously overlooked regions that encode RNA transcripts that are not translated into proteins but might have regulatory roles.
And here's what Inês Barroso says,
The vast majority of the human genome does not code for proteins and, until now, did not seem to contain defined gene-regulatory elements. Why evolution would maintain large amounts of 'useless' DNA had remained a mystery, and seemed wasteful. It turns out, however, that there are good reasons to keep this DNA. Results from the ENCODE project show that most of these stretches of DNA harbour regions that bind proteins and RNA molecules, bringing these into positions from which they cooperate with each other to regulate the function and level of expression of protein-coding genes. In addition, it seems that widespread transcription from non-coding DNA potentially acts as a reservoir for the creation of new functional molecules, such as regulatory RNAs.
If this were an undergraduate course I would ask for a show of hands in response to the question, "How many of you thought that there did not seem to be 'defined gene-regulatory elements' in noncoding DNA?"

I would also ask, "How many of you have no idea how evolution could retain 'useless' DNA in our genome?" Undergraduates who don't understand evolution should not graduate in a biological science program. It's too bad we don't have similar restrictions on senior scientists who write News & Views articles for Nature.

Jonathan Pritchard and Yoav Gilad write,
One of the great challenges in evolutionary biology is to understand how differences in DNA sequence between species determine differences in their phenotypes. Evolutionary change may occur both through changes in protein-coding sequences and through sequence changes that alter gene regulation.

There is growing recognition of the importance of this regulatory evolution, on the basis of numerous specific examples as well as on theoretical grounds. It has been argued that potentially adaptive changes to protein-coding sequences may often be prevented by natural selection because, even if they are beneficial in one cell type or tissue, they may be detrimental elsewhere in the organism. By contrast, because gene-regulatory sequences are frequently associated with temporally and spatially specific gene-expression patterns, changes in these regions may modify the function of only certain cell types at specific times, making it more likely that they will confer an evolutionary advantage.

However, until now there has been little information about which genomic regions have regulatory activity. The ENCODE project has provided a first draft of a 'parts list' of these regulatory elements, in a wide range of cell types, and moves us considerably closer to one of the key goals of genomics: understanding the functional roles (if any) of every position in the human genome.
The problem here is the hype. While it's true that the ENCODE project has produced massive amounts of data on transcription factor binding sites etc., it's a bit of an exaggeration to say that "until now there has been little information about which genomic regions have regulatory activity." Twenty-five years ago, my lab published some pretty precise information about the parts of the genome regulating activity of a mouse hsp70 gene. There have been thousands of other papers on the subject of gene regulatory sequences since then. I think we actually have a pretty good understanding of gene regulation in eukaryotes. It's a model that seems to work well for most genes.

The real challenge from the ENCODE Consortium is that they question that understanding. They are proposing that huge amounts of the genome are devoted to fine-tuning the expression of most genes in a vast network of binding sites and small RNAs. That's not the picture we have developed over the past four decades. If true, it would not only mean that a lot less DNA is junk but it would also mean that the regulation of gene expression is fundamentally different than it is in E. coli.



[Image Credit: ScienceDaily: In Massive Genome Analysis ENCODE Data Suggests 'Gene' Redefinition.]

Ecker, J.R., Bickmore, W.A., Barroso, I., Pritchard, J.K., Gilad, Y., and Segal, E. (2012) Genomics: ENCODE explained. Nature 489:52-55. [doi:10.1038/489052a]

Splicing Error Rate May Be Close to 1%

 
Alex Ling alerted me to an important paper in last month's issue of PLoS Genetics. Pickrell et al. (2010) looked at low abundance RNAs in order to determine how many transcripts showed evidence of possible splicing errors. They found a lot of "alternative" splice forms where the new splice junction was not conserved in other species and was used rarely. They attribute this to splicing errors. Their calculation suggests that the splicing apparatus makes a mistake about 0.7% of the time for a typical intron.

This has profound implications for the interpretation of alternative splicing data. If Pickrell et al. are correct—and they aren't the only ones to raise this issue—then claims about alternative splicing being a common phenomenon are wrong. At the very least, those claims are controversial and every time you see such a claim in the scientific literature it should be accompanied by a statement about possible artifacts due to splicing errors. If you don't see that mentioned in the paper then you know you aren't dealing with a real scientist.
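To see why a small per-intron error rate matters, here's a quick back-of-the-envelope calculation. The 0.7% figure is from Pickrell et al.; the assumption that a typical multi-exon human gene has around eight introns is mine, purely for illustration.

```python
# Chance that a transcript carries at least one splicing error, assuming
# errors are independent across introns at ~0.7% per intron (Pickrell et al. 2010).
# The intron counts below are illustrative assumptions, not from the paper.

error_rate = 0.007

for n_introns in (1, 8, 20):
    p_aberrant = 1 - (1 - error_rate) ** n_introns
    print(f"{n_introns:2d} introns -> ~{p_aberrant:.1%} of transcripts mis-spliced")

# Even a few percent of aberrant transcripts translates into thousands of
# rare, unconserved "alternative" junctions in a deep sequencing dataset.
```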

Here's the abstract and the author summary ...
Abstract

While the majority of multiexonic human genes show some evidence of alternative splicing, it is unclear what fraction of observed splice forms is functionally relevant. In this study, we examine the extent of alternative splicing in human cells using deep RNA sequencing and de novo identification of splice junctions. We demonstrate the existence of a large class of low abundance isoforms, encompassing approximately 150,000 previously unannotated splice junctions in our data. Newly-identified splice sites show little evidence of evolutionary conservation, suggesting that the majority are due to erroneous splice site choice. We show that sequence motifs involved in the recognition of exons are enriched in the vicinity of unconserved splice sites. We estimate that the average intron has a splicing error rate of approximately 0.7% and show that introns in highly expressed genes are spliced more accurately, likely due to their shorter length. These results implicate noisy splicing as an important property of genome evolution.

Author Summary

Most human genes are split into pieces, such that the protein-coding parts (exons) are separated in the genome by large tracts of non-coding DNA (introns) that must be transcribed and spliced out to create a functional transcript. Variation in splicing reactions can create multiple transcripts from the same gene, yet the function for many of these alternative transcripts is unknown. In this study, we show that many of these transcripts are due to splicing errors which are not preserved over evolutionary time. We estimate that the error rate in the splicing of an intron is about 0.7% and demonstrate that there are two major types of splicing error: errors in the recognition of exons and errors in the precise choice of splice site. These results raise the possibility that variation in levels of alternative splicing across species may in part be due to variation in splicing error rate.


Pickrell, J.K., Pai, A.A., Gilad, Y., and Pritchard, J.K. (2010) Noisy Splicing Drives mRNA Isoform Diversity in Human Cells. PLoS Genet 6(12): e1001236. [doi:10.1371/journal.pgen.1001236]

Junk RNA or Imaginary RNA?

RNA is very popular these days. It seems as though new varieties of RNA are being discovered just about every month. There have been breathless reports claiming that almost all of our genome is transcribed and most of this RNA has to be functional even though we don't yet know what the function is. The fervor with which some people advocate a paradigm shift in thinking about RNA approaches that of a cult follower [see Greg Laden Gets Suckered by John Mattick].

We've known for decades that there are many types of RNA besides messenger RNA (mRNA encodes proteins). Besides the standard ribosomal RNAs and transfer RNAs (tRNAs), there are a variety of small RNAs required for splicing and many other functions. There's no doubt that some of the new discoveries are important as well. This is especially true of small regulatory RNAs.

However, the idea that a huge proportion of our genome could be devoted to synthesizing functional RNAs does not fit with the data showing that most of our genome is junk [see Shoddy But Not "Junk"?]. That hasn't stopped RNA cultists from promoting experiments leading to the conclusion that almost all of our genome is transcribed.

Late to the Party

Several people have already written about this paper including Carl Zimmer and PZ Myers. There are also summaries in Nature News and PLoS Biology.
That may change. A paper just published in PLoS Biology shows that the earlier work was prone to artifacts. Some of those RNAs may not even be there and others are present in tiny amounts.

The work was done by Harm van Bakel in Tim Hughes' lab, right here in Toronto. It's only a few floors, and a bridge, from where I'm sitting right now. The title of their paper tries to put a positive spin on the results: "Most 'Dark Matter' Transcripts Are Associated With Known Genes" [van Bakel et al. (2010)]. Nobody's buying that spin. They all recognize that the important result is not that non-coding RNAs are mostly associated with genes but the fact that they are not found in the rest of the genome. In other words, most of our genome is not transcribed in spite of what was said in earlier papers.

Van Bakel compared two different types of analysis. The first, called "tiling arrays," is a technique where bulk RNA (cDNA, actually) is hybridized to a series of probes on a microchip. The probes are short pieces of DNA corresponding to genomic sequences spaced every few thousand base pairs along each chromosome. When some RNA fragment hybridizes to one of these probes you score that as a "hit." The earlier experiments used this technique and the results indicated that almost every probe could hybridize an RNA fragment. Thus, as you scanned the chip you saw that almost every spot recorded a "hit." The conclusion is that almost all of the genome is transcribed even though only 2% corresponds to known genes.

The second type of analysis is called RNA-Seq and it relies on direct sequencing of RNA fragments. Basically, you copy the RNA into DNA, selecting for small 200 bp fragments. Using new sequencing technology, you then determine the sequence of one (single end) or both ends (paired end) of this cDNA. You may only get 30 bp of good sequence information but that's sufficient to place the transcript on the known genome sequence. By collecting millions of sequence reads, you can determine what parts of the genome are transcribed and you can also determine the frequency of transcription. The technique is much more quantitative than tiling experiments.
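Here's a minimal sketch, in Python, of what the counting step looks like once reads have been mapped; the toy annotation and read positions are invented, but it shows why read counts give a direct, quantitative measure of how much transcription comes from each region.

```python
# Toy RNA-Seq quantification: count mapped reads per annotated region.
# The regions and read positions are invented for illustration.

regions = {
    "gene_A":     (1_000, 5_000),    # (start, end) on a toy chromosome
    "intergenic": (5_000, 20_000),
    "gene_B":     (20_000, 28_000),
}

# Start coordinates of mapped reads (a real experiment has millions).
mapped_reads = [1_200, 1_250, 3_800, 7_000, 21_000, 21_030, 24_500, 24_530]

counts = {name: 0 for name in regions}
for pos in mapped_reads:
    for name, (start, end) in regions.items():
        if start <= pos < end:
            counts[name] += 1
            break

total = len(mapped_reads)
for name, n in counts.items():
    print(f"{name:10s} {n} reads ({n / total:.0%} of the total)")
```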

Van Bakel et al. show that using RNA-Seq they detect very little transcription from the regions between genes. On the other hand, using tiling arrays they detect much more transcription from these regions. They conclude that the tiling arrays are producing spurious results—possibly due to cross-hybridization or possibly due to detection of very low abundance transcripts. In other words, the conclusion that most of our genome is transcribed may be an artifact of the method.

The parts of the genome that are presumed to be transcribed but for which there is no function are called "dark matter." Here's the important finding in the authors' own words.
To investigate the extent and nature of transcriptional dark matter, we have analyzed a diverse set of human and mouse tissues and cell lines using tiling microarrays and RNA-Seq. A meta-analysis of single- and paired-end read RNA-Seq data reveals that the proportion of transcripts originating from intergenic and intronic regions is much lower than identified by whole-genome tiling arrays, which appear to suffer from high false-positive rates for transcripts expressed at low levels.
Many of us dismissed the earlier results as transcriptional noise or "junk RNA." We thought that much of the genome could be transcribed at a very low level but this was mostly due to accidental transcription from spurious promoters. This low level of "accidental" transcription is perfectly consistent with what we know about RNA polymerase and DNA binding proteins [What is a gene, post-ENCODE?, How RNA Polymerase Binds to DNA]. Although we might have suspected that some of the "transcription" was a true artifact, it was difficult to see how the papers could have failed to consider such a possibility. They had been through peer review and the reviewers seemed to be satisfied with the data and the interpretation.

That's gonna change. I suspect that from now on everybody is going to ignore the tiling array experiments and pretend they don't exist. Not only that, but in light of recent results, I suspect more and more scientists will announce that they never believed the earlier results in the first place. Too bad they never said that in print.


van Bakel, H., Nislow, C., Blencowe, B. and Hughes, T. (2010) Most "Dark Matter" Transcripts Are Associated With Known Genes. PLoS Biology 8: e1000371 [doi:10.1371/journal.pbio.1000371]

I Don't Have Time for This!

 
The banner headline on the front page of The Toronto Star says, "U of T cracks the code." You can read the newspaper article on their website: U of T team decodes secret messages of our genes. ("U of T" refers to the University of Toronto - our newspaper thinks we're the only "T" university in the entire world.)

The hyperbole is beyond disgusting.

The work comes from labs run by Brendan Frey and Ben Blencowe and it claims to have discovered the "splicing code" mediating alternative splicing (Barash et al., 2010). You'll have to read the paper yourself to see if the headlines are justified. It's clear that Nature thought it was important 'cause they hyped it on the front cover of this week's issue.

The frequency of alternative splicing is a genuine scientific controversy. We've known for 30 years that some genes are alternatively spliced to produce different protein products. The controversy is over what percentage of genes have genuine biologically relevant alternative splice variants and what percentage simply exhibit low levels of inappropriate splicing errors.

Personally, I think most of the predicted splice variants are impossible. The data must be detecting splicing errors [Two Examples of "Alternative Splicing"]. I'd be surprised if more than 5% of human genes are alternatively spliced in a biologically relevant manner.

Barash et al. (2010) disagree. They begin their paper with the common mantra of the true believers.
Transcripts from approximately 95% of multi-exon human genes are spliced in more than one way, and in most cases the resulting transcripts are variably expressed between different cell and tissue types. This process of alternative splicing shapes how genetic information controls numerous critical cellular processes, and it is estimated that 15% to 50% of human disease mutations affect splice site selection.
I don't object to scientists who hold points of view that are different than mine—even if they're wrong! What I object to is those scientists who promote their personal opinions in scientific papers without even acknowledging that there's a genuine scientific controversy. You have to look very carefully in this paper for any mention of the idea that a lot of alternative splicing could simply be due to mistakes in the splicing machinery. And if that's true, then the "splicing code" that they've "deciphered" is just a way of detecting when the machinery will make a mistake.

We've come to expect that science writers can be taken in by scientists who exaggerate the importance of their own work, so I'm not blaming the journalists at The Toronto Star and I'm not even blaming the person who wrote the University of Toronto press release [U of T researchers crack 'splicing code']. I'll even forgive the writers at Nature for failing to be skeptical [The code within the code] [Gene regulation: Breaking the second genetic code].

It's scientists who have to accept the blame for the way science is presented to the general public.
Frey compared his computer decoder to the German Enigma encryption device, which helped the Allies defeat the Nazis after it fell into their hands.

“Just like in the old cryptographic systems in World War II, you’d have the Enigma machine…which would take an instruction and encode it in a complicated set of symbols,” he said.

“Well, biology works the same way. It turns out to control genetic messaging it makes use of a complicated set of symbols that are hidden in DNA.”
Given the number of biological activities needed to grow and govern our bodies, scientists had believed humans must have 100,000 genes or more to direct those myriad functions.

But that genomic search of the 3 billion base pairs that make up the rungs of our twisting DNA ladders revealed a meagre 20,000 genes, about the same number as the lowly nematode worm boasts.

“The nematode has about 1,000 cells, and we have at least 1,000 different neuron (cells) in our brains alone,” said Benjamin Blencowe, a U of T biochemist and the study’s co-senior author.

To achieve this huge complexity, our genes must be monumental multi-taskers, with each one having the potential to do dozens or even hundreds of different things in different parts of the body.

And to be such adroit role switchers, each gene must have an immensely complex set of instructions – or a code – to tell them what to do in any of the different tissues they need to perform in.
I wish I had time to present a good review of the paper but I don't. Sorry.


Barash, Y., Calarco, J.A., Gao, W., Pan, Q., Wang, X., Shai, O., Blencowe, B.J., and Frey, B.J. (2010) Deciphering the splicing code. Nature 465:53–59. [doi:10.1038/nature09000] [Supplementary Information]

What's the Connection between Hpa II and CpG Islands?

 
Epigenetics is all the rage today but the idea that gene expression could be regulated by modifying DNA and/or chromatin has been around for three decades.

Methylation is one of the ways that DNA can be modified and methylation at specific sites can be heritable. This observation grew out of studies on restriction/modification systems where DNA is protected from the action of restriction endonucleases by methylating the bases.

I didn't realize that the study of restriction enzymes led to the discovery of methylated regions of eukaryotic DNA. Find out how by reading an interview with Adrian Bird in PLoS Genetics: On the Track of DNA Methylation: An Interview with Adrian Bird.

This is also a good example of chance and serendipity in science. You can't plan for this stuff to happen—but that doesn't prevent politicians and administrators from trying.


The Ribosome and the Central Dogma of Molecular Biology

 
The Nobel Prize website usually does an excellent job of explaining the science behind the prizes. The STRUCTURE AND FUNCTION OF THE RIBOSOME is a good explanation of why the 2009 Nobel Prize in Chemistry was awarded for work on the ribosome.

Unfortunately, the article begins by perpetuating a basic misunderstanding of the Central Dogma of Molecular Biology.
The ribosome and the central dogma. The genetic information in living systems is stored in the genome sequences of their DNA (deoxyribonucleic acid). A large part of these sequences encode proteins which carry out most of the functional tasks in all extant organisms. The DNA information is made available by transcription of the genes to mRNAs (messenger ribonucleic acids) that subsequently are translated into the various amino acid sequences of all the proteins of an organism. This is the central dogma (Crick, 1970) of molecular biology in its simplest form (Figure 1)

This is not the Central Dogma according to Crick (1970). I explain this in a posting from two years ago [Basic Concepts: The Central Dogma of Molecular Biology].

In both his original paper (Crick, 1958) and the 1970 update, Crick made it very clear that the Central Dogma of Molecular Biology is ....
The Central Dogma. This states that once “information” has passed into protein it cannot get out again. In more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein.
The diagram that's usually attributed to the central dogma is actually the Sequence Hypothesis. Crick was well aware of the confusion and that's why he wrote the 1970 paper. It was at a time when the so-called "Central Dogma" had been "overthrown" by the discovery of reverse transcriptase.

Since then the false version of the Central Dogma has been disproven dozens and dozens of times—it's a minor cottage industry.

Here's what Crick says about this false version of the Central Dogma in his 1970 paper—the one quoted at the top of this page.
It is not the same, as is commonly assumed, as the sequence hypothesis, which was clearly distinguished from it in the same article (Crick, 1958). In particular, the sequence hypothesis was a positive statement, saying that the (overall) transfer nucleic acid → protein did exist, whereas the central dogma was a negative statement saying that transfers from protein did not exist.
Let's try and get it right. It will have the great benefit of sparing us any more papers claiming to refute the Central Dogma of Molecular Biology!
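To make the distinction concrete, here's a minimal sketch of Crick's 1970 transfer table in Python (my own encoding, for illustration): the Central Dogma is only the prohibition on information leaving protein, so reverse transcription doesn't touch it.

```python
# Crick's (1970) classification of sequence-information transfers.
# "General" transfers occur in all cells; "special" transfers occur in
# special circumstances (e.g. RNA -> DNA by reverse transcriptase).
# The Central Dogma itself only forbids transfers OUT of protein.

general = {("DNA", "DNA"), ("DNA", "RNA"), ("RNA", "protein")}
special = {("RNA", "RNA"), ("RNA", "DNA"), ("DNA", "protein")}

def violates_central_dogma(source: str, destination: str) -> bool:
    """True only if sequence information would flow out of protein."""
    return source == "protein"

for transfer in [("RNA", "DNA"), ("protein", "RNA"), ("protein", "protein")]:
    verdict = "violates" if violates_central_dogma(*transfer) else "is allowed by"
    print(f"{transfer[0]} -> {transfer[1]} {verdict} the Central Dogma")
```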

It will also encourage critical thinking. Haven't you ever wondered why there is a Central Dogma when reverse transcriptase, splicing, epigenetics, post-translational modification, chromatin rearrangements, small regulatory RNAs, and just about everything else under the sun, supposedly refutes it?


Crick, F.H.C. (1958) On protein synthesis. Symp. Soc. Exp. Biol. XII:138-163.

Crick, F. (1970) Central Dogma of Molecular Biology. Nature 227, 561-563. [PDF file]

The Origin of Dachshunds

 
A draft sequence of the dog (Canis lupus familiaris) genome has been available for several years. One of the reasons for working with dog genes and genomes is the fact that there are many different breeds. Since these breeds differ genetically and morphologically, there's a distinct possibility that the genes for various characteristics can be identified by comparing variants from different breeds.

One of the exciting possibilities is that some interesting behavioral genes could be identified since many breeds of dog are loyal, easy to train, and intelligent.1

In addition to possible behavioral genes, one can identify many genes affecting morphology. One of them is the gene affecting short legs in various breeds, including dachshunds. Parker et al. (2009) identified an extra gene in short-legged breeds. The extra gene is a retrogene of the normal gene encoding fibroblast growth factor 4 (fgf4).

What is a retrogene? It's a derivative of the mature mRNA of a normal gene. Recall that most mammalian genes have introns and the primary transcript contains extra sequences at the two ends, plus exons that encode the amino acid sequence of a protein, plus intron sequences that separate the exons.

This primary transcript is processed to produce the mature messenger RNA (mRNA) that is subsequently translated by the translation machinery in the cytoplasm. During processing, the intron sequences are spliced out, a 5′ cap is added to the beginning of the RNA, and a string of "A" residues is added to the terminus (= poly A tail).


On rare occasions the mature mRNA can be accidentally copied by an enzyme called reverse transcriptase that converts RNA into single-stranded DNA. (The reverse of transcription, which copies DNA into RNA.) The single-stranded DNA molecule can be duplicated by DNA polymerase to make a double-stranded copy of the original mRNA molecule.

This piece of DNA may get integrated back into the genome by recombination. This is an extremely rare event but over the course of millions of years the genome accumulates many copies of such DNA sequences. In the vast majority of cases the DNA sequence is not expressed because it has been separated from its normal promoter. (Sequences that regulate transcription are usually not present in the primary transcript.) These DNA segments are called pseudogenes because they are not functional. They accumulate mutations at random and the sequence diverges from the sequence of the normal gene from which they were derived.

Sometimes the DNA copy of the mRNA happens to insert near a functional promoter and the DNA is transcribed. In this case the gene may be expressed and additional protein is made. Note that the new retrogene doesn't have introns so the primary transcript doesn't require splicing in order to join the coding regions (exons). The fgf4 retrogene inserted into the middle of a LINE transposable element and the LINE promoter probably drives transcription of the retrogene.
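Here's a toy sketch of the steps described above, with an invented gene structure, just to make the point that the reintegrated copy is intron-free and carries a poly(A) tract (T is written in place of U for simplicity):

```python
# Toy model of retrogene formation. The gene structure is invented purely
# to illustrate the steps described in the text.

gene = [
    ("exon",   "ATGGCC"),
    ("intron", "GTAAGT" + "..." + "TTTCAG"),  # introns begin with GT and end with AG
    ("exon",   "GGATTT"),
    ("intron", "GTACGT" + "..." + "CTGCAG"),
    ("exon",   "TTCTAA"),
]

# 1. Transcription and processing: splice out the introns, cap the 5' end,
#    and add a poly(A) tail (the cap isn't modeled here).
mature_mrna = "".join(seq for kind, seq in gene if kind == "exon") + "A" * 15

# 2. Reverse transcription and second-strand synthesis give a double-stranded
#    DNA copy of the mature mRNA: no introns, poly(A) tract still attached.
cdna = mature_mrna

# 3. If the cDNA reintegrates next to an active promoter (a LINE promoter,
#    in the case of fgf4), it can be transcribed as a retrogene; otherwise
#    it decays into a processed pseudogene.
print("retrogene sequence:", cdna)
print("still contains an intron?", "GTAAGT" in cdna)   # False
```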

The short-legged phenotype is probably due to inappropriate expression of the retrogene in the embryo in tissues that generate the long bones of the legs. The inappropriate expression of fibroblast growth factor 4 causes early calcification of cells in the growth plates—these are the cells that regulate extension of the growing bones. The result is short bones that are often curved.

Breeders selected for this anomaly and this is part of what contributed to the origin of dachshunds and other short-legged dogs.

There's a reason why dogs are such a good species for discovering the functions of many genes. It's because of the huge variety of different breeds. Is there a reason why the species has more morphological variation than other species of animals? Probably, but we don't know the reason. Here's how Parker et al. begin their paper.
The domestic dog is arguably the most morphologically diverse species of mammal and theories abound regarding the source of its extreme variation (1). Two such theories rely on the structure and instability of the canine genome, either in an excess of rapidly mutating microsatellites (2) or an abundance of overactive SINEs (3), to create increased variability from which to select for new traits. Another theory suggests that domestication has allowed for the buildup of mildly deleterious mutations that, when combined, create the variation observed in the domestic dog (4).
We still have a lot to learn about evolution.


[Photo Credit: Dog Gone Good]

1. You can see why working with the cat genome wouldn't be as productive.

Parker, H.G., Vonholdt, B.M., Quignon, P., Margulies, E.H., Shao, S., Mosher, D.S., Spady, T.C., Elkahloun, A., Cargill, M., Jones, P.G., Maslen, C.L., Acland, G.M., Sutter, N.B., Kuroki, K., Bustamante, C.D., Wayne, R.K., and Ostrander, E.A. (2009) An Expressed Fgf4 Retrogene Is Associated with Breed-Defining Chondrodysplasia in Domestic Dogs. Science. 2009 Jul 16. [Epub ahead of print] [PubMed] [doi: 10.1126/science.1173275]