The impatient London media jumped the gun once again on February 11, 2001, by breaking the embargo on the biggest scientific story of the year—the one everyone knew. However, if the planned commencement ceremony for the Human Genome Project fizzled in the face of journalistic hubris, the findings of the project promise to change the face of medicine and biomedical science in the decades to come.
The completed sequence from Craig Venter’s Celera Corporation appears in the February 16, 2001 issue of Science, and the sequence from the public consortium appears in the February 15, 2001 issue of Nature. However, the 2 teams had planned joint news conferences in Washington, DC on February 12, 2001 to ensure that neither could claim first publication. Although Dr Venter and leaders of the public effort sought to claim the “star” position, it was the genome that took center stage.
That is why the planned ceremonies were a “commencement,” as it were. With the tool of the draft genome in hand, the understanding of human heritage and human future begins in earnest. The completion of the genome is merely the beginning of the quest for the genes that guide human development and the mutations that take it off track.
However, the revealed genome raises many questions and answers few. In an article in the February 15, 2001 issue of Nature, David Baltimore, PhD, president of the California Institute of Technology, wrote: “I’ve seen a lot of exciting biology emerge over the past 40 years. But chills still ran down my spine when I first read the paper that describes the outline of our genome” (Nature. 2001;409:814–816).
One of the most interesting features to Dr Baltimore is the number of genes, which is estimated at ≈31 000 by the public project and is far fewer than the guess of 100 000 hazarded when the project began in the late 1980s. It is startling and perhaps a bit deflating in light of the fact that there are 6000 genes in the yeast genome, 13 000 in that of Drosophila, 18 000 in that of the worm, and 26 000 in a plant. However, “unless the human genome contains a lot of genes that are opaque to our computers, it is clear that we do not gain our undoubted complexity over worms and plants by using many more genes. Understanding what does give us our complexity—our enormous behavioral repertoire, ability to produce conscious action, remarkable physical coordination (shared with other vertebrates), precisely tuned alterations in response to external variations of the environment, learning, memory … remains a challenge for the future” said Dr Baltimore.
Dr Baltimore is also intrigued by the notion that only 94 of the 1278 protein families in the human genome appear to be specific to vertebrates. “The most elementary of cellular functions—basic metabolism, transcription of DNA into RNA, translation of RNA into protein, DNA replication, and the like—evolved just once and have stayed pretty well fixed since the evolution of single-celled yeast and bacteria. The biggest difference between humans and worms or flies is the complexity of our proteins: more domains (modules) per protein and novel combinations of domains. The history is one of new architecture being built from old pieces.”
The natural question in light of those findings is what in the genome makes a human? However, Dr Baltimore writes, “The most exciting new vista to come from the human genome is not tackling the question ‘What makes us human?’ but addressing a different one: ‘What differentiates one organism from another?’ The first question, imprecise as it is, cannot be answered by staring at the genome. The second, however, can be answered this way because our differences from plants, worms, and flies are mainly a consequence of our genetic endowments.”
What was once billed as a race of profit against public ended in an agreed-on tie. Although Celera claimed victory several times, it was impossible to ignore that its knowledge base was built on what had gone before and on what was published on the Internet by the public international consortium. And the international consortium, separated by thousands of miles and even oceans, was hampered at first by the unwieldy design of its own collaboration. Only when it had pared down the number of sites and concentrated resources was it able to match the speed of Celera. The work of the Sanger Institute in the United Kingdom, Washington University in St Louis, the Whitehead Institute at the Massachusetts Institute of Technology, and the Baylor College of Medicine Genome Center in Houston, Texas, hastened the project toward the end.
The publications in the 2 journals revealed little because the completion of the genome was actually claimed in a White House ceremony last summer, and much of the consortium’s data has been publicly available for years. In addition, the completion of the genome was less a triumph of science than of technology. When the project was first proposed 15 years ago, sequencing DNA was a laborious hand operation that made the notion of determining the order of 3 billion bases a dream. The biological project was envisioned then as single scientists pursuing small projects in their own laboratories with trained scientists working with them. In 1988, however, the National Institutes of Health announced that it would create a special office for genome research and then said it would be headed by James Watson, PhD, the same man who, along with Francis Crick, PhD, published the seminal 1953 article on the structure of DNA. With that, the project began. The challenge of Dr Venter in the late 1990s only spurred the contest along.
The public consortium’s version of the genome is available freely on the Internet. The Celera version is subject to some restrictions. In an article accompanying the publication of the genome itself, Science editors attest: “Science’s standing policy is that when a paper is published, archival data relevant to its results or methods must be deposited in a publicly accessible database. In compliance with our policy, the entire sequence is available free of charge from Celera’s website. All researchers, whether academic or commercial, may access the human genome data to verify, replicate, or challenge findings published in the Venter et al paper” (Science. 2001;291:1304–1351).
However, academic users may search the sequence and download segments of up to 1 megabase per week. They may publish their results and seek intellectual property protection if they agree to use the data for research only and not to distribute it. Those in academia who want to use longer stretches of the sequence can receive an electronic copy of the Celera data only if they submit a statement, cosigned by a representative of their institution, that the data will not be distributed and will be used for research purposes only. Commercial users can use the data only if they execute a Material Transfer Agreement, subscribe to the data for a fee, or seek a license from Celera. Science keeps a copy of the database in escrow in case there are changes in the access granted to the public.
In its statement released with the data, the public consortium states: “The human genome, the common heritage of all humanity, is arguably the most valuable dataset the biomedical research community has ever known. It holds long-sought secrets of human development, physiology, and medicine. The highest priority of the International Human Genome Sequencing Consortium is ensuring that sequencing data from the human genome are available to the world’s scientists rapidly, freely, and without restriction.” The data from the Human Genome Project have been deposited in 3 publicly available databases every 24 hours for the past 5 years. Those databases are located at http://www.ncbi.nlm.nih.gov/Genbank/; http://www.ebi.ac.uk/embl/index.html; and http://www.ddbj.nig.ac.jp/.
However, the challenge lies less in accessing the data than in the finishing touches that will be applied to the final versions of the sequences in the next 2 to 3 years. Then the real work begins. The landscape of the genome is puzzling, with no more than 1.5% of it coding for proteins. The source and function of the intervening DNA remain a mystery. The so-called “coding sequences” huddle together in some places, and there are stretches almost devoid of meaning in terms of ordering up the proteins that carry out the tasks of life. In some areas, DNA seems almost to stutter in repeat sequences for reasons that are yet to be delineated.
Already, >1000 disease-causing gene mutations have been identified, and the list is certain to grow more quickly because of the tools the completed sequence provides. Access to the genome data has so far resulted in the identification of at least 30 new disease genes. Yet identifying more disease-causing mutations will not prove easy, and applying the information from the genome to treatments for diseases as diverse as addiction and cancer will prove even more challenging. It is one thing to know what causes disease and another to know how to convert that information into action.
The as-yet incomplete sequencing of model organisms such as mice and rats will make the annotation that defines the genes and other landmarks of the genome even easier. These animal sequences are crucial to a fuller understanding of the human genome sequence. The polished version of the human genome will probably be complete in the year 2003, a mere 50 years after the publication of the seminal paper by James Watson and Francis Crick describing the double helix that was the structure of DNA and the genesis of so much of this activity.
Copyright © 2001 by American Heart Association