E. coli Genome Reported: Milestone of Modern Biology Emerges From Laboratory of Genetics
A team of scientists headed by Frederick Blattner of the E. coli Genome Project in the Laboratory of Genetics at UW–Madison has determined the complete genome sequence of the E. coli bacterium, it was reported in the Sept. 5 issue of the journal Science.
A genome is the sum total of the genes of an organism. Genes are encoded in the sequence of chemical base pairs that make up the intertwining strands of DNA. In the case of E. coli, a total of 4,403 genes have been identified in the 4,639,221 base pairs of DNA sequenced by the Wisconsin team. Of these, one-third are of completely unknown function.
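To put those figures in proportion, the short Python sketch below works out the average stretch of DNA per gene and what "one-third" amounts to in absolute terms. It uses only the two numbers quoted above; the variable names are illustrative, and the per-gene average includes the DNA between genes, so it should not be read as a typical gene length.

```python
# Figures quoted in the article.
GENES = 4_403
BASE_PAIRS = 4_639_221

# Average stretch of DNA per gene. This includes non-coding DNA between
# genes, so it overstates the length of a typical gene.
bp_per_gene = BASE_PAIRS / GENES

# "One-third ... of completely unknown function", taken literally.
unknown_genes = GENES / 3

print(f"roughly {bp_per_gene:.0f} base pairs per gene on average")
print(f"roughly {unknown_genes:.0f} genes of unknown function")
```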
E. coli holds a unique place in modern biology. It is arguably the single most studied cell in all of science. Humans have about 25 times as many genes as E. coli, but in the future a similar complete analysis will be possible for human DNA. For this reason E. coli is considered a model organism in the Human Genome Initiative of the National Institutes of Health (NIH).
For more than 70 years, Escherichia coli has been a mainstay of basic biology, and recent developments in biotechnology and genetic engineering have depended heavily on it. Related strains of E. coli are also responsible for several human diseases. Although it is not the first bacterial genome to be completed, the E. coli genome is by far the most complex and the most eagerly awaited by scientists around the world.
“Determination of the complete inventory of the genes of organisms is one of the holy grails of biology, analogous to development of the periodic table of the elements in chemistry,” said Blattner. “Once they are all known and relationships between them become evident, a classification system for understanding the basic functions of life can be erected.”
E. coli’s natural habitat is the lower intestinal tract of animals, including humans. Originally isolated in 1922 from a convalescent diphtheria patient, the strain of E. coli sequenced by Blattner’s team rose to prominence as an experimental organism in 1945, when it was used in the discovery of spontaneous gene transfer, or bacterial sex. As a result, the strain, known as K-12, was universally adopted for fundamental work in biochemistry, genetics and physiology. In recent years, it has become the workhorse of biotechnology and is used as a living factory to produce human insulin and other medicines.
The most important result of the work reported today is the sequence itself, said Blattner. In January, the data were made freely available to scientists worldwide through on-line databases such as GenBank. The E. coli genome is a large one, and describing it in detail in Science required an additional nine months. With more than 4.6 million bases, it is two or three times bigger than the genomes of other bacteria sequenced to date.
Sequencing of the base pairs that make up DNA is analogous to deciphering a language. It is done with the aid of specialized chemical analysis machines, but can only be accomplished with considerable human effort. More than 269 people – including many undergraduates getting their first taste of science – participated in the project at UW–Madison.
The individual chemical bases that make up the genome correspond to the letters of the genetic alphabet which, grouped into words and paragraphs corresponding to genes, are read by the living cell as the instructions for assembly and function of all of life’s processes.
Knowledge of the genetic code, the product of a major effort of modern biology, permits the scientist to translate the instructions for the purpose of understanding life processes, Blattner said. Knowing the precise order of the chemical base pairs for an entire genome allows the encoded life program to be read in its entirety, leading, in principle, to a very complete level of understanding of physiological processes.
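The idea of reading DNA "letters" as "words" can be illustrated with a minimal Python sketch. It assumes the standard genetic code, in which each three-letter codon specifies one amino acid or a stop signal. Only a handful of the 64 codons are included, and both the `translate` helper and the example sequence are hypothetical; the sequence is not taken from the E. coli genome.

```python
# A few entries of the standard genetic code (DNA codon -> one-letter
# amino acid symbol). The full table has 64 codons; this subset is only
# large enough for the made-up example below.
CODON_TABLE = {
    "ATG": "M",  # methionine, also the usual start signal
    "AAA": "K",  # lysine
    "GGC": "G",  # glycine
    "TTT": "F",  # phenylalanine
    "TGG": "W",  # tryptophan
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop signals
}


def translate(dna: str) -> str:
    """Read DNA three letters at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Raises KeyError for any codon missing from the subset above.
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 18-letter sequence, chosen only for illustration.
example = "ATGAAAGGCTTTTGGTAA"
print(translate(example))
```

A real analysis would use the complete codon table and would first have to locate where each gene starts and stops, which is a large part of the annotation work the article describes.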
The report published in this issue of Science is a global analysis of the data collected by Blattner’s team in collaboration with Monica A. Riley of the Marine Biological Laboratory in Woods Hole, Mass., and Julio Collado-Vides of the National Autonomous University of Mexico at Cuernavaca.
The report, first and foremost, represents a record of the genes that make up the genome of the organism, and the establishment, where possible, of their functions. A surprising number of the genes, Blattner said, are new.
The work also details the similarity between every gene of E. coli and every gene of every other completely sequenced organism. The comparison, according to Blattner, shows that some genes appear commonly throughout nature while others are unique to E. coli. Such information is essential to any understanding of how E. coli and other bacteria have evolved, and what genes are required at a minimum to create life.
In addition to the base order of the chemical building blocks that make up the E. coli genome, and a better sense of its evolution and relationship to other organisms, the work of Blattner’s team has yielded a lode of new information about the organization of E. coli genes and how the information stored there is distributed.
It was noticed, too, that some of the DNA may have been added within the recent evolutionary history of the microbe. This immigrant DNA, said Blattner, is seemingly related to the genes of bacteria that cause disease, fueling speculation that the K-12 strain of E. coli has relics of a pathogenic past or, alternatively, is a pathogen waiting to happen.
The E. coli strain used in the Wisconsin study does not cause disease, but related strains are toxic and have been implicated in an increasing number of human food poisonings from products ranging from ground beef to unpasteurized apple juice to fecally contaminated lettuce. With the K-12 E. coli genome in hand, Blattner said, it will soon be possible to make a gene-by-gene comparison with its pathogenic relatives and illuminate genes that govern the toxic nature of the bacteria.
The sequencing of the E. coli genome, said Blattner, was a necessary precursor to the sequencing of the human genome, now underway as part of the Human Genome Project under the direction of the National Human Genome Research Institute (NHGRI) of NIH. When scientists achieve this monumental goal, they will begin the daunting task of reading and understanding all of our protein-coding genes. They will accomplish this task, in part, by searching databases to find conserved biological motifs, first elucidated using simple model organisms like bacteria, yeast, worms and flies. By decoding the human genome, scientists can begin to decipher the genetic aspects of all disease, leading to improved treatments and even cures.
The NHGRI, a component of NIH, is a major partner in the Human Genome Project, the international research effort to map the estimated 50,000 to 100,000 human genes and to read the complete set of genetic instructions encoded in human DNA. NHGRI also supports research on the application of genome technologies to the study of inherited disease, as well as the ethical, legal and social implications of this research.
While primary funding for the E. coli work came from the NHGRI, critical equipment was provided by the Division of Research Resources of the NIH. The WISTAR program of the State of Wisconsin provided substantial remodeling funds to create the E. coli Genome Center, and research support was also provided by Genome Therapeutics Inc., SmithKline Beecham Inc., DNASTAR Inc., and IBM.