Horizontal Gene Transfer

Horizontal Gene Transfer (HGT) is the process of acquiring DNA from other organisms in the environment rather than from parental cells. Genome sequencing has revealed that an unexpectedly high proportion of a bacterial genome can consist of genes derived from non-parental cells. Perhaps 20-30% of the genes of E. coli K-12 were acquired through HGT.


Horizontal gene transfer (HGT) is also called lateral gene transfer (LGT) or horizontal transmission (HT); the terms are used interchangeably in EcoGene. The genes themselves are called horizontally or laterally transferred (or transmitted) genes, foreign genes, or alien genes. They can be intragenomic homologs of native genes, but such homologs are xenologs rather than paralogs of the native genes (Koonin, 2001). We refer to them as HT genes, for Horizontally Transferred genes.

The HT Gene Set

The 1053 K-12 MG1655 genes identified as HGT genes in Davids (2008), reduced to 1006 genes as explained in the HT1006 subTopic text, are the current basis for annotating HT genes in EcoGene. These 1006 HT genes are further reduced to a set of 940 by removing 8 IS genes and pseudogenes and 58 prophage genes (including 11 pseudogenes), which are HT genes classified separately and more specifically under the prophage and IS element subTopics. These 940 predicted HT genes include 911 intact HT genes and 28 HT pseudogenes. However, the authors were unaware that they were examining pseudogenes, so these 28 pseudogenes are not labeled as HT genes until they are studied further and are available in a subTopic.

EcoGene currently annotates 911 genes as HT genes.

Users should evaluate the HT and Common Core predictions heuristically to assess their validity. Continuity is one way to assess validity, so the HT and Core designations are indicated in the prophage/IS region of the EcoMaps, where blocks of HT and Core genes can be inspected for inconsistencies. As this is done within EcoGene, a curated form of these predictions will evolve, with a log file of changes to the HT designations. The changes will draw on heuristic and bioinformatic analyses done in house and taken from the literature, incorporating predictions based on different approaches into a more consistent and accurate set of predictions.

The aga operon provides an example.

Horizontal gene transfer of foreign DNA into the E. coli K-12 genome can be facilitated by any mechanism that leads to DNA import and recombination. Plasmids and phages contain foreign genes themselves and may also have facilitated the uptake and recombination of neighboring genes. Prophage-encoded genes are foreign by definition. Pathogenicity islands, though not present in K-12, are examples of plasmid-derived or mobilized foreign gene clusters that confer virulence, as can prophages. Plasmid-mediated mating and phage-mediated transduction are well studied in E. coli K-12, which also has a competence system for DNA uptake (Palchevskiy, 2006). Inter-species plasmid-independent spontaneous zygogenesis (Z-mating) has also been reported (Gratia, 2007). Other mechanisms are conceivable, such as natural electroporation for direct DNA uptake or fusion with natural DNA-containing micelles.

Once foreign DNA has been incorporated, it is under the same selective pressure as host DNA and in time will mutate to resemble the host genome in nucleotide composition, codon usage, and the genome signature of dinucleotide patterns, a process termed amelioration. It was originally estimated that E. coli K-12 takes up foreign DNA at a rate of 31 kb per million years, indicating that over 3 megabases of foreign DNA have been gained and lost since divergence from Salmonella 100 million years ago (Lawrence, 1997). Foreign gene loss due to lack of selective advantage presumably can enrich for pseudogenes and gene fusions.
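The arithmetic behind the 3 megabase figure can be sketched in a few lines. This is only an illustration of the estimate quoted above: it assumes a constant uptake rate over the whole period, and the function and variable names are made up for the example.

```python
# Figures quoted in the text (Lawrence, 1997); names are illustrative only.
UPTAKE_RATE_KB_PER_MYR = 31   # foreign DNA taken up per million years
DIVERGENCE_TIME_MYR = 100     # time since divergence from Salmonella


def cumulative_uptake_kb(rate_kb_per_myr, elapsed_myr):
    """Total foreign DNA taken up, assuming a constant rate (an assumption)."""
    return rate_kb_per_myr * elapsed_myr


total_kb = cumulative_uptake_kb(UPTAKE_RATE_KB_PER_MYR, DIVERGENCE_TIME_MYR)
total_mb = total_kb / 1000    # 1 Mb = 1000 kb
print(f"{total_kb} kb = {total_mb} Mb")
```

The result, 3100 kb, is the "over 3 megabases" quoted in the estimate. It counts DNA that has passed through the genome, not DNA that is still present.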

Prediction Methods

Foreign genes have been predicted using genome-specific patterns that can be used for identification purposes. These include distinctive patterns of DNA sequence composition, codon usage (Medigue, 1991; Whittam, 1992; Ochman, 1996), and a genome signature based on dinucleotide frequencies (Campbell, 1999). Genes whose patterns deviate significantly from those of the genome as a whole can include highly expressed genes as well as foreign genes.
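A dinucleotide-based comparison of this kind can be sketched as follows. The sketch uses one common formulation, the ratio of each observed dinucleotide frequency to the product of its two base frequencies, pooled over both strands, with the mean absolute difference as a distance. It is not necessarily the exact procedure of any cited study, and the function names and toy sequences are assumptions made for the example.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def dinucleotide_signature(seq):
    """Map each dinucleotide XY to f(XY) / (f(X) * f(Y)), pooled over both strands."""
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):
        mono.update(strand)
        di.update(strand[i:i + 2] for i in range(len(strand) - 1))
    # Only A/C/G/T are counted; ambiguous symbols such as N are ignored.
    n_mono = sum(mono[b] for b in BASES)
    n_di = sum(di[d] for d in DINUCLEOTIDES)
    if n_di == 0:
        raise ValueError("sequence needs at least one A/C/G/T dinucleotide")
    signature = {}
    for d in DINUCLEOTIDES:
        expected = (mono[d[0]] / n_mono) * (mono[d[1]] / n_mono)
        # A base absent from the sequence gives expected == 0; report 0.0 then.
        signature[d] = (di[d] / n_di) / expected if expected else 0.0
    return signature


def signature_distance(seq_a, seq_b):
    """Mean absolute difference between two signatures over the 16 dinucleotides."""
    sig_a = dinucleotide_signature(seq_a)
    sig_b = dinucleotide_signature(seq_b)
    return sum(abs(sig_a[d] - sig_b[d]) for d in DINUCLEOTIDES) / len(DINUCLEOTIDES)


# Toy sequences, not real genes: an AT-rich "gene" against a mixed "genome".
genome_like = "ATGCGTACGTTAGCCGATCGGATCCGTAAGCTTGCA" * 5
gene_like = "ATATTTAAATATTAATTTAAATAT" * 5
print(round(signature_distance(gene_like, genome_like), 3))
```

In practice the gene or a sliding window would be compared against the whole genome, and any cutoff for calling a deviation significant would need calibration. As noted above, a large distance does not by itself distinguish a foreign gene from a highly expressed native one.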

Foreign genes are enriched among the set of genes in E. coli K-12 that do not have orthologs in Salmonella typhimurium LT2, and this has been used to predict the extent of HGT.
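The ortholog comparison amounts to a simple filter over a gene list. The following is a minimal sketch that assumes an ortholog table has already been computed by some other method. The gene names and the table are hypothetical placeholders, not real annotations.

```python
def genes_without_orthologs(genes, orthologs):
    """Return genes that have no ortholog recorded in the comparison genome."""
    return [g for g in genes if not orthologs.get(g)]


# Hypothetical toy data: placeholder gene names and an ortholog table
# mapping each query gene to its ortholog in the comparison genome (or None).
k12_genes = ["geneA", "geneB", "geneC", "geneD"]
orthologs_in_lt2 = {"geneA": "orthA", "geneB": None, "geneC": "orthC"}

candidates = genes_without_orthologs(k12_genes, orthologs_in_lt2)
fraction = len(candidates) / len(k12_genes)
print(candidates, f"{fraction:.0%}")
```

The output is a candidate list only. A gene can lack a Salmonella ortholog because it was lost in the Salmonella lineage, so the set is enriched for foreign genes rather than composed entirely of them.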

Another method to predict an HGT event is to examine a phylogenetic tree for each gene. Foreign genes will not cluster with homologs from closely related organisms but, presumably, with homologs from organisms close to the source organism.

Several analyses combine multiple predictors. Combination generally improves results, but including weak predictors can reduce performance, so careful evaluation and integration are required.

More or Less

Many methods have been used to predict the extent of HGT in E. coli and the fraction of foreign genes. The estimates range from a few hundred to over a thousand foreign genes in the E. coli K-12 genome. This range reflects uncertainty in the identification of foreign genes, and as a consequence there are differing views on the evolutionary significance of HGT.

Foreign genes become ameliorated to the host patterns over time and become more difficult to predict. The process of amelioration limits detection to recent events. Genes inherited via HGT prior to speciation from Salmonella may be so ameliorated as to be indistinguishable from genes of ancient lineage. Thus even the highest predictions may be low estimates of the extent of the contribution of foreign genes to E. coli K-12 evolution and the acquisition of new function. HGT has been proposed to be a pervasive paradigm of genome evolution (Doolittle, 1999; Gogarten, 2002).

It is also possible that there are only a few hundred recent foreign genes and that HGT played a larger role very early in evolution than since divergence from Salmonella. Divergent paralogous evolution, the limits of alignment algorithms, the purging of non-adaptive genes, and alternative explanations for sequence pattern deviations suggest HGT may not be the major source of modern gene variation (Kurland, 2003).

It may be useful to consider that the often extensive IS element-encoded and prophage-encoded foreign genes have inheritance and adaptive mechanisms that are distinct within HGT. Since IS elements amplify within a genome, counting every transposase gene copy as a separate product of LGT may yield an overestimate.

The selfish operon model proposes that co-functional genes are clustered into operons and coinherited during HGT, leading to genome diversification and operon enrichment (Lawrence, 1996; 1999). The selfish operon model has been challenged on the grounds that presumably non-LGT essential genes cluster strongly (Pal, 2004). A lack of enrichment for foreign genes among recent operons also seems to contradict the selfish operon model (Price, 2005). However, a more recent analysis indicates that HGT may be the major mechanism for operon gain (Homma, 2007).

Non-PubMed Reviews

Ochman, H. & Lawrence, J.G.,
Phylogenetics and the amelioration of bacterial genomes,
in: Neidhardt F.C., Curtiss R., Ingraham J.L., Lin E.C.C., Low K.B., Magasanik B., Reznikoff W.S., Riley M., Schaechter M., Umbarger H.E. (Eds.),
Escherichia coli and Salmonella: Cellular and Molecular Biology, ASM Press, Washington, D.C., 1996, pp. 2627-2637.

Whittam, T.S. & Ake, S.,
Genetic polymorphisms and recombination in natural populations of Escherichia coli,
in: Takahata, N., Clark, A.G. (Eds.),
Mechanisms of Molecular Evolution, Japan Scientific Society Press, Tokyo, 1992, pp. 223-246.