Phylogenomics - A short introduction

Our knowledge of DNA and protein sequences is far more advanced than our knowledge of the resulting biological and biochemical functions. This gap is due to the rapid progress in genome sequencing: 81 genomes have already been completely sequenced and 437 genome projects are in progress (March 2002). Researchers are therefore frequently forced to assign biological functions on the basis of sequence homology alone, i.e. by inferring a similar biological function from similarities between sequences. As the sequence databases grow, an increasing number of assignments are based on homology to sequences whose functions have themselves been assigned only tentatively, on the basis of homology to still other sequences.

For example, suppose the biological function of a sequence in organism A is known. The first inference is that if organism B has a similar sequence, this sequence codes for a similar biological function. But when organism C has a sequence that is more similar to the one in B than to the one in A, researchers sometimes assume that the sequence of C still codes for the function of A. Such an assumption can be completely wrong, as shown below.
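The pitfall of such chained ("transitive") annotation can be made concrete with a few lines of Python. This is only a sketch under stated assumptions: the three sequences are invented, pre-aligned toy strings (not real proteins), percent identity is used as a crude stand-in for the similarity scores a real search tool would report, and the 50% cutoff is a hypothetical value chosen purely for illustration.

```python
from itertools import combinations

# Invented, pre-aligned toy sequences (equal length, no gaps); not real proteins.
SEQS = {
    "A": "MKTLLVAGHEQ",  # function assumed to be experimentally known
    "B": "MKTLLIAGHDR",  # annotated by similarity to A
    "C": "MRSLLIVGNDR",  # annotated by similarity to B
}

# Hypothetical identity cutoff for "similar enough to transfer a function".
THRESHOLD = 50.0


def percent_identity(s1: str, s2: str) -> float:
    """Percentage of identical positions in two aligned, equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(a == b for a, b in zip(s1, s2))
    return 100.0 * matches / len(s1)


def chain_steps_pass(chain, seqs, threshold):
    """True if every consecutive step of an annotation chain meets the cutoff."""
    return all(
        percent_identity(seqs[x], seqs[y]) >= threshold
        for x, y in zip(chain, chain[1:])
    )


if __name__ == "__main__":
    for x, y in combinations(SEQS, 2):
        print(f"{x}-{y}: {percent_identity(SEQS[x], SEQS[y]):.1f}% identical")
    print("each step A, B, C passes:", chain_steps_pass(["A", "B", "C"], SEQS, THRESHOLD))
    print("C vs A directly passes:", percent_identity(SEQS["A"], SEQS["C"]) >= THRESHOLD)
```

Running the script prints identities of 72.7% (A-B), 36.4% (A-C) and 63.6% (B-C): every link of the chain passes the cutoff, while a direct comparison of C with A, the sequence whose function is actually known, does not.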

A good strategy to avoid such pitfalls is to use phylogenetic trees. Constructing a phylogenetic tree places a protein of interest in the context of its relationships to other proteins and may allow researchers to draw conclusions about its biological function that would not otherwise be apparent.

The term "phylogenomics" was coined by Jonathan A. Eisen (1), who postulated that evolutionary analysis of genes improves functional predictions for uncharacterized proteins. The idea is simple: because the functions of genes and proteins change over time as a result of evolution, reconstructing the evolutionary history of genes should facilitate functional predictions for proteins of unknown function.
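The contrast between annotating by the single most similar sequence and annotating by position in a tree can be sketched in Python. Every detail here is an assumption for illustration, not Eisen's procedure: the tree is built with UPGMA (chosen only because it is short to write; it assumes a constant rate of evolution and is generally not recommended for real data), the distance matrix is invented, and the prediction rule ("take the function shared by all characterized members of the smallest clade containing the query; report nothing if they disagree") is a hypothetical simplification. The helper names `upgma`, `predict_function` and `top_hit_function` are likewise hypothetical.

```python
from itertools import combinations

# Invented pairwise distances for illustration only (not real data).
# P1/P2 and Q1/Q2 are two hypothetical subfamilies; U is uncharacterized.
RAW = [
    ("P1", "P2", 0.10), ("Q1", "Q2", 0.10),
    ("P1", "Q1", 0.60), ("P1", "Q2", 0.60), ("P2", "Q1", 0.60), ("P2", "Q2", 0.60),
    ("U", "P1", 0.32), ("U", "P2", 0.34), ("U", "Q1", 0.30), ("U", "Q2", 0.70),
]
DIST = {frozenset((a, b)): d for a, b, d in RAW}
NAMES = ["P1", "P2", "Q1", "Q2", "U"]
KNOWN = {"P1": "X", "P2": "X", "Q1": "Y", "Q2": "Y"}  # hypothetical functions


def upgma(names, dist):
    """Build a rooted UPGMA tree as nested tuples (leaves are strings).

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    """
    sizes = {name: 1 for name in names}     # cluster id -> number of leaves
    trees = {name: name for name in names}  # cluster id -> subtree
    d = dict(dist)                          # working copy, extended as clusters merge
    next_id = 0
    while len(sizes) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        new = f"_node{next_id}"
        next_id += 1
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # UPGMA rule: distance to the merged cluster is the size-weighted average.
        for c in sizes:
            d[frozenset((new, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[new] = size_a + size_b
        trees[new] = (trees.pop(a), trees.pop(b))
    (root,) = sizes
    return trees[root]


def leaves(tree):
    """All leaf labels below a (sub)tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def clades_containing(tree, query):
    """Subtrees containing `query`, from the leaf itself up to the root."""
    if isinstance(tree, str):
        return [tree] if tree == query else []
    for child in tree:
        path = clades_containing(child, query)
        if path:
            return path + [tree]
    return []


def predict_function(tree, query, known):
    """Hypothetical rule: use the smallest clade that holds a characterized member."""
    for clade in clades_containing(tree, query)[1:]:  # skip the query leaf itself
        funcs = {known[leaf] for leaf in leaves(clade) if leaf in known}
        if funcs:
            return funcs.pop() if len(funcs) == 1 else None  # None = ambiguous
    return None


def top_hit_function(query, known, dist):
    """Naive annotation: copy the function of the single closest characterized sequence."""
    best = min(known, key=lambda k: dist[frozenset((query, k))])
    return known[best]


if __name__ == "__main__":
    tree = upgma(NAMES, DIST)
    print("tree:", tree)
    print("top-hit prediction for U:", top_hit_function("U", KNOWN, DIST))
    print("tree-based prediction for U:", predict_function(tree, "U", KNOWN))
```

With these invented distances U is individually most similar to Q1, so top-hit annotation assigns it function Y. UPGMA, however, groups U with the P1/P2 subfamily, giving the tree (('Q1', 'Q2'), ('U', ('P1', 'P2'))), and the tree-based rule predicts X. Real analyses would use better tree-building methods and an explicit treatment of gene duplications.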

Even if the function of a protein is known, performing a phylogenomic analysis during gene annotation offers some distinct advantages:
 

A final example shows the usefulness of phylogenetic methods in the comparative analysis of protein sequences: the first draft report of the human genome claimed 113 cases of direct horizontal gene transfer between bacteria and vertebrates (4). However, a phylogenetic analysis of 28 sequences showed that this is not the case (5).
    (1) Eisen, J.A. 1998. Genome Res. 8: 163-167.
    (2) Ferretti, J.J., et al. 2001. PNAS 98: 4658-4663.
    (3) Mittenhuber, G. 2002. J. Mol. Microbiol. Biotechnol. 4: 77-91.
    (4) International Human Genome Sequencing Consortium. 2001. Nature 409: 860-921.
    (5) Stanhope, M.J., et al. 2001. Nature 411: 940-944.