Phylogenomics
- A short introduction
Our knowledge about DNA and protein
sequences is far more advanced than our knowledge about the resulting
biological and biochemical functions. This is due to the rapid progress
in genome sequencing: 81 genomes are already completely sequenced and 437
genome projects are in progress (Mar. 2002). Therefore, researchers are
frequently forced to assign biological functions on the basis of
sequence homology alone (inferring a similar biological function from similarities
in the sequences). As the sequence databases grow larger, an increasing
number of assignments are based on homology to sequences whose functions
have been assigned only tentatively, based on homology to still other sequences.
For example, suppose the biological function
of a sequence in organism A is known. The first inference is that if organism
B has a similar sequence, then this sequence codes for a similar biological
function. But when organism C has a sequence that is more similar
to that of B than to that of A, researchers sometimes assume that the sequence
of C still codes for the function of A. Such an assumption can be completely wrong,
as shown below.
A good strategy to avoid such pitfalls
is the use of phylogenetic trees. The construction of phylogenetic trees
provides information about a protein of interest in terms of its relationship
to other proteins and may allow conclusions to be drawn about its biological
functions that would not otherwise be apparent.
The term "phylogenomics" was coined
by Jonathan A. Eisen (1);
he postulated that evolutionary analysis of genes improves functional predictions
for uncharacterized proteins. The basis of this idea is simple: because
the functions of genes or proteins change over time as a result of evolution,
the reconstruction of the evolution of genes should facilitate functional
predictions for proteins with unknown functions.
Even if the function of a protein is
known, performing a phylogenomic analysis during the annotation of genes
offers some distinct advantages:
Phylogenomic methods can detect not only
errors associated with the annotation, but also false interpretations
of genomic sequences. For example, during annotation of the
Streptococcus pyogenes genome, it was stated that
S. pyogenes possesses a gene
similar to rpoE, a heat shock sigma factor of Escherichia coli (2).
However, a phylogenomic analysis showed that the rpoE
gene of S. pyogenes
encodes the delta subunit of RNA polymerase (3).
Phylogenetic methods allow the visualisation
and definition of large protein families as well as categorization of subfamilies.
Conclusions about the origin of orthologous
proteins (homologs separated by a speciation event, which typically have the
same function in different organisms) and of paralogous
proteins (homologs separated by a gene duplication, which often have different
functions in the same organism) can be obtained
by the application of phylogenetic methods. As a consequence, examples
of convergent evolution can be detected.
A gene tree, showing the relatedness between
a set of similar genes, can be compared to a species tree, which displays
the relatedness between a set of species.
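One simple way to make such a comparison concrete is to list the groupings (clades) that each tree implies and look for those that appear in the gene tree but not in the species tree. The following is only a minimal sketch under strong assumptions: both trees are rooted, each species contributes exactly one gene, and the trees are written by hand as nested tuples. The species labels A to D and both topologies are invented for illustration.

```python
def clades(tree):
    """Collect every clade of a rooted tree as a frozenset of leaf names.

    A tree is a nested tuple; a leaf is a plain string (hypothetical toy format).
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        # An internal node groups all leaves found below its children.
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


# Invented example: the gene tree groups A with C, the species tree A with B.
species_tree = (("A", "B"), ("C", "D"))
gene_tree = (("A", "C"), ("B", "D"))

# Clades supported by the gene tree but absent from the species tree.
conflicts = clades(gene_tree) - clades(species_tree)
for clade in sorted(conflicts, key=sorted):
    print("conflicting clade:", sorted(clade))
```

A conflicting clade does not say by itself what happened; it only flags a gene whose history may differ from that of the species, for instance because of gene duplication, or because the tree was reconstructed incorrectly.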
Phylogenetic
trees can be constructed for selected regions only. This "masking" is used
to exclude regions that are not relevant to the investigation. For example,
a large deletion in a gene of a certain species would lower the similarity
(in %) to homologous genes in other organisms, although the deletion does not
necessarily influence the functionality of the gene product. A phylogenomic analysis
of the dataset could exclude the region of the deletion. Masking thus allows
the exclusion of regions that do not carry a biological signal. Masking can
also be used to analyse and compare individual protein domains.
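The effect of masking on a similarity value can be sketched with a toy alignment. The example below assumes one very simple masking rule, namely dropping every alignment column that contains a gap; real analyses often use more careful, manually curated masks. The two sequences and the function names are invented for illustration.

```python
def percent_identity(a, b):
    """Percentage of aligned columns in which both sequences have the same character."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def mask_gapped_columns(alignment, gap="-"):
    """Return a copy of the alignment without any column that contains a gap."""
    sequences = list(alignment.values())
    length = len(sequences[0])
    keep = [i for i in range(length) if all(seq[i] != gap for seq in sequences)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Invented toy alignment: species_B carries an eight-residue deletion.
alignment = {
    "species_A": "MKTAYIAKQRQISFVKSHFS",
    "species_B": "MKTAYIAK--------SHFS",
}

raw = percent_identity(alignment["species_A"], alignment["species_B"])
masked = mask_gapped_columns(alignment)
after = percent_identity(masked["species_A"], masked["species_B"])

print(f"identity over all columns:    {raw:.1f} %")    # 60.0 %
print(f"identity over masked columns: {after:.1f} %")  # 100.0 %
```

In this toy case the deletion alone lowers the identity from 100 % to 60 %, although the remaining positions are unchanged.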
If mutation rates differ significantly
between species, phylogenetic methods are the most appropriate way to establish
evolutionary relationships. For example, Mycoplasma species share a common origin
with Gram-positive bacteria. However, these organisms exhibit a higher
mutation rate than other bacteria. As a consequence, this relatedness
is not evident when the similarity (in %) between mycoplasmal
genes and their bacterial homologs is calculated.
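Why a tree-based method can succeed where a plain similarity ranking fails can be illustrated with four taxa. The sketch below uses invented distances that fit a tree in which a fast-evolving lineage ("fast") is the true sister of a slowly evolving one ("sister"); the taxon names, branch lengths and function names are assumptions made for illustration, not data on Mycoplasma. The grouping is chosen by comparing the three possible pairings of four taxa and taking the one with the smallest sum of within-pair distances, a simple distance criterion related to the four-point condition.

```python
def dist(d, x, y):
    """Look up the distance between two taxa regardless of their order."""
    return d[frozenset((x, y))]


def nearest_neighbour(d, taxon, taxa):
    """Taxon with the smallest raw distance to `taxon` (a naive similarity ranking)."""
    return min((t for t in taxa if t != taxon), key=lambda t: dist(d, taxon, t))


def quartet_split(d, taxa):
    """Of the three ways to pair four taxa, return the one with the smallest distance sum."""
    a, b, c, e = taxa
    candidates = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
    return min(candidates, key=lambda s: dist(d, *s[0]) + dist(d, *s[1]))


# Invented distances from the tree ((sister:1, fast:10):1, out1:1, out2:2).
taxa = ("sister", "fast", "out1", "out2")
d = {
    frozenset(("sister", "fast")): 11,
    frozenset(("sister", "out1")): 3,
    frozenset(("sister", "out2")): 4,
    frozenset(("fast", "out1")): 12,
    frozenset(("fast", "out2")): 13,
    frozenset(("out1", "out2")): 3,
}

print(nearest_neighbour(d, "sister", taxa))  # out1: the ranking misses the true sister
print(quartet_split(d, taxa))                # pairs "sister" with "fast"
```

The raw ranking places "sister" closest to an outgroup because the fast lineage has accumulated many changes, while the pairing criterion still recovers the grouping that the distances were built from.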
A final
example shows the usefulness of phylogenetic methods in the comparative
analysis of protein sequences. In the first draft report of the human genome,
113 cases of direct, horizontal gene transfer between bacteria and vertebrates
were reported (4). However, a phylogenetic analysis of 28 sequences
showed that this is not the case (5).
(1) Eisen, J.A. 1998. Genome Res. 8: 163-167.
(2) Ferretti, J.J., et al. 2001. PNAS 98: 4658-4663.
(3) Mittenhuber, G. 2002. J. Mol. Microbiol. Biotechnol. 4: 77-91.
(4) International Human Genome Sequencing Consortium. 2001. Nature 409: 860-921.
(5) Stanhope, M.J., et al. 2001. Nature 411: 940-944.