Building A Multiple Sequence Alignment
Multiple alignments are useful for predicting protein structures (see Chapter 11),
central for predicting the function of proteins, and indispensable for phylogenetic
analysis.
Important amino acids (or nucleotides) are not allowed to mutate. For instance,
active sites of enzymes are highly conserved. Less-important residues change
more easily, sometimes randomly and sometimes in order to adapt a function.
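One rough way to see this conservation in an alignment is to score each column by how often its most common residue occurs. The sketch below is only an illustration: the function name column_conservation and the toy sequences are assumptions, and real analyses usually rely on substitution-aware scores rather than simple identity.

```python
from collections import Counter


def column_conservation(aligned, gap_chars="-."):
    """Score each alignment column by the share of its most common residue.

    aligned is a list of equal-length aligned sequences (hypothetical input).
    Gaps are not counted as residues, but they still lower the score because
    the count is divided by the total number of sequences.
    """
    scores = []
    for column in zip(*aligned):
        residues = [c.upper() for c in column if c not in gap_chars]
        if not residues:
            # A column made only of gaps carries no conservation signal.
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores
```

A column that scores 1.0 is identical in every sequence, which is what you would expect at an active-site position; columns with low scores are the ones that change more freely.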
The main reason for using BLAST is to identify database sequences that are so
similar to the query that they probably are homologous. We commonly refer to such
sequences as hits or matches.
4. Keep the default option, Complete Database, in the pull-down menu. This
amounts to simultaneously searching Swiss-Prot + TrEMBL + TrEMBL_NEW.
If the search reports too many sequences that are very similar to your
sequence of interest, you can decrease the number of identical hits by
selecting a smaller database from the Database pulldown menu — Swiss-
Prot, for example.
SLIDE 7
5. Scroll down to the Options section and set the Number of Best Scoring Sequences
to Show option to 1000. Doing this makes it more likely that you’ll find appropriate
sequences in the BLAST result for your multiple alignment.
6. In the same Options section, set the Number of Best Alignments to Show option
to 1000. This choice makes it possible to judge the quality of the alignment before
selecting a sequence.
SLIDE 8
7. Click the Run BLAST button. After a brief pause, a Results page appears.
8. Scroll down the page to select the sequences you want. You select a sequence
by checking the box to its left.
This is the most delicate part of the process. There is no absolute rule for selecting
your sequences, but you can use the following guidelines:
• Select the top sequence. This top sequence is usually your sequence of interest.
If your sequence of interest is not at the top, you may have to add it to the list later
on.
• For a first analysis, you want to select ten sequences or fewer. Ideally, the ten
sequences to select should be evenly spaced between the very good E-values
(10^-40) and less-good E-values (10^-5).
• Before selecting a sequence, check to make sure it's similar to
the query sequence along its entire length. The alignment section is at the
bottom of the BLAST output. You must be especially careful with hits that have
E-values higher than 10^-10. They are equally likely to correspond to a good partial
match, a global overall match, or a match between a protein fragment and your
sequence. Inspecting the alignment is the only way to distinguish between these
situations.
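The "evenly spaced E-values" guideline can be sketched in a few lines of Python. This is a minimal illustration, not part of the BLAST server: the function name pick_hits, the (name, E-value) input format, and the default thresholds are assumptions taken from the guidelines above. The spacing is done on log10(E-value), since E-values span many orders of magnitude.

```python
import math


def log_e(e_value, floor=1e-300):
    # BLAST can report an E-value of 0.0; clamp it so that log10 is defined.
    return math.log10(max(e_value, floor))


def pick_hits(hits, max_hits=10, best=1e-40, worst=1e-5):
    """Return up to max_hits (name, e_value) pairs from a BLAST hit list.

    hits is assumed to be ordered as BLAST reports it (best hit first).
    The top hit is always kept; the others are chosen so that their
    log10(E-value) is spread as evenly as possible between best and worst.
    """
    if not hits:
        return []
    selected = [hits[0]]
    pool = [h for h in hits[1:] if best <= h[1] <= worst]
    slots = max_hits - 1
    if slots <= 0 or not pool:
        return selected
    if len(pool) <= slots:
        return selected + pool
    lo, hi = math.log10(best), math.log10(worst)
    for i in range(slots):
        # Evenly spaced target on the log scale, from best to worst.
        target = lo + (hi - lo) * i / (slots - 1) if slots > 1 else lo
        closest = min(pool, key=lambda h: abs(log_e(h[1]) - target))
        selected.append(closest)
        pool.remove(closest)
    return selected
```

A selection made this way is only a starting point. It does not replace inspecting each alignment, because the E-value alone cannot tell a partial match from a full-length one.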
SLIDE 9
9. Choose the method you want to use to export your sequences from
the Send Selected Sequences pull-down menu:
• FASTA: Generates a file that contains your sequences in FASTA format. You can
save this file with the File➪Save As option of your browser. When you need to, you
can reopen this file with your browser, in order to cut and paste its content into
another server.
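If you prefer to check the saved file with a script rather than by eye, a FASTA file is simple to read: each record starts with a ">" header line, followed by one or more lines of sequence. The sketch below is a minimal reader under that assumption. The function name read_fasta is hypothetical; for real work a library such as Biopython is a safer choice.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Counting the records returned by read_fasta is a quick way to confirm that every sequence you checked on the Results page made it into the file.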
SLIDE 14
5. Examine your results.
Tcoffee returns a table that contains hyperlinks to your results, as
shown in Figure 9-8.
The first row of the table is dedicated to multiple sequence alignments
and includes:
• msf_aln, clustalw_aln, fasta_aln: Text files containing your alignment in various
formats. Keep these files if you want to use your alignment as input for another
program.
• score_html, score_ascii: A colorized alignment where every residue appears on
a background that indicates the quality of this alignment. Red indicates high-quality
segments; blue indicates regions of your alignment that you have no reason to trust.
The score_ascii file is a text version of the .html file. These last two files are meant only
for display; you can’t use them as an input for other sequence-analysis programs.
The second row is dedicated to phylogenetic trees:
• dnd: The guide tree or dendrogram generated by Tcoffee in Newick
format (see Chapter 13). You should not use it in place of the true
phylogenetic tree.
• phylogenetic_tree: The true phylogenetic tree in Newick format,
generated from the Tcoffee multiple alignment by using the Neighbor
Joining method (see Chapter 13). This is not a guide tree but a real
phylogenetic tree.
• pdf: A PDF picture of the phylogenetic tree that corresponds to the
phylogenetic_tree file.
SLIDE 16
We know that structures contain surface loops that evolve rapidly. Loops are
the softer portions of the protein that connect its more rigid portions. Protein
structures also contain core regions that act as the supporting walls of the
protein. These supporting walls evolve less rapidly than the surface loops.
In your multiple alignment, you can expect to find nice ungapped blocks
that correspond to the core regions, and gap-rich regions that
correspond to the loops.
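To locate such ungapped blocks automatically, one simple approach is to scan the alignment column by column and report runs of columns in which no sequence has a gap. The sketch below assumes the alignment is given as a list of equal-length strings with "-" or "." as gap characters. The function name ungapped_blocks and the min_length threshold are illustrative choices, not something Tcoffee provides.

```python
def ungapped_blocks(aligned, min_length=5, gap_chars="-."):
    """Return (start, end) column ranges that contain no gap in any sequence.

    Columns are 0-based and end is exclusive. Runs shorter than
    min_length columns are ignored.
    """
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    blocks, start = [], None
    for col in range(width):
        has_gap = any(seq[col] in gap_chars for seq in aligned)
        if not has_gap and start is None:
            start = col  # a new ungapped run begins here
        elif has_gap and start is not None:
            if col - start >= min_length:
                blocks.append((start, col))
            start = None
    # Close a run that extends to the last column.
    if start is not None and width - start >= min_length:
        blocks.append((start, width))
    return blocks
```

Long blocks are candidates for core regions, and the stretches between them are candidates for loops. This is only a hint to guide inspection, since a gap-free run can also occur by chance among very similar sequences.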
Another criterion for a useful multiple alignment is knowing the type of amino acids
you can expect to see conserved.