Genome Academy

The manual and programme for Wellcome Connecting Science's Genome Academy

View the Project on GitHub WCSCourses/genomeacademy

Welcome to Bioinformatics - Part 2

Introduction to Multiple Sequence Alignments and Phylogeny

This tutorial has been modified from a tutorial delivered at Scifest Africa by the Student Council of the South African Society for Bioinformatics - SASBi

You will explore taste receptor genes across different species!

Task 1: Retrieve sequences:

Obtain the protein sequences for TAS1R3, TAS1R2 and TAS1R1 for each of the organisms listed after the procedure below.

Procedure:

  1. Go to https://www.ensembl.org/biomart/
  2. Select Dataset on the left menu.
  3. Select Ensembl Genes 110 under the -CHOOSE DATABASE- dropdown menu.
  4. Select Human Genes (GRCh38.p14) under the -CHOOSE DATASET- dropdown menu.
  5. On the left menu, select Filters to restrict the results to the genes you are interested in.
  6. Expand GENES and tick ID list limit [Max 500 advised].
  7. Enter the list of genes:
    TAS1R1
    TAS1R2
    TAS1R3
    in the textbox provided, and select WikiGene name(s) from the dropdown menu above the textbox.

  8. On the left menu, select Attributes to get the features of your gene set.
  9. Tick Sequences, then expand the Header Information section.
  10. Untick all the ticked boxes.
  11. Under the Gene Information section tick:
    • Gene Stable ID
    • Gene Name
  12. Under the
  13. Click on Results towards the top of the page - you will get a list of protein sequences for the gene list you provided.
  14. Export results to a file by selecting File then FASTA from the dropdown menus in the Export all results to section.
  15. Click on Go
  16. A “mart_export.txt” file will download - you can rename this to the species you started with (human)

Repeat steps 4 - 16 for each species, changing the species name under the -CHOOSE DATASET- dropdown menu each time, to get the required protein sequences for all the species listed below.

  • Fugu (Takifugu)
  • Opossum (Didelphimorphia)
  • Dog (Canis lupus familiaris)
  • Human (Homo sapiens)
  • Chicken (Gallus gallus domesticus)
  • Japanese Medaka (Oryzias latipes)
  • Pufferfish (Tetraodontidae): for this species, enter ENSTNIG00000011998 and ENSTNIG00000014794 in the textbox and select the gene stable ID option in the dropdown menu
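If you prefer the command line, the BioMart steps above can also be sketched as a script. This is only a rough sketch under several assumptions that are not part of the tutorial: that the human dataset is named hsapiens_gene_ensembl, that the gene-name filter is called external_gene_name (the tutorial uses the WikiGene option in the web form), and that the live BioMart service returns the current Ensembl release rather than release 110. The function names and the output file name are made up for illustration.

```shell
# Print a BioMart XML query asking for peptide sequences in FASTA format.
# $1 = BioMart dataset name (assumed: hsapiens_gene_ensembl for human)
# $2 = comma-separated gene names (filter name external_gene_name is assumed)
build_query() {
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="FASTA" header="0" uniqueRows="0" datasetConfigVersion="0.6">
  <Dataset name="$1" interface="default">
    <Filter name="external_gene_name" value="$2"/>
    <Attribute name="peptide"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="external_gene_name"/>
  </Dataset>
</Query>
EOF
}

# Send the query to the Ensembl BioMart service and save the reply to file $3.
# Needs curl and a network connection, so it is defined here but not run.
fetch_peptides() {
    curl -s --data-urlencode "query=$(build_query "$1" "$2")" \
        https://www.ensembl.org/biomart/martservice > "$3"
}

# Example (uncomment to run):
# fetch_peptides hsapiens_gene_ensembl "TAS1R1,TAS1R2,TAS1R3" human_mart_export.txt
```

Keeping mart_export at the end of the output file name means the file is still picked up if you later combine files by that suffix.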

Copy and paste all the sequences into a single file and call it all_sequences.fasta.

If you keep mart_export.txt at the end of each file name, you can instead combine the files with the command:

cat *mart_export.txt > all_sequences.fasta
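It is worth checking that the combined file contains what you expect. The sketch below counts the FASTA records by counting header lines, which start with ">". The function name count_sequences is made up for illustration, and the count you should see depends on how many proteins BioMart returned for each gene.

```shell
# Count the FASTA records in a file: each record starts with a ">" header line.
count_sequences() {
    grep -c '^>' "$1"
}

# Only run the check if the combined file exists in the current directory.
if [ -f all_sequences.fasta ]; then
    echo "Sequences in all_sequences.fasta: $(count_sequences all_sequences.fasta)"
fi
```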

Task 2: Data Cleaning

The prefix of each Ensembl gene identifier shows which organism the gene came from:

ENSTRU Takifugu rubripes (Fugu)

ENSMOD Monodelphis domestica (Opossum)

ENSCAF Canis lupus familiaris (Dog)

ENSGAL Gallus gallus (Chicken)

ENSORL Oryzias latipes (Medaka)

ENSTNI Tetraodon nigroviridis (Tetraodon)

Do last: ENSG0 Homo sapiens (Human)
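Before replacing anything, it can help to see which prefixes are actually present in your file. The sketch below lists the letter prefix of each header with a count. It assumes each header starts with ">" followed directly by the gene stable ID, and the function name list_prefixes is made up for illustration. The prefixes it prints end in an extra G (for example ENSTRUG), because the G that marks a gene ID is also a letter.

```shell
# List each Ensembl ID prefix found in the FASTA headers, with a count.
# Assumes headers begin with ">" followed directly by the gene stable ID.
list_prefixes() {
    grep '^>' "$1" | sed -E 's/^>(ENS[A-Z]*).*/\1/' | sort | uniq -c
}

# Only run the check if the combined file exists in the current directory.
if [ -f all_sequences.fasta ]; then
    list_prefixes all_sequences.fasta
fi
```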

Open your all_sequences.fasta file with gedit.

Press Ctrl+H to open the find and replace menu, or press Ctrl+F and then select Replace.

In the search box, enter a prefix such as ENSTRU, and in the replace box, enter the matching species name, Takifugu rubripes (Fugu). Repeat this for each prefix and species in the list above, replacing the human prefix last.
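The same find and replace can be sketched on the command line with sed, as an alternative to gedit. This is a minimal sketch, not part of the original tutorial: the function name relabel_species and the output file name all_sequences_named.fasta are made up for illustration. It only edits header lines (those starting with ">"), so the protein sequences are left untouched, and it handles the human prefix last.

```shell
# Replace each Ensembl prefix in the header lines with its species name.
# Only lines starting with ">" are edited; sequence lines pass through unchanged.
# The human prefix (ENSG0) is handled last, following the tutorial's order.
relabel_species() {
    sed -e '/^>/ s/ENSTRU/Takifugu rubripes (Fugu)/' \
        -e '/^>/ s/ENSMOD/Monodelphis domestica (Opossum)/' \
        -e '/^>/ s/ENSCAF/Canis lupus familiaris (Dog)/' \
        -e '/^>/ s/ENSGAL/Gallus gallus (Chicken)/' \
        -e '/^>/ s/ENSORL/Oryzias latipes (Medaka)/' \
        -e '/^>/ s/ENSTNI/Tetraodon nigroviridis (Tetraodon)/' \
        -e '/^>/ s/ENSG0/Homo sapiens (Human)/' "$1"
}

# Write the relabelled copy to a new file so the original is kept.
if [ -f all_sequences.fasta ]; then
    relabel_species all_sequences.fasta > all_sequences_named.fasta
fi
```

Writing to a new file keeps the original download intact, so you can start again if a replacement goes wrong.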

Sequence Alignment

Task 3: Perform a multiple sequence alignment using the sequences you retrieved.

Procedure:

  1. Go to https://www.ebi.ac.uk/Tools/msa/clustalo/
  2. Upload the file with all the sequences you have downloaded (all_sequences.fasta) or copy all the contents of the file and paste it into the box. To Upload: Click on Choose File, navigate to the file location on the computer, then click Upload.
  3. Wait for the job to complete.
  4. Look at the alignment of all the sequences. How does this compare to a pairwise analysis of two sequences?
  5. Look at the tree of these sequences. How do they compare?
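If Clustal Omega happens to be installed on your computer, the alignment can also be sketched on the command line. This is an optional sketch that assumes the clustalo program is available; the function name run_clustalo and the output file names are made up for illustration. The guide tree it writes is the tree used to build the alignment, so it may differ from the tree shown on the results page.

```shell
# Align a FASTA file with a local Clustal Omega install, if one is available.
# $1 = input FASTA, $2 = output alignment, $3 = output guide tree (Newick format)
run_clustalo() {
    if ! command -v clustalo >/dev/null 2>&1; then
        echo "clustalo not found; use the web form instead" >&2
        return 1
    fi
    clustalo -i "$1" -o "$2" --outfmt=clu --guidetree-out="$3" --force
}

# Only attempt the alignment if the combined file exists.
if [ -f all_sequences.fasta ]; then
    run_clustalo all_sequences.fasta all_sequences.aln all_sequences.dnd || true
fi
```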
