This is a repository for the SARS-CoV-2 Bioinformatics for Beginners Course
*May 2023 update note: Access to file download links may change in the first two weeks of May 2023, which would affect the input data for the example commands. Please expect errors during this period.*
SARS-CoV-2 variant lineage identification is key to pandemic tracking and to enabling a public health response. This course introduces bioinformatics through the skills used in SARS-CoV-2 genomic data analysis. It will be run as a distributed-classroom course across Africa; Latin America and the Caribbean; and Asia. This model was developed by H3ABioNet; see this publication for more information.
Bioinformatics skills are fundamental to the management and assessment of viral sequences. This course will introduce you to processing data programmatically, the data formats used in viral sequencing, how to determine the variant lineage (Delta, Omicron, etc.), and how to share data so that others around the world can benefit. These skills are the building blocks for scaling up analysis to pandemic response levels.
This course uses Google Colab (https://colab.research.google.com/), a free-to-use service.
Colab is accessed through a Google Account, which can be created for free.
Contact sessions will run twice a week for 4 hours per session, from 31 October to 2 December 2022. Sessions will be offered in two time-zone blocks: one for Oceania and Asia, and one for Latin America and Africa. Within each block, sessions run at the same time, so local start times differ by region.
The course is aimed at postgraduate scientists, postdoctoral scientists, junior faculty members, and clinicians or healthcare professionals based in Africa, Asia, and Latin America and the Caribbean. No prior bioinformatics skills are required.
The programme will cover the following core topics:
Introduction Week
Introduction Notebook - Begin here
Video Playlist - Introduction Week
Module 1: Introduction to Notebooks & Unix command line
Module 1 Video Playlist (Parts 1 and 2)
Module 1 Part 1 and Part 2 Notebook Instructions
Bonus Videos for NGS technologies
Module 2: Data QC and Consensus sequences
Module 2 Video Playlist (Parts 1 and 2)
Module 2 Data QC and Consensus Notebook Instructions Parts 1, 2, and 3
Module 3: Variant Lineage Identification
Module 3 Video - Variant Lineage Identification
Module 3 Variant Lineage Identification Notebook Instructions
Module 3 Part 2 Day Plan - Exercise
Module 4: Data sharing and interpretation
Module 4 Video Playlist
(Please watch Sections 1-2 for Day 1, and Sections 3-7 for Day 2)
Module 4 Data Sharing and Interpretation Notebook Instructions
Module 4 Part 2 Day Plan (Exercises for Day 2 are in the videos for Sections 5-7)
WCS LMS
COG-Train Online courses
Your digital mentor podcast
WCS courses and conferences
Any reuse of the course materials, data or code is encouraged with due acknowledgement.
This work is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.