Genome Sequencing Bioinformatics Module 1 Session 1

Course manual site for the Genome Sequencing Bioinformatics course

Practical assignment Module 1_Session 1

Module topic: Linux

Contact session title: Introduction to the command line

Trainers: Sonia Barasa and Lucier Olubayo

Note: You may copy and paste this text into a document, or use the file on Vula to complete your assignment.

Participant:

<write your name here>

Date:

<write today’s date here>

Introduction to the command line

Introduction

The aim of this assignment is to practise the commands we covered during the first session of this module.

Tools used in this session

You’ll be using only your terminal.

Instructions

To begin, open a new terminal window. For each question, type the command you used. Before you use a directory or file, or extract information from it, always check its contents so that you understand what it holds and how it is formatted.
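As a rough illustration of that look-before-you-use habit, the sketch below inspects a small scratch file before doing anything with it. The directory `demo_dir` and the file `example.txt` are made up for this example and are not part of the course data.

```shell
# Hypothetical scratch directory and toy file, created only for this illustration
demo_dir=$(mktemp -d)
printf 'line one\nline two\nline three\n' > "$demo_dir/example.txt"

# List the directory to see which files it holds, with sizes and dates
ls -l "$demo_dir"

# Peek at the top of the file to get a feel for its format
head -n 2 "$demo_dir/example.txt"

# Count the lines so you know how much data you are dealing with
line_count=$(wc -l < "$demo_dir/example.txt")
echo "example.txt has $line_count lines"
```

The same three checks (list, peek, count) are a reasonable first step for any unfamiliar file, whatever its format.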

  1. What is your current directory? What is its absolute path?
  2. How can you check your current working directory?
  3. Move to the unix directory, then complete the exercises below
  4. Create a directory Assignment under Session1 (remember Session1 is under /unix/practical/)
  5. Move to the directory you just created
  6. Copy the file PccAS_v3.gff3 located under ~/course_data/rna_seq_pathogen/data to Assignment. Write 2 different possible commands to do this.
  7. How can you check that the file has been properly copied?
  8. What’s the size of the file PccAS_v3.gff3? Type both the command you used to get the information and the size
  9. How many lines does PccAS_v3.gff3 contain?
  10. Display the first 15 lines of PccAS_v3.gff3
  11. Display all the lines in PccAS_v3.gff3 that describe genes. Please note that genes are one type of feature. More details about the gff3 format here: https://learn.gencore.bio.nyu.edu/ngs-file-formats/gff3-format/
  12. Create a new file excluding all the gene features and name this file PccAS_v3_withoutgenes.gff3
  13. Rename the resulting file PccAS_v3_withoutgenes.gff3 to question_12_results.gff3
  14. How many CDS does PccAS_v3.gff3 contain?
      a. Write 2 separate commands to do this
      b. Combine 2 commands using |
      c. Write one single command

  15. Extract all the lines for the sequence ID PccAS_01_v3 and save them to a file named PccAS_01_v3.gff3
  16. Write a command to display the names of files ending with .gff3 under the directory Assignment using wildcards
  17. Create a subdirectory Genomics under Assignment
  18. Download the Plasmodium falciparum FASTA file available at: http://plasmodb.org/common/downloads/release-9.0/Pfalciparum/fasta/PlasmoDB-9.0_Pfalciparum_BarcodeIsolates.fasta
  19. How many lines does the file contain?
  20. Save all the commands you type to a file named Assignment1_commands
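Several of the questions above depend on the layout of a GFF3 file: each record is one tab-separated line, and the third column holds the feature type. The sketch below shows this on a toy file whose records are entirely made up (the sequence name `chr_demo`, the source `src` and the IDs are hypothetical). It only tallies the feature types, as a way of getting to know a file before querying it.

```shell
# Toy GFF3 file with invented records, used only to show the column layout
toy=$(mktemp)
printf '##gff-version 3\n' > "$toy"
printf 'chr_demo\tsrc\tgene\t1\t900\t.\t+\t.\tID=g1\n' >> "$toy"
printf 'chr_demo\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=g1\n' >> "$toy"
printf 'chr_demo\tsrc\texon\t1\t400\t.\t+\t.\tID=e1;Parent=m1\n' >> "$toy"
printf 'chr_demo\tsrc\texon\t500\t900\t.\t+\t.\tID=e2;Parent=m1\n' >> "$toy"

# Drop the comment lines starting with '#', keep column 3 (the feature type),
# then sort and count how often each type occurs
type_counts=$(grep -v '^#' "$toy" | cut -f3 | sort | uniq -c)
echo "$type_counts"
```

Running a tally like this first tells you which feature types a file actually contains, which makes it easier to write a precise filter afterwards.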

Part 1: participant’s answer

<type your answers here. For most of the questions, you need to enter the command you used>

________________________________________________________________________________________________________________________________________________