Bioinformatics for Biologists: Analysing and Interpreting Genomics Datasets

9 October–10 December 2023

FutureLearn platform, online

Handle and analyse sequence datasets through hands-on exercises

Overview

Duration: 3 weeks, 6 hours per week
Free Certificate of Achievement available on satisfactory completion

Start Date: The course is run ‘live’ for 3 weeks from 9 October 2023. After this period, the course will remain open for completion for a further 6 weeks, but without facilitation by the educators.

Why join this course? 

The course will cover next-generation sequencing and its significance in genomics, the installation of widely used bioinformatics tools, the use of different file formats, and Linux commands for sequence quality control, mapping, and variant calling.

You will also learn how to perform analysis using the workflow management system Nextflow, as applied to viralrecon, an existing bioinformatics pipeline. You will learn how to use RStudio for data visualization of variants obtained from genomic data.

Who is this course for? 

This course is aimed at genomics researchers who want to analyse sequencing data, as well as individuals interested in data science. It is ideal for molecular biologists, professionals in genomic medicine, bioinformatics practitioners, and those pursuing a career in data science.

Learners are expected to have some knowledge of computing or bioinformatics and be familiar with Linux and R. We recommend taking the course Bioinformatics for Biologists: An Introduction to Linux, Bash Scripting, and R before starting this one.

 

Programme and start dates

Course start dates

This course will start on 9 October 2023

What topics will you cover?

  • In Week 1 of the course, we will introduce you to the concept of environment variables and demonstrate how to install key bioinformatics tools. We will talk about next-generation sequencing, its significance, and the different file formats used. We will describe the steps in a next-generation sequencing pipeline, from sequence quality control through mapping to variant calling. We will show you how to run commands for each of these steps on the UNIX command line.
  • In Week 2 we will introduce you to the concepts of workflows and workflow management systems such as Nextflow. We will demonstrate how pipelines developed with Nextflow can perform the same types of tasks we outlined in Week 1 without you having to run each of the individual commands yourselves. You will be shown how to install Nextflow and use the existing viralrecon pipeline from the nf-core project. You will set up a samplesheet to be used as input for viralrecon and then run the pipeline yourselves. Finally, we will examine the outputs from viralrecon and explain how these could be used to decide whether the data we analysed was of sufficient quality for downstream analyses.
  • In Week 3, building on the analysis procedures covered in Week 1 and Week 2, we move on to visualising variants obtained from genomic data. We will use R and RStudio, drawing on both R's built-in functions and functions from specific packages that help when analysing variants. By this point, you should know how to use R to inspect a data structure, summarise it statistically, subset it, and visualise it. We cover examples of the data types to represent, why you would represent them, and how to interpret the results quickly.
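As a rough illustration of the kind of sequence quality control introduced in Week 1, the sketch below reads FASTQ records and computes the mean Phred quality of each read. It is written in Python for brevity and is not taken from the course materials, which work on the command line and in R. It assumes the common Phred+33 encoding and four lines per record. The function names and the mean-quality threshold of 20 are illustrative assumptions.

```python
from io import StringIO
from statistics import mean


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream.

    Assumption: no wrapped sequence lines; reading stops at the first empty header line.
    """
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().strip()
        yield header[1:], sequence, quality  # drop the leading '@'


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return mean(ord(ch) - offset for ch in quality)


def passing_reads(handle, min_mean_quality=20):
    """Return IDs of reads whose mean quality meets a hypothetical threshold."""
    return [
        read_id
        for read_id, _sequence, quality in read_fastq(handle)
        if mean_phred(quality) >= min_mean_quality
    ]


# Two made-up reads: 'I' encodes Phred 40 and '!' encodes Phred 0.
EXAMPLE_FASTQ = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!I\n"

if __name__ == "__main__":
    for read_id, _sequence, quality in read_fastq(StringIO(EXAMPLE_FASTQ)):
        print(read_id, mean_phred(quality))
    print("passing:", passing_reads(StringIO(EXAMPLE_FASTQ)))
```

In practice, dedicated quality control tools report many more metrics than a per-read mean; the sketch only shows how quality strings in a FASTQ file map to numeric scores.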

What will you achieve? 

By the end of the course, you’ll be able to…

  • Use software managers to install and run reproducible bioinformatics tools
  • Handle and analyse sequence datasets through hands-on exercises
  • Analyse quality control metrics for sequencing data
  • Modify existing workflows to suit specific task requirements and optimise analysis processes
  • Interrogate and interpret the results obtained from running bioinformatics pipelines
  • Perform downstream analyses of pipeline outputs using R, enabling data visualisation and further exploration

Educators

Lead Educators

Fatma Guerfali
Researcher at Institut Pasteur in Tunis and Trainer in Bioinformatics. Passionate about data analysis and visualization for pathogen-related genomics and transcriptomics.

Andries van Tonder
A researcher at the University of Cambridge, UK, with extensive experience analysing large bacterial genome datasets. Specific research areas include using whole-genome sequencing (WGS) to study transmission in different bacterial species.

Ruth Nanjala
Based at the University of Oxford, UK. A bioinformatician and Carpentries instructor who previously led the Bioinformatics Mentorship and Incubation program at ICIPE, Kenya, which aims to build bioinformatics capacity in Africa.

Education Developer
Dusanka Nikolic – Wellcome Connecting Science, UK

Contributors
Martin Aslett – Wellcome Connecting Science, UK
Jorge Batista da Rocha – Wellcome Connecting Science, UK
Phelelani Mpangase – Sydney Brenner Institute for Molecular Bioscience, South Africa

 

What's included

The Wellcome Connecting Science Learning and Training team are offering everyone who joins this course a free digital upgrade, so that you can experience the full benefits of studying online for free. This means that you get:

  • Unlimited access to this course, including any articles, videos, peer reviews and quizzes
  • Tests to validate your learning
  • A PDF Certificate of Achievement to prove your success when you’re eligible
