Computational genomics (2016/2017)

Course code
Name of lecturer
Nicola Vitulo
Number of ECTS credits allocated
Academic sector
Language of instruction
Second semester, from Mar 1, 2017 to Jun 9, 2017.

Lesson timetable

Learning outcomes

The advent of Next Generation Sequencing (NGS) technologies has had a great impact on the ability to study genome complexity at the genomic, transcriptomic and epigenetic levels, and has created opportunities for the development of bioinformatic resources for data analysis and management.
The course provides a general overview of the main computational methods based on NGS data that can be applied in genomic studies (mainly focused on the human genome), for example sequence alignment, genome sequencing, genome resequencing for the identification of variants, and transcriptomic analysis for the identification of differentially expressed genes.

At the end of the course the student should be able to:
• Know the main data file formats
• Know the different algorithms used in genomic studies and their applications
• Set up a pipeline for data management and analysis


1. Introduction to Next Generation Sequencing (NGS) data
• Biases and sequencing errors of Illumina technology
• FastQ file format
• Read quality assessment (FastQC software)
• Read preprocessing
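
As an illustration of the FastQ format listed above, the following minimal Python sketch reads four-line FastQ records and decodes the quality string into Phred scores. It assumes the Phred+33 encoding used by current Illumina instruments and unwrapped (single-line) sequences; the function names `read_fastq` and `mean_quality` and the example read are hypothetical and not part of the course material.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a 4-line-per-record FastQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        # Assumption: Phred+33 encoding, so score = ASCII code - 33
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores  # drop the leading '@'


def mean_quality(scores):
    """Average Phred score of one read."""
    return sum(scores) / len(scores)


# Hypothetical single-read example
example = StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
```

A preprocessing step could, for instance, discard reads whose `mean_quality` falls below a chosen threshold.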

2. Overview of bioinformatics methods for genome assembly
• Overlap-layout-consensus
• De Bruijn graph
• Genome assembly assessment
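
To give an idea of the De Bruijn graph approach, the sketch below builds the graph from a set of reads: every k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. This is a simplified illustration, not a real assembler: it ignores sequencing errors and reverse complements, and the function name `de_bruijn_graph` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # One edge per k-mer occurrence; repeated k-mers give repeated edges
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# The read ACGTC contains the 3-mers ACG, CGT and GTC
graph = de_bruijn_graph(["ACGTC"], 3)
```

An assembler would then look for paths through this graph that use every edge, which spell out candidate contigs.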

3. Sequence alignment of NGS data
• Dynamic programming
• Heuristic methods
• SAM/BAM format
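
The dynamic programming approach to alignment can be sketched with a global alignment score in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, linear gap penalty -1) and the function name `global_alignment_score` are assumptions chosen for illustration; the course does not prescribe them.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: aligning a prefix against only gaps
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]
```

The table has (len(a)+1) x (len(b)+1) cells, which is why exact dynamic programming is too slow for millions of reads against a whole genome and heuristic aligners are used instead.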

4. Resequencing and variant calling
• Identification of germline variants
• Identification of somatic variants
• Bioinformatics methods for the identification of structural variations (insertions and deletions, translocations, copy number variations)
• Variant Call Format (VCF) and Genomic VCF (gVCF) format

5. Computational tools for prioritizing candidate genes

6. Analysis of epigenetic data using bioinformatics tools

7. Transcriptomic analysis and RNA-seq
• RNA-seq genome alignment (TopHat, STAR)
• Transcript reconstruction
• Gene quantification
• Data normalization
• Identification of differentially expressed genes
• Gene enrichment and gene set analysis
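
As a small example of data normalization, the sketch below rescales raw read counts to counts per million (CPM), so that samples sequenced at different depths become comparable. CPM is only one of several possible schemes and is used here as an assumption; the gene names, counts and the function name `counts_per_million` are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw per-gene read counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical raw counts for one sample
raw_counts = {"geneA": 500, "geneB": 1500}
cpm = counts_per_million(raw_counts)
```

CPM corrects for library size only; it does not account for gene length or for differences in library composition between samples.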

Bioinformatics laboratory
• Introduction to Bash and the Linux operating system
• Using FastQC for sequence quality assessment
• Setting up a sequence pre-processing pipeline
• Sequence alignment with Bowtie2
• BAM/SAM file manipulation

Assessment methods and criteria

Written exam with six open questions on the topics covered in the course.