Master's Program in Bioinformatics

Detailed description of course content

 

Semester | Course | Coordinators | Enrolment
Fall 2016 | BIO101: Introduction to Mathematics | Petrantonakis | Life Sciences students
Fall 2016 | BIO102: Introduction to Programming | Pavlidis | Life Sciences students
Fall 2016 | COMP101: Principles of Cellular and Molecular Biology | Kafetzopoulos | Physical Sciences, Computer Sciences and Engineers
Fall 2016 | COMP102: Introduction to Genetics and Evolutionary Biology | Iliopoulos | Physical Sciences, Computer Sciences and Engineers
Spring 2017 | BC201: Advanced Statistics | Tsagris | All students
Spring 2017 | BC202: Big Data Biomedical Databases | Topalis | All students
Spring 2017 | BC203: Introduction to R for Bioinformatics | Lagani/Pavlidis | All students
Spring 2017 | BC204: Methods in Bioinformatics | Poirazi/Tsamardinos | All students
Spring 2017 | BC205: Algorithms in Bioinformatics | Nikolaou/Tsamardinos | All students

 

 

 

Course Name: Introduction to Mathematics

Course Code: BIO101

Semester: Fall

Coordinator: Panagiotis Petrantonakis

Instructors: Panagiotis Petrantonakis, Georgios Potamias

 

Summary

This course teaches two fundamental topics in mathematics: linear algebra and probability theory. It is divided equally into two parts. The first part covers linear algebra, including systems of equations, vector spaces, determinants, eigenvalues, linear transformations and their applications. The second part covers probability theory: basic notions of probability, conditional probability and inference, random variables and their distribution functions, expectations (mean and variance), covariance and correlation. It also presents the entropy of a random variable and the entropy measures defined between random variables, with an application to Markov chains.

 

Target audience

The course is addressed to students of medicine, biology, computer science, chemistry and mathematics who wish to understand basic and advanced concepts of linear algebra and probability theory.

 

Basic Prerequisites

Basic knowledge of mathematics. Knowledge of programming in R or Matlab would be beneficial.

 

Course Goals

·         Understanding and solving linear algebra problems and related applications.

·         Understanding and interpreting fundamental concepts of probability theory.

 

Assessment

Midterm exam (written) (30%).

Final exam (written) (70%).

 

Suggested Reading

Linear Algebra:

 

Linear Algebra and its Applications, David Lay

 

Probability Theory:

(1) Christina Goldschmidt. Prelims Probability. https://www0.maths.ox.ac.uk/system/files/coursematerial/2015/2635/43/ProbNotes2015.pdf

https://www.dropbox.com/s/tn2sn2szw2dam2r/Prelims_Probability.pdf?dl=0

(2) Joseph C Watkins. An Introduction to the Science of Statistics: From Theory to Implementation.

http://math.arizona.edu/~hzhang/math574m/statbook.pdf

https://www.dropbox.com/s/p9fmav7po4pgik3/An_Introduction_to_the_Theory_of_Statistics.pdf?dl=0

(3) Mai Vu. Entropy and mutual information.

http://www.info612.ece.mcgill.ca/lecture_02.pdf

https://www.dropbox.com/s/3i3srcpatdh88cs/Entropy.pdf?dl=0

 

Tentative Program

Linear Algebra

Weeks | Syllabus | Chapters
1-2 | Systems of Linear Equations | 1
3-4 | Matrix Algebra | 2
5-6 | Determinants | 3
7-8 | Vector Spaces | 4
9-10 | Eigenvalues and Eigenvectors | 5
11-12 | Orthogonality and Least Squares | 6
13 | Revision |

 

 

Probability Theory

Weeks

Syllabus

Chapters (1)

Chapters (2)

1-2

Introduction: Motivating examples – data & probabilities; Sample space, events, outcomes, Venn diagrams, Inclusion-Exclusion principle; Empirical and axiomatic definition of probability

1.1 – 1.3

4.1 , 4.2.1 , 4.2.2

4.3.2

[Principles of Inheritance]

 

5.1 – 5.6

3-4

Counting: Arrangements: permutations, combinations, binomial coefficients/Pascal's triangle; Axioms/Laws of probability; set theory and probabilities

5-6

Conditional probability. Definition, multiplication principle, law of total probability, Bayes formula, Independence of events

1.4 – 1.7

6.1 – 6.5

7-8

Recap and midterm exam.

 

 

9-10

Discrete random variables & Distribution functions (discrete): Definitions, properties, mass function, classical distributions (Bernoulli, Binomial, geometric, Poisson), Joint distributions, Conditional distribution.

Expected values: mean, variance, independence, covariance/correlation

pp. 16 - 17

2.1 – 2.4

7.1 – 7.4

7.6 - 7.7/7.7.1

9.1

8.1 , 8.7

11-12

Entropy. Definition, Joint, Conditional, Relative Entropy (Kullback-Leibler distance), Mutual Information, Data processing inequality (application in Markov chains)

Notes

(3)

13

Revision
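As an illustration of the entropy topics scheduled for weeks 11-12, the following minimal Python sketch computes the Shannon entropy of a discrete distribution and the mutual information of two discrete variables from their joint probability table. It is not taken from the course material: the function names and the example joint distribution are assumptions chosen for illustration, and logarithms are taken in base 2 so results are in bits.

```python
from math import log2


def entropy(probs):
    """Shannon entropy H(X) in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """Mutual information I(X;Y) in bits from a joint table joint[x][y]."""
    px = [sum(row) for row in joint]            # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]      # marginal distribution of Y
    return sum(
        pxy * log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )


# Hypothetical joint distribution of two independent fair binary variables.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

h_x = entropy([sum(row) for row in independent])   # 1 bit
mi = mutual_information(independent)               # 0 bits: X and Y are independent
```

For two perfectly dependent binary variables, for example the table [[0.5, 0.0], [0.0, 0.5]], the same function returns 1 bit, which equals the entropy of either variable.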

 

 

Course Name: Introduction to Programming

Course Code: BIO102

Semester: Fall

Coordinator: Pavlos Pavlidis

Instructors: Pavlos Pavlidis, Anastasis Oulas, Evaggelos Pafilis, Alexandros Kanterakis

 

Part A: Python

1. Introduction: Motivating algorithmic examples in bioinformatics; data representation in computers; what is software; operating systems; programming languages (compiled, interpreted)

2. Data types: Variables, assignments; immutable variables, e.g. strings; different numerical types; operators (mathematical, in strings, e.g. concatenation); comments in programs; error messages and understanding them

3. Conditions and boolean logic: logical operators; ranges; control statements: if-else, for and while loops

4. Strings and text files, basic input/output: manipulating files; reading/writing text files; generating a formatted file

5. Data Structures I: lists, tuples, basic list operators, replacing, inserting, removing elements; searching and sorting lists

6. Data Structures II: dictionaries: adding and removing keys, accessing and replacing values; traversing dictionaries; specific cases where dictionaries are superior to lists and vice versa

7. Functions and libraries: design with functions, hiding redundancy; arguments and return values; library packages, searching for packages, recursive functions

8. Dynamic programming I: theoretical background

9. Dynamic programming II: applications in bioinformatics

10. Drawing using Python: simple 2D drawing: colors, shapes; the 'image' module

11. Object-oriented programming I: classes, objects, methods

12. Object-oriented programming II: inheritance, polymorphism

13. Object-oriented programming III: multithreading

14. String algorithms I: regular expressions: searching for motifs

15. String algorithms II: Hamming distance, edit distance, trie search

16. Specific algorithms in bioinformatics: sorting, searching, complexity analysis, Smith-Waterman algorithm, clustering
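The string-distance and dynamic programming topics in the list above can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions, not the course's own code: the function names are hypothetical, and the edit distance shown is the unit-cost Levenshtein distance (insertion, deletion and substitution each cost 1).

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to the empty string
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

For instance, hamming_distance("GATTACA", "GACTATA") gives 2, and edit_distance("kitten", "sitting") gives 3. Keeping only the previous row reduces memory from O(len(a) * len(b)) to O(len(b)), at the price of not being able to trace back the alignment itself.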

 

Part B: Introduction to Unix and Linux

1. Introduction to the UNIX operating system

2. The command line and the directory structure, file manipulation

3. Ubuntu and Debian Linux distributions: installation and virtual boxes

4. The vi editor, emacs editor

5. Unix communications

6. Utilities and filters

7. I/O redirection

8. Shells

9. Intro to shell programming

10. Variables in shell programming

11. Conditionals in shell programming

12. Loops in shell programming

13. Pipes

14. Stream editing: sed

15. grep, count, uniq, sort

16. gawk, data file merging, filtering and statistics calculation

17. Program compiling and interpreters (Python, Perl, R)

18. System administration intro

19. Text processing: LaTeX, compared to nroff/troff




Course Name: Principles of Cellular and Molecular Biology

Course Code: COMP101

Semester: Fall

Coordinator: Dimitris Kafetzopoulos

Instructor(s): Dimitris Kafetzopoulos

 

Summary

The course covers the fundamental concepts, topics and techniques of Molecular and Cellular Biology, from the chemical basis of life to the central dogma of Molecular Biology and the cellular organization of biological systems. It then presents the functional specialization of cells and tissues, and concludes with the molecular events underlying complex traits and diseases. Throughout the course, advances in analytical technologies and the challenges of quantitative and computational approaches in biomedical sciences and research are highlighted.

Basic Prerequisites

None

Course Goals

The course is designed specifically for graduate students who had no biological education in their previous studies, in particular computer and physical scientists, mathematicians and statisticians. It is a fast and intensive way to cover the basic biological concepts and technologies, and it provides the background knowledge of Molecular and Cellular Biology needed to understand and address the cutting-edge challenges and questions of biomedical and life science research in general.

Assessment

Exercise(s): 50%

Final Exam: 50%

Suggested Textbooks

QuickStart Molecular Biology: An Introductory Course for Mathematicians, Physicists, and Engineers, by Philip Benfey

Molecular Biology of the Cell, by Bruce Alberts & Alexander Johnson

Campbell Biology (International Edition), by Neil Campbell & Jane Reece.

Lehninger Principles of Biochemistry, by David Nelson & Michael Cox

Lewin's Genes XI, by Jocelyn Krebs et al.

Basic Biotechnology, by Colin Ratledge & Bjorn Kristiansen

 

 

Tentative Program

 

Week

Module

1

The chemical basis of life: Atoms, Molecules, Energy, Reactions, Interactions, Amino acids, Nucleotides, Lipids, Sugars, Proteins, Nucleic acids, Membranes, Polysaccharides and their Physicochemical parameters.

2

The storage and maintenance of genetic information: The central dogma of Molecular Biology, DNA structure, properties and function. DNA replication and repair. Genomics technologies.

3

The processing of genetic information: Genetic code, Gene expression regulation, Transcription, RNA structure and function, Transcriptomics technologies.

4

The use of genetic information: Translation, Protein structures and functional diversity, Proteomics technologies.

5

Cellular organization of life. Membranes, Sub-cellular organelles, Molecular trafficking, Cell communication, Cell specialization. Imaging technologies.

6

The cell factory: Metabolism and Metabolomics technologies.

7

Regulation of genetic information: Cell Signaling, Gene expression regulation, Gene regulatory networks, Spatiotemporal control of gene expression.

8

Cell growth and division

9

Understanding complex traits. Selected examples of healthy and disease phenotypes.

10

Specialized tissues: Neuronal and Immune system

11

Experimental and computational approaches in Biology: Model animal systems and population approaches

12

Drug discovery processes, Biotechnological applications

13

Exercise: Paper presentation and/or sit-in exam


Course Name: Introduction to Genetics and Evolutionary Biology

Course Code: COMP102

Semester: Fall

Coordinator: Yiannis Iliopoulos

Instructors: Yiannis Iliopoulos, Pantelis Topalis, Tereza Manousaki

 

Summary

This course discusses the principles of genetics and evolution with application to the study of biological function at the level of molecules, cells, and multicellular organisms, including humans. The topics include: structure and function of genes, chromosomes and genomes, biological variation resulting from recombination, mutation, and selection, population genetics, use of genetic methods to analyze protein function, gene regulation and inherited disease. Students are required to attend an exercise/problem-solving session (a joint class with the sophomores of the Department of Biology).

Basic Prerequisites

None

 

Course Goals

To provide basic knowledge of genetics and evolution.

To enable students to understand biological function at the level of molecules, cells and multicellular organisms.

 

Assessment

Exercises: 10%

Final Exam: 90%

 

Tentative Program

Week

Module

1-2

Introduction: Mendelian analysis and its extensions

Discussed Topics: Mendelian inheritance (one locus, two loci), dominance, sex-linked inheritance, epistasis, genealogical trees

3-4

Chromosomal theory of inheritance – Mutations. Molecular genetic tools. Understanding the genetic basis of cancer

Discussed Topics: Linked genes, genetic mapping, point mutations, chromosomal rearrangements, chromosomal aberrations

5-6

Molecular basis of genetic diseases – Regulation of gene expression. DNA Fingerprinting

Discussed Topics: Loss of function mutations, Gain of function mutations, Prokaryotic gene regulation, eukaryotic gene regulation

7-8

Human genome – Human cytogenetics  

Discussed Topics: Linkage analysis, WGS analysis, Exome analysis

9-10

Introduction to the theory of evolution – Origin of life

Discussed Topics: Scientific theories

11-12

Population genetics

Discussed Topics: Genetic structure of natural populations, Hardy-Weinberg equilibrium, Genetic drift, Founder effect, Fitness

13

Molecular evolution and speciation

Discussed Topics: Hierarchical organization of life.
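The Hardy-Weinberg equilibrium listed under population genetics (weeks 11-12) lends itself to a short worked example. The Python sketch below estimates allele frequencies from genotype counts and derives the genotype counts expected under equilibrium (p², 2pq, q²). The function names and the genotype counts are hypothetical and serve only as an illustration; the chi-square helper assumes that every expected count is non-zero.

```python
def hardy_weinberg(n_AA, n_Aa, n_aa):
    """Return allele frequencies (p, q) and expected genotype counts under HWE."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)     # frequency of allele A
    q = 1 - p                           # frequency of allele a
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    return p, q, expected


def chi_square(observed, expected):
    """Pearson chi-square statistic; assumes every expected count is non-zero."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical genotype counts (AA, Aa, aa) for a sample of 100 individuals.
observed = (36, 48, 16)
p, q, expected = hardy_weinberg(*observed)
statistic = chi_square(observed, expected)
```

With these counts the estimated allele frequencies are p = 0.6 and q = 0.4, and the expected counts match the observed ones up to floating-point rounding, so the chi-square statistic is essentially zero. A sample that deviates from equilibrium would yield a larger statistic, which is then compared against a chi-square distribution with one degree of freedom.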


Course name: Advanced Statistics

Course Code: BC201

Semester: Spring

Coordinators: Michail Tsagris and Ioannis Tsamardinos

Instructor: Michail Tsagris

 

Summary

The course covers basic areas of statistics: graphical representations of data, random variables, types of sampling and study design, and estimation via maximum likelihood. It then treats hypothesis testing and confidence intervals (for means and proportions), type I and II errors and p-values, as well as hypothesis testing via computational techniques (bootstrap and permutation). In a second phase, it covers correlations for continuous variables (Pearson and Spearman coefficients), association of categorical variables (G² test of independence) and linear regression. Finally, false discovery rate methods for multiple hypothesis testing are introduced. Demonstrations and exercises use the R statistical package.

 

Target audience

The course is addressed to students of medicine, biology, computer science, chemistry and mathematics who have a basic knowledge of statistics and wish to understand certain statistical concepts in more depth.

 

Prerequisites 

A basic knowledge of mathematics and statistics. Knowledge of programming in R would be beneficial.

 

Course learning objectives

Understanding of terms like sampling techniques, types of studies, hypothesis testing, type I and II error, confidence intervals, bootstrap, permutation, relationship between two variables.

Ability to interpret basic statistical results.

Acquiring the foundations for later statistical analyses.

Ability to implement hypothesis testing in R.

 

Assessment

Exercises  (10%).

Midterm exam (written)  (20%).

Final exam (written)  (70%).

Suggested textbook

Biostatistics with R, An Introduction to Statistics Through Biological Data

Babak Shahbaba, 2012, Springer.

 

Weekly program

Week

Material

Chapters

1-2

Introduction: Basic principles of probability and statistics, graphical representation of data, random variables, sampling techniques, types of studies.

1, 2, 4

3-4

Estimation: Maximum likelihood estimation of parameters (mean, median, proportion) and confidence intervals (for mean and proportion).

6

5-6

Hypothesis testing: one and two means with and without computational techniques, relationship with confidence intervals, explanation of concepts like type I and II errors and p-values + demonstration with R.

7, 11

7

Recap and midterm exam.

 

8-10

Associations: Relationship between pairs of continuous and discrete variables (Pearson and Spearman correlation coefficients, G² test of independence), linear regression + demonstration with R.

3, 8, 10

11

Extensive demonstration of the covered material in R.

 

12

False discovery rate methods: Bonferroni, Benjamini–Hochberg and Storey–Tibshirani corrections + demonstration with R.

 

13

Revision.
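To make the multiple-testing corrections of week 12 concrete, the following Python sketch applies the Bonferroni correction and the Benjamini-Hochberg step-up procedure to a list of p-values. The course itself demonstrates these methods in R; this sketch is only an illustration, the function names and p-values are made up, and the Storey-Tibshirani q-value method is not covered here.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Returns a list of booleans in the original order of pvalues.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose p-value rank is at most k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Made-up p-values from ten hypothetical tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
```

On these values Bonferroni rejects only the first hypothesis (threshold 0.05 / 10 = 0.005), while Benjamini-Hochberg rejects the first two, which illustrates why false discovery rate control is usually less conservative than family-wise error control.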

 

 

 


Course Name: Big data biomedical databases

Course Code: BC202

Semester: Spring

Coordinator: Pantelis Topalis

Instructor: Pantelis Topalis

 

Summary

This course aims to describe some of the most popular biomedical databases, covering a wide range of datatypes. It will present the different ways in which the same information is often stored in different databases and the problems this causes. Proper use of controlled vocabularies and ontologies can provide a solution and promote data integration.

 

Basic Prerequisites

A preparatory course on molecular biology. Basic knowledge of a scripting language would be useful.

 

Course Goals

Becoming familiar with the various kinds of biological datatypes and formats.

Being able to extract relevant information for a project from different sources.

Learning how to access data stored in databases in bulk via their APIs.

 

Assessment

Exercises: 50%

Final Exam: 50%

 

Tentative Program

Week

Module

1-2

Introduction: Biological datatypes and their formats

Discussed Topics: Biological sequences and their annotations (fasta, fastq, fast5, gtf, gff), Variation calls (vcf, gvf), Sequence alignments (sam, bam), Other formats (bed, wig, bedgraph, bigwig)

3-4

Data integration

Discussed Topics: Metadata and how they can be organized. Controlled vocabularies and ontologies. Upper level ontologies, reference ontologies and application ontologies. Basic Formal Ontology. OBOFoundry. NCBO bioportal. Ontology development.

5-6

Non species specific databases

Discussed Topics: NCBI and Entrez query system, ENA, Pfam and motif databases, SwissProt, UniProt, STRING, PDB

7-8

Species specific databases

Discussed Topics: Human databases, MGI, Flybase, Yeast, VectorBase

9-10

Pathway databases

Discussed Topics: KEGG, Reactome, Pathway Commons, Pathway Interaction Database, MetaCyc, Pantherdb

11-12

Genome viewers

Discussed Topics: Ensembl, NCBI, UCSC

13

Data Repositories

Discussed Topics: SRA, Human Atlas, ArrayExpress
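Of the sequence formats listed for weeks 1-2, FASTA is the simplest to parse, and a short example shows what working with such files involves. The Python sketch below reads (header, sequence) pairs from a FASTA-formatted text stream. It is an illustrative assumption rather than course code: the function name is hypothetical, the records are made up, and real projects would more likely rely on an established parsing library.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # emit the previous record
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)           # sequences may span several lines
    if header is not None:                # emit the last record
        yield header, "".join(chunks)


# Made-up records for illustration only.
example = io.StringIO(">seq1 hypothetical record\nACGT\nACGT\n>seq2\nGGCC\n")
records = list(read_fasta(example))
```

Here records holds two entries: the first with header "seq1 hypothetical record" and sequence "ACGTACGT", the second with header "seq2" and sequence "GGCC". Writing the parser as a generator means that large files can be processed one record at a time without loading them fully into memory.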


Course Name: Introduction to R for Bioinformatics

Course Code: BC203

Semester: Spring

Coordinators: Vincenzo Lagani, Pavlos Pavlidis

Instructors: Vincenzo Lagani, Pavlos Pavlidis

 

Summary

The course introduces the R statistical software as a tool for data analysis in bioinformatics. It begins with the basics of the R language, along with the main concepts related to the R software and its modular architecture. More advanced concepts are then introduced, for example data structures in R, functional programming, graphical visualization and the creation of R packages. The second part of the course focuses on the Bioconductor initiative and its repository of R packages for bioinformatics. In particular, functionality for analyzing RNA-seq and microarray data is explored in detail.

 

Basic Prerequisites

Elementary knowledge of programming and statistics.

 

Course Goals

At the end of the course, students are expected to:

  • know the capabilities of the R software and its possible uses;

  • master the R language and be able to use it for writing scripts and simple data analysis pipelines;

  • know the scope and characteristics of the Bioconductor initiative;

  • be able to identify and use the most suitable Bioconductor packages for a given data analysis task.

 

Assessment

Assignments (30%)

Midterm (30%)

Final project with oral presentation (40%)

 

Suggested Reading

“Applied Statistics for Bioinformatics using R”, freely available at:

https://cran.r-project.org/doc/contrib/Krijnen-IntroBioInfStatistics.pdf

 

Tentative Program

Week

Module

1-2

The R statistical environment

  • The R software and its characteristics

  • The basics of the R language: syntax, control flow statements, data structures

  • Matrix manipulation

  • Functional programming

3-4

Advanced R part 1

  • Object oriented programming in R
  • Creation of R packages

5-6

Advanced R part 2

  • Visualization in R: ggplot2
  • Parallel computing in R

7

Examination week

  • Short recapitulation of the previous lessons and midterm

8-10

Bioconductor, part 1

  • Bioconductor overview

  • Microarray data in R

11-12

Bioconductor, part 2

  • RNA-seq data in R

  • Other applications

13

Examination week

Recapitulation of the previous lessons and assignment of the final project


Course Name: Methods in Bioinformatics

Course Code: BC204

Semester: Spring

Coordinators: Panayiota Poirazi, Ioannis Tsamardinos

Instructors: Panayiota Poirazi, Ioannis Tsamardinos

 

Summary

This course aims to describe some of the most prominent and most widely used methods for the analysis of biological data, with emphasis on different large-scale data sets (e.g. microarray gene expression data, RNAseq data, metagenomics, biological networks etc.). The course focuses on different methodologies for dimensionality reduction, feature selection, model selection, clustering, classification and network inference. The main goal of the course is not to describe in detail the most sophisticated implementations but to present the features and rationale behind each method and its appropriateness for solving specific problems. Classes will include theoretical lectures as well as practical exercises, where students will be required to use existing software tools that implement the presented methods to solve selected problems.

 

Basic Prerequisites

A preparatory course on molecular biology. Basic knowledge of math and statistics.

 

Course Goals

  • Becoming familiar with the most common problems in the analysis of different types of large-scale biological data.

  • Being able to categorize the most popular methods (dimensionality reduction, regression, clustering, classification, supervised-unsupervised, probabilistic, deterministic, sequential learning etc.).

  • Achieving a high level of competence in the use of several data analysis methods.

Assessment

Exercises (algorithm implementations and/or quiz questions after each module): 50%

Final Exam: 50%

 

Suggested Reading

Introduction to Bioinformatics Algorithms. Pevzner and Jones

 

Tentative Program

Week

Module

1-2

Introduction: Types of data and analysis problems

Discussed Topics: microarray gene expression data, RNAseq data, protein-protein interaction data, DNA-protein interaction data, mass-spectrometry data

Biological Problem: Identifying differentially expressed genes in microarray data

3-4

Dimensionality Reduction: SVD, PCA

Discussed Topics: The need for and applications of DR methods in biological data

Biological Problem: Identifying categories of disease in gene expression data

5-6

Clustering methods: Hierarchical clustering, k-means

Discussed Topics: Presentation of the algorithms and initialization of conditions. Discussions regarding different distance metrics that can be used, ways to optimize cluster size etc.

Biological Problem: Identifying categories of patients using expression data in control and cancer patient groups

7-8

Classification methods: Kernel methods, SVMs

Discussed Topics: Description of methods and issues regarding feature extraction, model selection, avoiding overfitting

Biological Problem: Find miRNA genes in the human genome using various existing classification tools

9-10

HMM: Hidden Markov Models

11-12

Bayesian Networks

13

Revision

 

Course Name: Algorithms in Bioinformatics

Course Code: BC205

Semester: Spring

Coordinators: Christoforos Nikolaou, Ioannis Tsamardinos

Instructor: Christoforos Nikolaou

 

Summary

This course aims to describe some of the most prominent and most widely used algorithms for the analysis of biological data, with emphasis on handling and analyzing biological sequences. The course focuses on a detailed description of algorithms for alignment, the rapid search of short sequences, tracing patterns and finding motifs in sequences, sequence assembly and phylogenetic analysis among others. The main goal of the course is not to describe in detail the most sophisticated implementations but to present the rationale behind the design of algorithms in a constructive and educational manner from both theoretical and practical viewpoints. Classes will include theoretical lectures as well as practical exercises, where students will be required to implement algorithms in the language of their choice.

Basic Prerequisites

A preparatory course on molecular biology. Basic knowledge of math and statistics.

Course Goals

  • Becoming familiar with the most common problems in the analysis of biological sequences.

  • Being able to categorize the algorithms that find wide use in bioinformatics (dynamic programming, randomized algorithms, divide and conquer algorithms etc.).

  • Achieving a high level of competence in the performance of alignment and BLAST searches.

  • Acquiring the ability to implement simple algorithms from the blackboard to the keyboard.

 

Assessment

Exercises (algorithm implementations and/or quiz questions after each module): 50%

Final Exam: 50%

Suggested Reading

Introduction to Bioinformatics Algorithms. Pevzner and Jones

Bioinformatics Algorithms. A Practical Approach. Compeau and Pevzner

 

Tentative Program

Week

Module

1-2

Introduction: Analysis of Sequence Composition

Discussed Topics: k-mer analysis, over-representations in sequences, sequence segmentation

Biological Problem: Locating the origin of replication in a bacterial genome

3-4

Motif Discovery: Randomized Algorithms

Discussed Topics: Motifs in biological sequences, sequence logos, Shannon Entropy calculations in motifs, de novo motif discovery  

Biological Problem: Locating transcription factor binding sites in DNA sequences

5-6

Sequence Alignment: Dynamic Programming

Discussed Topics: Sequence comparisons, local and global alignment algorithms, scoring matrices

Biological Problem: Pairwise alignment of two protein sequences with various scoring matrices

7-8

String Matching and Rapid Searches

Discussed Topics: Rapid sequence comparisons, the BLAST algorithm, statistics of BLAST, rapid searches in complete genomes with BLAT

Biological Problem: Annotating the human genome with BLAT

9-10

Data Structures and Transformations for NGS data analysis

Discussed Topics: NGS data, big data biology, suffix trees, Burrows-Wheeler Transform.

Biological Problem: Constructing and parsing a suffix tree for fast/accurate motif finding in a genomic sequence

11-12

Phylogenetic Analysis

Discussed Topics: Sequence comparison and clustering, distance methods, phylogenetic trees, tree combinatorics

Biological Problem: Building an NJ-tree for a DNA sequence distance matrix

13

Anomaly Detection for NGS data analysis

Discussed Topics: NGS data, working with files and coordinates

Biological Problem: Implementing algorithms for Peak detection in NGS data
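The k-mer analysis that opens the program (weeks 1-2) is a natural first implementation exercise. The Python sketch below counts all overlapping k-mers in a sequence and reports the most frequent ones. It is a minimal illustration under stated assumptions, not the course's reference solution: the function names are hypothetical, and reverse complements and mismatches, which a full analysis of a replication origin would typically consider, are ignored.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer in sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def most_frequent_kmers(sequence, k):
    """Return the k-mers with the highest count, in alphabetical order."""
    counts = count_kmers(sequence, k)
    if not counts:
        return []                          # sequence shorter than k
    top = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == top)


# Short illustrative sequence; the two most frequent 4-mers each occur three times.
sequence = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
frequent = most_frequent_kmers(sequence, 4)   # ["CATG", "GCAT"]
```

A single pass with a hash-based counter takes time proportional to the sequence length for fixed k, which is the baseline against which the suffix tree and Burrows-Wheeler approaches of weeks 9-10 can be compared.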

 

List of Suggested Elective Courses*

*The list of Electives is currently being updated and remains under consideration. Elective courses will be finalized by the end of the Spring 2017 semester.

 

Course Name | Existing/New | Instructor(s) | Semester | ECTS | Suggested by
Computational Neuroscience | New | Yiota Poirazi, Athanasia Papoutsi | TBD | | Yiota Poirazi
Text Mining in Bioinformatics | New | Ioannis Iliopoulos, Nikolas Papanikolaou, Evangelos Pafilis | TBD | | Ioannis Iliopoulos