Whole-genome analyses are becoming increasingly important in biological research, in evolutionary, medical, and conservation contexts, among others. Genome assembly, an initial step in genomic analyses, is a rapidly developing area of research, so staying up to date with its current state can be challenging. It can also be difficult to understand for researchers new to the field. This workshop is aimed at researchers with anywhere from no background to advanced knowledge of genome assembly. It will function as a roadmap from designing a genome sequencing project all the way to obtaining a “final” genome assembly, with a brief discussion of downstream analyses. On the first afternoon, I will start with the basics, covering pre-planning and laboratory topics such as the different sequencing technologies available and how to decide which sequencing platform and library preparation method to use. On the second afternoon, I will outline the steps needed to process the raw sequencing data, as well as the different methods for assembly, quality assessment, and assembly improvement. To make the workshop more practical, I will discuss popular tools employed at each step.
Part I, 9/25/2017
Basics and A Priori Knowledge of the Genome to Be Sequenced
Prior knowledge about the genome to be sequenced can help in choosing an appropriate sequencing and assembly strategy. Here, I will cover some basics and then discuss the genome characteristics that strongly influence whether a genome will be easy or difficult to sequence and assemble successfully.
I will outline different first-, second-, and third-generation sequencing strategies. The sequencing platforms I will cover in this section include Illumina (MiSeq and HiSeq), Ion Torrent, Ion Proton, ABI SOLiD, PacBio, Nanopore, and Helicos.
Sequencing Library Setup
I will discuss the differences among Illumina library preparation methods, including their pros and cons: paired-end (PE), mate-pair (MP), TruSeq Synthetic Long-Reads, and 10x Genomics Linked Long-Reads. In this section, I will also outline other strategies, such as BAC- or fosmid-based sequencing and chromosome-folding-based long-range linkage methods such as Dovetail Genomics’ Chicago library.
Part II, 9/26/2017
Raw Read Data Processing
In this section, I will talk about tools used to assess and improve sequencing read quality.
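As a rough illustration of what assessing and improving read quality involves, the sketch below decodes a FASTQ quality string into Phred scores and trims low-quality bases from the 3' end of a read. It assumes the common Phred+33 encoding and an arbitrary cutoff of Q20; the function names are hypothetical, and this simple trailing trim is not the algorithm of any particular tool.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding by default (an assumption of this sketch).
    """
    return [ord(char) - offset for char in quality_string]


def error_probability(phred_score):
    """Convert a Phred score Q into a base-call error probability, 10^(-Q/10)."""
    return 10 ** (-phred_score / 10)


def trim_3prime(sequence, quality_string, min_quality=20, offset=33):
    """Remove trailing bases whose Phred score is below min_quality."""
    scores = phred_scores(quality_string, offset)
    end = len(scores)
    # Walk back from the 3' end until a base meets the quality cutoff.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# Toy read: six high-quality bases followed by two low-quality ones.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIIII#!")
print(trimmed_seq, trimmed_qual)
```

In this toy read, the last two bases have Phred scores of 2 and 0 and are removed, while the six bases with a score of 40 are kept.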
De Novo Assembly Strategies and Tools
To make the workshop more useful, I will outline the popular assembly tools for large genomes and briefly discuss their underlying algorithms. In doing so, I will also explain terms commonly used in genome assembly (e.g., k-mer and N50).
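The two terms named above can be made concrete with a small sketch. A k-mer is a substring of length k taken from a sequence, and the N50 is the contig length at which contigs of that length or longer contain at least half of the total assembly length. The function names and the toy contig lengths below are illustrative assumptions, not taken from any assembler.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running_total = 0
    # Add contigs from longest to shortest until half the assembly is covered.
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length
    return 0


# Toy assembly with a total length of 300; the two longest contigs
# (80 + 70 = 150) reach half of the total, so the N50 is 70.
print(n50([80, 70, 50, 40, 30, 20, 10]))

# The 3-mers of "ATGATG" are ATG, TGA, GAT, ATG.
print(count_kmers("ATGATG", 3))
```

Note that the N50 summarizes contiguity only; it says nothing about whether the contigs are correct, which is why separate quality assessment is needed.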
Assembly Quality Assessment
A critical step after assembling a genome is assessing the quality of the resulting sequence. When different assemblers or different k-mer sizes have been used, tools are needed to decide which of the resulting assemblies is best.
Bioinformatic Assembly Improvement
Different tools can be used to improve a genome sequence after the initial assembly, either by filling gap regions or by finding and resolving misassembled regions. Furthermore, genome assemblies can be merged to improve quality.
Lab-based Assembly Improvement
In this section, I will briefly discuss the pros and cons of physical and optical mapping methods.
Draft vs. Finished Assembly
A crucial decision in genomics is whether a genome assembly is good enough to address the desired research questions. Here, I will explain the differences between finished and draft genome assemblies and give some guidance on deciding whether further sequencing is needed.
To conclude the workshop, I will briefly outline subsequent downstream processing and analysis steps, such as repeat and gene annotation, and how to get a haploid genome sequence into a diploid genome mapping framework.