Design, Implementation and Evaluation of a Highly Sensitive CUDA-based High-Throughput Short Read Mapping Method

High-throughput DNA sequencing technologies (such as Illumina sequencers) have enabled the production of short reads at dramatically low unit cost. The explosive growth of short read datasets makes mapping short reads to reference genomes, such as the human genome, challenging in terms of both alignment quality and execution speed. Existing methods often use a restrictive error model for computing alignments in order to improve speed, whereas more flexible error models are generally too slow for large-scale applications. Moreover, many current aligners are becoming inefficient as generated reads grow longer. The goal of this project is therefore to design, implement, and evaluate a new short read alignment method with

i. high sensitivity,
ii. high throughput, and
iii. high efficiency for increasing short read lengths and increasing error rates.

To achieve these goals, our approach will be based on fast and flexible data structures and algorithms that exhibit high efficiency on massively parallel CUDA-enabled graphics hardware (GPUs).

This research project is carried out in close collaboration with Dr. John Castle (TrOn - Translationale Onkologie gGmbH Mainz).