Bioinformatics Tools to Discover the Biological Function of Repeats in Eukaryotic Genomes

Repetitive DNA sequences are abundant and make up nearly half of the human genome. Increasing evidence suggests that at least some of them are functional elements that may actively contribute to genome function. However, little is known about whether repeats also take part in gene regulation. This gap is largely due to the technical challenges that repetitive sequences pose for the alignment algorithms applied to genome-wide short-read datasets generated by high-throughput next-generation sequencing (NGS): a read originating from a repeat can align equally well to several genomic locations, making it a so-called multi-read. Because of this, the portions of NGS data containing repeats are most commonly discarded. The goal of this project is to design, implement and evaluate a new ChIP-seq analysis pipeline that includes a novel parallelized algorithm for aligning multi-reads accurately and efficiently. The pipeline will then be used to analyze newly generated ChIP-seq datasets in order to reveal the possible role of repeat elements in driving gene regulation.
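To make the multi-read problem concrete, the sketch below contrasts the common practice of keeping only uniquely aligned reads with a naive even-split baseline that spreads each multi-read uniformly over its candidate loci. It is an illustration under stated assumptions, not the project's algorithm: the input format (a dictionary from read ID to a list of equally good `(chromosome, position)` hits) and all function names are hypothetical, and the even split stands in for the accurate allocation the project aims to develop.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input format: read ID -> list of equally good alignment loci.
Locus = Tuple[str, int]
Alignments = Dict[str, List[Locus]]


def unique_only_counts(alignments: Alignments) -> Counter:
    """Keep reads with exactly one best hit; multi-reads are discarded."""
    counts: Counter = Counter()
    for loci in alignments.values():
        if len(loci) == 1:
            counts[loci[0]] += 1
    return counts


def even_split_counts(alignments: Alignments) -> Dict[Locus, float]:
    """Naive baseline (assumption): split each read evenly across its loci."""
    counts: Dict[Locus, float] = {}
    for loci in alignments.values():
        if not loci:  # unaligned read, nothing to assign
            continue
        weight = 1.0 / len(loci)
        for locus in loci:
            counts[locus] = counts.get(locus, 0.0) + weight
    return counts


def fraction_discarded(alignments: Alignments) -> float:
    """Share of reads that a unique-only filter would throw away."""
    if not alignments:
        return 0.0
    multi = sum(1 for loci in alignments.values() if len(loci) > 1)
    return multi / len(alignments)


if __name__ == "__main__":
    reads: Alignments = {
        "r1": [("chr1", 100)],
        "r2": [("chr1", 5000), ("chr7", 820)],  # e.g. two copies of a repeat
        "r3": [("chr1", 5000), ("chr7", 820)],
        "r4": [("chr2", 300)],
    }
    print(unique_only_counts(reads))  # repeat loci receive no signal
    print(even_split_counts(reads))
    print(fraction_discarded(reads))
```

In this toy input half of the reads fall into the repeat and vanish under the unique-only filter, while the even split recovers signal at the repeat loci but cannot tell which copy the reads truly came from, which is the ambiguity a dedicated multi-read algorithm would need to resolve.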