Q is a fast saturation-based ChIP-seq and
Q works well in conjunction with the irreproducible discovery rate
Q was extensively tested on publicly available ChIP-seq datasets from
ENCODE and shown to perform well with respect to reproducibility of the called peak set, consistency of the peak sets with respect to
predicted transcription factor binding motifs contained in them, as well as overall run time. Q is implemented in C++ making use of the
There are a number of useful features for the primary analysis of ChIP-seq data. Q can be run with or without data from a control experiment. Duplicate reads are removed on the fly without altering the original BAM file, and the number of duplicated reads is then shown in Q's output. The average fragment length of the sequencing library, which is an essential parameter for peak calling and for downstream analysis, is estimated automatically from the data. This is done by examing the vector of read start positions along individual chromosomes and calcuting the shift that is associated with the smallest Hamming distance. This procedure yields an equivalent estimation of the average fragment length as the cross-correlation plot of SPP but is approximately three times faster. As a part of this procedure, Q also calculates the relative strand cross-correlation coefficient (RSC), which allows a global quality assessment of the enrichment. In addition Q offers its own quality metrics, which can be used for trouble-shooting and quality control of the results. If desired, Q also generates fragment coverage profiles which can be uploaded to UCSC's genome browser, where they can be displayed in the context of other related data such as for example ChIP-seq data for histone modifications and cofactors or expression data.
ChIP-nexus, an extension of the ChIP-exo protocol, requires less input DNA and enables selective PCR duplicate removal using random barcodes. We developed a comprehensive software package for ChIP-nexus data that exploits the random barcodes used in the ChIP-nexus protocol. Furthermore, we developed bespoke methods to estimate the length of the protected region resulting from protein-DNA binding as well as for peak calling.