Q is a fast saturation-based ChIP-seq and
ChIP-nexus
peak caller.
Q works well in conjunction with the irreproducible discovery rate
(IDR)
procedure.
Q was extensively tested on publicly available ChIP-seq datasets from
ENCODE and shown to perform well with respect to reproducibility of the called peak set, consistency of the peak sets with respect to
predicted transcription factor binding motifs contained in them, as well as overall run time. Q is implemented in C++ making use of the
SeqAn library.
There are a number of useful features for the primary analysis of ChIP-seq data.
Q can be run with or without data from a control experiment.
Duplicate reads are removed on the fly without altering the original BAM file, and the number of duplicated reads is then shown in Q's output.
The average fragment length of the sequencing library,
which is an essential parameter for peak calling and for downstream analysis,
is estimated automatically from the data.
This is done by examing the vector of read start positions along individual chromosomes and
calcuting the shift that is associated with the smallest Hamming distance. This procedure yields
an equivalent estimation of the average fragment length as the cross-correlation plot of
SPP
but is approximately three times faster.
As a part of this procedure, Q also calculates the relative strand cross-correlation coefficient
(RSC),
which allows a global quality assessment of the enrichment.
In addition Q offers its own quality metrics,
which can be used for trouble-shooting and quality control of the results.
If desired, Q also generates fragment coverage profiles which can be uploaded to
UCSC's genome browser,
where they can be displayed in the context of other related data such as
for example ChIP-seq data for histone modifications and cofactors or expression data.
ChIP-nexus, an extension of the ChIP-exo protocol, requires less input DNA and
enables selective PCR duplicate removal using random barcodes. We developed
a comprehensive software package for ChIP-nexus data that exploits the
random barcodes used in the ChIP-nexus protocol. Furthermore, we developed
bespoke methods to estimate the length of the protected region resulting from
protein-DNA binding as well as for peak calling.