SeqStat

SeqStat is a package of tools that generate stats from a FASTQ file, merge those stats across multiple samples, and validate the generated stats files.

Mode - Generate

Generate computes several stats for a FASTQ file.

Output stats:

  • Bases
    • Total number
    • Base qualities, with the number of bases having each quality
    • Number of each nucleotide
  • Reads
    • Total number
    • Minimum length
    • Maximum length
    • A histogram of the average base quality per read
    • The quality encoding (Sanger, Solexa, etc.)
    • A histogram of the read lengths
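The following is a minimal Python sketch of how the stats listed above can be computed from FASTQ records. It is not SeqStat's implementation: all output key names (`num_total`, `len_histogram`, `qual_encoding`, etc.) are hypothetical, it assumes well-formed four-line records, and the quality encoding is guessed with a common heuristic based on the lowest ASCII code observed (below 59: Sanger/Phred+33; 59 to 63: Solexa+64; otherwise Phred+64). That heuristic can misclassify a Sanger file in which every base has a high quality.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(rows) - 3, 4):
        # rows[i + 2] is the '+' separator line and is ignored.
        yield rows[i], rows[i + 1], rows[i + 3]


def guess_encoding(min_char: int) -> Tuple[str, int]:
    """Heuristic guess of the quality encoding from the lowest ASCII code seen."""
    if min_char < 59:
        return "sanger", 33   # Phred+33
    if min_char < 64:
        return "solexa", 64   # Solexa+64; scores may be negative
    return "phred64", 64      # Illumina 1.3+ style Phred+64


def fastq_stats(lines: Iterable[str]) -> Dict[str, Dict[str, object]]:
    """Compute base- and read-level stats; all key names are hypothetical."""
    records = list(read_fastq(lines))
    all_quals = "".join(qual for _, _, qual in records)
    if all_quals:
        encoding, offset = guess_encoding(min(map(ord, all_quals)))
    else:
        encoding, offset = "unknown", 33

    nucleotides: Counter = Counter()
    base_quals: Counter = Counter()
    avg_qual_hist: Counter = Counter()
    length_hist: Counter = Counter()
    for _, seq, qual in records:
        nucleotides.update(seq.upper())
        # Raw offset-subtracted scores (Solexa values are not converted to Phred).
        scores = [ord(c) - offset for c in qual]
        base_quals.update(scores)
        length_hist[len(seq)] += 1
        if scores:
            # Integer (floored) mean quality of this read.
            avg_qual_hist[sum(scores) // len(scores)] += 1

    lengths = [len(seq) for _, seq, _ in records]
    return {
        "bases": {
            "num_total": sum(lengths),
            "qualities": dict(base_quals),
            "nucleotides": dict(nucleotides),
        },
        "reads": {
            "num_total": len(records),
            "len_min": min(lengths, default=0),
            "len_max": max(lengths, default=0),
            "avg_qual_histogram": dict(avg_qual_hist),
            "qual_encoding": encoding,
            "len_histogram": dict(length_hist),
        },
    }
```

For a real file, an open text handle can be passed directly, e.g. `fastq_stats(open("reads.fq"))` with a hypothetical path; gzip-compressed input would need `gzip.open(path, "rt")`. Loading all records into memory keeps the sketch short; a streaming version would update the counters record by record instead.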

Mode - Merge

Merge combines multiple SeqStat files while keeping their sample/library/readgroup structure. If required, this structure can also be collapsed; the output file then has no sample/library/readgroup structure.
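Merging and collapsing can be sketched as follows. This Python sketch assumes a hypothetical in-memory layout `{sample: {library: {readgroup: stats}}}`, where each stats value is a nested dict of counts and histograms; SeqStat's actual file format may differ. Merging takes the union of the nested structure (treating a readgroup that appears in two input files as an error is an assumption of this sketch), while collapsing sums counts and histograms across all readgroups and takes the minimum or maximum for the hypothetical `len_min`/`len_max` keys.

```python
from typing import Any, Dict, List

# Hypothetical layout: sample -> library -> readgroup -> stats dict.
Layout = Dict[str, Dict[str, Dict[str, Dict[str, Any]]]]


def combine_stats(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Any]:
    """Combine two stats dicts without mutating either input."""
    out = dict(a)
    for key, vb in b.items():
        if key not in out:
            out[key] = vb
            continue
        va = out[key]
        if key == "len_min":
            out[key] = min(va, vb)
        elif key == "len_max":
            out[key] = max(va, vb)
        elif isinstance(va, dict) and isinstance(vb, dict):
            # Nested sections and histograms are combined key by key.
            out[key] = combine_stats(va, vb)
        elif isinstance(va, (int, float)) and isinstance(vb, (int, float)):
            out[key] = va + vb
        elif va != vb:
            # e.g. two different quality encodings cannot be combined.
            raise ValueError(f"cannot combine {key!r}: {va!r} vs {vb!r}")
    return out


def merge_files(files: List[Layout]) -> Layout:
    """Union of several files, keeping the sample/library/readgroup structure."""
    merged: Layout = {}
    for layout in files:
        for sample, libraries in layout.items():
            for library, readgroups in libraries.items():
                target = merged.setdefault(sample, {}).setdefault(library, {})
                for readgroup, stats in readgroups.items():
                    if readgroup in target:
                        raise ValueError(f"duplicate readgroup {sample}/{library}/{readgroup}")
                    target[readgroup] = stats
    return merged


def collapse(layout: Layout) -> Dict[str, Any]:
    """Aggregate all readgroups into one stats dict with no structure left."""
    result: Dict[str, Any] = {}
    for libraries in layout.values():
        for readgroups in libraries.values():
            for stats in readgroups.values():
                result = combine_stats(result, stats)
    return result
```

Summing raw counts and histograms, rather than averaging derived values, keeps the collapsed result exact; only order statistics such as minimum and maximum length need special handling.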

Mode - Validate

Validate checks files produced by SeqStat. If the aggregated values in a file cannot be regenerated from its underlying data, the file is considered corrupt. This should only happen when a SeqStat file has been edited manually.
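One way such a check can work is to regenerate aggregate values from the histograms and compare them with the stored values. The Python sketch below assumes a hypothetical stats layout with `bases` and `reads` sections (keys such as `num_total`, `len_min`, `len_max`, `len_histogram`, `nucleotides` and `qualities`); SeqStat's actual schema and the exact checks it performs may differ.

```python
from typing import Any, Dict, List


def validate_stats(stats: Dict[str, Any]) -> List[str]:
    """Return a list of inconsistencies; an empty list means the file looks valid.

    The layout of `stats` is hypothetical, not SeqStat's actual format.
    """
    bases, reads = stats["bases"], stats["reads"]
    # JSON object keys are strings, so normalise histogram keys to int.
    len_hist = {int(k): v for k, v in reads["len_histogram"].items()}
    observed_lengths = [length for length, count in len_hist.items() if count > 0]
    regenerated_bases = sum(length * count for length, count in len_hist.items())

    checks = [
        ("reads.num_total", reads["num_total"], sum(len_hist.values())),
        ("reads.len_min", reads["len_min"], min(observed_lengths, default=0)),
        ("reads.len_max", reads["len_max"], max(observed_lengths, default=0)),
        ("bases.num_total", bases["num_total"], regenerated_bases),
        # Every base has exactly one nucleotide and one quality value.
        ("bases.nucleotides", sum(bases["nucleotides"].values()), regenerated_bases),
        ("bases.qualities", sum(bases["qualities"].values()), regenerated_bases),
    ]
    return [
        f"{name}: stored {stored}, regenerated {expected}"
        for name, stored, expected in checks
        if stored != expected
    ]
```

Because each aggregate is derived from a histogram stored in the same file, a manual edit to either side without updating the other shows up as a mismatch.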

Documentation

For documentation and manuals, visit our github.io page.

About

SeqStat is part of the BIOPET tool suite, which is developed at the LUMC by the SASC team. Each tool in the BIOPET tool suite is meant to offer a standalone function that can be used to perform a dedicated data analysis task or be added as part of a pipeline, for example the SASC team's biowdl pipelines.

All tools in the BIOPET tool suite are Free/Libre and Open Source Software.

Contact

For any questions related to SeqStat, please use the GitHub issue tracker or contact the SASC team directly at [email protected].