pyseer documentation
pyseer was first written as a Python reimplementation of seer, which was written in C++.
pyseer uses linear models with fixed or mixed effects to estimate the
effect of genetic variation in a bacterial population on a phenotype of
interest, while accounting for potentially very strong confounding population
structure. This allows for genome-wide association studies (GWAS) to be performed in
clonal organisms such as bacteria and viruses.
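To illustrate why population structure has to be accounted for, here is a minimal fixed-effects sketch in plain Python. It is not pyseer's implementation: the function names (`solve`, `fit_fixed_effects`), the toy data, and the use of a single made-up structure covariate per sample are all assumptions for the example. The phenotype is constructed to depend only on the structure covariate, while the variant is merely correlated with it.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_fixed_effects(phenotype, variant, covariates):
    """Least-squares fit of phenotype ~ intercept + variant + covariates.

    Returns [intercept, variant effect, covariate effects...].
    """
    rows = [[1.0, float(v)] + list(c) for v, c in zip(variant, covariates)]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical toy data: one structure covariate per sample.
structure = [[-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5]]
variant = [0, 0, 1, 0, 1, 1]  # presence/absence, correlated with structure
# The phenotype depends on structure only; the true variant effect is zero.
phenotype = [1.0 + 2.0 * s[0] for s in structure]

naive = fit_fixed_effects(phenotype, variant, [[] for _ in variant])
adjusted = fit_fixed_effects(phenotype, variant, structure)
print("variant effect without structure covariate:", naive[1])
print("variant effect with structure covariate:", adjusted[1])
```

Without the covariate, the variant picks up a spurious effect (the difference in group means); with it, the estimated variant effect is essentially zero. A real analysis would also need standard errors and p-values, which this sketch omits.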
The original version of seer used sequence elements (k-mers) to represent
variation across the pan-genome. pyseer also allows variants stored in VCF
files (e.g. SNPs and INDELs mapped against a reference genome) or Rtab files
(e.g. from roary or piggy) to be used. There is also a greater range of
association models available, along with tools to help process the output.
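As a rough illustration of how k-mers can represent variation, the sketch below builds a presence/absence table from toy sequences. It is a simplification and not how seer or pyseer count k-mers: it assumes a single fixed k, ignores reverse complements, and the sample names, sequences, and function names are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all substrings of length k in sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presence_absence(genomes, k):
    """Map each k-mer to the set of sample names that contain it."""
    table = {}
    for name, sequence in genomes.items():
        for kmer in kmers(sequence, k):
            table.setdefault(kmer, set()).add(name)
    return table


# Hypothetical toy "genomes"; real inputs would be assembled sequences.
genomes = {
    "sample_1": "ACGTAC",
    "sample_2": "ACGTTC",
    "sample_3": "TTGTAC",
}
table = presence_absence(genomes, 4)
for kmer in sorted(table):
    print(kmer, sorted(table[kmer]))
```

Each row of such a table is a candidate variant: its pattern of presence and absence across samples is what gets tested for association with the phenotype.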
Testing shows that results (p-values) should be the same as those from the
original seer, with a runtime that is roughly twice as long as the optimised
C++ code.
We have also extended pyseer to fit association models to the whole genome,
which allows the use of machine learning to predict traits in new samples.
Citations
If you find pyseer useful, please cite:
Lees, John A., Galardini, M., et al. pyseer: a comprehensive tool for microbial pangenome-wide association studies. Bioinformatics 34:4310–4312 (2018). doi:10.1093/bioinformatics/bty539.
If you use unitigs (through unitig-counter) please cite:
Jaillard M., Lima L. et al. A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events. PLOS Genetics. 14, e1007758 (2018). doi:10.1371/journal.pgen.1007758.
If you use the whole genome/predictive models, please cite:
Lees, John A., Mai, T. T., et al. Improved inference and prediction of bacterial genotype-phenotype associations using interpretable pangenome-spanning regressions. (2020) Preprint: https://doi.org/10.1101/852426