Web application
Drag and drop assemblies or sequencing reads into Pathogenwatch to get GPSC assignments.
Command-line
Install PopPUNK following the instructions at https://poppunk.readthedocs.io/en/latest/installation.html. Then download the GPS reference database “GPS_query.tar.bz2” (Database link) and the GPSC designations “gpsc_definitive.csv” (CSV link) from this page.
Files required to run GPSC assignment using PopPUNK:
- queries.txt: a list of paths to the assemblies you wish to assign GPSCs to
- GPS_query: the GPS reference database, obtained by uncompressing GPS_query.tar.bz2
- gpsc_definitive.csv: published GPSC designations for the references
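The query list and reference database can be prepared with a short shell sketch. It assumes your assemblies are FASTA files ending in .fa, .fasta or .fna, all in one directory, and that the tarball unpacks into a GPS_query directory; make_query_list and unpack_gps_db are hypothetical helpers, not part of PopPUNK.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of PopPUNK) for preparing GPSC assignment inputs.

# Unpack the GPS reference database (assumed to extract to ./GPS_query).
unpack_gps_db() {
  tar -xjf GPS_query.tar.bz2
}

# Write a query list: one absolute assembly path per line, sorted.
# ASSUMPTION: assemblies are FASTA files ending in .fa, .fasta or .fna
# located directly inside the given directory.
make_query_list() {
  local dir="$1" out="${2:-queries.txt}"
  local abs_dir
  abs_dir="$(cd "$dir" && pwd)" || return 1
  find "$abs_dir" -maxdepth 1 -type f \
    \( -name '*.fa' -o -name '*.fasta' -o -name '*.fna' \) \
    | LC_ALL=C sort > "$out"
}
```

For example, `unpack_gps_db` followed by `make_query_list assemblies queries.txt` would produce both inputs, assuming a directory named assemblies. Absolute paths are written so the list still works if PopPUNK is run from a different directory.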
Run GPSC assignment as below. The output directory name is set using --output, and the number of threads can be changed using --threads:
poppunk --assign-query --ref-db GPS_query --distances GPS_query/GPS_query.dists --model-dir GPS_query --q-files queries.txt --output GPSC_assignment --threads 8 --full-db --external-clustering gpsc_definitive.csv
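The --output and --threads settings can be wrapped in a small shell function so the same command is easy to rerun with different values. This is a sketch around a hypothetical helper, run_gpsc_assignment; it reproduces the command above verbatim except for the query list, output directory and thread count, and setting DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of PopPUNK) around the GPSC assignment command.
# Arguments: query list, output directory, number of threads (all optional).
run_gpsc_assignment() {
  local queries="${1:-queries.txt}"
  local outdir="${2:-GPSC_assignment}"
  local threads="${3:-8}"
  local -a cmd
  # Same flags as the documented command; only the three values vary.
  cmd=(poppunk --assign-query
    --ref-db GPS_query
    --distances GPS_query/GPS_query.dists
    --model-dir GPS_query
    --q-files "$queries"
    --output "$outdir"
    --threads "$threads"
    --full-db
    --external-clustering gpsc_definitive.csv)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command as a single space-separated line.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `run_gpsc_assignment queries.txt GPSC_assignment 16` runs the assignment with 16 threads.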
Output files:
- _clusters.csv: PopPUNK clusters with dataset-specific nomenclature
- _external_clusters.csv: GPSC v2 scheme designations
Novel clusters: these are assigned NA in _external_clusters.csv because they were not seen in the v2 dataset used to designate the GPSCs. The PopPUNK _clusters.csv file can be used to determine whether NA isolates belong to the same cluster. After checking for low-level contamination, which may bias accessory distances, please email email@example.com to have novel clusters added to the database and assigned a GPSC name.
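To check whether NA isolates share a PopPUNK cluster, the two output files can be joined on sample name. This awk sketch assumes, without having verified the exact layout, that both CSVs have a header row, the sample name in the first column and the cluster in the second, and no quoted commas; pass the actual paths from your output directory. group_novel_isolates is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list isolates whose GPSC is NA, with their PopPUNK cluster.
# ASSUMPTION: both CSVs have a header row, sample in column 1, cluster in
# column 2, and no quoted fields containing commas.
# Output: "popPUNK_cluster,sample" lines, sorted by cluster then sample.
group_novel_isolates() {
  local clusters_csv="$1" external_csv="$2"
  awk -F, '
    FNR == 1 { next }                     # skip the header of each file
    NR == FNR { ppk[$1] = $2; next }      # first file: sample -> PopPUNK cluster
    $2 == "NA" { print ppk[$1] "," $1 }   # second file: NA GPSC, emit cluster,sample
  ' "$clusters_csv" "$external_csv" | LC_ALL=C sort -t, -k1,1n -k2,2
}
```

NA isolates printed with the same PopPUNK cluster number belong to the same novel cluster; different numbers indicate distinct novel clusters.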
Merged clusters: query genomes may represent previously unsampled diversity that links two existing clusters; when this happens, the GPSCs are merged. For example, if GPSC23 and GPSC362 merge, the cluster is reported as GPSC23 with a merge history of GPSC23;362.
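A merge history such as GPSC23;362 can be expanded into the individual GPSCs it contains. This sketch assumes the format is exactly as in the example above (a GPSC prefix on the first number and semicolon-separated numbers after it); expand_merge_history is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a merge history like "GPSC23;362"
# into one full GPSC name per line.
# ASSUMPTION: entries are separated by ";" and the "GPSC" prefix may be
# present on the first entry only (a prefix on later entries is also accepted).
expand_merge_history() {
  local history="$1"
  local -a ids out
  local id
  IFS=';' read -r -a ids <<< "${history#GPSC}"
  for id in "${ids[@]}"; do
    out+=("GPSC${id#GPSC}")
  done
  printf '%s\n' "${out[@]}"
}
```

For the example above, `expand_merge_history 'GPSC23;362'` prints GPSC23 and GPSC362 on separate lines.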
These instructions are available to download here: Instructions