Assigning your own data to GPSCs v2


Web application

Drag and drop assemblies or sequencing reads into Pathogenwatch to obtain GPSC assignments.


Command-line

Install popPUNK following the instructions at https://poppunk.readthedocs.io/en/latest/installation.html. Then download the GPS reference database “GPS_query.tar.bz2” (Database link) and the GPSC designations “gpsc_definitive.csv” (CSV link) from this page.


Files required to run GPSC assignment using popPUNK:
  1. queries.txt: a list of paths to the assemblies you wish to assign GPSCs to
  2. GPS_query: the GPS reference database directory, created by uncompressing GPS_query.tar.bz2
  3. gpsc_definitive.csv: published GPSC designations for the reference genomes
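A minimal Python sketch of preparing these inputs. It assumes your assemblies sit in a single directory (the directory name "assemblies" and the extensions .fasta, .fa and .fna are assumptions) and that GPS_query.tar.bz2 unpacks to a GPS_query directory. The helper names extract_database and write_queries are hypothetical. Absolute paths are written so that queries.txt works no matter where popPUNK is run from.

```python
from __future__ import annotations

import tarfile
from pathlib import Path

# Assumed assembly file extensions; adjust to match your own files.
ASSEMBLY_EXTENSIONS = {".fasta", ".fa", ".fna"}


def extract_database(archive: str, dest: str = ".") -> None:
    """Uncompress a bzip2-compressed tar such as GPS_query.tar.bz2 into dest."""
    with tarfile.open(archive, "r:bz2") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse unsafe members (absolute paths, etc.).
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def write_queries(assembly_dir: str, out_file: str = "queries.txt") -> list[str]:
    """Write one absolute assembly path per line to out_file and return the paths."""
    paths = sorted(
        str(p.resolve())
        for p in Path(assembly_dir).iterdir()
        if p.is_file() and p.suffix.lower() in ASSEMBLY_EXTENSIONS
    )
    Path(out_file).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

For example, extract_database("GPS_query.tar.bz2") followed by write_queries("assemblies") leaves GPS_query and queries.txt in the current directory, ready for the command below.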

The output directory name is set with --output.

The number of threads can be changed with --threads.


Run GPSC assignment:

poppunk --assign-query --ref-db GPS_query --distances GPS_query/GPS_query.dists --model-dir GPS_query --q-files queries.txt --output GPSC_assignment --threads 8 --full-db --external-clustering gpsc_definitive.csv
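If you run the assignment from a Python pipeline, the command above can be built as an argument list so that --output and --threads are easy to vary. This is a sketch that only reproduces the flags shown above; build_assign_command and run_assignment are hypothetical helper names, and it assumes the poppunk executable is on your PATH.

```python
import os
import subprocess


def build_assign_command(
    queries: str = "queries.txt",
    ref_db: str = "GPS_query",
    definitions: str = "gpsc_definitive.csv",
    output: str = "GPSC_assignment",
    threads: int = 8,
) -> list:
    """Return the popPUNK GPSC assignment command above as an argument list."""
    # The distances file is assumed to be <ref_db>/<basename of ref_db>.dists.
    db_name = os.path.basename(os.path.normpath(ref_db))
    return [
        "poppunk", "--assign-query",
        "--ref-db", ref_db,
        "--distances", os.path.join(ref_db, db_name + ".dists"),
        "--model-dir", ref_db,
        "--q-files", queries,
        "--output", output,          # output directory name
        "--threads", str(threads),   # number of threads
        "--full-db",
        "--external-clustering", definitions,
    ]


def run_assignment(**kwargs) -> None:
    """Run popPUNK; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_assign_command(**kwargs), check=True)
```

For example, run_assignment(output="my_run", threads=16) runs the same assignment into a my_run directory using 16 threads.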


Outputs:

_clusters.csv: popPUNK clusters with dataset-specific nomenclature

_external_clusters.csv: GPSC v2 scheme designations
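A sketch for loading both outputs into Python dictionaries. It assumes popPUNK writes them inside the output directory, prefixed with the directory name (for example GPSC_assignment/GPSC_assignment_external_clusters.csv), and that the column names are Taxon and Cluster in _clusters.csv and sample and GPSC in _external_clusters.csv. These file and column names are assumptions, so check the headers of your own files and pass the correct names; output_paths and read_clusters are hypothetical helpers.

```python
from __future__ import annotations

import csv
from pathlib import Path


def output_paths(output_dir: str) -> tuple[Path, Path]:
    """Assumed locations of the popPUNK clusters and GPSC (external) files."""
    out = Path(output_dir)
    return (
        out / f"{out.name}_clusters.csv",
        out / f"{out.name}_external_clusters.csv",
    )


def read_clusters(path: Path, sample_col: str, cluster_col: str) -> dict[str, str]:
    """Map each sample name to its cluster label, keeping labels as strings."""
    with open(path, newline="") as handle:
        return {row[sample_col]: row[cluster_col] for row in csv.DictReader(handle)}
```

For example, after running the command above, clusters_csv, external_csv = output_paths("GPSC_assignment") and read_clusters(external_csv, "sample", "GPSC") give a sample-to-GPSC mapping.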


Novel clusters: Isolates in novel clusters are assigned NA in _external_clusters.csv, because these clusters were not present in the v2 dataset used to designate the GPSCs. The popPUNK _clusters.csv file can be used to determine whether NA isolates belong to the same cluster. Before requesting a GPSC name, check novel isolates for low-level contamination, which can bias accessory distances. Then email globalpneumoseq@gmail.com to have the novel clusters added to the database and assigned a GPSC name.
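The check against _clusters.csv can be scripted. A minimal sketch, assuming both output files have already been read into dictionaries mapping sample name to cluster label (as strings), and that an unassigned GPSC appears as NA or, as an added assumption, an empty cell. NA isolates that share a popPUNK cluster belong to the same novel cluster, so each group returned is one candidate new GPSC once the contamination check is done. The function name group_novel_isolates is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Labels treated as "no GPSC assigned"; the empty string is an assumption.
NOVEL_VALUES = {"NA", ""}


def group_novel_isolates(
    popunk_clusters: dict[str, str], gpsc: dict[str, str]
) -> dict[str, list[str]]:
    """Group isolates with an NA GPSC by their popPUNK cluster label."""
    groups: dict[str, list[str]] = defaultdict(list)
    for sample, label in gpsc.items():
        if label.strip() in NOVEL_VALUES:
            # Samples missing from the popPUNK file are collected under "unknown".
            groups[popunk_clusters.get(sample, "unknown")].append(sample)
    return {cluster: sorted(samples) for cluster, samples in sorted(groups.items())}
```

For example, if iso2 and iso3 are both NA and share popPUNK cluster 812, they are returned together under "812" and most likely need a single new GPSC.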


Merged clusters: Query isolates may represent previously unsampled diversity that links two existing clusters; when this happens, the GPSCs are merged. For example, if GPSC23 and GPSC362 merged, the cluster would be reported as GPSC23, with a merge history of GPSC23;362.
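When tallying isolates per GPSC, a merge history can be expanded into its member clusters. This sketch assumes the history is a semicolon-separated string like the GPSC23;362 example above, with the "GPSC" prefix on the first entry, on every entry, or on none, and that the first entry is the reported name, which is consistent with that example but is an assumption. The function name parse_merge_history is hypothetical.

```python
from __future__ import annotations


def parse_merge_history(history: str) -> tuple[str, list[str]]:
    """Split a merge history such as "GPSC23;362" into (reported, members)."""
    members: list[str] = []
    for part in history.split(";"):
        part = part.strip()
        # Normalise every entry to the "GPSC<number>" form.
        if part.upper().startswith("GPSC"):
            part = part[4:]
        if part:
            members.append(f"GPSC{part}")
    if not members:
        raise ValueError(f"no GPSC found in {history!r}")
    # Assumption: the first listed GPSC is the name that is reported.
    return members[0], members
```

For example, parse_merge_history("GPSC23;362") returns ("GPSC23", ["GPSC23", "GPSC362"]).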

These instructions are also available to download from this page (Instructions link).