Training: Command Line
In silico serotyping
Install SeroBA (Epping et al. 2018) following the instructions at https://github.com/sanger-pathogens/seroba#installation, then obtain the database by cloning the repository with git clone https://github.com/sanger-pathogens/seroba.git.
Files required to run serotyping using SeroBA:
- paired-end fastq files
- database
- sample list (only for running on multiple samples)
Run in silico serotyping on a single sample:
seroba runSerotyping <full path to the database> <read 1> <read 2> <output folder prefix>
Run in silico serotyping on multiple samples:
- create a list of sample names and save it as samplelist (e.g. the sample name for 24371_8#283_1.fastq.gz is 24371_8#283)
- run SeroBA on every sample in the list:
for f in $(cat samplelist); do seroba runSerotyping <path to the database> ${f}_1.fastq.gz ${f}_2.fastq.gz ${f}; done
- summarise the results of all samples in the current directory:
seroba summary ./
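The sample list used in the loop above can be generated from the read files instead of being typed by hand. The following is a minimal sketch, not part of SeroBA: make_samplelist is a hypothetical helper, and it assumes that all paired-end reads sit in one directory and follow the <sample>_1.fastq.gz / <sample>_2.fastq.gz naming shown in the example.

```shell
# Hypothetical helper (not part of SeroBA): print one sample name per line
# for every <sample>_1.fastq.gz in a directory that also has a read 2 file.
make_samplelist() {
    reads_dir="${1:-.}"
    for r1 in "$reads_dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue                  # glob matched nothing
        name=$(basename "$r1" _1.fastq.gz)        # strip the read 1 suffix
        if [ -e "$reads_dir/${name}_2.fastq.gz" ]; then
            printf '%s\n' "$name"
        else
            # report unpaired samples on stderr so they stay out of the list
            printf 'warning: no read 2 file for %s\n' "$name" >&2
        fi
    done
}

# Usage (run in the directory that holds the fastq files):
#   make_samplelist . > samplelist
```

Samples without a matching read 2 file are left out of the list, so the loop does not stop on an incomplete pair.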
Output:
summary.tsv
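Once summary.tsv exists, a quick tally of serotypes can help to check the run. This is a sketch under an assumption that should be verified against your own output: that summary.tsv is tab-delimited with the sample name in the first column and the serotype in the second. tally_serotypes is a hypothetical helper name.

```shell
# Hypothetical helper: count how many samples were assigned each serotype.
# Assumes a tab-delimited file with the serotype in column 2 (check your file).
tally_serotypes() {
    awk -F'\t' '{ n[$2]++ } END { for (s in n) print s "\t" n[s] }' "$1" | sort
}

# Usage:
#   tally_serotypes summary.tsv
```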
These instructions are available to download here:
Download
GPSC assignment
Install PopPUNK 2.4 as per instructions at PopPUNK documentation and download the GPS reference database and the GPS designation.
GPS reference database:
Download
GPS designation:
Download
Files required to run GPSC assignment using PopPUNK 2.4:
- A 2-column tab-delimited file containing sample name and path to the corresponding assembly (no header)
- GPS reference database <GPS_v5>
- GPS designation <GPS_v5_external_clusters.csv>
The output directory name is set with --output.
The number of threads can be changed with --threads.
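The 2-column query file listed above can be built from a directory of assemblies. The sketch below is illustrative only: make_query_file is a hypothetical helper, and the .fasta extension and the use of the file name as the sample name are assumptions to adapt to your own data.

```shell
# Hypothetical helper: write "<sample name><TAB><absolute path>" for every
# assembly in a directory, with no header line.
# Assumes assemblies are named <sample>.fasta (change the suffix if needed).
make_query_file() {
    assembly_dir=$(cd "$1" && pwd) || return 1    # resolve to an absolute path
    for fa in "$assembly_dir"/*.fasta; do
        [ -e "$fa" ] || continue                  # glob matched nothing
        name=$(basename "$fa" .fasta)
        printf '%s\t%s\n' "$name" "$fa"
    done
}

# Usage:
#   make_query_file /path/to/assemblies > query.tsv
```

Absolute paths are written so that the query file stays valid whichever directory the assignment is run from.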
Run GPSC assignment:
poppunk_assign --db GPS_v5 --distances GPS_v5/GPS_v5.dists --query <2-column path to assembly> --output <GPSC_assignment> --external-clustering GPS_v5_external_clusters.csv
Outputs:
_clusters.csv: PopPUNK clusters with dataset-specific nomenclature
_external_clusters.csv: GPSC v5 scheme designations
Novel clusters: These are assigned NA in _external_clusters.csv because they are not defined in the v5 dataset used to designate the GPSCs. First check the samples for low-level contamination, which can bias accessory distances. Then email globalpneumoseq@gmail.com to have the novel clusters added to the database and assigned a GPSC name.
Merged clusters: A query sample may carry previously unsampled diversity that links two existing clusters, in which case the GPSCs are merged. For example, if GPSC23 and GPSC362 merge, the GPSC is reported as GPSC23, with a merge history of GPSC23;362.
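Samples that fall into novel or merged clusters can be pulled out of the results for follow-up. This is a rough sketch with stated assumptions: that _external_clusters.csv is comma-separated with one header line, the sample name in the first column and the GPSC in the second, and that merged clusters appear in that column joined by a semicolon as in the example above. Both helper names are hypothetical.

```shell
# Hypothetical helpers for a file assumed to look like:
#   sample,GPSC
#   s1,23
#   s2,NA

# Samples assigned NA, i.e. candidates for a novel cluster.
novel_samples() {
    awk -F',' 'NR > 1 && $2 == "NA" { print $1 }' "$1"
}

# Samples whose GPSC field lists more than one cluster (assumed ";"-joined).
merged_samples() {
    awk -F',' 'NR > 1 && $2 ~ /;/ { print $1 "\t" $2 }' "$1"
}

# Usage (the file name prefix depends on the --output name you chose):
#   novel_samples <GPSC_assignment>/<GPSC_assignment>_external_clusters.csv
```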
The instructions for PopPUNK v2.4 are available to download
Download
The instructions for PopPUNK v1 are available to download
Download