Tutorial
INSTALLATION (SCOPA and METASCOPA)
Copy the SCOPAv*.zip file to your computer and unzip it:
unzip SCOPAv*.zip
To compile the SCOPA program, run the command:
make
in the folder where the files were unpacked. The program can then be run by typing:
./SCOPA
Copy the METASCOPAv*.zip file to your computer and unzip it:
unzip METASCOPAv*.zip
To compile the METASCOPA program, run the command:
make
in the folder where the files were unpacked. The program can then be run by typing:
./METASCOPA
INPUT FILES FOR SCOPA
To run SCOPA, you need input files in SNPTEST v2 format. The file formats are described in the SNPTEST documentation. For a case-control analysis, you should have a single gen file and a single sample file, in which the phenotype is coded 0=control, 1=case. Please note that SCOPA cannot currently use covariates; therefore, please adjust all phenotypes for the covariates and use the residuals of the phenotypes in the sample file (case and control values will then be floating-point numbers around 0 and 1).
GENOTYPE file:
1 rs1 11 A T 1 0 0 1 0 0 1 0 0
1 rs2 210 A T 0 1 0 1 0 0 1 0 0
1 rs3 300 A T 1 0 0 1 0 0 1 0 0
1 rs4 4637 A T 1 0 0 1 0 0 1 0 0
1 rs5 5555 A T 1 0 0 1 0 0 1 0 0
(The genotype file can be gzipped, provided it has a *.gz extension)
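A quick sanity check on a genotype file can catch truncated or malformed lines before a run. The sketch below assumes the layout shown in the example above (five leading columns followed by three genotype probabilities per sample). The file name toy.gen and the variable gen_report are illustrative only and not part of SCOPA.

```shell
# Toy genotype file copied from the example above (3 samples, 5 variants).
cat > toy.gen <<'EOF'
1 rs1 11 A T 1 0 0 1 0 0 1 0 0
1 rs2 210 A T 0 1 0 1 0 0 1 0 0
1 rs3 300 A T 1 0 0 1 0 0 1 0 0
1 rs4 4637 A T 1 0 0 1 0 0 1 0 0
1 rs5 5555 A T 1 0 0 1 0 0 1 0 0
EOF

# Report the sample count implied by line 1 and flag any line whose column
# count differs from line 1 or is not of the form 5 + 3 * samples.
gen_report=$(awk '
  NR == 1 { nf = NF; printf "samples: %d\n", (NF - 5) / 3 }
  NF != nf || (NF - 5) % 3 != 0 { printf "line %d: %d columns\n", NR, NF }
' toy.gen)
echo "$gen_report"
```

For the toy file this prints only "samples: 3". For a gzipped file, the same awk program can read from "gzip -dc file.gen.gz" instead of a file name.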
SAMPLE file:
Sample_id Subject_id Missing Phenotype1 Phenotype2 Phenotype3
0 0 0 P P P
1 1 0 1.24 0.331 0.41
2 2 0 1.23 -0.3 0.42
3 3 0 1.22 -.47 0.43
This file contains data for three phenotypes. As the program cannot use covariates, please adjust your phenotypes for all covariates and use the residuals as the phenotype values.
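Because each phenotype has to be passed to SCOPA by name, it can help to read the names straight from the sample file. The following is a minimal sketch that assumes the two-line header shown above (column names on line 1, column types on line 2, with "P" marking a phenotype column). The file name toy.sample and the variables pheno_args and n_samples are illustrative.

```shell
# Toy sample file copied from the example above.
cat > toy.sample <<'EOF'
Sample_id Subject_id Missing Phenotype1 Phenotype2 Phenotype3
0 0 0 P P P
1 1 0 1.24 0.331 0.41
2 2 0 1.23 -0.3 0.42
3 3 0 1.22 -.47 0.43
EOF

# Line 1 holds the column names, line 2 the column types. Collect the names
# of all columns typed "P" and format them as --pheno_name arguments.
pheno_args=$(awk '
  NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i }
  NR == 2 {
    for (i = 1; i <= NF; i++)
      if ($i == "P") printf "%s--pheno_name %s", (n++ ? " " : ""), name[i]
    print ""
    exit
  }
' toy.sample)
echo "$pheno_args"

# Samples are all lines after the two header lines.
n_samples=$(awk 'END { print NR - 2 }' toy.sample)
echo "samples: $n_samples"
```

The sample count reported here should match the count implied by the genotype file (three probability columns per sample).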
RUNNING SCOPA
Command line options:
./SCOPA [--debug] [--print_covariance] [--print_complex] [--betas]
[--print_all] [--remove_missing] --pheno_name <string> ...
[--imp_threshold <double>] [--missing_phenotype <string>] [-e
<string>] -o <string> -g <string> [--chr <int>] -s <string>
[--] [--version] [-h]
Where:
--debug
Debug mode on (default OFF)
--print_covariance
Print covariance matrix data for the model with all phenotypes. This is necessary for METASCOPA and can only be used with "--print_complex" option
(default OFF)
--print_complex
Print only the model with all phenotypes. These full models can be meta-analysed with METASCOPA (default OFF)
--betas
Print each phenotype's effect size and stderr info of all selected models into separate output file (default OFF)
--print_all
Print out all models (default OFF)
--remove_missing
Remove sample if any of the phenotype values is missing. This is necessary if you want to compare models based on BIC scores (default OFF)
--pheno_name <string> (accepted multiple times)
(required) Name of phenotype to use (use this option multiple times, e.g. --pheno_name BMI --pheno_name HEIGHT etc.)
--imp_threshold <double>
Imputation quality threshold (default 0)
--missing_phenotype <string>
This specifies missing data value (default NA)
-e <string>, --exclusion <string>
This specifies marker exclusion list
-o <string>, --out <string>
(required) This specifies output root
-g <string>, --gen <string>
(required) This specifies genotype file.
--chr <int>
This specifies chromosome to be printed into chromosome column
-s <string>, --sample <string>
(required) This specifies sample file
--, --ignore_rest
Ignores the rest of the labeled arguments following this flag
--version
Displays version information and exits
-h, --help
Displays usage information and exits
SCOPA OUTPUT COLUMNS
1 Chromosome - chromosome of variant if set with --chr option. Otherwise 0
2 Position - position of variant
3 MarkerName - variant name
4 EffectAllele - effect allele (necessary for meta-analysis)
5 OtherAllele - non-effect allele (necessary for meta-analysis)
6 InfoScore - Imputation quality measurement calculated similarly to IMPUTE2
7 HWE - p-value for HWE
8 MAF - minor allele frequency
9 N - sample size
10 AA - genotype counts from imputed data
11 AB - genotype counts from imputed data
12 BB - genotype counts from imputed data
13 PhenotypeCount - number of phenotypes in model
14 Mask - binary mask showing the phenotypes used in current model (1-used, 0-unused)
15 LogLikelihood - model log-likelihood
16 nullLogLikelihood - null model log-likelihood
17 LikelihoodRatio - likelihood ratio
18 P-value - model p-value
19 BIC - Bayesian information criterion
20 BICnull - Bayesian information criterion for the null model
21 Model - phenotypes in the order they were used in model (important for selecting covariance matrix for meta-analysis)
22 sortedModel - phenotypes in model in alphabetical order
23 beta_1 - effect size for phenotype 1
24 se_1 - stderr of effect for phenotype 1
25 beta_2 - effect size for phenotype 2
26 se_2 - stderr of effect for phenotype 2
27 beta_3 - effect size for phenotype 3
28 se_3 - stderr of effect for phenotype 3
29 cov_1_1 - inverted covariance matrix values
30 cov_1_2 - inverted covariance matrix values
31 cov_1_3 - inverted covariance matrix values
32 cov_2_2 - inverted covariance matrix values
33 cov_2_3 - inverted covariance matrix values
34 cov_3_3 - inverted covariance matrix values
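When SCOPA is run with --print_all, each variant has one row per model, and the rows can be compared by their BIC column. The sketch below picks one row per variant, assuming the usual convention that a lower BIC indicates a better-supported model, and assuming the output is whitespace-delimited with a header line. The toy file holds only three of the columns listed above and its values are made up for illustration; columns are looked up by header name, so the same awk program should also work on a full results file as long as no field contains spaces.

```shell
# Toy stand-in for --print_all output (made-up values, subset of columns).
cat > toy_all.result <<'EOF'
MarkerName Mask BIC
rs1 100 1520.4
rs1 110 1511.9
rs1 111 1515.2
rs2 100 1498.0
rs2 011 1502.3
EOF

# Map header names to column indices, then keep the lowest-BIC row per marker.
awk '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
  {
    m = $col["MarkerName"]; b = $col["BIC"] + 0
    if (!(m in best) || b < best[m]) {
      best[m] = b
      row[m] = m " " $col["Mask"] " " $col["BIC"]
    }
  }
  END { for (m in row) print row[m] }
' toy_all.result | sort > toy_best.txt
cat toy_best.txt
```

For the toy data this keeps the mask 110 model for rs1 and the mask 100 model for rs2. Note that BIC scores are only comparable between models if SCOPA was run with --remove_missing.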
INPUT FILE FOR METASCOPA
METASCOPA is the program for meta-analysing output files from the SCOPA program. As its only input, you need a file listing all the SCOPA result files that you want to meta-analyse. Please note that the listed files must be gzipped. Each input file must contain only a single model (e.g. produced with the --print_complex option in SCOPA) and must include the covariance matrix between phenotypes (produced with the --print_covariance option in SCOPA).
The list file (e.g. metascopa.in) can contain rows such as:
cohort1.result.gz
cohort2.result.gz
cohort3.result.gz
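Since METASCOPA expects every listed file to be gzipped, the list can be checked before the run. This is a minimal sketch: the three cohort files created here are empty toy stand-ins rather than real SCOPA output, and the variable name status is illustrative.

```shell
# Toy stand-ins for per-cohort SCOPA results (header line only, gzipped).
for c in cohort1 cohort2 cohort3; do
  printf 'MarkerName\n' | gzip -c > "$c.result.gz"
done

# Write the list file, one result file per line.
ls cohort1.result.gz cohort2.result.gz cohort3.result.gz > metascopa.in

# Verify that every listed file exists and is a valid gzip archive.
status=ok
while IFS= read -r f; do
  if [ ! -f "$f" ]; then
    echo "missing: $f"; status=bad
  elif ! gzip -t "$f" 2>/dev/null; then
    echo "not gzipped: $f"; status=bad
  fi
done < metascopa.in
echo "list check: $status"
```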
RUNNING METASCOPA
Command line options:
./METASCOPA [--debug] [--ogc] [--gc] [-n <int>] [--mac <double>] [--maf <double>] [--info <double>] [--hwe <double>] -o <string> -i <string> [--] [--version] [-h]
Where:
--debug
Debug mode enabled
--gc
Use genomic control to adjust each contributing file for population stratification (default OFF)
--ogc
Use genomic control to adjust meta-analysis results for population
stratification (default OFF)
-n <int>, --samplesize <int>
This specifies minimum sample size filter (default 0)
--mac <double>
This specifies minimum minor allele count filter (default 0)
--maf <double>
This specifies minimal minor allele frequency filter (default 0)
--info <double>
This specifies infoscore filter (default 0)
--hwe <double>
This specifies HWE p-value filter (default 1)
-o <string>, --out <string>
(required) This specifies output file
-i <string>, --input <string>
(required) This specifies input list file
--, --ignore_rest
Ignores the rest of the labeled arguments following this flag.
--version
Displays version information and exits.
-h, --help
Displays usage information and exits.
METASCOPA OUTPUT COLUMNS
1 MarkerName - variant name
2 EA - effect allele
3 NEA - other allele
4 CohortCount - number of cohorts having data of given variant
5 N - total sample size
6 beta_0 - meta-analysed effect size of phenotype 1
7 se_0 - meta-analysed stderr of effect for phenotype 1
8 beta_1 - meta-analysed effect size of phenotype 2
9 se_1 - meta-analysed stderr of effect for phenotype 2
10 beta_2 - meta-analysed effect size of phenotype 3
11 se_2 - meta-analysed stderr of effect for phenotype 3
12 ChiSq - chi-squared statistic for the entire model
13 Pvalue - p-value for the entire model. Please note that the program can only calculate p-values down to 1e-20; lower p-values are given as 0. Exact p-values down to ~1e-200 can be obtained in R using the formula:
p <- pchisq(ChiSq, df = PhenotypeCount, lower.tail = FALSE)
ANALYSIS EXAMPLE
As the first GWAS step, we recommend testing the full model (--print_complex), as it reflects associations with each separate phenotype as well as with their combinations (e.g. ratios). If several datasets are available, it is also recommended to print the covariance matrix (--print_covariance) and use that information to meta-analyse the cohorts with METASCOPA. An example command line for analysing anthropometric traits would be:
./SCOPA --remove_missing -g cohort1_chr1.gen --chr 1 --print_complex --betas --print_covariance --out cohort1_chr1.result -s one.sample --pheno_name height --pheno_name weight --pheno_name hip --pheno_name waist
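The command above handles a single chromosome. One way to cover all chromosomes is to generate the same command once per chromosome, as sketched below. The sketch assumes 22 autosomes and per-chromosome genotype files named after the cohort1_chr1.gen pattern; it only writes the commands to a file (scopa_commands.txt, an illustrative name) instead of running them, so they can be inspected first.

```shell
# Write one SCOPA command per chromosome to scopa_commands.txt.
# Nothing is executed here; "echo" only prints each command line.
chr=1
while [ "$chr" -le 22 ]; do
  echo ./SCOPA --remove_missing -g "cohort1_chr${chr}.gen" --chr "$chr" \
    --print_complex --betas --print_covariance \
    --out "cohort1_chr${chr}.result" -s one.sample \
    --pheno_name height --pheno_name weight --pheno_name hip --pheno_name waist
  chr=$((chr + 1))
done > scopa_commands.txt

# Show the first generated command for inspection.
head -n 1 scopa_commands.txt
```

Once the commands look right, they can be run with "sh scopa_commands.txt" from the folder containing SCOPA and the input files.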
By merging all chromosomes, a single gzipped output file can be created for each cohort:
awk '{if(NR==1 || $1!="Chromosome"){print;}}' cohort1_chr*.result | gzip -c > cohort1.result.gz
The result files from the contributing cohorts must then be listed, one file name per line, in a list file. All contributing files can then be meta-analysed using METASCOPA with double GC correction:
ls cohort1.result.gz cohort2.result.gz cohort3.result.gz > cohorts.in
./METASCOPA --gc --ogc --mac 10 --info 0.4 --hwe 1e-4 -i cohorts.in -o meta.results
As the next step, the top signals can be selected from the METASCOPA output. The genotype data can then be filtered to these variants and a SCOPA analysis run with --print_all (or without any print option, to get only the best model according to the BIC scores) to find the optimal set of phenotypes associated with each variant:
./SCOPA --remove_missing -g cohort1_hits.gen --out cohort1_modelselection.result -s one.sample --pheno_name height --pheno_name weight --pheno_name hip --pheno_name waist
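The hit-selection and filtering steps described above could be sketched with awk as follows. This is an illustration under several assumptions: the METASCOPA output is taken to be whitespace-delimited with a header line containing the MarkerName and Pvalue columns, the threshold 5e-8 is just a commonly used genome-wide significance level, and the toy files and their values are made up. Rows with Pvalue 0 (reported for p-values below 1e-20) pass the threshold automatically.

```shell
# Toy stand-in for METASCOPA output (made-up values, subset of columns).
cat > toy_meta.results <<'EOF'
MarkerName EA NEA Pvalue
rs1 A T 0.3
rs2 A T 0
rs4 A T 2e-9
EOF

# Toy genotype file in the layout shown in the INPUT FILES section.
cat > toy_chr1.gen <<'EOF'
1 rs1 11 A T 1 0 0 1 0 0 1 0 0
1 rs2 210 A T 0 1 0 1 0 0 1 0 0
1 rs3 300 A T 1 0 0 1 0 0 1 0 0
1 rs4 4637 A T 1 0 0 1 0 0 1 0 0
EOF

# 1) Collect marker names whose Pvalue is below the threshold.
awk -v thr=5e-8 '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
  ($col["Pvalue"] + 0) < (thr + 0) { print $col["MarkerName"] }
' toy_meta.results > hits.txt

# 2) Keep only those variants from the genotype file (variant name is column 2).
awk 'NR == FNR { keep[$1]; next } $2 in keep' hits.txt toy_chr1.gen > toy_hits.gen
cat toy_hits.gen
```

For the toy data, toy_hits.gen keeps the rs2 and rs4 lines. A file filtered this way would play the role of cohort1_hits.gen in the command above.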