MR-MEGA
INTRODUCTION
MR-MEGA (Meta-Regression of Multi-Ethnic Genetic Association) is a tool to detect and fine-map complex trait association signals via trans-ethnic meta-regression. This approach uses genome-wide metrics of diversity between populations to derive axes of genetic variation via multi-dimensional scaling [Purcell 2007]. Allelic effects of a variant across GWAS, weighted by their corresponding standard errors, can then be modelled in a linear regression framework, including the axes of genetic variation as covariates. The flexibility of this model enables partitioning of the heterogeneity into components due to ancestry and residual variation, which would be expected to improve fine-mapping resolution.
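To make the model concrete, here is a minimal sketch in Python. It assumes inverse-variance weighted least squares (weights 1/SE²) with an intercept plus T axes of genetic variation per cohort, and it assumes the usual decomposition of the weighted sum of squares. The association statistic compares the full model with a model in which all coefficients are zero (T + 1 degrees of freedom). Ancestry heterogeneity compares the full model with an intercept-only model (T degrees of freedom). Residual heterogeneity is what the full model leaves unexplained (K - T - 1 degrees of freedom for K cohorts). The function names are hypothetical and this is not the MR-MEGA source code. The sketch also enforces the PC-count constraint given under --pc.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular design: axes of variation are collinear")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_fit(betas, ses, design):
    """Inverse-variance WLS of betas on design rows; returns (coefficients, weighted RSS)."""
    k, p = len(betas), len(design[0])
    w = [1.0 / (s * s) for s in ses]
    xtwx = [[sum(w[j] * design[j][a] * design[j][c] for j in range(k)) for c in range(p)]
            for a in range(p)]
    xtwb = [sum(w[j] * design[j][a] * betas[j] for j in range(k)) for a in range(p)]
    coef = solve(xtwx, xtwb)
    fitted = [sum(design[j][a] * coef[a] for a in range(p)) for j in range(k)]
    rss = sum(w[j] * (betas[j] - fitted[j]) ** 2 for j in range(k))
    return coef, rss


def meta_regress(betas, ses, axes):
    """Meta-regress per-cohort effects on T axes of genetic variation (one row per cohort)."""
    k, t = len(betas), len(axes[0])
    if t >= k - 2:  # constraint from --pc: PC count < cohort count - 2
        raise ValueError("number of axes must be less than the number of cohorts minus 2")
    total = sum(b * b / (s * s) for b, s in zip(betas, ses))  # model with all coefficients 0
    coef, rss_full = weighted_fit(betas, ses, [[1.0] + list(x) for x in axes])
    _, rss_mean = weighted_fit(betas, ses, [[1.0] for _ in axes])  # intercept only
    return {
        "coef": coef,  # intercept followed by one coefficient per axis
        "chisq_association": total - rss_full, "ndf_association": t + 1,
        "chisq_ancestry_het": rss_mean - rss_full, "ndf_ancestry_het": t,
        "chisq_residual_het": rss_full, "ndf_residual_het": k - t - 1,
    }
```

For example, with betas 0.1, 0.3, 0.5, 0.7 at axis values 0 to 3 and SE 0.1 in every cohort, the effects lie exactly on a line. All heterogeneity is then attributed to ancestry (chi-square 20 on 1 df), and the residual chi-square is 0.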
Questions and suggestions concerning the method should be sent to: apmorris [ät] liverpool.ac.uk
CITATION
Please cite the paper:
DOWNLOAD
MR-MEGA ver. 0.1.6 (changelog). Please note that the current version is still a beta version and may contain errors. In case of any problems, please write to reedik.magi [ät] ut.ee
Some additional tools:
fixP.r manh.r qq.r
The C++ library currently used by MR-MEGA can only calculate p-values down to 1e-14. You can use the fixP.r script to recalculate the p-values in R from the chisq and ndf values, down to 1e-325. This script is not needed for creating the Manhattan and QQ plots, because those scripts perform the same calculation independently.
To run the script, use the following command:
R --slave --vanilla < fixP.r
If your results file is not named mrmega.result, you can specify the input and output files of the script as follows:
R --slave --vanilla --args input=inputfilename out=outputfilename < fixP.r
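A floor of this kind often appears when a p-value is computed as 1 - CDF. That difference becomes exactly 0 once the CDF rounds to 1 in double precision. Computing the upper tail directly avoids the problem. The sketch below illustrates the idea for integer degrees of freedom (as in the ndf_* columns) using closed-form chi-square survival functions. It is a hypothetical stand-in, not the code of fixP.r, and it is still limited by the smallest representable double.

```python
import math


def chisq_sf(x, df):
    """Upper-tail P(X > x) for a chi-square variable with integer df >= 1.

    Uses closed forms (even df: Poisson sum; odd df: erfc plus a finite series),
    so the tail is computed directly rather than as 1 - CDF.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Q(x; 2m) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Q(x; 1) = erfc(sqrt(x/2)); each additional 2 df adds one term of the series
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # term added for df = 3
    for r in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * r + 1)  # next term: multiply by x / (2r + 1)
    return p
```

For example, chisq_sf(100, 1) is about 1.5e-23. A 1 - CDF calculation cannot represent this value, because the CDF rounds to exactly 1.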
Manhattan and QQ plots can be created with the accompanying R scripts:
R --slave --vanilla < manh.r
R --slave --vanilla < qq.r
By default, they expect the input file "mrmega.result" and create the output files "mrmega.result.qq_assoc.png", "mrmega.result.qq_ancest.png", "mrmega.result.qq_resid.png" and "mrmega.result.manh.png". Different file names can be specified as follows (shown for manh.r; qq.r takes the same arguments):
R --slave --vanilla --args input=inputfilename out=outputfilename < manh.r
R version 2.9.0 or later with PNG support is required.
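Each QQ plot compares the observed -log10 p-values with those expected if all markers were null. The exact convention used by qq.r is not documented here. The hypothetical qq_points helper below assumes the common (i - 0.5)/n plotting positions.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p pairs for a QQ plot, smallest p first.

    p-values of 0 (underflow) or outside (0, 1] are dropped here; recalculate
    underflowed values first rather than discarding the strongest signals.
    """
    obs = sorted(p for p in pvalues if 0 < p <= 1)
    n = len(obs)
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]  # (i - 0.5)/n, 1-based i
    observed = [-math.log10(p) for p in obs]
    return list(zip(expected, observed))
```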
INSTALLATION
Copy the MR-MEGA_v*.zip file to your computer and unzip it:
unzip MRMEGA_v*.zip
To compile MR-MEGA, run the command:
make
in the folder where the files were unpacked. The program can then be run by typing:
./MR-MEGA
INPUT FILES FOR MR-MEGA
To run MR-MEGA, you have to create an input file (default name “mrmega.in”) that lists all study files, with each results file on a separate row.
Sample “mrmega.in” file:
Pop1.txt.gz
Pop2.txt.gz
Pop3.txt.gz
Pop4.txt.gz
Pop5.txt.gz
Pop6.txt.gz
Pop7.txt.gz
Pop8.txt.gz
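With many studies, you can generate the file list instead of typing it. The following is a minimal sketch. It assumes (hypothetically) that the study files match the Pop*.txt.gz pattern of the sample above and are stored in one directory. The files are sorted so that the list is reproducible. Plain string sorting places Pop10 before Pop2.

```python
from pathlib import Path


def filelist_text(study_files):
    """One study results file per row, each line newline-terminated."""
    return "".join(f"{name}\n" for name in study_files)


def write_filelist(directory=".", pattern="Pop*.txt.gz", out="mrmega.in"):
    """Collect matching study files and write them to the MR-MEGA file list."""
    files = sorted(str(p) for p in Path(directory).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern} in {directory}")
    Path(out).write_text(filelist_text(files))
    return files
```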
Each GWA study file must have the following column headers:
1) MARKERNAME – SNP name
2) EA – effect allele
3) NEA – non-effect allele
4) OR – odds ratio
5) OR_95L – lower bound of the 95% confidence interval of the OR
6) OR_95U – upper bound of the 95% confidence interval of the OR
7) EAF – effect allele frequency
8) N - sample size
9) CHROMOSOME - chromosome of marker
10) POSITION - position of marker
For a quantitative trait, the OR, OR_95L and OR_95U columns are replaced by:
4) BETA – effect size (beta)
5) SE – standard error of BETA
Study files may also contain the optional column:
11) STRAND – marker strand (if the column is missing, the program assumes that all markers are on the positive strand)
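For a binary trait, the OR and its 95% confidence interval carry the same information as a log-odds effect and its standard error. The sketch below shows the standard conversion. It assumes a symmetric Wald interval on the log scale: beta = ln(OR) and SE = (ln(OR_95U) - ln(OR_95L)) / (2 × 1.96). Whether MR-MEGA uses exactly this conversion internally is an assumption.

```python
import math

Z_975 = 1.959963984540054  # 97.5% quantile of the standard normal distribution


def or_ci_to_beta_se(odds_ratio, or_95l, or_95u):
    """Convert an odds ratio and its 95% CI to (beta, se) on the log-odds scale."""
    if not (0 < or_95l <= odds_ratio <= or_95u):
        raise ValueError("expected 0 < OR_95L <= OR <= OR_95U")
    beta = math.log(odds_ratio)
    se = (math.log(or_95u) - math.log(or_95l)) / (2 * Z_975)
    return beta, se
```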
Sample study file (NB! This file is for a quantitative trait, so MR-MEGA has to be run with the --qt command line option):
MARKERNAME STRAND CHROMOSOME POSITION IMP EA NEA EAF N BETA SE
rs12565286 + 1 761153 0 G C 0.3 1200 -0.02 0.0403
rs2977670 + 1 763754 0 C G 0.23 1200 -0.01 0.40612
rs12138618 + 1 790098 0 G A 0.97 1200 -0.07 0.37
rs3094315 + 1 792429 0 G A 0.01 1199 0.0258 0.1012
rs3131968 + 1 794055 0 G A 0.27 1200 -0.373 0.0101
rs2519016 + 1 805811 0 T C 0.04 1200 0.26 0.3472
rs12562034 + 1 808311 0 G A 0.65 1200 0.0092 0.2
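Input files must be either tab- or space-delimited. Files must not contain empty fields. Consecutive separators are treated as one, so an empty field would shift all following values into the wrong columns. Files may contain additional columns (such as IMP in the sample above), which are not used by MR-MEGA.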
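The rule about empty fields follows from how whitespace splitting works. The hypothetical parse_study helper below splits lines in the same way (Python's str.split() with no argument, which merges runs of spaces and tabs). It checks that the mandatory columns for the chosen trait type are present and that every row has as many fields as the header. This is a sketch, not MR-MEGA's reader: it does not handle alternative header names, and skipping blank lines is an assumption.

```python
import gzip

BASE = ["MARKERNAME", "EA", "NEA", "EAF", "N", "CHROMOSOME", "POSITION"]
BINARY = ["OR", "OR_95L", "OR_95U"]
QUANT = ["BETA", "SE"]


def parse_study(lines, qt=False):
    """Parse whitespace-delimited study lines into a list of dicts keyed by header."""
    it = iter(lines)
    header = next(it).split()  # runs of spaces/tabs count as one separator
    missing = [c for c in BASE + (QUANT if qt else BINARY) if c not in header]
    if missing:
        raise ValueError(f"missing mandatory columns: {missing}")
    rows = []
    for lineno, line in enumerate(it, start=2):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        if len(fields) != len(header):
            raise ValueError(f"line {lineno}: {len(fields)} fields, header has {len(header)}")
        rows.append(dict(zip(header, fields)))  # extra columns are kept but unused
    return rows


def read_study(path, qt=False):
    """Open a plain or gzip-compressed study file in text mode and parse it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_study(handle, qt=qt)
```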
RUNNING MR-MEGA
Command line options:
./MR-MEGA [--name_pos <string>] ... [--name_chr <string>] ...
[--name_n <string>] ... [--name_strand <string>] ...
[--name_or_95u <string>] ... [--name_or_95l <string>] ...
[--name_or <string>] ... [--name_se <string>] ...
[--name_beta <string>] ... [--name_eaf <string>] ...
[--name_nea <string>] ... [--name_ea <string>] ...
[--name_marker <string>] ... [-f <string>] ... [--pc <int>]
[-t <double>] [--no_std_names] [--debug] [--qt] [--gco]
[--gc] [--no_alleles] [-m <string>] [-o <string>] [-i
<string>] [--] [--version] [-h]
Where:
--name_pos <string> (accepted multiple times)
Alternative header for the position column. Default: POSITION
--name_chr <string> (accepted multiple times)
Alternative header for the chromosome column. Default: CHROMOSOME
--name_n <string> (accepted multiple times)
Alternative header for the sample size column. Default: N
--name_strand <string> (accepted multiple times)
Alternative header for the strand column. Default: STRAND
--name_or_95u <string> (accepted multiple times)
Alternative header for the upper 95% CI of the odds ratio column. Default: OR_95U
--name_or_95l <string> (accepted multiple times)
Alternative header for the lower 95% CI of the odds ratio column. Default: OR_95L
--name_or <string> (accepted multiple times)
Alternative header for the odds ratio column. Default: OR
--name_se <string> (accepted multiple times)
Alternative header for the standard error column. Default: SE
--name_beta <string> (accepted multiple times)
Alternative header for the effect column. Default: BETA
--name_eaf <string> (accepted multiple times)
Alternative header for the effect allele frequency column. Default: EAF
--name_nea <string> (accepted multiple times)
Alternative header for the other (non-effect) allele column. Default: NEA
--name_ea <string> (accepted multiple times)
Alternative header for the effect allele column. Default: EA
--name_marker <string> (accepted multiple times)
Alternative header for the marker name column. Default: MARKERNAME
-f <string>, --filter <string> (accepted multiple times)
Sets a filter based on a column value. A filter has three parts: the
column name, a comparison operator [>,<,>=,<=,==,!=] and a numeric
threshold. Multiple filters can be set. Note that the UNIX shell may
require a '\' before '<' and '>' signs (or quoting of the whole argument).
Column names are not case sensitive. (Example: INFO\>0.4)
--pc <int>
Number of principal components (PCs) to use in the meta-regression. Default = 4. Note that the number of PCs must be less than the number of cohorts minus 2. Therefore, if five cohorts are used in the analysis, at most two PCs can be used.
-t <double>, --threshold <double>
The p-value threshold for showing direction. Default = 1
--no_std_names
Default column names are not used. All columns must be defined by the
user
--debug
Debug mode on (default OFF)
--qt
Use this option if the trait is quantitative (columns BETA and SE). The
default is a binary trait (columns OR, OR_95U and OR_95L)
--gco
Apply a second genomic control correction to the output file
--gc
Apply genomic control correction to the input files
--no_alleles
No allele information is given; the same effect allele is assumed in all input files
-m <string>, --map <string>
Specifies the map file
-o <string>, --out <string>
Specifies the output file root. Default = mrmega
-i <string>, --filelist <string>
Specifies the file that lists the studies' result files. Default = mrmega.in
--, --ignore_rest
Ignores the rest of the labeled arguments following this flag.
--version
Displays version information and exits.
-h, --help
Displays usage information and exits.
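A filter such as INFO>0.4 combines a column name, one of the six comparison operators listed under --filter, and a numeric value. The sketch below shows one way to parse and apply such expressions, matching column names case-insensitively as described. The names parse_filter and passes are hypothetical. How MR-MEGA treats rows with missing or non-numeric values is not documented here, and the sketch simply rejects them.

```python
import operator
import re

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, "<": operator.lt}
# Two-character operators are listed first so that ">=" is not read as ">".
FILTER_RE = re.compile(r"^\s*([^<>=!\s]+)\s*(>=|<=|==|!=|>|<)\s*(\S+)\s*$")


def parse_filter(expr):
    """Split e.g. 'INFO>0.4' into (COLUMN, comparison function, threshold)."""
    m = FILTER_RE.match(expr)
    if not m:
        raise ValueError(f"cannot parse filter: {expr!r}")
    column, op, value = m.groups()
    return column.upper(), OPS[op], float(value)


def passes(row, filters):
    """True if the row (dict of column name -> string value) satisfies every filter."""
    upper_row = {k.upper(): v for k, v in row.items()}
    for column, compare, threshold in filters:
        try:
            value = float(upper_row[column])
        except (KeyError, ValueError):
            return False  # assumption: missing or non-numeric values fail the filter
        if not compare(value, threshold):
            return False
    return True
```

On the command line, the shell receives the same string whether you escape the operator (INFO\>0.4) or quote the argument ('INFO>0.4').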
OUTPUT FILES
MR-MEGA generates two output files: *.result and *.log
The results file contains the following columns:
MarkerName - unique marker identifier across input files
Chromosome - chromosome of the marker
Position - physical position of the marker on the chromosome
EA - allele whose effect was measured across input files
NEA - other (non-effect) allele
EAF - average effect allele frequency (weighted by the sample size of each input file)
Nsample - total number of samples
Ncohort - number of cohorts in which the marker was present
Effects - effect direction in each cohort (+ if the effect of the effect allele was positive, - if negative, 0 if zero, ? if the marker was not available in the cohort)
beta_0 - effect of first PC of meta-regression
se_0 - stderr of the effect of first PC of meta-regression
(beta_1)
(se_1)
(...)
chisq_association - chisq value of the association
ndf_association - number of degrees of freedom of the association
P-value_association - p-value of the association
chisq_ancestry_het - chisq value of the heterogeneity due to different ancestry
ndf_ancestry_het - ndf of the heterogeneity due to different ancestry
P-value_ancestry_het - p-value of the heterogeneity due to different ancestry
chisq_residual_het - chisq value of the residual heterogeneity
ndf_residual_het - ndf of the residual heterogeneity
P-value_residual_het - p-value of the residual heterogeneity
lnBF - natural logarithm of the Bayes factor
Comments - reason why marker was not analysed
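Several per-marker summary columns follow directly from the study-level inputs. The following is a minimal sketch. It assumes that each cohort's values for one marker are given as a hypothetical (beta, eaf, n) tuple, already aligned to the common effect allele, or as None when the marker is missing. EAF is computed as the sample-size weighted mean, Nsample and Ncohort as totals over the cohorts that carry the marker, and Effects with the symbols listed above. The -t direction threshold is not modelled, because its exact effect on the string is not documented here.

```python
def summarise_marker(per_study):
    """per_study: one entry per cohort, either (beta, eaf, n) or None if the marker is absent."""
    symbols = []
    n_total = 0
    eaf_weighted = 0.0
    n_cohort = 0
    for entry in per_study:
        if entry is None:
            symbols.append("?")  # marker not available in this cohort
            continue
        beta, eaf, n = entry
        symbols.append("+" if beta > 0 else "-" if beta < 0 else "0")
        n_total += n
        eaf_weighted += eaf * n  # weight each cohort's EAF by its sample size
        n_cohort += 1
    eaf = eaf_weighted / n_total if n_total else float("nan")
    return {"EAF": eaf, "Nsample": n_total, "Ncohort": n_cohort, "Effects": "".join(symbols)}
```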