Bioconductor Code: GIGSEA

Name	Mode	Size
R	040000
data	040000
inst	040000
man	040000
vignettes	040000
.gitignore	100644	0 kb
DESCRIPTION	100644	1 kb
NAMESPACE	100644	1 kb
README.md	100644	3 kb

README.md

# GIGSEA Genotype Imputed Gene Set Enrichment Analysis using GWAS Summary Level Data ## Description Various methods of gene set analysis for trait-associated SNPs have been proposed, however, many challenges and limitations remained: 1. Gene boundaries: different criteria have been proposed to assign a SNP to a gene but no consensus was reached; 2. Long-range regulation: assigning a causal link to the gene nearest the associated variant falls short of elucidating long-range functional connection; 3. Gene size: longer genes are more likely to have significant P-values, possibly inflating the association test for gene sets that have many long genes; 4. Multiple-marker regulation: the best strategy is not determined on the number of SNPs for each gene and aggregation of different effect sizes of SNPs; 5. Linkage disequilibrium (LD): the local LD may reduce power to detect associations dependent on multiple markers. 6. Redundancy among gene sets: a gene may function in multiple ways and thus appear multiple times in functional gene sets. In spite of reflecting the crosstalk between gene sets, the overlap in gene sets may make the results of gene set enrichment analysis more difficult to interpret; 7. Permutation efficiency: the computational burden of permutation can be substantial; 8. Threshold-selection: a threshold-dependent procedure may cause the instability of results. Here, we present GIGSEA (Genotype Imputed Gene Set Enrichment Analysis), a novel method that uses GWAS summary statistics and eQTL to infer differential gene expression and interrogate gene set enrichment for the trait-associated SNPs. By incorporating empirical eQTL of disease-relevant tissue, GIGSEA naturally accounts for factors such as gene size, gene boundary, SNP distal regulation, and multiple-marker regulation. The weighted linear regression model was used to perform the enrichment test, properly adjusting imputation accuracy, model incompleteness and redundancy in different gene sets. The significance level of enrichment is assessed by permutation, where matrix operation was employed to dramatically improve computation speed and efficiency. We have shown GIGSEA has appropriate type I error, and demonstrated high computational efficiency on real data set and discovered the plausible biological findings. ## Dependencies on R packages - Matrix - [Matrix](https://cran.r-project.org/web/packages/Matrix/index.html) - locfdr - [locfdr](https://cran.r-project.org/web/packages/locfdr/index.html) ## Installation: 1. Install GIGSEA R package from [devtools](https://github.com/hadley/devtools) package in R ``` > install.packages("devtools") > library(devtools) > install_github("zhushijia/GIGSEA") ``` 2. Install [MetaXcan](https://github.com/hakyimlab/MetaXcan) package in Python ## Example: See [Tutorial](https://github.com/zhushijia/GIGSEA/blob/master/vignettes/GIGSEA_tutorial.Rmd)