Differential coexpression analysis
##### Wenbin Wei, Sandeep Amberkar, Winston Hide, Aug 9, 2023
## 1. Description
This package identifies differentially coexpressed links (DCLs) and differentially coexpressed genes (DCGs). DCLs are gene pairs whose correlation coefficients differ significantly between two conditions (de la Fuente, 2010; Jiang et al., 2016). DCGs are genes with significantly more DCLs than expected by chance (Yu et al., 2011; Jiang et al., 2016). The package takes two gene expression matrices or data frames, one per condition, as input, calculates gene-gene correlations under each condition and compares them using Fisher's Z transformation (Fisher, 1915; Fisher, 1921). It filters gene pairs using thresholds on the correlation coefficients and their adjusted p values, and on the difference between the two correlation coefficients and its adjusted p value. It identifies DCGs using a binomial probability model (Jiang et al., 2016).
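For reference, the comparison of two correlation coefficients with Fisher's Z transformation is usually written as below. This is the textbook form for Pearson correlations; the symbols $r_1$, $r_2$ (correlation of a gene pair under each condition) and $n_1$, $n_2$ (number of samples under each condition) are notation introduced here, and the package's implementation may differ in detail.

$$
z_k = \operatorname{atanh}(r_k) = \frac{1}{2}\ln\frac{1 + r_k}{1 - r_k}, \qquad k = 1, 2
$$

$$
Z = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}, \qquad p = 2\,\bigl(1 - \Phi(|Z|)\bigr)
$$

Here $\Phi$ is the standard normal cumulative distribution function, so $p$ is a two-sided p value for the difference between the two correlations.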
The main steps are as follows:
a). Correlation coefficients and p values of all gene pairs under two conditions are calculated.
b). The differences between the correlation coefficients under two conditions are calculated and their p values are calculated using Fisher's Z-transformation.
c). p values are adjusted.
d). Gene pairs (links) coexpressed in at least one condition are identified using the criterion that at least one of the two correlation coefficients has an absolute value greater than the threshold *rth* and an adjusted p value less than the threshold *qth*. The links that meet this criterion are the coexpressed links (CLs).
e). Differentially coexpressed links (gene pairs) are identified from the CLs using the criterion that the absolute value of the difference between the two correlation coefficients is greater than the threshold *r.diffth* and its adjusted p value is less than the threshold *q.diffth*. The links that meet this criterion are the DCLs.
f). The DCLs are classified into three categories: *same signed*, *diff signed*, or *switched opposites*. *same signed* indicates that the correlation coefficients of the gene pair have the same sign under both conditions. *diff signed* indicates that the correlation coefficients have opposite signs under the two conditions and only one of them has an absolute value greater than the threshold *rth* and an adjusted p value less than the threshold *qth*. *switched opposites* indicates that the correlation coefficients have opposite signs under the two conditions and both of them have an absolute value greater than the threshold *rth* and an adjusted p value less than the threshold *qth*.
g). All the genes in DCLs are tested for enrichment of DCLs, i.e., whether they have more DCLs than expected by chance, using a binomial probability model (Jiang et al., 2016). Those with an adjusted p value less than the threshold *q.dcgth* are included in the DCGs.
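The steps above can be sketched in base R on simulated data. This is an illustration only, not the package's own code: it uses Pearson correlation from *stats::cor* rather than the *WGCNA* function, Benjamini-Hochberg adjustment, arbitrary threshold values, and column names such as `r.1` and `q.diff` chosen for the example. The binomial model in step g) is written under the assumption that a gene with *n* CLs and *k* DCLs is tested against the overall proportion of CLs that are DCLs.

```r
# Illustrative base-R sketch of steps a) to g); not the package's own code.
set.seed(1)
n.genes <- 30; n.samples <- 50
genes <- paste0("g", seq_len(n.genes))

# Simulate one condition: g2..g15 follow a shared factor f, the hub g1 follows
# hub.sign * f, and g16..g30 are independent noise. Rows are genes.
simulate <- function(hub.sign) {
  f <- rnorm(n.samples)
  x <- matrix(rnorm(n.genes * n.samples), n.genes, n.samples,
              dimnames = list(genes, NULL))
  for (i in 2:15) x[i, ] <- f + rnorm(n.samples, sd = 0.3)
  x[1, ] <- hub.sign * f + rnorm(n.samples, sd = 0.3)
  x
}
exprs.1 <- simulate(+1)   # g1 positively coexpressed with g2..g15
exprs.2 <- simulate(-1)   # g1 negatively coexpressed with g2..g15
n1 <- ncol(exprs.1); n2 <- ncol(exprs.2)

# Assumed threshold values, named after the thresholds in the text
rth <- 0.5; qth <- 0.1; r.diffth <- 0.5; q.diffth <- 0.1; q.dcgth <- 0.1

# a) correlation coefficients and p values of all gene pairs
cor.pvalue <- function(r, n) {   # two-sided t test of r = 0
  2 * pt(-abs(r * sqrt((n - 2) / (1 - r^2))), df = n - 2)
}
gene.pairs <- t(combn(genes, 2))
r1 <- cor(t(exprs.1)); r2 <- cor(t(exprs.2))
links <- data.frame(Gene.1 = gene.pairs[, 1], Gene.2 = gene.pairs[, 2],
                    r.1 = r1[gene.pairs], r.2 = r2[gene.pairs],
                    stringsAsFactors = FALSE)
links$p.1 <- cor.pvalue(links$r.1, n1)
links$p.2 <- cor.pvalue(links$r.2, n2)

# b) difference between the correlations and its Fisher's Z p value
links$r.diff <- links$r.2 - links$r.1
z <- (atanh(links$r.2) - atanh(links$r.1)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
links$p.diff <- 2 * pnorm(-abs(z))

# c) p value adjustment
for (k in c("1", "2", "diff")) {
  links[[paste0("q.", k)]] <- p.adjust(links[[paste0("p.", k)]], method = "BH")
}

# d) links coexpressed in at least one condition (CLs)
links$sig.1 <- abs(links$r.1) > rth & links$q.1 < qth
links$sig.2 <- abs(links$r.2) > rth & links$q.2 < qth
CLs <- links[links$sig.1 | links$sig.2, ]

# e) differentially coexpressed links (DCLs)
DCLs <- CLs[abs(CLs$r.diff) > r.diffth & CLs$q.diff < q.diffth, ]

# f) classification of DCLs
DCLs$type <- ifelse(sign(DCLs$r.1) == sign(DCLs$r.2), "same signed",
                    ifelse(DCLs$sig.1 & DCLs$sig.2,
                           "switched opposites", "diff signed"))

# g) binomial test for enrichment of DCLs among each gene's CLs
count.links <- function(d) {
  setNames(as.vector(table(factor(c(d$Gene.1, d$Gene.2), levels = genes))), genes)
}
n.cl <- count.links(CLs); n.dcl <- count.links(DCLs)
p0 <- nrow(DCLs) / nrow(CLs)          # overall proportion of CLs that are DCLs
in.dcl <- genes[n.dcl > 0]
DCGs <- data.frame(Gene = in.dcl, CLs = n.cl[in.dcl], DCLs = n.dcl[in.dcl],
                   stringsAsFactors = FALSE)
DCGs$p <- pbinom(DCGs$DCLs - 1, size = DCGs$CLs, prob = p0, lower.tail = FALSE)
DCGs$q <- p.adjust(DCGs$p, method = "BH")
DCGs <- DCGs[DCGs$q < q.dcgth, ]
print(DCGs)
```

In this simulation the links among g2..g15 are coexpressed under both conditions, so they are CLs but not DCLs, while the links of g1 change sign between conditions. This contrast is what lets the binomial test single out g1.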
## 2. Installation and removal
This package is available from Bioconductor and can be installed within R as follows:

    if (!requireNamespace("BiocManager", quietly=TRUE))
        install.packages("BiocManager")
    BiocManager::install("diffcoexp")

To install this package from GitHub, start R and enter:

    devtools::install_git("https://github.com/hidelab/diffcoexp.git", branch = "master")

The above method does not build or install the vignette. To install the package with the vignette, enter the following from the command line, replacing the version number with that of the tarball produced by the build step:

    git clone https://github.com/hidelab/diffcoexp.git
    R CMD build diffcoexp
    R CMD check diffcoexp_1.11.1.tar.gz
    R CMD INSTALL diffcoexp_1.11.1.tar.gz

To remove this package, start R and enter:

    remove.packages("diffcoexp")

## 3. Input and output of *diffcoexp* function
The main function of this package is *diffcoexp*. The first two arguments, *exprs.1* and *exprs.2*, are normalized gene expression data under the two conditions, with rows as genes and columns as samples. They should be objects of class *SummarizedExperiment*, *data.frame* or *matrix*, and both should contain the same genes in the same order. The third argument, *r.method*, is passed to the *cor* function of the *WGCNA* package as argument *method*, details of which can be found by typing

    help("cor", package = "WGCNA")

The fourth argument, *q.method*, is passed to the *p.adjust* function of the *stats* package as argument *method*, details of which can be found by typing

    help("p.adjust", package = "stats")

Details of the other arguments of the *diffcoexp* function can be found by typing

    help("diffcoexp", package = "diffcoexp")

The output of *diffcoexp* function is a list of two data frames, one for differentially co-expressed links (DCLs), the other for differentially co-expressed genes (DCGs). Further details of the output can be seen on the help page.
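Because the two inputs are expected to have their genes in the same order, it can help to align them before calling the function. The following is a minimal sketch using base R; `align.genes` is a hypothetical helper written for this example, not part of the package, and the gene and sample names are made up.

```r
# Hypothetical helper (not part of diffcoexp): keep only the genes present in
# both matrices and put them in the same row order.
align.genes <- function(x, y) {
  common <- intersect(rownames(x), rownames(y))
  list(exprs.1 = x[common, , drop = FALSE],
       exprs.2 = y[common, , drop = FALSE])
}

# Toy inputs with partially overlapping genes in different orders
a <- matrix(1:6, nrow = 3,
            dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
b <- matrix(1:6, nrow = 3,
            dimnames = list(c("geneC", "geneA", "geneE"), c("t1", "t2")))

aligned <- align.genes(a, b)
print(rownames(aligned$exprs.1))
print(rownames(aligned$exprs.2))
```

Both elements of `aligned` then contain the shared genes in the row order of the first input and can be passed on as *exprs.1* and *exprs.2*.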
## 4. Analysis and interpretation of DCGs and DCLs
DCGs are a list of genes and can therefore be further analysed using other tools such as FGNet (https://bioconductor.org/packages/release/bioc/html/FGNet.html), clusterProfiler (https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) and enrichr (http://amp.pharm.mssm.edu/Enrichr/). DCLs are a list of differentially coexpressed gene pairs and can be assembled into a differential coexpression network. Such networks have been reported to be scale-free but not small-world (Hsu et al., 2015). The network can be visualized and analyzed using igraph (https://cran.r-project.org/web/packages/igraph/index.html). DCLs can also be further analyzed to identify upstream causal regulators using other tools such as DCGL v2.0 (Yang et al., 2013).
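As a rough sketch of assembling DCLs into a network, the example below starts from a small made-up table of gene pairs. The column names `Gene.1` and `Gene.2` are assumptions for illustration and should be checked against the actual output. Node degree is computed with base R, and the igraph part runs only if igraph is installed.

```r
# Made-up DCL table; the column names Gene.1 and Gene.2 are assumptions.
dcls <- data.frame(Gene.1 = c("g1", "g1", "g1", "g2"),
                   Gene.2 = c("g2", "g3", "g4", "g3"),
                   stringsAsFactors = FALSE)

# Number of DCLs per gene (node degree), computed with base R
degree.base <- sort(table(c(dcls$Gene.1, dcls$Gene.2)), decreasing = TRUE)
print(degree.base)

# If igraph is available, build an undirected graph from the edge list
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(dcls, directed = FALSE)
  print(igraph::degree(g))
  # plot(g) would draw the network
}
```

Genes with many DCLs show up as high-degree nodes, which is one simple way to look for candidate hubs before running a formal test.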
## 5. Example
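This example illustrates the workflow of downloading gene expression data from GEO and identifying differentially coexpressed links (DCLs) and differentially coexpressed genes (DCGs).

    library(GEOquery)
    gse4158 <- getGEO("GSE4158")
    exprs <- exprs(gse4158[[1]])                    # expression matrix of the series
    keep <- rowSums(is.na(exprs)) < ncol(exprs)/5   # rows with < 20% missing values
    exprs <- exprs[keep, ]
    GPL3415 <- getGEO("GPL3415")                    # platform annotation
    exprs <- data.frame(ID = rownames(exprs), exprs)
    exprs <- merge(GPL3415@dataTable@table, exprs, by.x = "ID", by.y = "ID")
    exprs <- exprs[, c(7, 11:36)]                   # ORF column and expression columns
    exprs <- aggregate(exprs[, -1], by = list(Gene = exprs$ORF), FUN = mean, na.action = na.omit)
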
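Analysis of all the genes (6104) will take about 20 minutes on a computer with 8 cores and 16GB RAM. Here *exprs.1* and *exprs.2* contain the columns (samples) of *exprs* belonging to each of the two conditions.

    library(diffcoexp)
    res <- diffcoexp(exprs.1 = exprs.1, exprs.2 = exprs.2, r.method = "spearman")
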
The results are a list of two data frames, one for differentially co-expressed links (DCLs, gene pairs), the other for differentially co-expressed genes (DCGs).
## References

de la Fuente A (2010). From “differential expression” to “differential networking” – identification of dysfunctional regulatory networks in diseases. *Trends in Genetics*, 26(7):326–333.

Fisher RA (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. *Biometrika*, 10(4):507–521.

Fisher RA (1921). On the 'probable error' of a coefficient of correlation deduced from a small sample. *Metron*, 1:3–32.

Hsu C-L, Juan H-F, Huang H-C (2015). Functional analysis and characterization of differential coexpression networks. *Scientific Reports*, 5:13295.

Jiang Z, Dong X, Li Z-G, He F, Zhang Z (2016). Differential coexpression analysis reveals extensive rewiring of Arabidopsis gene coexpression in response to Pseudomonas syringae infection. *Scientific Reports*, 6(1):35064.

Yang J, Yu H, Liu B-H, Zhao Z, Liu L, Ma L-X, et al. (2013). DCGL v2.0: An R package for unveiling differential regulation from differential co-expression. *PLoS ONE*, 8(11):e79729.

Yu H, Liu B-H, Ye Z-Q, Li C, Li Y-X, Li Y-Y (2011). Link-based quantitative methods to identify differentially coexpressed genes and gene pairs. *BMC Bioinformatics*, 12(1):315.