Name Mode Size
R 040000
inst 040000
man 040000
tests 040000
vignettes 040000
.Rbuildignore 100644 0 kb
.gitignore 100644 0 kb
DESCRIPTION 100644 1 kb
NAMESPACE 100644 0 kb
README.Rmd 100644 3 kb
README.md 100644 3 kb
README.md
This repository demonstrates the use of the *pengls* package for high-dimensional data with spatial or temporal autocorrelation. It consists of an iterative loop around the *nlme* and *glmnet* packages. Currently, only continuous outcomes and \(R^2\) as performance measure are implemented. # Installation instuctions The *pengls* package is available from BioConductor, and can be installed as follows: ``` r library(BiocManager) install("pengls") ``` Once installed, it can be loaded and version info printed. ``` r suppressPackageStartupMessages(library(pengls)) cat("pengls package version", as.character(packageVersion("pengls")), "\n") ``` ## pengls package version 0.99.5 # Illustration We first create a toy dataset with spatial coordinates. ``` r library(nlme) n <- 75 #Sample size p <- 100 #Number of features g <- 10 #Size of the grid #Generate grid Grid <- expand.grid("x" = seq_len(g), "y" = seq_len(g)) # Sample points from grid without replacement GridSample <- Grid[sample(nrow(Grid), n, replace = FALSE),] #Generate outcome and regressors b <- matrix(rnorm(p*n), n , p) a <- rnorm(n, mean = b %*% rbinom(p, size = 1, p = 0.25), sd = 0.1) #25% signal #Compile to a matrix df <- data.frame("a" = a, "b" = b, GridSample) ``` The *pengls* method requires prespecification of a functional form for the autocorrelation. This is done through the *corStruct* objects defined by the *nlme* package. We specify a correlation decaying as a Gaussian curve with distance, and with a nugget parameter. The nugget parameter is a proportion that indicates how much of the correlation structure explained by independent errors; the rest is attributed to spatial autocorrelation. The starting values are chosen as reasonable guesses; they will be overwritten in the fitting process. ``` r # Define the correlation structure (see ?nlme::gls), with initial nugget 0.5 and range 5 corStruct <- corGaus(form = ~ x + y, nugget = TRUE, value = c("range" = 5, "nugget" = 0.5)) ``` Finally the model is fitted with a single outcome variable and large number of regressors, with the chosen covariance structure and for a prespecified penalty parameter \(\lambda=0.2\). ``` r #Fit the pengls model, for simplicity for a simple lambda penglsFit <- pengls(data = df, outVar = "a", xNames = grep(names(df), pattern = "b", value =TRUE), glsSt <- corStruct, lambda = 0.2, verbose = TRUE) ``` ## Starting iterations... ## Iteration 1 ## Iteration 2 ## Iteration 3 Standard extraction functions like print(), coef() and predict() are defined for the new “pengls” object. ``` r penglsFit ``` ## pengls model with correlation structure: corGaus ## and 46 non-zero coefficients ``` r penglsCoef <- coef(penglsFit) penglsPred <- predict(penglsFit) ```