# CCAFE

**C**ase **C**ontrol **A**llele **F**requency (AF) **E**stimation R Package

This repository contains the source code for the CaseControlAF R package, which can be used to reconstruct the allele frequency (AF) for cases and controls separately given commonly available summary statistics.

The package contains two functions:

1) CaseControl_AF
2) CaseControl_SE

See full documentation, vignettes, and examples here: (https://wolffha.github.io/CCAFE_documentation/)

## Download the package

To install this package using Bioconductor (not yet available):

```R
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("CCAFE")
```

To download this package using *devtools* in R:

```R
require(devtools)
devtools::install_github("https://github.com/wolffha/CCAFE/")
```

## CaseControl_AF

Use this function when you have the following statistics (for each variant):

* Number of cases
* Number of controls
* Odds Ratio (OR) or beta coefficient
* **AF** (allele frequency) for the total sample (cases and controls combined)

### Usage

**data**: a dataframe with a row for each variant and columns for OR and total AF

**N_case**: an integer for the number of case samples

**N_control**: an integer for the number of control samples

**OR_colname**: a string containing the exact column name in *data* with the OR

**AF_total_colname**: a string containing the exact column name in *data* with the total AF

Returns a dataframe with two columns: AF_case and AF_control. The number of rows is equal to the number of variants.

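To illustrate how the inputs listed for `CaseControl_AF` can determine the two group frequencies, here is a minimal base R sketch. It is not the package source. It assumes the OR is an allelic odds ratio and that the total AF is the sample-size-weighted mean of the case and control AFs; under those assumptions, the control AF is the root of a quadratic. The function name `solve_case_control_af` and the toy data are hypothetical.

```R
# Sketch only, NOT the CCAFE implementation.
# Assumptions: OR = [AF_case / (1 - AF_case)] / [AF_control / (1 - AF_control)]
# and AF_total = (N_case * AF_case + N_control * AF_control) / (N_case + N_control).
solve_case_control_af <- function(OR, AF_total, N_case, N_control) {
  r <- N_control / N_case
  s <- (N_case + N_control) * AF_total / N_case   # so AF_case = s - r * AF_control

  # Substituting AF_case into the OR definition gives
  # qa * p^2 + qb * p + qc = 0 with p = AF_control
  qa <- r * (OR - 1)
  qb <- OR * (1 - s) + s + r
  qc <- -s
  disc <- qb^2 - 4 * qa * qc

  # When OR == 1 the quadratic term vanishes and the equation is linear
  AF_control <- ifelse(qa == 0, -qc / qb, (-qb + sqrt(disc)) / (2 * qa))
  AF_case <- s - r * AF_control

  data.frame(AF_case = AF_case, AF_control = AF_control)
}

# Toy data (hypothetical): row 1 was built from AF_case = 0.3 and
# AF_control = 0.2; row 2 has OR = 1, so both groups share the total AF.
gwas <- data.frame(OR = c(12 / 7, 1), AF_total = c(0.225, 0.10))
est <- solve_case_control_af(gwas$OR, gwas$AF_total,
                             N_case = 1000, N_control = 3000)
print(est)
```

With the package installed, the call using the arguments documented above would be `CaseControl_AF(data = gwas, N_case = 1000, N_control = 3000, OR_colname = "OR", AF_total_colname = "AF_total")`.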
## CaseControl_SE

Use this function when you have the following statistics (for each variant):

* Number of cases
* Number of controls
* Odds Ratio (OR) or beta coefficient
* **SE** of the log(OR) for each variant

*Code adapted from ReACt GroupFreq function available here: (https://github.com/Paschou-Lab/ReAct/blob/main/GrpPRS_src/CountConstruct.c)*

### Usage

**data**: a dataframe where each row is a variant, with columns for the OR, SE, chromosome, and position

**N_case**: an integer for the number of case samples

**N_control**: an integer for the number of control samples

**OR_colname**: a string containing the exact column name in *data* with the odds ratios

**SE_colname**: a string containing the exact column name in *data* with the standard errors

**position_colname**: a string containing the exact column name in *data* with the positions of the variants

**chromosome_colname**: a string containing the exact column name in *data* with the chromosome of the variants. Note that sex chromosomes can be either characters ('X', 'x', 'Y', 'y') or numeric, where X=23 and Y=24

**sex_chromosomes**: boolean, TRUE if variants from sex chromosome(s) are included in the dataset

**do_correction**: boolean, TRUE if data is provided to correct the estimates using proxy MAFs

**remove_sex_chromosomes**: boolean, TRUE if variants on sex chromosomes should be removed. This is only necessary if *sex_chromosomes* == TRUE and the number of XX/XY individuals per case and control sample is NOT known

CaseControl_SE has the following optional inputs:

If *sex_chromosomes* == TRUE and *remove_sex_chromosomes* == FALSE, then the following inputs are required:

**N_XX_case**: the number of XX chromosome case individuals

**N_XX_control**: the number of XX chromosome control individuals

**N_XY_case**: the number of XY chromosome case individuals

**N_XY_control**: the number of XY chromosome control individuals

If *do_correction* == TRUE, then data must be provided that includes harmonized data with proxy MAFs:

**correction_data**: a dataframe with the following EXACT column names: CHR, POS, proxy_MAF, containing data for variants harmonized between the observed and proxy datasets

Returns the *data* dataframe with three additional columns named MAF_case, MAF_control, and MAF_total, containing the estimated minor allele frequency in the cases, controls, and total sample. The number of rows is equal to the number of variants. If *do_correction* == TRUE, the output will include three additional columns containing the adjusted estimated MAFs (MAF_case_adj, MAF_control_adj, MAF_total_adj).

**NOTE:** This method assumes we are estimating the minor allele frequency (MAF)

### Examples and documentation

See full documentation, vignettes, and examples here: (https://wolffha.github.io/CCAFE_documentation/)
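
The conditional requirements on the `CaseControl_SE` inputs are easy to get wrong, so the base R sketch below checks them before the function is called. `check_se_inputs` is a hypothetical helper, not part of CCAFE, and its default values are assumptions made for this example; only the argument names and the `CHR`, `POS`, `proxy_MAF` column names come from the description above.

```R
# Hypothetical helper, NOT part of CCAFE: checks the argument combinations
# described in this README. Default values are assumptions for this sketch.
check_se_inputs <- function(data, OR_colname, SE_colname,
                            chromosome_colname, position_colname,
                            sex_chromosomes = FALSE,
                            remove_sex_chromosomes = TRUE,
                            do_correction = FALSE,
                            N_XX_case = NA, N_XX_control = NA,
                            N_XY_case = NA, N_XY_control = NA,
                            correction_data = NULL) {
  # data must contain the four named columns
  needed <- c(OR_colname, SE_colname, chromosome_colname, position_colname)
  absent <- setdiff(needed, colnames(data))
  if (length(absent) > 0) {
    stop("data is missing column(s): ", paste(absent, collapse = ", "))
  }

  # XX/XY counts are required only when sex chromosome variants are kept
  if (sex_chromosomes && !remove_sex_chromosomes) {
    counts <- c(N_XX_case, N_XX_control, N_XY_case, N_XY_control)
    if (anyNA(counts)) {
      stop("N_XX_case, N_XX_control, N_XY_case and N_XY_control are required")
    }
  }

  # correction_data needs the exact column names CHR, POS, proxy_MAF
  if (do_correction) {
    required <- c("CHR", "POS", "proxy_MAF")
    if (!is.data.frame(correction_data) ||
        !all(required %in% colnames(correction_data))) {
      stop("correction_data must have columns CHR, POS, proxy_MAF")
    }
  }
  invisible(TRUE)
}

# Toy inputs (hypothetical values); chromosome X is coded as 23
gwas <- data.frame(CHR = c(1, 1, 23), POS = c(1000, 2000, 3000),
                   OR = c(1.20, 0.90, 1.05), SE = c(0.05, 0.04, 0.08))
proxy <- data.frame(CHR = c(1, 1, 23), POS = c(1000, 2000, 3000),
                    proxy_MAF = c(0.12, 0.30, 0.22))

ok <- check_se_inputs(gwas, "OR", "SE", "CHR", "POS",
                      sex_chromosomes = TRUE, remove_sex_chromosomes = FALSE,
                      do_correction = TRUE,
                      N_XX_case = 600, N_XX_control = 1500,
                      N_XY_case = 400, N_XY_control = 1500,
                      correction_data = proxy)

# Keeping sex chromosome variants without XX/XY counts is rejected
missing_counts <- tryCatch(
  check_se_inputs(gwas, "OR", "SE", "CHR", "POS",
                  sex_chromosomes = TRUE, remove_sex_chromosomes = FALSE),
  error = function(e) conditionMessage(e)
)
```

Once the check passes, the same values would be passed to `CaseControl_SE` under the documented argument names (`OR_colname = "OR"`, `SE_colname = "SE"`, `chromosome_colname = "CHR"`, `position_colname = "POS"`, and so on), together with `N_case` and `N_control`.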