Bioconductor Code: ReUseData

Name	Mode	Size
.github	040000
R	040000
docs	040000
inst	040000
man	040000
tests	040000
vignettes	040000
.BBSoptions	100644	0 kb
.Rbuildignore	100644	0 kb
.gitignore	100644	0 kb
DESCRIPTION	100644	2 kb
LICENSE.md	100644	34 kb
NAMESPACE	100644	1 kb
NEWS	100644	0 kb
README.md	100644	4 kb
_pkgdown.yml	100644	0 kb

README.md

[![R-CMD-check](https://github.com/rworkflow/ReUseData/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/rworkflow/ReUseData/actions/workflows/R-CMD-check.yaml)

# Introduction

ReUseData is an _R/Bioconductor_ software tool that provides a systematic and versatile approach to standardized and reproducible data management. ReUseData facilitates the transformation of shell or other ad hoc data preprocessing scripts into workflow-based data recipes. Evaluating a data recipe generates curated data files in their generic formats (e.g., VCF, bed). Both recipes and data are cached using database infrastructure for easy data management and reuse. Prebuilt data recipes are available through the ReUseData portal (https://rcwl.org/dataRecipes/) with full annotation and user instructions. Pregenerated data are available through the ReUseData cloud bucket and can be downloaded directly with `getCloudData()`.

This quick start shows the basic use of package functions in 2 major categories for managing:

- Data recipes
- Reusable data

Details for each section can be found in the other vignettes `ReUseData_recipe.html` and `ReUseData_data.html`.

# Package installation

```
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("ReUseData", "Rcwl"))
library(ReUseData)
```

# Data recipes

All pre-built data recipes are included in the package and can be easily updated (`recipeUpdate`), searched (`recipeSearch`) and loaded (`recipeLoad`). Details about data recipes can be found in the vignette `ReUseData_recipe.html`.

## Search and load a data recipe

```
# Refresh the local recipe cache, then search it by keyword
recipeUpdate(cachePath = "ReUseDataRecipe", force = TRUE)
recipeSearch("echo")
# Assign the returned recipe so that later calls can refer to it as `echo_out`
echo_out <- recipeLoad("echo_out", return = TRUE)
```

## Evaluate a data recipe

A data recipe is evaluated by assigning values to its parameters. `getData` runs the recipe internally as a CWL script and generates the data of interest together with annotation files for future reuse.

```
# List the recipe parameters, then assign a value to each
Rcwl::inputs(echo_out)
echo_out$input <- "Hello World!"
echo_out$outfile <- "outfile"
outdir <- file.path(tempdir(), "SharedData")
# `notes` are the user-specified keywords used to find the data later
res <- getData(echo_out,
               outdir = outdir,
               notes = c("echo", "hello", "world", "txt"))
res$out
readLines(res$out)
```

## Create your own data recipes

One can create a data recipe from scratch, or convert an existing data processing shell script, by specifying the input parameters and output globbing patterns in the `recipeMake` function.

```
script <- system.file("extdata", "echo_out.sh", package = "ReUseData")
rcp <- recipeMake(shscript = script,
                  paramID = c("input", "outfile"),
                  paramType = c("string", "string"),
                  outputID = "echoout",
                  outputGlob = "*.txt")
Rcwl::inputs(rcp)
Rcwl::outputs(rcp)
```

# Reusable data

The data generated from evaluating data recipes are automatically annotated and tracked with user-specified keywords and time/date tags. They use a cache system similar to the one for recipes, so users can easily update (`dataUpdate`), search (`dataSearch`) and use (`toList`) them.

Pre-generated data files from existing data recipes are saved in a Google Cloud bucket and are ready to be queried (`dataSearch(cloud=TRUE)`) and downloaded (`getCloudData`) to the local cache system with annotations.

## Update data files that are generated using `ReUseData`

```
dh <- dataUpdate(dir = outdir)
dataSearch(c("echo", "hello"))
dataNames(dh)
dataParams(dh)
dataNotes(dh)
```

## Export data into workflow-ready files

```
toList(dh, format = "json", file = file.path(outdir, "data.json"))
```

## Download pregenerated data from Google Cloud

```
dh <- dataUpdate(dir = outdir, cloud = TRUE)
getCloudData(dh[2], outdir = outdir)
```
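
As a closing note on `recipeMake`: the README does not show the contents of `echo_out.sh`, so the following is only a hypothetical stand-in, not the script shipped with the package. It sketches the shape a script would need to fit the `recipeMake` call in the section on creating recipes, assuming the parameters listed in `paramID` reach the script as positional arguments in that order. The function name `write_echo` and the message text are illustrative.

```shell
#!/bin/sh
# Hypothetical stand-in for echo_out.sh; the packaged script may differ.

# write_echo INPUT OUTFILE
# Writes INPUT into OUTFILE.txt, so the result matches the "*.txt" output glob.
write_echo() {
    input=$1
    outfile=$2
    printf 'Print the input: %s\n' "$input" > "${outfile}.txt"
}

# Assumption: arguments arrive in the order of paramID = c("input", "outfile").
if [ "$#" -ge 2 ]; then
    write_echo "$1" "$2"
fi
```

Under that assumption, running such a script with `"Hello World!"` and `outfile` as arguments would leave a file named `outfile.txt` in the working directory, which is the file an `outputGlob` of `*.txt` would collect.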