# GEOfastq

### Install GEOfastq

To download and install `GEOfastq`:

```R
install.packages('remotes')
remotes::install_github('alexvpickering/GEOfastq')
```

### Install Aspera Connect (optional)

`GEOfastq` can use [Aspera Connect](https://downloads.asperasoft.com/en/downloads/8?list) to download fastqs. It is faster than FTP for large single-file downloads (single-cell fastqs). Download and install it according to the [documentation](https://downloads.asperasoft.com/en/documentation/8). For me (Fedora 30), this works:

```bash
wget https://download.asperasoft.com/download/sw/connect/3.9.6/ibm-aspera-connect-3.9.6.173386-linux-g2.12-64.tar.gz
tar -zxvf ibm-aspera-connect-3.9.6.173386-linux-g2.12-64.tar.gz
./ibm-aspera-connect-3.9.6.173386-linux-g2.12-64.sh
```

I also had to make sure `ascp` was on the `PATH`:

```bash
echo 'export PATH=$HOME/.aspera/connect/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
```

For RStudio to find `ascp` on the `PATH`, I also had to add this to `~/.Renviron`:

```bash
echo 'PATH=${HOME}/.aspera/connect/bin:${PATH}' >> ~/.Renviron
```

After restarting RStudio, confirm that things are set up properly:

```R
# should have the above path added
Sys.getenv('PATH')

# should print info about Aspera Connect
system2('ascp', '--version')
```

### Install docker image

To install `GEOfastq` and Aspera Connect from a pre-built docker image:

```bash
# retrieve pre-built geofastq docker image
docker pull alexvpickering/geofastq

# run interactive container; set the host portion of
# `-v host:container` to where you want to persist data
sudo docker run -it --rm \
  -v /srv:/srv \
  alexvpickering/geofastq /bin/bash
```

### Usage

First crawl a study page on [GEO](https://www.ncbi.nlm.nih.gov/geo/) to get study metadata and the corresponding fastq.gz download links on [ENA](https://www.ebi.ac.uk/ena):

```R
library(GEOfastq)

gse_name <- 'GSE117570'
gse_text <- crawl_gse(gse_name)
gsm_names <- extract_gsms(gse_text)
srp_meta <- crawl_gsms(gsm_names)
```

Next, subset `srp_meta` to the samples that you want, then download:

```R
srp_meta <- srp_meta[srp_meta$source_name == 'Adjacent normal', ]
get_fastqs(srp_meta, data_dir = tempdir())
```

That's all folks! GOTO: `kallisto`?
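
Large downloads occasionally end up truncated, so it can be worth checking the files before moving on. The following is a minimal sketch, not part of `GEOfastq`: `check_fastqs` is a hypothetical helper name, and it assumes the downloaded files end in `.fastq.gz` and sit somewhere under the directory you passed as `data_dir` (a persistent directory rather than `tempdir()`). It only runs `gzip -t` on each file, which checks gzip integrity and not fastq content.

```shell
# Hypothetical helper (not part of GEOfastq): test every *.fastq.gz
# found under a directory with `gzip -t`.
# Returns 0 if all files pass, 1 if any file fails, 2 if none are found.
check_fastqs() {
  dir="${1:-.}"

  # list candidate files; searching recursively is an assumption about the layout
  all=$(find "$dir" -type f -name '*.fastq.gz')
  if [ -z "$all" ]; then
    echo "no fastq.gz files found under $dir" >&2
    return 2
  fi

  # `! -exec gzip -t {} \;` is true only when gzip reports a problem,
  # so -print lists exactly the files that failed the integrity test
  bad=$(find "$dir" -type f -name '*.fastq.gz' ! -exec gzip -t {} \; -print 2>/dev/null)
  if [ -n "$bad" ]; then
    echo "corrupt or truncated:"
    echo "$bad"
    return 1
  fi

  echo "all fastq.gz files passed gzip -t"
}
```

For example, if you downloaded to a directory such as `/srv/fastqs` (an illustrative path), `check_fastqs /srv/fastqs` lists any files that need to be downloaded again.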