Bioconductor Code: seqArchR

Name	Mode	Size
.github	040000
R	040000
inst	040000
man	040000
tests	040000
vignettes	040000
.Rbuildignore	100644	0 kb
.gitignore	100644	0 kb
DESCRIPTION	100644	2 kb
LICENSE	100644	34 kb
NAMESPACE	100644	2 kb
NEWS.md	100644	1 kb
README.md	100644	3 kb
_pkgdown.yml	100644	2 kb
codecov.yml	100644	0 kb

README.md

# seqArchR  [![DOI](https://zenodo.org/badge/188449833.svg)](https://zenodo.org/badge/latestdoi/188449833) [![codecov](https://codecov.io/gh/snikumbh/seqArchR/branch/main/graph/badge.svg?token=NEjCGuOUlW)](https://codecov.io/gh/snikumbh/seqArchR) [![R build status](https://github.com/snikumbh/seqArchR/workflows/R-CMD-check/badge.svg)](https://github.com/snikumbh/seqArchR/actions)  Note: _This package is currently under development. So, please bear with me while I put the final blocks together. Thanks for your understanding!_ seqArchR is an unsupervised, non-negative matrix factorization (NMF)-based algorithm for discovery of sequence architectures de novo. Below is a schematic of seqArchR's algorithm. <img src="https://github.com/snikumbh/seqArchR/blob/main/vignettes/seqArchR_algorithm_1080p_cropped.gif" width="550" align="center"> ## Installation ### Python scikit-learn dependency This package requires the Python module scikit-learn. Please see installation instructions [here](https://scikit-learn.org/stable/install.html). ### To install this package, use ```r if (!requireNamespace("remotes", quietly = TRUE)) { install.packages("remotes") } remotes::install_github("snikumbh/seqArchR", build_vignettes = FALSE) ``` ### Usage ```r # load package library(seqArchR) library(Biostrings) # Creation of one-hot encoded data matrix from FASTA file # You can use your own FASTA file instead inputFastaFilename <- system.file("extdata", "example_data.fa", package = "seqArchR", mustWork = TRUE) # Specifying dinuc generates dinucleotide features inputSeqsMat <- seqArchR::prepare_data_from_FASTA(inputFastaFilename, sinuc_or_dinuc = "dinuc") inputSeqsRaw <- seqArchR::prepare_data_from_FASTA(inputFastaFilename, raw_seq = TRUE) nSeqs <- length(inputSeqsRaw) positions <- seq(1, Biostrings::width(inputSeqsRaw[1])) # Set seqArchR configuration # Most arguments have default values seqArchRconfig <- seqArchR::set_config( parallelize = TRUE, n_cores = 2, n_runs = 100, k_min = 1, k_max = 20, mod_sel_type = "stability", bound = 10^-6, chunk_size = 100, result_aggl = "ward.D", result_dist = "euclid", flags = list(debug = FALSE, time = TRUE, verbose = TRUE, plot = FALSE) ) # ### Call/Run seqArchR seqArchRresult <- seqArchR::seqArchR(config = seqArchRconfig, seqs_ohe_mat = inputSeqsMat, seqs_raw = inputSeqsRaw, seqs_pos = positions, total_itr = 2, set_ocollation = c(TRUE, FALSE)) ``` # Contact Comments, suggestions, enquiries/requests are welcome! Feel free to email sarvesh.nikumbh@gmail.com or [create an new issue](https://github.com/snikumbh/seqArchR/issues/new)