# SEMPLR <a href="https://grkenney.github.io/SEMplR"><img src="man/figures/SEMplR-new.png" align="right" height="200" alt="SEMplR website" style="float:right; height:200px;" /></a>
## Overview
SEMPLR (SNP Effect Matrix Pipeline in R) is an R package that predicts
transcription factor (TF) binding. SEMPLR can be used to predict binding
affinity of TFs at genomic loci or predict the effect of genetic variation on
TF binding.
SEMPLR scores genomic regions or sequences of interest against SNP Effect
Matrices (SEMs). SEMs are position-by-nucleotide matrices generated by
integrating information from position weight matrices (PWMs), ChIP-seq,
and DNase-seq data. This integration of binding data means that motif analysis
with SEMs is more indicative of true binding potential compared to traditional
motif analyses with PWMs where scores are more indicative of sequence
similarity to consensus motifs. You can read more about SEMs and how they are
generated in the original
[SEMpl paper](https://doi.org/10.1093/bioinformatics/btz612).
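To make the idea of scoring a sequence against a position-by-nucleotide matrix concrete, here is a minimal base R sketch. It is not SEMPLR's implementation: the matrix values are invented, the helper names `score_kmer` and `score_sequence` are hypothetical, and it assumes the score of a k-mer is simply the sum of the matrix entries selected by its bases.

```r
# Toy position x nucleotide matrix: rows are motif positions, columns are bases.
# The values are made up for illustration and do not come from a real SEM.
toy_sem <- matrix(
  c( 0.0, -1.2, -0.8, -1.5,   # position 1 favours A
    -1.0,  0.1, -0.9, -1.1,   # position 2 favours C
    -0.7, -1.3,  0.0, -0.6,   # position 3 favours G
    -1.4, -0.5, -1.0,  0.2),  # position 4 favours T
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("A", "C", "G", "T"))
)

# Sum the matrix entries selected by each base of a k-mer (hypothetical helper)
score_kmer <- function(kmer, sem) {
  bases <- strsplit(kmer, "")[[1]]
  stopifnot(length(bases) == nrow(sem), all(bases %in% colnames(sem)))
  sum(sem[cbind(seq_along(bases), match(bases, colnames(sem)))])
}

# Score every window of a longer sequence and keep the best one
score_sequence <- function(sequence, sem) {
  k <- nrow(sem)
  stopifnot(nchar(sequence) >= k)
  starts <- seq_len(nchar(sequence) - k + 1)
  kmers <- substring(sequence, starts, starts + k - 1)
  scores <- vapply(kmers, score_kmer, numeric(1), sem = sem)
  best <- which.max(scores)
  list(start = starts[best], kmer = kmers[best], score = unname(scores[best]))
}

best_hit <- score_sequence("TTACGTAA", toy_sem)
best_hit$kmer   # the best-scoring window in this toy sequence is "ACGT"
```

The sketch only scans the forward strand and applies no normalization; the SEMpl paper describes how real SEM scores are derived and interpreted.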
This package extends the functionality of the
[SEMpl](https://github.com/Boyle-Lab/SEMpl) (SNP Effect Matrix pipeline)
command line tool, developed by the Boyle Lab at the University of Michigan, to
support data analysis and visualization with SEMs.
## Citation
If you use SEMPLR in your work, please also cite SEMpl:
Sierra S Nishizaki, Natalie Ng, Shengcheng Dong, Robert S Porter,
Cody Morterud, Colten Williams, Courtney Asman, Jessica A Switzenberg,
Alan P Boyle, Predicting the effects of SNPs on transcription factor binding
affinity, *Bioinformatics*, Volume 36, Issue 2, 15 January 2020, Pages 364–372,
https://doi.org/10.1093/bioinformatics/btz612
## Installation
```
devtools::install_github("grkenney/SEMPLR")
```
## Basic Usage
Below are some examples of basic usage. Please see the
[vignette](https://grkenney.github.io/SEMPLR/) for more detailed workflow
examples.
### Predicting transcription factor binding
SEMPLR accepts GRanges objects or lists of sequences to score. Here, we analyze
two loci with SEMPLR's default set of 223 pre-computed SEMs, stored in the
`SEMC` object. The `scoreBinding` function produces a data object with
information about the ranges analyzed, SEM metadata, and a table with 446 rows
(one entry for each combination of locus and SEM).
```
library(SEMPLR)
library(BSgenome.Hsapiens.UCSC.hg19)
# SEMC, the default set of SEMs, is provided by SEMPLR
# define genomic loci to score
gr <- GenomicRanges::GRanges(seqnames = c("chr12", "chr19"),
                             ranges = c(94136009, 10640062))
# score TF binding at each locus
sb <- scoreBinding(gr,
                   sem = SEMC,
                   genome = Hsapiens)
```
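The row count follows from scoring every locus against every SEM: 2 loci x 223 SEMs = 446 rows. The base R sketch below shows the layout of such a long-format table. The column names `locus` and `sem` and the `SEM_` labels are placeholders for illustration, not the names SEMPLR uses.

```r
# Placeholder identifiers for illustration only
loci <- c("chr12:94136009", "chr19:10640062")
sems <- paste0("SEM_", seq_len(223))

# One row per locus/SEM combination
combos <- expand.grid(locus = loci, sem = sems, stringsAsFactors = FALSE)
nrow(combos)  # 446
```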
When analyzing large sets of loci, it can be helpful to know if one
or more TFs are bound more than we would expect by chance. SEMPLR includes
enrichment and plotting functions to address this question.
```
# compute enrichment
e <- enrichSEMs(sb, SEMC)
# plot enrichment results
plotEnrich(e, SEMC)
```
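As a rough intuition for what "more than we would expect by chance" means, the sketch below runs a one-sided Fisher's exact test in base R on a 2 x 2 table of bound and unbound loci. This is an assumption-laden illustration: the counts are invented, and `enrichSEMs` may use a different background and a different statistical test.

```r
# Hypothetical counts, invented for illustration
n_loci       <- 100   # loci of interest
n_loci_bound <- 40    # of those, predicted bound by the TF
n_bg         <- 1000  # background loci
n_bg_bound   <- 100   # of those, predicted bound by the TF

# Columns are the two sets of loci, rows are the predicted binding status
contingency <- matrix(
  c(n_loci_bound, n_loci - n_loci_bound,
    n_bg_bound,   n_bg - n_bg_bound),
  nrow = 2,
  dimnames = list(c("bound", "unbound"), c("loci", "background"))
)

# One-sided test: is the TF bound more often in the loci of interest?
res <- fisher.test(contingency, alternative = "greater")
res$p.value   # small values suggest enrichment
res$estimate  # odds ratio; values above 1 indicate more binding than background
```

When many TFs are tested at once, as with a full set of SEMs, the p-values would normally be adjusted for multiple testing, for example with `p.adjust()`.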
### Predicting effect of genetic variation on transcription factor binding
SEMPLR accepts both VRanges and GRanges objects that specify a reference and an
alternative allele. Every variant is scored against every SEM, and each allele
is scored independently.
The resulting object contains three slots holding the variants scored, the SEM
metadata, and the scoring table. These can be accessed with the `variants()`,
`semData()`, and `scores()` functions, respectively.
```
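# define variants with reference and alternative alleles
vr <- VariantAnnotation::VRanges(seqnames = c("chr12", "chr19"),
                                 ranges = c(94136009, 10640062),
                                 ref = c("G", "T"), alt = c("C", "A"),
                                 id = c("A", "B"))
# score both alleles of each variant against every SEM
sv <- scoreVariants(vr = vr,
                    sem = SEMC,
                    genome = Hsapiens)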
```
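The idea of scoring each allele independently can be sketched in base R as follows. This is an illustration under stated assumptions, not SEMPLR's code: the matrix values are invented, the helpers `score_kmer` and `score_allele` are hypothetical, and the sketch assumes an allele's score is the best score among all motif-length windows that overlap the variant.

```r
# Toy position x nucleotide matrix with made-up values (not a real SEM)
toy_sem <- matrix(
  c( 0.0, -1.2, -0.8, -1.5,
    -1.0,  0.1, -0.9, -1.1,
    -0.7, -1.3,  0.0, -0.6,
    -1.4, -0.5, -1.0,  0.2),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("A", "C", "G", "T"))
)

# Sum the matrix entries selected by each base of a k-mer (hypothetical helper)
score_kmer <- function(kmer, sem) {
  bases <- strsplit(kmer, "")[[1]]
  stopifnot(length(bases) == nrow(sem), all(bases %in% colnames(sem)))
  sum(sem[cbind(seq_along(bases), match(bases, colnames(sem)))])
}

# Best score over all windows overlapping a single-base allele.
# Each flank must hold exactly k - 1 bases, where k is the motif length.
score_allele <- function(upstream, allele, downstream, sem) {
  k <- nrow(sem)
  stopifnot(nchar(upstream) == k - 1, nchar(downstream) == k - 1,
            nchar(allele) == 1)
  sequence <- paste0(upstream, allele, downstream)
  starts <- seq_len(k)
  kmers <- substring(sequence, starts, starts + k - 1)
  max(vapply(kmers, score_kmer, numeric(1), sem = sem))
}

# Hypothetical G>C variant with three bases of flanking sequence on each side
ref_score <- score_allele("TAC", "G", "TAA", toy_sem)
alt_score <- score_allele("TAC", "C", "TAA", toy_sem)
delta <- alt_score - ref_score  # negative: the alternative allele scores lower
```

Comparing the two allele scores, rather than looking at either one alone, is what indicates whether a variant is predicted to strengthen or weaken binding.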
SEMPLR includes two plotting functions to help users predict (1) which
TFs change binding with a genetic variant and (2) which variants change the
binding of a TF.
```
# which variants change the binding of IKZF1
plotSEMVariants(sv, "IKZF1")
# which TFs change binding with variant "A"
plotSEMMotifs(sv, "A")
```
For more information on these plots and their interpretation, please see our
[vignette](https://grkenney.github.io/SEMPLR/).