@@ -48,21 +48,64 @@ What is its binding motif?
The MotifDb **query** function performs a broad, case-neutral text search through all the metadata (all the annotation) for all of
the motifs. It returns a list of matching motifs. More information about the metadata is provided below.

-We begin with a simple search, retrieving all motifs annotated with the case-neutral string "CTCF".
+We begin with a simple search which retrieves all motifs annotated with the case-neutral string "ctcf".

```{r load MotifDb, query CTCF, prompt=FALSE, message=FALSE, results="show"}
library(MotifDb)
-query(MotifDb, "CTCF")
+query(MotifDb, "ctcf")
```

-Let us sharpen the search, looking only for human Jaspar 2018, or HOCOMOCO v11 core motifs, category "A". Eliminate "CTCFL".
+Let us refine the search, looking only for human Jaspar 2018, or HOCOMOCO v11 core motifs, category "A". Eliminate "CTCFL".

```{r query CTCF human, prompt=FALSE, message=FALSE, results="show"}
library(MotifDb)
-query(MotifDb, andStrings=c("CTCF", "hsapiens"),
-      orStrings=c("jaspar2018", "hocomocov11-core-A"),
-      notStrings="ctcfl")
+motifs <- query(MotifDb, andStrings=c("CTCF", "hsapiens"),
+                orStrings=c("jaspar2018", "hocomocov11-core-A"),
+                notStrings="ctcfl")
+length(motifs)
```
+Motifs from different sources sometimes agree and sometimes differ. Analytical methods for comparison exist, two of which are:
+
+ * Bioconductor package [DiffLogo](https://bioconductor.org/packages/release/bioc/html/DiffLogo.html)
+ * MEME Suite's [Tomtom](http://meme-suite.org/doc/tomtom.html)
+
+The Biostrings function *consensusString* provides a quick and sometimes adequate comparison. In this case, it
+reveals that the two motifs are nearly identical:
+
+```{r compare CTCF motifs, prompt=FALSE, message=FALSE, results="show"}
+sapply(motifs, consensusString)
+```
+We can also inspect the similarity visually, using the Bioconductor package [seqLogo](https://bioconductor.org/packages/release/bioc/html/seqLogo.html):
+
+```{r use seqLogo, prompt=FALSE, message=FALSE, results="hide", fig.width = 5, fig.height = 5, fig.show = "hold", out.width = "50%"}
+library(seqLogo)
+seqLogo(motifs[[1]]) # Hsapiens-jaspar2018-CTCF-MA0139.1
+seqLogo(motifs[[2]]) # Hsapiens-HOCOMOCOv11-core-A-CTCF_HUMAN.H11MO.0.A
+
+```
+
+# Beware of False Precision
+
+Though we cannot offer published, peer-reviewed support for this caution, we urge you to consider
+it and its implications.
+
+One is tempted to regard curated motif matrices from respected sources as a
+reliable guide to TF/DNA binding potential. A common strategy is to match motif against sequence, retaining
+only matches above a certain threshold fidelity: for instance a *min.score* for Biostrings::matchPWM, or a p-value
+or q-value threshold for [FIMO](http://meme-suite.org/doc/fimo.html).
+
+We explored this topic (unpublished data) using recent high-quality CTCF ChIP-seq and FIMO, for which the default
+p-value sequence match threshold is 1e-4. The scatterplot below shows that high-scoring ChIP-seq hits sometimes occur at
+binding sites where motif-match scores are low. We therefore suggest that motif-matching is most useful
+in conjunction with other information, for instance open chromatin from highly resolved experiments (scATAC-seq),
+DNase footprinting, epigenetic markers, and correlated tissue-specific or cell-type-specific gene and
+TF protein expression.
+
+```{r ChIP-vs-FIMO, eval=TRUE, echo=FALSE}
+knitr::include_graphics("GSM749704-CTCF-chipVsFimo.png")
+```
+
+


# References