more sections added

paul-shannon authored on 10/04/2020 15:46:50
Showing 2 changed files

new file mode 100644
Binary files /dev/null and b/vignettes/GSM749704-CTCF-chipVsFimo.png differ
@@ -48,21 +48,64 @@ What is its binding motif?
 The MotifDb **query** function performs a broad, case-neutral text search through all the metadata (all the annotation) for all of
 the motifs.  It returns a list of matching motifs.  More information about the metadata is provided below.
 
-We begin with a simple search, retrieving all motifs annotated with the case-neutral string "CTCF".
+We begin with a simple search which retrieves all motifs annotated with the case-neutral string "ctcf".
 
 ```{r load MotifDb, query CTCF, prompt=FALSE, message=FALSE, results="show"}
 library(MotifDb)
-query(MotifDb, "CTCF")
+query(MotifDb, "ctcf")
 ```
 
-Let us sharpen the search, looking only for human Jaspar 2018, or HOCOMOCO v11 core motifs, category "A". Eliminate "CTCFL".
+Let us refine the search, looking only for human Jaspar 2018, or HOCOMOCO v11 core motifs, category "A". Eliminate "CTCFL".
 
 ```{r query CTCF human, prompt=FALSE, message=FALSE, results="show"}
 library(MotifDb)
-query(MotifDb, andStrings=c("CTCF", "hsapiens"),
-               orStrings=c("jaspar2018", "hocomocov11-core-A"),
-               notStrings="ctcfl")
+motifs <- query(MotifDb, andStrings=c("CTCF", "hsapiens"),
+                orStrings=c("jaspar2018", "hocomocov11-core-A"),
+                notStrings="ctcfl")
+length(motifs)
 ```
+Motifs from different sources sometimes agree and sometimes differ.  Analytical methods for comparison exist, of which two are:
+
+    * Bioconductor package [DiffLogo](https://bioconductor.org/packages/release/bioc/html/DiffLogo.html)
+    * MEME Suite [Tomtom](http://meme-suite.org/doc/tomtom.html)
+
+The Biostrings function *consensusString* provides a quick and sometimes adequate comparison.  In this case, it
+reveals that the two motifs are nearly identical:
+
+```{r compare CTCF motifs, prompt=FALSE, message=FALSE, results="show"}
+sapply(motifs, consensusString)
+```
+We can also inspect the similarity visually, using the Bioconductor package [seqLogo](https://bioconductor.org/packages/release/bioc/html/seqLogo.html).
+
+```{r use seqLogo, prompt=FALSE, message=FALSE, results="hide", fig.width = 5, fig.height = 5, fig.show = "hold", out.width = "50%"}
+library(seqLogo)
+seqLogo(motifs[[1]])  # Hsapiens-jaspar2018-CTCF-MA0139.1
+seqLogo(motifs[[2]])  # Hsapiens-HOCOMOCOv11-core-A-CTCF_HUMAN.H11MO.0.A
+
+```
+
+# Beware of False Precision
+
+Though we cannot offer published, peer-reviewed support for this warning, we urge you to consider
+it and its implications.
+
+One is tempted to regard curated motif matrices from respected sources as a
+reliable guide to TF/DNA binding potential.  A common strategy is to match motif against sequence, retaining
+only matches above a certain threshold fidelity:  for instance a *min.score* for Biostrings::matchPWM, or a p-value
+or q-value threshold for [FIMO](http://meme-suite.org/doc/fimo.html).
+
+We explored this topic (unpublished data) using recent high-quality CTCF ChIP-seq and FIMO, for which the default
+p-value sequence match threshold is 1e-4.  The scatterplot below shows that high-scoring ChIP-seq hits sometimes occur at
+binding sites where motif-match scores are low.  We therefore suggest that motif-matching is most useful
+in conjunction with other information, for instance open chromatin from highly resolved experiments (scATAC-seq),
+DNase footprinting, epigenetic markers, and correlated tissue-specific or cell-type-specific gene and
+TF protein expression.
+
+```{r ChIP-vs-FIMO, eval=TRUE, echo=FALSE}
+knitr::include_graphics("GSM749704-CTCF-chipVsFimo.png")
+```
+
+
 
 
 # References