2.99.4

ramon diaz-uriarte (at Phelsuma) authored on 17/12/2020 15:07:07
Showing 1 changed files
@@ -97,7 +97,10 @@ diversityLOD(llod)
   descent in a given simulation. In v. 2.9.1 we also returned the LOD
   as explained above. Now we only return the LOD as defined above.
   
-  
+  Beware, however, that if you use multiple initial mutants the LOD
+function will probably not do what you want. It is not even clear that
+the LOD is well defined in this case. We are working on this.
+
 }
 
 \value{
v. 2.99.3

ramon diaz-uriarte (at Phelsuma) authored on 13/12/2020 14:35:47
Showing 1 changed files
@@ -168,7 +168,13 @@ pancr <- allFitnessEffects(data.frame(parent = c("Root", rep("KRAS", 4), "SMAD4"
 
 
 pancr1 <- oncoSimulIndiv(pancr, model = "Exp")
-pancr8 <- oncoSimulPop(8, pancr, model = "Exp",
+
+RNGkind("L'Ecuyer-CMRG")
+set.seed(3)
+pancr8 <- oncoSimulPop(3, pancr, model = "Exp",
+                       finalTime = 600,
+                       onlyCancer = TRUE,
+                       seed = NULL,
                        mc.cores = 2)
 
 POM(pancr1)
2.17.9: POM doc and adapt to stringsAsFactors = FALSE

ramon diaz-uriarte (at Phelsuma) authored on 17/03/2020 12:31:37
Showing 1 changed files
@@ -46,7 +46,7 @@ diversityLOD(llod)
 \item{llod}{A list of LODs, as returned from \code{LOD} on an object of
   class \code{oncosimulpop}.}
 
-\item{...}{Other arguments passed to methods (ignored now).}
+% \item{...}{Other arguments passed to methods (ignored now).}
 }
 
 \details{
v. 2.9.2 - LOD: using only the strict Szendro et al. meaning. - POM: computed in C++. - Using fitness landscape directly when given as input (no conversion to epistasis)

ramon diaz-uriarte (at Phelsuma) authored on 24/11/2017 12:41:48
Showing 1 changed files
@@ -34,8 +34,11 @@ diversityLOD(llod)
 
 \arguments{ \item{x}{An object of class \code{oncosimulpop} (version >=
   2, so simulations with the old poset specification will not work) or
-  class \code{oncosimul2} (a single simulation). For \code{LOD}
-  simulations must have been run with \code{keepPhylog = TRUE}.}
+  class \code{oncosimul2} (a single simulation). }
+
+%% \item{strict}{If TRUE, a single LOD as in Szendro et al. See Details.
+%%   If FALSE, simulations must have been run with \code{keepPhylog = TRUE}
+%%   to compute all possible LODs (see Details).}
 
 \item{lpom}{A list of POMs, as returned from \code{POM} on an object of
   class \code{oncosimulpop}.}
@@ -49,35 +52,50 @@ diversityLOD(llod)
 \details{
 
   Lines of Descent (LOD) and Path of the Maximum (POM) were defined in
-  Szendro et al. (2013) and I follow those definitions here as closely
-  as possible, as applied to a process in continuous time with sampling
-  at user-specified periods.
-
-  For POM, the results can depend strongly on how often we sample and
-  keep samples (i.e., the \code{sampleEvery} and \code{keepEvery}
-  arguments to \code{oncoSimulIndiv} and \code{oncoSimulPop}), since the
-  POM is computed from the values stored in the \code{pops.by.time}
-  matrix. This also explains why it is generally meaningless to use POM on
-  \code{oncoSimulSample} runs: these only keep the very last sample.
-
-
-  For LOD my implementation is not exactly identical to the definition
-  given in p. 572 of Szendro et al. (2013). First, in case this might be
-  useful, for each simulation I keep all the paths that
-  "(...) arrive at the most populated genotype at the final time" (first
-  paragraph in p. 572 of Szendro et al.), whereas they only keep one
-  (see second column of p. 572). However, I do provide a single LOD for
-  each run, too. This is the first path to arrive at the genotype that
-  eventually becomes the most populated genotype at the final time (and,
-  in this sense, agrees with the LOD of Szendro et al.). However, in
-  contrast to what is apparently done in Szendro
-  ("A given genotype may undergo several episodes of colonization and extinction that are stored by the algorithm, and the last episode before the colonization of the final state is used to construct the step."),
-  I do not check that this genotype (which is the one that will become
-  the most populated at final time) does not become extinct before the
-  final colonization. So there could be other paths (all in
-  \code{all_paths}) that are actually the one(s) that are colonizers of
-  the most populated genotype (with no extinction before the final
-  colonization).
+  Szendro et al. (2013) and I follow those definitions here, as applied
+  to a process in continuous time with sampling at user-specified
+  periods.
+
+  For POM, the results can depend strongly on how often we sample (i.e.,
+  the \code{sampleEvery} argument to \code{oncoSimulIndiv} and
+  \code{oncoSimulPop}), since the POM is computed by finding the clone
+  with largest population size whenever we sample.%% from the values
+  %% stored in the \code{pops.by.time} matrix.
+  This also explains why
+  it is generally meaningless to use POM on \code{oncoSimulSample} runs:
+  these only keep the very last sample.
+
+
+  For LOD, %% when using \code{strict = TRUE}, 
+  a single LOD per simulation
+  is returned, with the same meaning as that in p. 572 of Szendro et
+  al. (2013). "A given genotype may undergo several episodes of colonization and extinction that are stored by the algorithm, and the last episode before the colonization of the final state is used to construct the step.",
+  and I check that this genotype (which is the one that will become the
+  most populated at final time) does not become extinct before the final
+  colonization.
+
+  %% If \code{strict = FALSE}, and if you have run the simulations with
+  %% \code{keepPhylog = TRUE}, then a I return both \code{all_paths} and
+  %% \code{lod_single}, with meanings as follow.  First, in case this might
+  %% be useful, for each simulation I keep all the paths that
+  %% "(...) arrive at the most populated genotype at the final time" (first
+  %% paragraph in p. 572 of Szendro et al.), and these are stored in
+  %% \code{all_paths}.  When \code{strict = FALSE} I also provide another
+  %% single LOD for each run, too. This is the first path to arrive at the
+  %% genotype that eventually becomes the most populated genotype at the
+  %% final time (and, in this sense, agrees with the LOD of Szendro et
+  %% al.). However, in contrast to what is done in Szendro
+  %% ("A given genotype may undergo several episodes of colonization and extinction that are stored by the algorithm, and the last episode before the colonization of the final state is used to construct the step.")
+  %% and when \code{strict = TRUE}, I do not check that this genotype
+  %% (which is the one that will become the most populated at final time)
+  %% does not become extinct before the final colonization. So there could
+  %% be other paths (all in \code{all_paths}) that are actually the one(s)
+  %% that are colonizers of the most populated genotype (with no extinction
+  %% before the final colonization).
+
+  Note \emph{breaking changes}: for LOD we used to return all lines of
+  descent in a given simulation. In v. 2.9.1 we also returned the LOD
+  as explained above. Now we only return the LOD as defined above.
   
   
 }
@@ -89,18 +107,29 @@ diversityLOD(llod)
   the ordered set of genotypes that contain the largest subpopulation at
   the times of sampling.
 
-  For \code{LOD}, if \code{x} is a single simulation, a two-element
-  list. The first, \code{all_paths}, contains all paths to the
-  maximum. The second, \code{lod_single}, contain the single LOD which
-  is closest in meaning to the original definition of Szendro et
-  al. (See "Details"). If \code{x} is a list (population)  of
-  simulations, then a list where each element is a two-element list, as
-  just explained. All the lists contain objects of class "igraph.vs" (an
-  igraph vertex sequence: see \code{\link[igraph]{vertex_attr}}).  
+  For \code{LOD}, if \code{x} is a single simulation, the line of
+  descent as defined above (either an object of class "igraph.vs" (an
+  igraph vertex sequence: see \code{\link[igraph]{vertex_attr}}) or a
+  character vector if there were no descendants). If \code{x} is a list
+  (population) of simulations, then a list where each element is a list
+  as just explained.
+
+  %% a two-element
+  %% list. If \code{strict = TRUE}, only \code{lod_single} is returned. If
+  %% \code{strict = FALSE} (and simulations were run with \code{keepPhylog
+  %% = TRUE}), \code{all_paths} contains all paths to the maximum, and
+  %% \code{lod_single} contains the single LOD which first arrives at the
+  %% maximum.
+
+  %% If \code{x} is a list (population) of simulations, then a list
+  %% where each element is a two-element list, as just explained.
+  %% All the lists
+  %% contain objects of class "igraph.vs" (an igraph vertex sequence: see
+  %% \code{\link[igraph]{vertex_attr}}).
  
   For \code{diversityLOD} and \code{diversityPOM} a single element
-  vector with the Shannon's diversity (entropy) of the \code{lod_single}
-  (for \code{diversityLOD}) or of the POMs (for \code{diversityPOM}).
+  vector with the Shannon's diversity (entropy) of the LODs (for
+  \code{diversityLOD}) or of the POMs (for \code{diversityPOM}).
 
 }
 
@@ -138,8 +167,8 @@ pancr <- allFitnessEffects(data.frame(parent = c("Root", rep("KRAS", 4), "SMAD4"
                                       typeDep = "MN"))
 
 
-pancr1 <- oncoSimulIndiv(pancr, model = "Exp", keepPhylog = TRUE)
-pancr8 <- oncoSimulPop(8, pancr, model = "Exp", keepPhylog = TRUE,
+pancr1 <- oncoSimulIndiv(pancr, model = "Exp")
+pancr8 <- oncoSimulPop(8, pancr, model = "Exp",
                        mc.cores = 2)
 
 POM(pancr1)
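
The Details text in this commit says the POM is found by taking the clone with the largest population size at each sampling time, which is why it depends on the sampling frequency. The base-R sketch below illustrates that idea on made-up data. It is not the package's C++ implementation: the `pops` matrix, its genotype labels, the `path_of_maximum` helper, and the collapsing of consecutive repeats are all assumptions made for illustration.

```r
# Toy population sizes (hypothetical layout and values):
# one row per sampling time, one column per genotype.
pops <- matrix(c(100,   0,   0,
                  80,  30,   0,
                  40,  90,   5,
                  10, 120,  60,
                   0,  70, 300),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("WT", "KRAS", "KRAS, TP53")))

# Hypothetical helper: the ordered genotypes holding the largest
# subpopulation at the sampling times.
path_of_maximum <- function(pop_sizes) {
  # Genotype with the largest population at each sampling time.
  top <- colnames(pop_sizes)[apply(pop_sizes, 1, which.max)]
  # Collapse consecutive repeats (an assumption of this sketch).
  rle(top)$values
}

path_of_maximum(pops)
# "WT" "KRAS" "KRAS, TP53"

# Keeping only the very last sample leaves a one-element path.
path_of_maximum(pops[5, , drop = FALSE])
# "KRAS, TP53"
```

The second call mimics a run that stores only the final sample: the path reduces to a single genotype, so there is nothing left to compare across runs.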
v.2.5.2 - Lots and lots of addition to vignette including benchmarks. - Diversity of sampled genotypes. - Genotyping error can be added in samplePop. - LOD and POM (lines of descent, path of maximum, sensu Szendro et al.). - simOGraph can also out rT data frames. - Better (and better explained) estimates of simulation error for McFL.

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/OncoSimulR@124982 bc3139a8-67e5-0310-9ffc-ced21a209358

Ramon Diaz-Uriarte authored on 10/12/2016 16:05:05
Showing 1 changed files
new file mode 100644
@@ -0,0 +1,168 @@
+\name{POM}
+\alias{POM}
+\alias{LOD}
+\alias{diversityPOM}
+\alias{diversityLOD}
+\alias{POM.oncosimul2}
+\alias{LOD.oncosimul2}
+\alias{POM.oncosimulpop}
+\alias{LOD.oncosimulpop}
+
+
+\title{
+  Obtain Lines of Descent and Paths of the Maximum and their diversity from simulations.
+}
+
+\description{
+  
+  Compute Lines of Descent (LOD) and Path of the Maximum (POM) for a
+  single simulation or a set of simulations (from \code{oncoSimulPop}).
+
+  \code{diversityPOM} and \code{diversityLOD} return the Shannon's
+  diversity (entropy) of the POM and LOD, respectively, of a set of
+  simulations (it makes no sense to compute those from a single simulation).
+  
+}
+
+\usage{
+
+POM(x)
+LOD(x)
+diversityPOM(lpom)
+diversityLOD(llod)
+}
+
+\arguments{ \item{x}{An object of class \code{oncosimulpop} (version >=
+  2, so simulations with the old poset specification will not work) or
+  class \code{oncosimul2} (a single simulation). For \code{LOD}
+  simulations must have been run with \code{keepPhylog = TRUE}.}
+
+\item{lpom}{A list of POMs, as returned from \code{POM} on an object of
+  class \code{oncosimulpop}.}
+
+\item{llod}{A list of LODs, as returned from \code{LOD} on an object of
+  class \code{oncosimulpop}.}
+
+\item{...}{Other arguments passed to methods (ignored now).}
+}
+
+\details{
+
+  Lines of Descent (LOD) and Path of the Maximum (POM) were defined in
+  Szendro et al. (2013) and I follow those definitions here as closely
+  as possible, as applied to a process in continuous time with sampling
+  at user-specified periods.
+
+  For POM, the results can depend strongly on how often we sample and
+  keep samples (i.e., the \code{sampleEvery} and \code{keepEvery}
+  arguments to \code{oncoSimulIndiv} and \code{oncoSimulPop}), since the
+  POM is computed from the values stored in the \code{pops.by.time}
+  matrix. This also explains why it is generally meaningless to use POM on
+  \code{oncoSimulSample} runs: these only keep the very last sample.
+
+
+  For LOD my implementation is not exactly identical to the definition
+  given in p. 572 of Szendro et al. (2013). First, in case this might be
+  useful, for each simulation I keep all the paths that
+  "(...) arrive at the most populated genotype at the final time" (first
+  paragraph in p. 572 of Szendro et al.), whereas they only keep one
+  (see second column of p. 572). However, I do provide a single LOD for
+  each run, too. This is the first path to arrive at the genotype that
+  eventually becomes the most populated genotype at the final time (and,
+  in this sense, agrees with the LOD of Szendro et al.). However, in
+  contrast to what is apparently done in Szendro
+  ("A given genotype may undergo several episodes of colonization and extinction that are stored by the algorithm, and the last episode before the colonization of the final state is used to construct the step."),
+  I do not check that this genotype (which is the one that will become
+  the most populated at final time) does not become extinct before the
+  final colonization. So there could be other paths (all in
+  \code{all_paths}) that are actually the one(s) that are colonizers of
+  the most populated genotype (with no extinction before the final
+  colonization).
+  
+  
+}
+
+\value{
+
+  For \code{POM} either a character vector (if \code{x} is a single
+  simulation) or a list of character vectors. Each character vector is
+  the ordered set of genotypes that contain the largest subpopulation at
+  the times of sampling.
+
+  For \code{LOD}, if \code{x} is a single simulation, a two-element
+  list. The first, \code{all_paths}, contains all paths to the
+  maximum. The second, \code{lod_single}, contain the single LOD which
+  is closest in meaning to the original definition of Szendro et
+  al. (See "Details"). If \code{x} is a list (population)  of
+  simulations, then a list where each element is a two-element list, as
+  just explained. All the lists contain objects of class "igraph.vs" (an
+  igraph vertex sequence: see \code{\link[igraph]{vertex_attr}}).  
+ 
+  For \code{diversityLOD} and \code{diversityPOM} a single element
+  vector with the Shannon's diversity (entropy) of the \code{lod_single}
+  (for \code{diversityLOD}) or of the POMs (for \code{diversityPOM}).
+
+}
+
+\references{
+
+  Szendro, I. G., Franke, J., Visser, J. A. G. M. de, & Krug,
+  J. (2013). Predictability of evolution depends nonmonotonically on
+  population size. \emph{Proceedings of the National Academy of Sciences},
+  110(2), 571-576. \url{https://doi.org/10.1073/pnas.1213613110}
+
+}
+
+\author{
+  Ramon Diaz-Uriarte
+}
+
+\seealso{
+  \code{\link{oncoSimulPop}}, \code{\link{oncoSimulIndiv}}
+  
+}
+
+\examples{
+
+######## Using a poset for pancreatic cancer from Gerstung et al.
+###      (s and sh are made up for the example; only the structure
+###       and names come from Gerstung et al.)
+
+pancr <- allFitnessEffects(data.frame(parent = c("Root", rep("KRAS", 4), "SMAD4", "CDNK2A", 
+                                          "TP53", "TP53", "MLL3"),
+                                      child = c("KRAS","SMAD4", "CDNK2A", 
+                                          "TP53", "MLL3",
+                                          rep("PXDN", 3), rep("TGFBR2", 2)),
+                                      s = 0.05,
+                                      sh = -0.3,
+                                      typeDep = "MN"))
+
+
+pancr1 <- oncoSimulIndiv(pancr, model = "Exp", keepPhylog = TRUE)
+pancr8 <- oncoSimulPop(8, pancr, model = "Exp", keepPhylog = TRUE,
+                       mc.cores = 2)
+
+POM(pancr1)
+LOD(pancr1)
+
+POM(pancr8)
+LOD(pancr8)
+
+diversityPOM(POM(pancr8))
+diversityLOD(LOD(pancr8))
+
+
+
+}
+
+\keyword{manip}
+\keyword{univar}
+
+
+
+
+
+
+
+
+
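
The Value section describes `diversityLOD` and `diversityPOM` as returning the Shannon diversity (entropy) of the paths from a set of simulations. As a rough illustration of that quantity, the base-R sketch below treats each distinct path as a category and computes the entropy of the path frequencies. The helper name `shannon_diversity`, the toy paths, and the use of the natural logarithm are assumptions of this sketch, not a description of the package's code.

```r
# Hypothetical helper: Shannon entropy of a list of paths, where each
# path is a vector of genotype labels.
shannon_diversity <- function(paths) {
  # Collapse each path into a single key so identical paths match.
  keys <- vapply(paths,
                 function(p) paste(as.character(p), collapse = " -> "),
                 character(1))
  # Relative frequency of each distinct path.
  freqs <- as.numeric(table(keys)) / length(keys)
  # Natural-log entropy (the log base is an assumption here).
  -sum(freqs * log(freqs))
}

# Made-up paths from four runs: two identical, two different.
toy_paths <- list(c("WT", "KRAS"),
                  c("WT", "KRAS"),
                  c("WT", "KRAS", "KRAS, TP53"),
                  c("WT", "SMAD4"))

shannon_diversity(toy_paths)      # 1.5 * log(2), about 1.04
shannon_diversity(toy_paths[1:2]) # 0: all paths identical
```

With a single simulation there is only one path, so the entropy is always 0, which matches the remark in the Description that computing these diversities from one simulation makes no sense.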