Browse code

2.5.5; vignette fixes

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/OncoSimulR@125127 bc3139a8-67e5-0310-9ffc-ced21a209358

Ramon Diaz-Uriarte authored on 14/12/2016 12:53:22
Showing 6 changed files

... ...
@@ -1,8 +1,8 @@
1 1
 Package: OncoSimulR
2 2
 Type: Package
3 3
 Title: Forward Genetic Simulation of Cancer Progression with Epistasis 
4
-Version: 2.5.4
5
-Date: 2016-12-12
4
+Version: 2.5.5
5
+Date: 2016-12-14
6 6
 Authors@R: c(person("Ramon", "Diaz-Uriarte", role = c("aut", "cre"),
7 7
 		     email = "rdiaz02@gmail.com"),
8 8
 	      person("Mark", "Taylor", role = "ctb", email = "ningkiling@gmail.com"))
... ...
@@ -1,3 +1,6 @@
1
+Changes in version 2.5.5 (2016-12-14):
2
+	- Vignette: miscell changes (typos, etc)
3
+	
1 4
 Changes in version 2.5.4 (2016-12-12):
2 5
 	- Vignette: miscell changes (order of examples, typos, etc)
3 6
 
... ...
@@ -160,11 +160,11 @@ mutator effects.
160 160
  
161 161
  
162 162
  
163
-OncoSimulR was originally developed to simulate tumor progression using
164
-several models of tumor progression with emphasis on allowing users to set
165
-restrictions in the accumulation of mutations as specified, for example, by
166
-Oncogenetic Trees [OT: @Desper1999JCB; @Szabo2008] or Conjunctive Bayesian
167
-Networks [CBN: @Beerenwinkel2007; @Gerstung2009; @Gerstung2011], with the
163
+OncoSimulR was originally developed to simulate tumor progression with
164
+emphasis on allowing users to set restrictions in the accumulation of
165
+mutations as specified, for example, by Oncogenetic Trees
166
+[OT: @Desper1999JCB; @Szabo2008] or Conjunctive Bayesian Networks
167
+[CBN: @Beerenwinkel2007; @Gerstung2009; @Gerstung2011], with the
168 168
 possibility of adding passenger mutations to the simulations and allowing
169 169
 for several types of sampling.
170 170
 
... ...
@@ -195,22 +195,24 @@ of genes in a module. In addition, arbitrary numbers of genes without
195 195
 interactions (and with fitness effects coming from any distribution you
196 196
 might want) are also possible.
197 197
 
198
-Mutator/antimutator genes, genes that alter the mutation rate of other genes
199
-[@gerrish_complete_2007, @tomlinson_mutation_1996], can also be simulated
200
-with OncoSimulR and specified with most of the mechanisms above (you can
201
-have, for instance, interactions between mutator genes). And, with regards
202
-to mutation rates, different genes can have different mutation rates
203
-(regardless of the presence or not of other mutator/antimutator genes).
198
+Mutator/antimutator genes, genes that alter the mutation rate of
199
+other genes [@gerrish_complete_2007; @tomlinson_mutation_1996], can
200
+also be simulated with OncoSimulR and specified with most of the
201
+mechanisms above (you can have, for instance, interactions between
202
+mutator genes). And, regardless of the presence or not of other
203
+mutator/antimutator genes, different genes can have different
204
+mutation rates.
204 205
 
205 206
 
206
-Simulations can be stopped as a function of total population size, number of
207
-mutated driver genes, or number of time periods. Simulations can also be
208
-stopped with a stochastic detection mechanism where the probability of
207
+Simulations can be stopped as a function of total population size, number
208
+of mutated driver genes, or number of time periods. Simulations can also
209
+be stopped with a stochastic detection mechanism where the probability of
209 210
 detecting a tumor increases with total population size. Simulations return
210
-the number of cells of every genotype/clone at each of the sampling periods
211
-and the genealogical relationships of all clones generated during the
212
-simulation. We can take samples from those data with single-cell or whole-
213
-tumor resolution, adding noise if we want.
211
+the number of cells of every genotype/clone at each of the sampling
212
+periods and we can take samples from the former with single-cell or whole-
213
+tumor resolution, adding noise if we want. If we ask for them, simulations
214
+also store and return the genealogical relationships of all clones
215
+generated during the simulation.
214 216
 
215 217
 
216 218
 The models so far implemented are all continuous time models, which are
... ...
@@ -219,7 +221,7 @@ implemented in C++, providing for fast execution.  To help with simulation
219 221
 studies, code to simulate random graphs of the kind often seen in CBNs, OTs,
220 222
 etc, is also available. Finally, OncoSimulR also allows for the generation
221 223
 of random fitness landscapes and the representation of fitness landscapes
222
-and provides some statistics of of evolutionary predictability.
224
+and provides statistics of evolutionary predictability.
223 225
 
224 226
 
225 227
 
... ...
@@ -228,7 +230,7 @@ and provides some statistics of of evolutionary predictability.
228 230
 
229 231
 As mentioned above, OncoSimulR is now a very general package for forward
230 232
 genetic simulation, with applicability well beyond tumor progression. This
231
-is a summary of some of the key features:
233
+is a summary of some of its key features:
232 234
 
233 235
 <!-- FIXME: add the tables of the poster -->
234 236
 
... ...
@@ -240,14 +242,14 @@ is a summary of some of the key features:
240 242
       Oncogenetic Trees (OTs), Conjunctive Bayesian Networks (CBNs),
241 243
       semimonotone progression networks, and XOR relationships.
242 244
 		  
243
-    - Epistatic interactions, including, but not limited to, synthetic
245
+    - Epistatic interactions including, but not limited to, synthetic
244 246
        viability and synthetic lethality.
245 247
     - Order effects.
246 248
 	
247 249
 * You can add passenger mutations.
248 250
 * You can add mutator/antimutator effects.
249 251
 * Fitness and mutation rates can be gene-specific.
250
-* More generally, you can add arbitrary numbers of non-interacting
252
+* You can add arbitrary numbers of non-interacting
251 253
       genes with arbitrary fitness effects.
252 254
   
253 255
 * you can allow for deviations from the OT, CBN, semimonotone, and
... ...
@@ -259,9 +261,9 @@ is a summary of some of the key features:
259 261
       sampling. 
260 262
 	  
261 263
 * You can stop the simulations using a flexible combination of
262
-      conditions, from final time to number of drivers to population
263
-      size, to fixation of certain genotypes, to a stochastic
264
-      stopping mechanism that depends on size, etc.  
264
+      conditions: final time, number of drivers, population
265
+      size, fixation of certain genotypes, and a stochastic
266
+      stopping mechanism that depends on population size.
265 267
   
266 268
 * Right now, three different models are available, two that lead
267 269
       to exponential growth, one of them loosely based on @Bozic2010, and
... ...
@@ -270,7 +272,7 @@ is a summary of some of the key features:
270 272
 <!-- * Code in C++ is available (though not yet callable from R) for -->
271 273
 <!--       using several other models, including the one from @Beerenwinkel2007b. -->
272 274
       
273
-* You can use very large numbers of genes (e.g., see an example of
275
+* You can use large numbers of genes (e.g., see an example of
274 276
       50000 in section \@ref(mcf50070) ).
275 277
       
276 278
 * Simulations are generally very fast: I use C++ to implement the BNB
... ...
@@ -288,13 +290,13 @@ is a summary of some of the key features:
288 290
       
289 291
 * You can plot fitness landscapes.
290 292
       
291
-* You can obtain some basic measures of evolutionary predictability
293
+* You can obtain statistics of evolutionary predictability
292 294
   from the simulations.
293 295
   
294 296
 
295 297
 
296 298
 The table below, modified from the table at the
297
-[Genetics Simulation Resources (GSR) page](https://popmodels.cancercontrol.cancer.gov/gsr/packages/oncosimulr/#detailed)
299
+[Genetics Simulation Resources (GSR) page](https://popmodels.cancercontrol.cancer.gov/gsr/packages/oncosimulr/#detailed), 
298 300
 provides a summary of the key features of OncoSimulR. (An
299 301
 explanation of the meaning of terms specific to the GSR table is
300 302
 available from
... ...
@@ -304,21 +306,21 @@ by moving the mouse over each term).
304 306
 
305 307
 \clearpage
306 308
 
307
-|Attribute Category     | Attribute                                     |
308
-|-----------------------|-----------------------------------------------|
309
-|**Target**             |  |
310
-|&nbsp; Type of Simulated Data|           Haploid DNA Sequence|
311
-|&nbsp; variations            |           Biallelic Marker, Genotype or Sequencing Error|
312
-|**Simulation Method**            |           Forward-time|
313
-|&nbsp; Type of Dynamical Model | Continuous time|
314
-|&nbsp; Entities Tracked | Clones (see \@ref(trackindivs))|
309
+|Attribute Category                  | Attribute                                     |
310
+|-------------------------------|-----------------------------------------------|
311
+|**Target**                           |  |
312
+|&nbsp; Type of Simulated Data        |           Haploid DNA Sequence|
313
+|&nbsp; Variations                    |           Biallelic Marker, Genotype or Sequencing Error|
314
+|**Simulation Method**                |           Forward-time|
315
+|&nbsp; Type of Dynamical Model       | Continuous time|
316
+|&nbsp; Entities Tracked              | Clones (see \@ref(trackindivs))|
315 317
 |**Input** | Program specific (R data frames and matrices specifying genotypes' fitness, gene effects, and starting genotype) |
316 318
 |**Output**||
317 319
 |&nbsp; Data Type| Genotype or Sequence, Individual Relationship (complete parent-child relationships between clones), Demographic (populations sizes of all clones at sampling times), Diversity Measures (LOD, POM, diversity of genotypes), Fitness|
318 320
 |&nbsp; Sample Type|	Random or Independent, Longitudinal, Other (proportional to population size)|
319 321
 |**Evolutionary Features**	||
320 322
 |&nbsp; Mating Scheme| Asexual Reproduction |
321
-|&nbsp; Demographic||	
323
+|&nbsp; Demographic                   ||	
322 324
 |&nbsp; &nbsp; Population Size Changes|	Exponential (two models), Logistic (McFarland et al., 2013)|
323 325
 |&nbsp; Fitness Components||	
324 326
 |&nbsp; &nbsp; Birth Rate|	Individually Determined from Genotype (models "Exp" and "McFL")|
... ...
@@ -345,28 +347,29 @@ found in @Diaz-Uriarte2015, where additional comments about model
345 347
 parameters and caveats are discussed.
346 348
 
347 349
 
348
-Are there similar programs? The Java program by @Reiter2013a offers somewhat
349
-similar functionality to the previous version of OncoSimulR, but it is
350
-restricted to at most four drivers (whereas v.1 of OncoSimulR allowed for up
351
-to 64), you cannot use arbitrary CBNs or OTs (or XORs or semimonotone
352
-graphs) to specify restrictions, there is no allowance for passengers, and a
353
-single type of model (a discrete time Galton-Watson process) is
354
-implemented. The current functionality of OncoSimulR goes well beyond the
355
-the previous version (and, thus, also the TPT of [@Reiter2013a]). We now
356
-allow you to specify all types of fitness effects in other general forward
357
-genetic simulators such as FFPopSim [@Zanini2012], and some that, to our
358
-knowledge (e.g., order effects) are not available from any genetics
359
-simulator. In addition, the "Lego system" to flexibly combine different
360
-fitness specifications is also unique; by "Lego system" I mean that we can
361
-combine different pieces and blocks, similarly to what we do with Lego
362
-bricks. (I find this an intuitive and very graphical analogy, which I have
363
-copied from @Hothorn_2006 and @Hothorn_2008). In a nutshell, salient
364
-features of OncoSimulR compared to other simulators are the unparalleled
365
-flexibility to specify fitness and mutator effects, with modules and order
366
-effects as particularly unique, and the options for sampling and stopping
367
-the simulations, particularly convenient in cancer evolution models. Also
368
-unique in this type of software is the addition of functions for simulating
369
-fitness landscapes and assessing evolutionary predictability.
350
+Are there similar programs? The Java program by @Reiter2013a, TTP, offers
351
+somewhat similar functionality to the previous version of OncoSimulR, but
352
+it is restricted to at most four drivers (whereas v.1 of OncoSimulR
353
+allowed for up to 64), you cannot use arbitrary CBNs or OTs (or XORs or
354
+semimonotone graphs) to specify restrictions, there is no allowance for
355
+passengers, and a single type of model (a discrete time Galton-Watson
356
+process) is implemented. The current functionality of OncoSimulR goes well
357
+beyond the the previous version (and, thus, also the TPT of
358
+@Reiter2013a). We now allow you to specify all types of fitness effects
359
+in other general forward genetic simulators such as FFPopSim
360
+[@Zanini2012], and some that, to our knowledge (e.g., order effects) are
361
+not available from any genetics simulator. In addition, the "Lego system"
362
+to flexibly combine different fitness specifications is also unique; by
363
+"Lego system" I mean that we can combine different pieces and blocks,
364
+similarly to what we do with Lego bricks. (I find this an intuitive and
365
+very graphical analogy, which I have copied from @Hothorn_2006 and
366
+@Hothorn_2008). In a nutshell, salient features of OncoSimulR compared to
367
+other simulators are the unparalleled flexibility to specify fitness and
368
+mutator effects, with modules and order effects as particularly unique,
369
+and the options for sampling and stopping the simulations, particularly
370
+convenient in cancer evolution models. Also unique in this type of
371
+software is the addition of functions for simulating fitness landscapes
372
+and assessing evolutionary predictability.
370 373
 
371 374
 
372 375
 
... ...
@@ -390,7 +393,7 @@ OncoSimulR can help address involve combinations of:
390 393
           DAGs/posets where these DAGs/posets:
391 394
 		     - Are user-specified
392 395
 	         - Generated randomly (`simOGraph`)
393
-        - Any mapping between genotypes and fitness:
396
+        - Any mapping between genotypes and fitness where this mapping is:
394 397
             - User-specified
395 398
             - Generated randomly from families of random fitness landscapes (`rfitness`)     
396 399
      - Mutation rates can:
... ...
@@ -411,12 +414,12 @@ OncoSimulR can help address involve combinations of:
411 414
     - ... and assessing the consequences of those on the observed
412 415
       genotypes and their diversity (`sampledGenotypes`) and any other
413 416
       inferences that depend on the observational process.
414
-	- OncoSimulR returns the genotypes at each of the sampling
415
-	  points, so you are not restricted by what the
416
-	  `samplePop` function provides.
417
+	- (OncoSimulR returns the abundances of all genotypes at each of
418
+	  the sampling points, so you are not restricted by what the
419
+	  `samplePop` function provides.)
417 420
   
418 421
 
419
-* Tracing the genealogical relationships of clones
422
+* Tracking the genealogical relationships of clones
420 423
   (`plotClonePhylog`) and assessing evolutionary predictability
421 424
   (`LOD`, `POM`).
422 425
 
... ...
@@ -428,8 +431,8 @@ OncoSimulR are discussed in section \@ref(whatfor).
428 431
 -----
429 432
 
430 433
 A quick overview of the main functions and their relationships is shown in
431
-the figure \@ref(fig:frelats), where we use italics for the type/class of R
432
-object and courier for the name of the functions.
434
+Figure \@ref(fig:frelats), where we use italics for the type/class of R
435
+object and courier font for the name of the functions.
433 436
 <!-- Note: figure1.png, and how to create it, explained in miscell-files -->
434 437
 <!-- in the repo -->
435 438
 <!-- ![Relationship between the main functions in OncoSimulR.](figure1.png) -->
... ...
@@ -440,21 +443,22 @@ object and courier for the name of the functions.
440 443
 knitr::include_graphics("relfunct.png")
441 444
 ```
442 445
 
443
-
446
+\clearpage
444 447
 
445 448
 ## Examples of questions that can be addressed with OncoSimulR {#whatfor}
446 449
 
447 450
 Most of the examples in the rest of this vignette, starting with those in
448 451
 \@ref(quickexample), focus on the mechanics. Here, we will illustrate some
449
-problems in cancer genomics and evolutionary genetics where OncoSimulR could
450
-be of help. This section does not try to provide an answer to any of these
451
-questions (those would be full papers by themselves) but simply to
452
-illustrate some kinds of questions where you can use OncoSimulR; of course,
453
-the possible uses of OncoSimulR are only limited by your ingenuity. Finally,
454
-I will only use short snippets of working code as we are limited by time of
455
-execution; for real work you would want to use many more scenarios and many
456
-more simulations, you would use appropriate statistical methods to compare
457
-the output of runs, etc, etc, etc.
452
+problems in cancer genomics and evolutionary genetics where OncoSimulR
453
+could be of help. This section does not try to provide an answer to any of
454
+these questions (those would be full papers by themselves). Instead, this
455
+section simply tries to illustrate some kinds of questions where you can
456
+use OncoSimulR; of course, the possible uses of OncoSimulR are only
457
+limited by your ingenuity. Here, I will only use short snippets of working
458
+code as we are limited by time of execution; for real work you would want
459
+to use many more scenarios and many more simulations, you would use
460
+appropriate statistical methods to compare the output of runs, etc, etc,
461
+etc.
458 462
 
459 463
 
460 464
 ```{r firstload}
... ...
@@ -464,10 +468,10 @@ library(OncoSimulR)
464 468
 
465 469
 ### Recovering restrictions in the order of accumulation of mutations {#ex-order}
466 470
 
467
-This is a question that was addressed, for instance in
471
+This is a question that was addressed, for instance, in
468 472
 @Diaz-Uriarte2015: do methods that try to infer restrictions in the
469
-order of accumulation of mutations (e.g., @Szabo2008, @Gerstung2009,
470
-@ramazzotti_capri_2015) work well under different evolutionary
473
+order of accumulation of mutations [e.g., @Szabo2008; @Gerstung2009;
474
+@ramazzotti_capri_2015] work well under different evolutionary
471 475
 models and with different sampling schemes? (This issue is also
472 476
 touched upon in section \@ref(sample-1)).
473 477
 
... ...
@@ -486,52 +490,55 @@ RNGkind("L'Ecuyer-CMRG")
486 490
 ```
487 491
 
488 492
 ```{r ex-dag-inf}
489
-## 1. Simulate a DAG
493
+## For reproducibility
494
+set.seed(2)
495
+RNGkind("L'Ecuyer-CMRG")
496
+
497
+## Simulate a DAG
490 498
 g1 <- simOGraph(4, out = "rT")
491 499
 
492
-## 2. Simulate two evolutionary trajectories
500
+## Simulate 10 evolutionary trajectories
493 501
 s1 <- oncoSimulPop(10, allFitnessEffects(g1, drvNames = 1:4),
494 502
                    mc.cores = 2, ## adapt to your hardware
495 503
                    seed = NULL) ## for reproducibility of vignette
496 504
 
497
-## 3. Sample those data uniformly, and add noise
505
+## Sample those data uniformly, and add noise
498 506
 d1 <- samplePop(s1, timeSample = "unif", propError = 0.1)
499 507
 
500
-## 4. You would now run the appropriate inferential method and
501
-## compare observed and true
508
+## You would now run the appropriate inferential method and
509
+## compare observed and true. For example
502 510
 
503
-require(Oncotree)
504
-fit1 <- oncotree.fit(d1)
511
+## require(Oncotree)
512
+## fit1 <- oncotree.fit(d1)
505 513
 
506
-## Now, compare fitted and original. This is well beyond the
507
-## scope of this document (and OncoSimulR itself).
514
+## Now, you'd compare fitted and original. This is well beyond 
515
+## the scope of this document (and OncoSimulR itself).
508 516
 
509 517
 ```
510 518
 
511
-```{r, echo=FALSE}
512
-set.seed(7)
513
-RNGkind("L'Ecuyer-CMRG")
519
+```{r hidden-rng-exochs, echo = FALSE}
520
+set.seed(NULL)
514 521
 ```
515 522
 
523
+
524
+
516 525
 ### Sign epistasis and probability of crossing fitness valleys {#ex-ochs}
517 526
 
518
-These questions, together with the next one (\@ref(ex-predict)),
519
-encompass a wide range of specific issues that have been addressed
520
-in evolutionary genetics studies and which include from detailed
521
-analysis of simple models with a few uphill paths and valleys as in
522
-@Weissman2009 or @Ochs2015, to questions that refer to larger, more
523
-complex fitness landscapes as in @szendro_predictability_2013 or
524
-@franke_evolutionary_2011 (see below).
525
-
526
-
527
-Using as an example @Ochs2015, we could specify the fitness
528
-landscape and run simulations until fixation (with argument
529
-`fixation` to `oncoSimulPop` ---see more details in section
530
-\@ref(fixation) where we also use this same model). We would then
531
-examine the proportion of genotypes fixed under different
532
-scenarios. For instance, we can use the example from @Ochs2015 (we
533
-will see this example also in section \@ref(ochsdesai), where we
534
-cover different ways of specifying fitness). And we can extend this
527
+This question, and the question in the next section (\@ref(ex-predict)),
528
+encompass a wide range of issues that have been addressed in evolutionary
529
+genetics studies and which include from detailed analysis of simple models
530
+with a few uphill paths and valleys as in @Weissman2009 or @Ochs2015, to
531
+questions that refer to larger, more complex fitness landscapes as in
532
+@szendro_predictability_2013 or @franke_evolutionary_2011 (see below).
533
+
534
+
535
+Using as an example @Ochs2015 (we will see this example again in
536
+section \@ref(ochsdesai), where we cover different ways of
537
+specifying fitness), we could specify the fitness landscape and run
538
+simulations until fixation (with argument `fixation` to
539
+`oncoSimulPop` ---see more details in section \@ref(fixation), again
540
+with this example). We would then examine the proportion
541
+of genotypes fixed under different scenarios. And we can extend this
535 542
 example by adding mutator genes:
536 543
 
537 544
 
... ...
@@ -541,6 +548,10 @@ RNGkind("L'Ecuyer-CMRG")
541 548
 ```
542 549
 
543 550
 ```{r exochs}
551
+## For reproducibility
552
+set.seed(2)
553
+RNGkind("L'Ecuyer-CMRG")
554
+
544 555
 ## Specify fitness effects. 
545 556
 
546 557
 ## Numeric values arbitrary, but set the intermediate genotype en
... ...
@@ -578,7 +589,8 @@ initS <- 10
578 589
 od_sim <- oncoSimulPop(10, od, muEF = odm,
579 590
                        fixation = c("u", "i, v"), initSize = initS,
580 591
                        model = "McFL",
581
-                       mu = 1e-4, detectionDrivers = NA, finalTime = NA,
592
+                       mu = 1e-4, detectionDrivers = NA, 
593
+					   finalTime = NA,
582 594
                        detectionSize = NA, detectionProb = NA,
583 595
                        onlyCancer = TRUE, 
584 596
 					   mc.cores = 2, ## adapt to your hardware
... ...
@@ -587,7 +599,7 @@ od_sim <- oncoSimulPop(10, od, muEF = odm,
587 599
 sampledGenotypes(samplePop(od_sim))
588 600
 ```
589 601
 
590
-```{r hidden-rng-exochs, echo = FALSE}
602
+```{r hidden-rng-exochs33, echo = FALSE}
591 603
 set.seed(NULL)
592 604
 ```
593 605
 
... ...
@@ -598,7 +610,8 @@ we would run simulations under random fitness landscapes with varied
598 610
 ruggedness, and would then examine the evolutionary predictability
599 611
 of the trajectories with measures such as "Lines of Descent" and
600 612
 "Path of the Maximum" [@szendro_predictability_2013] and the
601
-diversity of the sampled genotypes under different sampling regimes.
613
+diversity of the sampled genotypes under different sampling regimes (see
614
+details in section \@ref(evolpredszend)).
602 615
 
603 616
 
604 617
 
... ...
@@ -608,17 +621,22 @@ RNGkind("L'Ecuyer-CMRG")
608 621
 ```
609 622
 
610 623
 ```{r exszendro}
624
+## For reproducibility
625
+set.seed(7)
626
+RNGkind("L'Ecuyer-CMRG")
627
+
611 628
 
612 629
 ## Repeat the following loop for different combinations of whatever
613 630
 ## interests you, such as number of genes, or distribution of the
614 631
 ## c and sd (which affect how rugged the landscape is), or 
615
-## reference genotype or evolutionary model, or stopping criterion, 
616
-## sampling procedure, or ...
632
+## reference genotype, or evolutionary model, or stopping criterion, 
633
+## or sampling procedure, or ...
617 634
 
618
-## 1. Generate a random fitness landscape, from the Rough Mount
619
-##    Fuji model, with g genes, and c ("slope" constant) and
620
-##    reference chosen randomly. Ask for a minimal number of
621
-##    accessible genotypes
635
+##  Generate a random fitness landscape, from the Rough Mount
636
+##  Fuji model, with g genes, and c ("slope" constant) and
637
+##  reference chosen randomly (reference is random by default and 
638
+##  thus not specified below). Require a minimal number of  
639
+##  accessible genotypes
622 640
 
623 641
 g <- 6
624 642
 c <- runif(1, 1/5, 5)
... ...
@@ -632,20 +650,19 @@ rl <- rfitness(g, c = c, min_accessible_genotypes = g)
632 650
 ## Obtain landscape measures from Magellan. Export to Magellan
633 651
 to_Magellan(rl, file = "rl1.txt")
634 652
 
635
-## (Getting the statistics requires you to install Magellan,
636
-## and that requires either calling the web app 
637
-## (http://wwwabi.snv.jussieu.fr/public/Magellan/) or
638
-## asking Magellan's authors for the software. This is of course
639
-## beyond the scope of the example and the package.
653
+## (Getting the statistics from Magellan requires either 
654
+## calling the web app (http://wwwabi.snv.jussieu.fr/public/Magellan/) 
655
+## or asking Magellan's authors for the software. This is of course
656
+## beyond the scope of the example and the package.)
640 657
 
641
-## 3. Simulate evolution in that landscape many times (here just 5)
658
+## Simulate evolution in that landscape many times (here just 10)
642 659
 simulrl <- oncoSimulPop(10, allFitnessEffects(genotFitness = rl),
643 660
                         keepPhylog = TRUE, keepEvery = 1,
644 661
                         initSize = 4000,
645 662
                         seed = NULL, ## for reproducibility
646 663
                         mc.cores = 2) ## adapt to your hardware
647 664
 
648
-## 4. Obtain measures of evolutionary predictability
665
+## Obtain measures of evolutionary predictability
649 666
 diversityLOD(LOD(simulrl))
650 667
 diversityPOM(POM(simulrl))
651 668
 sampledGenotypes(samplePop(simulrl, typeSample = "whole"))
... ...
@@ -657,10 +674,10 @@ set.seed(NULL)
657 674
 
658 675
 
659 676
 
660
-### Mutator and antimutator genes {#ex-mut-antimut}
677
+### Mutator and antimutator genes {#exmutantimut}
661 678
 
662 679
 The effects of mutator and antimutator genes have been examined both
663
-in cancer genetics [@nowak_evolutionary_2006,
680
+in cancer genetics [@nowak_evolutionary_2006;
664 681
 @tomlinson_mutation_1996] and in evolutionary genetics
665 682
 [@gerrish_complete_2007], and are related to wider issues such as
666 683
 Muller's ratchet and the evolution of sex. There are, thus, a large
... ...
@@ -675,17 +692,17 @@ in driver genes needed to reach cancer is large and if the mutator effect is
675 692
 large.
676 693
 
677 694
 
678
-We might want to ask, then, how long it takes before we reach
679
-cancer. This is stored in the component `FinalTime` of the
680
-output. We would specify different numbers and effects of mutator
681
-genes (argument `muEF`). We would also change the criteria for
695
+We might want to ask, then, how long it takes before to reach cancer under
696
+different scenarios. Time to reach cancer is stored in the component
697
+`FinalTime` of the output. We would specify different numbers and effects
698
+of mutator genes (argument `muEF`). We would also change the criteria for
682 699
 reaching cancer and in our case we can easily do that by specifying
683
-different numbers in `detectionDrivers`. Of course, we would also
684
-want to examine the effects of varying numbers of mutators, drivers,
685
-and possibly fitness consequences of mutators (below we assume
686
-mutators are neutral and we assume there are no additional genes
687
-with deleterious mutations, but this need not be so, of course; see
688
-also [@tomlinson_mutation_1996, @gerrish_complete_2007, @McFarland2014]).
700
+different numbers in `detectionDrivers`. Of course, we would also want to
701
+examine the effects of varying numbers of mutators, drivers, and possibly
702
+fitness consequences of mutators. Below we assume mutators are neutral and
703
+we assume there are no additional genes with deleterious mutations, but
704
+this need not be so, of course [see also
705
+@tomlinson_mutation_1996; @gerrish_complete_2007; @McFarland2014].
689 706
 
690 707
 
691 708
 Let us run an example. For the sake of simplicity, we assume no
... ...
@@ -721,6 +738,11 @@ RNGkind("L'Ecuyer-CMRG")
721 738
 ```
722 739
 
723 740
 ```{r ex-tomlin2}
741
+## For reproducibility
742
+set.seed(2)
743
+RNGkind("L'Ecuyer-CMRG")
744
+
745
+
724 746
 ddr <- 4
725 747
 st <- oncoSimulPop(4, ft, muEF = mt,
726 748
                    detectionDrivers = ddr,
... ...
@@ -743,9 +765,46 @@ set.seed(NULL)
743 765
 
744 766
 
745 767
 (Incidentally, notice that it is easy to get OncoSimulR to throw an
746
-exception if you do not think twice about the mutator effects you are using
747
-and specify a huge mutation rate if all mutator genes are mutated: see
748
-\@(tomlin-except).)
768
+exception if you accidentally specify a huge mutation rate when all
769
+mutator genes are mutated: see section \@ref(tomlinexcept).)
770
+
771
+
772
+<!-- More complex example where we look at accumulation of deleterious -->
773
+
774
+<!-- ```{r r ex-tomlin3} -->
775
+<!-- set.seed(2) ## for reproducibility -->
776
+<!-- RNGkind("L'Ecuyer-CMRG") -->
777
+
778
+<!-- sd <- 0.1 ## fitness effect of drivers -->
779
+<!-- sp <- -0.01 ## fitness effect of mildly deleterious passengers -->
780
+<!-- sm <- 0 ## fitness effect of mutator -->
781
+<!-- nd <- 20 ## number of drivers -->
782
+<!-- nm <- 5  ## number of mutators -->
783
+<!-- np <- 50 -->
784
+<!-- mut <- 5 ## mutator effect -->
785
+
786
+<!-- fitnessGenesVector <- c(rep(sd, nd), rep(sm, nm), rep(sp, np)) -->
787
+<!-- names(fitnessGenesVector) <- 1:(nd + nm + np) -->
788
+<!-- mutatorGenesVector <- rep(mut, nm) -->
789
+<!-- names(mutatorGenesVector) <- (nd + np + 1):(nd + np + nm) -->
790
+
791
+<!-- ft <- allFitnessEffects(noIntGenes = fitnessGenesVector, -->
792
+<!--                         drvNames = 1:nd) -->
793
+<!-- mt <- allMutatorEffects(noIntGenes = mutatorGenesVector) -->
794
+
795
+<!-- ddr <- 4 -->
796
+<!-- st <- oncoSimulPop(4, ft, muEF = mt, -->
797
+<!--                    detectionDrivers = ddr, -->
798
+<!--                    finalTime = NA, -->
799
+<!--                    detectionSize = NA, -->
800
+<!--                    detectionProb = NA, -->
801
+<!--                    onlyCancer = FALSE, -->
802
+<!--                    keepEvery = NA,  -->
803
+<!--                    mc.cores = 2, ## adapt to your hardware -->
804
+<!--                    seed = NULL) ## for reproducibility -->
805
+<!-- colSums(samplePop(st, timeSample = "last", typeSample = "single"), na.rm = TRUE) -->
806
+<!-- ``` -->
807
+
749 808
 
750 809
 
751 810
 
... ...
@@ -787,10 +846,13 @@ fbauer <- allFitnessEffects(bauer)
787 846
 plot(b1, use_ggrepel = TRUE) ## avoid overlapping labels
788 847
 ```
789 848
 
790
-Now run simulations and examine how frequently the runs do not end
791
-up in extinction (for real, you would of course increase the number
792
-of total populations, the range of initial population sizes, model,
793
-mutation rate, etc):
849
+Now run simulations and examine how frequently the runs end up with
850
+population sizes larger than a pre-specified threshold; for
851
+instance, below we look at increasing population size 4x in the
852
+default maximum number of 2281 time periods (for real, you would of
853
+course increase the number of total populations, the range of
854
+initial population sizes, model, mutation rate, required population
855
+size or number of drivers, etc):
794 856
 
795 857
 
796 858
 ```{r hiddenbau, echo=FALSE}
... ...
@@ -799,14 +861,20 @@ RNGkind("L'Ecuyer-CMRG")
799 861
 ```
800 862
 
801 863
 ```{r exusagebau2}
864
+## For reproducibility
865
+set.seed(2)
866
+RNGkind("L'Ecuyer-CMRG")
867
+
802 868
 totalpops <- 5
869
+initSize <- 100
803 870
 sb1 <- oncoSimulPop(totalpops, fbauer, model = "Exp",
804
-                    initSize = 100,
871
+                    initSize = initSize,
805 872
                     onlyCancer = FALSE, 
806 873
 					mc.cores = 2, ## adapt to your hardware
807 874
                     seed = NULL) ## for reproducibility
808
-## What proportion of the simulations do not end up extinct?
809
-sum(summary(sb1)[, "TotalPopSize"] > 0)/totalpops
875
+					
876
+## What proportion of the simulations reach 4x initSize?
877
+sum(summary(sb1)[, "TotalPopSize"] > (4 * initSize))/totalpops
810 878
 
811 879
 ```
812 880
 ```{r hidden-rng-exbau, echo = FALSE}
... ...
@@ -814,6 +882,36 @@ set.seed(NULL)
814 882
 ```
815 883
 
816 884
 
885
+Alternatively, to examine how long it takes to reach cancer for a
886
+pre-specified size, you could look at the value of `FinalTime` as we
887
+did above (section \@ref(exmutantimut)) after running simulations
888
+with `onlyCancer = TRUE` and `detectionSize` set to some reasonable value:
889
+
890
+
891
+```{r hiddenbau22, echo=FALSE}
892
+set.seed(2)
893
+RNGkind("L'Ecuyer-CMRG")
894
+```
895
+
896
+```{r hhhhbbbb22}
897
+
898
+totalpops <- 5
899
+initSize <- 100
900
+sb2 <- oncoSimulPop(totalpops, fbauer, model = "Exp",
901
+                    initSize = initSize,
902
+                    onlyCancer = TRUE,
903
+					detectionSize = 10 * initSize,
904
+					mc.cores = 2, ## adapt to your hardware
905
+                    seed = NULL) ## for reproducibility
906
+				
907
+## How long did it take to reach cancer?
908
+unlist(lapply(sb2, function(x) x$FinalTime))
909
+```
910
+
911
+```{r hidden-rng-exbau22, echo = FALSE}
912
+set.seed(NULL)
913
+```
914
+
817 915
 #### Consequences of order effects for cancer initiation {#exorder1intro}
818 916
 
819 917
 Instead of focusing on different models for epistatic interactions,
... ...
@@ -829,12 +927,13 @@ order effects):
829 927
 ```{r oex1intro}
830 928
 ## Order effects involving three genes.
831 929
 
832
-## Genotype D, M has different fitness effects
930
+## Genotype "D, M" has different fitness effects
833 931
 ## depending on whether M or D mutated first.
932
+## Ditto for genotype "F, D, M".
933
+
934
+## Meaning of specification: X > Y means
935
+## that X is mutated before Y.
834 936
 
835
-## Genotype F, D, M has different fitness
836
-## depending on which of the order in which the genes
837
-## mutated.
838 937
 
839 938
 o3 <- allFitnessEffects(orderEffects = c(
840 939
                             "F > D > M" = -0.3,
... ...
@@ -863,6 +962,10 @@ RNGkind("L'Ecuyer-CMRG")
863 962
 ```
864 963
 
865 964
 ```{r exusageoe2}
965
+## For reproducibility
966
+set.seed(2)
967
+RNGkind("L'Ecuyer-CMRG")
968
+
866 969
 totalpops <- 5
867 970
 soe1 <- oncoSimulPop(totalpops, o3, model = "Exp",
868 971
                     initSize = 500,
... ...
@@ -880,6 +983,10 @@ set.seed(NULL)
880 983
 ```
881 984
 
882 985
 
986
+As we just said, alternatively, to examine how long it takes to
987
+reach cancer you could run simulations with `onlyCancer = TRUE` and
988
+look at the value of `FinalTime` as we did above (section
989
+\@ref(exmutantimut)).
883 990
 
884 991
 
885 992
 
... ...
@@ -893,27 +1000,29 @@ and \@ref(whatfor) and running time and space consumption of OncoSimulR are
893 1000
 addressed in section \@ref(timings). You should be aware that **coalescent
894 1001
 simulations**, sometimes also called backward-time simulations, are much
895 1002
 more efficient for simulating neutral data as well as some special selection
896
-scenarios [@Yuan2012,@Carvajal-Rodriguez2010,@Hoban2011].  
897
-
898
-In addition, since OncoSimulR allows you to specify fitness with arbitrary
899
-epistatic and order effects, as well as mutator effects, you need to learn
900
-the syntax of how to specify those effects and you might be paying a
901
-performance penalty if your scenario does not require this complexity. For
902
-instance, in the models in @Beerenwinkel2007b or @Bozic2010, the fitness of
903
-a genotype depends only on the total number of drivers mutated, but not on
904
-which drivers are mutated (and, thus, not on the epistatic interactions nor
905
-the order of accumulation of the drivers). This means that the syntax for
906
-specifying those fitness models could probably be a lot simpler (e.g.,
907
-specify $s$ per driver). 
908
-
909
-But it also means that code written for just that case could probably run
910
-much faster. First, because fitness evaluation is easier. Second, and
911
-possibly much more important, because what we need to keep track of leads to
912
-much simpler and economic structures: we do not need to keep track of clones
913
-(where two cells are regarded as different clones if they differ anywhere in
914
-their genotype), but only of clone types as defined by the number of mutated
915
-drivers, and keeping track of clones can be expensive ---see sections
916
-\@ref(timings) and \@ref(trackindivs). 
1003
+scenarios [@Yuan2012; @Carvajal-Rodriguez2010; @Hoban2011].  
1004
+
1005
+In addition, since OncoSimulR allows you to specify fitness with
1006
+arbitrary epistatic and order effects, as well as mutator effects,
1007
+you need to learn the syntax of how to specify those effects and you
1008
+might be paying a performance penalty if your scenario does not
1009
+require this complexity. For instance, in the model of
1010
+@Beerenwinkel2007b, the fitness of a genotype depends only on the
1011
+total number of drivers mutated, but not on which drivers are
1012
+mutated (and, thus, not on the epistatic interactions nor the order
1013
+of accumulation of the drivers). This means that the syntax for
1014
+specifying that model could probably be a lot simpler
1015
+(e.g., specify $s$ per driver).
1016
+
1017
+But it also means that code written for just that case could
1018
+probably run much faster. First, because fitness evaluation is
1019
+easier. Second, and possibly much more important, because what we
1020
+need to keep track of leads to much simpler and economic structures:
1021
+we do not need to keep track of clones (where two cells are regarded
1022
+as different clones if they differ anywhere in their genotype), but
1023
+only of clone types or clone classes as defined by the number of
1024
+mutated drivers, and keeping track of clones can be expensive ---see
1025
+sections \@ref(timings) and \@ref(trackindivs).
917 1026
 
918 1027
 So for those cases where you do not need the full flexibility of OncoSimulR,
919 1028
 special purpose software might be easier to use and faster to run. Of
... ...
@@ -923,16 +1032,17 @@ be available, though.
923 1032
 
924 1033
 
925 1034
 
926
-## Steps in using OncoSimulR {#steps}
1035
+## Steps for using OncoSimulR {#steps}
927 1036
 
928 1037
 
929 1038
 Using this package will often involve the following steps:
930 1039
 
931 1040
 
932
-1. Specify the fitness effects: sections \@ref(specfit) and \@ref(litex).
1041
+1. Specify fitness effects: sections \@ref(specfit) and \@ref(litex).
933 1042
 
934
-2. Simulate cancer progression: section \@ref(simul). You can simulate
935
-  for a single subject or for a set of subjects. You will need to:
1043
+2. Simulate cancer progression: section \@ref(simul). You can
1044
+  simulate for a single individual or subject or for a set of
1045
+  subjects. You will need to:
936 1046
   
937 1047
     - Decide on a model. This basically amounts to choosing a model with
938 1048
     exponential growth ("Exp" or "Bozic") or a model with carrying capacity
... ...
@@ -1012,7 +1122,7 @@ sv2 <- allFitnessEffects(epistasis = c("-A : B" = sb,
1012 1122
 evalAllGenotypes(sv2, addwt = TRUE)
1013 1123
 
1014 1124
 ## 2. Simulate the data. Here we use the "McFL" model and set
1015
-##    explicitly parameters for mutation rate, initial size and size
1125
+##    explicitly parameters for mutation rate, initial size, size
1016 1126
 ##    of the population that will end the simulations, etc
1017 1127
 
1018 1128
 RNGkind("Mersenne-Twister")
... ...
@@ -1071,7 +1181,7 @@ ep1 <- oncoSimulIndiv(sv2, model = "McFL",
1071 1181
 <!-- %% @  -->
1072 1182
 
1073 1183
 
1074
-```{r iep1x1,fig.width=6.5, fig.height=4.5, fig.cap="Plot of drivers of an epistasis simulation"}
1184
+```{r iep1x1,fig.width=6.5, fig.height=4.5, fig.cap="Plot of drivers of an epistasis simulation."}
1075 1185
 ## 3. We will not analyze those data any further. We will only plot
1076 1186
 ## them.  For the sake of a small plot, we thin the data.
1077 1187
 plot(ep1, show = "drivers", xlim = c(0, 1500),
... ...
@@ -1093,7 +1203,7 @@ DAG** with the
1093 1203
 pancreatic cancer poset in @Gerstung2011 (see more
1094 1204
 details in section \@ref(pancreas)):
1095 1205
 
1096
-```{r, fig.width=5}
1206
+```{r fepancr1, fig.width=5}
1097 1207
 ## 1. Fitness effects: 
1098 1208
 pancr <- allFitnessEffects(
1099 1209
     data.frame(parent = c("Root", rep("KRAS", 4), 
... ...
@@ -1107,10 +1217,14 @@ pancr <- allFitnessEffects(
1107 1217
                typeDep = "MN"),
1108 1218
     drvNames = c("KRAS", "SMAD4", "CDNK2A", "TP53", 
1109 1219
 	             "MLL3", "TGFBR2", "PXDN"))
1220
+```
1110 1221
 
1222
+```{r figfpancr1, fig.width=5, fig.cap="Plot of DAG corresponding to fitnessEffects object."}
1111 1223
 ## Plot the DAG of the fitnessEffects object
1112 1224
 plot(pancr)
1113 1225
 ``` 
1226
+
1227
+
1114 1228
 ```{r}
1115 1229
 ## 2. Simulate from it. We change several possible options. 
1116 1230
 
... ...
@@ -1172,7 +1286,7 @@ paper:
1172 1286
 A PDF version of this vignette is available
1173 1287
 from <https://rdiaz02.github.io/OncoSimul/pdfs/OncoSimulR.pdf>.  And an HTML
1174 1288
 version from <https://rdiaz02.github.io/OncoSimul/OncoSimulR.html>. These
1175
-files should correspond to the most recent, github version, of the package
1289
+files should correspond to the most recent, GitHub version, of the package
1176 1290
 (i.e., they might include changes not yet available from the BioConductor
1177 1291
 package).
1178 1292
 
... ...
@@ -1201,18 +1315,17 @@ of "experiment before launching a large number of simulations"
1201 1315
 applies, but here we will walk through several cases to get a
1202 1316
 feeling for the major factors that affect speed and size. Many of
1203 1317
 the comments on this section need to use ideas discussed in other
1204
-places of this document. If you read this section first, you might
1318
+places of this document; if you read this section first, you might
1205 1319
 want to come back after reading the relevant parts.
1206 1320
 
1207 1321
 
1208 1322
 Speed will depend on:
1209 1323
 
1210 1324
 * Your hardware, of course.
1211
-* The evolutionary model: using the "McFL" model is often slower
1212
-  (this model includes density dependence and is more complex).
1325
+* The evolutionary model.
1213 1326
 * The granularity of how often you keep data (`keepEvery`
1214 1327
   argument). Note that the default, which is to keep as often as you
1215
-  sample (so that we preserve all history) can lead to very slow execution
1328
+  sample (so that we preserve all history) can lead to slow execution
1216 1329
   times.
1217 1330
 * The mutation rate, because higher mutation rates lead to more
1218 1331
   clones, and more clones means we need to iterate over, well, more clones,
... ...
@@ -1224,6 +1337,7 @@ Speed will depend on:
1224 1337
 * The stopping conditions (`detectionProb`, `detectionDrivers`,
1225 1338
   `detectionSize` arguments) and whether or not simulations are run until
1226 1339
   cancer is reached (`onlyCancer` argument).
1340
+* Most of the above factors can interact in complex ways.
1227 1341
 
1228 1342
 
1229 1343
 Size of returned objects will depend on:
... ...
@@ -1430,7 +1544,7 @@ t_mc2 <- system.time(
1430 1544
 
1431 1545
 ``` 
1432 1546
 
1433
-And we can obtain times, sizes of objects, and summaries of numbers
1547
+We can obtain times, sizes of objects, and summaries of numbers
1434 1548
 of clones, iterations, and final times doing, for instance:
1435 1549
 
1436 1550
 
... ...
@@ -1458,7 +1572,9 @@ panderOptions("table.split.cells", 900)  ## For HTML
1458 1572
 ## panderOptions("table.split.cells", 8) ## For PDF
1459 1573
 
1460 1574
 set.alignment('right')
1461
-panderOptions('round', 3)
1575
+panderOptions('round', 2)
1576
+panderOptions('big.mark', ',')
1577
+panderOptions('digits', 2)
1462 1578
 				          
1463 1579
 pander(benchmark_1[1:4, c("Elapsed Time, average per simulation (s)", 
1464 1580
  	              "Object Size, average per simulation (MB)",
... ...
@@ -1468,6 +1584,7 @@ pander(benchmark_1[1:4, c("Elapsed Time, average per simulation (s)",
1468 1584
  				  "Total Population Size, median",
1469 1585
  				  "Total Population Size, max.",
1470 1586
  				  "keepEvery")],
1587
+				  justify = c('left', rep('right', 8)), ##  o.w. hlines not right
1471 1588
 				  ## caption = "\\label{tab:bench1}Benchmarks of Exp and McFL  models using the default `detectionProb` with two settings of `keepEvery`."
1472 1589
 				  )
1473 1590
 ```
... ...
@@ -1494,16 +1611,17 @@ a sharp decrease in number of iterations and, thus, running time.
1494 1611
 
1495 1612
 
1496 1613
 
1497
-This table also shows that the `keepEvery = NA` setting, which was in effect
1498
-in simulations `exp2` and `mc2`, can make a difference especially for the
1499
-McFL models, as seen by the median number of clones and the size of the
1500
-returned object. Models `exp2` and `mc2` do not store any intermediate
1501
-population samples so clones that become extinct at any sampling period are
1502
-pruned and only the existing clones at the end of the simulation are
1503
-returned. In contrast, models `exp1` and `mc1` store population samples at
1504
-time intervals of 1 (`keepEvery = 1`), even if many of those clones
1505
-eventually become extinct. We will return to this issue below as execution
1506
-time and object size (which affects RAM usage) depend strongly on number of
1614
+This table also shows that the `keepEvery = NA` setting, which was
1615
+in effect in simulations `exp2` and `mc2`, can make a difference
1616
+especially for the McFL models, as seen by the median number of
1617
+clones and the size of the returned object. Models `exp2` and `mc2`
1618
+do not store any intermediate population samples so clones that
1619
+become extinct at any sampling period are pruned and only the
1620
+existing clones at the end of the simulation are returned. In
1621
+contrast, models `exp1` and `mc1` store population samples at time
1622
+intervals of 1 (`keepEvery = 1`), even if many of those clones
1623
+eventually become extinct. We will return to this issue below as
1624
+execution time and object size depend strongly on the number of
1507 1625
 clones tracked.
1508 1626
 
1509 1627
 
... ...
@@ -1517,9 +1635,12 @@ population size 1,000 times larger than the initial one (`PDBaseline = 5e4`,
1517 1635
 take place unless populations are at least 1,000 times larger than the
1518 1636
 initial population size, and probability of detection is 0.1 with a
1519 1637
 population size 100,000 times larger than the initial one (`PDBaseline =
1520
-5e5`, `n2 = 5e7`). In runs `exp3` and `exp5` we set `keepEvery = 1` and in
1638
+5e5`, `n2 = 5e7`)[^rva]. In runs `exp3` and `exp5` we set `keepEvery = 1` and in
1521 1639
 runs `exp4` and `exp6` we set `keepEvery = NA`.
1522 1640
 
1641
+[^rva]:Again, these are not necessarily reasonable or common
1642
+settings. We are using them to understand what and how affects
1643
+running time and space consumption.
1523 1644
 
1524 1645
 
1525 1646
 ```{r timing2, eval = FALSE}
... ...
@@ -1580,6 +1701,7 @@ Table: (\#tab:bench1b) Benchmarks of Exp models modifying the default `detection
1580 1701
 panderOptions("table.split.table", 99999999)
1581 1702
 panderOptions("table.split.cells", 900)  ## For HTML
1582 1703
 ## panderOptions("table.split.cells", 8) ## For PDF
1704
+
1583 1705
 set.alignment('right')
1584 1706
 panderOptions('round', 2)
1585 1707
 panderOptions('big.mark', ',')
... ...
@@ -1595,6 +1717,7 @@ pander(benchmark_1[5:8, c("Elapsed Time, average per simulation (s)",
1595 1717
  				  "keepEvery",
1596 1718
 				  "PDBaseline",
1597 1719
 				  "n2")], 
1720
+				  justify = c('left', rep('right', 10)), ##  o.w. hlines not right
1598 1721
 ## 				  round = c(rep(2, 3), rep(0, 7)),
1599 1722
 ## 				  digits = c(rep(2, 3), rep(1, 7)),
1600 1723
 	  ## caption = "\\label{tab:bench1b}Benchmarks of Exp and McFL models modifying the default `detectionProb` with two settings of `keepEvery`."
... ...
@@ -1611,7 +1734,8 @@ smaller object sizes and slightly smaller numbers of clones and
1611 1734
 execution times. Changing the exiting conditions (by changing
1612 1735
 `detectionProb` arguments) leads to large increases in number of
1613 1736
 iterations (in this case by factors of about 15x to 25x) and a
1614
-corresponding increase in execution time.
1737
+corresponding increase in execution time as well as much larger
1738
+population sizes (in some cases $>10^{10}$).
1615 1739
 
1616 1740
 
1617 1741
 In some of the runs of `exp5` and `exp6` we get the (recoverable)
... ...
@@ -1631,16 +1755,16 @@ that simulations will be repeated until the exiting conditions are reached
1631 1755
 that ends up in extinction will be repeated. This setting can thus have a
1632 1756
 large effect on the exponential models, because when the initial
1633 1757
 population size is not very large and we start from the wildtype, it is
1634
-not infrequent for simulations to become extinct (when birth and death
1758
+not uncommon for simulations to become extinct (when birth and death
1635 1759
 rates are equal and the population size is small, it is easy to reach
1636 1760
 extinction before a mutation in a gene that increases fitness occurs). But
1637
-this is rarely the case in the McFarland model (unless we use really small
1761
+this is rarely the case in the McFarland model (unless we use really tiny
1638 1762
 initial population sizes) because of the dependency of death rate on total
1639 1763
 population size (see section \@ref(mcfl)).
1640 1764
 
1641 1765
 
1642 1766
 The number of attempts until cancer was reached in the above
1643
-models is shown in the Table \@ref(tab:bench1c) (the values can be obtained from
1767
+models is shown in  Table \@ref(tab:bench1c) (the values can be obtained from
1644 1768
 any of the above runs doing, for instance, `median(unlist(lapply(exp1,
1645 1769
 function(x) x$other$attemptsUsed)))` ):
1646 1770
 
... ...
@@ -1648,7 +1772,7 @@ Table: (\#tab:bench1c) Number of attempts until cancer.
1648 1772
 ```{r bench1c, eval=TRUE, echo = FALSE}
1649 1773
 panderOptions("table.split.table", 99999999)
1650 1774
 panderOptions("table.split.cells", 900)  ## For HTML
1651
-## panderOptions("table.split.cells", 8) ## For PDF
1775
+## panderOptions("table.split.cells", 12) ## For PDF
1652 1776
 set.alignment('right')
1653 1777
 panderOptions('round', 2)
1654 1778
 panderOptions('big.mark', ',')
... ...
@@ -1660,9 +1784,10 @@ pander(benchmark_1[1:8, c(
1660 1784
 "Attempts until Cancer, max.", 
1661 1785
 				  "PDBaseline",
1662 1786
 				  "n2")], 
1787
+				  justify = c('left', rep('right', 5)), ##  o.w. hlines not right
1663 1788
 ## 				  round = c(rep(2, 3), rep(0, 7)),
1664 1789
 ## 				  digits = c(rep(2, 3), rep(1, 7)),
1665
-	  ## caption = "\\label{tab:bench1c}Median number of attempts until cancer."
1790
+	  ## caption = "\\label{tab:bench1c}Number of attempts until cancer."
1666 1791
     )
1667 1792
 ## ## data(benchmark_1)
1668 1793
 ## knitr::kable(benchmark_1[1:8, c("Attempts.Median",
... ...
@@ -1680,7 +1805,7 @@ The McFL models finish in a single attempt. The exponential model
1680 1805
 simulations where we can exit with small population sizes (`exp1`, `exp2`)
1681 1806
 need many fewer attempts to reach cancer than those where large population
1682 1807
 sizes are required (`exp3` to `exp6`). There is no relevant different
1683
-among those four, which is what we would expect: a population that has
1808
+among those last four, which is what we would expect: a population that has
1684 1809
 already reached a size of 50,000 cells from an initial population size of
1685 1810
 500 is obviously a growing population where there is at least one mutant
1686 1811
 with positive fitness; thus, it unlikely to go extinct and therefore
... ...
@@ -1690,7 +1815,7 @@ risk of extinction.
1690 1815
 
1691 1816
 We will now rerun all of the above models with argument `onlyCancer =
1692 1817
 FALSE`.  The results are shown in Table \@ref(tab:timing3) (note that the
1693
-differences between this table and table \@ref(tab:bench1) for the McFL
1818
+differences between this table and Table \@ref(tab:bench1) for the McFL
1694 1819
 models are due only to sampling variation).
1695 1820
 
1696 1821
 \bslandscape
... ...
@@ -1717,6 +1842,7 @@ pander(benchmark_1[9:16,
1717 1842
  				  "keepEvery",
1718 1843
 				  "PDBaseline",
1719 1844
 				  "n2")],
1845
+				  justify = c('left', rep('right', 11)), ##  o.w. hlines not right
1720 1846
 ## caption = "\\label{tab:timing3} Benchmarks of models in Table \\@ref(tab:bench1) and \\@ref(tab:bench1b) when run with `onlyCancer = FALSE`."
1721 1847
 				  )	
1722 1848
 	
... ...
@@ -1745,7 +1871,7 @@ To make it easier to compare results with those of the next section, Table
1745 1871
 
1746 1872
 \bslandscape
1747 1873
 
1748
-Table: (\#tab:allr1bck) Benchmarks of all models in Tables \@ref(tab:bench1), \@ref(tab:bench1b) and \@ref(tab:timing3).  
1874
+Table: (\#tab:allr1bck) Benchmarks of all models in Tables \@ref(tab:bench1), \@ref(tab:bench1b), and \@ref(tab:timing3).  
1749 1875
 ```{r bench1dx0, eval=TRUE, echo = FALSE}
1750 1876
 panderOptions("table.split.table", 99999999)
1751 1877
 ## panderOptions("table.split.cells", 900)  ## For HTML
... ...
@@ -1761,7 +1887,8 @@ pander(benchmark_1[ , c("Elapsed Time, average per simulation (s)",
1761 1887
 				  "Final Time, median", "Total Population Size, median", 
1762 1888
 				  "Total Population Size, mean", "Total Population Size, max.",
1763 1889
  	              "keepEvery", "PDBaseline", "n2", "onlyCancer")], 
1764
-				  ## caption = "\\label{tab:allr1bck}Benchmarks of all models in Tables \\@ref(tab:bench1), \\@ref(tab:bench1b)  and \\@ref(tab:timing3)."  
1890
+				  justify = c('left', rep('right', 12)), ##  o.w. hlines not right
1891
+				  ## caption = "\\label{tab:allr1bck}Benchmarks of all models in Tables \\@ref(tab:bench1), \\@ref(tab:bench1b),  and \\@ref(tab:timing3)."  
1765 1892
 				  )  
1766 1893
 ```
1767 1894
 
... ...
@@ -1773,11 +1900,12 @@ pander(benchmark_1[ , c("Elapsed Time, average per simulation (s)",
1773 1900
 
1774 1901
 ### Changing fitness: $s=0.1$ and $s=0.05$ {#bench1xf}
1775 1902
 
1776
-In the above fitness specification the fitness effect of each gene (when
1777
-its restrictions are satisfied) is $s = 0.1$ (see section \@ref(numfit)
1778
-for details). Here we rerun all the above benchmarks using $s= 0.05$ and
1779
-results are shown below in Table \@ref(tab:timing3xf); the results from these
1780
-benchmarks are available as `data(benchmark_1_0.05)`:
1903
+In the above fitness specification the fitness effect of each gene
1904
+(when its restrictions are satisfied) is $s = 0.1$ (see section
1905
+\@ref(numfit) for details). Here we rerun all the above benchmarks
1906
+using $s= 0.05$ (the results from these benchmarks are available as
1907
+`data(benchmark_1_0.05)`) and results are shown below in Table
1908
+\@ref(tab:timing3xf).
1781 1909
 
1782 1910
 \bslandscape
1783 1911
 
... ...
@@ -1821,6 +1949,7 @@ pander(benchmark_1_0.05[ , c("Elapsed Time, average per simulation (s)",
1821 1949
 				  "Total Population Size, median", 
1822 1950
 				  "Total Population Size, mean", "Total Population Size, max.",
1823 1951
  	              "keepEvery", "PDBaseline", "n2", "onlyCancer")], 
1952
+				  justify = c('left', rep('right', 12)), ##  o.w. hlines not right
1824 1953
  	              ## caption = "\\label{tab:timing3xf}Benchmarks of all models in Table \\@ref(tab:allr1bck) using $s=0.05$ (instead of $s=0.1$)."  
1825 1954
 )  
1826 1955
 				  
... ...
@@ -1877,10 +2006,10 @@ iterations and, thus, running time.
1877 2006
 
1878 2007
 
1879 2008
 
1880
-The moral here is that in complex simulations like this, the effects
1881
-of some parameters ($s$ in this case) might look counter-intuitive
1882
-at first. Thus the need to "experiment before launching a large
1883
-number of simulations". 
2009
+The moral here is that in complex simulations like this (and most
2010
+simulations are complex), the effects of some parameters ($s$ in this
2011
+case) might look counter-intuitive at first. Thus the need to "experiment
2012
+before launching a large number of simulations".
1884 2013
 
1885 2014
 
1886 2015
 ## Several "common use cases" runs {#benchusual}
... ...
@@ -1923,12 +2052,12 @@ rf12 <- allFitnessEffects(genotFitness = rfl12)
1923 2052
 
1924 2053
 
1925 2054
 ## Independent genes; positive fitness from exponential distribution
1926
-## mean around 0.1, and negative from exponential with mean around 
1927
-## 0.02. Half of genes positive fitness effects, half negative.
2055
+## with mean around 0.1, and negative from exponential with mean 
2056
+## around -0.02. Half of genes positive fitness effects, half 
2057
+## negative.
1928 2058
 
1929
-ng <- 200
1930
-re_200 <- allFitnessEffects(noIntGenes = c(rexp(ng/2, 10), 
1931
-                                           -rexp(ng/2, 50)))
2059
+ng <- 200 re_200 <- allFitnessEffects(noIntGenes = c(rexp(ng/2, 10),
2060
+-rexp(ng/2, 50)))
1932 2061
 
1933 2062
 ng <- 500
1934 2063
 re_500 <- allFitnessEffects(noIntGenes = c(rexp(ng/2, 10), 
... ...
@@ -2001,7 +2130,7 @@ oncoSimulPop(Nindiv,
2001 2130
 ```
2002 2131
 
2003 2132
 
2004
-For the exponential model we will stop simulations when the populations
2133
+For the exponential model we will stop simulations when populations
2005 2134
 have $>10^6$ cells (simulations start from 500 cells). For the McFarland
2006 2135
 model we will use the `detectionProb` mechanism (see section
2007 2136
 \@ref(detectprob) for details); we could have used as stopping mechanism
... ...
@@ -2009,45 +2138,46 @@ model we will use the `detectionProb` mechanism (see section
2009 2138
 reaching cancer, as argued in [@McFarland2013]) but we want to provide
2010 2139
 further examples under the `detectionProb` mechanism. We will start from 1000
2011 2140
 cells, not 500 (starting from 1000 we almost always reach cancer in a
2012
-single execution).
2141
+single run).
2013 2142
 
2014 2143
 
2015 2144
 Why not use the `detectionProb` mechanism with the `Exp` models?  Because
2016 2145
 it can be hard to intuitively understand what are reasonable settings for
2017 2146
 the parameters of the `detectionProb` mechanism when used in a population
2018 2147
 that is growing exponentially, especially if different genes have very
2019
-different effects on fitness and we are using fitness specifications that
2020
-are so different (compare the fitness landscape of six genes, the pancreas
2021
-specification, and the fitness specification with 4000 genes with fitness
2022
-effects drawn from an exponential distribution ---`re_4000`). In contrast,
2023
-the `detectionProb` mechanism might be simpler to reason about in a
2024
-population that is growing under a model of carrying capacity with
2025
-possibly large periods of stasis. Let us emphasize that it is not that the
2026
-`detectionProb` mechanism does not make sens with the Exp model; it is
2027
-simply that the parameters might need finer adjustment for them to make
2028
-sense, and in these benchmarks we are dealing with widely different
2029
-fitness specifications.
2148
+different effects on fitness. Moreover, we are using fitness
2149
+specifications that are very different (compare the fitness landscape of
2150
+six genes, the pancreas specification, and the fitness specification with
2151
+4000 genes with fitness effects drawn from an exponential distribution
2152
+---`re_4000`). In contrast, the `detectionProb` mechanism might be simpler
2153
+to reason about in a population that is growing under a model of carrying
2154
+capacity with possibly large periods of stasis. Let us emphasize that it
2155
+is not that the `detectionProb` mechanism does not make sense with the Exp
2156
+model; it is simply that the parameters might need finer adjustment for
2157
+them to make sense, and in these benchmarks we are dealing with widely
2158
+different fitness specifications.
2030 2159
 
2031 2160
 
2032 2161
 Note also that we specify `checkSizePEvery = 4` (instead of the default,
2033 2162
 which is 20). Why? Because the fitness specifications where fitness
2034 2163
 effects are drawn from exponential distributions (`re_200` to `re_4000`
2035 2164
 above) include many genes (well, up to 4000) some of them with possibly
2036
-very large effects. In these conditions, simulations will run very fast in
2165
+very large effects. In these conditions, simulations can run very fast in
2037 2166
 the sense of "units of time". If we check exiting conditions every 20
2038
-units the population could have increased size several orders of magnitude
2039
-in between checks (this is also discussed in sections \@ref(detectprob)
2040
-and \@ref(bench1xf)). You can verify this by running the script with other
2041
-settings for `checkSizePEvery` (and being aware that large settings might
2042
-require you to wait for a long time). To ensure that populations have
2043
-really grown, we have increased the setting of `PDBaseline` so that no
2044
-simulation can be considered for stopping unless its size is 1.4 times
2045
-larger than `initSize`.
2167
+units the population could have increased its size several orders of
2168
+magnitude in between checks (this is also discussed in sections
2169
+\@ref(bench1xf) and \@ref(detectprob)). You can verify this by running the
2170
+script with other settings for `checkSizePEvery` (and being aware that
2171
+large settings might require you to wait for a long time). To ensure that
2172
+populations have really grown, we have increased the setting of
2173
+`PDBaseline` so that no simulation can be considered for stopping unless
2174
+its size is 1.4 times larger than `initSize`.
2046 2175
 
2047
-In all cases we use `keepEvery = 1` and `keepPhylog = TRUE`. Finally, we
2048
-run all models with `errorHitWallTime = FALSE` and `errorHitMaxTries =
2049
-FALSE` so that we can see results even if stopping conditions are not
2050
-met.
2176
+In all cases we use `keepEvery = 1` and `keepPhylog = TRUE` (so we store
2177
+the population sizes of all clones every 1 time unit and we keep the
2178
+complete genealogy of clones). Finally, we run all models with
2179
+`errorHitWallTime = FALSE` and `errorHitMaxTries = FALSE` so that we can
2180
+see results even if stopping conditions are not met.
2051 2181
 
2052 2182
 
2053 2183
 
... ...
@@ -2120,7 +2250,7 @@ The McFL model with random fitness landscape `rf12` and with `pancr` does
2120 2250
 not satisfy the conditions of `detectionProb` in most cases: its median
2121 2251
 final time is 5000, which was the maximum final time specified. This
2122 2252
 suggests that the fitness landscape is such that it is unlikely that we
2123
-will reach population sizes $> 1400$ (remember we increase the setting for
2253
+will reach population sizes $> 1400$ (remember we the setting for
2124 2254
 `PDBaseline`) before 5000 time units. There is nothing particular about
2125 2255
 using a fitness landscape of 12 genes and other runs in other 12-gene
2126 2256
 random fitness landscapes do not show this pattern. However, complex
... ...
@@ -2161,14 +2291,15 @@ a `checkSizePEvery = 20`  probably would not have made sense.
2161 2291
 
2162 2292
 
2163 2293
 Finally, it is interesting that in the cases examined here, the two
2164
-slowest running simulations are from "Exp", with fitnesses `re_2000` and
2165
-`re_4000` (and the third slowest is also Exp, under `re_500`). These are
2166
-the cases with the largest number of clones. Why? In the "Exp" model there
2167
-is no competition, and fitness specifications `re_2000` and `re_4000` have
2168
-genomes with many genes with positive fitness contributions. It is thus
2169
-very easy to obtain, from the wildtype ancestor, a large number of clones
2170
-all of which have birth rates $>1$ and, thus, clones that are unlikely to
2171
-become extinct.
2294
+slowest running simulations are from "Exp", with fitnesses `re_2000`
2295
+and `re_4000` (and the third slowest is also Exp, under
2296
+`re_500`). These are also the cases with the largest number of
2297
+clones. Why? In the "Exp" model there is no competition, and fitness
2298
+specifications `re_2000` and `re_4000` have genomes with many genes
2299
+with positive fitness contributions. It is thus very easy to obtain,
2300
+from the wildtype ancestor, a large number of clones all of which
2301
+have birth rates $>1$ and, thus, clones that are unlikely to become
2302
+extinct.
2172 2303
 
2173 2304
 
2174 2305
 
... ...
@@ -2177,8 +2308,8 @@ become extinct.
2177 2308
 
2178 2309
 We will now rerun the simulations above changing the following:
2179 2310
 
2180
-- `finalTime` is  set to 25000.
2181
-- `onlyCancer` is  set to TRUE.
2311
+- `finalTime`  set to 25000.
2312
+- `onlyCancer`  set to TRUE.
2182 2313
 - The "Exp" models will stop when population size $> 10^5$.
2183 2314
 
2184 2315
 
... ...
@@ -2245,7 +2376,7 @@ iterations. Interestingly, notice that the median final time is close to
2245 2376
 10000, so the runs in \@ref(common1) with maximum final time of 5000 would
2246 2377
 have had a hard time finishing with `onlyCancer = TRUE`.
2247 2378
 
2248
-Forcing simulations to "reach cancer", and just random differences between
2379
+Forcing simulations to "reach cancer" and just random differences between
2249 2380
 the random fitness landscape also affect the McFL run under `rf12`: final
2250 2381
 time is below 5000 and the median number of iterations is about half of
2251 2382
 what was above.
... ...
@@ -2259,17 +2390,17 @@ is much smaller.
2259 2390
 ## Can we use a large number of genes? {#lnum}
2260 2391
 
2261 2392
 Yes. In fact, in OncoSimulR there is no pre-set limit on genome
2262
-size. However, large numbers of genes can lead to unacceptable
2263
-memory usage and/or running time. We discuss several examples next that
2264
-illustrate some of the major issues to consider. Another example
2393
+size. However, large numbers of genes can lead to unacceptably large
2394
+returned object sizes and/or running time. We discuss several examples
2395
+next that illustrate some of the major issues to consider. Another example
2265 2396
 with 50,000 genes is shown in section \@ref(mcf50070).
2266 2397
 
2267 2398
 We have seen in \@ref(bench1) and \@ref(common1) that for the Exp model,
2268 2399
 benchmark results using `detectionProb` require a lot of care and can be
2269
-misleading. Here, we will fix initial and final population
2270
-sizes. Moreover, to avoid the confounding factor of the `onlyCancer =
2271
-TRUE` argument, we will set it to FALSE, so we measure directly the time
2272
-of individual runs.
2400
+misleading. Here, we will fix initial population sizes (to 500) and all
2401
+final population sizes will be set to $\geq 10^6$. In addition, to avoid
2402
+the confounding factor of the `onlyCancer = TRUE` argument, we will set it
2403
+to FALSE, so we measure directly the time of individual runs.
2273 2404
 
2274 2405
 
2275 2406
 ### Exponential model with 10,000 and 50,000 genes {#exp50000}
... ...
@@ -2277,7 +2408,7 @@ of individual runs.
2277 2408
 #### Exponential, 10,000 genes, example 1  {#exp100001}
2278 2409
 
2279 2410
 We will start with 10000 genes and an exponential model, where we
2280
-stop when the population grows over $1e6$ individuals:
2411
+stop when the population grows over $10^6$ individuals:
2281 2412
 
2282 2413
 ```{r exp10000, echo = TRUE, eval = FALSE}
2283 2414
 ng <- 10000
... ...
@@ -2409,10 +2540,10 @@ print(object.size(e_50000), units = "MB")
2409 2540
 ```
2410 2541
 
2411 2542
 Of course, simulations now take longer and the size of the returned
2412
-object is over 7 GB (we are keeping over 7000 clones, even if when
2543
+object is over 7 GB (we are keeping more than 7,000 clones, even if when
2413 2544
 we prune all those that went extinct). 
2414 2545
 
2415
-### Exponential, example 4: 50,000 genes {#exp50000_2}
2546
+#### Exponential, 50,000 genes, example 2 {#exp50000_2}
2416 2547
 
2417 2548
 What if we had not pruned? 
2418 2549
 
... ...
@@ -2452,10 +2583,10 @@ print(object.size(e_50000np), units = "MB")
2452 2583
 
2453 2584
 ```
2454 2585
 The main effect is not on execution time but on object size (it has
2455
-grown by 5 GB).
2586
+grown by 5 GB). We are tracking more than 10,000 clones.
2456 2587
 
2457 2588
 
2458
-#### Exponential,  50,000 genes, example 2 {#exp50000_3}
2589
+#### Exponential,  50,000 genes, example 3 {#exp50000_3}
2459 2590
 
2460 2591
 What about the `mutationPropGrowth` setting? We will rerun the example in
2461 2592
 \@ref(exp500001) leaving `keepEvery = NA` but with the default
... ...
@@ -2512,6 +2643,54 @@ below).
2512 2643
 
2513 2644
 
2514 2645
 
2646
+#### Interlude: where is that 1 GB coming from? {#wheresizefrom}
2647
+
2648
+In section \@ref(exp100001) we have seen an apparently innocuous
2649
+simulation producing a returned object of almost 1 GB. Where is that coming
2650
+from? It means that each simulation produced almost 200 MB of output.
2651
+
2652
+Let us look at one simulation in more detail:
2653
+
2654
+```{r sizedetail, eval = FALSE, echo = TRUE}
2655
+r1 <- oncoSimulIndiv(u,
2656
+                     model = "Exp",
2657
+                     mu = 1e-7,
2658
+                     detectionSize = 1e6,
2659
+                     detectionDrivers = NA,
2660
+                     detectionProb = NA,
2661
+                     keepPhylog = TRUE,
2662
+                     onlyCancer = FALSE,
2663
+                     mutationPropGrowth = TRUE
2664
+                     )
2665
+summary(r1)[c(1, 8)]
2666
+##   NumClones  FinalTime
2667
+## 1      3887        345
2668
+
2669
+print(object.size(r1), units = "MB")
2670
+## 160 Mb
2671
+
2672
+## Size of the two largest objects inside:
2673
+sizes <- lapply(r1, function(x) object.size(x)/(1024^2))
2674
+sort(unlist(sizes), decreasing = TRUE)[1:2]
2675
+## Genotypes pops.by.time 
2676
+##       148.28        10.26 
2677
+
2678
+dim(r1$Genotypes)
2679
+## [1] 10000  3887
2680
+```
2681
+
2682
+The above shows the reason: the `Genotypes` matrix is a 10,000 by 3,887
2683
+integer matrix (with a 0 and 1 indicating not-mutated/mutated for each
2684
+gene in each genotype) and in R integers use 4 bytes each. The
2685
+`pops.by.time` matrix is 346 by 3,888 (the 1 in $346 = 345 + 1$ comes from
2686
+starting at 0 and going up to the final time, both included; the 1 in
2687
+$3888 = 3887 + 1$ is from the column of time) double matrix and doubles
2688
+use 8 bytes[^popsbytime].
2689
+
2690
+[^popsbytime]:These matrices do not exist during most of the
2691
+execution of the C++ code; they are generated right before returning
2692
+from the C++ code.
2693
+
2515 2694
 
2516 2695
 ### McFarland model with 50,000 genes; the effect of `keepEvery` {#mc50000}
2517 2696
 
... ...
@@ -2524,7 +2703,8 @@ other settings.
2524 2703
 
2525 2704
 #### McFarland, 50,000 genes, example 1 {#mc50000ex1}
2526 2705
 
2527
-Let's start with  `mutationPropGrowth = FALSE` and `keepEvery = NA`:
2706
+Let's start with  `mutationPropGrowth = FALSE` and `keepEvery =
2707
+NA`. Simulations end when population size $\geq 10^6$.
2528 2708
 
2529 2709
 ```{r mc50000_1, echo = TRUE, eval = FALSE}
2530 2710
 ng <- 50000
... ...
@@ -2602,17 +2782,18 @@ print(object.size(mc_50000_nmpg_k), units = "MB")
2602 2782
 ## 8101.4 Mb
2603 2783
 ```
2604 2784
 
2605
-Computing time increases slightly but the major effect is seen on
2606
-the size of the returned object, that increases by a factor of about
2607
-4x, up to 8 GB, corresponding to the increase in about 4x in the
2608
-number of clones being tracked (see details of where size comes from in
2785
+Computing time increases slightly but the major effect is seen on the size
2786
+of the returned object, that increases by a factor of about 4x, up to 8
2787
+GB, corresponding to the increase in about 4x in the number of clones
2788
+being tracked (see details of where the size of this object comes from in
2609 2789
 section \@ref(wheresizefrom)).
2610 2790
 
2611 2791
 
2612 2792
 #### McFarland, 50,000 genes, example 3 {#mc50000ex3}
2613 2793
 
2614 2794
 We will set `keepEvery = NA` again, but we will now increase
2615
-detection size by a factor of 3:
2795
+detection size by a factor of 3 (so we stop when total population
2796
+size becomes $\geq 3 * 10^6$).
2616 2797
 
2617 2798
 ```{r mc50000_popx, echo = TRUE, eval = FALSE}
2618 2799
 ng <- 50000
... ...
@@ -2746,15 +2927,15 @@ print(object.size(mc_50000_nmpg_5mu_k), units = "MB")
2746 2927
 
2747 2928
 We have already seen these effects before in section
2748 2929
 \@ref(mc50000ex2): using `keepEvery = 1` leads to a slight increase
2749
-in execution time. What is really affected is the size of the returned
2750
-object which increases by a factor of about 3x (and is now over
2751
-20GB). That 3x corresponds, of course, to the increase in the number
2752
-of clones being tracked. This, by the way, also allows us to
2753
-understand the comment above, where we said that in these two cases
2754