git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/OncoSimulR@125028 bc3139a8-67e5-0310-9ffc-ced21a209358

Ramon Diaz-Uriarte authored on 12/12/2016 14:47:32
Showing 3 changed files

... ...
@@ -1,7 +1,7 @@
1 1
 Package: OncoSimulR
2 2
 Type: Package
3 3
 Title: Forward Genetic Simulation of Cancer Progression with Epistasis 
4
-Version: 2.5.3
4
+Version: 2.5.4
5 5
 Date: 2016-12-12
6 6
 Authors@R: c(person("Ramon", "Diaz-Uriarte", role = c("aut", "cre"),
7 7
 		     email = "rdiaz02@gmail.com"),
... ...
@@ -1,4 +1,7 @@
1
-Changes in version 2.5.2 (2016-12-12):
1
+Changes in version 2.5.4 (2016-12-12):
2
+	- Vignette: miscell changes (order of examples, typos, etc)
3
+
4
+Changes in version 2.5.3 (2016-12-12):
2 5
 	- Vignette uses pander in tables.
3 6
 	- Typos fixed and other enhancements in vignette.
4 7
 	
... ...
@@ -152,10 +152,11 @@ require(pander)
152 152
 
153 153
 # Introduction {#introdd}
154 154
  
155
-OncoSimulR is an individual-based forward-time genetic simulator for
156
-biallelic markers (wildtype vs. mutated) in asexually reproducing
157
-populations without spatial structure (perfect mixing). Its design
158
-emphasizes flexible specification of fitness and mutator effects.
155
+OncoSimulR is an individual- or clone-based forward-time genetic
156
+simulator for biallelic markers (wildtype vs. mutated) in asexually
157
+reproducing populations without spatial structure (perfect
158
+mixing). Its design emphasizes flexible specification of fitness and
159
+mutator effects.
159 160
  
160 161
  
161 162
  
... ...
@@ -512,24 +513,26 @@ set.seed(7)
512 513
 RNGkind("L'Ecuyer-CMRG")
513 514
 ```
514 515
 
515
-### Epistasis and predictability in rugged fitness landscapes in asexual populations {#ex-ochs}
516
+### Sign epistasis and probability of crossing fitness valleys {#ex-ochs}
516 517
 
517
-These questions encompass a wide range of specific issues that have been
518
-addressed in evolutionary genetics studies and which include from detailed
518
+These questions, together with the next one (\@ref(ex-predict)),
519
+encompass a wide range of specific issues that have been addressed
520
+in evolutionary genetics studies and which include from detailed
519 521
 analysis of simple models with a few uphill paths and valleys as in
520
-@Weissman2009 or @Ochs2015, to questions that refer to larger, more complex
521
-fitness landscapes as in @szendro_predictability_2013 or
522
-@franke_evolutionary_2011.
522
+@Weissman2009 or @Ochs2015, to questions that refer to larger, more
523
+complex fitness landscapes as in @szendro_predictability_2013 or
524
+@franke_evolutionary_2011 (see below).
523 525
 
524 526
 
525
-For the first case, we could specify the fitness landscape and run
526
-simulations until fixation (with argument `fixation` to `oncoSimulPop`
527
-model). We would then examine the proportion of genotypes fixed under
528
-different scenarios. For instance, we can use the example from @Ochs2015 (we
529
-will see this example also in section \@ref(ochsdesai), where we cover
530
-different ways of specifying fitness). And we can extend this example by
531
-adding mutator genes:
527
+Using as an example @Ochs2015, we could specify the fitness
528
+landscape and run simulations until fixation (with argument
529
+`fixation` to `oncoSimulPop` ---see more details in section
530
+\@ref(fixation) where we also use this same model). We would then
531
+examine the proportion of genotypes fixed under different
532
+scenarios. For instance, we can use the example from @Ochs2015 (we
533
+will see this example also in section \@ref(ochsdesai), where we
534
+cover different ways of specifying fitness). And we can extend this
535
+example by adding mutator genes:
532 536
 
533 537
 
534 538
 ```{r hiddenochs, echo=FALSE}
... ...
@@ -588,11 +591,13 @@ sampledGenotypes(samplePop(od_sim))
588 591
 set.seed(NULL)
589 592
 ```
590 593
 
594
+### Predictability of evolution in complex fitness landscapes {#ex-predict}
591 595
 
592
-For the second set of questions, we would run simulations under random
593
-fitness landscapes with varied ruggedness, and would then examine the
594
-evolutionary predictability of the trajectories with measures such as "Lines
595
-of Descent" and "Path of the Maximum" [@szendro_predictability_2013] and the
596
+Focusing now on predictability in more general fitness landscapes,
597
+we would run simulations under random fitness landscapes with varied
598
+ruggedness, and would then examine the evolutionary predictability
599
+of the trajectories with measures such as "Lines of Descent" and
600
+"Path of the Maximum" [@szendro_predictability_2013] and the
596 601
 diversity of the sampled genotypes under different sampling regimes.
597 602
 
598 603
 
... ...
@@ -651,7 +656,103 @@ set.seed(NULL)
651 656
 ```
652 657
 
653 658
 
654
-### Epistatic interactions between drivers and passengers in cancer initiation {#exbauer}
659
+
660
+### Mutator and antimutator genes {#ex-mut-antimut}
661
+
662
+The effects of mutator and antimutator genes have been examined both
663
+in cancer genetics [@nowak_evolutionary_2006,
664
+@tomlinson_mutation_1996] and in evolutionary genetics
665
+[@gerrish_complete_2007], and are related to wider issues such as
666
+Muller's ratchet and the evolution of sex. There are, thus, a large
667
+range of questions related to mutator and antimutator genes.
668
+
669
+
670
+One question addressed in @tomlinson_mutation_1996 concerns under what
671
+circumstances mutator genes are likely to play a role in cancer
672
+progression. For instance, @tomlinson_mutation_1996 find that an increased
673
+mutation rate is more likely to matter if the number of required mutations
674
+in driver genes needed to reach cancer is large and if the mutator effect is
675
+large.
676
+
677
+
678
+We might want to ask, then, how long it takes before we reach
679
+cancer. This is stored in the component `FinalTime` of the
680
+output. We would specify different numbers and effects of mutator
681
+genes (argument `muEF`). We would also change the criteria for
682
+reaching cancer and in our case we can easily do that by specifying
683
+different numbers in `detectionDrivers`. Of course, we would also
684
+want to examine the effects of varying numbers of mutators, drivers,
685
+and possibly fitness consequences of mutators (below we assume
686
+mutators are neutral and we assume there are no additional genes
687
+with deleterious mutations, but this need not be so, of course; see
688
+also [@tomlinson_mutation_1996, @gerrish_complete_2007, @McFarland2014]).
689
+
690
+
691
+Let us run an example. For the sake of simplicity, we assume no
692
+epistatic interactions.
693
+
694
+```{r ex-tomlin1}
695
+sd <- 0.1 ## fitness effect of drivers
696
+sm <- 0 ## fitness effect of mutator
697
+nd <- 20 ## number of drivers
698
+nm <- 5  ## number of mutators
699
+mut <- 10 ## mutator effect
700
+
701
+fitnessGenesVector <- c(rep(sd, nd), rep(sm, nm))
702
+names(fitnessGenesVector) <- 1:(nd + nm)
703
+mutatorGenesVector <- rep(mut, nm)
704
+names(mutatorGenesVector) <- (nd + 1):(nd + nm)
705
+
706
+ft <- allFitnessEffects(noIntGenes = fitnessGenesVector,
707
+                        drvNames = 1:nd)
708
+mt <- allMutatorEffects(noIntGenes = mutatorGenesVector)
709
+
710
+```
711
+
712
+
713
+Now, simulate using the fitness and mutator specification. We fix
714
+the number of drivers to cancer, and we stop when those numbers of
715
+drivers are reached. Since we only care about the time it takes to
716
+reach cancer, not the actual trajectories, we set `keepEvery = NA`:
717
+
718
+```{r hiddentom, echo=FALSE}
719
+set.seed(2)
720
+RNGkind("L'Ecuyer-CMRG")
721
+```
722
+
723
+```{r ex-tomlin2}
724
+ddr <- 4
725
+st <- oncoSimulPop(4, ft, muEF = mt,
726
+                   detectionDrivers = ddr,
727
+                   finalTime = NA,
728
+                   detectionSize = NA,
729
+                   detectionProb = NA,
730
+                   onlyCancer = TRUE,
731
+                   keepEvery = NA, 
732
+                   mc.cores = 2, ## adapt to your hardware
733
+                   seed = NULL) ## for reproducibility
734
+
735
+## How long did it take to reach cancer?
736
+unlist(lapply(st, function(x) x$FinalTime))
737
+
738
+```
739
+```{r hidden-rng-tom, echo = FALSE}
740
+set.seed(NULL)
741
+```
742
+
743
+
744
+
745
+(Incidentally, notice that it is easy to get OncoSimulR to throw an
746
+exception if you do not think twice about the mutator effects you are using
747
+and specify a huge mutation rate if all mutator genes are mutated: see
748
+\@(tomlin-except).)
749
+
750
+
751
+
752
+
753
+### Epistatic interactions between drivers and passengers in cancer and the consequences of order effects {#exbauer}
754
+
755
+#### Epistatic interactions between drivers and passengers
655 756
 
656 757
 @Bauer2014 have examined the effects of epistatic relationships
657 758
 between drivers and passengers in cancer initiation. We could use
... ...
@@ -711,7 +812,9 @@ sum(summary(sb1)[, "TotalPopSize"] > 0)/totalpops
711 812
 ```{r hidden-rng-exbau, echo = FALSE}
712 813
 set.seed(NULL)
713 814
 ```
714
-### Epistatic interactions between drivers and passengers in cancer initiation {#exorder1intro}
815
+
816
+
817
+#### Consequences of order effects for cancer initiation {#exorder1intro}
715 818
 
716 819
 Instead of focusing on different models for epistatic interactions,
717 820
 you might want to examine the consequences of order effects
... ...
@@ -779,97 +882,6 @@ set.seed(NULL)
779 882
 
780 883
 
781 884
 
782
-### Effects of mutator and antimutator genes {#ex-mut-antimut}
783
-
784
-The effects of mutator and antimutator genes have been examined both in
785
-cancer genetics [@nowak_evolutionary_2006, @tomlinson_mutation_1996] and in
786
-evolutionary genetics [@gerrish_complete_2007], and are related to wider
787
-issues such as Muller's ratchet and the evolution of sex. There are, thus, a
788
-large range of questions related to mutator and antimutator genes.
789
-
790
-
791
-One question addressed in @tomlinson_mutation_1996 concerns under what
792
-circumstances mutator genes are likely to play a role in cancer
793
-progression. For instance, @tomlinson_mutation_1996 find that an increased
794
-mutation rate is more likely to matter if the number of required mutations
795
-in driver genes needed to reach cancer is large and if the mutator effect is
796
-large.
797
-
798
-
799
-We might want to ask, then, how long it takes before we reach
800
-cancer. This is stored in the component `FinalTime` of the
801
-output. We would specify different numbers and effects of mutator
802
-genes (argument `muEF`). We would also change the criteria for
803
-reaching cancer and in our case we can easily do that by specifying
804
-different numbers in `detectionDrivers`. Of course, we would also
805
-want to examine the effects of varying numbers of mutators, drivers,
806
-and possibly fitness consequences of mutators (below we assume
807
-mutators are neutral and we assume there are no additional genes
808
-with deleterious mutations, but this need not be so, of course; see
809
-also [@tomlinson_mutation_1996, @gerrish_complete_2007, @McFarland2014]).
810
-
811
-
812
-Let us run an example. For the sake of simplicity, we assume no
813
-epistatic interactions.
814
-
815
-```{r ex-tomlin1}
816
-sd <- 0.1 ## fitness effect of drivers
817
-sm <- 0 ## fitness effect of mutator
818
-nd <- 20 ## number of drivers
819
-nm <- 5  ## number of mutators
820
-mut <- 10 ## mutator effect
821
-
822
-fitnessGenesVector <- c(rep(sd, nd), rep(sm, nm))
823
-names(fitnessGenesVector) <- 1:(nd + nm)
824
-mutatorGenesVector <- rep(mut, nm)
825
-names(mutatorGenesVector) <- (nd + 1):(nd + nm)
826
-
827
-ft <- allFitnessEffects(noIntGenes = fitnessGenesVector,
828
-                        drvNames = 1:nd)
829
-mt <- allMutatorEffects(noIntGenes = mutatorGenesVector)
830
-
831
-```
832
-
833
-
834
-Now, simulate using the fitness and mutator specification. We fix
835
-the number of drivers to cancer, and we stop when those numbers of
836
-drivers are reached. Since we only care about the time it takes to
837
-reach cancer, not the actual trajectories, we set `keepEvery = NA`:
838
-
839
-```{r hiddentom, echo=FALSE}
840
-set.seed(2)
841
-RNGkind("L'Ecuyer-CMRG")
842
-```
843
-
844
-```{r ex-tomlin2}
845
-ddr <- 4
846
-st <- oncoSimulPop(4, ft, muEF = mt,
847
-                   detectionDrivers = ddr,
848
-                   finalTime = NA,
849
-                   detectionSize = NA,
850
-                   detectionProb = NA,
851
-                   onlyCancer = TRUE,
852
-                   keepEvery = NA, 
853
-                   mc.cores = 2, ## adapt to your hardware
854
-                   seed = NULL) ## for reproducibility
855
-
856
-## How long did it take to reach cancer?
857
-unlist(lapply(st, function(x) x$FinalTime))
858
-
859
-```
860
-```{r hidden-rng-tom, echo = FALSE}
861
-set.seed(NULL)
862
-```
863
-
864
-
865
-
866
-(Incidentally, notice that it is easy to get OncoSimulR to throw an
867
-exception if you do not think twice about the mutator effects you are using
868
-and specify a huge mutation rate if all mutator genes are mutated: see
869
-\@(tomlin-except).)
870
-
871
-
872
-
873 885
 
874 886
 
875 887
 ## Trade-offs and what is OncoSimulR not well suited for {#whatnotfor}
... ...
@@ -1183,7 +1195,7 @@ version 2. Please note that **the functionality of version 1 will soon be remove
1183 1195
 
1184 1196
 # Running time and space consumption of OncoSimulR {#timings}
1185 1197
 
1186
-Time to complete the simulations and size of return objects (space
1198
+Time to complete the simulations and size of returned objects (space
1187 1199
 consumption) depend on several, interacting factors. The usual rule
1188 1200
 of "experiment before launching a large number of simulations"
1189 1201
 applies, but here we will walk through several cases to get a
... ...
@@ -1721,7 +1733,7 @@ Now most simulations under the exponential model end up in extinction, as
1721 1733
 seen by the median population size of 0 (but not all, as the mean and
1722 1734
 max. population size are clearly away from zero). Consequently,
1723 1735
 simulations under the exponential model are now faster (and the size of
1724
-the average return object is smaller). Of course, whether one should run
1736
+the average returned object is smaller). Of course, whether one should run
1725 1737
 simulations with `onlyCancer = TRUE` or `onlyCancer = FALSE` will depend
1726 1738
 on the question being asked (see, for example, section \@ref(exbauer) for
1727 1739
 a question where we will naturally want to use `onlyCancer = FALSE`).
... ...
@@ -2101,7 +2113,7 @@ pander(benchmark_2[ , c(
2101 2113
 \clearpage
2102 2114
 
2103 2115
 In most cases, simulations run reasonably fast (under 0.1 seconds per
2104
-individual simulation) and the return objects are small. I will only
2116
+individual simulation) and the returned objects are small. I will only
2105 2117
 focus on a few cases.
2106 2118
 
2107 2119
 The McFL model with random fitness landscape `rf12` and with `pancr` does
... ...
@@ -2248,7 +2260,7 @@ is much smaller.
2248 2260
 
2249 2261
 Yes. In fact, in OncoSimulR there is no pre-set limit on genome
2250 2262
 size. However, large numbers of genes can lead to unacceptable
2251
-memory usage or running time. We discuss several examples next that
2263
+memory usage and/or running time. We discuss several examples next that
2252 2264
 illustrate some of the major issues to consider. Another example
2253 2265
 with 50,000 genes is shown in section \@ref(mcf50070).
2254 2266
 
... ...
@@ -2269,7 +2281,8 @@ stop when the population grows over $1e6$ individuals:
2269 2281
 
2270 2282
 ```{r exp10000, echo = TRUE, eval = FALSE}
2271 2283
 ng <- 10000
2272
-u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), rep(-0.1, ng/2)))
2284
+u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), 
2285
+                                      rep(-0.1, ng/2)))
2273 2286
 
2274 2287
 t_e_10000 <- system.time(
2275 2288
     e_10000 <- oncoSimulPop(5, u, model = "Exp", mu = 1e-7,
... ...
@@ -2302,14 +2315,14 @@ print(object.size(e_10000), units = "MB")
2302 2315
 ```
2303 2316
 
2304 2317
 Each simulation takes about 1 second but note that the number of clones
2305
-for most simulations is already over 4000 and that the size of the return
2318
+for most simulations is already over 4000 and that the size of the returned
2306 2319
 object is close to 1 GB (a more detailed explanation of where this 1 GB
2307 2320
 comes from is deferred until section \@ref(wheresizefrom)).
2308 2321
 
2309 2322
 
2310 2323
 #### Exponential, 10,000 genes, example 2 {#exp10000_2}
2311 2324
 
2312
-We can decrease the size of the return object if we use the `keepEvery =
2325
+We can decrease the size of the returned object if we use the `keepEvery =
2313 2326
 NA` argument (this setting was explained in detail in section
2314 2327
 \@ref(bench1)):
2315 2328
 
... ...
@@ -2361,7 +2374,8 @@ reasonable decision depends on the problem; see also below.
2361 2374
 
2362 2375
 ```{r exp50000, echo = TRUE, eval = FALSE}
2363 2376
 ng <- 50000
2364
-u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), rep(-0.1, ng/2)))
2377
+u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), 
2378
+                                      rep(-0.1, ng/2)))
2365 2379
 t_e_50000 <- system.time(
2366 2380
     e_50000 <- oncoSimulPop(5,
2367 2381
                             u,
... ...
@@ -2394,7 +2408,7 @@ print(object.size(e_50000), units = "MB")
2394 2408
 ## 7598.6 Mb
2395 2409
 ```
2396 2410
 
2397
-Of course, simulations now take longer and the size of the return
2411
+Of course, simulations now take longer and the size of the returned
2398 2412
 object is over 7 GB (we are keeping over 7000 clones, even if when
2399 2413
 we prune all those that went extinct). 
2400 2414
 
... ...
@@ -2404,7 +2418,8 @@ What if we had not pruned?
2404 2418
 
2405 2419
 ```{r exp50000np, echo = TRUE, eval = FALSE}
2406 2420
 ng <- 50000
2407
-u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), rep(-0.1, ng/2)))
2421
+u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), 
2422
+                                      rep(-0.1, ng/2)))
2408 2423
 t_e_50000np <- system.time(
2409 2424
     e_50000np <- oncoSimulPop(5,
2410 2425
                               u,
... ...
@@ -2449,7 +2464,8 @@ What about the `mutationPropGrowth` setting? We will rerun the example in
2449 2464
 ```{r exp50000mpg, echo = TRUE, eval = FALSE}
2450 2465
 
2451 2466
 ng <- 50000
2452
-u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), rep(-0.1, ng/2)))
2467
+u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), 
2468
+                                      rep(-0.1, ng/2)))
2453 2469
 
2454 2470
 t_e_50000c <- system.time(
2455 2471
     e_50000c <- oncoSimulPop(5,
... ...
@@ -2491,7 +2507,7 @@ time and the size of the object increases by almost 3 GB.
2491 2507
 
2492 2508
 What about larger population sizes or larger mutation rates? The
2493 2509
 number of clones starts growing fast, which means much slower
2494
-execution times and much larger return objects (see also the examples
2510
+execution times and much larger returned objects (see also the examples
2495 2511
 below).
2496 2512
 
2497 2513
 
... ...
@@ -2512,7 +2528,8 @@ Let's start with  `mutationPropGrowth = FALSE` and `keepEvery = NA`:
2512 2528
 
2513 2529
 ```{r mc50000_1, echo = TRUE, eval = FALSE}
2514 2530
 ng <- 50000
2515
-u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), rep(-0.1, ng/2)))
2531
+u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), 
2532
+                                      rep(-0.1, ng/2)))
2516 2533
 
2517 2534
 t_mc_50000_nmpg <- system.time(
2518 2535
     mc_50000_nmpg <- oncoSimulPop(5,
... ...
@@ -2586,7 +2603,7 @@ print(object.size(mc_50000_nmpg_k), units = "MB")
2586 2603
 ```
2587 2604
 
2588 2605
 Computing time increases slightly but the major effect is seen on
2589
-the size of the return object, that increases by a factor of about
2606
+the size of the returned object, that increases by a factor of about
2590 2607
 4x, up to 8 GB, corresponding to the increase in about 4x in the
2591 2608
 number of clones being tracked (see details of where size comes from in
2592 2609
 section \@ref(wheresizefrom)).
... ...
@@ -2599,7 +2616,8 @@ detection size by a factor of 3:
2599 2616
 
2600 2617
 ```{r mc50000_popx, echo = TRUE, eval = FALSE}
2601 2618
 ng <- 50000
2602
-u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), rep(-0.1, ng/2)))
2619
+u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), 
2620
+                                      rep(-0.1, ng/2)))
2603 2621
 
2604 2622
 t_mc_50000_nmpg_3e6 <- system.time(
2605 2623
     mc_50000_nmpg_3e6 <- oncoSimulPop(5,
... ...
@@ -2643,8 +2661,6 @@ Let us use the same `detectionSize = 1e6` as in the first example
2643 2661
 (\@ref(mc50000ex1)), but with 5x the mutation rate:
2644 2662
 
2645 2663
 ```{r mc50000_mux, echo = TRUE, eval = FALSE}
2646
-ng <- 50000
2647
-u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), rep(-0.1, ng/2)))
2648 2664
 
2649 2665
 t_mc_50000_nmpg_5mu <- system.time(
2650 2666
     mc_50000_nmpg_5mu <- oncoSimulPop(5,
... ...
@@ -2677,22 +2693,23 @@ print(object.size(mc_50000_nmpg_5mu), units = "MB")
2677 2693
 ## 8314.4 Mb
2678 2694
 ``` 
2679 2695
 
2680
-The number of clones we are tracking is about 4x the number of clones of
2681
-the first example (\@ref(mc50000ex1)), and roughly similar to the number
2682
-of clones of the second example (\@ref(mc50000ex2)), and size of return
2683
-object is similar to that of the second example.  But computing time has
2684
-increased by a factor of about 5x and iterations have increased by a
2685
-factor of about 2x. Iterations increase because mutation is more frequent;
2686
-in addition, at each sampling period each iteration needs to do more work
2687
-as it needs to loop over a larger number of clones and this larger number
2688
-includes clones that are not shown here, because they are pruned (they are
2689
-extinct by the time we exit the simulation ---again, pruning is discussed
2690
-with further details in \@ref(prune)).
2696
+The number of clones we are tracking is about 4x the number of
2697
+clones of the first example (\@ref(mc50000ex1)), and roughly similar
2698
+to the number of clones of the second example (\@ref(mc50000ex2)),
2699
+and size of the returned object is similar to that of the second
2700
+example.  But computing time has increased by a factor of about 5x
2701
+and iterations have increased by a factor of about 2x. Iterations
2702
+increase because mutation is more frequent; in addition, at each
2703
+sampling period each iteration needs to do more work as it needs to
2704
+loop over a larger number of clones and this larger number includes
2705
+clones that are not shown here, because they are pruned (they are
2706
+extinct by the time we exit the simulation ---again, pruning is
2707
+discussed with further details in \@ref(prune)).
2691 2708
 
2692 2709
 
2693 2710
 #### McFarland, 50,000 genes, example 5 {#mc50000ex5}
2694 2711
 
2695
-Finally, let's run the above example but with `keepEvery = 1`:
2712
+Now let's run the above example but with `keepEvery = 1`:
2696 2713
 
2697 2714
 ```{r mcf5muk, echo = TRUE, eval = FALSE}
2698 2715
 t_mc_50000_nmpg_5mu_k <- system.time(
... ...
@@ -2729,7 +2746,7 @@ print(object.size(mc_50000_nmpg_5mu_k), units = "MB")
2729 2746
 
2730 2747
 We have already seen these effects before in section
2731 2748
 \@ref(mc50000ex2): using `keepEvery = 1` leads to a slight increase
2732
-in execution time. What is really affected is the size of the return
2749
+in execution time. What is really affected is the size of the returned
2733 2750
 object which increases by a factor of about 3x (and is now over
2734 2751
 20GB). That 3x corresponds, of course, to the increase in the number
2735 2752
 of clones being tracked. This, by the way, also allows us to
... ...
@@ -2748,8 +2765,6 @@ default of `mutationPropGrowth = TRUE`:
2748 2765
 
2749 2766
 
2750 2767
 ```{r mc50000_2, echo = TRUE, eval = FALSE}
2751
-ng <- 50000
2752
-u <- allFitnessEffects(noIntGenes = c(rep(0.1, ng/2), rep(-0.1, ng/2)))
2753 2768
 
2754 2769
 t_mc_50000 <- system.time(
2755 2770
     mc_50000 <- oncoSimulPop(5,
... ...
@@ -2784,7 +2799,7 @@ print(object.size(mc_50000), units = "MB")
2784 2799
 ```
2785 2800
 
2786 2801
 Note the huge increase in computing time (related of course to the huge
2787
-increase in number of iterations) and in the size of the return object: we
2802
+increase in number of iterations) and in the size of the returned object: we
2788 2803
 have gone from having to track about 2000 clones to tracking over 12000
2789 2804
 clones even when we prune all clones without descendants.
2790 2805
 
... ...
@@ -2829,7 +2844,7 @@ print(object.size(mc_50000_nmpg_5mu_k), units = "MB")
2829 2844
 ```
2830 2845
 
2831 2846
 Note we use only two replicates, since those two already lead to a
2832
-24 GB return object as we are tracking more than 60,000 clones, more
2847
+24 GB returned object as we are tracking more than 60,000 clones, more
2833 2848
 than twice those with $s=0.1$.  The reason for the difference in
2834 2849
 number of clones and iterations is of course the change from $s=0.1$
2835 2850
 to $s=0.05$: under the McFarland model to reach population sizes of
... ...
@@ -2881,7 +2896,7 @@ print(object.size(mc_50000_nmpg), units = "MB")
2881 2896
 ```
2882 2897
 Using $s=0.05$ leads to a large increase in final time and number of
2883 2898
 iterations. However, as we are using the `keepEvery = NA` setting,
2884
-the increase in number of clones tracked and in size of return
2899
+the increase in number of clones tracked and in size of returned
2885 2900
 object is relatively small.
2886 2901
 
2887 2902
 
... ...
@@ -2889,7 +2904,7 @@ object is relatively small.
2889 2904
 
2890 2905
 ### The different consequences of `keepEvery = NA` in the Exp and McFL models {#kpexpmc}
2891 2906
 
2892
-We have seen that `keepEvery = NA` often leads to much smaller return
2907
+We have seen that `keepEvery = NA` often leads to much smaller returned
2893 2908
 objects when using the McFarland model than when using the Exp model. Why?
2894 2909
 Because in the McFarland model there is strong competition and there can
2895 2910
 be complete clonal sweeps so that in extreme cases a single clone might be
... ...
@@ -2978,7 +2993,7 @@ often as that in the other two matrices (`pops.by.time` and
2978 2993
 #### Interlude: where is that 1 GB coming from? {#wheresizefrom}
2979 2994
 
2980 2995
 In section \@ref(exp100001) we have seen an apparently innocuous
2981
-simulation producing a return object of almost 1 GB. Where is that coming
2996
+simulation producing a returned object of almost 1 GB. Where is that coming
2982 2997
 from? It means that each simulation produced almost 200 MB of output.
2983 2998
 
2984 2999
 Let us look at one simulation in more detail:
... ...
@@ -7942,7 +7957,7 @@ In @szendro_predictability_2013 "(...) paths defined as the time
7942 7957
 ordered sets of genotypes that at some time contain the largest
7943 7958
 subpopulation" are called "path of the maximum" (POM) (see their
7944 7959
 p. 572). In our case, POM are obtained from the `pops.by.time`
7945
-return object (i.e., from the genotypes at each of the sampling
7960
+returned object (i.e., from the genotypes at each of the sampling
7946 7961
 times) and, thus, the POMs will be affected by how often we sample
7947 7962
 and keep samples (arguments `sampleEvery` and `keepEvery`), since we
7948 7963
 are running a continuous time process.
... ...
@@ -8093,23 +8108,33 @@ Algorithm 5 in the supplementary material).
8093 8108
 
8094 8109
 ## Does OncoSimulR keep track of individuals or of clones? And how can it keep track of such large populations? {#trackindivs}
8095 8110
 
8096
-OncoSimulR keeps track of clones, where a clone is a set of cells that are
8097
-genetically identical (note that this means completely identical over the
8098
-whole set of genes/markers you are using; see section
8099
-\@ref(meaningclone)).  
8111
+OncoSimulR keeps track of clones, where a clone is a set of cells
8112
+that are genetically identical (note that this means completely
8113
+identical over the whole set of genes/markers you are using; see
8114
+section \@ref(meaningclone)). We do not need to keep track of
8115
+individual cells because, for all purposes, and since we do not
8116
+consider spatial structure, two or more cells that are genetically
8117
+identical are interchangeable. This means, for instance, that the
8118
+computational cost of keeping a population of a single clone with 1
8119
+individual or with $10^9$ individuals is exactly the same: we just
8120
+keep track of the genotype and the number of cells. (Sure, it is
8121
+much more likely we will see a mutation soon in a clone with $10^9$
8122
+cells than in a clone with 1, but that is a different issue.)
8123
+
8124
+
8125
+Of course, the entities that die, reproduce, and mutate are
8126
+individual cells. This is of course dealt with by tracking clones
8127
+(as is clearly shown by Algorithms 4 and 5 in @Mather2012). Tracking
8128
+individuals, as individuals, would provide no advantage, but would
8129
+increase the computational burden by many orders of magnitude.
8130
+
8131
+
8132
+
8100 8133
 
8101 8134
 <!-- As we are interested in examining the effects of selection, mutation, -->
8102 8135
 <!-- keeping track of the parent-child relationships between clones, etc, and -->
8103 8136
 <!-- thus we must keep track of the complete set of clones. -->
8104 8137
 
8105
-We do not need to keep track of individual cells because, for all
8106
-purposes, and since we do not consider spatial structure, two or more
8107
-cells that are genetically identical are interchangeable. This means, for
8108
-instance, that the computational cost of keeping a population of a single
8109
-clone with 1 individual or with $10^9$ individuals is exactly the same: we
8110
-just keep track of the genotype and the number of cells. (Sure, it is much
8111
-more likely we will see a mutation soon in a clone with $10^9$ cells than
8112
-in a clone with 1, but that is a different issue.)
8113 8138
 
8114 8139
 
8115 8140
 <!-- (or haplotype if -->