... | ... |
@@ -167,7 +167,9 @@ additive models.} |
167 | 167 |
routines can have trouble (especially if you log) with values <=0. Or |
168 | 168 |
we might have trouble if we want to log the fitness. This is done |
169 | 169 |
after possibly taking logs. Noise is added to prevent creating several |
170 |
- identical minimal fitness values.} |
|
170 |
+ identical minimal fitness values. Note that \code{\link{allFitnessEffects}} will remove from the table |
|
171 |
+ of genotypes any genotype with a fitness <= 1e-9, thus |
|
172 |
+ making it a non-viable genotype during simulations. } |
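The substitution described for \code{truncate_at_0} can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the fitness vector is made up, and the name \code{dropped} is hypothetical.

```r
## Sketch of the truncate_at_0 behaviour described above (assumption: a base R
## re-implementation for illustration, not OncoSimulR's own code).
set.seed(3)                              # only to make the sketch reproducible
fitness <- c(1, 0.4, 0, -0.2, 2.5)       # hypothetical fitness values
nonpos <- fitness <= 0                   # entries that would be truncated
## Replace each non-positive value by its own uniform draw between 1e-10 and
## 1e-9, so that replaced values are not all identical.
fitness[nonpos] <- runif(sum(nonpos), min = 1e-10, max = 1e-9)
## Genotypes with fitness <= 1e-9 are the ones the documentation says
## allFitnessEffects removes from the table of genotypes.
dropped <- fitness <= 1e-9
```

In this sketch only the two originally non-positive entries end up flagged in \code{dropped}; the rest keep their values.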
|
171 | 173 |
|
172 | 174 |
\item{K}{K for NK model; K is the number of loci with which each locus |
173 | 175 |
interacts, and the larger the K the larger the ruggedness of the |
... | ... |
@@ -288,6 +290,9 @@ Optimum model component.} |
288 | 290 |
of the data can be large, especially if \code{g} (the number of genes) |
289 | 291 |
is large. |
290 | 292 |
|
293 |
+ Note that \code{\link{allFitnessEffects}} will remove from the table |
|
294 |
+ of genotypes any genotype with a fitness <= 1e-9, thus |
|
295 |
+ making it a non-viable genotype during simulations. |
|
291 | 296 |
|
292 | 297 |
} |
293 | 298 |
|
... | ... |
@@ -115,7 +115,7 @@ additive models.} |
115 | 115 |
This option has no effect if you pass a three-element vector for |
116 | 116 |
\code{scale}. Using a three-element vector for \code{scale} is |
117 | 117 |
probably the most natural way of changing the scale and range of |
118 |
- fitness while setting the wildtype to value of your choice. |
|
118 |
+ fitness while setting the wildtype to a value of your choice. |
|
119 | 119 |
|
120 | 120 |
} |
121 | 121 |
|
... | ... |
@@ -55,9 +55,36 @@ additive models.} |
55 | 55 |
genotypes with that number of mutations have equal probability of |
56 | 56 |
being the reference). } |
57 | 57 |
|
58 |
-\item{scale}{Either NULL (nothing is done) or a two-element vector. If a |
|
59 |
- two-element vector, fitness is re-scaled between \code{scale[1]} (the |
|
60 |
- minimum) and \code{scale[2]} (the maximum).} |
|
58 |
+\item{scale}{Either NULL (nothing is done) or a two- or three-element |
|
59 |
+ vector. |
|
60 |
+ |
|
61 |
+ If a two-element vector, fitness is re-scaled between |
|
62 |
+ \code{scale[1]} (the minimum) and \code{scale[2]} (the maximum) and, |
|
63 |
+ later, if you have selected it, \code{wt_is_1} will be enforced. |
|
64 |
+ |
|
65 |
+ If you pass a three element vector, fitness is re-scaled so that the |
|
66 |
+ new maximum fitness is \code{scale[1]}, the new minimum is |
|
67 |
+ \code{scale[2]} and the new wildtype is \code{scale[3]}. If you pass a |
|
68 |
+ three element vector, none of the \code{wt_is_1} options apply in this |
|
69 |
+ case, to ensure you obtain the range you want. If you want the |
|
70 |
+ wildtype to be one, pass it as the third element of the vector. |
|
71 |
+ |
|
72 |
+ As a consequence of using a three element vector, the amount of |
|
73 |
+ stretching/compressing (i.e., scaling) of fitness values larger than |
|
74 |
+ that of the wildtype will likely be different from the scaling of |
|
75 |
+ fitness values smaller than that of the wildtype. In other words, |
|
76 |
+ this argument allows you to change the spread of the positive and |
|
77 |
+ negative fitness values (and you can make this difference extreme and |
|
78 |
+ make most fitness values less than wildtype be 0 by using a huge |
|
79 |
+ negative number --huge in absolute value-- for \code{scale[2]} if you |
|
80 |
+ then truncate at 0 --see \code{truncate_at_0}). |
|
81 |
+ |
|
82 |
+ Using a three element vector is probably the most natural way of |
|
83 |
+ changing the scale and range of fitness. |
|
84 |
+ |
|
85 |
+ See also \code{log} if you want the log-transformed values to respect |
|
86 |
+ the scale. |
|
87 |
+} |
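One way to picture the three-element \code{scale} is a piecewise-linear map with separate slopes above and below the wildtype. The sketch below assumes that form; the documentation does not state the exact formula, and \code{rescale3} is a hypothetical helper, not a function of the package.

```r
## Hypothetical helper (not part of OncoSimulR): piecewise-linear rescaling
## that pins the maximum, the minimum, and the wildtype to chosen values.
## 'scale' follows the documented order: c(new maximum, new minimum, new wildtype).
rescale3 <- function(f, wt, scale) {
  new_max <- scale[1]; new_min <- scale[2]; new_wt <- scale[3]
  stopifnot(new_max >= new_wt, new_wt >= new_min)
  up   <- max(f) - wt          # original spread above the wildtype
  down <- wt - min(f)          # original spread below the wildtype
  out <- rep(new_wt, length(f))
  above <- f > wt
  below <- f < wt
  ## Each side gets its own slope, so stretching differs above and below.
  out[above] <- new_wt + (f[above] - wt) * (new_max - new_wt) / up
  out[below] <- new_wt - (wt - f[below]) * (new_wt - new_min) / down
  out
}

f <- c(2, 1, 0, 3, 6)          # made-up fitness; the first entry is the wildtype
scaled <- rescale3(f, wt = f[1], scale = c(4, 0.5, 1))
```

With these made-up values the slope above the wildtype is 0.75 and the slope below it is 0.25, which is the kind of asymmetric stretching the text describes.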
|
61 | 88 |
|
62 | 89 |
\item{wt_is_1}{If "divide" the fitness of all genotypes is |
63 | 90 |
divided by the fitness of the wildtype (after possibly adding a value |
... | ... |
@@ -83,14 +110,28 @@ additive models.} |
83 | 110 |
option can easily lead to landscapes with no accessible genotypes |
84 | 111 |
(even if you also use \code{scale}). |
85 | 112 |
|
86 |
- If "no", the fitness of the wildtype is not modified. } |
|
113 |
+ If "no", the fitness of the wildtype is not modified. |
|
114 |
+ |
|
115 |
+ This option has no effect if you pass a three-element vector for |
|
116 |
+ \code{scale}. Using a three-element vector for \code{scale} is |
|
117 |
+ probably the most natural way of changing the scale and range of |
|
118 |
+ fitness while setting the wildtype to value of your choice. |
|
119 |
+ |
|
120 |
+} |
|
87 | 121 |
|
88 | 122 |
|
89 | 123 |
\item{log}{If TRUE, log-transform fitness. Actually, there are two |
90 | 124 |
cases: if \code{wt_is_1 = "no"} we simply log the fitness values; |
91 | 125 |
otherwise, we log the fitness values and add a 1, thus shifting all |
92 | 126 |
fitness values, because by decree the fitness (birth rate) of the |
93 |
- wildtype must be 1.} |
|
127 |
+ wildtype must be 1. |
|
128 |
+ |
|
129 |
+ If you pass a three-element vector for scale, you will want to pass |
|
130 |
+ \code{exp(desired_max)}, \code{exp(desired_min)}, and |
|
131 |
+ \code{exp(desired_wildtype)} to the \code{scale} argument. (We first |
|
132 |
+ scale values in the original scale and then log them). In this case, |
|
133 |
+ we ignore whatever you passed as \code{wt_is_1}, setting \code{wt_is_1 |
|
134 |
+ = "no"} to avoid modifying your requested value for the wildtype.} |
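The arithmetic behind passing exponentiated targets can be checked in base R. The target values below are made up for illustration, and the variable names are hypothetical.

```r
## Targets we would like to see on the log scale (hypothetical values).
desired <- c(max = 2, min = -1, wildtype = 0)
## Because scaling happens on the original scale and logging comes afterwards,
## the values handed to 'scale' are the exponentiated targets.
scale_arg <- exp(desired)
## Logging the scaled extremes and wildtype recovers the requested targets.
back <- log(scale_arg)
```

Under these assumptions, a call along the lines of \code{rfitness(4, scale = unname(scale_arg), log = TRUE)} would be the corresponding use of the arguments documented here.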
|
94 | 135 |
|
95 | 136 |
\item{min_accessible_genotypes}{If not NULL, the minimum number of |
96 | 137 |
accessible genotypes in the fitness landscape. A genotype is |
... | ... |
@@ -314,11 +314,6 @@ MAGELLAN web site: \url{http://wwwabi.snv.jussieu.fr/public/Magellan/} |
314 | 314 |
## plotting and simulating an oncogenetic trajectory |
315 | 315 |
|
316 | 316 |
|
317 |
-r1 <- rfitness(4) |
|
318 |
-plot(r1) |
|
319 |
-oncoSimulIndiv(allFitnessEffects(genotFitness = r1)) |
|
320 |
- |
|
321 |
- |
|
322 | 317 |
## NK model |
323 | 318 |
rnk <- rfitness(5, K = 3, model = "NK") |
324 | 319 |
plot(rnk) |
... | ... |
@@ -328,6 +323,8 @@ oncoSimulIndiv(allFitnessEffects(genotFitness = rnk)) |
328 | 323 |
radd <- rfitness(4, model = "Additive", mu = 0.2, sd = 0.5) |
329 | 324 |
plot(radd) |
330 | 325 |
|
326 |
+ |
|
327 |
+\dontrun{ |
|
331 | 328 |
## Eggbox model |
332 | 329 |
regg = rfitness(g=4,model="Eggbox", e = 2, E=2.4) |
333 | 330 |
plot(regg) |
... | ... |
@@ -342,7 +339,8 @@ plot(ris) |
342 | 339 |
rfull = rfitness(g=4, model="Full", i = 0.002, I=2, |
343 | 340 |
K = 2, r = TRUE, |
344 | 341 |
p = 0.2, P = 0.3, o = 0.3, O = 1) |
345 |
-plot(rfull) |
|
342 |
+ plot(rfull) |
|
343 |
+ } |
|
346 | 344 |
} |
347 | 345 |
\keyword{ datagen } |
348 | 346 |
|
... | ... |
@@ -1,11 +1,12 @@ |
1 | 1 |
\name{rfitness} |
2 | 2 |
\alias{rfitness} |
3 |
- |
|
3 |
+\encoding{UTF-8} |
|
4 | 4 |
|
5 | 5 |
\title{Generate random fitness.} |
6 | 6 |
|
7 | 7 |
\description{ Generate random fitness landscapes under a House of Cards, |
8 |
- Rough Mount Fuji, additive model, and Kauffman's NK model. } |
|
8 |
+ Rough Mount Fuji (RMF), additive (multiplicative) model, Kauffman's NK |
|
9 |
+ model, Ising model, Eggbox model and Full model} |
|
9 | 10 |
|
10 | 11 |
|
11 | 12 |
\usage{ |
... | ... |
@@ -14,7 +15,9 @@ rfitness(g, c = 0.5, sd = 1, mu = 1, reference = "random", scale = NULL, |
14 | 15 |
wt_is_1 = c("subtract", "divide", "force", "no"), |
15 | 16 |
log = FALSE, min_accessible_genotypes = NULL, |
16 | 17 |
accessible_th = 0, truncate_at_0 = TRUE, |
17 |
- K = 1, r = TRUE, model = c("RMF", "NK")) |
|
18 |
+ K = 1, r = TRUE, i = 0, I = -1, circular = FALSE, e = 0, E = -1, |
|
19 |
+ H = -1, s = 0.1, S = -1, d = 0, o = 0, O = -1, p = 0, P = -1, |
|
20 |
+ model = c("RMF", "Additive", "NK", "Ising", "Eggbox", "Full")) |
|
18 | 21 |
} |
19 | 22 |
|
20 | 23 |
|
... | ... |
@@ -25,28 +28,32 @@ rfitness(g, c = 0.5, sd = 1, mu = 1, reference = "random", scale = NULL, |
25 | 28 |
\item{g}{Number of genes.} |
26 | 29 |
|
27 | 30 |
\item{c}{The decrease in fitness of a genotype per each unit increase |
28 |
- in Hamming distance from the reference genotype (see \code{reference}).} |
|
31 |
+ in Hamming distance from the reference genotype for the RMF model |
|
32 |
+ (see \code{reference}).} |
|
29 | 33 |
|
30 | 34 |
\item{sd}{The standard deviation of the random component (a normal |
31 |
- distribution of mean \code{mu} and standard deviation \code{sd}).} |
|
35 |
+ distribution of mean \code{mu} and standard deviation \code{sd}) for |
|
36 |
+ the RMF and additive models.} |
|
32 | 37 |
|
33 | 38 |
\item{mu}{The mean of the random component (a normal distribution of |
34 |
-mean \code{mu} and standard deviation \code{sd}).} |
|
35 |
- |
|
36 |
- |
|
37 |
-\item{reference}{The reference genotype: for the deterministic, additive |
|
38 |
- part, this is the genotype with maximal fitness, and all other |
|
39 |
- genotypes decrease their fitness by \code{c} for every unit of Hamming |
|
40 |
- distance from this reference. If "random" a genotype will be randomly |
|
41 |
- chosen as the reference. If "max" the genotype with all positions |
|
42 |
- mutated will be chosen as the reference. If you pass a vector (e.g., |
|
43 |
- \code{reference = c(1, 0, 1, 0)}) that will be the reference genotype. |
|
44 |
- If "random2" a genotype will be randomly chosen as the reference. In |
|
45 |
- contrast to "random", however, not all genotypes have the same |
|
46 |
- probability of being chosen; here, what is equal is the probability |
|
47 |
- that the reference genotype has 1, 2, ..., g, mutations (and, once a |
|
48 |
- number mutations is chosen, all genotypes with that number of |
|
49 |
- mutations have equal probability of being the reference). } |
|
39 |
+mean \code{mu} and standard deviation \code{sd}) for the RMF and |
|
40 |
+additive models.} |
|
41 |
+ |
|
42 |
+ |
|
43 |
+\item{reference}{The reference genotype: in the RMF model, for the |
|
44 |
+ deterministic, additive part, this is the genotype with maximal |
|
45 |
+ fitness, and all other genotypes decrease their fitness by \code{c} |
|
46 |
+ for every unit of Hamming distance from this reference. If "random" a |
|
47 |
+ genotype will be randomly chosen as the reference. If "max" the |
|
48 |
+ genotype with all positions mutated will be chosen as the |
|
49 |
+ reference. If you pass a vector (e.g., \code{reference = c(1, 0, 1, |
|
50 |
+ 0)}) that will be the reference genotype. If "random2" a genotype |
|
51 |
+ will be randomly chosen as the reference. In contrast to "random", |
|
52 |
+ however, not all genotypes have the same probability of being chosen; |
|
53 |
+ here, what is equal is the probability that the reference genotype has |
|
54 |
+ 1, 2, ..., g, mutations (and, once a number of mutations is chosen, all |
|
55 |
+ genotypes with that number of mutations have equal probability of |
|
56 |
+ being the reference). } |
|
50 | 57 |
|
51 | 58 |
\item{scale}{Either NULL (nothing is done) or a two-element vector. If a |
52 | 59 |
two-element vector, fitness is re-scaled between \code{scale[1]} (the |
... | ... |
@@ -127,9 +134,45 @@ mean \code{mu} and standard deviation \code{sd}).} |
127 | 134 |
|
128 | 135 |
\item{r}{For the NK model, whether interacting loci are chosen at random |
129 | 136 |
(\code{r = TRUE}) or are neighbors (\code{r = FALSE}).} |
137 |
+\item{i}{For the Ising model, i is the mean cost for incompatibility with which |
|
138 |
+ the genotype's fitness is penalized when in two adjacent genes, only one of |
|
139 |
+ them is mutated.} |
|
140 |
+ |
|
141 |
+\item{I}{For the Ising model, I is the standard deviation of the cost of |
|
142 |
+ incompatibility (i).} |
|
143 |
+ |
|
144 |
+\item{circular}{For the Ising model, whether there is a circular arrangement, |
|
145 |
+ where the last and the first genes are adjacent to each other.} |
|
146 |
+ |
|
147 |
+\item{e}{For the Eggbox model, mean effect in fitness for the neighbor |
|
148 |
+ locus +/- e.} |
|
149 |
+ |
|
150 |
+\item{E}{For the Eggbox model, noise added to the mean effect in fitness (e).} |
|
151 |
+ |
|
152 |
+\item{H}{For Full models, standard deviation for the House of Cards model.} |
|
153 |
+ |
|
154 |
+\item{s}{For Full models, mean of the fitness for the Multiplicative model.} |
|
155 |
+ |
|
156 |
+\item{S}{For Full models, standard deviation for the Multiplicative model.} |
|
157 |
+ |
|
158 |
+\item{d}{For Full models, a diminishing (negative) or increasing |
|
159 |
+ (positive) return as the peak is approached for the multiplicative model.} |
|
160 |
+ |
|
161 |
+\item{o}{For Full models, mean value for the optimum model.} |
|
162 |
+ |
|
163 |
+\item{O}{For Full models, standard deviation for the optimum model.} |
|
130 | 164 |
|
131 |
-\item{model}{One of "RMF" (default), for Rough Mount Fuji, or "NK", for |
|
132 |
- Kauffman's NK model.} |
|
165 |
+\item{p}{For Full models, the mean production value for each non 0 |
|
166 |
+ allele in the Optimum model component.} |
|
167 |
+ |
|
168 |
+\item{P}{For Full models, the associated standard deviation (of non 0 alleles) in the |
|
169 |
+Optimum model component.} |
|
170 |
+ |
|
171 |
+ |
|
172 |
+ |
|
173 |
+\item{model}{One of "RMF" (default) for Rough Mount Fuji, "Additive" for |
|
174 |
+ Additive model, "NK", for Kauffman's NK model, "Ising" for Ising model, |
|
175 |
+ "Eggbox" for Eggbox model or "Full" for Full models.} |
|
133 | 176 |
} |
134 | 177 |
|
135 | 178 |
|
... | ... |
@@ -146,14 +189,56 @@ mean \code{mu} and standard deviation \code{sd}).} |
146 | 189 |
random variable (in this case, a normal deviate of mean \code{mu} |
147 | 190 |
and standard deviation \code{sd}). |
148 | 191 |
|
149 |
- Setting \eqn{c = 0} we obtain a House of Cards model. Setting \eqn{sd |
|
150 |
- = 0} fitness is given by the distance from the reference and if the |
|
151 |
- reference is the genotype with all positions mutated, then we have a |
|
152 |
- fully additive model (fitness increases linearly with the number of |
|
153 |
- positions mutated). |
|
192 |
+ When using \code{model = "RMF"}, setting \eqn{c = 0} we obtain a House |
|
193 |
+ of Cards model. Setting \eqn{sd = 0} fitness is given by the |
|
194 |
+ distance from the reference and if the reference is the genotype |
|
195 |
+ with all positions mutated, then we have a fully additive model |
|
196 |
+ (fitness increases linearly with the number of positions mutated), |
|
197 |
+ where all mutations have the same effect. |
|
198 |
+ |
|
199 |
+ More flexible additive models can be specified with \code{model = |
|
200 |
+ "Additive"}. This model is like the Rough Mount Fuji model in Szendro |
|
201 |
+ et al., 2013 or Franke et al., 2011, but in this case, each locus can |
|
202 |
+ have different contributions to the fitness evaluation. This model is |
|
203 |
+ also referred to as the "multiplicative" model in the literature as it |
|
204 |
+ is additive in the log-scale (e.g., see Brouillet et al., 2015 or |
|
205 |
+ Ferretti et al., 2016). The contribution of each mutated allele to the |
|
206 |
+ log-fitness is a random deviate from a Normal distribution with |
|
207 |
+ specified mean \code{mu} and standard deviation \code{sd}, and the |
|
208 |
+ log-fitness of a genotype is the sum of the contributions of each |
|
209 |
+ mutated allele. There is no "reference" genotype in the Additive |
|
210 |
+ model. There is no epistasis in the additive model because the effect |
|
211 |
+ of a mutation in a locus does not depend on the genetic background, or |
|
212 |
+ whether the rest of the loci are mutated or not. |
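The description of the Additive model above can be sketched in base R. This is an illustration that follows the wording of the text (per-locus Normal contributions summed on the log scale); it is not the package's own code, and all object names are hypothetical.

```r
## Sketch of the additive (multiplicative) model as described in the text.
set.seed(2)                                   # only for reproducibility
g <- 4                                        # number of genes
## One contribution to log-fitness per locus, drawn from Normal(mu, sd).
effects <- rnorm(g, mean = 0.2, sd = 0.5)
## All 2^g genotypes as rows of 0/1 (first row is the wildtype).
genotypes <- as.matrix(expand.grid(rep(list(0:1), g)))
## Log-fitness of a genotype: sum of the contributions of its mutated loci.
log_fitness <- as.vector(genotypes %*% effects)
## No epistasis: mutating locus 1 changes log-fitness by effects[1] in
## every genetic background (rows come in pairs differing only at locus 1).
with1 <- genotypes[, 1] == 1
delta <- log_fitness[with1] - log_fitness[!with1]
```

Every element of \code{delta} equals \code{effects[1]}, which is the absence of epistasis stated in the text.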
|
213 |
+ |
|
214 |
+ |
|
215 |
+ When using \code{model = "NK"} fitness is drawn from a uniform (0, 1) |
|
216 |
+ distribution. |
|
217 |
+ |
|
218 |
+ |
|
219 |
+ When using \code{model = "Ising"} for each pair of interacting loci, |
|
220 |
+ there is an associated cost if both alleles are not identical |
|
221 |
+ (and therefore 'compatible'). |
|
222 |
+ |
|
223 |
+ |
|
224 |
+ When using \code{model = "Eggbox"} each locus is either high or low fitness, |
|
225 |
+ with a systematic change between each neighbor. |
|
226 |
+ |
|
227 |
+ |
|
228 |
+ When using \code{model = "Full"}, the fitness is computed with different |
|
229 |
+ parts of the previous models depending on the chosen parameters described |
|
230 |
+ above. |
|
231 |
+ |
|
232 |
+ |
|
233 |
+ For \code{model = "NK" | "Ising" | "Eggbox" | "Full"} the fitness |
|
234 |
+ landscape is generated by directly calling the \code{fl_generate} |
|
235 |
+ function of MAGELLAN |
|
236 |
+ (\url{http://wwwabi.snv.jussieu.fr/public/Magellan/}). See details in |
|
237 |
+ Ferretti et al. 2016, or Brouillet et al., 2015. |
|
238 |
+ |
|
154 | 239 |
|
155 | 240 |
For OncoSimulR, we often want the wildtype to have a mean of |
156 |
- 1. Reasonable settings are \code{mu = 1} and \code{wt_is_1 = |
|
241 |
+ 1. Reasonable settings when using RMF are \code{mu = 1} and \code{wt_is_1 = |
|
157 | 242 |
'subtract'} so that we simulate from a distribution centered in 1, and |
158 | 243 |
we make sure afterwards (via a simple shift) that the wildtype is |
159 | 244 |
actually 1. The \code{sd} controls the standard deviation, with the |
... | ... |
@@ -162,14 +247,6 @@ mean \code{mu} and standard deviation \code{sd}).} |
162 | 247 |
of the data can be large, especially if \code{g} (the number of genes) |
163 | 248 |
is large. |
164 | 249 |
|
165 |
- |
|
166 |
- When using \code{model = "NK"}, the model used is Kauffman's NK model |
|
167 |
- (see details in Ferretti et al., or Brouillet et al., below), as |
|
168 |
- implemented in MAGELLAN |
|
169 |
- (\url{http://wwwabi.snv.jussieu.fr/public/Magellan/}). This fitness |
|
170 |
- landscape is generated by directly calling the \code{fl_generate} |
|
171 |
- function of MAGELLAN. Fitness is drawn from a uniform (0, 1) |
|
172 |
- distribution. |
|
173 | 250 |
|
174 | 251 |
} |
175 | 252 |
|
... | ... |
@@ -214,10 +291,12 @@ MAGELLAN web site: \url{http://wwwabi.snv.jussieu.fr/public/Magellan/} |
214 | 291 |
} |
215 | 292 |
|
216 | 293 |
\author{ Ramon Diaz-Uriarte for the RMF and general wrapping |
217 |
-code. S. Brouillet, G. Achaz, S. Matuszewski, H. Annoni, and L. Ferreti |
|
218 |
-for the MAGELLAN code. |
|
219 |
- |
|
220 |
-} |
|
294 |
+ code. S. Brouillet, G. Achaz, S. Matuszewski, H. Annoni, and |
|
295 |
+ L. Ferretti for the MAGELLAN code. Further contributions to the |
|
296 |
+ additive model and to wrapping MAGELLAN code and documentation from |
|
297 |
+ Guillermo Gorines Cordero, Ivan Lorca Alonso, Francisco Muñoz Lopez, |
|
298 |
+ David Roncero Moroño, Alvaro Quevedo, Pablo Perez, Cristina Devesa, |
|
299 |
+ Alejandro Herrador.} |
|
221 | 300 |
|
222 | 301 |
\seealso{ |
223 | 302 |
|
... | ... |
@@ -234,6 +313,7 @@ for the MAGELLAN code. |
234 | 313 |
## Random fitness for four genes-genotypes, |
235 | 314 |
## plotting and simulating an oncogenetic trajectory |
236 | 315 |
|
316 |
+ |
|
237 | 317 |
r1 <- rfitness(4) |
238 | 318 |
plot(r1) |
239 | 319 |
oncoSimulIndiv(allFitnessEffects(genotFitness = r1)) |
... | ... |
@@ -243,7 +323,26 @@ oncoSimulIndiv(allFitnessEffects(genotFitness = r1)) |
243 | 323 |
rnk <- rfitness(5, K = 3, model = "NK") |
244 | 324 |
plot(rnk) |
245 | 325 |
oncoSimulIndiv(allFitnessEffects(genotFitness = rnk)) |
246 |
-} |
|
247 | 326 |
|
327 |
+## Additive model |
|
328 |
+radd <- rfitness(4, model = "Additive", mu = 0.2, sd = 0.5) |
|
329 |
+plot(radd) |
|
330 |
+ |
|
331 |
+## Eggbox model |
|
332 |
+regg = rfitness(g=4,model="Eggbox", e = 2, E=2.4) |
|
333 |
+plot(regg) |
|
334 |
+ |
|
335 |
+ |
|
336 |
+## Ising model |
|
337 |
+ris = rfitness(g=4,model="Ising", i = 0.002, I=2) |
|
338 |
+plot(ris) |
|
339 |
+ |
|
340 |
+ |
|
341 |
+## Full model |
|
342 |
+rfull = rfitness(g=4, model="Full", i = 0.002, I=2, |
|
343 |
+ K = 2, r = TRUE, |
|
344 |
+ p = 0.2, P = 0.3, o = 0.3, O = 1) |
|
345 |
+plot(rfull) |
|
346 |
+} |
|
248 | 347 |
\keyword{ datagen } |
249 | 348 |
|
... | ... |
@@ -76,10 +76,14 @@ mean \code{mu} and standard deviation \code{sd}).} |
76 | 76 |
option can easily lead to landscapes with no accessible genotypes |
77 | 77 |
(even if you also use \code{scale}). |
78 | 78 |
|
79 |
- If "none", the fitness of the wildtype is not touched. } |
|
79 |
+ If "no", the fitness of the wildtype is not modified. } |
|
80 | 80 |
|
81 | 81 |
|
82 |
-\item{log}{If TRUE, log-transform fitness.} |
|
82 |
+\item{log}{If TRUE, log-transform fitness. Actually, there are two |
|
83 |
+ cases: if \code{wt_is_1 = "no"} we simply log the fitness values; |
|
84 |
+ otherwise, we log the fitness values and add a 1, thus shifting all |
|
85 |
+ fitness values, because by decree the fitness (birth rate) of the |
|
86 |
+ wildtype must be 1.} |
|
83 | 87 |
|
84 | 88 |
\item{min_accessible_genotypes}{If not NULL, the minimum number of |
85 | 89 |
accessible genotypes in the fitness landscape. A genotype is |
... | ... |
@@ -110,10 +114,12 @@ mean \code{mu} and standard deviation \code{sd}).} |
110 | 114 |
negative value for \code{accessible_th}. } |
111 | 115 |
|
112 | 116 |
\item{truncate_at_0}{If TRUE (the default) any fitness <= 0 is |
113 |
- substituted by a small positive constant (1e-9). Why? Because |
|
114 |
- MAGELLAN and some plotting routines can have trouble (specially if you |
|
115 |
- log) with values <=0. Or we might have trouble if we want to log the |
|
116 |
- fitness.} |
|
117 |
+ substituted by a small positive constant (a random uniform number |
|
118 |
+ between 1e-10 and 1e-9). Why? Because MAGELLAN and some plotting |
|
119 |
+ routines can have trouble (especially if you log) with values <=0. Or |
|
120 |
+ we might have trouble if we want to log the fitness. This is done |
|
121 |
+ after possibly taking logs. Noise is added to prevent creating several |
|
122 |
+ identical minimal fitness values.} |
|
117 | 123 |
|
118 | 124 |
\item{K}{K for NK model; K is the number of loci with which each locus |
119 | 125 |
interacts, and the larger the K the larger the ruggedness of the |
... | ... |
@@ -5,7 +5,7 @@ |
5 | 5 |
\title{Generate random fitness.} |
6 | 6 |
|
7 | 7 |
\description{ Generate random fitness landscapes under a House of Cards, |
8 |
- Rough Mount Fuji, or additive model. } |
|
8 |
+ Rough Mount Fuji, additive model, and Kauffman's NK model. } |
|
9 | 9 |
|
10 | 10 |
|
11 | 11 |
\usage{ |
... | ... |
@@ -13,7 +13,8 @@ |
13 | 13 |
rfitness(g, c = 0.5, sd = 1, mu = 1, reference = "random", scale = NULL, |
14 | 14 |
wt_is_1 = c("subtract", "divide", "force", "no"), |
15 | 15 |
log = FALSE, min_accessible_genotypes = NULL, |
16 |
- accessible_th = 0, truncate_at_0 = TRUE) |
|
16 |
+ accessible_th = 0, truncate_at_0 = TRUE, |
|
17 |
+ K = 1, r = TRUE, model = c("RMF", "NK")) |
|
17 | 18 |
} |
18 | 19 |
|
19 | 20 |
|
... | ... |
@@ -51,7 +52,7 @@ mean \code{mu} and standard deviation \code{sd}).} |
51 | 52 |
two-element vector, fitness is re-scaled between \code{scale[1]} (the |
52 | 53 |
minimum) and \code{scale[2]} (the maximum).} |
53 | 54 |
|
54 |
-\item{wt_is_1}{If "divide" (the default) the fitness of all genotypes is |
|
55 |
+\item{wt_is_1}{If "divide" the fitness of all genotypes is |
|
55 | 56 |
divided by the fitness of the wildtype (after possibly adding a value |
56 | 57 |
to ensure no negative fitness) so that the wildtype (the genotype with |
57 | 58 |
no mutations) has fitness 1. This is a case of scaling, and it is |
... | ... |
@@ -60,7 +61,7 @@ mean \code{mu} and standard deviation \code{sd}).} |
60 | 61 |
likely that the final fitness will not respect the limits in |
61 | 62 |
\code{scale}. |
62 | 63 |
|
63 |
- If "subtract" we shift all the fitness values (subtracting fitness of |
|
64 |
+ If "subtract" (the default) we shift all the fitness values (subtracting fitness of |
|
64 | 65 |
the wildtype and adding 1) so that the wildtype ends up with a fitness |
65 | 66 |
of 1. This is also applied after \code{scale}, so if you specify both |
66 | 67 |
"wt_is_1 = 'subtract'" and use an argument for \code{scale} it is most |
... | ... |
@@ -114,13 +115,23 @@ mean \code{mu} and standard deviation \code{sd}).} |
114 | 115 |
log) with values <=0. Or we might have trouble if we want to log the |
115 | 116 |
fitness.} |
116 | 117 |
|
118 |
+\item{K}{K for NK model; K is the number of loci with which each locus |
|
119 |
+ interacts, and the larger the K the larger the ruggedness of the |
|
120 |
+ landscape.} |
|
121 |
+ |
|
122 |
+\item{r}{For the NK model, whether interacting loci are chosen at random |
|
123 |
+ (\code{r = TRUE}) or are neighbors (\code{r = FALSE}).} |
|
124 |
+ |
|
125 |
+\item{model}{One of "RMF" (default), for Rough Mount Fuji, or "NK", for |
|
126 |
+ Kauffman's NK model.} |
|
117 | 127 |
} |
118 | 128 |
|
119 | 129 |
|
120 | 130 |
\details{ |
121 | 131 |
|
122 |
- The model used here follows the Rough Mount Fuji model in Szendro et |
|
123 |
- al., 2013 or Franke et al., 2011. Fitness is given as |
|
132 |
+ When using \code{model = "RMF"}, the model used here follows |
|
133 |
+ the Rough Mount Fuji model in Szendro et al., 2013 or Franke et al., |
|
134 |
+ 2011. Fitness is given as |
|
124 | 135 |
|
125 | 136 |
\deqn{f(i) = -c d(i, reference) + x_i} |
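The formula can be evaluated directly in base R. The sketch below is an illustration, not the package's implementation: it takes the all-mutated genotype as the reference (the "max" choice), uses \code{slope} in place of \code{c} to avoid masking the base function, and all other names are hypothetical.

```r
## Sketch of f(i) = -c d(i, reference) + x_i for all genotypes of g genes.
set.seed(1)                          # only for reproducibility
g <- 3
slope <- 0.5                         # plays the role of c in the formula
mu <- 1; sdev <- 1                   # mean and sd of the random component
## All 2^g genotypes as rows of 0/1 (first row is the wildtype).
genotypes <- as.matrix(expand.grid(rep(list(0:1), g)))
reference <- rep(1, g)               # reference = all positions mutated
## Hamming distance of each genotype to the reference.
d <- colSums(t(genotypes) != reference)
## Random component: one normal deviate per genotype.
x <- rnorm(nrow(genotypes), mean = mu, sd = sdev)
fitness <- -slope * d + x
```

Setting \code{slope} to 0 leaves only the random component, and setting \code{sdev} to 0 leaves only the linear decrease with distance from the reference.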
126 | 137 |
|
... | ... |
@@ -144,6 +155,15 @@ mean \code{mu} and standard deviation \code{sd}).} |
144 | 155 |
is different from zero. In this case, with \code{c} large, the range |
145 | 156 |
of the data can be large, especially if \code{g} (the number of genes) |
146 | 157 |
is large. |
158 |
+ |
|
159 |
+ |
|
160 |
+ When using \code{model = "NK"}, the model used is Kauffman's NK model |
|
161 |
+ (see details in Ferretti et al., or Brouillet et al., below), as |
|
162 |
+ implemented in MAGELLAN |
|
163 |
+ (\url{http://wwwabi.snv.jussieu.fr/public/Magellan/}). This fitness |
|
164 |
+ landscape is generated by directly calling the \code{fl_generate} |
|
165 |
+ function of MAGELLAN. Fitness is drawn from a uniform (0, 1) |
|
166 |
+ distribution. |
|
147 | 167 |
|
148 | 168 |
} |
149 | 169 |
|
... | ... |
@@ -159,7 +179,12 @@ mean \code{mu} and standard deviation \code{sd}).} |
159 | 179 |
\code{accessible_th} that show the number of accessible |
160 | 180 |
genotypes under the specified threshold. |
161 | 181 |
} |
162 |
- |
|
182 |
+ |
|
183 |
+ |
|
184 |
+\note{MAGELLAN uses its own random number generating functions; using |
|
185 |
+ \code{set.seed} does not allow you to obtain the same fitness landscape |
|
186 |
+ repeatedly.} |
|
187 |
+ |
|
163 | 188 |
\references{ |
164 | 189 |
|
165 | 190 |
Szendro I.~G. et al. (2013). Quantitative analyses of empirical |
... | ... |
@@ -169,9 +194,23 @@ fitness landscapes. \emph{Journal of Statistical Mechanics: Theory and |
169 | 194 |
Franke, J. et al. (2011). Evolutionary accessibility of mutational |
170 | 195 |
pathways. \emph{PLoS Computational Biology\/}, \bold{7}(8), 1--9. |
171 | 196 |
|
197 |
+Brouillet, S. et al. (2015). MAGELLAN: a tool to explore small fitness |
|
198 |
+landscapes. \emph{bioRxiv}, |
|
199 |
+\bold{31583}. \url{http://doi.org/10.1101/031583} |
|
200 |
+ |
|
201 |
+Ferretti, L., Schmiegelt, B., Weinreich, D., Yamauchi, A., Kobayashi, |
|
202 |
+Y., Tajima, F., & Achaz, G. (2016). Measuring epistasis in fitness |
|
203 |
+landscapes: The correlation of fitness effects of mutations. \emph{Journal of |
|
204 |
+Theoretical Biology\/}, \bold{396}, 132--143. \url{https://doi.org/10.1016/j.jtbi.2016.01.037} |
|
205 |
+ |
|
206 |
+MAGELLAN web site: \url{http://wwwabi.snv.jussieu.fr/public/Magellan/} |
|
207 |
+ |
|
172 | 208 |
} |
173 | 209 |
|
174 |
-\author{ Ramon Diaz-Uriarte |
|
210 |
+\author{ Ramon Diaz-Uriarte for the RMF and general wrapping |
|
211 |
+code. S. Brouillet, G. Achaz, S. Matuszewski, H. Annoni, and L. Ferreti |
|
212 |
+for the MAGELLAN code. |
|
213 |
+ |
|
175 | 214 |
} |
176 | 215 |
|
177 | 216 |
\seealso{ |
... | ... |
@@ -192,6 +231,11 @@ r1 <- rfitness(4) |
192 | 231 |
plot(r1) |
193 | 232 |
oncoSimulIndiv(allFitnessEffects(genotFitness = r1)) |
194 | 233 |
|
234 |
+ |
|
235 |
+## NK model |
|
236 |
+rnk <- rfitness(5, K = 3, model = "NK") |
|
237 |
+plot(rnk) |
|
238 |
+oncoSimulIndiv(allFitnessEffects(genotFitness = rnk)) |
|
195 | 239 |
} |
196 | 240 |
|
197 | 241 |
\keyword{ datagen } |
... | ... |
@@ -73,7 +73,7 @@ mean \code{mu} and standard deviation \code{sd}).} |
73 | 73 |
it is up to you to make sure that the range of the scale argument |
74 | 74 |
includes 1 or you might not get what you want). Note that using this |
75 | 75 |
option can easily lead to landscapes with no accessible genotypes |
76 |
- (unless you also use \code{scale}). |
|
76 |
+ (even if you also use \code{scale}). |
|
77 | 77 |
|
78 | 78 |
If "none", the fitness of the wildtype is not touched. } |
79 | 79 |
|
- Several improvements to rfitness.
- simOGraph using transitive reduction properly.
- Miscell documentation improvements.
- Updated citation to Bioinformatics paper.
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/OncoSimulR@126818 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -10,9 +10,10 @@ |
10 | 10 |
|
11 | 11 |
\usage{ |
12 | 12 |
|
13 |
-rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, |
|
14 |
- wt_is_1 = TRUE, log = FALSE, min_accessible_genotypes = 0, |
|
15 |
- accessible_th = 0) |
|
13 |
+rfitness(g, c = 0.5, sd = 1, mu = 1, reference = "random", scale = NULL, |
|
14 |
+ wt_is_1 = c("subtract", "divide", "force", "no"), |
|
15 |
+ log = FALSE, min_accessible_genotypes = NULL, |
|
16 |
+ accessible_th = 0, truncate_at_0 = TRUE) |
|
16 | 17 |
} |
17 | 18 |
|
18 | 19 |
|
... | ... |
@@ -26,7 +27,11 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, |
26 | 27 |
in Hamming distance from the reference genotype (see \code{reference}).} |
27 | 28 |
|
28 | 29 |
\item{sd}{The standard deviation of the random component (a normal |
29 |
- distribution of mean 0 and standard deviation \code{sd}).} |
|
30 |
+ distribution of mean \code{mu} and standard deviation \code{sd}).} |
|
31 |
+ |
|
32 |
+\item{mu}{The mean of the random component (a normal distribution of |
|
33 |
+mean \code{mu} and standard deviation \code{sd}).} |
|
34 |
+ |
|
30 | 35 |
|
31 | 36 |
\item{reference}{The reference genotype: for the deterministic, additive |
32 | 37 |
part, this is the genotype with maximal fitness, and all other |
... | ... |
@@ -46,15 +51,36 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, |
46 | 51 |
two-element vector, fitness is re-scaled between \code{scale[1]} (the |
47 | 52 |
minimum) and \code{scale[2]} (the maximum).} |
48 | 53 |
|
49 |
-\item{wt_is_1}{If TRUE, fitness will be scaled so that the wildtype (the |
|
50 |
- genotype with no mutations) has fitness of 1. This is applied after |
|
51 |
- \code{scale}, so if you specify both it is most likely that the final |
|
52 |
- fitness will not respect the limits in \code{scale}.} |
|
54 |
+\item{wt_is_1}{If "divide" (the default) the fitness of all genotypes is |
|
55 |
+ divided by the fitness of the wildtype (after possibly adding a value |
|
56 |
+ to ensure no negative fitness) so that the wildtype (the genotype with |
|
57 |
+ no mutations) has fitness 1. This is a case of scaling, and it is |
|
58 |
+ applied after \code{scale}, so if you specify both |
|
59 |
+ "wt_is_1 = 'divide'" and use an argument for \code{scale} it is most |
|
60 |
+ likely that the final fitness will not respect the limits in |
|
61 |
+ \code{scale}. |
|
62 |
+ |
|
63 |
+ If "subtract" we shift all the fitness values (subtracting fitness of |
|
64 |
+ the wildtype and adding 1) so that the wildtype ends up with a fitness |
|
65 |
+ of 1. This is also applied after \code{scale}, so if you specify both |
|
66 |
+ "wt_is_1 = 'subtract'" and use an argument for \code{scale} it is most |
|
67 |
+ likely that the final fitness will not respect the limits in |
|
68 |
+ \code{scale} (though the distortion might be simpler to see as just a |
|
69 |
+ shift up or down). |
|
70 |
+ |
|
71 |
+ If "force" we simply set the fitness of the wildtype to 1, without any |
|
72 |
+ divisions. This means that the \code{scale} argument would work (but |
|
73 |
+ it is up to you to make sure that the range of the scale argument |
|
74 |
+ includes 1 or you might not get what you want). Note that using this |
|
75 |
+ option can easily lead to landscapes with no accessible genotypes |
|
76 |
+ (unless you also use \code{scale}). |
|
77 |
+ |
|
78 |
+ If "none", the fitness of the wildtype is not touched. } |
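A minimal base-R sketch of the four `wt_is_1` options described above, applied to a toy fitness vector. `apply_wt_is_1` is a hypothetical helper written for illustration, not the package's code. It assumes the first element is the wildtype and leaves out the step that adds a value to avoid negative fitness before dividing.

```r
# Sketch of the four wt_is_1 options on a toy fitness vector.
# apply_wt_is_1 is a hypothetical helper; element 1 is assumed to be the wildtype.
apply_wt_is_1 <- function(fitness,
                          wt_is_1 = c("divide", "subtract", "force", "none")) {
  wt_is_1 <- match.arg(wt_is_1)
  switch(wt_is_1,
         divide   = fitness / fitness[1],       # rescale: ratios are preserved
         subtract = fitness - fitness[1] + 1,   # shift: differences are preserved
         force    = replace(fitness, 1, 1),     # overwrite only the wildtype
         none     = fitness)                    # leave everything untouched
}

f <- c(wt = 2, A = 3, B = 1, AB = 4)
apply_wt_is_1(f, "divide")    # wt = 1, A = 1.5, B = 0.5, AB = 2
apply_wt_is_1(f, "subtract")  # wt = 1, A = 2,   B = 0,   AB = 3
apply_wt_is_1(f, "force")     # wt = 1, other values unchanged
```

With "divide" and "subtract" every value moves, which is why limits set through `scale` are usually not respected afterwards; with "force" only the wildtype changes.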
|
53 | 79 |
|
54 | 80 |
|
55 | 81 |
\item{log}{If TRUE, log-transform fitness.} |
56 | 82 |
|
57 |
-\item{min_accessible_genotypes}{If larger than 0, the minimum number of |
|
83 |
+\item{min_accessible_genotypes}{If not NULL, the minimum number of |
|
58 | 84 |
accessible genotypes in the fitness landscape. A genotype is |
59 | 85 |
considered accessible if you can reach it from the wildtype by going |
60 | 86 |
through at least one path where all changes in fitness are larger than or |
... | ... |
@@ -69,6 +95,10 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, |
69 | 95 |
If the condition is not satisfied, we continue generating random |
70 | 96 |
fitness landscapes with the specified parameters until the condition |
71 | 97 |
is satisfied. |
98 |
+ |
|
99 |
+ (Why check against NULL and not against zero? Because this allows you |
|
100 |
+ to count accessible genotypes even if you do not want to ensure a |
|
101 |
+ minimum number of accessible genotypes.) |
|
72 | 102 |
} |
73 | 103 |
|
74 | 104 |
\item{accessible_th}{The threshold for the minimal change in fitness at |
... | ... |
@@ -78,6 +108,12 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, |
78 | 108 |
allow small decreases in fitness in successive steps, use a small |
79 | 109 |
negative value for \code{accessible_th}. } |
80 | 110 |
|
111 |
+\item{truncate_at_0}{If TRUE (the default) any fitness <= 0 is |
|
112 |
+ substituted by a small positive constant (1e-9). Why? Because |
|
113 |
+ MAGELLAN and some plotting routines can have trouble (especially if you |
|
114 |
+ log) with values <=0. Or we might have trouble if we want to log the |
|
115 |
+ fitness.} |
|
116 |
+ |
|
81 | 117 |
} |
82 | 118 |
|
83 | 119 |
|
... | ... |
@@ -90,14 +126,25 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, |
90 | 126 |
|
91 | 127 |
where \eqn{d(i, j)} is the Hamming distance between genotypes \eqn{i} |
92 | 128 |
and \eqn{j} (the number of positions that differ) and \eqn{x_i} is a |
93 |
- random variable (in this case, a normal deviate of mean 0 and standard |
|
94 |
- deviation \code{sd}). |
|
129 |
+ random variable (in this case, a normal deviate of mean \code{mu} |
|
130 |
+ and standard deviation \code{sd}). |
|
95 | 131 |
|
96 | 132 |
Setting \eqn{c = 0} we obtain a House of Cards model. Setting \eqn{sd |
97 | 133 |
= 0} fitness is given by the distance from the reference and if the |
98 | 134 |
reference is the genotype with all positions mutated, then we have a |
99 | 135 |
fully additive model (fitness increases linearly with the number of |
100 | 136 |
positions mutated). |
137 |
+ |
|
138 |
+ For OncoSimulR, we often want the wildtype to have a mean of |
|
139 |
+ 1. Reasonable settings are \code{mu = 1} and \code{wt_is_1 = |
|
140 |
+ 'subtract'} so that we simulate from a distribution centered at 1, and |
|
141 |
+ we make sure afterwards (via a simple shift) that the wildtype is |
|
142 |
+ actually 1. The \code{sd} controls the standard deviation, with the |
|
143 |
+ usual working and meaning as in a normal distribution, unless \code{c} |
|
144 |
+ is different from zero. In this case, with \code{c} large, the range |
|
145 |
+ of the data can be large, especially if \code{g} (the number of genes) |
|
146 |
+ is large. |
|
147 |
+ |
|
101 | 148 |
} |
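As a rough illustration of the model in this section, the following base-R sketch builds every genotype for a small `g` and computes \eqn{f(i) = -c d(i, reference) + x_i}. It is a sketch under stated assumptions, not the package's implementation: `rmf_sketch` is a hypothetical name, and the reference defaults to the genotype with all positions mutated.

```r
# Minimal sketch of a Rough Mount Fuji landscape in base R.
# rmf_sketch is a hypothetical helper, not part of OncoSimulR.
rmf_sketch <- function(g, c = 0.5, sd = 1, mu = 1, reference = rep(1L, g)) {
  # All 2^g genotypes as a 0/1 matrix, one row per genotype
  genotypes <- as.matrix(expand.grid(rep(list(0:1), g)))
  colnames(genotypes) <- LETTERS[seq_len(g)]
  # Hamming distance of each genotype from the reference
  ref_matrix <- matrix(reference, nrow(genotypes), g, byrow = TRUE)
  d <- rowSums(genotypes != ref_matrix)
  # f(i) = -c * d(i, reference) + x_i, with x_i ~ N(mu, sd)
  fitness <- -c * d + rnorm(nrow(genotypes), mean = mu, sd = sd)
  cbind(genotypes, Fitness = fitness)
}

set.seed(1)
additive <- rmf_sketch(g = 3, c = 0.5, sd = 0)  # sd = 0: purely additive
hoc      <- rmf_sketch(g = 3, c = 0)            # c = 0: House of Cards
```

In `additive`, fitness rises by `c` with each extra mutated position because the reference is the fully mutated genotype; in `hoc`, fitness values are independent normal draws.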
102 | 149 |
|
103 | 150 |
\value{ |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/OncoSimulR@121246 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -34,7 +34,13 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, |
34 | 34 |
distance from this reference. If "random" a genotype will be randomly |
35 | 35 |
chosen as the reference. If "max" the genotype with all positions |
36 | 36 |
mutated will be chosen as the reference. If you pass a vector (e.g., |
37 |
- \code{reference = c(1, 0, 1, 0)}) that will be the reference genotype.} |
|
37 |
+ \code{reference = c(1, 0, 1, 0)}) that will be the reference genotype. |
|
38 |
+ If "random2" a genotype will be randomly chosen as the reference. In |
|
39 |
+ contrast to "random", however, not all genotypes have the same |
|
40 |
+ probability of being chosen; here, what is equal is the probability |
|
41 |
+ that the reference genotype has 1, 2, ..., g, mutations (and, once a |
|
42 |
+ number of mutations is chosen, all genotypes with that number of |
|
43 |
+ mutations have equal probability of being the reference). } |
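The two-stage sampling behind "random2" can be sketched in base R as follows. `random2_reference` is a hypothetical helper that only illustrates the description above; the package's own code may differ.

```r
# Sketch of the "random2" idea: pick the number of mutations uniformly,
# then pick uniformly among genotypes with that many mutations.
# random2_reference is a hypothetical helper, not the package's code.
random2_reference <- function(g) {
  k <- sample.int(g, 1)          # number of mutations: 1, 2, ..., g
  ref <- integer(g)
  ref[sample.int(g, k)] <- 1L    # which k positions are mutated
  ref
}

set.seed(2)
random2_reference(4)
```

Under "random", by contrast, genotypes with about `g / 2` mutations are chosen most often simply because there are more of them.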
|
38 | 44 |
|
39 | 45 |
\item{scale}{Either NULL (nothing is done) or a two-element vector. If a |
40 | 46 |
two-element vector, fitness is re-scaled between \code{scale[1]} (the |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/OncoSimulR@120020 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -23,7 +23,7 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, |
23 | 23 |
\item{g}{Number of genes.} |
24 | 24 |
|
25 | 25 |
\item{c}{The decrease in fitness of a genotype per each unit increase |
26 |
- in Hamming distance from the reference genotype (\code{reference}).} |
|
26 |
+ in Hamming distance from the reference genotype (see \code{reference}).} |
|
27 | 27 |
|
28 | 28 |
\item{sd}{The standard deviation of the random component (a normal |
29 | 29 |
distribution of mean 0 and standard deviation \code{sd}).} |
... | ... |
@@ -34,7 +34,7 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, |
34 | 34 |
distance from this reference. If "random" a genotype will be randomly |
35 | 35 |
chosen as the reference. If "max" the genotype with all positions |
36 | 36 |
mutated will be chosen as the reference. If you pass a vector (e.g., |
37 |
- \code{fittest = c(1, 0, 1, 0)}) that will be the reference genotype.} |
|
37 |
+ \code{reference = c(1, 0, 1, 0)}) that will be the reference genotype.} |
|
38 | 38 |
|
39 | 39 |
\item{scale}{Either NULL (nothing is done) or a two-element vector. If a |
40 | 40 |
two-element vector, fitness is re-scaled between \code{scale[1]} (the |
... | ... |
@@ -101,7 +101,12 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, |
101 | 101 |
column denotes gene mutated/not-mutated. (For ease of use in other |
102 | 102 |
functions, this matrix has class "genotype_fitness_matrix".) |
103 | 103 |
|
104 |
+ If you have specified \code{min_accessible_genotypes > 0}, the return |
|
105 |
+ object has added attributes \code{accessible_genotypes} and |
|
106 |
+ \code{accessible_th} that show the number of accessible |
|
107 |
+ genotypes under the specified threshold. |
|
104 | 108 |
} |
109 |
+ |
|
105 | 110 |
\references{ |
106 | 111 |
|
107 | 112 |
Szendro I.~G. et al. (2013). Quantitative analyses of empirical |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/OncoSimulR@119231 bc3139a8-67e5-0310-9ffc-ced21a209358
... | ... |
@@ -4,16 +4,15 @@ |
4 | 4 |
|
5 | 5 |
\title{Generate random fitness.} |
6 | 6 |
|
7 |
-\description{ |
|
8 |
- Generate random fitness under a House of Cards, Rough Mount Fuji, or |
|
9 |
- additive model. |
|
10 |
-} |
|
7 |
+\description{ Generate random fitness landscapes under a House of Cards, |
|
8 |
+ Rough Mount Fuji, or additive model. } |
|
11 | 9 |
|
12 | 10 |
|
13 | 11 |
\usage{ |
14 | 12 |
|
15 | 13 |
rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, |
16 |
- wt_is_1 = TRUE, log = FALSE) |
|
14 |
+ wt_is_1 = TRUE, log = FALSE, min_accessible_genotypes = 0, |
|
15 |
+ accessible_th = 0) |
|
17 | 16 |
} |
18 | 17 |
|
19 | 18 |
|
... | ... |
@@ -48,7 +47,34 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, |
48 | 47 |
|
49 | 48 |
|
50 | 49 |
\item{log}{If TRUE, log-transform fitness.} |
50 |
+ |
|
51 |
+\item{min_accessible_genotypes}{If larger than 0, the minimum number of |
|
52 |
+ accessible genotypes in the fitness landscape. A genotype is |
|
53 |
+ considered accessible if you can reach it from the wildtype by going |
|
54 |
+ through at least one path where all changes in fitness are larger than or |
|
55 |
+ equal to \code{accessible_th}. The changes in fitness are considered |
|
56 |
+ at each mutational step, i.e., at each addition of one mutation we |
|
57 |
+ compute the difference between the genotype with \code{k + 1} |
|
58 |
+ mutations minus the ancestor genotype with \code{k} mutations. Thus, a |
|
59 |
+ genotype is considered accessible if there is at least one path where |
|
60 |
+ fitness increases at each mutational step by at least |
|
61 |
+ \code{accessible_th}. |
|
62 |
+ |
|
63 |
+ If the condition is not satisfied, we continue generating random |
|
64 |
+ fitness landscapes with the specified parameters until the condition |
|
65 |
+ is satisfied. |
|
51 | 66 |
} |
67 |
+ |
|
68 |
+\item{accessible_th}{The threshold for the minimal change in fitness at |
|
69 |
+ each mutation step (i.e., between successive genotypes) that allows a |
|
70 |
+ genotype to be regarded as accessible. This only applies if |
|
71 |
+ \code{min_accessible_genotypes} is larger than 0. So if you want to |
|
72 |
+ allow small decreases in fitness in successive steps, use a small |
|
73 |
+ negative value for \code{accessible_th}. } |
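The accessibility rule described in these two arguments can be sketched in base R. `count_accessible` is a hypothetical helper, not the package's implementation. It assumes the input holds all 2^g genotypes as 0/1 columns followed by a fitness column, and it does not count the wildtype itself.

```r
# Sketch: count genotypes accessible from the wildtype.
# count_accessible is a hypothetical helper, not the package's code.
# fm: 0/1 genotype matrix with a final fitness column (all 2^g genotypes).
count_accessible <- function(fm, accessible_th = 0) {
  g <- ncol(fm) - 1
  geno <- fm[, seq_len(g), drop = FALSE]
  fitness <- fm[, g + 1]
  key <- apply(geno, 1, paste, collapse = "")   # e.g. "101"
  names(fitness) <- key
  acc <- setNames(rep(FALSE, length(key)), key)
  acc[paste(rep(0, g), collapse = "")] <- TRUE  # the wildtype is the start
  # Visit genotypes by increasing number of mutations, so that every
  # parent has been classified before its children are examined
  for (i in order(rowSums(geno))) {
    child <- geno[i, ]
    for (j in which(child == 1)) {
      parent <- child
      parent[j] <- 0                            # remove one mutation
      pk <- paste(parent, collapse = "")
      if (acc[[pk]] && fitness[[key[i]]] - fitness[[pk]] >= accessible_th) {
        acc[[key[i]]] <- TRUE
        break
      }
    }
  }
  sum(acc) - 1                                  # do not count the wildtype
}

fm <- cbind(A = c(0, 1, 0, 1), B = c(0, 0, 1, 1),
            Fitness = c(1, 1.2, 0.8, 1.5))
count_accessible(fm)                          # 2: A and AB (via A)
count_accessible(fm, accessible_th = -0.25)   # 3: B is now reachable too
```

A landscape generator could call such a function in a loop, redrawing the landscape until the count reaches `min_accessible_genotypes`.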
|
74 |
+ |
|
75 |
+} |
|
76 |
+ |
|
77 |
+ |
|
52 | 78 |
\details{ |
53 | 79 |
|
54 | 80 |
The model used here follows the Rough Mount Fuji model in Szendro et |
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/OncoSimulR@118909 bc3139a8-67e5-0310-9ffc-ced21a209358
1 | 1 |
new file mode 100644 |
... | ... |
@@ -0,0 +1,114 @@ |
1 |
+\name{rfitness} |
|
2 |
+\alias{rfitness} |
|
3 |
+ |
|
4 |
+ |
|
5 |
+\title{Generate random fitness.} |
|
6 |
+ |
|
7 |
+\description{ |
|
8 |
+ Generate random fitness under a House of Cards, Rough Mount Fuji, or |
|
9 |
+ additive model. |
|
10 |
+} |
|
11 |
+ |
|
12 |
+ |
|
13 |
+\usage{ |
|
14 |
+ |
|
15 |
+rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, |
|
16 |
+ wt_is_1 = TRUE, log = FALSE) |
|
17 |
+} |
|
18 |
+ |
|
19 |
+ |
|
20 |
+ |
|
21 |
+ |
|
22 |
+\arguments{ |
|
23 |
+ |
|
24 |
+ \item{g}{Number of genes.} |
|
25 |
+ |
|
26 |
+ \item{c}{The decrease in fitness of a genotype per each unit increase |
|
27 |
+ in Hamming distance from the reference genotype (\code{reference}).} |
|
28 |
+ |
|
29 |
+ \item{sd}{The standard deviation of the random component (a normal |
|
30 |
+ distribution of mean 0 and standard deviation \code{sd}).} |
|
31 |
+ |
|
32 |
+\item{reference}{The reference genotype: for the deterministic, additive |
|
33 |
+ part, this is the genotype with maximal fitness, and all other |
|
34 |
+ genotypes decrease their fitness by \code{c} for every unit of Hamming |
|
35 |
+ distance from this reference. If "random" a genotype will be randomly |
|
36 |
+ chosen as the reference. If "max" the genotype with all positions |
|
37 |
+ mutated will be chosen as the reference. If you pass a vector (e.g., |
|
38 |
+ \code{fittest = c(1, 0, 1, 0)}) that will be the reference genotype.} |
|
39 |
+ |
|
40 |
+\item{scale}{Either NULL (nothing is done) or a two-element vector. If a |
|
41 |
+ two-element vector, fitness is re-scaled between \code{scale[1]} (the |
|
42 |
+ minimum) and \code{scale[2]} (the maximum).} |
|
43 |
+ |
|
44 |
+\item{wt_is_1}{If TRUE, fitness will be scaled so that the wildtype (the |
|
45 |
+ genotype with no mutations) has fitness of 1. This is applied after |
|
46 |
+ \code{scale}, so if you specify both it is most likely that the final |
|
47 |
+ fitness will not respect the limits in \code{scale}.} |
|
48 |
+ |
|
49 |
+ |
|
50 |
+\item{log}{If TRUE, log-transform fitness.} |
|
51 |
+} |
|
52 |
+\details{ |
|
53 |
+ |
|
54 |
+ The model used here follows the Rough Mount Fuji model in Szendro et |
|
55 |
+ al., 2013 or Franke et al., 2011. Fitness is given as |
|
56 |
+ |
|
57 |
+ \deqn{f(i) = -c d(i, reference) + x_i} |
|
58 |
+ |
|
59 |
+ where \eqn{d(i, j)} is the Hamming distance between genotypes \eqn{i} |
|
60 |
+ and \eqn{j} (the number of positions that differ) and \eqn{x_i} is a |
|
61 |
+ random variable (in this case, a normal deviate of mean 0 and standard |
|
62 |
+ deviation \code{sd}). |
|
63 |
+ |
|
64 |
+ Setting \eqn{c = 0} we obtain a House of Cards model. Setting \eqn{sd |
|
65 |
+ = 0} fitness is given by the distance from the reference and if the |
|
66 |
+ reference is the genotype with all positions mutated, then we have a |
|
67 |
+ fully additive model (fitness increases linearly with the number of |
|
68 |
+ positions mutated). |
|
69 |
+} |
|
70 |
+ |
|
71 |
+\value{ |
|
72 |
+ |
|
73 |
+ A matrix with \code{g + 1} columns. Each column corresponds to a |
|
74 |
+ gene, except the last one that corresponds to fitness. 1/0 in a gene |
|
75 |
+ column denotes gene mutated/not-mutated. (For ease of use in other |
|
76 |
+ functions, this matrix has class "genotype_fitness_matrix".) |
|
77 |
+ |
|
78 |
+} |
|
79 |
+\references{ |
|
80 |
+ |
|
81 |
+ Szendro I.~G. et al. (2013). Quantitative analyses of empirical |
|
82 |
+fitness landscapes. \emph{Journal of Statistical Mechanics: Theory and |
|
83 |
+ Experiment\/}, \bold{01}, P01005. |
|
84 |
+ |
|
85 |
+Franke, J. et al. (2011). Evolutionary accessibility of mutational |
|
86 |
+pathways. \emph{PLoS Computational Biology\/}, \bold{7}(8), 1--9. |
|
87 |
+ |
|
88 |
+} |
|
89 |
+ |
|
90 |
+\author{ Ramon Diaz-Uriarte |
|
91 |
+} |
|
92 |
+ |
|
93 |
+\seealso{ |
|
94 |
+ |
|
95 |
+ \code{\link{oncoSimulIndiv}}, |
|
96 |
+ \code{\link{plot.genotype_fitness_matrix}}, |
|
97 |
+ \code{\link{evalAllGenotypes}}, |
|
98 |
+ \code{\link{allFitnessEffects}}, |
|
99 |
+ \code{\link{plotFitnessLandscape}} |
|
100 |