...  ... 
@@ 167,7 +167,9 @@ additive models.} 
167  167 
routines can have trouble (especially if you log) with values <= 0. Or
168  168 
we might have trouble if we want to log the fitness. This is done 
169  169 
after possibly taking logs. Noise is added to prevent creating several 
170 
 identical minimal fitness values.} 

170 
+ identical minimal fitness values. Note that \code{\link{allFitnessEffects}} will remove from the table 

171 
+ of genotypes any genotype with a fitness <= 1e-9, thus

172 
+ making it a nonviable genotype during simulations. } 

171  173 

172  174 
\item{K}{K for NK model; K is the number of loci with which each locus 
173  175 
interacts, and the larger the K the larger the ruggedness of the 
...  ... 
@@ 288,6 +290,9 @@ Optimum model component.} 
288  290 
of the data can be large, especially if \code{g} (the number of genes)
289  291 
is large. 
290  292 

293 
+ Note that \code{\link{allFitnessEffects}} will remove from the table 

294 
+ of genotypes any genotype with a fitness <= 1e-9, thus

295 
+ making it a nonviable genotype during simulations. 

291  296 

292  297 
} 
293  298 

...  ... 
@@ 115,7 +115,7 @@ additive models.} 
115  115 
This option has no effect if you pass a three-element vector for
116  116 
\code{scale}. Using a three-element vector for \code{scale} is
117  117 
probably the most natural way of changing the scale and range of 
118 
 fitness while setting the wildtype to value of your choice. 

118 
+ fitness while setting the wildtype to a value of your choice. 

119  119 

120  120 
} 
121  121 

...  ... 
@@ 55,9 +55,36 @@ additive models.} 
55  55 
genotypes with that number of mutations have equal probability of 
56  56 
being the reference). } 
57  57 

58 
\item{scale}{Either NULL (nothing is done) or a two-element vector. If a

59 
 two-element vector, fitness is rescaled between \code{scale[1]} (the

60 
 minimum) and \code{scale[2]} (the maximum).} 

58 
+\item{scale}{Either NULL (nothing is done) or a two- or three-element

59 
+ vector. 

60 
+ 

61 
+ If a two-element vector, fitness is rescaled between

62 
+ \code{scale[1]} (the minimum) and \code{scale[2]} (the maximum) and, 

63 
+ later, if you have selected it, \code{wt_is_1} will be enforced. 

64 
+ 

65 
+ If you pass a three-element vector, fitness is rescaled so that the

66 
+ new maximum fitness is \code{scale[1]}, the new minimum is 

67 
+ \code{scale[2]} and the new wildtype is \code{scale[3]}. If you pass a 

68 
+ three-element vector, none of the \code{wt_is_1} options apply in this

69 
+ case, to ensure you obtain the range you want. If you want the 

70 
+ wildtype to be one, pass it as the third element of the vector. 

71 
+ 

72 
+ As a consequence of using a three-element vector, the amount of

73 
+ stretching/compressing (i.e., scaling) of fitness values larger than 

74 
+ that of the wildtype will likely be different from the scaling of 

75 
+ fitness values smaller than that of the wildtype. In other words, 

76 
+ this argument allows you to change the spread of the positive and 

77 
+ negative fitness values (and you can make this difference extreme and 

78 
+ make most fitness values less than wildtype be 0 by using a

79
+ negative number that is huge in absolute value for \code{scale[2]} if you

80
+ then truncate at 0; see \code{truncate_at_0}).

81 
+ 

82 
+ Using a three-element vector is probably the most natural way of

83 
+ changing the scale and range of fitness. 

84 
+ 

85 
+ See also \code{log} if you want the log-transformed values to respect

86 
+ the scale. 

87 
+} 
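As a rough illustration of the two- and three-element forms of scale, here is a minimal base-R sketch. The helper rescale_fitness() is hypothetical (it is not part of OncoSimulR). It assumes you know the index of the wildtype in the fitness vector, and it reads the three-element case as a piecewise-linear map with one segment above and one below the wildtype. The package's own code may differ in details.

```r
# Hypothetical helper illustrating the `scale` argument; not OncoSimulR code.
# f: numeric fitness vector; wt: index of the wildtype in f.
rescale_fitness <- function(f, wt, scale = NULL) {
  if (is.null(scale)) return(f)                  # NULL: nothing is done
  if (length(scale) == 2) {
    # Two elements: linear map of [min(f), max(f)] onto [scale[1], scale[2]]
    return(scale[1] + (f - min(f)) * (scale[2] - scale[1]) / (max(f) - min(f)))
  }
  stopifnot(length(scale) == 3)
  new_max <- scale[1]; new_min <- scale[2]; new_wt <- scale[3]
  f_wt <- f[wt]
  out <- f
  above <- f >= f_wt
  below <- f < f_wt
  # Segment 1: [f_wt, max(f)] -> [new_wt, new_max]
  if (max(f) > f_wt) {
    out[above] <- new_wt + (f[above] - f_wt) * (new_max - new_wt) / (max(f) - f_wt)
  } else {
    out[above] <- new_wt
  }
  # Segment 2: [min(f), f_wt) -> [new_min, new_wt), with its own slope
  if (any(below)) {
    out[below] <- new_wt - (f_wt - f[below]) * (new_wt - new_min) / (f_wt - min(f))
  }
  out
}

f <- c(1, 0.5, 2, 3)                              # wildtype first
rescale_fitness(f, wt = 1, scale = c(0, 10))      # two-element: min 0, max 10
rescale_fitness(f, wt = 1, scale = c(9, 0, 1))    # three-element: max 9, min 0, wt 1
```

With scale = c(9, 0, 1), values above the wildtype are stretched by a factor of 4, but the value below it is stretched only by a factor of 2. This is the asymmetric scaling of values above and below the wildtype that the text describes.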

61  88 

62  89 
\item{wt_is_1}{If "divide" the fitness of all genotypes is 
63  90 
divided by the fitness of the wildtype (after possibly adding a value 
...  ... 
@@ 83,14 +110,28 @@ additive models.} 
83  110 
option can easily lead to landscapes with no accessible genotypes 
84  111 
(even if you also use \code{scale}). 
85  112 

86 
 If "no", the fitness of the wildtype is not modified. } 

113 
+ If "no", the fitness of the wildtype is not modified. 

114 
+ 

115 
+ This option has no effect if you pass a three-element vector for

116 
+ \code{scale}. Using a three-element vector for \code{scale} is

117 
+ probably the most natural way of changing the scale and range of 

118 
+ fitness while setting the wildtype to a value of your choice.

119 
+ 

120 
+} 
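The four wt_is_1 modes can be compared side by side with the following base-R sketch. apply_wt_is_1() is a hypothetical helper, not OncoSimulR's function. For "divide", it omits the preliminary shift that the text mentions for negative fitness values, because the size of that shift is not given here.

```r
# Hypothetical helper illustrating the four wt_is_1 modes; not OncoSimulR code.
# f: numeric fitness vector; wt: index of the wildtype in f.
apply_wt_is_1 <- function(f, wt, mode = c("subtract", "divide", "force", "no")) {
  mode <- match.arg(mode)                # default is the first choice, "subtract"
  switch(mode,
         subtract = f - f[wt] + 1,       # shift every value; differences preserved
         divide   = f / f[wt],           # rescale every value; ratios preserved
         force    = replace(f, wt, 1),   # only the wildtype changes
         no       = f)                   # nothing changes
}

f <- c(2, 1, 4)  # wildtype first
sapply(c("subtract", "divide", "force", "no"),
       function(m) apply_wt_is_1(f, wt = 1, mode = m))
```

The "force" column shows why that option can easily remove accessible genotypes: the wildtype moves while its neighbours stay where they were.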

87  121 

88  122 

89  123 
\item{log}{If TRUE, log-transform fitness. Actually, there are two
90  124 
cases: if \code{wt_is_1 = "no"} we simply log the fitness values; 
91  125 
otherwise, we log the fitness values and add a 1, thus shifting all 
92  126 
fitness values, because by decree the fitness (birth rate) of the 
93 
 wildtype must be 1.} 

127 
+ wildtype must be 1. 

128 
+ 

129 
+ If you pass a three-element vector for \code{scale}, you will want to pass

130 
+ \code{exp(desired_max)}, \code{exp(desired_min)}, and 

131 
+ \code{exp(desired_wildtype)} to the \code{scale} argument. (We first 

132 
+ scale values in the original scale and then log them). In this case, 

133 
+ we ignore whatever you passed as \code{wt_is_1}, setting \code{wt_is_1 

134 
+ = "no"} to avoid modifying your requested value for the wildtype.} 
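Below is a minimal sketch of the two log cases. It also shows why passing exp() of the desired values works with a three-element scale. log_fitness() is hypothetical. The sketch assumes, as the text describes, that the three requested values are hit exactly on the original scale before logging.

```r
# Hypothetical helper illustrating the two cases of `log`; not OncoSimulR code.
log_fitness <- function(f, wt_is_1 = "subtract") {
  if (identical(wt_is_1, "no")) log(f) else log(f) + 1
}

f <- c(1, 0.5, 2)           # wildtype first, already at 1
log_fitness(f)              # wildtype becomes log(1) + 1 = 1
log_fitness(f, "no")        # wildtype becomes log(1) = 0

# Three-element scale: pass exp() of the values wanted on the log scale.
# With wt_is_1 = "no", logging the rescaled max, min and wildtype recovers them.
desired <- c(max = 0.4, min = -3, wt = 0)
log_fitness(exp(desired), "no")
```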

94  135 

95  136 
\item{min_accessible_genotypes}{If not NULL, the minimum number of 
96  137 
accessible genotypes in the fitness landscape. A genotype is 
...  ... 
@@ 314,11 +314,6 @@ MAGELLAN web site: \url{http://wwwabi.snv.jussieu.fr/public/Magellan/} 
314  314 
## plotting and simulating an oncogenetic trajectory 
315  315 

316  316 

317 
r1 <- rfitness(4)

318 
plot(r1) 

319 
oncoSimulIndiv(allFitnessEffects(genotFitness = r1)) 

320 
 

321 
 

322  317 
## NK model 
323  318 
rnk <- rfitness(5, K = 3, model = "NK")
324  319 
plot(rnk) 
...  ... 
@@ 328,6 +323,8 @@ oncoSimulIndiv(allFitnessEffects(genotFitness = rnk)) 
328  323 
radd <- rfitness(4, model = "Additive", mu = 0.2, sd = 0.5)
329  324 
plot(radd) 
330  325 

326 
+ 

327 
+\dontrun{ 

331  328 
## Eggbox model 
332  329 
regg = rfitness(g=4,model="Eggbox", e = 2, E=2.4) 
333  330 
plot(regg) 
...  ... 
@@ 342,7 +339,8 @@ plot(ris) 
342  339 
rfull = rfitness(g=4, model="Full", i = 0.002, I=2, 
343  340 
K = 2, r = TRUE, 
344  341 
p = 0.2, P = 0.3, o = 0.3, O = 1) 
345 
plot(rfull) 

342 
+ plot(rfull) 

343 
+ } 

346  344 
} 
347  345 
\keyword{ datagen } 
348  346 

...  ... 
@@ 1,11 +1,12 @@ 
1  1 
\name{rfitness} 
2  2 
\alias{rfitness} 
3 
 

3 
+\encoding{UTF-8}

4  4 

5  5 
\title{Generate random fitness.} 
6  6 

7  7 
\description{ Generate random fitness landscapes under a House of Cards, 
8 
 Rough Mount Fuji, additive model, and Kauffman's NK model. } 

8 
+ Rough Mount Fuji (RMF), additive (multiplicative) model, Kauffman's NK 

9 
+ model, Ising model, Eggbox model, and Full model.}

9  10 

10  11 

11  12 
\usage{ 
...  ... 
@@ 14,7 +15,9 @@ rfitness(g, c = 0.5, sd = 1, mu = 1, reference = "random", scale = NULL, 
14  15 
wt_is_1 = c("subtract", "divide", "force", "no"), 
15  16 
log = FALSE, min_accessible_genotypes = NULL, 
16  17 
accessible_th = 0, truncate_at_0 = TRUE, 
17 
 K = 1, r = TRUE, model = c("RMF", "NK")) 

18 
+ K = 1, r = TRUE, i = 0, I = 1, circular = FALSE, e = 0, E = 1, 

19 
+ H = 1, s = 0.1, S = 1, d = 0, o = 0, O = 1, p = 0, P = 1, 

20 
+ model = c("RMF", "Additive", "NK", "Ising", "Eggbox", "Full")) 

18  21 
} 
19  22 

20  23 

...  ... 
@@ 25,28 +28,32 @@ rfitness(g, c = 0.5, sd = 1, mu = 1, reference = "random", scale = NULL, 
25  28 
\item{g}{Number of genes.} 
26  29 

27  30 
\item{c}{The decrease in fitness of a genotype for each unit increase
28 
 in Hamming distance from the reference genotype (see \code{reference}).} 

31 
+ in Hamming distance from the reference genotype for the RMF model 

32 
+ (see \code{reference}).} 

29  33 

30  34 
\item{sd}{The standard deviation of the random component (a normal 
31 
 distribution of mean \code{mu} and standard deviation \code{sd}).} 

35 
+ distribution of mean \code{mu} and standard deviation \code{sd}) for 

36 
+ the RMF and additive models.}

32  37 

33  38 
\item{mu}{The mean of the random component (a normal distribution of 
34 
mean \code{mu} and standard deviation \code{sd}).} 

35 
 

36 
 

37 
\item{reference}{The reference genotype: for the deterministic, additive 

38 
 part, this is the genotype with maximal fitness, and all other 

39 
 genotypes decrease their fitness by \code{c} for every unit of Hamming 

40 
 distance from this reference. If "random" a genotype will be randomly 

41 
 chosen as the reference. If "max" the genotype with all positions 

42 
 mutated will be chosen as the reference. If you pass a vector (e.g., 

43 
 \code{reference = c(1, 0, 1, 0)}) that will be the reference genotype. 

44 
 If "random2" a genotype will be randomly chosen as the reference. In 

45 
 contrast to "random", however, not all genotypes have the same 

46 
 probability of being chosen; here, what is equal is the probability 

47 
 that the reference genotype has 1, 2, ..., g, mutations (and, once a 

48 
 number mutations is chosen, all genotypes with that number of 

49 
 mutations have equal probability of being the reference). } 

39 
+mean \code{mu} and standard deviation \code{sd}) for the RMF and 

40 
+additive models.} 

41 
+ 

42 
+ 

43 
+\item{reference}{The reference genotype: in the RMF model, for the 

44 
+ deterministic, additive part, this is the genotype with maximal 

45 
+ fitness, and all other genotypes decrease their fitness by \code{c} 

46 
+ for every unit of Hamming distance from this reference. If "random" a 

47 
+ genotype will be randomly chosen as the reference. If "max" the 

48 
+ genotype with all positions mutated will be chosen as the 

49 
+ reference. If you pass a vector (e.g., \code{reference = c(1, 0, 1, 

50 
+ 0)}) that will be the reference genotype. If "random2" a genotype 

51 
+ will be randomly chosen as the reference. In contrast to "random", 

52 
+ however, not all genotypes have the same probability of being chosen; 

53 
+ here, what is equal is the probability that the reference genotype has 

54 
+ 1, 2, ..., g mutations (and, once a number of mutations is chosen, all

55 
+ genotypes with that number of mutations have equal probability of 

56 
+ being the reference). } 

50  57 

51  58 
\item{scale}{Either NULL (nothing is done) or a two-element vector. If a
52  59 
two-element vector, fitness is rescaled between \code{scale[1]} (the
...  ... 
@@ 127,9 +134,45 @@ mean \code{mu} and standard deviation \code{sd}).} 
127  134 

128  135 
\item{r}{For the NK model, whether interacting loci are chosen at random 
129  136 
(\code{r = TRUE}) or are neighbors (\code{r = FALSE}).} 
137 
+\item{i}{For the Ising model, i is the mean cost of an incompatibility, by which

138
+ the genotype's fitness is penalized when, of two adjacent genes, only one of

139 
+ them is mutated.} 

140 
+ 

141 
+\item{I}{For the Ising model, I is the standard deviation of the

142
+ incompatibility cost (i).}

143 
+ 

144 
+\item{circular}{For the Ising model, whether there is a circular arrangement, 

145 
+ where the last and the first genes are adjacent to each other.} 

146 
+ 

147 
+\item{e}{For the Eggbox model, mean effect in fitness for the neighbor 

148 
+ locus (+/- e).}

149 
+ 

150 
+\item{E}{For the Eggbox model, noise added to the mean effect in fitness (e).} 

151 
+ 

152 
+\item{H}{For Full models, standard deviation for the House of Cards model.} 

153 
+ 

154 
+\item{s}{For Full models, mean of the fitness for the Multiplicative model.} 

155 
+ 

156 
+\item{S}{For Full models, standard deviation for the Multiplicative model.} 

157 
+ 

158 
+\item{d}{For Full models, a diminishing (negative) or increasing

159 
+ (positive) return as the peak is approached for the multiplicative model.}

160 
+ 

161 
+\item{o}{For Full models, mean value for the optimum model.} 

162 
+ 

163 
+\item{O}{For Full models, standard deviation for the optimum model.} 

130  164 

131 
\item{model}{One of "RMF" (default), for Rough Mount Fuji, or "NK", for 

132 
 Kauffman's NK model.} 

165 
+\item{p}{For Full models, the mean production value for each nonzero

166 
+ allele in the Optimum model component.} 

167 
+ 

168 
+\item{P}{For Full models, the associated standard deviation (of nonzero alleles) in the

169 
+Optimum model component.} 

170 
+ 

171 
+ 

172 
+ 

173 
+\item{model}{One of "RMF" (default) for Rough Mount Fuji, "Additive" for 

174 
+ Additive model, "NK" for Kauffman's NK model, "Ising" for Ising model,

175 
+ "Eggbox" for Eggbox model or "Full" for Full models.} 

133  176 
} 
134  177 

135  178 

...  ... 
@@ 146,14 +189,56 @@ mean \code{mu} and standard deviation \code{sd}).} 
146  189 
random variable (in this case, a normal deviate of mean \code{mu} 
147  190 
and standard deviation \code{sd}). 
148  191 

149 
 Setting \eqn{c = 0} we obtain a House of Cards model. Setting \eqn{sd 

150 
 = 0} fitness is given by the distance from the reference and if the 

151 
 reference is the genotype with all positions mutated, then we have a 

152 
 fully additive model (fitness increases linearly with the number of 

153 
 positions mutated). 

192 
+ When using \code{model = "RMF"}, setting \eqn{c = 0} we obtain a House 

193 
+ of Cards model. Setting \eqn{sd = 0} fitness is given by the 

194 
+ distance from the reference and if the reference is the genotype 

195 
+ with all positions mutated, then we have a fully additive model 

196 
+ (fitness increases linearly with the number of positions mutated), 

197 
+ where all mutations have the same effect. 
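The following base-R sketch shows the RMF formula f(i) = -c d(i, reference) + x_i and its two special cases. rmf_sketch() and all_genotypes() are illustrative helpers, not OncoSimulR functions. For simplicity, the reference defaults to the all-mutated genotype (the one that "max" selects) rather than a random one.

```r
# Sketch of f(i) = -c d(i, reference) + x_i with x_i ~ N(mu, sd).
# Illustrative only; not the OncoSimulR implementation.
all_genotypes <- function(g) {
  # 2^g x g matrix of 0/1; row 1 (all zeros) is the wildtype
  as.matrix(expand.grid(rep(list(0:1), g)))
}

rmf_sketch <- function(g, c = 0.5, mu = 1, sd = 1, reference = rep(1, g)) {
  G <- all_genotypes(g)
  d <- rowSums(sweep(G, 2, reference) != 0)   # Hamming distance to reference
  -c * d + rnorm(nrow(G), mean = mu, sd = sd)
}

set.seed(1)
rmf_sketch(3)            # Rough Mount Fuji
rmf_sketch(3, c = 0)     # c = 0: House of Cards
rmf_sketch(3, sd = 0)    # sd = 0: fitness linear in the number of mutations
```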

198 
+ 

199 
+ More flexible additive models can be obtained with \code{model =

200 
+ "Additive"}. This model is like the Rough Mount Fuji model in Szendro 

201 
+ et al., 2013 or Franke et al., 2011, but in this case, each locus can 

202 
+ have different contributions to the fitness evaluation. This model is 

203 
+ also referred to as the "multiplicative" model in the literature as it 

204 
+ is additive on the log scale (e.g., see Brouillet et al., 2015 or

205 
+ Ferretti et al., 2016). The contribution of each mutated allele to the 

206 
+ log-fitness is a random deviate from a Normal distribution with

207 
+ specified mean \code{mu} and standard deviation \code{sd}, and the 

208 
+ log-fitness of a genotype is the sum of the contributions of each

209 
+ mutated allele. There is no "reference" genotype in the Additive 

210 
+ model. There is no epistasis in the additive model because the effect

211 
+ of a mutation in a locus does not depend on the genetic background, or 

212 
+ whether the rest of the loci are mutated or not. 
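Here is a short base-R sketch of the Additive model as described: each locus gets one N(mu, sd) effect, and the effects of the mutated loci are summed on the log scale. additive_sketch() is hypothetical. The sketch stops at log-fitness because the text does not say how rfitness reports values on the natural scale.

```r
# Sketch of the additive (multiplicative) model: one effect per locus drawn
# from N(mu, sd); a genotype's log-fitness is the sum over its mutated loci.
# Illustrative only; not OncoSimulR code.
additive_sketch <- function(g, mu = 0.2, sd = 0.5) {
  G <- as.matrix(expand.grid(rep(list(0:1), g)))   # row 1 is the wildtype
  effects <- rnorm(g, mean = mu, sd = sd)
  list(genotypes = G, effects = effects,
       log_fitness = drop(G %*% effects))
}

set.seed(2)
a <- additive_sketch(2)   # rows: wildtype, A, B, AB
# No epistasis: the double mutant adds the single effects on the log scale
# (difference is zero up to rounding), i.e. multiplies them on exp() scale.
a$log_fitness[4] - (a$log_fitness[2] + a$log_fitness[3])
exp(a$log_fitness[4]) / (exp(a$log_fitness[2]) * exp(a$log_fitness[3]))
```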

213 
+ 

214 
+ 

215 
+ When using \code{model = "NK"} fitness is drawn from a uniform (0, 1) 

216 
+ distribution. 
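For readers unfamiliar with NK landscapes, below is a textbook-style NK sketch in base R. It uses neighbouring interactions (the analogue of r = FALSE), with wrap-around at the ends. It is an illustration built on these assumptions, not MAGELLAN's fl_generate, whose implementation details may differ.

```r
# Textbook NK landscape with neighbouring interactions (akin to r = FALSE).
# Illustrative only; rfitness itself calls MAGELLAN's fl_generate.
nk_sketch <- function(g, K = 1) {
  stopifnot(K >= 0, K < g)
  G <- as.matrix(expand.grid(rep(list(0:1), g)))   # row 1 is the wildtype
  # One lookup table per locus: 2^(K + 1) uniform(0, 1) contributions,
  # indexed by the states of the locus and its K neighbours.
  tables <- matrix(runif(g * 2^(K + 1)), nrow = g)
  fitness <- apply(G, 1, function(x) {
    contrib <- vapply(seq_len(g), function(i) {
      loci <- ((i - 1 + 0:K) %% g) + 1      # locus i and the next K (wrapping)
      idx <- sum(x[loci] * 2^(0:K)) + 1     # column of the lookup table
      tables[i, idx]
    }, numeric(1))
    mean(contrib)                           # fitness = mean contribution
  })
  list(genotypes = G, fitness = fitness)
}

set.seed(3)
nk <- nk_sketch(5, K = 3)
range(nk$fitness)   # always inside (0, 1)
```

Larger K makes each locus's contribution depend on more of the genome, which is what makes the landscape more rugged.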

217 
+ 

218 
+ 

219 
+ When using \code{model = "Ising"}, for each pair of interacting loci

220
+ there is an associated cost if the two alleles are not identical

221
+ (i.e., if they are not 'compatible').
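The following is a minimal sketch of the Ising-type penalty. Each adjacent pair of loci carries a cost drawn from N(i, I), and a genotype pays that cost whenever exactly one locus of the pair is mutated. ising_penalty() is hypothetical. Using fitness = -penalty is an assumption, because the text does not describe how MAGELLAN turns the penalty into fitness.

```r
# Sketch of the Ising-type penalty: every adjacent pair of loci whose states
# differ (exactly one of the two mutated) pays that pair's cost.
# Illustrative only; fitness = -penalty below is an assumption.
ising_penalty <- function(G, costs, circular = FALSE) {
  g <- ncol(G)
  left <- seq_len(g - 1)
  right <- left + 1
  if (circular) {              # last and first loci are also adjacent
    left <- c(left, g)
    right <- c(right, 1)
  }
  stopifnot(length(costs) == length(left))
  apply(G, 1, function(x) sum(costs[x[left] != x[right]]))
}

set.seed(5)
G <- as.matrix(expand.grid(rep(list(0:1), 4)))
costs <- rnorm(3, mean = 0.002, sd = 2)   # like i = 0.002, I = 2
fitness <- -ising_penalty(G, costs)
fitness
```

With a large I relative to i, some drawn "costs" are negative, so certain incompatible pairs end up raising fitness in this sketch.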

222 
+ 

223 
+ 

224 
+ When using \code{model = "Eggbox"} each locus is either high or low fitness, 

225 
+ with a systematic change between neighbors.
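One possible reading of the Eggbox description is sketched below. It assumes that fitness flips between +e and -e with every added mutation, so any two neighbouring genotypes are on opposite sides. It also assumes that E is the standard deviation of added normal noise. eggbox_sketch() is hypothetical, and MAGELLAN's actual construction may differ.

```r
# Sketch of an eggbox landscape: high (+e) or low (-e) fitness by parity of
# the number of mutations, plus N(0, E) noise. Illustrative assumption only.
eggbox_sketch <- function(G, e = 2, E = 0) {
  n_mut <- rowSums(G)
  high_low <- ifelse(n_mut %% 2 == 0, 1, -1)   # flips with every single mutation
  high_low * e + rnorm(nrow(G), mean = 0, sd = E)
}

G <- as.matrix(expand.grid(rep(list(0:1), 3)))
set.seed(6)
eggbox_sketch(G, e = 2, E = 2.4)
```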

226 
+ 

227 
+ 

228 
+ When using \code{model = "Full"}, the fitness is computed with different 

229 
+ parts of the previous models depending on the chosen parameters described

230 
+ above. 

231 
+ 

232 
+ 

233 
+ For \code{model} equal to "NK", "Ising", "Eggbox", or "Full", the fitness

234 
+ landscape is generated by directly calling the \code{fl_generate} 

235 
+ function of MAGELLAN 

236 
+ (\url{http://www-abi.snv.jussieu.fr/public/Magellan/}). See details in

237 
+ Ferretti et al., 2016, or Brouillet et al., 2015.

238 
+ 

154  239 

155  240 
For OncoSimulR, we often want the wildtype to have a mean of 
156 
 1. Reasonable settings are \code{mu = 1} and \code{wt_is_1 = 

241 
+ 1. Reasonable settings when using RMF are \code{mu = 1} and \code{wt_is_1 = 

157  242 
'subtract'} so that we simulate from a distribution centered in 1, and 
158  243 
we make sure afterwards (via a simple shift) that the wildtype is 
159  244 
actually 1. The \code{sd} controls the standard deviation, with the
...  ... 
@@ 162,14 +247,6 @@ mean \code{mu} and standard deviation \code{sd}).} 
162  247 
of the data can be large, especially if \code{g} (the number of genes)
163  248 
is large. 
164  249 

165 
 

166 
 When using \code{model = "NK"}, the model used is Kauffman's NK model 

167 
 (see details in Ferretti et al., or Brouillet et al., below), as 

168 
 implemented in MAGELLAN 

169 
 (\url{http://www-abi.snv.jussieu.fr/public/Magellan/}). This fitness

170 
 landscape is generated by directly calling the \code{fl_generate} 

171 
 function of MAGELLAN. Fitness is drawn from a uniform (0, 1) 

172 
 distribution. 

173  250 

174  251 
} 
175  252 

...  ... 
@@ 214,10 +291,12 @@ MAGELLAN web site: \url{http://wwwabi.snv.jussieu.fr/public/Magellan/} 
214  291 
} 
215  292 

216  293 
\author{ Ramon Diaz-Uriarte for the RMF and general wrapping
217 
code. S. Brouillet, G. Achaz, S. Matuszewski, H. Annoni, and L. Ferreti 

218 
for the MAGELLAN code. 

219 
 

220 
} 

294 
+ code. S. Brouillet, G. Achaz, S. Matuszewski, H. Annoni, and 

295 
+ L. Ferretti for the MAGELLAN code. Further contributions to the

296 
+ additive model and to wrapping MAGELLAN code and documentation from 

297 
+ Guillermo Gorines Cordero, Ivan Lorca Alonso, Francisco Muñoz Lopez,

298 
+ David Roncero Moroño, Alvaro Quevedo, Pablo Perez, Cristina Devesa,

299 
+ Alejandro Herrador.} 

221  300 

222  301 
\seealso{ 
223  302 

...  ... 
@@ 234,6 +313,7 @@ for the MAGELLAN code. 
234  313 
## Random fitness for four-gene genotypes,
235  314 
## plotting and simulating an oncogenetic trajectory 
236  315 

316 
+ 

237  317 
r1 <- rfitness(4)
238  318 
plot(r1) 
239  319 
oncoSimulIndiv(allFitnessEffects(genotFitness = r1)) 
...  ... 
@@ 243,7 +323,26 @@ oncoSimulIndiv(allFitnessEffects(genotFitness = r1)) 
243  323 
rnk <- rfitness(5, K = 3, model = "NK")
244  324 
plot(rnk) 
245  325 
oncoSimulIndiv(allFitnessEffects(genotFitness = rnk)) 
246 
} 

247  326 

327 
+## Additive model 

328 
+radd <- rfitness(4, model = "Additive", mu = 0.2, sd = 0.5)

329 
+plot(radd) 

330 
+ 

331 
+## Eggbox model 

332 
+regg = rfitness(g=4,model="Eggbox", e = 2, E=2.4) 

333 
+plot(regg) 

334 
+ 

335 
+ 

336 
+## Ising model 

337 
+ris = rfitness(g=4,model="Ising", i = 0.002, I=2) 

338 
+plot(ris) 

339 
+ 

340 
+ 

341 
+## Full model 

342 
+rfull = rfitness(g=4, model="Full", i = 0.002, I=2, 

343 
+ K = 2, r = TRUE, 

344 
+ p = 0.2, P = 0.3, o = 0.3, O = 1) 

345 
+plot(rfull) 

346 
+} 

248  347 
\keyword{ datagen } 
249  348 

...  ... 
@@ 76,10 +76,14 @@ mean \code{mu} and standard deviation \code{sd}).} 
76  76 
option can easily lead to landscapes with no accessible genotypes 
77  77 
(even if you also use \code{scale}). 
78  78 

79 
 If "none", the fitness of the wildtype is not touched. } 

79 
+ If "no", the fitness of the wildtype is not modified. } 

80  80 

81  81 

82 
\item{log}{If TRUE, log-transform fitness.}

82 
+\item{log}{If TRUE, log-transform fitness. Actually, there are two

83 
+ cases: if \code{wt_is_1 = "no"} we simply log the fitness values; 

84 
+ otherwise, we log the fitness values and add a 1, thus shifting all 

85 
+ fitness values, because by decree the fitness (birth rate) of the 

86 
+ wildtype must be 1.} 

83  87 

84  88 
\item{min_accessible_genotypes}{If not NULL, the minimum number of 
85  89 
accessible genotypes in the fitness landscape. A genotype is 
...  ... 
@@ 110,10 +114,12 @@ mean \code{mu} and standard deviation \code{sd}).} 
110  114 
negative value for \code{accessible_th}. } 
111  115 

112  116 
\item{truncate_at_0}{If TRUE (the default) any fitness <= 0 is 
113 
 substituted by a small positive constant (1e-9). Why? Because

114 
 MAGELLAN and some plotting routines can have trouble (specially if you 

115 
 log) with values <=0. Or we might have trouble if we want to log the 

116 
 fitness.} 

117 
+ substituted by a small positive constant (a random uniform number 

118 
+ between 1e-10 and 1e-9). Why? Because MAGELLAN and some plotting

119 
+ routines can have trouble (especially if you log) with values <= 0. Or

120 
+ we might have trouble if we want to log the fitness. This is done 

121 
+ after possibly taking logs. Noise is added to prevent creating several 

122 
+ identical minimal fitness values.} 
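The truncation step can be sketched in a few lines of base R. truncate_fitness() is a hypothetical helper, not OncoSimulR code. As the text says, it applies the replacement after an optional log. It uses independent uniform draws, so ties between the replaced values are very unlikely.

```r
# Sketch of truncate_at_0: values <= 0 (checked after any log transform)
# are replaced by independent draws from Uniform(1e-10, 1e-9), which makes
# ties between replaced values very unlikely. Illustrative only.
truncate_fitness <- function(f) {
  bad <- f <= 0
  f[bad] <- runif(sum(bad), min = 1e-10, max = 1e-9)
  f
}

set.seed(7)
truncate_fitness(log(c(1.5, 0.4, 1, 2)))   # log(0.4) < 0 and log(1) = 0
```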

117  123 

118  124 
\item{K}{K for NK model; K is the number of loci with which each locus 
119  125 
interacts, and the larger the K the larger the ruggedness of the 
...  ... 
@@ 5,7 +5,7 @@ 
5  5 
\title{Generate random fitness.} 
6  6 

7  7 
\description{ Generate random fitness landscapes under a House of Cards, 
8 
 Rough Mount Fuji, or additive model. } 

8 
+ Rough Mount Fuji, additive model, and Kauffman's NK model. } 

9  9 

10  10 

11  11 
\usage{ 
...  ... 
@@ 13,7 +13,8 @@ 
13  13 
rfitness(g, c = 0.5, sd = 1, mu = 1, reference = "random", scale = NULL, 
14  14 
wt_is_1 = c("subtract", "divide", "force", "no"), 
15  15 
log = FALSE, min_accessible_genotypes = NULL, 
16 
 accessible_th = 0, truncate_at_0 = TRUE) 

16 
+ accessible_th = 0, truncate_at_0 = TRUE, 

17 
+ K = 1, r = TRUE, model = c("RMF", "NK")) 

17  18 
} 
18  19 

19  20 

...  ... 
@@ 51,7 +52,7 @@ mean \code{mu} and standard deviation \code{sd}).} 
51  52 
two-element vector, fitness is rescaled between \code{scale[1]} (the
52  53 
minimum) and \code{scale[2]} (the maximum).} 
53  54 

54 
\item{wt_is_1}{If "divide" (the default) the fitness of all genotypes is 

55 
+\item{wt_is_1}{If "divide" the fitness of all genotypes is 

55  56 
divided by the fitness of the wildtype (after possibly adding a value 
56  57 
to ensure no negative fitness) so that the wildtype (the genotype with 
57  58 
no mutations) has fitness 1. This is a case of scaling, and it is 
...  ... 
@@ 60,7 +61,7 @@ mean \code{mu} and standard deviation \code{sd}).} 
60  61 
likely that the final fitness will not respect the limits in 
61  62 
\code{scale}. 
62  63 

63 
 If "subtract" we shift all the fitness values (subtracting fitness of 

64 
+ If "subtract" (the default) we shift all the fitness values (subtracting fitness of 

64  65 
the wildtype and adding 1) so that the wildtype ends up with a fitness 
65  66 
of 1. This is also applied after \code{scale}, so if you specify both 
66  67 
"wt_is_1 = 'subtract'" and use an argument for \code{scale} it is most 
...  ... 
@@ 114,13 +115,23 @@ mean \code{mu} and standard deviation \code{sd}).} 
114  115 
log) with values <=0. Or we might have trouble if we want to log the 
115  116 
fitness.} 
116  117 

118 
+\item{K}{K for NK model; K is the number of loci with which each locus 

119 
+ interacts, and the larger the K the larger the ruggedness of the 

120 
+ landscape.} 

121 
+ 

122 
+\item{r}{For the NK model, whether interacting loci are chosen at random 

123 
+ (\code{r = TRUE}) or are neighbors (\code{r = FALSE}).} 

124 
+ 

125 
+\item{model}{One of "RMF" (default), for Rough Mount Fuji, or "NK", for 

126 
+ Kauffman's NK model.} 

117  127 
} 
118  128 

119  129 

120  130 
\details{ 
121  131 

122 
 The model used here follows the Rough Mount Fuji model in Szendro et 

123 
 al., 2013 or Franke et al., 2011. Fitness is given as 

132 
+ When using \code{model = "RMF"}, the model used here follows 

133 
+ the Rough Mount Fuji model in Szendro et al., 2013 or Franke et al., 

134 
+ 2011. Fitness is given as 

124  135 

125  136 
\deqn{f(i) = -c d(i, reference) + x_i}
126  137 

...  ... 
@@ 144,6 +155,15 @@ mean \code{mu} and standard deviation \code{sd}).} 
144  155 
is different from zero. In this case, with \code{c} large, the range 
145  156 
of the data can be large, especially if \code{g} (the number of genes)
146  157 
is large. 
158 
+ 

159 
+ 

160 
+ When using \code{model = "NK"}, the model used is Kauffman's NK model 

161 
+ (see details in Ferretti et al., or Brouillet et al., below), as 

162 
+ implemented in MAGELLAN 

163 
+ (\url{http://www-abi.snv.jussieu.fr/public/Magellan/}). This fitness

164 
+ landscape is generated by directly calling the \code{fl_generate} 

165 
+ function of MAGELLAN. Fitness is drawn from a uniform (0, 1) 

166 
+ distribution. 

147  167 

148  168 
} 
149  169 

...  ... 
@@ 159,7 +179,12 @@ mean \code{mu} and standard deviation \code{sd}).} 
159  179 
\code{accessible_th} that show the number of accessible 
160  180 
genotypes under the specified threshold. 
161  181 
} 
162 
 

182 
+ 

183 
+ 

184 
+\note{MAGELLAN uses its own random number generating functions; using 

185 
+ \code{set.seed} does not allow you to obtain the same fitness landscape

186 
+ repeatedly.} 

187 
+ 

163  188 
\references{ 
164  189 

165  190 
Szendro I.~G. et al. (2013). Quantitative analyses of empirical 
...  ... 
@@ 169,9 +194,23 @@ fitness landscapes. \emph{Journal of Statistical Mehcanics: Theory and 
169  194 
Franke, J. et al. (2011). Evolutionary accessibility of mutational 
170  195 
pathways. \emph{PLoS Computational Biology\/}, \bold{7}(8), 1-9.
171  196 

197 
+Brouillet, S. et al. (2015). MAGELLAN: a tool to explore small fitness 

198 
+landscapes. \emph{bioRxiv}, 

199 
+\bold{31583}. \url{http://doi.org/10.1101/031583} 

200 
+ 

201 
+Ferretti, L., Schmiegelt, B., Weinreich, D., Yamauchi, A., Kobayashi, 

202 
+Y., Tajima, F., & Achaz, G. (2016). Measuring epistasis in fitness 

203 
+landscapes: The correlation of fitness effects of mutations. \emph{Journal of 

204 
+Theoretical Biology\/}, \bold{396}, 132-143. \url{https://doi.org/10.1016/j.jtbi.2016.01.037}

205 
+ 

206 
+MAGELLAN web site: \url{http://www-abi.snv.jussieu.fr/public/Magellan/}

207 
+ 

172  208 
} 
173  209 

174 
\author{ Ramon Diaz-Uriarte

210 
+\author{ Ramon Diaz-Uriarte for the RMF and general wrapping

211 
+code. S. Brouillet, G. Achaz, S. Matuszewski, H. Annoni, and L. Ferretti

212 
+for the MAGELLAN code. 

213 
+ 

175  214 
} 
176  215 

177  216 
\seealso{ 
...  ... 
@@ 192,6 +231,11 @@ r1 < rfitness(4) 
192  231 
plot(r1) 
193  232 
oncoSimulIndiv(allFitnessEffects(genotFitness = r1)) 
194  233 

234 
+ 

235 
+## NK model 

236 
+rnk <- rfitness(5, K = 3, model = "NK")

237 
+plot(rnk) 

238 
+oncoSimulIndiv(allFitnessEffects(genotFitness = rnk)) 

195  239 
} 
196  240 

197  241 
\keyword{ datagen } 
...  ... 
@@ 73,7 +73,7 @@ mean \code{mu} and standard deviation \code{sd}).} 
73  73 
it is up to you to make sure that the range of the scale argument 
74  74 
includes 1 or you might not get what you want). Note that using this 
75  75 
option can easily lead to landscapes with no accessible genotypes 
76 
 (unless you also use \code{scale}). 

76 
+ (even if you also use \code{scale}). 

77  77 

78  78 
If "none", the fitness of the wildtype is not touched. } 
79  79 

 Several improvements to rfitness.
 simOGraph using transitive reduction properly.
 Miscell documentation improvements.
 Updated citation to Bioinformatics paper.
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/OncoSimulR@126818 bc3139a8-67e5-0310-9ffc-ced21a209358
...  ... 
@@ 10,9 +10,10 @@ 
10  10 

11  11 
\usage{ 
12  12 

13 
rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, 

14 
 wt_is_1 = TRUE, log = FALSE, min_accessible_genotypes = 0, 

15 
 accessible_th = 0) 

13 
+rfitness(g, c = 0.5, sd = 1, mu = 1, reference = "random", scale = NULL, 

14 
+ wt_is_1 = c("subtract", "divide", "force", "no"), 

15 
+ log = FALSE, min_accessible_genotypes = NULL, 

16 
+ accessible_th = 0, truncate_at_0 = TRUE) 

16  17 
} 
17  18 

18  19 

...  ... 
@@ 26,7 +27,11 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, 
26  27 
in Hamming distance from the reference genotype (see \code{reference}).} 
27  28 

28  29 
\item{sd}{The standard deviation of the random component (a normal 
29 
 distribution of mean 0 and standard deviation \code{sd}).} 

30 
+ distribution of mean \code{mu} and standard deviation \code{sd}).} 

31 
+ 

32 
+\item{mu}{The mean of the random component (a normal distribution of 

33 
+mean \code{mu} and standard deviation \code{sd}).} 

34 
+ 

30  35 

31  36 
\item{reference}{The reference genotype: for the deterministic, additive 
32  37 
part, this is the genotype with maximal fitness, and all other 
...  ... 
@@ 46,15 +51,36 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, 
46  51 
two-element vector, fitness is rescaled between \code{scale[1]} (the
47  52 
minimum) and \code{scale[2]} (the maximum).} 
48  53 

49 
\item{wt_is_1}{If TRUE, fitness will be scaled so that the wildtype (the 

50 
 genotype with no mutations) has fitness of 1. This is applied after 

51 
 \code{scale}, so if you specify both it is most likely that the final 

52 
 fitness will not respect the limits in \code{scale}.} 

54 
+\item{wt_is_1}{If "divide" (the default) the fitness of all genotypes is 

55 
+ divided by the fitness of the wildtype (after possibly adding a value 

56 
+ to ensure no negative fitness) so that the wildtype (the genotype with 

57 
+ no mutations) has fitness 1. This is a case of scaling, and it is 

58 
+ applied after \code{scale}, so if you specify both 

59 
+ "wt_is_1 = 'divide'" and use an argument for \code{scale} it is most 

60 
+ likely that the final fitness will not respect the limits in 

61 
+ \code{scale}. 

62 
+ 

63 
+ If "subtract" we shift all the fitness values (subtracting fitness of 

64 
+ the wildtype and adding 1) so that the wildtype ends up with a fitness 

65 
+ of 1. This is also applied after \code{scale}, so if you specify both 

66 
+ "wt_is_1 = 'subtract'" and use an argument for \code{scale} it is most 

67 
+ likely that the final fitness will not respect the limits in 

68 
+ \code{scale} (though the distortion might be simpler to see as just a

69 
+ shift up or down). 

70 
+ 

71 
+ If "force" we simply set the fitness of the wildtype to 1, without any 

72 
+ divisions. This means that the \code{scale} argument would work (but 

73 
+ it is up to you to make sure that the range of the scale argument 

74 
+ includes 1 or you might not get what you want). Note that using this 

75 
+ option can easily lead to landscapes with no accessible genotypes 

76 
+ (unless you also use \code{scale}). 

77 
+ 

78 
+ If "none", the fitness of the wildtype is not touched. } 

53  79 

54  80 

55  81 
\item{log}{If TRUE, log-transform fitness.}
56  82 

57 
\item{min_accessible_genotypes}{If larger than 0, the minimum number of 

83 
+\item{min_accessible_genotypes}{If not NULL, the minimum number of 

58  84 
accessible genotypes in the fitness landscape. A genotype is 
59  85 
considered accessible if you can reach it from the wildtype by going
60  86 
through at least one path where all changes in fitness are larger or 
...  ... 
@@ 69,6 +95,10 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, 
69  95 
If the condition is not satisfied, we continue generating random 
70  96 
fitness landscapes with the specified parameters until the condition 
71  97 
is satisfied. 
98 
+ 

99 
+ (Why check against NULL and not against zero? Because this allows you 

100 
+ to count accessible genotypes even if you do not want to ensure a 

101 
+ minimum number of accessible genotypes.) 

72  102 
} 
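The accessibility criterion can be sketched as a pass over genotypes ordered by number of mutations. count_accessible() is hypothetical and rests on three assumptions: paths only add mutations, a step is allowed when the fitness change is at least th, and the wildtype itself is not counted. The sentence that fixes the exact comparison is cut off in this diff, so treat that choice as a guess.

```r
# Sketch of counting accessible genotypes. Assumptions: forward-only paths
# (one added mutation per step), steps allowed when the fitness change is
# >= th, and the wildtype is not counted. Illustrative only.
count_accessible <- function(G, f, th = 0) {
  n_mut <- rowSums(G)
  reachable <- n_mut == 0              # start from the wildtype
  # Process genotypes by number of mutations so parents are settled first.
  for (k in seq_len(ncol(G))) {
    for (j in which(n_mut == k)) {
      one_away <- rowSums(abs(sweep(G, 2, G[j, ]))) == 1
      parents <- which(one_away & n_mut == k - 1)
      reachable[j] <- any(reachable[parents] & (f[j] - f[parents] >= th))
    }
  }
  sum(reachable) - 1                   # do not count the wildtype itself
}

G <- as.matrix(expand.grid(rep(list(0:1), 2)))   # wildtype, A, B, AB
f <- c(1, 1.2, 0.8, 1.5)
count_accessible(G, f, th = 0)      # 2 (A and AB)
count_accessible(G, f, th = -0.3)   # 3: a small negative th tolerates the drop to B
```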
73  103 

74  104 
\item{accessible_th}{The threshold for the minimal change in fitness at 
...  ... 
@@ 78,6 +108,12 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, 
78  108 
allow small decreases in fitness in successive steps, use a small 
79  109 
negative value for \code{accessible_th}. } 
80  110 

111 
+\item{truncate_at_0}{If TRUE (the default) any fitness <= 0 is 

112 
+ substituted by a small positive constant (1e-9). Why? Because

113 
+ MAGELLAN and some plotting routines can have trouble (especially if you 

114 
+ log) with values <=0. Or we might have trouble if we want to log the 

115 
+ fitness.} 

116 
+ 

81  117 
} 
82  118 

83  119 

...  ... 
@@ -90,14 +126,25 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, 
90  126 

91  127 
where \eqn{d(i, j)} is the Hamming distance between genotypes \eqn{i} 
92  128 
and \eqn{j} (the number of positions that differ) and \eqn{x_i} is a 
93 
- random variable (in this case, a normal deviate of mean 0 and standard

94 
- deviation \code{sd}).

129 
+ random variable (in this case, a normal deviate of mean \code{mu} 

130 
+ and standard deviation \code{sd}). 

95  131 

96  132 
Setting \eqn{c = 0} we obtain a House of Cards model. Setting \eqn{sd 
97  133 
= 0} fitness is given by the distance from the reference and if the 
98  134 
reference is the genotype with all positions mutated, then we have a 
99  135 
fully additive model (fitness increases linearly with the number of 
100  136 
positions mutated). 
137 
+ 

138 
+ For OncoSimulR, we often want the wildtype to have a mean of 

139 
+ 1. Reasonable settings are \code{mu = 1} and \code{wt_is_1 = 

140 
+ 'subtract'} so that we simulate from a distribution centered in 1, and 

141 
+ we make sure afterwards (via a simple shift) that the wildtype is 

142 
+ actually 1. The \code{sd} controls the standard deviation, with the 

143 
+ usual working and meaning as in a normal distribution, unless \code{c} 

144 
+ is different from zero. In this case, with \code{c} large, the range 

145 
+ of the data can be large, especially if \code{g} (the number of genes) 

146 
+ is large. 

147 
+ 

101  148 
} 
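The model in \details{} can be illustrated with a minimal base-R sketch. It is not OncoSimulR's \code{rfitness} code: \code{rmf_sketch} is a hypothetical name, the reference defaults (as an assumption) to the genotype with all positions mutated, and only the "subtract" and "none" behaviours of \code{wt_is_1} and the \code{truncate_at_0} replacement by 1e-9 described on this page are mimicked. Fitness follows \eqn{f(i) = -c d(i, reference) + x_i} with \eqn{x_i} a normal deviate of mean \code{mu} and standard deviation \code{sd}.

```r
## Hypothetical sketch of a Rough Mount Fuji landscape (not OncoSimulR code).
rmf_sketch <- function(g, c = 0.5, sd = 1, mu = 1,
                       reference = rep(1, g),
                       wt_is_1 = c("subtract", "none"),
                       truncate_at_0 = TRUE) {
  wt_is_1 <- match.arg(wt_is_1)
  ## All 2^g genotypes as a 0/1 matrix; row 1 is the wildtype (no mutations).
  genots <- as.matrix(expand.grid(rep(list(0:1), g)))
  colnames(genots) <- paste0("g", seq_len(g))
  ## Hamming distance from every genotype to the reference genotype.
  d_ref <- rowSums(sweep(genots, 2, reference) != 0)
  ## f(i) = -c * d(i, reference) + x_i, with x_i ~ N(mu, sd).
  fitness <- -c * d_ref + rnorm(nrow(genots), mean = mu, sd = sd)
  if (wt_is_1 == "subtract") {
    ## Shift the whole landscape so that the wildtype has fitness exactly 1.
    fitness <- fitness - fitness[1] + 1
  }
  if (truncate_at_0) {
    ## Replace any fitness <= 0 by a small positive constant.
    fitness[fitness <= 0] <- 1e-9
  }
  cbind(genots, Fitness = fitness)
}
```

For example, \code{rmf_sketch(3, c = 0.5, sd = 0.1)} returns an 8 x 4 matrix whose first row is the wildtype with fitness exactly 1. Setting \code{sd = 0} with the all-mutated reference gives the fully additive case: fitness grows by \code{c} per mutation.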
102  149 

103  150 
\value{ 
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/OncoSimulR@121246 bc3139a8-67e5-0310-9ffc-ced21a209358
Ramon Diaz-Uriarte authored on 22/09/2016 16:47:10
...  ... 
@@ -34,7 +34,13 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, 
34  34 
distance from this reference. If "random" a genotype will be randomly 
35  35 
chosen as the reference. If "max" the genotype with all positions 
36  36 
mutated will be chosen as the reference. If you pass a vector (e.g., 
37 
- \code{reference = c(1, 0, 1, 0)}) that will be the reference genotype.} 

37 
+ \code{reference = c(1, 0, 1, 0)}) that will be the reference genotype. 

38 
+ If "random2" a genotype will be randomly chosen as the reference. In 

39 
+ contrast to "random", however, not all genotypes have the same 

40 
+ probability of being chosen; here, what is equal is the probability 

41 
+ that the reference genotype has 1, 2, ..., g mutations (and, once a 

42 
+ number of mutations is chosen, all genotypes with that number of 

43 
+ mutations have equal probability of being the reference). } 
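The difference between "random" and "random2" can be sketched as follows. With "random" every genotype is equally likely, so references with an intermediate number of mutations are picked most often; with "random2" the number of mutations is drawn uniformly from 1 to \code{g} first, and then a genotype with that many mutations is drawn uniformly. \code{random2_reference} is a hypothetical helper written for illustration, not part of OncoSimulR.

```r
## Hypothetical sketch of the "random2" choice of reference genotype.
random2_reference <- function(g) {
  k <- sample.int(g, 1)          # number of mutations, uniform on 1, ..., g
  ref <- integer(g)
  ref[sample.int(g, k)] <- 1L    # which k genes are mutated, uniform over subsets
  ref
}
```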

38  44 

39  45 
\item{scale}{Either NULL (nothing is done) or a two-element vector. If a 
40  46 
two-element vector, fitness is rescaled between \code{scale[1]} (the 
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/OncoSimulR@120020 bc3139a8-67e5-0310-9ffc-ced21a209358
Ramon Diaz-Uriarte authored on 10/08/2016 15:47:33
...  ... 
@@ -23,7 +23,7 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, 
23  23 
\item{g}{Number of genes.} 
24  24 

25  25 
\item{c}{The decrease in fitness of a genotype per each unit increase 
26 
- in Hamming distance from the reference genotype (\code{reference}).} 

26 
+ in Hamming distance from the reference genotype (see \code{reference}).} 

27  27 

28  28 
\item{sd}{The standard deviation of the random component (a normal 
29  29 
distribution of mean 0 and standard deviation \code{sd}).} 
...  ... 
@@ -34,7 +34,7 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, 
34  34 
distance from this reference. If "random" a genotype will be randomly 
35  35 
chosen as the reference. If "max" the genotype with all positions 
36  36 
mutated will be chosen as the reference. If you pass a vector (e.g., 
37 
- \code{fittest = c(1, 0, 1, 0)}) that will be the reference genotype.} 

37 
+ \code{reference = c(1, 0, 1, 0)}) that will be the reference genotype.} 

38  38 

39  39 
\item{scale}{Either NULL (nothing is done) or a two-element vector. If a 
40  40 
two-element vector, fitness is rescaled between \code{scale[1]} (the 
...  ... 
@@ -101,7 +101,12 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, 
101  101 
column denotes gene mutated/not-mutated. (For ease of use in other 
102  102 
functions, this matrix has class "genotype_fitness_matrix".) 
103  103 

104 
+ If you have specified \code{min_accessible_genotypes > 0}, the return 

105 
+ object has added attributes \code{accessible_genotypes} and 

106 
+ \code{accessible_th} that show the number of accessible 

107 
+ genotypes under the specified threshold. 

104  108 
} 
109 
+ 

105  110 
\references{ 
106  111 

107  112 
Szendro I.~G. et al. (2013). Quantitative analyses of empirical 
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/OncoSimulR@119231 bc3139a8-67e5-0310-9ffc-ced21a209358
Ramon Diaz-Uriarte authored on 09/07/2016 13:43:10
...  ... 
@@ -4,16 +4,15 @@ 
4  4 

5  5 
\title{Generate random fitness.} 
6  6 

7 
-\description{ 

8 
- Generate random fitness under a House of Cards, Rough Mount Fuji, or 

9 
- additive model. 

10 
-} 

7 
+\description{ Generate random fitness landscapes under a House of Cards, 

8 
+ Rough Mount Fuji, or additive model. } 

11  9 

12  10 

13  11 
\usage{ 
14  12 

15  13 
rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, 
16 
- wt_is_1 = TRUE, log = FALSE) 

14 
+ wt_is_1 = TRUE, log = FALSE, min_accessible_genotypes = 0, 

15 
+ accessible_th = 0) 

17  16 
} 
18  17 

19  18 

...  ... 
@@ -48,7 +47,34 @@ rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, 
48  47 

49  48 

50  49 
\item{log}{If TRUE, log-transform fitness.} 
50 
+ 

51 
+\item{min_accessible_genotypes}{If larger than 0, the minimum number of 

52 
+ accessible genotypes in the fitness landscape. A genotype is 

53 
+ considered accessible if you can reach it from the wildtype by going 

54 
+ through at least one path where all changes in fitness are larger or 

55 
+ equal to \code{accessible_th}. The changes in fitness are considered 

56 
+ at each mutational step, i.e., at each addition of one mutation we 

57 
+ compute the difference between the genotype with \code{k + 1} 

58 
+ mutations minus the ancestor genotype with \code{k} mutations. Thus, a 

59 
+ genotype is considered accessible if there is at least one path where 

60 
+ fitness increases at each mutational step by at least 

61 
+ \code{accessible_th}. 

62 
+ 

63 
+ If the condition is not satisfied, we continue generating random 

64 
+ fitness landscapes with the specified parameters until the condition 

65 
+ is satisfied. 

51  66 
} 
67 
+ 

68 
+\item{accessible_th}{The threshold for the minimal change in fitness at 

69 
+ each mutation step (i.e., between successive genotypes) that allows a 

70 
+ genotype to be regarded as accessible. This only applies if 

71 
+ \code{min_accessible_genotypes} is larger than 0. So if you want to 

72 
+ allow small decreases in fitness in successive steps, use a small 

73 
+ negative value for \code{accessible_th}. } 

74 
+ 

75 
+} 
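The accessibility criterion for \code{min_accessible_genotypes} and \code{accessible_th} can be sketched by visiting genotypes in order of their number of mutations: a genotype is accessible if at least one accessible ancestor with one mutation fewer leads to it with a fitness change of at least \code{accessible_th}. \code{count_accessible} is a hypothetical helper, not OncoSimulR's internal code, and as an assumption it does not count the wildtype itself.

```r
## Hypothetical sketch: count genotypes accessible from the wildtype.
## genots: 0/1 matrix (one row per genotype); fitness: one value per row.
count_accessible <- function(genots, fitness, th = 0) {
  nmut <- rowSums(genots)
  accessible <- nmut == 0              # the wildtype is the starting point
  for (j in order(nmut)) {             # ancestors are always visited first
    if (nmut[j] == 0) next
    ## Ancestors of j: one mutation fewer, and no mutation absent from j.
    is_anc <- nmut == nmut[j] - 1 &
      rowSums(sweep(genots, 2, genots[j, ], FUN = ">")) == 0
    ## j is accessible if some accessible ancestor reaches it with a
    ## fitness change of at least th.
    accessible[j] <- any(is_anc & accessible & (fitness[j] - fitness >= th))
  }
  sum(accessible[nmut > 0])
}
```

Generating landscapes until the condition holds then amounts to repeating the random draw while this count is below \code{min_accessible_genotypes}. A negative \code{th} tolerates small fitness decreases along a path, as the argument description explains.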

76 
+ 

77 
+ 

52  78 
\details{ 
53  79 

54  80 
The model used here follows the Rough Mount Fuji model in Szendro et 
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/OncoSimulR@118909 bc3139a8-67e5-0310-9ffc-ced21a209358
Ramon Diaz-Uriarte authored on 23/06/2016 16:43:51
1  1 
new file mode 100644 
...  ... 
@@ -0,0 +1,114 @@ 
1 
+\name{rfitness} 

2 
+\alias{rfitness} 

3 
+ 

4 
+ 

5 
+\title{Generate random fitness.} 

6 
+ 

7 
+\description{ 

8 
+ Generate random fitness under a House of Cards, Rough Mount Fuji, or 

9 
+ additive model. 

10 
+} 

11 
+ 

12 
+ 

13 
+\usage{ 

14 
+ 

15 
+rfitness(g, c = 0.5, sd = 1, reference = "random", scale = NULL, 

16 
+ wt_is_1 = TRUE, log = FALSE) 

17 
+} 

18 
+ 

19 
+ 

20 
+ 

21 
+ 

22 
+\arguments{ 

23 
+ 

24 
+ \item{g}{Number of genes.} 

25 
+ 

26 
+ \item{c}{The decrease in fitness of a genotype per each unit increase 

27 
+ in Hamming distance from the reference genotype (\code{reference}).} 

28 
+ 

29 
+ \item{sd}{The standard deviation of the random component (a normal 

30 
+ distribution of mean 0 and standard deviation \code{sd}).} 

31 
+ 

32 
+\item{reference}{The reference genotype: for the deterministic, additive 

33 
+ part, this is the genotype with maximal fitness, and all other 

34 
+ genotypes decrease their fitness by \code{c} for every unit of Hamming 

35 
+ distance from this reference. If "random" a genotype will be randomly 

36 
+ chosen as the reference. If "max" the genotype with all positions 

37 
+ mutated will be chosen as the reference. If you pass a vector (e.g., 

38 
+ \code{fittest = c(1, 0, 1, 0)}) that will be the reference genotype.} 

39 
+ 

40 
+\item{scale}{Either NULL (nothing is done) or a two-element vector. If a 

41 
+ two-element vector, fitness is rescaled between \code{scale[1]} (the 

42 
+ minimum) and \code{scale[2]} (the maximum).} 

43 
+ 

44 
+\item{wt_is_1}{If TRUE, fitness will be scaled so that the wildtype (the 

45 
+ genotype with no mutations) has fitness of 1. This is applied after 

46 
+ \code{scale}, so if you specify both it is most likely that the final 

47 
+ fitness will not respect the limits in \code{scale}.} 

48 
+ 

49 
+ 

50 
+\item{log}{If TRUE, log-transform fitness.} 

51 
+} 

52 
+\details{ 

53 
+ 

54 
+ The model used here follows the Rough Mount Fuji model in Szendro et 

55 
+ al., 2013 or Franke et al., 2011. Fitness is given as 

56 
+ 

57 
+ \deqn{f(i) = -c d(i, reference) + x_i} 

58 
+ 

59 
+ where \eqn{d(i, j)} is the Hamming distance between genotypes \eqn{i} 

60 
+ and \eqn{j} (the number of positions that differ) and \eqn{x_i} is a 

61 
+ random variable (in this case, a normal deviate of mean 0 and standard 

62 
+ deviation \code{sd}). 

63 
+ 

64 
+ Setting \eqn{c = 0} we obtain a House of Cards model. Setting \eqn{sd 

65 
+ = 0} fitness is given by the distance from the reference and if the 

66 
+ reference is the genotype with all positions mutated, then we have a 

67 
+ fully additive model (fitness increases linearly with the number of 

68 
+ positions mutated). 

69 
+} 

70 
+ 

71 
+\value{ 

72 
+ 

73 
+ A matrix with \code{g + 1} columns. Each column corresponds to a 

74 
+ gene, except the last one that corresponds to fitness. 1/0 in a gene 

75 
+ column denotes gene mutated/not-mutated. (For ease of use in other 

76 
+ functions, this matrix has class "genotype_fitness_matrix".) 

77 
+ 

78 
+} 

79 
+\references{ 

80 
+ 

81 
+ Szendro I.~G. et al. (2013). Quantitative analyses of empirical 

82 
+fitness landscapes. \emph{Journal of Statistical Mechanics: Theory and 

83 
+ Experiment\/}, \bold{01}, P01005. 

84 
+ 

85 
+Franke, J. et al. (2011). Evolutionary accessibility of mutational 

86 
+pathways. \emph{PLoS Computational Biology\/}, \bold{7}(8), 1--9. 

87 
+ 

88 
+} 

89 
+ 

90 
+\author{ Ramon Diaz-Uriarte 

91 
+} 

92 
+ 

93 
+\seealso{ 

94 
+ 

95 
+ \code{\link{oncoSimulIndiv}}, 

96 
+ \code{\link{plot.genotype_fitness_matrix}}, 

97 
+ \code{\link{evalAllGenotypes}} 

98 
+ \code{\link{allFitnessEffects}} 

99 
+ \code{\link{plotFitnessLandscape}} 

100 