ms+seqgen
-- Simulation sequences according a tree.
In the following, we demonstrate the ms+seqgen
approach
to generate sequences. The result is a character
vector of class seqgen
, which contains 5 sequences,
each of 40 bases (seq-gen
option -l40
).
The option -mHKY
specifies the HKY85
model of evolution (Hasegawa et al. 1985).
Without further options, HKY85 is equivalent to the JC69
model (Jukes and Cantor 1969).
Note that the seqgen()
function requires a rooted tree
in NEWICK format or an object of class phylo
.
> set.seed(123)
> ret.ms <- ms(nsam = 5, nreps = 1, opts = "-T")
> tree.anc <- read.tree(text = ret.ms[3])
> set.seed(123)
> seqgen(opts = "-mHKY -l40", newick.tree = ret.ms[3])
5 40
s1 CTCTCATTGGACGCACACTTTAGGGGGGGATTGCACTGCA
s5 CTCTCTCTGGACGCACACTTTAAGGGGGGATTGAACTACA
s2 CTCTTCGGGCTCGGATAAGTTTGGAGGGTTGTTCTCTACA
s3 CTCTGAGTGCTCGGATTAGTTAGGGGGAATGACGTCTACA
s4 CTCTTATCTCTCGGATAAGTTGGGGGTGATGGCTTTTACA
> set.seed(123)
> (ret.seq <- seqgen(opts = "-mHKY -l40", rooted.tree = tree.anc))
5 40
s1 CTCTCATTGGACGCACACTTTAGGGGGGGATTGCACTGCA
s5 CTCTCTCTGGACGCACACTTTAAGGGGGGATTGAACTACA
s2 CTCTTCGGGCTCGGATAAGTTTGGAGGGTTGTTCTCTACA
s3 CTCTGAGTGCTCGGATTAGTTAGGGGGAATGACGTCTACA
s4 CTCTTATCTCTCGGATAAGTTGGGGGTGATGGCTTTTACA
> str(ret.seq)
Class 'seqgen' chr [1:6] " 5 40" "s1 CTCTCATTGGACGCACACTTTAGGGGGG ...
The following example generates a tree and provides an ancestral
sequence. Function seqgen()
will use
parameters $\kappa$ (kappa
) and
$\pi_{A}, \pi_{G}, \pi_{C}, \pi_{T}$
(pi.HKY
)
to evolve the ancestral sequence
(anc.HKY
) down the tree.
The function read.seqgen()
reads the seqgen()
object and returns a new data set of class seq.data
for use by the function phyclust()
.
> # Generate a tree
> set.seed(1234)
> ret.ms <- ms(nsam = 5, nreps = 1, opts = "-T")
> tree.ms <- read.tree(text = ret.ms[3])
>
> # Generate nucleotide sequences
> (anc.HKY <- rep(0:3, 3))
[1] 0 1 2 3 0 1 2 3 0 1 2 3
> paste(nid2code(anc.HKY, lower.case = FALSE), collapse = "")
[1] "AGCTAGCTAGCT"
> pi.HKY <- c(0.2, 0.2, 0.3, 0.3)
> kappa <- 1.1
> L <- length(anc.HKY)
> set.seed(1234)
> (HKY.1 <- gen.seq.HKY(tree.ms, pi.HKY, kappa, L, anc.seq = anc.HKY))
5 12
s1 AGCTTGACCGGC
s3 AGCTTCACCGGT
s2 ACCTCGCTAGCT
s4 ACGACGCTCGCT
s5 CCTACGCTAGCT
> (ret <- read.seqgen(HKY.1))
code.type: NUCLEOTIDE, n.seq: 5, seq.len: 12.