phyclust {phyclust} | R Documentation
The main function of phyclust implements finite mixture models for sequence data, in which the mutation processes are modeled by evolutionary processes based on continuous-time Markov chain theory.
phyclust(X, K, EMC = .EMC, manual.id = NULL, label = NULL, byrow = TRUE)
X | nid/sid matrix with N rows/sequences and L columns/sites.
K | number of clusters.
EMC | EM control.
manual.id | manually input class ids.
label | label of sequences for semi-supervised clustering.
byrow | advanced option for X.
X should be a numerical matrix containing sequence data, which can be converted with code2nid or code2sid.
EMC contains all options used for EM algorithms.
manual.id supplies manually input class ids, used as an initialization only with the initialization method 'manualMu'.
label indicates the known clusters for labeled sequences; it is a vector of length N with values from 0 to K, where 0 indicates that the cluster is unknown. label = NULL is for unsupervised clustering. Only unsupervised and semi-supervised clustering are implemented.
byrow is used in bootstraps to avoid transposing the matrix X. If FALSE, then X should have dimension L*N.
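The label argument described above can be prepared with a few lines of base R. The following is a minimal sketch, not part of phyclust: the helper name make.semi.label and its arguments known.index and known.cluster are hypothetical and introduced only for illustration. It builds a vector of length N with values from 0 to K, where 0 marks sequences whose cluster is unknown.

```r
# Hypothetical helper (not part of phyclust): build a label vector of length N
# with values in 0..K, where 0 marks a sequence whose cluster is unknown.
make.semi.label <- function(N, K, known.index, known.cluster) {
  stopifnot(length(known.index) == length(known.cluster),
            all(known.index >= 1 & known.index <= N),
            all(known.cluster >= 1 & known.cluster <= K))
  label <- rep(0, N)                   # 0 = cluster unknown
  label[known.index] <- known.cluster  # cluster ids of the labeled sequences
  label
}

# Toy usage: 10 sequences and 3 clusters; sequences 1-3 are known to belong
# to cluster 1 and sequence 7 to cluster 2. All other entries stay 0.
semi.label <- make.semi.label(N = 10, K = 3,
                              known.index = c(1, 2, 3, 7),
                              known.cluster = c(1, 1, 1, 2))
print(semi.label)
```

A vector built this way could then be passed as label = semi.label, with N taken from nrow(X) and K matching the K argument.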
A list with class phyclust will be returned, containing the following elements:
'N.X.org' | number of sequences in the X matrix.
'N.X.unique' | number of unique sequences in the X matrix.
'L' | number of sites, length of sequences, number of columns of the X matrix.
'K' | number of clusters.
'Eta' | proportion of subpopulations, eta_k, length = K.
'Z.normalized' | posterior probabilities, Z_nk, each row sums to 1.
'Mu' | centers of subpopulations, dim = K*L, each row is a center.
'QA' | Q matrix array, information for the evolution model, stored as a list.
'logL' | log likelihood values.
'p' | number of parameters.
'bic' | BIC, -2logL + plogN.
'aic' | AIC, -2logL + 2p.
'N.seq.site' | number of segregating sites.
'class.id' | class id for each sequence based on the maximum posterior.
'n.class' | number of sequences in each cluster.
'conv' | convergence information, stored as a list.
'init.procedure' | initialization procedure.
'init.method' | initialization method.
'substitution.model' | substitution model.
'edist.model' | evolution distance model.
'code.type' | code type.
'em.method' | EM algorithm.
'boundary.method' | boundary method.
'label.method' | label method.
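To make the relationships among the returned elements concrete, the sketch below recomputes class.id, n.class, bic and aic from toy values in base R. It illustrates the formulas listed above and is not the package's internal code. All numbers are made up, and it assumes that N in the BIC formula is the number of sequences and that log is the natural logarithm.

```r
# Toy posterior matrix (made-up numbers): N = 4 sequences, K = 2 clusters.
# Each row sums to 1, as stated for Z.normalized.
Z.normalized <- matrix(c(0.9, 0.1,
                         0.2, 0.8,
                         0.6, 0.4,
                         0.3, 0.7),
                       ncol = 2, byrow = TRUE)
N <- nrow(Z.normalized)
K <- ncol(Z.normalized)

# Maximum posterior assignment: for each row n, the column k with largest Z_nk.
class.id <- apply(Z.normalized, 1, which.max)

# Number of sequences in each cluster; tabulate() also counts empty clusters.
n.class <- tabulate(class.id, nbins = K)

# Information criteria from a made-up log likelihood and parameter count.
logL <- -120.5
p <- 7
bic <- -2 * logL + p * log(N)  # assumes N = number of sequences
aic <- -2 * logL + 2 * p

print(class.id)
print(n.class)
print(c(bic = bic, aic = aic))
```

When comparing fits with different K or different substitution models, smaller bic or aic values indicate the preferred model under the usual convention for these criteria.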
Make a general class for Q and QA.
Wei-Chen Chen wccsnow@gmail.com
Phylogenetic Clustering Website: http://snoweye.github.io/phyclust/
.EMC, .EMControl, find.best, phyclust.se, phyclust.se.update.
## Not run:
library(phyclust, quiet = TRUE)

X <- seq.data.toy$org

set.seed(1234)
(ret.1 <- phyclust(X, 3))

EMC.2 <- .EMC
EMC.2$substitution.model <- "HKY85"
# the same as EMC.2 <- .EMControl(substitution.model = "HKY85")

(ret.2 <- phyclust(X, 3, EMC = EMC.2))

# for semi-supervised clustering
semi.label <- rep(0, nrow(X))
semi.label[1:3] <- 1
(ret.3 <- phyclust(X, 3, EMC = EMC.2, label = semi.label))

## End(Not run)