Overview

Phylogenetic clustering (phyloclustering) is a model-based approach, built on an evolutionary Continuous Time Markov Chain (CTMC), that identifies population structure from molecular data without assuming linkage equilibrium. The goal is to use a statistical approach to infer population structure from large numbers of sequences (SNPs, DNA, codons, etc.), to cluster individuals into subpopulations, and to identify molecular sequences representative of those subpopulations. It is an approximate solution to the NP-complete problem of estimating phylogenetic trees. It also benefits varied research fields such as

Details and references can be found in Method and Document.


Purpose

The major goals of phyloclustering are:

  1. to identify the ancestors from which sequences evolved,
  2. to determine population structure based on classifications,
  3. to avoid possible sequencing or alignment discrepancies, and
  4. to aggregate trustworthy sequence information.

In phyloclustering, the similarity of sequences in a group is characterized by mutation processes rather than nucleotide frequency. A simple example in the table below illustrates phyloclustering.
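
As a rough sketch of what "mutation processes rather than nucleotide frequency" can mean in practice, the Python snippet below scores each sequence against candidate ancestral sequences using a CTMC substitution model. It rests on several assumptions that are not taken from this page: the Jukes-Cantor (JC69) model stands in for the CTMC, the two ancestral sequences and the evolutionary time t are fixed by hand rather than estimated, and the function names (jc69_probs, jc69_log_likelihood, assign, composition) are hypothetical helpers, not part of any package.

```python
import math
from collections import Counter


def jc69_probs(t):
    """JC69 transition probabilities after time t (expected substitutions per site).

    Returns (p_same, p_diff): the probability that a site keeps its nucleotide,
    and the probability that it changes to one specific other nucleotide.
    """
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay


def jc69_log_likelihood(seq, center, t):
    """Log-probability that `seq` evolved from `center` over time t under JC69."""
    if len(seq) != len(center):
        raise ValueError("sequences must be aligned to the same length")
    p_same, p_diff = jc69_probs(t)
    return sum(math.log(p_same if a == b else p_diff) for a, b in zip(seq, center))


def assign(seq, centers, t):
    """Index of the center under which `seq` has the highest likelihood."""
    scores = [jc69_log_likelihood(seq, c, t) for c in centers]
    return max(range(len(centers)), key=scores.__getitem__)


def composition(seq):
    """Nucleotide counts of a sequence, ignoring site positions."""
    return sorted(Counter(seq).items())


# Two hypothetical ancestral sequences with identical nucleotide composition
# (four A and four C each) but different site patterns.
centers = ["AAAACCCC", "CCCCAAAA"]
sequences = ["AAAACCCT", "CCCCAAAT", "AAATCCCC"]

t = 0.1  # assumed evolutionary time, fixed by hand for illustration
labels = [assign(s, centers, t) for s in sequences]
print(labels)  # [0, 1, 0]
```

Both ancestral sequences have the same nucleotide counts, so a frequency-only summary cannot tell them apart. The site-by-site mutation model still separates the three sequences, because each one differs from one ancestor at a single site and from the other at every site. A full model-based analysis would also estimate the ancestral sequences, the mixing proportions, and the model parameters from the data; this sketch only shows the scoring step.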