
HIV -- Human Immunodeficiency Virus, an example of Next Generation Sequencing

This example applies phyloclustering to Next Generation Sequencing data, which consist of a large number of short-read fragments. It also shows that phyloclustering can handle large data sets. The data are available from the paper (contact the author to obtain the files).

This paper sequences HIV collected from several patients participating in a clinical trial in order to determine the escape ability of the virus.

Here, we focus on the patient "V11909". The detailed steps of this phyloclustering reanalysis are the following:

  1. obtain an alignment of each fragment to a reference sequence,
  2. insert common gaps into the reference sequence and all fragments accordingly,
  3. run phyclust to analyze the commonly aligned fragments, which contain a large number of gaps, and
  4. plot the clustering results for the aligned fragments.
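
Step 2 can be illustrated with a minimal Python sketch. It is not the script used for this reanalysis. It assumes that each fragment comes with a pairwise alignment to the same reference, given as a hypothetical pair `(aligned_ref, aligned_frag)` of equal-length strings in which `-` marks a gap. The function names `insertion_lengths` and `merge_to_common_gaps` are made up for this illustration. The idea is to find, for every slot between reference bases, the longest insertion seen in any fragment, and then pad the reference and every fragment to that width so that all rows share the same columns.

```python
GAP = "-"


def insertion_lengths(aligned_ref):
    """Count the gap columns in front of each reference base.

    The last entry counts the gap columns after the final reference base.
    """
    counts = []
    run = 0
    for ch in aligned_ref:
        if ch == GAP:
            run += 1
        else:
            counts.append(run)
            run = 0
    counts.append(run)
    return counts


def merge_to_common_gaps(pairs):
    """Put pairwise alignments to one reference into common columns.

    pairs: list of (aligned_ref, aligned_frag) strings (hypothetical input
    format). Returns the gapped reference and the list of padded fragments.
    """
    ref = pairs[0][0].replace(GAP, "")
    for aligned_ref, aligned_frag in pairs:
        if len(aligned_ref) != len(aligned_frag):
            raise ValueError("each pair must have equal-length rows")
        if aligned_ref.replace(GAP, "") != ref:
            raise ValueError("all pairs must share the same reference")

    per_pair = [insertion_lengths(r) for r, _ in pairs]
    # Widest insertion observed in each slot across all fragments.
    widths = [max(c[i] for c in per_pair) for i in range(len(ref) + 1)]

    def expand(aligned_ref, row):
        out = []
        slot = 0  # index of the next reference base
        run = 0   # insertion columns already emitted in this slot
        for r_ch, x_ch in zip(aligned_ref, row):
            if r_ch == GAP:
                out.append(x_ch)
                run += 1
            else:
                out.append(GAP * (widths[slot] - run))  # pad the slot
                out.append(x_ch)
                slot += 1
                run = 0
        out.append(GAP * (widths[slot] - run))
        return "".join(out)

    gapped_ref = expand(pairs[0][0], pairs[0][0])
    fragments = [expand(r, f) for r, f in pairs]
    return gapped_ref, fragments


# Toy input: reference ACGT with one insertion, one deletion, and a
# two-base insertion plus a trailing deletion.
pairs = [("AC-GT", "ACTGT"), ("ACGT", "A-GT"), ("A--CGT", "AGGCG-")]
gapped_ref, fragments = merge_to_common_gaps(pairs)
print(gapped_ref)   # A--C-GT
print(fragments)    # ['A--CTGT', 'A----GT', 'AGGC-G-']
```

In this sketch inserted bases are left-justified within each slot, which is a design choice of the illustration. Because every fragment is padded to the widest insertion, the result tends to contain many gap columns, as noted in step 3.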

The data passed to phyclust consist of 5177 fragments, each 1617 bp long after reassembling the data. We conservatively pick $K = 3$ and display the results in the figures. The colored dots represent the four different nucleotides, and the gray areas are all gaps. In the first figure, the first cluster appears to be the major population in this patient, and two other small populations possibly exist. In the second figure, the central sequences of the three clusters are summarized: we compare the second and third central sequences against the first central sequence (top bar) and plot the mutation sites, which are marked by the orange dots at the bottom.
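
The comparison of central sequences can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the plotting code behind the figures. The function name `mutation_sites` and the toy sequences are hypothetical, and skipping positions where either sequence has a gap is a choice made here for simplicity.

```python
def mutation_sites(first_center, other_center, gap="-"):
    """Return 0-based positions where other_center differs from first_center.

    Positions with a gap in either sequence are skipped (an assumption of
    this sketch).
    """
    if len(first_center) != len(other_center):
        raise ValueError("central sequences must have the same length")
    return [
        i
        for i, (a, b) in enumerate(zip(first_center, other_center))
        if a != b and a != gap and b != gap
    ]


# Toy central sequences for K = 3 clusters (made-up data).
centers = ["ACGT-A", "ACTT-A", "A-GTCC"]
for k, center in enumerate(centers[1:], start=2):
    print(k, mutation_sites(centers[0], center))
# 2 [2]
# 3 [5]
```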


Maintained: Wei-Chen Chen
E-Mail: wccsnow at gmail dot com
Last Updated: December 30, 2016
Created: November 20, 2009