File I/O -- Input and output sequence data.
Input
Phyclust accepts three types of input:
- Data read from a text file in PHYLIP format.
- Data read from a text file in FASTA format.
- Data simulated by the ms+seqgen approach.
The data reading functions read.*() return a list object of class seq.data. Suppose we call the returned list object ret. Then ret$org.code and ret$org are two matrices that store the data. Matrix ret$org.code contains the original data, e.g. A, G, C, T for nucleotides, and ret$org contains the same data encoded numerically for computation, e.g. 0, 1, 2, 3 for nucleotides. Matrix ret$org is translated from ret$org.code according to the standard encoding of the chosen data type, and most calculations are done with ret$org.
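The translation from ret$org.code to ret$org can be illustrated with a minimal base R sketch. The helper encode_nucleotide() is hypothetical and not part of Phyclust; it assumes the A, G, C, T to 0, 1, 2, 3 coding mentioned above and ignores case, and it makes no claim about how Phyclust treats gaps or other symbols.

```r
# Hypothetical helper, not a Phyclust function.
# Assumes the coding A = 0, G = 1, C = 2, T = 3 and ignores letter case.
encode_nucleotide <- function(org.code) {
  code <- c(A = 0, G = 1, C = 2, T = 3)
  # Look up every letter; symbols outside A, G, C, T become NA in this sketch.
  values <- unname(code[as.vector(toupper(org.code))])
  matrix(values, nrow = nrow(org.code), ncol = ncol(org.code))
}

# Toy 2 x 3 alignment, for illustration only.
toy <- matrix(c("A", "G", "c",
                "t", "G", "G"), nrow = 2, byrow = TRUE)
toy.org <- encode_nucleotide(toy)
toy.org
```

The numeric matrix keeps the dimensions of the character matrix, one row per sequence and one column per site.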
Output
Phyclust outputs sequence data in two formats: PHYLIP or FASTA. We use the "Great pony EIAV rev datasets" as examples: pony524.phy in PHYLIP format and pony625.fas in FASTA format. Other example data sets can be found here.
The following code reads in the two files, creates objects of class seq.data, and saves the data matrices in two new files in the working directory.
Read a PHYLIP file
> data.path <- paste(.libPaths()[1], "/phyclust/data/pony524.phy", sep = "")
> (my.pony.524 <- read.phylip(data.path))
code.type: NUCLEOTIDE, n.seq: 146, seq.len: 405.
> str(my.pony.524)
List of 7
$ code.type: chr "NUCLEOTIDE"
$ info : chr " 146 405"
$ nseq : num 146
$ seqlen : num 405
$ seqname : Named chr [1:146] "AF314258" "AF314259" "AF314260" "AF314261" ...
..- attr(*, "names")= chr [1:146] "1" "2" "3" "4" ...
$ org.code : chr [1:146, 1:405] "g" "g" "g" "g" ...
$ org : num [1:146, 1:405] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "class")= chr "seq.data"
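In the output above, the info element appears to hold the PHYLIP header line, which matches nseq and seqlen. A minimal base R sketch of extracting both numbers from such a line, assuming a header of two whitespace-separated integers (parse_phylip_header() is a hypothetical helper, not a Phyclust function):

```r
# Hypothetical helper, not a Phyclust function.
# Assumes the header holds two whitespace-separated integers:
# the number of sequences, then the sequence length.
parse_phylip_header <- function(info) {
  fields <- scan(text = info, what = integer(), quiet = TRUE)
  stopifnot(length(fields) == 2)
  list(nseq = fields[1], seqlen = fields[2])
}

header <- parse_phylip_header(" 146 405")
header$nseq    # number of sequences
header$seqlen  # sequence length
```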
Read a FASTA file
> data.path <- paste(.libPaths()[1], "/phyclust/data/pony625.fas", sep = "")
> (my.pony.625 <- read.fasta.nucleotide(data.path))
code.type: NUCLEOTIDE, n.seq: 62, seq.len: 406.
> str(my.pony.625)
List of 6
$ code.type: chr "NUCLEOTIDE"
$ nseq : num 62
$ seqlen : int 406
$ seqname : chr [1:62] "AF512608" "AF512609" "AF512610" "AF512611" ...
$ org.code : chr [1:62, 1:406] "G" "G" "G" "G" ...
$ org : num [1:62, 1:406] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "class")= chr "seq.data"
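To show where the seqname and org.code elements come from, here is a minimal base R sketch of reading an aligned FASTA file. It is not the Phyclust implementation: read_fasta_sketch() is a hypothetical helper, and it assumes that every record starts with ">" and that all sequences have the same length.

```r
# Hypothetical helper, not a Phyclust function.
# Assumes each record starts with ">" and all sequences have equal length.
read_fasta_sketch <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]                 # drop empty lines
  is.header <- startsWith(lines, ">")
  record <- cumsum(is.header)                   # record index of every line
  seqname <- sub("^>\\s*", "", lines[is.header])
  # Join the sequence lines belonging to the same record.
  seqs <- tapply(lines[!is.header], record[!is.header], paste, collapse = "")
  # One row per sequence, one column per site.
  org.code <- do.call(rbind, strsplit(as.vector(seqs), ""))
  list(nseq = length(seqname), seqlen = ncol(org.code),
       seqname = seqname, org.code = org.code)
}

# Toy file for illustration only; names and sequences are made up.
toy.path <- tempfile(fileext = ".fas")
writeLines(c(">seq1", "AGCT", "AG", ">seq2", "AGTT", "AC"), toy.path)
toy <- read_fasta_sketch(toy.path)
str(toy)
```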
Save files
> # PHYLIP
> write.phylip(my.pony.625$org, "new.625.txt")
> edit(file = "new.625.txt")
> # FASTA
> write.fasta(my.pony.524$org, "new.524.txt")
> edit(file = "new.524.txt")
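Since edit() opens the file interactively, a script may prefer readLines() to inspect a written file. The base R sketch below illustrates the idea behind the write step on a toy matrix: numeric codes are translated back to letters and written as FASTA text. The helper to_fasta_lines() is hypothetical and not a Phyclust function, and it assumes the A, G, C, T to 0, 1, 2, 3 coding.

```r
# Hypothetical helper, not a Phyclust function.
# Assumes the coding A = 0, G = 1, C = 2, T = 3.
to_fasta_lines <- function(org, seqname) {
  letters.nt <- c("A", "G", "C", "T")
  # Decode each row into a single string of letters.
  seqs <- apply(org, 1, function(x) paste(letters.nt[x + 1], collapse = ""))
  # Interleave header lines and sequence lines.
  as.vector(rbind(paste0(">", seqname), seqs))
}

# Toy data for illustration only.
toy.org <- matrix(c(0, 1, 2, 3,
                    1, 1, 3, 3), nrow = 2, byrow = TRUE)
out.path <- tempfile(fileext = ".fas")
writeLines(to_fasta_lines(toy.org, c("seq1", "seq2")), out.path)
readLines(out.path)   # inspect the file without opening an editor
```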