This section shows how to write a fast loop and measures how efficient each version is. Five examples demonstrate different ways to write a loop for the same purpose, "sum a large matrix several times", as follows:
- "Sum by for 1" -- use nested for() loops that visit the matrix row by row (the inner loop runs over the columns).
- "Sum by for 2" -- use nested for() loops that visit the matrix column by column (the inner loop runs over the rows).
- "Sum by apply" -- use the apply() function to sum up the matrix.
- "Sum by rowSums" -- use the rowSums() internal function to sum up the matrix.
- "Sum by dyn" -- use "dynamic loading" to load an external function to sum up the matrix.
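Before timing anything, it can help to confirm that the pure-R approaches really compute the same total. The following is a minimal sketch on a small 3 x 4 matrix; the matrix and the variable names (sum.for1, sum.for2, and so on) are illustrative assumptions and are not part of the benchmark files below.

```r
# Small illustrative matrix (the benchmark itself uses 200000 x 10).
m <- matrix(1:12, nrow = 3, ncol = 4)

# "for 1" order: outer loop over rows, inner loop over columns.
sum.for1 <- 0
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    sum.for1 <- sum.for1 + m[i, j]
  }
}

# "for 2" order: outer loop over columns, inner loop over rows.
sum.for2 <- 0
for (j in seq_len(ncol(m))) {
  for (i in seq_len(nrow(m))) {
    sum.for2 <- sum.for2 + m[i, j]
  }
}

# apply() and rowSums() both produce one total per row, which is then summed.
sum.apply <- sum(apply(m, 1, sum))
sum.rowSums <- sum(rowSums(m))

# All four totals equal sum(1:12), which is 78.
print(c(sum.for1, sum.for2, sum.apply, sum.rowSums))
```

Only the order of traversal and the amount of interpreted R code differ between the versions; the result does not.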
- Sum by for 1
First, create an R code file "loop_for_1.r" containing the following:
# File name: loop_for_1.r
my.loop <- 20
m.dim <- list(nrow = 200000, ncol = 10)
m <- matrix(1, nrow = m.dim$nrow, ncol = m.dim$ncol)
ret <- 0
start <- Sys.time()
for (k in 1:my.loop) {
  # Outer loop over rows, inner loop over columns.
  for (i in 1:m.dim$nrow) {
    for (j in 1:m.dim$ncol) {
      ret <- ret + m[i, j]
    }
  }
}
Sys.time() - start
- Sum by for 2
Next, create an R code file "loop_for_2.r" containing the following:
# File name: loop_for_2.r
my.loop <- 20
m.dim <- list(nrow = 200000, ncol = 10)
m <- matrix(1, nrow = m.dim$nrow, ncol = m.dim$ncol)
ret <- 0
start <- Sys.time()
for (k in 1:my.loop) {
  # Outer loop over columns, inner loop over rows.
  for (j in 1:m.dim$ncol) {
    for (i in 1:m.dim$nrow) {
      ret <- ret + m[i, j]
    }
  }
}
Sys.time() - start
- Sum by apply
Then create an R code file "loop_apply.r" containing the following:
# File name: loop_apply.r
my.loop <- 20
m.dim <- list(nrow = 200000, ncol = 10)
m <- matrix(1, nrow = m.dim$nrow, ncol = m.dim$ncol)
ret <- 0
start <- Sys.time()
for (k in 1:my.loop) {
  # apply(m, 1, sum) returns one total per row.
  ret <- ret + sum(apply(m, 1, sum))
}
Sys.time() - start
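The apply() call above collapses each row with sum(); lapply(), sapply() and tapply() follow the same idea for lists and for grouped vectors. The following is a minimal sketch of the four functions; the data and the variable names are made up for illustration.

```r
# 2 x 3 matrix whose columns are (1, 2), (3, 4), (5, 6).
m <- matrix(1:6, nrow = 2)
row.totals <- apply(m, 1, sum)   # margin 1 = rows:    9 12
col.totals <- apply(m, 2, sum)   # margin 2 = columns: 3 7 11

# lapply() returns a list, sapply() simplifies the result to a vector.
x <- list(a = 1:3, b = 4:6)
list.result <- lapply(x, sum)    # list with a = 6, b = 15
vec.result <- sapply(x, sum)     # named vector with a = 6, b = 15

# tapply() applies a function within each group of a vector.
values <- c(10, 20, 30, 40)
groups <- c("u", "v", "u", "v")
by.group <- tapply(values, groups, sum)   # u = 40, v = 60

print(row.totals)
print(col.totals)
print(vec.result)
print(by.group)
```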
- Sum by rowSums
Then create an R code file "loop_rowSums.r" containing the following:
# File name: loop_rowSums.r
my.loop <- 20
m.dim <- list(nrow = 200000, ncol = 10)
m <- matrix(1, nrow = m.dim$nrow, ncol = m.dim$ncol)
ret <- 0
start <- Sys.time()
for (k in 1:my.loop) {
  # rowSums() computes all row totals in compiled code.
  ret <- ret + sum(rowSums(m))
}
Sys.time() - start
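The scripts here time themselves by subtracting two Sys.time() values. R's built-in system.time() is an alternative that wraps a single expression. The following is a minimal sketch comparing apply() and rowSums(); the smaller 20000 x 10 matrix is an assumption chosen so that it finishes quickly, and the actual timings depend on the machine.

```r
# Smaller matrix than the benchmark so the comparison runs quickly.
m <- matrix(1, nrow = 20000, ncol = 10)

# system.time() evaluates the expression and reports user, system and
# elapsed times in seconds.
t.apply <- system.time(r.apply <- sum(apply(m, 1, sum)))
t.rowSums <- system.time(r.rowSums <- sum(rowSums(m)))

print(t.apply["elapsed"])
print(t.rowSums["elapsed"])

# Both approaches return the same total: 20000 * 10 ones.
print(c(r.apply, r.rowSums))
```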
- Sum by dyn
Create a Fortran code file "loop_dyn.f" containing the following:
c     File name: loop_dyn.f
c     For dynamic loading, compile with g77:
c     SHELL> g77 -c loop_dyn.f ; g77 -shared -o loop_dyn.so loop_dyn.o
      subroutine dynsum(nrow, ncol, m, ret)
      integer i, j, nrow, ncol
      real*8 m(nrow, ncol), ret
      ret = 0
c     Column index in the outer loop, row index in the inner loop.
      do j = 1, ncol
         do i = 1, nrow
            ret = ret + m(i, j)
         end do
      end do
      return
      end
c     The output is a shared library "loop_dyn.so" that R can call.
Then create an R code file "loop_dyn.r" containing the following:
# File name: loop_dyn.r
dyn.load("loop_dyn.so")
# On Windows this would look like:
# dyn.load("C:/Windows/Desktop/loop_dyn.dll")
my.loop <- 20
m.dim <- list(nrow = 200000, ncol = 10)
m <- matrix(1, nrow = m.dim$nrow, ncol = m.dim$ncol)
ret <- 0
dynsum.f <- function(m) {
  # "ret" must be a single double for the Fortran routine to fill in.
  # Passing as.double(m) here would return a vector as long as the matrix.
  out <- .Fortran("dynsum",
                  nrow = as.integer(nrow(m)),
                  ncol = as.integer(ncol(m)),
                  m = as.double(m),
                  ret = double(1))
  out$ret
}
start <- Sys.time()
for (k in 1:my.loop) {
  ret <- ret + dynsum.f(m)
}
Sys.time() - start
dyn.unload("loop_dyn.so")
# On Windows this would look like:
# dyn.unload("C:/Windows/Desktop/loop_dyn.dll")
To test on Windows, download the example "loop_dyn.dll" to "C:\Windows\Desktop\".
- Computing time
On a PIII 1.4 GHz PC, the measured computing times were as follows:
Sum by (loop)   for 1   for 2   apply   rowSums   dyn
Time (secs)     331     307     117     2         19
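The timings are easier to compare as ratios against the slowest version. The following is a minimal sketch that takes the five reported times and divides the "for 1" time by each of them; the vector names secs and speedup are illustrative assumptions.

```r
# Reported times in seconds, copied from the measurements above.
secs <- c("for 1" = 331, "for 2" = 307, apply = 117, rowSums = 2, dyn = 19)

# Speed-up of each approach relative to "for 1".
speedup <- secs[["for 1"]] / secs

# Rounded to one decimal: 1.0, 1.1, 2.8, 165.5, 17.4
print(round(speedup, 1))
```

On these figures rowSums() is about 165 times faster than the first for() version, and the dynamically loaded Fortran routine about 17 times faster.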
- Conclusion
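Use the built-in internal functions, such as rowSums(), where they exist.
Use external compiled functions.
Use "apply" in place of a "for" loop.
See "apply", "lapply", "tapply", "sapply".
Use a column-wise data structure in R and Fortran, which store matrices column by column.
Use a row-wise data structure in C, which stores arrays row by row.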
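The column-wise advice follows from how R lays a matrix out in memory: the elements of the first column come first, then the second column, and so on. The following is a minimal sketch that makes the layout visible; the 2 x 3 matrix and the variable names are illustrative assumptions.

```r
# matrix() fills column by column unless byrow = TRUE is given.
m <- matrix(1:6, nrow = 2, ncol = 3)

# Flattening shows the storage order: column 1, then column 2, then column 3.
stored <- as.vector(m)
print(stored)   # 1 2 3 4 5 6

# Element m[i, j] sits at position (j - 1) * nrow(m) + i of that vector.
i <- 2
j <- 3
idx <- (j - 1) * nrow(m) + i
print(m[idx] == m[i, j])   # TRUE

# Filling by row changes which values land in which cells,
# but the storage is still column by column.
m.byrow <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
stored.byrow <- as.vector(m.byrow)
print(stored.byrow)   # 1 4 2 5 3 6
```

A loop whose inner index is the row index therefore steps through neighbouring memory locations, which is the order used in "loop_for_2.r" and in the Fortran routine.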