Quantile -- Quantile and Percentile.
Quantiles and percentiles are also fundamental tools in statistics.
They are easy to compute on one processor for small data.
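By definition, the quantile at probability p is a root of F(x) - p, where F
is the empirical distribution function of the data, so a fast implementation
can be done in R by utilizing uniroot().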
Note that the cost of sort() may be too high for large data
since parallel merging is required.
Based on the same idea demonstrated here, a lot of optimization functions
in R can be utilized in the same way for large datasets.
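As a minimal serial sketch of this idea (the function name f.quantile.serial and the choice of range(y) as the search interval are illustrative assumptions, not part of the example files), the root of the empirical distribution function minus the target probability can be located with uniroot():

```r
set.seed(1234)
N <- 100
y <- rnorm(N)

### Empirical distribution function at x minus the target probability.
### (Hypothetical helper for illustration only.)
f.quantile.serial <- function(x, data, prob = 0.5){
  mean(data <= x) - prob
}

### range(y) brackets the root for any 0 < prob < 1: the function is
### negative at min(y) when prob > 1/N and positive at max(y).
ret.serial <- uniroot(f.quantile.serial, range(y), data = y, prob = 0.95)
ret.serial$root
```

Because the function is a step function, the returned root lies between two adjacent order statistics, so it agrees with quantile() only up to the interpolation convention chosen by its type argument.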
Serial code: (ex_quantile_serial.r)
# File name: ex_quantile_serial.r
# Run: Rscript --vanilla ex_quantile_serial.r
### Main code starts here.
set.seed(1234)
N <- 100
y <- rnorm(N)
### Obtain 95% quantile.
quantile(y, probs = 0.95, names = FALSE)
Parallel (SPMD) code: (ex_quantile_spmd.r for ultra-large/unlimited $N$)
# File name: ex_quantile_spmd.r
# Run: mpiexec -np 2 Rscript --vanilla ex_quantile_spmd.r
### Load pbdMPI and initialize the communicator.
library(pbdMPI, quietly = TRUE)
init()
### Main code starts here.
set.seed(1234)
N <- 100
y <- rnorm(N)
### Load data partially by processors if N is ultra-large.
id.get <- get.jid(N)
y.spmd <- y[id.get]
### A function for uniroot.
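f.quantile <- function(x, data.spmd, N, prob = 0.5){
  ### Count local values <= x, sum the counts over all ranks, and
  ### compare the global proportion with the target probability.
  allreduce(sum(data.spmd <= x), op = "sum") / N - prob
} # End of f.quantile().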
### Obtain 95% quantile. The interval c(1.5, 2) must bracket the root.
ret <- uniroot(f.quantile, c(1.5, 2), data.spmd = y.spmd, N = N, prob = 0.95)
### Output from rank 0 only, since comm.print(...) prints from rank 0 by default.
comm.print(ret$root)
finalize()
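The interval c(1.5, 2) in the SPMD code above is fixed in advance, which presumes a rough idea of where the quantile lies. A possible alternative, sketched here in a single R process without MPI, derives the bracket from the data. The list chunks is a hypothetical stand-in for the per-rank pieces y.spmd, and the plain sum(), min(), and max() over chunks stand in for allreduce() calls with op = "sum", "min", and "max":

```r
set.seed(1234)
N <- 100
y <- rnorm(N)

### Hypothetical stand-in for 4 ranks: each chunk plays the role of y.spmd.
chunks <- split(y, cut(seq_len(N), 4, labels = FALSE))

### Local counts per chunk, then a global sum (mimics allreduce, op = "sum").
f.quantile.sim <- function(x, chunks, N, prob = 0.5){
  local.counts <- vapply(chunks, function(d) sum(d <= x), integer(1))
  sum(local.counts) / N - prob
}

### Global bracket from local extremes (mimics allreduce, op = "min"/"max").
lower <- min(vapply(chunks, min, numeric(1)))
upper <- max(vapply(chunks, max, numeric(1)))

ret.sim <- uniroot(f.quantile.sim, c(lower, upper),
                   chunks = chunks, N = N, prob = 0.95)
ret.sim$root
```

In a real SPMD run every rank would hold only its own chunk, so the two bracket reductions add a constant communication cost before the root search starts.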
Exercise:
- Try other optimization functions in R such as optim(), nlm(), etc.
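One possible starting point for the exercise, given as a serial sketch rather than a full solution: a quantile minimizes the check (pinball) loss, so it can be found with optim() using method = "Brent" for one-dimensional problems. The function name check.loss is an illustrative assumption. In an SPMD version, the sum() inside it would become a local sum followed by allreduce().

```r
set.seed(1234)
N <- 100
y <- rnorm(N)

### Check (pinball) loss; a prob-quantile of the data minimizes it over q.
### (Hypothetical helper for illustration only.)
check.loss <- function(q, data, prob = 0.5){
  u <- data - q
  sum(u * (prob - (u < 0)))
}

### One-dimensional minimization over the data range.
ret.opt <- optim(par = 0, fn = check.loss, data = y, prob = 0.95,
                 method = "Brent", lower = min(y), upper = max(y))
ret.opt$par
```

The loss is piecewise linear and not differentiable at the data points, so derivative-based methods such as nlm() may behave less reliably on it than a bracketing method.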