Quantile -- Quantile and Percentile.

Quantiles and percentiles are fundamental tools in statistics. For small data they are easy to compute on a single processor. By definition, a quantile at probability p is a point where the proportion of observations less than or equal to it reaches p, so it can be found quickly in R by root finding with uniroot(). This avoids sort(), whose cost may be too high for large data because a parallel merge is required. Based on the same idea, many optimization functions in R can be used in the same way for large datasets.
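The root-finding idea can be sketched on one processor before any MPI is involved. This is an illustration only, not part of the demo scripts: the helper name f.ecdf is hypothetical, and taking range(y) as the search interval is an assumption that works because the function is negative at min(y) and positive at max(y) for any prob with 1/N < prob < 1.

```r
### Serial sketch: a quantile as a root of (empirical CDF - prob).
set.seed(1234)
N <- 100
y <- rnorm(N)

### Hypothetical helper: proportion of data <= x, minus the target probability.
f.ecdf <- function(x, data, prob = 0.5){
  sum(data <= x) / length(data) - prob
}

### range(y) brackets the root: f.ecdf is 1/N - prob at min(y), 1 - prob at max(y).
ret <- uniroot(f.ecdf, range(y), data = y, prob = 0.95)

### Compare the root with the built-in estimate.
print(ret$root)
print(quantile(y, probs = 0.95, names = FALSE))
```

Because the empirical CDF is a step function, the two numbers need not agree exactly: uniroot() stops anywhere between the two order statistics around the jump, while quantile() interpolates between them by default.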


Serial code: (ex_quantile_serial.r)

# File name: ex_quantile_serial.r
# Run: Rscript --vanilla ex_quantile_serial.r

### Main code starts here.
set.seed(1234)
N <- 100
y <- rnorm(N)

### Obtain 95% quantile.
quantile(y, probs = 0.95, names = FALSE)

Parallel (SPMD) code: (ex_quantile_spmd.r for ultra-large/unlimited $N$)

# File name: ex_quantile_spmd.r
# Run: mpiexec -np 2 Rscript --vanilla ex_quantile_spmd.r

### Load pbdMPI and initialize the communicator.
library(pbdMPI, quietly = TRUE)
init()

### Main code starts here.
set.seed(1234)
N <- 100
y <- rnorm(N)

### Each processor loads only its own part of the data when N is ultra-large.
id.get <- get.jid(N)
y.spmd <- y[id.get]

### A function for uniroot.
f.quantile <- function(x, data.spmd, N, prob = 0.5){
  allreduce(sum(data.spmd <= x), op = "sum") / N - prob
} # End of f.quantile().

### Obtain 95% quantile. The interval c(1.5, 2) must bracket the quantile.
ret <- uniroot(f.quantile, c(1.5, 2), data.spmd = y.spmd, N = N, prob = 0.95)

### Output from rank 0 only, since comm.print() prints from rank 0 by default.
comm.print(ret$root)
finalize()
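The SPMD version needs only one reduction per function evaluation because the count of observations at or below x is additive over any partition of the data. The serial sketch below mimics this without MPI, under stated assumptions: the cyclic split into n.ranks chunks is only a stand-in for the per-processor pieces (get.jid() may partition differently), sum() over the chunk counts stands in for allreduce(..., op = "sum"), and the names chunks, local.counts, and global.range are hypothetical.

```r
### Serial sketch: local counts summed over chunks equal the global count.
set.seed(1234)
N <- 100
y <- rnorm(N)

### Assumed partition: assign observations to 4 pretend ranks cyclically.
n.ranks <- 4
rank.id <- rep(seq_len(n.ranks), length.out = N)
chunks <- split(y, rank.id)

### Each pretend rank counts its own observations <= x.
x <- 1.5
local.counts <- vapply(chunks, function(chunk) sum(chunk <= x), integer(1))

### Summing the local counts plays the role of the "sum" reduction.
global.count <- sum(local.counts)
print(global.count == sum(y <= x))

### The same reasoning gives a data-driven search interval:
### the global range is the min of local minima and the max of local maxima.
global.range <- c(min(vapply(chunks, min, numeric(1))),
                  max(vapply(chunks, max, numeric(1))))
print(all.equal(global.range, range(y)))
```

In an MPI run, the last step would presumably correspond to two further reductions using "min" and "max" operations, which would remove the need to hard-code the interval passed to uniroot().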

Exercise:

  1. Try other optimization functions in R, such as optim() and nlm().