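#### Quantile -- Quantile and Percentile

Quantiles and percentiles are also fundamental tools in statistics, and they are easy to compute on one processor for small data. By definition, the $p$-quantile is the smallest $x$ with $F_N(x) \ge p$, where $F_N(x) = \frac{1}{N}\#\{i : y_i \le x\}$ is the empirical CDF. A fast implementation can therefore be done in R by finding the root of $F_N(x) - p$ with <code>uniroot()</code>. This avoids <code>sort()</code>, whose cost may be too high for large distributed data because a parallel sort requires merging across processors. The root-finding approach needs only one global count (a single <code>allreduce()</code>) per iteration. Because $F_N$ is a step function, the root returned by <code>uniroot()</code> is determined only up to the gap between neighboring order statistics and the solver's tolerance, which is usually adequate for large $N$. Based on the same idea, many other optimization functions in R can be used in the same way for large datasets.

---

#### Serial code: (<a href="./ex_quantile_serial.r" target="_blank">ex_quantile_serial.r</a>)

```
# File name: ex_quantile_serial.r
# Run: Rscript --vanilla ex_quantile_serial.r

### Main codes start from here.
set.seed(1234)
N <- 100
y <- rnorm(N)

### Obtain 95% quantile.
quantile(y, probs = 0.95, names = FALSE)
```

---

#### Parallel (SPMD) code: (<a href="./ex_quantile_spmd.r" target="_blank">ex_quantile_spmd.r</a> for ultra-large/unlimited $N$)

```
# File name: ex_quantile_spmd.r
# Run: mpiexec -np 2 Rscript --vanilla ex_quantile_spmd.r

### Load pbdMPI and initialize the communicator.
library(pbdMPI, quietly = TRUE)
init()

### Main codes start from here.
set.seed(1234)
N <- 100
y <- rnorm(N)

### Load data partially by processors if N is ultra-large.
id.get <- get.jid(N)
y.spmd <- y[id.get]

### A function for uniroot(): global empirical CDF at x minus prob.
### allreduce() sums the local counts, so every rank gets the same value
### and all ranks take identical uniroot() steps (the collectives stay in sync).
f.quantile <- function(x, data.spmd, N, prob = 0.5){
  allreduce(sum(data.spmd <= x), op = "sum") / N - prob
} # End of f.quantile().

### Bracket the root by the global range of the data instead of a
### hard-coded interval, so the search works for any data.
lower <- allreduce(min(y.spmd), op = "min")
upper <- allreduce(max(y.spmd), op = "max")

### Obtain 95% quantile.
ret <- uniroot(f.quantile, c(lower, upper), data.spmd = y.spmd, N = N,
               prob = 0.95)

### comm.print() prints from rank 0 by default; every rank holds the same root.
comm.print(ret$root)

finalize()
```

---

#### Exercise:

1. Try other optimization functions in R such as <code>optim()</code>, <code>nlm()</code>, etc.

---

<div w3-include-html="../preamble_tail_date.html"></div>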
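#### Serial sketch of the root-finding idea:

For readers without an MPI setup, here is a minimal serial sketch of the same <code>uniroot()</code> idea. It is an illustration, not one of the workshop's example files. It replaces the global <code>allreduce()</code> count with a local <code>mean()</code> and brackets the root by the range of the data. The helper name <code>f.quantile.serial</code> is hypothetical and chosen only for this sketch.

```r
# Serial (no MPI) sketch of quantile-by-root-finding.
# Assumption: f.quantile.serial is a hypothetical helper name for illustration.
set.seed(1234)
N <- 100
y <- rnorm(N)

# Empirical CDF at x minus the target probability.
# It is negative below the p-quantile and non-negative at and above it.
f.quantile.serial <- function(x, data, prob = 0.5){
  mean(data <= x) - prob
}

# At min(y) the function equals 1/N - prob (negative when prob > 1/N), and at
# max(y) it equals 1 - prob (non-negative), so this interval brackets a root.
ret <- uniroot(f.quantile.serial, c(min(y), max(y)), data = y, prob = 0.95)

# Compare the root with the neighboring order statistics and with quantile().
y.sorted <- sort(y)
print(c(root = ret$root, y.sorted[95:96]))
print(quantile(y, probs = 0.95, names = FALSE))
```

Because the empirical CDF is a step function, the root is only guaranteed to fall, up to the default tolerance of <code>uniroot()</code>, between the 95th and 96th order statistics. <code>quantile()</code> with its default type 7 interpolates within that same interval, so the two printed values should agree only to that precision.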