Note that the parallel code on this page is written in the Single Program Multiple Data (SPMD) style and uses pbdMPI by default. The code is not necessarily optimized and may run efficiently only in some circumstances, so performance reports are omitted.
The essential purpose is to show how to take existing serial code and rewrite or rethink it in parallel. Although the examples given here are all greatly simplified for illustration, the ideas extend to more complex cases with real data in real situations. We aim to explain parallel ideas from a statistical point of view; a good sense of statistics helps in thinking about applications and redesigning algorithms. Some recipes for analyzing ultra-large/unlimited datasets are listed below. Click a title to show its page at the bottom. For details on running the code, see Rscript.
Binning -- Table Cutting and Binning, a simple nonparametric method.
Basic -- Sample Mean and Sample Variance.
Quantile -- Quantile or Percentile.
OLS -- Ordinary Least Squares for Linear Models.
MVN -- Log Likelihood of Multivariate Normal Distribution.
PCA -- Principal Component Analysis.
Model-Based Clustering -- Finite Mixture Model and EGM Algorithm, and its older brother K-Means (Distance-Based Clustering).
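To make the serial-to-parallel rewrite concrete, here is a minimal sketch of the idea behind the "Basic" recipe (sample mean and sample variance). It is not the code from the linked pages: it is plain Python, the "ranks" are simulated as list chunks inside one process, and `allreduce_sum` is a hypothetical helper standing in for a sum reduction across ranks (in a pbdMPI program this role would be played by an MPI reduction). The point is only that each rank computes small local summaries, and the global statistic is recovered by summing them.

```python
# Serial version: one process sees all the data.
def serial_mean_var(x):
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    return mean, var


# Hypothetical stand-in for an MPI sum reduction: takes one value per rank
# and returns the total that every rank would receive.
def allreduce_sum(per_rank_values):
    return sum(per_rank_values)


# SPMD-style version: every rank applies the same steps to its own chunk.
# Here the ranks are simulated by looping over the chunks.
def spmd_mean_var(chunks):
    # Step 1: local summaries, computed independently on each rank.
    local_n = [len(c) for c in chunks]
    local_sum = [sum(c) for c in chunks]

    # Step 2: combine the local summaries into global ones.
    n = allreduce_sum(local_n)
    mean = allreduce_sum(local_sum) / n

    # Step 3: local sums of squared deviations from the global mean,
    # combined the same way.
    local_ss = [sum((xi - mean) ** 2 for xi in c) for c in chunks]
    var = allreduce_sum(local_ss) / (n - 1)
    return mean, var


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
chunks = [data[0:3], data[3:6], data[6:8]]  # pretend these live on 3 ranks

serial_result = serial_mean_var(data)
spmd_result = spmd_mean_var(chunks)
print(serial_result, spmd_result)
```

Only a few numbers per rank (a count and two sums) are ever combined, so the full dataset never has to sit in one place. The same pattern of local summaries followed by a reduction underlies several of the other recipes in the list.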
More examples, including Bootstrap, MCMC, Bayesian Statistics, Logistic Regression, and Generalized Linear Mixed-effect Models, are in Tutorial 1, Tutorial 2, and Tutorial 3 on the pbdR Tech web page.