Note that the parallel codes provided on this page are designed for
Single Program Multiple Data (SPMD) programming and use
pbdMPI
by default.
The parallel codes may not be optimized and may perform efficiently only in
some circumstances, so performance reports are omitted.
The essential purpose is to show how to take existing
serial codes and rewrite/rethink them in parallel.
Although the examples given here are all extremely simplified
for illustration, the ideas
extend to more complex cases with real data in real situations.
We aim to explain parallel ideas from a statistical point of view;
a better sense of statistics helps in thinking about applications
and redesigning algorithms.
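As a minimal illustration of rethinking a serial computation in SPMD style, the sketch below computes the sample mean and sample variance from per-chunk summaries. It is written in plain Python and only simulates the ranks inside one process, so it is not pbdMPI code; the function names `local_summary` and `combine`, the number of ranks, and the simulated data are all assumptions made for illustration. In a real SPMD program each rank would hold only its own chunk, and the combining step would be a sum reduction across ranks.

```python
# Hypothetical sketch: simulate SPMD ranks in a single process (no MPI needed).
import random


def local_summary(chunk):
    """Per-rank step: count, sum, and sum of squares of the local chunk."""
    n = len(chunk)
    s = sum(chunk)
    ss = sum(x * x for x in chunk)
    return n, s, ss


def combine(summaries):
    """Reduction step: add the per-rank summaries elementwise, then
    derive the global mean and the unbiased sample variance."""
    n = sum(t[0] for t in summaries)
    s = sum(t[1] for t in summaries)
    ss = sum(t[2] for t in summaries)
    mean = s / n
    var = (ss - n * mean * mean) / (n - 1)
    return mean, var


random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]  # simulated data (assumption)

n_ranks = 4  # assumed number of ranks
# Each simulated rank sees only its own slice of the data.
chunks = [data[r::n_ranks] for r in range(n_ranks)]

mean, var = combine([local_summary(c) for c in chunks])
print(mean, var)
```

The design point is that each rank sends only three numbers, regardless of how large its chunk is. The sum-of-squares formula is used here for brevity; it can lose precision when the mean is large relative to the spread.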
Some recipes for analyzing ultra-large/unlimited datasets are
listed below. Click a title to show its page at the bottom.
For how to run the codes, see
Rscript for details.
1. Binning
-- Table Cutting and Binning, a simple nonparametric method.
2. Basic
-- Sample Mean and Sample Variance.
3. Quantile
-- Quantile or Percentile.
4. OLS
-- Ordinary Least Squares for Linear Models.
5. MVN
-- Log Likelihood of Multivariate Normal Distribution.
6. PCA
-- Principal Component Analysis.
7. Model-Based Clustering
-- Finite Mixture Model and EGM Algorithm,
and its older brother K-Means
(Distance-Based Clustering).
8. More examples are in
Tutorial 1,
Tutorial 2, and
Tutorial 3
of the
pbdR Tech web page,
including Bootstrap, MCMC, Bayesian Statistics, Logistic Regression, and
Generalized Linear Mixed-Effects Models.
---