OLS -- Ordinary Least Squares for Linear Models.
Ordinary Least Squares (OLS) is a standard technique in statistics for solving the problem $\boldsymbol{Y} = \boldsymbol{X\beta} + \boldsymbol{\epsilon}$ where $\boldsymbol{\beta}$ is unknown. The dimensions are $\boldsymbol{Y}_{N\times 1}$, $\boldsymbol{X}_{N\times p}$, $\boldsymbol{\beta}_{p\times 1}$ and $\boldsymbol{\epsilon}_{N\times 1}$ respectively, with $N \gg p$. Under some mild assumptions, a little linear algebra gives the standard solution $\boldsymbol{\hat{\beta}} = (\boldsymbol{X}^t\boldsymbol{X})^{-1} \boldsymbol{X}^t\boldsymbol{Y}$. This estimator also has several desirable statistical properties.
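The closed-form solution above follows from minimizing the residual sum of squares; a brief sketch of the standard derivation:

```latex
\begin{align*}
\hat{\boldsymbol{\beta}}
  &= \arg\min_{\boldsymbol{\beta}}
     \|\boldsymbol{Y} - \boldsymbol{X\beta}\|^2 \\
\frac{\partial}{\partial\boldsymbol{\beta}}
  \|\boldsymbol{Y} - \boldsymbol{X\beta}\|^2
  &= -2\,\boldsymbol{X}^t(\boldsymbol{Y} - \boldsymbol{X\beta})
   = \boldsymbol{0} \\
\Longrightarrow\quad
\boldsymbol{X}^t\boldsymbol{X}\,\hat{\boldsymbol{\beta}}
  &= \boldsymbol{X}^t\boldsymbol{Y}
\quad\Longrightarrow\quad
\hat{\boldsymbol{\beta}}
   = (\boldsymbol{X}^t\boldsymbol{X})^{-1}\boldsymbol{X}^t\boldsymbol{Y},
\end{align*}
```

provided $\boldsymbol{X}^t\boldsymbol{X}$ is invertible, i.e. $\boldsymbol{X}$ has full column rank.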
Extensions and applications of OLS include regression, the linear model (lm), the generalized linear model (glm), the mixed model (lme), and so on.
More details about OLS can be found in the "Ordinary Least Squares" article on Wikipedia.
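For the lm and glm cases above, base R already provides fitting functions; a minimal sketch (using the same simulated data as the serial code below) showing that a Gaussian glm reproduces the lm estimates:

```r
## Simulate the same small data set used in the serial example.
set.seed(1234)
N <- 100
p <- 2
X <- matrix(rnorm(N * p), ncol = p)
beta <- c(1, 2)
Y <- X %*% beta + rnorm(N)

## Fit without an intercept, matching the model Y = X beta + epsilon.
fit.lm <- lm(Y ~ X - 1)
fit.glm <- glm(Y ~ X - 1, family = gaussian)

coef(fit.lm)
coef(fit.glm)
```

With a Gaussian family and identity link, glm solves the same least-squares problem as lm, so the coefficient estimates agree.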
Serial code: (ex_ols_serial.r)
# File name: ex_ols_serial.r
# Run: Rscript --vanilla ex_ols_serial.r
### Main code starts from here.
set.seed(1234)
N <- 100
p <- 2
X <- matrix(rnorm(N * p), ncol = p)
beta <- c(1, 2)
epsilon <- rnorm(N)
Y <- X %*% beta + epsilon
### Obtain beta hat.
(beta.hat <- solve(t(X) %*% X) %*% t(X) %*% Y)
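Forming and inverting $\boldsymbol{X}^t\boldsymbol{X}$ explicitly, as in the serial code above, can be numerically unstable when $\boldsymbol{X}$ is ill-conditioned. A sketch of a cross-check against a QR-based solver (reusing the same simulated data; `qr.solve` is base R):

```r
## Regenerate the same data as in ex_ols_serial.r.
set.seed(1234)
N <- 100
p <- 2
X <- matrix(rnorm(N * p), ncol = p)
beta <- c(1, 2)
epsilon <- rnorm(N)
Y <- X %*% beta + epsilon

## Normal equations (as in the serial code) vs. QR decomposition.
beta.hat.ne <- solve(t(X) %*% X) %*% t(X) %*% Y
beta.hat.qr <- qr.solve(X, Y)
```

For well-conditioned problems the two agree to machine precision; the QR route avoids squaring the condition number of $\boldsymbol{X}$.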
Parallel (SPMD) code: (ex_ols_spmd.r, for ultra-large $N$ that need not fit on a single processor)
# File name: ex_ols_spmd.r
# Run: mpiexec -np 2 Rscript --vanilla ex_ols_spmd.r
### Load pbdMPI and initialize the communicator.
library(pbdMPI, quiet = TRUE)
init()
### Main code starts from here.
set.seed(1234)
N <- 100 # Pretend N is large.
p <- 2
X <- matrix(rnorm(N * p), ncol = p)
beta <- c(1, 2)
epsilon <- rnorm(N)
Y <- X %*% beta + epsilon
### Each processor loads only its own rows when N is ultra-large.
id.get <- get.jid(N)
X.spmd <- X[id.get,]
Y.spmd <- Y[id.get]
### Obtain beta hat.
t.X.spmd <- t(X.spmd)
A <- allreduce(t.X.spmd %*% X.spmd, op = "sum")
B <- allreduce(t.X.spmd %*% Y.spmd, op = "sum")
beta.hat <- solve(matrix(A, ncol = p)) %*% B
### Output.
comm.print(beta.hat)
finalize()
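The key idea in the SPMD code is that $\boldsymbol{X}^t\boldsymbol{X}$ and $\boldsymbol{X}^t\boldsymbol{Y}$ decompose into sums of per-processor cross-products, which `allreduce` combines. A serial sketch of that reduction pattern in plain R (no MPI; the two blocks stand in for the two processors' row partitions from `get.jid`):

```r
## Same data as in ex_ols_spmd.r.
set.seed(1234)
N <- 100
p <- 2
X <- matrix(rnorm(N * p), ncol = p)
beta <- c(1, 2)
epsilon <- rnorm(N)
Y <- X %*% beta + epsilon

## Partition the rows into two contiguous blocks, mimicking 2 processors.
blocks <- split(1:N, rep(1:2, each = N / 2))

## Sum the per-block cross-products, as allreduce(..., op = "sum") would.
A <- Reduce(`+`, lapply(blocks, function(i) t(X[i, ]) %*% X[i, ]))
B <- Reduce(`+`, lapply(blocks, function(i) t(X[i, ]) %*% Y[i]))

beta.hat <- solve(A) %*% B
```

Because matrix cross-products are additive over row blocks, this yields exactly the full-data estimate, which is why the SPMD code never needs the whole of $\boldsymbol{X}$ or $\boldsymbol{Y}$ on any one processor.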