Wikimaths

A collaborative site on maths but not only!

Outils pour utilisateurs

Outils du site


world:leaveoneout

$$ \newcommand{\arginf}{\mathrm{arginf}} \newcommand{\argmin}{\mathrm{argmin}} \newcommand{\argmax}{\mathrm{argmax}} \newcommand{\asconv}[1]{\stackrel{#1-a.s.}{\rightarrow}} \newcommand{\Aset}{\mathsf{A}} \newcommand{\b}[1]{{\mathbf{#1}}} \newcommand{\ball}[1]{\mathsf{B}(#1)} \newcommand{\bproof}{\textbf{Proof :}\quad} \newcommand{\bmuf}[2]{b_{#1,#2}} \newcommand{\card}{\mathrm{card}} \newcommand{\chunk}[3]{{#1}_{#2:#3}} \newcommand{\convprob}[1]{\stackrel{#1-\text{prob}}{\rightarrow}} \newcommand{\Cov}{\mathbb{C}\mathrm{ov}} \newcommand{\CPE}[2]{\PE\lr{#1| #2}} \renewcommand{\det}{\mathrm{det}} \newcommand{\dimlabel}{\mathsf{m}} \newcommand{\dimU}{\mathsf{q}} \newcommand{\dimX}{\mathsf{d}} \newcommand{\dimY}{\mathsf{p}} \newcommand{\dlim}{\Rightarrow} \newcommand{\e}[1]{{\left\lfloor #1 \right\rfloor}} \newcommand{\eproof}{\quad \Box} \newcommand{\eremark}{</WRAP>} \newcommand{\eqdef}{:=} \newcommand{\eqlaw}{\stackrel{\mathcal{L}}{=}} \newcommand{\eqsp}{\;} \newcommand{\Eset}{ {\mathsf E}} \newcommand{\esssup}{\mathrm{essup}} \newcommand{\fr}[1]{{\left\langle #1 \right\rangle}} \newcommand{\falph}{f} \renewcommand{\geq}{\geqslant} \newcommand{\hchi}{\hat \chi} \newcommand{\Id}{\mathrm{Id}} \newcommand{\img}{\text{Im}} \newcommand{\indi}[1]{\mathbf{1}_{#1}} \newcommand{\indiacc}[1]{\mathbf{1}_{\{#1\}}} \newcommand{\indin}[1]{\mathbf{1}\{#1\}} \newcommand{\itemm}{\quad \quad \blacktriangleright \;} \newcommand{\ker}{\text{Ker}} \newcommand{\klbck}[2]{\mathrm{K}\lr{#1||#2}} \newcommand{\law}{\mathcal{L}} \newcommand{\labelinit}{\pi} \newcommand{\labelkernel}{Q} \renewcommand{\leq}{\leqslant} \newcommand{\lone}{\mathsf{L}_1} \newcommand{\lrav}[1]{\left|#1 \right|} \newcommand{\lr}[1]{\left(#1 \right)} \newcommand{\lrb}[1]{\left[#1 \right]} \newcommand{\lrc}[1]{\left\{#1 \right\}} \newcommand{\lrcb}[1]{\left\{#1 \right\}} \newcommand{\ltwo}[1]{\PE^{1/2}\lrb{\lrcb{#1}^2}} \newcommand{\Ltwo}{\mathrm{L}^2} \newcommand{\mc}[1]{\mathcal{#1}} \newcommand{\mcbb}{\mathcal B} \newcommand{\mcf}{\mathcal{F}} \newcommand{\meas}[1]{\mathrm{M}_{#1}} \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\normmat}[1]{{\left\vert\kern-0.25ex\left\vert\kern-0.25ex\left\vert #1 \right\vert\kern-0.25ex\right\vert\kern-0.25ex\right\vert}} \newcommand{\nset}{\mathbb N} \newcommand{\one}{\mathsf{1}} \newcommand{\PE}{\mathbb E} \newcommand{\PP}{\mathbb P} \newcommand{\Psif}{\Psi_f} \newcommand{\pscal}[2]{\langle #1,#2\rangle} \newcommand{\pscal}[2]{\langle #1,#2\rangle} \newcommand{\psconv}{\stackrel{\PP-a.s.}{\rightarrow}} \newcommand{\qset}{\mathbb Q} \newcommand{\rmd}{\mathrm d} \newcommand{\rme}{\mathrm e} \newcommand{\rmi}{\mathrm i} \newcommand{\Rset}{\mathbb{R}} \newcommand{\rset}{\mathbb{R}} \newcommand{\rti}{\sigma} \newcommand{\section}[1]{==== #1 ====} \newcommand{\seq}[2]{\lrc{#1\eqsp: \eqsp #2}} \newcommand{\set}[2]{\lrc{#1\eqsp: \eqsp #2}} \newcommand{\sg}{\mathrm{sgn}} \newcommand{\supnorm}[1]{\left\|#1\right\|_{\infty}} \newcommand{\thv}{{\theta_\star}} \newcommand{\tmu}{ {\tilde{\mu}}} \newcommand{\Tset}{ {\mathsf{T}}} \newcommand{\Tsigma}{ {\mathcal{T}}} \newcommand{\ttheta}{{\tilde \theta}} \newcommand{\tv}[1]{\left\|#1\right\|_{\mathrm{TV}}} \newcommand{\unif}{\mathrm{Unif}} \newcommand{\Xset}{{\mathsf X}} \newcommand{\Xsigma}{\mathcal X} \newcommand{\Yset}{{\mathsf Y}} \newcommand{\Ysigma}{\mathcal Y} \newcommand{\Var}{\mathbb{V}\mathrm{ar}} \newcommand{\zset}{\mathbb{Z}} \newcommand{\Zset}{\mathsf{Z}} $$

2017/10/07 23:39 · douc

Leave One Out Cross Validation (LOOCV) and Cook's distance

Denote the complete data set by $X \in \rset^{n \times p}$ and $y \in \rset^n$ and the leave-one-out data set by $X_{-i} \in \rset^{(n-1) \times p}$ and $y_{-i} \in \rset^{n-1}$. Compute the associated full and leave-one-out regression fits

\begin{align*} \hat \beta&=(X^T X)^{-1} X^T y\\ \hat \beta_{-i}&=(X_{-i}^T X_{-i})^{-1} X_{-i}^T y_{-i}\\ H&=X (X^T X)^{-1} X^T\\ H_{-i}&=X_{-i} (X_{-i}^T X_{-i})^{-1} X_{-i}^T \end{align*} Denote by $x_i$ the row vectors of $X$ and let $\hat y_i=x_i \hat \beta$ and $\hat y_{-i}=x_i \hat \beta_{-i}$ be the fitted values at $x_i$ when using all the data and when leaving out point $x_i$ respectively.

Fundamental Lemma For all $i \in \{1,\ldots,n\}$, \begin{equation} \label{eq:fond} y_i -\hat y_{-i}=\frac{y_i-\hat y_i}{1-H_{ii}} \end{equation}

This allows us to compute the leave-one-out cross-validation error for any leave-one-out data set using only a single fit, obtained from all the data.

Proof

The trick is to consider the augmented data set including the point $(x_i,\hat y _{-i})$. Define $\tilde y$ the vector $y$ where the i-th component $y_i$ is replaced by $\hat y_{-i}$ and note that for every $\beta \in \rset^p$, $$ \sum_{j} (\tilde y_j -x_j^T \beta)^2 \geq \sum_{j \neq i} (y_j -x_j^T \beta)^2 \geq \sum_{j \neq i} (y_j -x_j^T \hat \beta_{-i})^2 $$ with equality if $\beta=\hat \beta_{-i}$. This shows that $\hat \beta_{-i}$ is obtained by the regression of $\tilde y$ wrt the predictors $X$ (this is the TRICK!!!!). Therefore since by definition $\hat{y}_{-i}= x_i \hat \beta_{-i}$, \begin{align*} \hat{y}_{-i} &= x_i \hat \beta_{-i}=(H \tilde y)_i=\sum_{j\neq i}H_{ij}y_{j}+H_{ii}\hat{y}_{-i}= \hat{y}_{i}-H_{ii}y_{i}+H_{ii}\hat{y}_{-i} \end{align*} Rearranging the terms, we get \eqref{eq:fond}. $\eproof$

As a straightforward consequence of \eqref{eq:fond}, the Leave-One-Out Cross Validation coefficient can be easily written: \begin{align*} & LOOCV= n^{-1} \sum_{i=1}^n (y_i - \hat y_{-i})^2=n^{-1} \sum_{i=1}^n \frac{(y_i - \hat y_i)^2}{(1-H_{ii})^2} \end{align*}

A consequence of the fundamental Lemma on the Cook's distance

Lemma We have

\begin{align} &\hat \beta_{-i}=\hat \beta-\frac{(X^TX)^{-1} x_i^T (y_i-\hat y_{i})}{1-H_{ii}} \label{eq:fond:2}\\ &\label{eq:sigma} (n-p-1) \hat \sigma_{-i}^2=(n-p) \hat \sigma^2- \frac{(y_i-\hat y_i)^2}{1-H_{ii}} \end{align}

The last item of the lemma is not needed for establishing the expression of the Cook's distance. Still it gives an expression that allows to compare the error by taking into account the $i$-th observation or not.

Proof

  • (i). Denote $\epsilon_i\in \rset^n$ with null entries except for the i-th coefficient which is equal to 1. Note that $X^T \epsilon_i=x_i^T$. Then,

$$ \hat \beta_{-i}=(X^TX)^{-1} X^T \tilde y=(X^TX)^{-1} X^T (y-\epsilon_i(y_i-\hat y_{-i})) =\hat \beta-(X^TX)^{-1} x_i^T (y_i-\hat y_{-i})=\hat \beta-\frac{(X^TX)^{-1} x_i^T (y_i-\hat y_{i})}{1-H_{ii}} $$ This completes the proof of \eqref{eq:fond:2}.

  • (ii). Denote $u_i=y_i-\hat y_i$, $u_{-i}=y_i-\hat y_{-i}$ and $h_i=H_{ii}$. We now write

\begin{align*} \|P_{\img X^\perp}\tilde y\|^2&= \tilde y^T P_{\img X^\perp} \tilde y= (y-u_{-i} \epsilon_i)^T P_{\img X^\perp}(y-u_{-i} \epsilon_i)=\|P_{\img X^\perp} y\|^2+u_{-i}^2 \epsilon_i^T P_{\img X^\perp} \epsilon_i -2 u_{-i}\epsilon_i^T P_{\img X^\perp} y \\ & = \|P_{\img X^\perp} y\|^2+ \frac{u_i^2}{(1-h_i)^2} (1-h_i)-2 \frac{u_i}{1-h_i} \epsilon_i^T(y-\hat y)= \|P_{\img X^\perp} y\|^2-\frac{u_i^2}{1-h_i} \end{align*} which concludes the proof of \eqref{eq:sigma}. $\eproof$

This in turn, allows to express easily the Cook's distance (from \eqref{eq:fond:2}), \begin{align*} D_i=\frac{\|X\hat \beta-X\hat \beta_{-i}\|^2}{p \hat \sigma^2}=\frac{H_{ii}}{p(1-H_{ii})} r_i^2 \quad \mbox{where} \quad r_i=\frac{y_i-\hat y_i}{\hat \sigma \sqrt{1-H_{ii}}}\,, \quad \hat \sigma^2=\frac{RSS}{n-p}=\frac{\sum_{i=1}^n (y_i - \hat y_i)^2}{n-p} \end{align*}

world/leaveoneout.txt · Dernière modification: 2020/03/19 17:50 (modification externe)