# Least Squares Method 2

$$A\in M_{m\times n}, \quad b \in M_{m\times 1}, \quad x\in M_{n\times 1}.$$

## 1. Motivation

### 1.1 Non-uniqueness

#### 1.1.1 Sensitivity in prediction

$$a + b = 0,$$

$$f_1(x,y) = 10000x-10000y.$$

$$f_2(x,y) = x-y.$$
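A minimal Python sketch of the sensitivity the heading refers to, under one assumed reading of the formulas above: the coefficients of both $f_1$ and $f_2$ sum to zero, so both return $0$ whenever $x = y$, yet they react very differently to a small change in the input. The sample point and the perturbation size `eps` are illustrative assumptions, not values from the text.

```python
def f1(x, y):
    # Large coefficients that still sum to zero.
    return 10000 * x - 10000 * y


def f2(x, y):
    # Small coefficients that sum to zero.
    return x - y


# Both functions agree on any input with x == y.
x, y = 1.0, 1.0
on_line = (f1(x, y), f2(x, y))

# A small perturbation of x (hypothetical size) is scaled by 10000 in f1
# and by 1 in f2.
eps = 0.01
off_line = (f1(x + eps, y), f2(x + eps, y))

print("x == y:      ", on_line)
print("x perturbed: ", off_line)
```

Both models explain data lying on $x = y$ equally well, which is why a fit alone cannot choose between them; the one with smaller coefficients changes far less under a perturbed input.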

## 2. Ridge regression and its dual problem

$$\newcommand{\argmin}{\arg\min} \tag{1} \hat{x} = \argmin_{x\in\mathbb{R}^n}\left(\|Ax - b\|^2+\|x\|^2\right).$$

$$\tag{2} \|Ax - b\|^2+\|x\|^2 = \left\|\begin{bmatrix}A\\ I \end{bmatrix}x - \begin{bmatrix}b\\ 0 \end{bmatrix}\right\|^2.$$

$$\tag{3} \begin{bmatrix}A^T & I \end{bmatrix} \begin{bmatrix}A\\ I \end{bmatrix}\hat{x} = \begin{bmatrix}A^T & I \end{bmatrix} \begin{bmatrix}b\\ 0 \end{bmatrix},$$

$$\tag{4} (A^TA + I)\hat{x} = A^Tb.$$
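The step from (2) to (4) can be checked numerically. The sketch below, assuming NumPy and arbitrary random test data (the sizes and seed are not from the text), solves the stacked least squares problem in (2) and compares the result with the solution of the linear system (4).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary test data
m, n = 6, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Equation (2): stack A on top of I, and b on top of a zero vector.
A_aug = np.vstack([A, np.eye(n)])
b_aug = np.concatenate([b, np.zeros(n)])
x_stacked, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

# Equation (4): solve (A^T A + I) x = A^T b directly.
x_normal = np.linalg.solve(A.T @ A + np.eye(n), A.T @ b)

print(np.allclose(x_stacked, x_normal))
```

The stacked matrix always has full column rank because of the identity block, so the system in (4) is solvable for any $A$.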

$$\tag{5} \hat{x} = A^T(b-A\hat{x}),$$

$$\tag{6} \alpha = b-A\hat{x},$$

$$\tag{7} \hat{x} = A^T\alpha.$$

$$\tag{8} \alpha = b-A\hat{x} = b-AA^T\alpha,$$

$$\tag{9} (AA^T+ I)\alpha = b.$$

$$\tag{10} \hat{x} = A^T(AA^T + I)^{-1}b.$$
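As a sanity check on the dual derivation, the following sketch (assuming NumPy; the matrix sizes and seed are arbitrary choices) computes $\hat{x}$ once from the $n\times n$ system (4) and once from the $m\times m$ system (9) followed by (7), and confirms that $\alpha$ equals the residual as in (6).

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, arbitrary test data
m, n = 3, 8                     # wide matrix: fewer equations than unknowns
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Primal form, equation (4): an n x n system.
x_primal = np.linalg.solve(A.T @ A + np.eye(n), A.T @ b)

# Dual form, equation (9): an m x m system for alpha ...
alpha = np.linalg.solve(A @ A.T + np.eye(m), b)
# ... followed by equation (7): x = A^T alpha.
x_dual = A.T @ alpha

print(np.allclose(x_primal, x_dual))
# Equation (6): alpha is the residual b - A x.
print(np.allclose(alpha, b - A @ x_dual))
```

When $m < n$ the dual system is the smaller of the two, which is the practical reason to prefer (10) in that case.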

### 2.1 QR decomposition

$$\tag{11} A = QR,$$ where $Q^TQ= I_{r\times r}$, $Q\in M_{m\times r}$ and $R\in M_{r\times n}$.

$$\tag{12} (R^TR+I)\hat{x} = R^TQ^Tb.$$

$$\tag{13} \hat{x} = R^T(RR^T+I)^{-1}Q^Tb.$$
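A short numerical check of (12) and (13), assuming NumPy's reduced QR (`np.linalg.qr` with `mode="reduced"`), for which $r = \min(m, n)$; this choice of $r$ and the random test data are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, arbitrary test data
m, n = 3, 8
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Reduced QR: Q is m x r and R is r x n, with r = min(m, n) in NumPy.
Q, R = np.linalg.qr(A, mode="reduced")
r = R.shape[0]

# Equation (12): an n x n system written in terms of R.
x_12 = np.linalg.solve(R.T @ R + np.eye(n), R.T @ (Q.T @ b))

# Equation (13): an r x r system written in terms of R.
x_13 = R.T @ np.linalg.solve(R @ R.T + np.eye(r), Q.T @ b)

# Reference: equation (4) on the original matrix A.
x_ref = np.linalg.solve(A.T @ A + np.eye(n), A.T @ b)

print(np.allclose(x_12, x_ref), np.allclose(x_13, x_ref))
```

Both formulas only use $Q$ through the product $Q^Tb$, so $Q$ never needs to be multiplied back into the solution.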

## 3. Conclusion

$$\min_{x\in\mathbb{R}^n}\left(\|Ax - b\|^2+\|x\|^2\right),$$

• If $m>n$, we compute $$\hat{x} = (A^TA+I)^{-1}A^Tb.$$
• If we take the (reduced) QR decomposition $A=QR$ with $Q^TQ=I_{n\times n}$, this becomes $$\hat{x} = (R^TR+I)^{-1}R^TQ^Tb.$$
• If $m<n$, we compute $$\hat{x} = A^T(AA^T+I)^{-1}b.$$
• If we take the (reduced) QR decomposition $A=QR$ with $Q^TQ=I_{m\times m}$, this becomes $$\hat{x} = R^T(RR^T+I)^{-1}Q^Tb.$$
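The case distinction in this list can be sketched as a small helper that always solves the smaller of the two linear systems. The function name `ridge_solve` is hypothetical, NumPy is assumed, and the square case $m = n$ (not covered by the list) is routed to the first formula as an arbitrary choice.

```python
import numpy as np


def ridge_solve(A, b):
    """Hypothetical helper: minimize ||Ax - b||^2 + ||x||^2 via the smaller system."""
    m, n = A.shape
    if m >= n:
        # Tall (or square) A: solve the n x n system (A^T A + I) x = A^T b.
        return np.linalg.solve(A.T @ A + np.eye(n), A.T @ b)
    # Wide A: solve the m x m system (A A^T + I) alpha = b, then x = A^T alpha.
    return A.T @ np.linalg.solve(A @ A.T + np.eye(m), b)


rng = np.random.default_rng(3)  # fixed seed, arbitrary test data
for m, n in [(7, 2), (2, 7)]:
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    x = ridge_solve(A, b)
    # At the minimizer, half the gradient A^T(Ax - b) + x vanishes.
    print((m, n), np.allclose(A.T @ (A @ x - b) + x, 0))
```

The gradient check is independent of which branch was taken, so it tests both formulas against the original minimization problem rather than against each other.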

