Stein's Lemma and MMSE Estimation

Author: Lewis       Date: 2025/10/2       Category: Communications       Length: 5627 characters

Nobody asked me to hang out over the National Day holiday and I was bored, so I am using the time to organize my notes on Stein's lemma.

Stein's lemma

We start with the lemma itself. In the scalar case, let $X\sim{\cal N}\left( {\mu ,{\sigma ^2}} \right)$ and let $g:{\mathbb R} \to {\mathbb R}$ be differentiable, with ${\mathbb E}\left( {g\left( X \right)} \right)$ and ${\mathbb E}\left( {g'\left( X \right)} \right)$ finite. Then $${\mathbb E}\left( {g\left( X \right)\left( {X - \mu } \right)} \right) = {\sigma ^2}{\mathbb E}\left( {g'\left( X \right)} \right)$$ where $g'\left( x \right)=\frac{dg(x)}{dx}$. The proof is an integration by parts: $$\begin{array}{l} {\mathbb E}\left( {g\left( X \right)\left( {X - \mu } \right)} \right) = \int {g\left( x \right)\left( {x - \mu } \right){\cal N}\left( {x;\mu ,{\sigma ^2}} \right)dx} \\ =  - \frac{{{\sigma ^2}}}{{\sqrt {2\pi } \sigma }}\int {g\left( x \right)d\exp \left( { - \frac{{{{\left( {x - \mu } \right)}^2}}}{{2{\sigma ^2}}}} \right)} \\ = \left. { - \frac{{{\sigma ^2}}}{{\sqrt {2\pi } \sigma }}g\left( x \right)\exp \left( { - \frac{{{{\left( {x - \mu } \right)}^2}}}{{2{\sigma ^2}}}} \right)} \right|_{ - \infty }^{ + \infty } + \frac{{{\sigma ^2}}}{{\sqrt {2\pi } \sigma }}\int {g'\left( x \right)\exp \left( { - \frac{{{{\left( {x - \mu } \right)}^2}}}{{2{\sigma ^2}}}} \right)} dx\\ \overset{a}{=} {\sigma ^2}\int {g'\left( x \right){\cal N}\left( {x;\mu ,{\sigma ^2}} \right)} dx\\ = {\sigma ^2}{\mathbb E}\left( {g'\left( X \right)} \right) \end{array}$$ Step (a) uses the finiteness assumption: $g$ has to grow slowly enough that $g\left( x \right)\exp \left( { - \frac{{{{\left( {x - \mu } \right)}^2}}}{{2{\sigma ^2}}}} \right) \to 0$ as $x \to \pm \infty$, so the boundary term vanishes.

For a vector-valued function ${\bf{g}}\left( {\bf{X}} \right):{{\mathbb R}^M} \to {{\mathbb R}^N}$ with ${\bf{X}}\sim{\cal N}\left( {{\bf{\mu }},{\bf{\Sigma }}} \right)$, the lemma reads $${\mathbb E}\left( {\left( {{\bf{X}} - {\bf{\mu }}} \right){\bf{g}}{{\left( {\bf{X}} \right)}^T}} \right) = {\bf{\Sigma }}{\mathbb E}\left( {\nabla {\bf{g}}\left( {\bf{X}} \right)} \right)$$ where $\nabla {\bf{g}}$ is the transposed Jacobian, an $M \times N$ matrix whose $(j,i)$ entry is $\frac{{\partial {g_i}}}{{\partial {x_j}}}$: $\nabla {\bf{g}} = \left[ {\begin{array}{ccc} {\frac{{\partial {g_1}}}{{\partial {x_1}}}}& \cdots &{\frac{{\partial {g_1}}}{{\partial {x_M}}}}\\ \vdots & \ddots & \vdots \\ {\frac{{\partial {g_N}}}{{\partial {x_1}}}}& \cdots &{\frac{{\partial {g_N}}}{{\partial {x_M}}}} \end{array}} \right]^T$.
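As a quick sanity check of the scalar identity above, here is a minimal Monte Carlo sketch. The choice $g = \tanh$, the values of $\mu$ and $\sigma$, the sample size, and the function name `stein_scalar_check` are all my own illustrative assumptions, not part of the lemma.

```python
import numpy as np

def stein_scalar_check(mu=0.5, sigma=1.3, n=1_000_000, seed=0):
    """Estimate both sides of E[g(X)(X - mu)] = sigma^2 E[g'(X)] for g = tanh."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=n)              # samples of X ~ N(mu, sigma^2)
    lhs = np.mean(np.tanh(x) * (x - mu))           # E[g(X)(X - mu)]
    rhs = sigma**2 * np.mean(1.0 - np.tanh(x)**2)  # sigma^2 E[g'(X)], since tanh' = 1 - tanh^2
    return lhs, rhs

lhs, rhs = stein_scalar_check()
print(f"E[g(X)(X - mu)]   ~ {lhs:.4f}")
print(f"sigma^2 E[g'(X)]  ~ {rhs:.4f}")
```

The two printed numbers should agree up to Monte Carlo error, which shrinks like $1/\sqrt{n}$. Since $\tanh$ is bounded, the boundary term in step (a) clearly vanishes for this choice.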
We now prove the vector form for a single component $g_i$: $$\begin{array}{l} {\mathbb E}\left( {{g_i}\left( {\bf{X}} \right)\left( {{\bf{X}} - {\bf{\mu }}} \right)} \right) = \int {{g_i}\left( {\bf{x}} \right)\left( {{\bf{x}} - {\bf{\mu }}} \right){\cal N}\left( {{\bf{x}};{\bf{\mu }},{\bf{\Sigma }}} \right)d{\bf{x}}} \\ =  - {\bf{\Sigma }}\int {{g_i}\left( {\bf{x}} \right){\nabla _{\bf{x}}}{\cal N}\left( {{\bf{x}};{\bf{\mu }},{\bf{\Sigma }}} \right)d{\bf{x}}} \\ = {\bf{\Sigma }}\int {\nabla {g_i}\left( {\bf{x}} \right){\cal N}\left( {{\bf{x}};{\bf{\mu }},{\bf{\Sigma }}} \right)d{\bf{x}}} \\ = {\bf{\Sigma }}{\mathbb E}\left( {\nabla {g_i}\left( {\bf{X}} \right)} \right) \end{array}$$ The second line uses ${\nabla _{\bf{x}}}{\cal N}\left( {{\bf{x}};{\bf{\mu }},{\bf{\Sigma }}} \right) =  - {{\bf{\Sigma }}^{ - 1}}\left( {{\bf{x}} - {\bf{\mu }}} \right){\cal N}\left( {{\bf{x}};{\bf{\mu }},{\bf{\Sigma }}} \right)$. The third line integrates by parts in each coordinate $x_j$; the boundary terms ${g_i}\left( {\bf{x}} \right){\cal N}\left( {{\bf{x}};{\bf{\mu }},{\bf{\Sigma }}} \right)$ at ${x_j} \to \pm \infty$ vanish under the same growth condition as in the scalar case. Here $\nabla {g_i}\left( {\bf{X}} \right) = {\left[ {\begin{array}{ccc} {\frac{{\partial {g_i}\left( {\bf{X}} \right)}}{{\partial {x_1}}}}& \cdots &{\frac{{\partial {g_i}\left( {\bf{X}} \right)}}{{\partial {x_M}}}} \end{array}} \right]^T}$. Stacking these columns for $i = 1, \ldots ,N$ gives the matrix identity.

When input and output have the same length ($M = N$), taking the trace gives the inner-product form: $${\mathbb E}\left( {{{\left( {{\bf{X}} - {\bf{\mu }}} \right)}^T}{\bf{g}}\left( {\bf{X}} \right)} \right) = \sum\limits_{j = 1}^N {\sum\limits_{k = 1}^N {\sigma _{jk}^2{\mathbb E}\left( {\frac{{\partial {g_k}}}{{\partial {x_j}}}} \right)} }$$ where $\sigma _{jk}^2$ denotes the $(j,k)$ entry of ${\bf{\Sigma }}$. If the covariance is a scalar multiple of the identity, ${\bf{\Sigma }} = {\sigma ^2}{\bf{I}}$, this simplifies further to a divergence: $${\mathbb E}\left( {{{\left( {{\bf{X}} - {\bf{\mu }}} \right)}^T}{\bf{g}}\left( {\bf{X}} \right)} \right) = {\sigma ^2}{\mathbb E}\left( {\mathrm{div}\left( {{\bf{g}}\left( {\bf{X}} \right)} \right)} \right)$$
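The matrix form can be checked numerically in the same spirit. The sketch below assumes a toy map ${\bf g}({\bf x}) = \tanh({\bf A}{\bf x})$ from ${\mathbb R}^3$ to ${\mathbb R}^2$; the matrices `A` and `L`, the mean `mu`, and the function name `stein_vector_check` are hypothetical values chosen only for illustration.

```python
import numpy as np

def stein_vector_check(n=400_000, seed=1):
    """Estimate both sides of E[(X - mu) g(X)^T] = Sigma E[grad g(X)] for g(x) = tanh(A x)."""
    rng = np.random.default_rng(seed)
    mu = np.array([0.3, -0.2, 0.1])
    L = np.array([[1.0, 0.0, 0.0],
                  [0.5, 0.8, 0.0],
                  [-0.3, 0.2, 0.6]])
    Sigma = L @ L.T                                  # covariance of X (M = 3)
    A = np.array([[0.7, -0.4, 0.2],
                  [0.1, 0.5, -0.6]])                 # g maps R^3 -> R^2 (N = 2)
    x = mu + rng.standard_normal((n, 3)) @ L.T       # each row is a sample of X ~ N(mu, Sigma)
    g = np.tanh(x @ A.T)                             # each row is g(X), shape (n, 2)
    lhs = (x - mu).T @ g / n                         # E[(X - mu) g(X)^T], shape (3, 2)
    # grad g is the transposed Jacobian: entry (j, i) = dg_i/dx_j = A[i, j] * (1 - g_i^2)
    mean_slope = np.mean(1.0 - g**2, axis=0)         # E[1 - tanh^2(a_i^T X)] for each i
    rhs = Sigma @ (A.T * mean_slope)                 # Sigma E[grad g(X)], shape (3, 2)
    return lhs, rhs

lhs, rhs = stein_vector_check()
print(np.round(lhs, 3))
print(np.round(rhs, 3))
```

Both printed $3 \times 2$ matrices should match entry by entry up to Monte Carlo error. The shapes also make the role of the transpose in $\nabla {\bf g}$ concrete: the left side is $M \times N$, so the gradient has to be $M \times N$ as well.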

Results for MMSE estimation

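Look at the lemma more closely. If ${\bf{g}}\left( {\bf{X}} \right)$ is regarded as a denoiser, its output is the result of denoising under a prior constraint or some other hand-designed constraint. The explicit correlation between input and output is then determined by the covariance of the input and by the properties of the denoiser itself, which is what one would expect. For the MMSE denoiser, that is ${\bf{g}}\left( {\bf{r}} \right) = {\mathbb E}\left( {{\bf{x}}|{\bf{r}}} \right) = \int {\frac{1}{Z}{\bf{x}}p\left( {\bf{x}} \right){\cal N}\left( {{\bf{x}};{\bf{r}},\sigma _r^2{\bf{I}}} \right)d{\bf{x}}}$ with ${\bf r}={\bf x}+{\bf n}_r$, ${\bf n}_r\sim{\cal N}\left( {0 ,{\sigma_r ^2}{\bf I}} \right)$ and normalizing constant $Z = \int {p\left( {\bf{x}} \right){\cal N}\left( {{\bf{x}};{\bf{r}},\sigma _r^2{\bf{I}}} \right)d{\bf{x}}}$ (which depends on ${\bf{r}}$), a sharper result holds. By the product rule, $$\begin{array}{l} \frac{{\partial {\bf{g}}\left( {\bf{r}} \right)}}{{\partial {r_k}}} = \int {\frac{1}{Z}{\bf{x}}p\left( {\bf{x}} \right)\frac{{\partial {\cal N}\left( {{\bf{x}};{\bf{r}},\sigma _r^2{\bf{I}}} \right)}}{{\partial {r_k}}}d{\bf{x}}}  + \int {\frac{{\partial \frac{1}{Z}}}{{\partial {r_k}}}{\bf{x}}p\left( {\bf{x}} \right){\cal N}\left( {{\bf{x}};{\bf{r}},\sigma _r^2{\bf{I}}} \right)d{\bf{x}}} \\ = \int {\frac{{{x_k} - {r_k}}}{{\sigma _r^2}}\frac{1}{Z}{\bf{x}}p\left( {\bf{x}} \right){\cal N}\left( {{\bf{x}};{\bf{r}},\sigma _r^2{\bf{I}}} \right)d{\bf{x}}}  + \int {\frac{{\partial \frac{1}{Z}}}{{\partial {r_k}}}{\bf{x}}p\left( {\bf{x}} \right){\cal N}\left( {{\bf{x}};{\bf{r}},\sigma _r^2{\bf{I}}} \right)d{\bf{x}}} \\ = \frac{1}{{\sigma _r^2}}\left( {{\mathbb E}\left( {{\bf{x}}\left( {{x_k} - {r_k}} \right)|{\bf{r}}} \right) - {\mathbb E}\left( {{\bf{x}}|{\bf{r}}} \right)\left( {{\mathbb E}\left( {{x_k}|{\bf{r}}} \right) - {r_k}} \right)} \right)\\ = \frac{1}{{\sigma _r^2}}\left( {{\mathbb E}\left( {{\bf{x}}{x_k}|{\bf{r}}} \right) - {\mathbb E}\left( {{\bf{x}}|{\bf{r}}} \right){\mathbb E}\left( {{x_k}|{\bf{r}}} \right)} \right) \end{array}$$ where the derivative of the normalization is $$\begin{array}{l} \frac{{\partial \frac{1}{Z}}}{{\partial {r_k}}} =  - \frac{1}{{{Z^2}}}\int {p\left( {\bf{x}} \right)\frac{{\partial {\cal N}\left( {{\bf{x}};{\bf{r}},\sigma _r^2{\bf{I}}} \right)}}{{\partial {r_k}}}d{\bf{x}}} \\ =  - \frac{1}{{{Z^2}}}\int {p\left( {\bf{x}} \right)\frac{{{x_k} - {r_k}}}{{\sigma _r^2}}{\cal N}\left( {{\bf{x}};{\bf{r}},\sigma _r^2{\bf{I}}} \right)d{\bf{x}}} \\ =  - \frac{1}{{\sigma _r^2Z}}\left( {{\mathbb E}\left( {{x_k}|{\bf{r}}} \right) - {r_k}} \right) \end{array}$$ Written in vector form, this becomes
$$\frac{{\partial {\bf{g}}\left( {\bf{r}} \right)}}{{\partial {\bf{r}}}} = \frac{1}{{\sigma _r^2}}\left( {{\mathbb E}\left( {{\bf{x}}{{\bf{x}}^T}|{\bf{r}}} \right) - {\mathbb E}\left( {{\bf{x}}|{\bf{r}}} \right){\mathbb E}\left( {{{\bf{x}}^T}|{\bf{r}}} \right)} \right)$$ The bracket is exactly the posterior covariance matrix of ${\bf{x}}$ given ${\bf{r}}$, that is, the error covariance of the MMSE estimate, so the Jacobian equals this covariance scaled by $1/\sigma _r^2$. Taking the divergence gives $$\mathrm{div}\left( {{\bf{g}}\left( {\bf{r}} \right)} \right) = \frac{1}{{\sigma _r^2}}\sum\limits_{i = 1}^N {\mathrm{Var}\left( {{x_i}|{\bf{r}}} \right)}$$ which agrees with the result used in OAMP/VAMP.

In fact, from the message-passing point of view, the Onsager (correction) term in AMP/OAMP/VAMP comes from the first derivative in a Taylor expansion (covered in an earlier post on this blog). For a very special estimator, here the MMSE estimator, that derivative reduces to a variance, which is why the familiar closed-form Onsager terms of AMP/OAMP/VAMP exist. From the point of view of Stein's lemma, the first derivative in the Onsager term is really a "correlation" term: it describes the correlation between the denoiser output and the input noise, and AMP/OAMP/VAMP remove it to obtain a better estimate.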
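The relation between the derivative of the MMSE denoiser and the posterior variance can be checked numerically in a scalar toy case. The sketch below assumes a discrete three-point prior on $\{-1, 0, 1\}$, so the integrals over $p(x)$ become sums; the prior, the noise variance, and the helper names `posterior_moments` and `check_mmse_derivative` are my own assumptions for illustration.

```python
import numpy as np

def posterior_moments(r, support, probs, sigma2):
    """Posterior mean and variance of x given r = x + n, n ~ N(0, sigma2), for a discrete prior."""
    # Unnormalized log-weights log p(x) + log N(x; r, sigma2); the Gaussian constant cancels in 1/Z.
    logw = np.log(probs) - (support - r) ** 2 / (2.0 * sigma2)
    w = np.exp(logw - logw.max())        # subtract the max for numerical stability
    w /= w.sum()                         # normalize, i.e. multiply by 1/Z
    mean = np.sum(w * support)           # g(r) = E[x | r]
    var = np.sum(w * support ** 2) - mean ** 2
    return mean, var

def check_mmse_derivative(r=0.4, sigma2=0.5, eps=1e-5):
    """Compare a central finite difference of g(r) with Var(x | r) / sigma2."""
    support = np.array([-1.0, 0.0, 1.0])
    probs = np.array([0.25, 0.5, 0.25])
    g_plus, _ = posterior_moments(r + eps, support, probs, sigma2)
    g_minus, _ = posterior_moments(r - eps, support, probs, sigma2)
    numeric = (g_plus - g_minus) / (2.0 * eps)
    _, var = posterior_moments(r, support, probs, sigma2)
    return numeric, var / sigma2

numeric, analytic = check_mmse_derivative()
print(f"finite-difference dg/dr : {numeric:.8f}")
print(f"Var(x|r) / sigma_r^2    : {analytic:.8f}")
```

For an equiprobable $\pm 1$ prior the same helper reproduces the closed form $g(r) = \tanh(r/\sigma_r^2)$, whose derivative $(1 - \tanh^2(r/\sigma_r^2))/\sigma_r^2$ is again the posterior variance divided by $\sigma_r^2$.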

I will write up the complex-valued case when I have time...
