Next: High-Dimensional Density Estimation Up: Nonparametric Density Estimation Previous: Parzen-Window Density Estimation

Parzen-Window Convergence

We see in (2.53) that the kernel-bandwidth parameter $h_n$ can strongly affect the PDF estimate $P(X)$, especially when the number of observations $n$ is finite. Very small $h_n$ values produce an irregular, spiky $P(X)$, while very large values excessively smooth out the structure of $P(X)$. For finite data, i.e., finite $n$, the best possible strategy is to aim at a compromise between these two effects. Finding optimal values of $h_n$ in this case entails additional constraints or strategies. For instance, a maximum-likelihood (ML) estimate yields an optimal $h_n$ value, and this is the approach we use in practice.
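
The trade-off described above can be illustrated with a minimal Python sketch. It assumes a 1D Gaussian kernel, the standard Parzen form $\frac{1}{n h} \sum_i K((x - x_i)/h)$, and a leave-one-out version of the likelihood. The leave-one-out form is an assumption made here, because the plain likelihood evaluated on the same samples grows without bound as $h_n \rightarrow 0$; the text does not say which ML variant is used. The function names, the candidate bandwidths, and the simulated data are all hypothetical.

```python
import math
import random


def gaussian_kernel(u):
    """Standard 1D Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_estimate(x, samples, h):
    """Parzen-window estimate at x: (1 / (n h)) * sum_i K((x - x_i) / h)."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)


def loo_log_likelihood(samples, h):
    """Leave-one-out log-likelihood of the samples for bandwidth h.

    Each sample is scored by an estimate built from the other n - 1 samples,
    which prevents the degenerate optimum at h -> 0.
    """
    n = len(samples)
    total = 0.0
    for i, xi in enumerate(samples):
        s = sum(gaussian_kernel((xi - xj) / h)
                for j, xj in enumerate(samples) if j != i)
        density = s / ((n - 1) * h)
        # Guard against log(0) when no other sample lies near xi.
        total += math.log(max(density, 1e-300))
    return total


def select_bandwidth(samples, candidates):
    """Return the candidate bandwidth with the highest leave-one-out likelihood."""
    return max(candidates, key=lambda h: loo_log_likelihood(samples, h))


# Hypothetical data: 200 draws from a standard normal distribution.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(200)]
candidates = [0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 5.0]
best_h = select_bandwidth(samples, candidates)
print("selected bandwidth:", best_h)
```

In this sketch the smallest candidate is penalized because isolated samples receive almost no density from their neighbors (the spiky regime), and the largest candidate is penalized because the estimate is far flatter than the data (the oversmoothed regime).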

The case of an infinite number of observations, i.e., $n \rightarrow \infty$, is theoretically very interesting. In this case, Parzen proved that the PDF estimate can converge to the actual PDF [125,48]. Let $P_n (x)$ be the estimator of the PDF at a point $x$ derived from a random sample of size $n$. This estimator has mean $\bar P_n (x)$ and variance $\mathop{\mbox{Var}}(P_n (x))$. The estimator $P_n (x)$ converges in mean square to the true value $P (x)$, i.e.,

$$
\begin{aligned}
\lim_{n \rightarrow \infty} \bar P_n (x) &= P (x), \\
\lim_{n \rightarrow \infty} \mathop{\mbox{Var}}(P_n (x)) &= 0,
\end{aligned}
\tag{56}
$$

when all the following conditions hold:

$$
\begin{aligned}
\sup_{x} K (x) &< \infty, \\
\lim_{\vert x \vert \rightarrow \infty} x K (x) &= 0, \\
\lim_{n \rightarrow \infty} h_n^d &= 0, \text{ and} \\
\lim_{n \rightarrow \infty} n h_n^d &= \infty.
\end{aligned}
\tag{57}
$$
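
As a worked example of the last two conditions, consider a bandwidth schedule of the following form. The schedule, the initial bandwidth $h_1 > 0$, and the exponent $\alpha$ are assumptions chosen for illustration and are not taken from the text. For any $0 < \alpha < 1$,

$$
\begin{aligned}
h_n &= h_1 \, n^{-\alpha / d}, \\
h_n^d &= h_1^d \, n^{-\alpha} \rightarrow 0 \quad \text{as } n \rightarrow \infty, \\
n h_n^d &= h_1^d \, n^{1 - \alpha} \rightarrow \infty \quad \text{as } n \rightarrow \infty.
\end{aligned}
$$

With $\alpha = 1/2$ this gives $h_n = h_1 / n^{1/(2d)}$, so the window shrinks, but slowly enough that the expected number of samples falling inside it still grows. A fixed bandwidth ($\alpha = 0$) violates the third condition, and $h_n^d = 1/n$ ($\alpha = 1$) violates the fourth.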

Figure 2.6 shows the process of convergence of the Parzen-window PDF, using a Gaussian kernel, to an arbitrary simulated PDF.

Figure 2.6: Convergence of the Parzen-window density estimate. The first row gives the true PDF. (a1)-(a4) show random samples drawn from the true PDF, with sample sizes increasing by a factor of 100 at each step, starting with a sample size of one. (b1)-(b4) and (c1)-(c4) give the Parzen-window PDF estimate (2D Gaussian kernel) with progressively decreasing $\sigma$, starting with $\sigma = 2$ and $\sigma = 4$, respectively. Observe that both sequences of estimated PDFs, (b1)-(b4) and (c1)-(c4), converge towards the true PDF.
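
A convergence experiment in the spirit of Figure 2.6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce the figure: the target PDF (an equal mixture of two unit-variance 2D Gaussians), the evaluation grid, the sample sizes, and the schedule $\sigma_n = \sigma_1 n^{-1/4}$ (which satisfies the conditions in (57) for $d = 2$) are all hypothetical choices.

```python
import math
import random


def gaussian_2d(dx, dy, sigma):
    """Isotropic 2D Gaussian of width sigma, evaluated at offset (dx, dy)."""
    norm = 2.0 * math.pi * sigma * sigma
    return math.exp(-0.5 * (dx * dx + dy * dy) / (sigma * sigma)) / norm


def true_pdf(x, y):
    """Hypothetical target: equal mixture of unit Gaussians at (-2, 0) and (2, 0)."""
    return 0.5 * gaussian_2d(x + 2.0, y, 1.0) + 0.5 * gaussian_2d(x - 2.0, y, 1.0)


def draw_sample(rng):
    """Draw one point from the hypothetical target mixture."""
    center_x = -2.0 if rng.random() < 0.5 else 2.0
    return (center_x + rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def parzen_estimate(x, y, samples, sigma):
    """Parzen-window estimate at (x, y) with a 2D Gaussian kernel of width sigma."""
    return sum(gaussian_2d(x - sx, y - sy, sigma) for sx, sy in samples) / len(samples)


def mean_abs_error(samples, sigma, grid):
    """Average absolute difference between the estimate and the target on the grid."""
    return sum(abs(parzen_estimate(x, y, samples, sigma) - true_pdf(x, y))
               for x, y in grid) / len(grid)


rng = random.Random(0)
# Coarse evaluation grid covering both mixture components.
grid = [(float(gx), float(gy)) for gx in range(-4, 5) for gy in range(-2, 3)]
sigma_1 = 2.0  # initial kernel width (assumed)

errors = {}
for n in (1, 100, 10000):
    samples = [draw_sample(rng) for _ in range(n)]
    sigma_n = sigma_1 * n ** (-0.25)  # sigma_n^2 -> 0 while n * sigma_n^2 -> infinity
    errors[n] = mean_abs_error(samples, sigma_n, grid)
    print(f"n = {n:5d}, sigma_n = {sigma_n:.3f}, mean abs error = {errors[n]:.5f}")
```

With a single sample the estimate is one broad Gaussian bump and cannot represent both modes of the target; as $n$ grows and $\sigma_n$ shrinks, the error on the grid should decrease, which mirrors the qualitative behavior the figure caption describes.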


Suyash P. Awate 2007-02-21