
Statistical Inference

In practice, we have access only to the data that a physical process generates, not to the underlying RVs or PDFs. Statistical inference is the process of using observed data to estimate the forms of the PDFs of the RVs, along with any associated parameters, so that they model the physical processes accurately. The foundations of modern statistical analysis were laid down by Sir Ronald A. Fisher in the early 1900s.

In statistical-inference terminology, a population is the set of elements about which we want to draw conclusions. A sample is a subset of the population that is actually observed. Thus, the goal is to learn about the statistical characteristics of the population from the sample data. Let us consider an RV $X$, with the associated PDF $P(X)$, that models some physical process and produces a set of $n$ independent observations $\{ x_1, x_2, \ldots, x_n \}$. The goal is to infer some properties of $X$ from its observations. For instance, knowing that $P(X)$ is Gaussian, we may want to determine the values of its mean and variance parameters for which the observed data best conform to that Gaussian model. We can consider each observation $x_i$ as the value of an RV $X_i$. Such a set of RVs ${\bf X} = \{ X_1, X_2, \ldots, X_n \}$ constitutes a random sample: a set of mutually independent RVs that are identically distributed, i.e.,

$\displaystyle \forall i, F_{X_i} (x) = F_X (x).$     (24)
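The Gaussian example above can be sketched in a few lines. This is a minimal illustration, not a method from the text: the "true" parameters `TRUE_MEAN` and `TRUE_STD` are hypothetical values fixed only so that data can be simulated, and the helper names `draw_random_sample` and `fit_gaussian` are likewise illustrative. Each call to `rng.gauss` produces an independent draw from the same distribution, so the returned list is a realization of a random sample in the sense of (24); the fit uses the sample mean and the sample variance (divided by $n$) as plug-in estimates.

```python
import random
import statistics

# Hypothetical "true" parameters theta* = (mean, std) of a Gaussian P(X; theta*).
# In practice these are unknown; they are fixed here only to simulate data.
TRUE_MEAN = 5.0
TRUE_STD = 2.0


def draw_random_sample(n, seed=0):
    """Draw n independent, identically distributed observations x_1, ..., x_n."""
    rng = random.Random(seed)
    return [rng.gauss(TRUE_MEAN, TRUE_STD) for _ in range(n)]


def fit_gaussian(sample):
    """Plug-in estimates of the Gaussian's mean and variance from the sample."""
    mean_hat = statistics.fmean(sample)
    # pvariance divides by n (not n - 1) when given the mean explicitly.
    var_hat = statistics.pvariance(sample, mu=mean_hat)
    return mean_hat, var_hat


if __name__ == "__main__":
    xs = draw_random_sample(10_000)
    mean_hat, var_hat = fit_gaussian(xs)
    print(f"estimated mean ~ {mean_hat:.3f}, estimated variance ~ {var_hat:.3f}")
```

With 10,000 observations, the estimates should land close to the hypothetical values 5 and 4; with only a handful of observations they scatter much more widely.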

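Suppose we want to estimate a particular parameter $\theta$ associated with the PDF of $X$. We assume that the data were derived from the PDF $P (X; \theta^*)$. A statistic $\hat \Theta$ is any deterministic function of the random sample and, hence, an RV itself. An estimator is a statistic $\hat \Theta (X_1, X_2, \ldots, X_n)$ that is used to estimate the value of some parameter $\theta$. Some properties of an estimator are highly desirable. For example, an estimator is unbiased if its expected value equals the true parameter value, $E [\hat \Theta] = \theta^*$, and it is consistent if it converges in probability to $\theta^*$ as the sample size grows, i.e., $\forall \epsilon > 0, \lim_{n \to \infty} P (| \hat \Theta - \theta^* | > \epsilon) = 0$.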
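Because an estimator is itself an RV, its bias $E [\hat \Theta] - \theta^*$ can be approximated numerically by drawing many independent random samples and averaging the resulting estimates. The sketch below does this by Monte Carlo simulation; the function name `estimate_bias`, its parameters, and the Gaussian sampler in the usage lines are hypothetical choices for illustration only.

```python
import random
import statistics


def estimate_bias(estimator, sampler, theta_star, n, trials=2000, seed=0):
    """Monte Carlo approximation of E[Theta_hat] - theta* for sample size n.

    estimator:  maps a list of n observations to a number (a statistic).
    sampler:    draws one observation from P(X; theta*) using the given rng.
    theta_star: the true parameter value the estimator targets.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Each trial draws a fresh random sample X_1, ..., X_n.
        sample = [sampler(rng) for _ in range(n)]
        estimates.append(estimator(sample))
    # Averaging over trials approximates the expectation E[Theta_hat].
    return statistics.fmean(estimates) - theta_star


if __name__ == "__main__":
    # Hypothetical setup: X ~ Gaussian(3, 1), estimating the mean with the sample mean.
    bias = estimate_bias(statistics.fmean, lambda rng: rng.gauss(3.0, 1.0), 3.0, n=20)
    print(f"approximate bias of the sample mean: {bias:+.4f}")
```

For an unbiased estimator, the printed value should be close to zero, up to Monte Carlo noise that shrinks as `trials` grows.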

As an example, for an RV $X$, an unbiased and consistent estimator of its mean, or expectation, is the sample mean [167],

$\displaystyle \bar X = \frac {1} {n} \sum_{i=1}^{n} X_i.$     (28)
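Consistency of the sample mean can be observed empirically: the typical distance between $\bar X$ and the true mean should shrink as $n$ grows (for a Gaussian, roughly in proportion to $1/\sqrt{n}$). The following is a minimal sketch under the assumption that $X$ is Gaussian with hypothetical parameters `mu` and `sigma`; the helper names are illustrative.

```python
import random
import statistics


def sample_mean(xs):
    """The sample mean X_bar = (1/n) * sum_i X_i."""
    return sum(xs) / len(xs)


def mean_abs_error(n, mu=0.0, sigma=1.0, trials=300, seed=1):
    """Average |X_bar - mu| over independent random samples of size n."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        errors.append(abs(sample_mean(xs) - mu))
    return statistics.fmean(errors)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(mean_abs_error(n), 4))
```

The printed errors should decrease as $n$ increases, which is what convergence in probability to the true mean predicts.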

Another example is the empirical CDF of a discrete RV, which is a consistent estimator of the true CDF $F_X (x)$ [167]. The empirical CDF for a discrete RV is
$\displaystyle \hat F (x) = \frac {1} {n} \sum_{i=1}^{n} \bigg( 1 - H (x_i - x) \bigg),$     (29)

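where $H (x)$ is the Heaviside step (unit step) function, taken with the convention $H (0) = 0$, so that $1 - H (x_i - x)$ equals 1 exactly when $x_i \le x$ and $\hat F (x)$ is the fraction of observations not exceeding $x$.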
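Equation (29) translates directly into code. The sketch below implements the unit step with the convention $H(0) = 0$ so that $\hat F (x)$ counts observations with $x_i \le x$, and then checks consistency on a hypothetical fair six-sided die, whose true CDF is $F_X(k) = k/6$ for $k = 1, \ldots, 6$. The die, the function names, and the sample sizes are illustrative assumptions, not part of the original text.

```python
import random


def heaviside(t):
    """Unit step function with the convention H(0) = 0."""
    return 1.0 if t > 0 else 0.0


def empirical_cdf(sample, x):
    """F_hat(x) = (1/n) * sum_i (1 - H(x_i - x)), the fraction of x_i <= x."""
    n = len(sample)
    return sum(1.0 - heaviside(xi - x) for xi in sample) / n


def max_cdf_gap_for_fair_die(n_rolls, seed=0):
    """Largest |F_hat(k) - k/6| over k = 1..6 for simulated fair-die rolls."""
    rng = random.Random(seed)
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return max(abs(empirical_cdf(rolls, k) - k / 6) for k in range(1, 7))


if __name__ == "__main__":
    data = [1, 2, 2, 3]
    print(empirical_cdf(data, 2))  # 3 of 4 observations are <= 2 -> 0.75
    for n_rolls in (60, 600, 60_000):
        print(n_rolls, round(max_cdf_gap_for_fair_die(n_rolls), 4))
```

Evaluating at $x = 2$ shows why the $H(0) = 0$ convention matters for a discrete RV: the two observations equal to 2 must be counted, otherwise $\hat F (2)$ would estimate $P(X < 2)$ instead of $P(X \le 2)$.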



Suyash P. Awate 2007-02-21