Measure and probability theory
This article is under construction.
If $(\Omega,\mathfrak{A},P)$ is a probability space, the conditional expectation $E[X|\mathfrak{S}]$ of a (measurable) random variable $X$ with respect to some sub-$\sigma$-algebra $\mathfrak{S}\subseteq\mathfrak{A}$ is some $\mathfrak{S}$-measurable random variable which is a "coarsened" version of $X$. We can think of $E[X|\mathfrak{S}]$ as a random variable with the same domain but which is measured with a $\sigma$-algebra containing only restricted information on the original events, since some events in $\mathfrak{A}$ have been assigned probability $1$ or $0$ in a consistent way.
Conditional expectation relative to a random variable
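Let $(\Omega,\mathfrak{A},P)$ be a probability space, let $Y\colon(\Omega,\mathfrak{A},P)\to(U,\Sigma,P^Y)$ be a measurable function into a measure space equipped with the pushforward measure $P^Y$ induced by $Y$, and let $X\colon(\Omega,\mathfrak{A},P)\to(\mathbb{R},\mathcal{B}(\mathbb{R}))$ be an integrable real-valued random variable.
Then for $X$ and $Y$ there exists an essentially unique (two functions are defined to be equivalent if they differ only on a set of measure $0$) integrable function $g=:E[X|Y]$ such that the following diagram commutes:
$$(\Omega,\mathfrak{A},P)\xrightarrow{\;Y\;}(U,\Sigma,P^Y)\xrightarrow{\;g\;}(\mathbb{R},\mathcal{B}(\mathbb{R})),\qquad X\colon(\Omega,\mathfrak{A},P)\to(\mathbb{R},\mathcal{B}(\mathbb{R}))$$
where $g\colon y\mapsto E[X|Y=y]$. Here "commutes" shall mean that
(1) $g$ is $\Sigma$-measurable.
(2) the integrals over $X$ and $g\circ Y$ are equal.
In this case $g=E[X|Y]$ is called a version of the conditional expectation of $X$ provided $Y$.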
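In more detail, (2) is equivalent to the statement that for all $B\in\Sigma$ we have
$$\int_{Y^{-1}(B)}X(\omega)\,dP(\omega)=\int_B g(u)\,dP^Y(u)$$
and to
$$\int_{Y^{-1}(B)}X(\omega)\,dP(\omega)=\int_{Y^{-1}(B)}(g\circ Y)(\omega)\,dP(\omega)$$
(The equivalence of the last two formulas is given since we always have $\int_B g(u)\,dP^Y(u)=\int_{Y^{-1}(B)}(g\circ Y)(\omega)\,dP(\omega)$ by the substitution rule.)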
Note that it does not follow from the preceding definition that the conditional expectation exists. Existence is a consequence of the Radon-Nikodym theorem, as will be shown in the following section. (Note that the argument of the theorem applies to the definition of the conditional expectation by random variables if we consider the pushforward measure $P^Y$ as given by a sub-$\sigma$-algebra of the original one. In this sense $E[X|Y]$ is a "coarsened version" of $X$ factored by the information (i.e. the $\sigma$-algebra) given by $Y$.)
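On a finite probability space the function $g$ can be computed directly, which may help to see what conditions (1) and (2) ask for. The following is a minimal sketch under stated assumptions: the fair die, the names `X`, `Y`, `cond_exp_given` and `check_defining_property` are all hypothetical choices for illustration and do not come from the text above.

```python
from fractions import Fraction

# Hypothetical finite example: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

def X(w):
    """Real-valued random variable: the face value."""
    return w

def Y(w):
    """Coarser observable: parity of the face (0 = even, 1 = odd)."""
    return w % 2

def cond_exp_given(X, Y, P):
    """Return g as a dict with g[y] = E[X | Y = y] for every y with P(Y = y) > 0."""
    mass = {}      # pushforward measure P^Y({y})
    weighted = {}  # integral of X over the fibre Y^{-1}({y})
    for w, p in P.items():
        y = Y(w)
        mass[y] = mass.get(y, 0) + p
        weighted[y] = weighted.get(y, 0) + X(w) * p
    return {y: weighted[y] / mass[y] for y in mass if mass[y] > 0}

g = cond_exp_given(X, Y, P)

def check_defining_property(B):
    """Compare the integrals of X and of g∘Y over Y^{-1}(B)."""
    lhs = sum(X(w) * P[w] for w in omega if Y(w) in B)
    rhs = sum(g[Y(w)] * P[w] for w in omega if Y(w) in B)
    return lhs == rhs
```

In this sketch `g` only depends on the value of `Y`, which is the finite analogue of measurability with respect to the $\sigma$-algebra on the target of $Y$, and `check_defining_property` tests the integral identity for a given set `B` of values of `Y`.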
Conditional expectation relative to a sub-$\sigma$-algebra
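Note that by construction of the pushforward measure it suffices to define the conditional expectation only for the case where $\Sigma:=\mathfrak{S}\subseteq\mathfrak{A}$ is a sub-$\sigma$-algebra.
(Note that we lose information with the notation $P^Y$; e.g. $P^{\mathrm{id}}$ for $\mathrm{id}\colon(\Omega,\mathfrak{A})\to(\Omega,\mathfrak{S})$ is the restriction of $P$ to $\mathfrak{S}$, which is different from $P$.)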
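The diagram
$$(\Omega,\mathfrak{A},P)\xrightarrow{\;\mathrm{id}\;}(\Omega,\mathfrak{S},P|_{\mathfrak{S}})\xrightarrow{\;Z\;}(\mathbb{R},\mathcal{B}(\mathbb{R})),\qquad X\colon(\Omega,\mathfrak{A},P)\to(\mathbb{R},\mathcal{B}(\mathbb{R}))$$
is commutative (in our sense) iff
(a) $Z$ is $\mathfrak{S}$-measurable
(b) $\int_A Z\,dP=\int_A X\,dP$ for all $A\in\mathfrak{S}$.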
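We can hence write the conditional expectation as the equivalence class
$$E[X|\mathfrak{S}]=\left\{Z\in L^1(\Omega,\mathfrak{S},P)\;\middle|\;\int_A Z\,dP=\int_A X\,dP\ \text{ for all } A\in\mathfrak{S}\right\}$$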
An element of this class is also called a version.
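Theorem. $E[X|\mathfrak{S}]$ exists and is unique almost surely.
Proof. Existence: By
$$Q(A):=\int_A X(\omega)\,P(d\omega),\qquad A\in\mathfrak{S},$$
a measure $Q$ is defined on $(\Omega,\mathfrak{S})$ (if $X\ge 0$; if not, consider the positive part $X^+$ and the negative part $X^-$ of $X=X^+-X^-$ separately and use linearity of the integral). Let $P|_{\mathfrak{S}}$ be the restriction of $P$ to $\mathfrak{S}$. Then
$$Q\ll P|_{\mathfrak{S}}$$
meaning: $P|_{\mathfrak{S}}(A)=0\Rightarrow Q(A)=0$ for all $A\in\mathfrak{S}$. This is the condition of the theorem of Radon-Nikodym (the other condition of the theorem, that $P|_{\mathfrak{S}}$ is $\sigma$-finite, is satisfied since $P$ is a probability measure). The theorem implies that $Q$ has a density w.r.t. $P|_{\mathfrak{S}}$, which is $E[X|\mathfrak{S}]$.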
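Uniqueness: If $g$ and $g'$ are candidates, then by linearity $\int_A(g-g')\,dP=0$ for all $A\in\mathfrak{S}$. Taking $A=\{g>g'\}$ and $A=\{g<g'\}$, which both lie in $\mathfrak{S}$, shows that $g=g'$ almost surely.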
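When the sub-$\sigma$-algebra is generated by a finite partition, the Radon-Nikodym density from the existence argument is simply the ratio $Q(A)/P(A)$ on each atom $A$ of the partition. The sketch below illustrates this under assumptions of our own: the four-point space, the weights, the values of `X`, and the helper names `cond_exp` and `events` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical four-point probability space and random variable.
P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
X = {"a": 1, "b": 5, "c": 2, "d": 10}

# Atoms of the sub-sigma-algebra S (assumed to be generated by this partition).
partition = [frozenset({"a", "b"}), frozenset({"c", "d"})]

def cond_exp(X, P, partition):
    """Return Z = E[X | S] as a dict; Z is constant on each atom with P(atom) > 0."""
    Z = {}
    for atom in partition:
        q = sum(X[w] * P[w] for w in atom)  # Q(atom) = integral of X over the atom
        p = sum(P[w] for w in atom)         # P(atom)
        for w in atom:
            Z[w] = q / p                    # density dQ/dP on the atom
    return Z

def events(partition):
    """All events of the sigma-algebra generated by the partition (unions of atoms)."""
    for r in range(len(partition) + 1):
        for atoms in combinations(partition, r):
            yield frozenset().union(*atoms)

Z = cond_exp(X, P, partition)

# Condition (b): the integrals of Z and X agree on every event of S.
agrees = all(
    sum(Z[w] * P[w] for w in A) == sum(X[w] * P[w] for w in A)
    for A in events(partition)
)
```

Being constant on atoms is what $\mathfrak{S}$-measurability amounts to in this finite setting, so `Z` satisfies condition (a) by construction and `agrees` checks condition (b).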
From elementary probability theory we know that $P(A)=E[1_A]$.
For $A\in\mathfrak{A}$ we call $P(A|\mathfrak{S}):=E[1_A|\mathfrak{S}]$ the conditional probability of $A$ provided $\mathfrak{S}$.
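As a small illustration, the conditional probability of an event can be computed as the conditional expectation of its indicator function. The sketch below assumes a fair die and a sub-$\sigma$-algebra generated by the partition into low and high faces; these choices and the name `cond_prob` are hypothetical.

```python
from fractions import Fraction

# Hypothetical example: fair die, information "low face" versus "high face".
P = {w: Fraction(1, 6) for w in range(1, 7)}
partition = [frozenset({1, 2, 3}), frozenset({4, 5, 6})]
A = {2, 4, 6}  # the event "the face is even"

def cond_prob(A, P, partition):
    """Return P(A | S) = E[1_A | S] as a dict that is constant on each atom."""
    result = {}
    for atom in partition:
        p_atom = sum(P[w] for w in atom)
        p_joint = sum(P[w] for w in atom if w in A)  # integral of 1_A over the atom
        for w in atom:
            result[w] = p_joint / p_atom
    return result

p_A_given_S = cond_prob(A, P, partition)

# Integrating the conditional probability over the whole space recovers P(A).
total = sum(p_A_given_S[w] * P[w] for w in P)
```

Note that the result is a random variable rather than a number: it takes the value $1/3$ on the low faces and $2/3$ on the high faces.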
Conditional distribution, Conditional density
Integral kernel, Stochastic kernel
In probability theory and statistics, a stochastic kernel is the transition function of a stochastic process. In a discrete time process with continuous probability distributions, it is the same thing as the kernel of the integral operator that advances the probability density function.
An integral transform $T$ is an assignment of the form
$$(Tf)(u)=\int K(t,u)\,f(t)\,dt$$
where the function of two variables $K(\cdot,\cdot)$ is called the integral kernel of the transform $T$.
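Let $(\Omega_1,\mathfrak{A}_1)$ and $(\Omega_2,\mathfrak{A}_2)$ be measurable spaces.
A map $Q\colon\Omega_1\times\mathfrak{A}_2\to[0,1]$ satisfying
(1) $Q(-,A_2)\colon\Omega_1\to[0,1]$ is measurable for all $A_2\in\mathfrak{A}_2$,
(2) $Q(\omega_1,-)\colon\mathfrak{A}_2\to[0,1]$ is a probability measure on $(\Omega_2,\mathfrak{A}_2)$ for all $\omega_1\in\Omega_1$,
is called a stochastic kernel or transition kernel (or Markov kernel, a term which we avoid since it is confusing) from $(\Omega_1,\mathfrak{A}_1)$ to $(\Omega_2,\mathfrak{A}_2)$.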
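Then $Q$ induces a function between the classes of measures on $(\Omega_1,\mathfrak{A}_1)$ and on $(\Omega_2,\mathfrak{A}_2)$
$$\overline{Q}\colon M(\Omega_1,\mathfrak{A}_1)\to M(\Omega_2,\mathfrak{A}_2),\qquad \mu\mapsto\left(A_2\mapsto\int_{\Omega_1}Q(\omega_1,A_2)\,d\mu(\omega_1)\right)$$
If $\mu$ is a probability measure, then so is $\overline{Q}(\mu)$. The symbol $Q(\omega,A)$ is sometimes written as $Q(A|\omega)$, by analogy with the notation for a conditional probability.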
The stochastic kernel is hence in particular an integral kernel.
In a discrete stochastic process (see below) the transition function is a stochastic kernel (more precisely it is the function $\overline{Q}$ induced by a kernel $Q$).
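Let $(\Omega_1,\mathfrak{A}_1,P_1)$ be a probability space, let $(\Omega_2,\mathfrak{A}_2)$ be a measurable space, and let $Q\colon\Omega_1\times\mathfrak{A}_2\to[0,1]$ be a stochastic kernel from $(\Omega_1,\mathfrak{A}_1)$ to $(\Omega_2,\mathfrak{A}_2)$.
Then by
$$P(A):=\int_{\Omega_1}\left(\int_{\Omega_2}1_A(\omega_1,\omega_2)\,Q(\omega_1,d\omega_2)\right)P_1(d\omega_1)$$
a probability measure is defined on $\mathfrak{A}_1\otimes\mathfrak{A}_2$, which is called coupling. $P=:P_1\otimes Q$ is unique with the property
$$P(A_1\times A_2)=\int_{A_1}Q(\omega_1,A_2)\,P_1(d\omega_1)\qquad\text{for all } A_1\in\mathfrak{A}_1,\ A_2\in\mathfrak{A}_2.$$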
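Theorem. Let (with the above settings) $Y\colon(\Omega_1,\mathfrak{A}_1)\to(\Omega_2,\mathfrak{A}_2)$ be $(\mathfrak{A}_1,\mathfrak{A}_2)$-measurable, and let $X$ be a $d$-dimensional random vector on $(\Omega_1,\mathfrak{A}_1,P_1)$.
Then there exists a stochastic kernel $P^{X|Y}$ from $(\Omega_2,\mathfrak{A}_2)$ to $(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d))$ such that
$$P^{(Y,X)}=P^Y\otimes P^{X|Y}$$
and $P^{X|Y}$ is (a version of) the conditional distribution of $X$ provided $Y$, i.e.
$$P^{X|Y}(y,A)=P(\{X\in A\}\mid Y=y).$$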
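This theorem says that $Q:=P^{X|Y}$ (more precisely $\overline{Q}$) fits in the diagram
$$M(\Omega_2,\mathfrak{A}_2)\xrightarrow{\;\overline{Q}\;}M(\mathbb{R}^d,\mathcal{B}(\mathbb{R}^d)),\qquad P^Y\mapsto P^X.$$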
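In the discrete case, i.e. if $\Omega_1$ and $\Omega_2$ are finite or countably infinite sets, it is possible to reconstruct $Q$ by just considering one-element sets in $\mathfrak{A}_2$ and the related probabilities
$$p_{ij}:=Q(i,\{j\}).$$
These are called transition probabilities; they encode $Q$ and assemble to a (perhaps countably infinite) matrix $M=(p_{ij})$ called the transition matrix of $Q$, respectively of $\overline{Q}$. Note that $p_{ij}$ is the probability of the transition of the state (aka elementary event or one-element event) $i$ to the event $\{j\}$ (which in this case happens to have only one element, too). We have $\sum_{j\in\Omega_2}p_{ij}=1$ for all $i\in\Omega_1$.
If $\rho:=(\rho_i)_{i\in\Omega_1}$ is a counting density on $\Omega_1$, then
$$\rho M=\Big(\sum_{i\in\Omega_1}\rho_i\,p_{ij}\Big)_{j\in\Omega_2}$$
is a counting density on $\Omega_2$.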
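In the discrete case these constructions reduce to matrix arithmetic. The following sketch, with a hypothetical $2\times 3$ transition matrix `M` and counting density `rho` chosen only for illustration, checks that the rows of `M` sum to $1$, computes the pushed-forward density $\rho M$, and forms the joint weights $\rho_i\,p_{ij}$ of the corresponding coupling.

```python
from fractions import Fraction as F

# Hypothetical two-state source space and three-state target space.
M = [
    [F(1, 2), F(1, 4), F(1, 4)],  # p_{0j} = Q(0, {j})
    [F(0),    F(1, 3), F(2, 3)],  # p_{1j} = Q(1, {j})
]
rho = [F(3, 5), F(2, 5)]          # counting density on the source space

def is_stochastic(M):
    """Every row is a probability vector, i.e. Q(i, -) is a probability measure."""
    return all(all(p >= 0 for p in row) and sum(row) == 1 for row in M)

def push_forward(rho, M):
    """Row vector times matrix: (rho M)_j = sum_i rho_i * p_{ij}."""
    return [sum(rho[i] * M[i][j] for i in range(len(rho))) for j in range(len(M[0]))]

def coupling(rho, M):
    """Joint weights P({(i, j)}) = rho_i * p_{ij} on the product space."""
    return [[rho[i] * M[i][j] for j in range(len(M[0]))] for i in range(len(rho))]

rho_M = push_forward(rho, M)
joint = coupling(rho, M)

# The marginals of the coupling are rho and rho M.
row_marginal = [sum(row) for row in joint]
col_marginal = [sum(joint[i][j] for i in range(len(joint))) for j in range(len(M[0]))]
```

Exact fractions are used here so that the identities can be compared with `==` rather than up to rounding.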
The conditional expectation plays a defining role in the theory of martingales which are stochastic processes such that the conditional expectation of the next value (provided the previous values) equals the present realized value.
The terminology of stochastic processes is a special interpretation of some aspects of infinitary combinatorics in terms of probability theory.
Let $I$ be a total order (i.e. transitive, antisymmetric, and total).
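A stochastic process is a diagram $X\colon I\to\mathcal{R}$, where $\mathcal{R}$ is the class of random variables, such that $X(i)=:X_i\colon(\Omega_i,\mathfrak{F}_i,P_i)\to(S_i,\mathfrak{S}_i)$ is a random variable. Often one considers the case where all $(S_i,\mathfrak{S}_i)=(S,\mathfrak{S})$ are equal; in this case $S$ is called the state space of the process $X$.
If all $\Omega_i=\Omega$ are equal and the class of $\sigma$-algebras $(\mathfrak{F}_i)_{i\in I}$ is filtered, i.e.
$$\mathfrak{F}_i\subseteq\mathfrak{F}_j\quad\text{whenever } i\le j,$$
and all $X_l$ are $\mathfrak{F}_l$-measurable, the process is called an adapted process.
For example, the natural filtration, where $\mathfrak{F}_i=\sigma(\{X_l^{-1}(A)\mid l\le i,\ A\in\mathfrak{S}\})$, gives an adapted process.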
In terms of a diagram we have for $i\le j$
$$\overline{Q}_{ij}\colon P^{X_i}\mapsto P^{X_j},$$
where $\overline{Q}_{ij}$ is the transition probability for the passage from state $X_i$ to state $X_j$.
An adapted stochastic process with the natural filtration in discrete time is called a martingale if $E[|X_i|]<\infty$ for all $i$ and $E[X_j|\mathfrak{F}_i]=X_i$ for all $i\le j$.
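The martingale condition can be checked by brute force for a small example. The sketch below uses the simple symmetric random walk over a hypothetical horizon `N = 4` and enumerates all paths; the names `X`, `cond_exp_next` and `is_martingale` are our own. It only tests the one-step condition $E[X_{n+1}|\mathfrak{F}_n]=X_n$, from which the general case $i\le j$ follows by iterating conditional expectations.

```python
from fractions import Fraction
from itertools import product

N = 4                                      # hypothetical time horizon
paths = list(product([-1, 1], repeat=N))   # all step sequences of length N
P = Fraction(1, 2 ** N)                    # uniform weight of a single path

def X(path, n):
    """Position of the walk after n steps, with X_0 = 0."""
    return sum(path[:n])

def cond_exp_next(n):
    """E[X_{n+1} | F_n] as a dict keyed by the first n steps (the atoms of F_n)."""
    total, mass = {}, {}
    for path in paths:
        key = path[:n]
        total[key] = total.get(key, 0) + X(path, n + 1) * P
        mass[key] = mass.get(key, 0) + P
    return {key: total[key] / mass[key] for key in total}

def is_martingale():
    """Check E[X_{n+1} | F_n] = X_n on every atom of F_n, for n = 0, ..., N - 1."""
    return all(
        value == sum(prefix)
        for n in range(N)
        for prefix, value in cond_exp_next(n).items()
    )
```

For this walk the atoms of the natural filtration at time $n$ correspond to the first $n$ steps, because the positions $X_0,\dots,X_n$ determine the steps and vice versa.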
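An adapted stochastic process $(X_i,\mathfrak{F}_i)$ satisfying
$$P(X_j\in A\mid\mathfrak{F}_i)=P(X_j\in A\mid X_i)\qquad\text{for all } i\le j \text{ and } A\in\mathfrak{S}$$
is called a Markov process.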
For a Markov process the Chapman-Kolmogorov equation encodes the statement that the transition probabilities of the process form a semigroup.
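If in the notation from above $(P_t\colon\Omega\times\mathfrak{A}\to[0,1])_{t\ge 0}$ is a family of stochastic kernels such that all $P_t(\omega,-)$ are probabilities, then $(P_t)_{t\ge 0}$ is called a transition semigroup if
$$\overline{P}_s\circ\overline{P}_t=\overline{P}_{s+t}\qquad\text{for all } s,t\ge 0.$$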
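In discrete time with a finite state space, the semigroup property $P_sP_t=P_{s+t}$ becomes a statement about powers of a single transition matrix. The sketch below is an illustration under assumptions of our own: the two-state matrix `M`, the definition `kernel(t) = M^t`, and the helper names are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical one-step transition matrix on two states.
M = [
    [F(9, 10), F(1, 10)],
    [F(1, 2),  F(1, 2)],
]

def matmul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def kernel(t):
    """P_t := M^t for integer t >= 0, with P_0 the identity matrix."""
    n = len(M)
    result = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = matmul(result, M)
    return result

def chapman_kolmogorov_holds(s, t):
    """Check the semigroup identity P_s P_t = P_{s+t}."""
    return matmul(kernel(s), kernel(t)) == kernel(s + t)
```

Here the identity holds by associativity of matrix multiplication; the check is useful mainly as a sanity test when the kernels $P_t$ are produced by some other, less transparent computation.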