Information geometry aims to apply the techniques of differential geometry to statistics. Often it is useful to think of a family of probability distributions as a statistical manifold. For example, the univariate normal (Gaussian) distributions form a 2-dimensional manifold, parameterised by the mean $\mu$ and the standard deviation $\sigma \gt 0$. On such manifolds there are notions of Riemannian metric, connection, curvature, and so on, which are of statistical relevance.
More precisely, Kullback-Leibler information, or relative entropy, features as a measure of divergence (not quite a metric, because it’s asymmetric), and Fisher information takes the role of curvature. One useful aspect of information geometry is that it gives a means to prove results about statistical models, simply by considering them as well-behaved geometrical objects. For instance, it’s basically a tautology to say that a manifold is not changing much in the vicinity of points of low curvature, and changing greatly near points of high curvature. Stated more precisely, and then translated back into probabilistic language, this becomes the Cramér-Rao inequality, that the variance of a parameter estimator is at least the reciprocal of the Fisher information. (Shalizi)
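The two statements above can be checked numerically on the Gaussian family. The following is a minimal sketch, not taken from the cited sources: it assumes the standard closed form for the relative entropy between two univariate Gaussians, and it illustrates the Cramér-Rao inequality only for one unbiased estimator, the sample mean of $n$ independent draws, for which the Fisher information about $\mu$ is $n/\sigma^2$. The function names `kl_gauss` and `sample_mean_variance` and all parameter values are illustrative choices.

```python
import math
import numpy as np


def kl_gauss(mu1, sigma1, mu2, sigma2):
    """Relative entropy KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ), closed form."""
    return (math.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)


# Asymmetry: swapping the two arguments changes the value,
# so relative entropy is a divergence rather than a metric.
forward = kl_gauss(0.0, 1.0, 1.0, 2.0)   # approximately 0.443
backward = kl_gauss(1.0, 2.0, 0.0, 1.0)  # approximately 1.307


def sample_mean_variance(mu, sigma, n, trials, seed=0):
    """Empirical variance of the sample mean over many simulated experiments."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mu, sigma, size=(trials, n))
    return samples.mean(axis=1).var()


# Cramér-Rao for the mean: n i.i.d. draws carry Fisher information n / sigma^2,
# so an unbiased estimator of mu has variance at least sigma^2 / n.
sigma, n = 2.0, 50
cramer_rao_bound = sigma**2 / n
empirical = sample_mean_variance(0.0, sigma, n, trials=20000)

print(f"KL forward  = {forward:.4f}")
print(f"KL backward = {backward:.4f}")
print(f"Cramér-Rao bound = {cramer_rao_bound:.4f}, empirical variance = {empirical:.4f}")
```

The sample mean attains the bound, so the empirical variance should agree with $\sigma^2/n$ up to simulation noise; other unbiased estimators of $\mu$ can only do worse.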
One of the founders of the subject is Shun-ichi Amari.
For $X$ a measurable space let $S$ be (a subspace of) the space of probability measures on $X$, equipped with the structure of a smooth manifold.
The Fisher metric on $S$ is the Riemannian metric given on two vector fields $v,w \in T S$ by

$$ g(v,w)_s := E_s\big(v(\log s)\, w(\log s)\big) \,, $$

where $E_s(\cdots)$ denotes the expectation value under the measure $s \in S$ of the function $x \mapsto v(\log s)_x \, w(\log s)_x$ on $X$.
See for instance (Amari, (2.1)).
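As an illustration of this definition, the Fisher metric of the Gaussian family can be approximated numerically by taking the coordinate vector fields $\partial_\mu$ and $\partial_\sigma$ for $v$ and $w$. The sketch below is an assumption-laden example rather than a method from the cited references: it differentiates the log-density by central finite differences and replaces the expectation by a Riemann sum on a truncated grid. The names `log_density` and `fisher_metric` and the grid parameters are hypothetical. For this family the result should be close to the known closed form $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$.

```python
import numpy as np


def log_density(x, mu, sigma):
    """Log of the N(mu, sigma^2) density evaluated at x."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)


def fisher_metric(mu, sigma, eps=1e-5, half_width=12.0, n=200001):
    """Approximate g_ij = E_s[ d_i(log s) d_j(log s) ] in the coordinates (mu, sigma)."""
    # Integration grid covering +/- half_width standard deviations around mu.
    x = np.linspace(mu - half_width * sigma, mu + half_width * sigma, n)
    dx = x[1] - x[0]
    p = np.exp(log_density(x, mu, sigma))

    # Central finite differences of log s along each coordinate direction.
    d_mu = (log_density(x, mu + eps, sigma) - log_density(x, mu - eps, sigma)) / (2 * eps)
    d_sigma = (log_density(x, mu, sigma + eps) - log_density(x, mu, sigma - eps)) / (2 * eps)
    scores = np.stack([d_mu, d_sigma])  # shape (2, n)

    # Expectation under s of the pairwise products, as a Riemann sum over the grid.
    integrand = scores[:, None, :] * scores[None, :, :] * p
    return integrand.sum(axis=-1) * dx


g = fisher_metric(mu=0.0, sigma=2.0)
print(np.round(g, 6))
```

With $\sigma = 2$ the closed form gives $g_{\mu\mu} = 1/4$, $g_{\sigma\sigma} = 1/2$ and vanishing off-diagonal entries, which is what the numerical matrix should reproduce up to discretisation error.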
A textbook providing the big picture is
Lecture notes include
See also
Hông Vân Lê, Statistical manifolds are statistical models, Journal of Geometry 84(1-2), March 2006, pp. 83-93.
Blog post.
A brief introduction with more references is