State Estimates are Distribution Approximations
I'm writing this because, even after an entire graduate degree, I only recently fully grasped the role of states, covariances, and state estimation filters. Naturally, I understood that the covariance was a measure of uncertainty in the state estimate, and that the state estimate was our best guess of the state itself, but I had a revelation recently while reading a statistics blog--both are simply different moments approximating the distribution of the quantity the filter is tracking. In other words, they are both approximations of the likely complex shape and position of the distribution of the states of the object of interest.
Assume we have a state estimate notated $\hat{x}$ and an associated covariance $P$ approximating a random state $x$,

$$\hat{x} = E[x], \qquad P = E\left[(x - \hat{x})(x - \hat{x})^T\right] = E[xx^T] - \hat{x}\hat{x}^T$$
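These definitions can be checked numerically. The sketch below (using NumPy, with an arbitrary 2-D Gaussian as the example distribution--the specific mean, covariance, and sample count are my own illustrative choices) estimates both moments from samples and confirms that the central-moment and raw-moment forms of the covariance agree:

```python
import numpy as np

# Illustrative 2-D example: draw samples from a known distribution,
# then recover its first two moments empirically.
rng = np.random.default_rng(0)
true_mean = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
samples = rng.multivariate_normal(true_mean, true_cov, size=200_000)
n = len(samples)

x_hat = samples.mean(axis=0)              # x_hat = E[x], the first raw moment
centered = samples - x_hat
P = centered.T @ centered / n             # P = E[(x - x_hat)(x - x_hat)^T]

# Equivalent "raw moment" form: P = E[x x^T] - x_hat x_hat^T
P_raw = samples.T @ samples / n - np.outer(x_hat, x_hat)

print(np.allclose(P, P_raw))  # True: the two forms agree algebraically
```

With enough samples, `x_hat` and `P` also land close to `true_mean` and `true_cov`, which is exactly what a filter's state estimate and covariance are trying to approximate.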
The expression of $P$ here is a bit abstruse due to it being generalized in dimensionality, so take the 1-dimensional case,

$$\sigma^2 = E[x^2] - E[x]^2$$
What this equation is saying is that the variance of the random variable is the mean of its square less the square of its mean. Assume, for example, a normal distribution centered at zero. In this case, $E[x] = 0$, so $\sigma^2 = E[x^2]$. Squaring the random variable plays a role similar to taking an absolute value--it makes every deviation positive, thereby allowing the expectation to capture the average (squared) distance of the distribution from its mean.
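The 1-dimensional identity can be verified the same way. This minimal sketch (the zero-mean normal with standard deviation 3 is an arbitrary choice of mine) computes the variance as the mean of the square minus the square of the mean:

```python
import numpy as np

# Verify var(x) = E[x^2] - E[x]^2 on a zero-mean normal distribution.
rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=3.0, size=500_000)

mean_of_square = np.mean(x**2)    # E[x^2]
square_of_mean = np.mean(x)**2    # E[x]^2, near zero here since E[x] = 0
variance = mean_of_square - square_of_mean

print(variance)  # close to 3.0**2 = 9.0
```

Because the distribution is centered at zero, `square_of_mean` is negligible and the variance is essentially just $E[x^2]$, as described above.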
Moving back to the state estimate $\hat{x}$, it is simply the mean of the distribution of $x$: $\hat{x} = E[x]$.
Seeing the components of our estimate expressed in terms of expectations augmented my understanding of what a state estimate is--it is an approximation of the distribution of our state(s) via its first raw and second central moments. Restricting ourselves to these first two moments is what yields the assumption of Gaussian measurements (and therefore states) seen in so many filtering techniques, since the Gaussian is precisely the distribution fully determined by its mean and covariance.
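To see the limitation of a two-moment summary concretely, consider two distributions that share the same mean and variance but very different shapes. In the sketch below (my own illustrative pairing: a standard normal versus a uniform distribution scaled to unit variance), a mean-plus-covariance state estimate cannot tell them apart--only a higher moment, like kurtosis, reveals the difference:

```python
import numpy as np

# Two distributions with identical first two moments (mean 0, variance 1).
rng = np.random.default_rng(2)
n = 500_000
gaussian = rng.normal(0.0, 1.0, size=n)
uniform = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)  # also mean 0, var 1

for x in (gaussian, uniform):
    print(np.mean(x), np.var(x))  # both pairs come out near (0.0, 1.0)

# The fourth moment distinguishes them: excess kurtosis is about 0 for the
# Gaussian and about -1.2 for the uniform distribution.
def excess_kurtosis(x):
    return np.mean(x**4) / np.var(x)**2 - 3.0

print(excess_kurtosis(gaussian), excess_kurtosis(uniform))
```

A filter that carries only $\hat{x}$ and $P$ would represent both of these states identically--which is exactly why truncating at the second moment amounts to treating everything as Gaussian.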