in Maths, Tutorial

The Extended Kalman Filter

This is the third part of the series dedicated to one of the most popular sensor de-noising technique: Kalman filters. This article will explain how to model non-linear processes to improve the filter performance, something known as the Extended Kalman Filter.

You can read all the tutorials in this online course here:


At the end of the previous article, we derived the equations for a Kalman filter able to work with linear models. In a nutshell, this means that we could use such a filter for any signal or quantity which changes over time in a linear fashion. If the assumption that both the measurement and process noises follow a normal distribution, Kalman filters are proven to be optimal.

Let’s recall the current structure of the Kalman filter:

And all of the equations that have been derived so far:


    \begin{equation*}\begin{align*}\hat{x}_0 & & \textup{initial state} \\P_0 &= 1& \textup{initial variance} \\Q & & \textup{process noise variance} \\R & & \textup{measurement noise variance} \\A & & \textup{model coefficient} \\B & & \textup{model offset}\end{align}\end{equation}

Prediction step
How we think the system should evolve, solely based on its model.

    \begin{equation*}\begin{align*}\hat{x}_1^{-} & =  A~\hat{x}_0 + B & \textup{state prediction} \\P_1^{-}            & =  A^2 ~ P_0 +Q & \textup{variance prediction} \\\end{align}\end{equation}

Correction step
The most likely estimation of the system state, integrating the sensor data.

    \begin{equation*}\begin{align*}k_1            & =  \frac{P_1^{-}}{P_0+R}  & \textup{Kalman gain} \\\hat{x}_1 &= \hat{x}_1^{-} \left(1-k_1\right) + z_1 \, k_1 & \textup{state update} \\P_1           & =  \left(1-k_1\right) P_1^{-} & \textup{variance update}\end{align}\end{equation}


    \begin{equation*}\begin{align*}\hat{x}_0 & \leftarrow \hat{x}_1 \\P_0 & \leftarrow P_1\end{align}\end{equation}

You can get a feeling for how the system behaves using the interactive chart below, which gives you the ability to control the amount of noise in the process (Q) and in the measurement (R):


The rest of this article will focus on dismantling one of the strongest limitation of the current derivation: linear models.

We are now ready to fix this introducing the so-called Extended Kalman Filter.

Non-Linear Models

The “magic” of the Kalman lies in a simple idea: both the sensor measurements (z_1) and the best estimate so far (\hat{x}_0) follow a normal distribution, and the joint probability of two normal distributions remains a normal distribution.

This means that in the next time frame the process can be repeated, since the new best estimated is once again a normal distribution.

However, updating the model alters its probability distribution. Allowing any function can break the assumption of its normal distribution. And if that fails, the guarantee of optimality fails as well.

The reason why this does not happen when using a linear model is that a linear combination of two normal distribution is a normal distribution itself. Under those assumptions, the equation for the state prediction respects the constraints that ultimately yields a good result:

(1)   \begin{equation*} x_1 = A~x_0 + B + v\end{equation*}

What this means is that the “vanilla” implementation of the Kalman filter is guaranteed to be optimal only for processes which evolution can be modelled as a line. This is a very strong constraint, as many real-life processes tend to be non-linear.

In reality, what would really make a difference is the ability to use any generic function f\left(\cdot\right) as our model:

(2)   \begin{equation*} x_1 = f\left({x_0} \right)+ v\end{equation*}

While this is not always possible using Kalman filters, there is a variant that can handle non-linear functions, as long as they are differentiable: Extended Kalman filters. Intuitively speaking, a function is differentiable if it can be drawn as a continuous, smooth line.

What an EFK does is finding a linear approximation of the function around its current estimate. So, in a way, even EKFs are still relying on a linear model. The interactive chart below show a sinusoid function; while non-linear, it can be approximated at any given point with a tangent line.


Such approximation can be very accurate if we stay around the tangent point. Translated to a signal, this means that approximating differentiable functions with a tangent line to their current state estimate is a good solution for short time intervals.

Model Linearisation

To do so, the first step is to find a way to “linearise” a model around the current estimate. This means replace the non-linear model f\left(\cdot\right) with its tangent line at \hat{x}_0.

One way way to approach this is to recall the equation of a line in its point-slope form:

(3)   \begin{equation*} y = y_p + m \left(x - x_p\right)\end{equation*}

Such an equation defines a line which passes through the point \left(x_p, y_p\right) and has slope m. In our case, the point we want is \left(\hat{x}_0, f\left({\hat{x}_0}\right)\right) and the slope is given by f'\left({\hat{x}_0}\right).

In order for this to work, it is necessary for the function that models the evolution of the system to be differentiable. This means that its first derivative can be calculated; a property that not all functions have. However, most well-behaved functions are differentiable in the majority of their domain.

โ“ Is there a connection between the slope and the first derivative?

The slope of a line (m) is defined as the ratio between its rise and the run. Given any two distinct points on a line, \left(x_1, y_1\right) and \left(x_2, y_2\right), in fact:

(4)   \begin{equation*} m \overset{\triangle}{=} \frac{y_2-y_1}{x_2-x_1}\end{equation*}

We can think of the first derivative of a function about a point as an extension of the concept of slope, applicable to functions in general. The equation is pretty much the same, with the difference that the interval between the two points must be arbitrarily small. This is why the formal equation for the derivative uses a limit to make such interval as small as possible:

(5)   \begin{equation*} f'\left(x\right) \overset{\triangle}{=} \lim_{\Delta x \to 0} \frac{f\left(x + \Delta x \right)-f\left(x\right)}{\Delta x}\end{equation*}

You can easily see that when the first derivative is calculated on the equation of a line, it converges to its slope. This is ultimately why we can use the value of the first derivative as the slope of the tangent line.

โ“ Is there a connection with the Taylor Series?

The equation of a line presented in the point-slope form might look suspiciously familiar to the ones of you who studied Calculus. That is in fact the expansion of the Taylor series!

The Taylor series is a discrete sum of infinite terms, which can be used to approximate certain functions to an arbitrary degree of precision. Each new term in the series results in a better approximation, so the more terms, the more precisely we can model a function. The Taylor series looks like this:

(6)   \begin{equation*} f\left(a\right) = \sum_{n=0}^{\infty} {\frac{f^{\left(n\right)}\left(a\right) \left(x-a \right )^n}{n!}}\end{equation*}

where f^{\left(n\right)}\left(a\right) is the n-th derivative of the function f\left(\cdot\right) evaluated at a.

Expanding the first two terms of the Taylor series yields the equation of tangent line. This provides the best linear approximation of a function around a point.

Thanks to this linearisation trick, we can now approximate the function f\left(\cdot\right) with its tangent line evaluated at \hat{x}_0:

(7)   \begin{equation*} f\left(x\right) \approx f\left(\hat{x}_0\right) + f'\left(\hat{x}_0\right) \left( x - \hat{x}_0 \right)\end{equation*}

It should be noticed that both f\left(\hat{x}_0\right) and f'\left(\hat{x}_0\right) are actual numbers, and they correspond to the value of the model at \hat{x}_0, and the slope of its tangent line at that point. So, regardless of the complexity of f\left(\cdot\right), this approximation is a linear function.

We can now rearrange the terms in (7) to better reveal its linear nature, in a form that we have already encountered:

(8)   \begin{equation*} \begin{align}f\left(x\right)  \approx & f\left(\hat{x}_0\right) + f'\left(\hat{x}_0\right) \left( x - \hat{x}_0 \right) = \\& \underset{A}{\underbrace{    f'\left(\hat{x}_0\right) }}\, x + \underset{B}{\underbrace{    f\left(\hat{x}_0\right) - f'\left(\hat{x}_0\right)  \, \hat{x}_0}}\end{align}\end{equation*}

We can now replace A and B from the original equations for the prediction and correction steps:

(9)   \begin{equation*} \begin{align}\hat{x}_1^{-} = & \boxed{A} &\hat{x}_0 &+& \boxed{B} &= \\&  \boxed{f'\left(\hat{x}_0\right)} &\hat{x}_0 &+& \boxed{f\left(\hat{x}_0\right) - f'\left(\hat{x}_0\right)  \, \hat{x}_0}&\end{align}\end{equation*}



(10)   \begin{equation*} \begin{align}P_1^{-}= & \boxed{A^2}  &P_0 &+Q &= \\&  \boxed{f'\left(\hat{x}_0\right)^2} & P_0 &+Q &\end{align}\end{equation*}


It is to be noted that in many derivations of the Extended Kalman Filter, you may find \hat{x}_1^{-} still including the original non-linear function f\left(\cdot\right). If it is well approximated by its linear counterpart, this will not really be a problem, and any error will be amortised as a higher process noise (i.e.: less certainty on the evolution of the system).

This derivation is really needed in the calculation of P_1^{-}, which only depends on A (the slope or tangent), and not on the entire function.

When the function is highly non-linear, even EFK can have issues adapting to its temperamental behaviour. In that case, another variant called the Unscented Kalman filter (UKF) finds ample application.

Numerical Estimation

The Extended Kalman Filters relies on the strong assumption that we can model the evolution of the system as a differentiable function. While a system might be evolving in such a way, it does not mean we are immediately able to derive the necessary equations.

This is why Extended Kalman Filters often rely on numerical estimation of the first derivative, rather than using its actual mathematical formulation. This makes such filters able to better handle rapid changes in behaviours, at the cost of a less precise measure overall.

The only thing needed in this case is to simply use the previous two best estimates (\hat{x}_0 and \hat{x}_{-1}) to calculate an approximation of the first derivative as seen in (5).

The interactive chart below shows the evolution of two Extended Kalman Filters. The one on top uses the actual first derivative, while the one on the bottom approximates it numerically.


Further Extensions…

Feedback Loop

The current derivation is for a Kalman filter that is “passive”, in the sense that it does not interact with the system it measures. This is often not the case: in the example of a thermostat, for instance, the observations might be used to determine whether or not to turn heating on or off.

More advanced versions of the Kalman filter also include a control factor (u_n), which can be used to update the evolution of the model based on the action that the filter is taking. Mathematically, this is done by updating the equation that controls the process evolution:

(11)   \begin{equation*} x_1 = A~x_0 + B + C ~ u_1 + v\end{equation*}

In the equation above, C is known as the control-input model, and modulates the contribution of the control factor (u_1) on the process evolution.

For instance, if the Kalman filter detects that temperature is too low, it could trigger a thermostat to turn the heating on. In that case, we could set u_1=1 to indicate that, with C set as a coefficient that indicates how fast the temperature is expected to raise once the heating is on.

This allows for a more complex, yet accurate prediction of the system. As seen before, the control-input model is expected to be linear, or at least differentiable if we are using an Extended Kalman Filter.

Observation Model

For the entire duration of this series we have simply assumed that the sensor would return readings in the same scale of the original process. This is not necessarily the case, especially for electronics sensors which might register a temperature as a current drop across a resistor, rather than degrees.

Traditionally, the “complete” formulation of the Kalman filter includes a factor H, which is known as the observation model. In a nutshell, it allows to remap values sampled from the sensor in the same scale and unit of the process property under examination:

(12)   \begin{equation*} z_1 = H ~ x_1 + w_1\end{equation*}

Extending Into Multiple Dimensions

Right now we have presented a derivation of the Kalman filter that works on scalar quantities, meaning that only works on a single numbers. In reality, however, there might be properties that we want to estimate that are multi-dimensional. In one of its more general formulations, Kalman filters are actually presented in matrix form. Under this new framework, the positions (x_n), the measurements (z_n) and the filter estimates (\hat{x}_n) becomes vectors which can have many elements.

The position of a building, for instance, is likely going to include at least two independent variables: latitude and longitude. One could easily use two separate Kalman filter for both properties, but that is very wasteful because it completely ignores how the two coordinates are connected.

State Prediction

Let’s see a concrete example, imaging a multi-dimensional filter which measures two quantities at the same time, such as latitude and longitude:

    \begin{equation*}\begin{align*}\boldsymbol{x_n} & = \begin{bmatrix} x_n^1 \\ x_n^2 \end{bmatrix} \\\boldsymbol{z_n} & = \begin{bmatrix} z_n^1 \\ z_n^2 \end{bmatrix} \\\boldsymbol{\hat{x}_n} & = \begin{bmatrix} \hat{x}_n^1 \\ \hat{x}_n^2 \end{bmatrix}\end{align}\end{equation}

To avoid confusion, the matrix (or vector) version of a variable is indicated in bold italic (in accordance with the ISO standard).

During the past few articles we have seen how the process evolves over time (1) as a linear combination of the previous state (or as its function, in the case of the Extended Kalman Filter):

(13)   \begin{equation*}x_1 = A~x_0 + B + v\end{equation*}

This can be rethought in terms of matrices as:

(14)   \begin{equation*} \boldsymbol{x_1} = \boldsymbol{A}~\boldsymbol{x_0} + \boldsymbol{B} + \boldsymbol{v}\end{equation*}

The two expressions seem pretty much the same, but they are fundamentally different. Under this new framework, \boldsymbol{A} is a square matrix, and \boldsymbol{B} a vector (2×2 and 2×1, respectively). This simple change allows the components of the new state (\boldsymbol{x_1}) to be combined together.

In the case of a static object which is not expected to move, \boldsymbol{A} is going to be the identity matrix (\mathbb{I}), and \boldsymbol{B} the zero vector:

(15)   \begin{equation*} \boldsymbol{x_1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} ~\boldsymbol{x_0} + \begin{bmatrix} 0 \\ 0  \end{bmatrix} + \boldsymbol{v} \end{equation*}

โ“ How does matrix multiplication work?

Recalling how the the matrix multiplication work, you can see that multiplying a 2×2 matrix by a 2×1 vector yields a 2×1 vector:

(16)   \begin{equation*} \begin{bmatrix}a & b \\c & d\end{bmatrix} \begin{bmatrix}x \\ y\end{bmatrix}= \begin{bmatrix}ax + by \\cx + dy\end{bmatrix}\end{equation*}

One important fact to remember is that matrix multiplication is not commutative: this means that A x is not the same as x A. In fact, inverting the order of the terms would result in a 2×2 matrix, not a 2×1 vector:

(17)   \begin{equation*}\begin{bmatrix}x \\ y\end{bmatrix} \begin{bmatrix}a & b \\c & d\end{bmatrix}= \begin{bmatrix}ax + cx & bx+dx \\ay + cy & by+dy\end{bmatrix}\end{equation*}

It should also be noticed that the matrix multiplication is nothing more than a specific way to linearly combine the components of a vector. This means that, as long as the individual component are normally distributed, they will remain normally distributed after the multiplication. In fact, the property of being normally distributed is preserved by linear combination.

Extended Kalman Filter

Equation (15) expresses the state prediction step in its matrix form. However we have seen how the Extended Kalman Filter supports not just linear combinations, but any differentiable function.

Things get a bit more complex when we have to calculate the first derivative of such function f\left(\cdot\right) in a way that is compatible with our matrix expressions. Loosely speaking, the extension of the derivative in multiple dimensions is known as a Jacobian. This is a matrix which elements are the partial derivative of a given function, calculated with respect to each dimension of the system:

(18)   \begin{equation*} \boldsymbol{F}\overset{\triangle}{=}\frac{\partial f }{\partial x}\end{equation*}

For instance:

(19)   \begin{equation*} \boldsymbol{F} = \begin{bmatrix} \frac{\partial f }{\partial x_n^1}  \\ \frac{\partial f }{\partial x_n^2} \end{bmatrix} \end{equation*}

State Update

Due to the fact that matrix multiplication is non-commutative, we should be very careful in how terms are rearranged. For instance:

(20)   \begin{equation*} \boldsymbol{\hat{x}_1} = \left(\mathbb{I}-\boldsymbol{k_1}\right) \, \boldsymbol{\hat{x}_1^{-}}  + \boldsymbol{k_1} \, \boldsymbol{z_1}\end{equation*}

has to be expressed in this way to ensure that the matrix multiplication yields the correct result:

(21)   \begin{equation*} \underset{2 \times 1}{\boxed{\boldsymbol{\hat{x}_1} }} =\underset{2 \times 1}{\boxed{\underset{2 \times 1}{\boxed{\underset{2 \times 2}{\boxed{ \left(\underset{2 \times 2}{\boxed{\mathbb{I}}}-\underset{2 \times 2}{\boxed{\boldsymbol{k_1}}}\right) }}  \,\underset{2 \times 1}{\boxed{\boldsymbol{\hat{x}_1^{-}}}}}}+\underset{2 \times 1}{\boxed{\underset{2 \times 2}{\boxed{\boldsymbol{k_1}}} \,\underset{2 \times 1}{\boxed{\boldsymbol{z_1}}}}}}}\end{equation*}

Kalman Gain

Even the expression for Kalman gain requires some attention. In fact, scalar division needs to be replace with its matrix counterpart: multiplication by the inverse.

(22)   \begin{equation*}  \boldsymbol{k_1} =  \boldsymbol{P_1^{-}} {\left( {\boldsymbol{P_0}+\boldsymbol{R}} \right)}^{-1}\end{equation*}

Another tricky aspect is that other scalar properties, such as the variance P_n, are replaced by they multi-dimensional analogue: covariance matrices.

๐ŸŽฏ Multivariate Normal Distribution

So far we have explored the concept of normal distribution in one dimension. However, many phenomena are normally distributed, while also being multi-dimensional. For instance, the uncertain position of a building can be thought as a 2D vector, which latitude and longitude components are two separate normal distributions:

If a 1D normal distribution is visualised as a bell, its 2D equivalent looks like a cloud. If the cloud is somewhat “squished” in an oblique direction, it means that the two components of the distribution are somewhat correlated.

The variance of a multivariate distribution takes the form of a covariance matrix, which indicates the variance across all axis combinations:

(23)   \begin{equation*}\begin{bmatrix}{cov}\left(X\right) & {cov}\left({X,Y}\right) \\{cov}\left({Y,X}\right) & {cov}\left({Y}\right)\end{bmatrix}\end{equation*}


We have finally concluded the theoretical overview of the Kalman Filter, along with some of its many variants and evolutions.


    \begin{equation*}\begin{align*}\hat{x}_0 & & \textup{initial state} &&   \underset{d \times 1}{\boldsymbol{\hat{x}_0}} \\P_0 &= 1& \textup{initial variance} &&  \underset{d \times d}{\boldsymbol{P_0}} =\mathbb{I}  \\Q & & \textup{process noise variance} &&  \underset{d \times d}{\boldsymbol{Q}} \\R & & \textup{measurement noise variance}  &&  \underset{d \times d}{\boldsymbol{R}} \\f & & \textup{function} &&  f \\f' & & \textup{first derivative} &&  \underset{d \times 1}{\boldsymbol{F}}=\frac{\partial f }{\partial x}\end{align}\end{equation}

Prediction step

    \begin{equation*}\begin{align*}& \textup{Scalar} & & &  & \textup{Matrix} \\\hat{x}_1^{-} & =  f\left(\hat{x}_0\right) & \textup{state prediction} & &  \boldsymbol{\hat{x}_1^{-}} & =  f\left(\boldsymbol{\hat{x}_0} \right) \\P_1^{-}            & =  f'\left(P_0\right)^2 +Q & \textup{variance prediction}  & & \boldsymbol{P_1^{-}}            & =  \boldsymbol{F} ~ \boldsymbol{P_0} ~ \boldsymbol{F^T} + \boldsymbol{Q}            \\\end{align}\end{equation}

Correction step

    \begin{equation*}\begin{align*}& \textup{Scalar} & & &  & \textup{Matrix} \\k_1            & =  \frac{P_1^{-}}{P_0+R}  & \textup{Kalman gain} & & \boldsymbol{k_1} & =  \boldsymbol{P_1^{-}} {\left( {\boldsymbol{P_0}+\boldsymbol{R}} \right)}^{-1}\\\hat{x}_1 &= \hat{x}_1^{-} \left(1-k_1\right) + z_1 \, k_1 & \textup{state update} & & \boldsymbol{\hat{x}_1} &= \left(\mathbb{I}-\boldsymbol{k_1}\right) \, \boldsymbol{\hat{x}_1^{-}}  + \boldsymbol{k_1} \, \boldsymbol{z_1} \\P_1           & =  \left(1-k_1\right) P_1^{-} & \textup{variance update} & & \boldsymbol{P_1}   & =  \left(\mathbb{I}-\boldsymbol{k_1}\right) \boldsymbol{P_1^{-}}\end{align}\end{equation}

What’s Next…

You can read all the tutorials in this online course here:

The next and final part of this series will focus on a simple, efficient and effective implementation of the Kalman filter in C#.

Further Readings

๐Ÿ’– Support this blog

This website exists thanks to the contribution of patrons on Patreon. If you think these posts have either helped or inspired you, please consider supporting this blog.

Patreon Patreon_button

๐Ÿ“ง Stay updated

You will be notified when a new tutorial is released!

๐Ÿ“ Licensing

You are free to use, adapt and build upon this tutorial for your own projects (even commercially) as long as you credit me.

You are not allowed to redistribute the content of this tutorial on other platforms, especially the parts that are only available on Patreon.

If the knowledge you have gained had a significant impact on your project, a mention in the credit would be very appreciated. โค๏ธ๐Ÿง”๐Ÿป

Write a Comment



  • Kalman Filters: From Theory to Implementation - Alan Zucconi December 29, 2023

    […] 4: The Extended Kalman Filter: Non-Linear […]

  • The Mathematics of the Kalman Filter - Alan Zucconi December 29, 2023

    […] 4: The Extended Kalman Filter: Non-linear […]