<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Research | Mahyar's world 🌏</title><link>https://mahyar-osanlouy.com/category/research/</link><atom:link href="https://mahyar-osanlouy.com/category/research/index.xml" rel="self" type="application/rss+xml"/><description>Research</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Fri, 15 Nov 2024 00:00:00 +0000</lastBuildDate><image><url>https://mahyar-osanlouy.com/media/icon_hu35e4e9c9135f02752aab27d124db531b_75212_512x512_fill_lanczos_center_3.png</url><title>Research</title><link>https://mahyar-osanlouy.com/category/research/</link></image><item><title>Temporal Predictive Coding: A New Framework for Neural Processing of Dynamic Stimuli</title><link>https://mahyar-osanlouy.com/post/temporal-predictive-coding/</link><pubDate>Fri, 15 Nov 2024 00:00:00 +0000</pubDate><guid>https://mahyar-osanlouy.com/post/temporal-predictive-coding/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>One of the most fascinating aspects of the brain is its ability to process and predict dynamic sensory inputs that
continuously change over time. From tracking a moving object to predicting the next note in a melody, our brains are
remarkably adept at temporal prediction. In a recent paper published in PLOS Computational Biology titled
&amp;ldquo;Predictive coding networks for temporal prediction,&amp;rdquo; my colleagues and I proposed a new computational framework that
may help explain how the brain accomplishes this feat.&lt;/p>
&lt;p>Predictive coding has emerged as an influential theoretical model for understanding cortical function. The core idea
is deceptively simple: the brain constantly generates predictions of incoming sensory inputs and compares these
predictions with actual sensory data. Any mismatch results in prediction errors that drive learning and perceptual
processing. This framework has successfully explained many neural phenomena and receptive field properties in visual cortex.&lt;/p>
&lt;p>However, most previous predictive coding models have focused on static inputs, neglecting the temporal dimension that
is crucial for real-world perception. Our work addresses this gap by extending predictive coding to the temporal domain
while maintaining its elegant biological implementation.&lt;/p>
&lt;h2 id="the-temporal-predictive-coding-model">The Temporal Predictive Coding Model&lt;/h2>
&lt;h3 id="generative-model-and-free-energy">Generative Model and Free Energy&lt;/h3>
&lt;p>At the foundation of our temporal predictive coding (tPC) model is a Hidden Markov Model (HMM) structure,
which assumes that observations are generated by hidden states that evolve according to a Markov process.
Mathematically, we can express this generative model as:&lt;/p>
&lt;p>$$
x_k = A f(x_{k-1}) + B u_k + \omega_x
$$&lt;/p>
&lt;p>$$
y_k = C f(x_k) + \omega_y
$$&lt;/p>
&lt;p>Where:&lt;/p>
&lt;ul>
&lt;li>$x_{k}$ is the hidden state at time $k$.&lt;/li>
&lt;li>$y_{k}$ is the observed sensory input at time $k$.&lt;/li>
&lt;li>$u_{k}$ is the control input at time $k$.&lt;/li>
&lt;li>$A$ is the dynamics matrix governing state transitions.&lt;/li>
&lt;li>$B$ is the control matrix.&lt;/li>
&lt;li>$C$ is the observation matrix.&lt;/li>
&lt;li>$f$ is a potentially nonlinear function.&lt;/li>
&lt;li>$\omega_x$ and $\omega_y$ are Gaussian process and observation noise.&lt;/li>
&lt;/ul>
&lt;p>The goal is to infer the current hidden state $x_{k}$ given the current observation $y_{k}$ and the previous state estimate
$\hat{x}_{k-1}$. To achieve this, we formulate a variational free energy objective:&lt;/p>
&lt;p>$$
\mathcal{F}_k = \frac{1}{2}(y_k - C f(x_k))^T \Sigma_y^{-1} (y_k - C f(x_k)) + \frac{1}{2}(x_k - A f(\hat{x}_{k-1}) - B u_k)^T \Sigma_x^{-1} (x_k - A f(\hat{x}_{k-1}) - B u_k)
$$&lt;/p>
&lt;p>This free energy can be understood as the sum of two weighted prediction errors:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Sensory prediction errors&lt;/strong>: The difference between observed and predicted sensory inputs $y_k - Cf(x_k)$.&lt;/li>
&lt;li>&lt;strong>Temporal prediction errors&lt;/strong>: The difference between the current state and the prediction from the previous state
$x_k - Af(\hat{x}_{k-1}) - Bu_k$.&lt;/li>
&lt;/ol>
&lt;p>Each prediction error is weighted by the precision (inverse variance) of the corresponding noise distribution,
ensuring that more reliable predictions carry more weight.&lt;/p>
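&lt;p>To make this concrete, the sketch below computes both prediction errors and the resulting free energy directly from the equations above. All dimensions, matrices, and state values here are made up for illustration; they are not taken from the paper:&lt;/p>

```python
import torch

torch.manual_seed(0)

# hypothetical dimensions and generative-model parameters (illustrative only)
state_dim, obs_dim, control_dim = 4, 3, 2
A = 0.1 * torch.randn(state_dim, state_dim)    # dynamics matrix
B = 0.1 * torch.randn(state_dim, control_dim)  # control matrix
C = torch.randn(obs_dim, state_dim)            # observation matrix
Sigma_x = torch.eye(state_dim)                 # process noise covariance
Sigma_y = torch.eye(obs_dim)                   # observation noise covariance
f = torch.tanh                                 # nonlinearity

x_prev = torch.randn(state_dim)  # previous state estimate
u_k = torch.randn(control_dim)   # control input
x_k = torch.randn(state_dim)     # current state estimate
y_k = torch.randn(obs_dim)       # current observation

# the two prediction errors
e_y = y_k - C @ f(x_k)               # sensory prediction error
e_x = x_k - A @ f(x_prev) - B @ u_k  # temporal prediction error

# free energy: sum of precision-weighted squared prediction errors
F = 0.5 * e_y @ torch.linalg.inv(Sigma_y) @ e_y + 0.5 * e_x @ torch.linalg.inv(Sigma_x) @ e_x
print(float(F))
```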
&lt;h2 id="neural-implementation">Neural Implementation&lt;/h2>
&lt;p>A crucial contribution of our work is showing how temporal predictive coding can be implemented in neural circuits
using biologically plausible mechanisms. The neural dynamics for inferring the hidden state follow gradient descent
on the free energy:&lt;/p>
&lt;p>$$
\tau \frac{d x_k}{d t} = -\epsilon_x + f'(x_k) \odot C^T \epsilon_y
$$&lt;/p>
&lt;p>Where $\epsilon_x$ and $\epsilon_y$ are precision-weighted prediction errors:&lt;/p>
&lt;p>$$
\epsilon_y = \Sigma_y^{-1} \left( y_k - C f(x_k) \right)
$$&lt;/p>
&lt;p>$$
\epsilon_x = \Sigma_x^{-1} \left( x_k - A f(\hat{x}_{k-1}) - B u_k \right)
$$&lt;/p>
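&lt;p>Because these dynamics perform gradient descent on the free energy, they can be simulated with a simple Euler scheme. The following is a minimal sketch under illustrative assumptions (identity precision matrices, a tanh nonlinearity, no control input):&lt;/p>

```python
import torch

torch.manual_seed(1)
state_dim, obs_dim = 4, 3
A = 0.9 * torch.eye(state_dim)          # illustrative dynamics matrix
C = 0.5 * torch.randn(obs_dim, state_dim)
f = torch.tanh
x_prev = torch.zeros(state_dim)         # previous state estimate
y_k = torch.randn(obs_dim)              # current observation

# Euler-integrate the inference dynamics towards equilibrium
x_k = A @ f(x_prev)                     # initialise at the prior prediction
dt, tau = 0.1, 1.0
for _ in range(500):
    e_y = y_k - C @ f(x_k)                        # sensory prediction error
    e_x = x_k - A @ f(x_prev)                     # temporal prediction error
    dx = -e_x + (1 - f(x_k) ** 2) * (C.T @ e_y)   # tanh'(x) = 1 - tanh(x)^2
    x_k = x_k + (dt / tau) * dx

# at equilibrium the free-energy gradient should be close to zero
residual = (x_k - A @ f(x_prev)) - (1 - f(x_k) ** 2) * (C.T @ (y_k - C @ f(x_k)))
print(float(residual.norm()))
```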
&lt;h3 id="we-proposed-multiple-neural-circuit-implementations-of-this-model">We proposed multiple neural circuit implementations of this model:&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Network with explicit prediction error neurons&lt;/strong>: dedicated neurons represent prediction errors at each level of processing.&lt;/li>
&lt;li>&lt;strong>Dendritic computing implementation&lt;/strong>: prediction errors are computed as differences between somatic and dendritic potentials.&lt;/li>
&lt;li>&lt;strong>Single-iteration implementation&lt;/strong>: a simplified version that performs a single update per time step.&lt;/li>
&lt;/ol>
&lt;h3 id="neural-circuit-implementation">Neural circuit implementation&lt;/h3>
&lt;p>Importantly, all of these implementations rely on local information and Hebbian plasticity.
The synaptic weights are updated according to:&lt;/p>
&lt;p>$$
\Delta A = \eta \epsilon_x f(\hat{x}_{k-1})^T
$$&lt;/p>
&lt;p>$$
\Delta B = \eta \epsilon_x u_k^T
$$&lt;/p>
&lt;p>$$
\Delta C = \eta \epsilon_y f(x_k)^T
$$&lt;/p>
&lt;p>These update rules are Hebbian in nature because they depend only on the activities of pre- and post-synaptic neurons,
making them biologically plausible.&lt;/p>
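&lt;p>In code, a single learning step reduces to three outer products between a prediction error and a presynaptic activity vector. A minimal sketch, again with made-up dimensions and identity precisions:&lt;/p>

```python
import torch

torch.manual_seed(2)
state_dim, obs_dim, control_dim = 4, 3, 2
A = torch.zeros(state_dim, state_dim)   # start from blank weights
B = torch.zeros(state_dim, control_dim)
C = torch.zeros(obs_dim, state_dim)
f = torch.tanh
eta = 0.01                              # learning rate

x_prev = torch.randn(state_dim)
u_k = torch.randn(control_dim)
x_k = torch.randn(state_dim)
y_k = torch.randn(obs_dim)

# prediction errors (identity precisions for simplicity)
e_y = y_k - C @ f(x_k)
e_x = x_k - A @ f(x_prev) - B @ u_k

# Hebbian updates: outer product of post-synaptic error and pre-synaptic activity
A = A + eta * torch.outer(e_x, f(x_prev))
B = B + eta * torch.outer(e_x, u_k)
C = C + eta * torch.outer(e_y, f(x_k))
print(A.shape, B.shape, C.shape)
```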
&lt;h2 id="relationship-to-kalman-filtering">Relationship to Kalman Filtering&lt;/h2>
&lt;p>An intriguing property of our model is its relationship to the Kalman filter, which is the optimal solution for
linear Gaussian filtering problems. We demonstrated that both Kalman filtering and temporal predictive coding can be
derived as special cases of Bayesian filtering, with the key difference being how they handle uncertainty.&lt;/p>
&lt;p>The Kalman filter propagates uncertainty estimates through time, tracking the full posterior covariance at each step.
In contrast, tPC approximates this by assuming a point estimate (Dirac distribution) for the previous state. Despite
this simplification, our tPC model achieves comparable performance to the Kalman filter in tracking tasks while being
computationally simpler and more biologically plausible.&lt;/p>
&lt;p>For linear systems, the tPC dynamics at equilibrium yield:&lt;/p>
&lt;p>$$
\hat{x}_k^- = A\hat{x}_{k-1} + Bu_k
$$&lt;/p>
&lt;p>$$
\hat{x}_k = \hat{x}_k^- + K(y_k - C\hat{x}_k^-)
$$&lt;/p>
&lt;p>$$
K = \Sigma_x C^T \left[C \Sigma_x C^T + \Sigma_y \right]^{-1}
$$&lt;/p>
&lt;p>This resembles the Kalman filter update equations but with a fixed gain matrix $K$ rather than a dynamically updated one
based on posterior uncertainty.&lt;/p>
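&lt;p>The fixed-gain character of this update is easy to see in code: $K$ is computed once from the noise covariances and then reused at every step. A hypothetical numerical sketch:&lt;/p>

```python
import torch

torch.manual_seed(3)
state_dim, obs_dim = 4, 3
A = 0.9 * torch.eye(state_dim)          # illustrative linear dynamics
B = torch.zeros(state_dim, 1)           # no control for simplicity
C = torch.randn(obs_dim, state_dim)
Sigma_x = 0.1 * torch.eye(state_dim)    # process noise covariance
Sigma_y = 0.5 * torch.eye(obs_dim)      # observation noise covariance

# fixed gain: computed once, not updated from a propagated posterior covariance
K = Sigma_x @ C.T @ torch.linalg.inv(C @ Sigma_x @ C.T + Sigma_y)

x_hat = torch.zeros(state_dim)
u_k = torch.zeros(1)
y_k = torch.randn(obs_dim)

x_prior = A @ x_hat + B @ u_k             # predict from the previous estimate
x_hat = x_prior + K @ (y_k - C @ x_prior) # correct with the fixed gain
print(x_hat)
```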
&lt;h2 id="experimental-results">Experimental Results&lt;/h2>
&lt;h3 id="performance-in-linear-filtering-tasks">Performance in Linear Filtering Tasks&lt;/h3>
&lt;p>We tested our model on classic tracking problems, where the goal is to infer the hidden state
(position, velocity, acceleration) of an object undergoing unknown acceleration based on noisy observations.
Even with just a few inference steps between observations, tPC achieved performance approaching that of the
optimal Kalman filter.&lt;/p>
&lt;p>A key advantage of our model is its ability to learn the parameters of the generative model (matrices $A$, $B$, and $C$)
using Hebbian plasticity. Even when starting with random matrices, tPC could learn to accurately predict observations.
Interestingly, the model also implicitly encoded noise covariance information in its recurrent connections, without
needing explicit representation of precision matrices.&lt;/p>
&lt;h3 id="motion-sensitive-receptive-fields">Motion-Sensitive Receptive Fields&lt;/h3>
&lt;p>Perhaps most excitingly, when trained on natural movies, our tPC model developed spatiotemporal receptive fields
resembling those observed in the visual cortex. These fields exhibited Gabor-like patterns and direction selectivity,
a hallmark of motion-sensitive neurons in early visual areas.&lt;/p>
&lt;h3 id="nonlinear-extensions">Nonlinear Extensions&lt;/h3>
&lt;p>We extended the model to handle nonlinear dynamics by incorporating nonlinear activation functions. When tested on a
simulated pendulum task, the nonlinear tPC significantly outperformed the linear model, accurately predicting the
pendulum&amp;rsquo;s motion even at extreme angles where nonlinear effects are strongest.&lt;/p>
&lt;h2 id="implications-and-future-directions">Implications and Future Directions&lt;/h2>
&lt;p>Our temporal predictive coding framework has several important implications:&lt;/p>
&lt;ol>
&lt;li>It provides a biologically plausible explanation for how the brain processes dynamic stimuli and performs temporal predictions.&lt;/li>
&lt;li>It demonstrates that complex temporal filtering operations can be implemented in neural circuits using simple, local computations.&lt;/li>
&lt;li>It offers a unified framework that connects normative theories of perception (Bayesian inference) with mechanistic models of neural circuits.&lt;/li>
&lt;li>It suggests that the same computational principles might underlie both static and dynamic sensory processing in the brain.&lt;/li>
&lt;/ol>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>The temporal predictive coding model we&amp;rsquo;ve developed bridges an important gap in our understanding of how the brain
processes dynamic sensory inputs. By extending predictive coding to the temporal domain while maintaining its biological
plausibility, our model provides a compelling computational mechanism for temporal prediction in neural circuits.&lt;/p>
&lt;p>The fact that our model develops receptive fields resembling those in the visual cortex and approximates optimal
filtering solutions suggests that temporal predictive coding may indeed capture fundamental principles of neural
computation in the brain. As we continue to refine these models and test them against empirical data, we hope to
gain deeper insights into the remarkable predictive capabilities of the brain.&lt;/p>
&lt;p>&lt;em>This blog is based on the paper
&amp;ldquo;Predictive coding networks for temporal prediction&amp;rdquo; by Beren Millidge, Mufeng Tang, Mahyar Osanlouy, Nicol S. Harper,
and Rafal Bogacz, published in PLOS Computational Biology, April 2024.&lt;/em> &lt;a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011183" target="_blank" rel="noopener">Link to the paper&lt;/a>&lt;/p></description></item><item><title>Kalman Filtering in the Age of PyTorch: State Estimation, Differentiability, and the Philosophy of Uncertainty</title><link>https://mahyar-osanlouy.com/post/kalman-filter/</link><pubDate>Wed, 08 May 2024 00:00:00 +0000</pubDate><guid>https://mahyar-osanlouy.com/post/kalman-filter/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>The Kalman filter, a paragon of recursive estimation, has long stood at the intersection of mathematics, engineering, and epistemology.
Conceived in the 1960s to address the challenges of navigation and control in aerospace, its recursive structure and optimality
under Gaussian assumptions have made it indispensable across robotics, signal processing, finance, and beyond.
Yet, as machine learning frameworks like PyTorch have redefined the computational landscape, the Kalman filter
finds itself in a new context—one where differentiability, GPU acceleration, and integration with deep neural architectures
are not just desirable, but essential.&lt;/p>
&lt;p>In this blog post I want to embark on a dual journey. On one hand, I want to delve into the technicalities of
implementing Kalman filters in PyTorch, leveraging its tensor operations and automatic differentiation to enable
new research and applications.
On the other, I want to reflect on the philosophical questions about the nature of uncertainty, the meaning of optimality,
and the evolving relationship between model-based and data-driven approaches. By weaving together rigorous mathematics,
practical coding insights, and reflective inquiry, I aim to illuminate both the power and the limitations of state estimation
in the age of neural computation.&lt;/p>
&lt;h2 id="the-mathematical-foundations-of-kalman-filtering">The Mathematical Foundations of Kalman Filtering&lt;/h2>
&lt;h3 id="the-state-space-model-dynamics-and-observations">The State-Space Model: Dynamics and Observations&lt;/h3>
&lt;p>At the heart of the Kalman filter lies the state-space model, a mathematical abstraction that describes the evolution of a
system&amp;rsquo;s hidden state over time and its relationship to noisy observations. Formally, the discrete-time linear state-space model is given by:&lt;/p>
&lt;p>$$
\begin{aligned}
x_{k} &amp;amp;= F_{k} x_{k-1} + B_{k} u_{k} + w_{k} \\
z_{k} &amp;amp;= H_{k} x_{k} + v_{k}
\end{aligned}
$$&lt;/p>
&lt;p>Where:&lt;/p>
&lt;ul>
&lt;li>$x_{k}$: State vector at time $k$&lt;/li>
&lt;li>$F_{k}$: State transition matrix&lt;/li>
&lt;li>$B_{k}$: Control input matrix&lt;/li>
&lt;li>$u_{k}$: Control vector&lt;/li>
&lt;li>$w_{k}$: Process noise $\sim \mathcal{N}(0,Q_{k})$&lt;/li>
&lt;li>$z_{k}$: Observation vector&lt;/li>
&lt;li>$H_{k}$: Observation matrix&lt;/li>
&lt;li>$v_{k}$: Observation noise $\sim \mathcal{N}(0,R_{k})$&lt;/li>
&lt;/ul>
&lt;p>This model encodes two key assumptions: linearity and Gaussianity. The linearity allows for closed-form recursive updates,
while the Gaussianity ensures that all conditional distributions remain Gaussian, making the mean and covariance sufficient statistics
for the state estimate.&lt;/p>
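&lt;p>As a concrete instance of this model, the sketch below simulates a constant-velocity tracking system: the hidden state holds position and velocity, and only a noisy position is observed. The particular matrices are illustrative choices, not from any specific application:&lt;/p>

```python
import torch

torch.manual_seed(4)
dt = 0.1
F = torch.tensor([[1.0, dt], [0.0, 1.0]])  # position-velocity dynamics
H = torch.tensor([[1.0, 0.0]])             # observe position only
Q = 0.01 * torch.eye(2)                    # process noise covariance
R = torch.tensor([[0.25]])                 # observation noise covariance

x = torch.tensor([0.0, 1.0])               # initial state: position 0, velocity 1
xs, zs = [], []
for _ in range(50):
    w = torch.distributions.MultivariateNormal(torch.zeros(2), Q).sample()
    x = F @ x + w                          # state transition (no control term)
    v = torch.distributions.MultivariateNormal(torch.zeros(1), R).sample()
    z = H @ x + v                          # noisy observation
    xs.append(x)
    zs.append(z)
xs = torch.stack(xs)
zs = torch.stack(zs)
print(xs.shape, zs.shape)
```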
&lt;h3 id="recursive-estimation-prediction-and-update">Recursive Estimation: Prediction and Update&lt;/h3>
&lt;p>The Kalman filter operates in two alternating steps: prediction (time update) and correction (measurement update).
In the prediction step, the filter projects the current state estimate forward in time, using the system dynamics:&lt;/p>
&lt;p>$$
\begin{aligned}
\hat{x}_{k|k-1} &amp;amp;= F_{k} \hat{x}_{k-1|k-1} + B_{k} u_{k} \\
P_{k|k-1} &amp;amp;= F_{k} P_{k-1|k-1} F_{k}^{T} + Q_{k}
\end{aligned}
$$&lt;/p>
&lt;p>Here $\hat{x}_{k|k-1}$ is the predicted state mean and $P_{k|k-1}$ is the predicted state covariance.&lt;/p>
&lt;p>In the update step, the filter incorporates the new measurement $z_{k}$ to refine the state estimate:&lt;/p>
&lt;p>$$
\begin{aligned}
K_{k} &amp;amp;= P_{k|k-1} H_{k}^{T} \left( H_{k} P_{k|k-1} H_{k}^{T} + R_{k} \right)^{-1} \\
\hat{x}_{k|k} &amp;amp;= \hat{x}_{k|k-1} + K_{k} \left( z_{k} - H_{k} \hat{x}_{k|k-1} \right) \\
P_{k|k} &amp;amp;= \left( I - K_{k} H_{k} \right) P_{k|k-1}
\end{aligned}
$$&lt;/p>
&lt;p>Where $K_{k}$ is the Kalman gain, which determines how much the measurement should be trusted relative to the prediction.
Its derivation is rooted in the minimization of the mean squared error of the state estimate, balancing the uncertainty in the prediction against that of the measurement.&lt;/p>
&lt;h3 id="the-geometry-of-uncertainty-covariance-propagation">The Geometry of Uncertainty: Covariance Propagation&lt;/h3>
&lt;p>A subtle yet profound aspect of the Kalman filter is its treatment of uncertainty. The covariance matrices $P_{k|k-1}$ and $P_{k|k}$
encode not just the spread of possible states, but also the correlations between different state variables.
The propagation of covariance through the system dynamics involves the transformation:&lt;/p>
&lt;p>$$
P_{k|k-1} = F_{k} P_{k-1|k-1} F_{k}^{T} + Q_{k}
$$&lt;/p>
&lt;p>This operation reflects how uncertainty &amp;ldquo;flows&amp;rdquo; through the linear transformation $F_{k}$, and how process noise $Q_{k}$
injects additional uncertainty. The measurement update, in turn, reduces uncertainty by incorporating
information from the observation, as modulated by the Kalman gain.&lt;/p>
&lt;p>Understanding the covariance as a bilinear form, rather than just a matrix, reveals the deep connection between
the algebra of estimation and the geometry of probability distributions. This perspective is crucial for appreciating
the filter&amp;rsquo;s optimality and for extending it to more complex, nonlinear, or high-dimensional settings.&lt;/p>
&lt;h2 id="kalman-filtering-meets-pytorch-implementation-and-differentiability">Kalman Filtering Meets PyTorch: Implementation and Differentiability&lt;/h2>
&lt;h3 id="why-pytorch-beyond-deep-learning">Why PyTorch? Beyond Deep Learning&lt;/h3>
&lt;p>PyTorch, originally designed for deep learning, offers a flexible tensor computation library with automatic
differentiation and seamless GPU acceleration. While its primary use case has been neural networks,
its capabilities make it an attractive platform for implementing classical algorithms like the Kalman filter.
The motivations are manifold:&lt;/p>
&lt;p>First, PyTorch&amp;rsquo;s tensor operations enable efficient batch processing, which is invaluable when filtering multiple signals
or running ensembles of filters in parallel. Second, the autograd engine allows for differentiable programming, making
it possible to optimize filter parameters or integrate the filter as a module within a larger neural architecture.
Third, PyTorch&amp;rsquo;s ecosystem encourages modularity, extensibility, and integration with probabilistic programming frameworks such as Pyro.&lt;/p>
&lt;h3 id="coding-the-classical-kalman-filter-in-pytorch">Coding the Classical Kalman Filter in PyTorch&lt;/h3>
&lt;p>Implementing the Kalman filter in PyTorch involves translating the recursive equations into tensor operations.
Consider the following minimal implementation for a batch of signals:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-python" data-lang="python">&lt;span style="color:#f92672">import&lt;/span> torch
&lt;span style="color:#f92672">from&lt;/span> torch &lt;span style="color:#f92672">import&lt;/span> nn
&lt;span style="color:#f92672">from&lt;/span> torch.linalg &lt;span style="color:#f92672">import&lt;/span> inv
&lt;span style="color:#66d9ef">class&lt;/span> &lt;span style="color:#a6e22e">KalmanFilter&lt;/span>(nn&lt;span style="color:#f92672">.&lt;/span>Module):
&lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;Kalman Filter implementation for state estimation in linear dynamic systems.
&lt;/span>&lt;span style="color:#e6db74">
&lt;/span>&lt;span style="color:#e6db74"> Attributes:
&lt;/span>&lt;span style="color:#e6db74"> F (Tensor): State transition matrix.
&lt;/span>&lt;span style="color:#e6db74"> B (Tensor): Control input matrix.
&lt;/span>&lt;span style="color:#e6db74"> H (Tensor): Observation matrix.
&lt;/span>&lt;span style="color:#e6db74"> Q (Tensor): Process noise covariance.
&lt;/span>&lt;span style="color:#e6db74"> R (Tensor): Observation noise covariance.
&lt;/span>&lt;span style="color:#e6db74"> state_dim (int): Dimensionality of the state.
&lt;/span>&lt;span style="color:#e6db74"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;span style="color:#66d9ef">def&lt;/span> __init__(self, F, B, H, Q, R, state_dim):
super()&lt;span style="color:#f92672">.&lt;/span>__init__()
self&lt;span style="color:#f92672">.&lt;/span>F &lt;span style="color:#f92672">=&lt;/span> F&lt;span style="color:#f92672">.&lt;/span>clone()
self&lt;span style="color:#f92672">.&lt;/span>B &lt;span style="color:#f92672">=&lt;/span> B&lt;span style="color:#f92672">.&lt;/span>clone()
self&lt;span style="color:#f92672">.&lt;/span>H &lt;span style="color:#f92672">=&lt;/span> H&lt;span style="color:#f92672">.&lt;/span>clone()
self&lt;span style="color:#f92672">.&lt;/span>Q &lt;span style="color:#f92672">=&lt;/span> Q
self&lt;span style="color:#f92672">.&lt;/span>R &lt;span style="color:#f92672">=&lt;/span> R
self&lt;span style="color:#f92672">.&lt;/span>state_dim &lt;span style="color:#f92672">=&lt;/span> state_dim
&lt;span style="color:#75715e"># placeholders for the current state, covariance, observation and control&lt;/span>
self&lt;span style="color:#f92672">.&lt;/span>x &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span> &lt;span style="color:#75715e"># [state_dim, 1]&lt;/span>
self&lt;span style="color:#f92672">.&lt;/span>P &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span> &lt;span style="color:#75715e"># [state_dim, state_dim]&lt;/span>
self&lt;span style="color:#f92672">.&lt;/span>zs &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span> &lt;span style="color:#75715e"># [obs_dim, 1]&lt;/span>
self&lt;span style="color:#f92672">.&lt;/span>us &lt;span style="color:#f92672">=&lt;/span> &lt;span style="color:#66d9ef">None&lt;/span> &lt;span style="color:#75715e"># [control_dim, 1]&lt;/span>
&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">project&lt;/span>(self):
&lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;Projects the state and covariance forward.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
x_pred &lt;span style="color:#f92672">=&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>matmul(self&lt;span style="color:#f92672">.&lt;/span>F, self&lt;span style="color:#f92672">.&lt;/span>x) &lt;span style="color:#f92672">+&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>matmul(self&lt;span style="color:#f92672">.&lt;/span>B, self&lt;span style="color:#f92672">.&lt;/span>us)
P_pred &lt;span style="color:#f92672">=&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>matmul(self&lt;span style="color:#f92672">.&lt;/span>F, torch&lt;span style="color:#f92672">.&lt;/span>matmul(self&lt;span style="color:#f92672">.&lt;/span>P, self&lt;span style="color:#f92672">.&lt;/span>F&lt;span style="color:#f92672">.&lt;/span>T)) &lt;span style="color:#f92672">+&lt;/span> self&lt;span style="color:#f92672">.&lt;/span>Q
&lt;span style="color:#66d9ef">return&lt;/span> x_pred, P_pred
&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">correct&lt;/span>(self, x_pred, P_pred):
&lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;Corrects the state estimate with the current observation.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
S &lt;span style="color:#f92672">=&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>matmul(self&lt;span style="color:#f92672">.&lt;/span>H, torch&lt;span style="color:#f92672">.&lt;/span>matmul(P_pred, self&lt;span style="color:#f92672">.&lt;/span>H&lt;span style="color:#f92672">.&lt;/span>T)) &lt;span style="color:#f92672">+&lt;/span> self&lt;span style="color:#f92672">.&lt;/span>R
K &lt;span style="color:#f92672">=&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>matmul(P_pred, self&lt;span style="color:#f92672">.&lt;/span>H&lt;span style="color:#f92672">.&lt;/span>T) &lt;span style="color:#f92672">@&lt;/span> inv(S)
&lt;span style="color:#75715e"># state update&lt;/span>
self&lt;span style="color:#f92672">.&lt;/span>x &lt;span style="color:#f92672">=&lt;/span> x_pred &lt;span style="color:#f92672">+&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>matmul(K, (self&lt;span style="color:#f92672">.&lt;/span>zs &lt;span style="color:#f92672">-&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>matmul(self&lt;span style="color:#f92672">.&lt;/span>H, x_pred)))
&lt;span style="color:#75715e"># covariance update&lt;/span>
I &lt;span style="color:#f92672">=&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>eye(self&lt;span style="color:#f92672">.&lt;/span>state_dim, device&lt;span style="color:#f92672">=&lt;/span>P_pred&lt;span style="color:#f92672">.&lt;/span>device)
self&lt;span style="color:#f92672">.&lt;/span>P &lt;span style="color:#f92672">=&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>matmul((I &lt;span style="color:#f92672">-&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>matmul(K, self&lt;span style="color:#f92672">.&lt;/span>H)), P_pred)
&lt;span style="color:#66d9ef">def&lt;/span> &lt;span style="color:#a6e22e">forward&lt;/span>(self, zs, us):
&lt;span style="color:#e6db74">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;span style="color:#e6db74"> Processes a batch of observation/control sequences.
&lt;/span>&lt;span style="color:#e6db74">
&lt;/span>&lt;span style="color:#e6db74"> Args:
&lt;/span>&lt;span style="color:#e6db74"> zs: [timesteps, batch, obs_dim] sequence of observations
&lt;/span>&lt;span style="color:#e6db74"> us: [timesteps, batch, control_dim] sequence of control inputs
&lt;/span>&lt;span style="color:#e6db74"> Returns:
&lt;/span>&lt;span style="color:#e6db74"> xs: [batch, state_dim, timesteps] filtered state estimates
&lt;/span>&lt;span style="color:#e6db74"> pred_obs: [batch, obs_dim, timesteps] one-step predictions of observations
&lt;/span>&lt;span style="color:#e6db74"> residuals: [batch, obs_dim, timesteps] observation residuals
&lt;/span>&lt;span style="color:#e6db74"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
xs &lt;span style="color:#f92672">=&lt;/span> []
pred_obs &lt;span style="color:#f92672">=&lt;/span> []
residuals &lt;span style="color:#f92672">=&lt;/span> []
&lt;span style="color:#75715e"># initial state &amp;amp; covariance&lt;/span>
self&lt;span style="color:#f92672">.&lt;/span>x &lt;span style="color:#f92672">=&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>zeros((self&lt;span style="color:#f92672">.&lt;/span>state_dim, &lt;span style="color:#ae81ff">1&lt;/span>), device&lt;span style="color:#f92672">=&lt;/span>zs&lt;span style="color:#f92672">.&lt;/span>device)
self&lt;span style="color:#f92672">.&lt;/span>P &lt;span style="color:#f92672">=&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>eye(self&lt;span style="color:#f92672">.&lt;/span>state_dim, device&lt;span style="color:#f92672">=&lt;/span>zs&lt;span style="color:#f92672">.&lt;/span>device)
&lt;span style="color:#75715e"># iterate over time&lt;/span>
&lt;span style="color:#66d9ef">for&lt;/span> z_t, u_t &lt;span style="color:#f92672">in&lt;/span> zip(zs&lt;span style="color:#f92672">.&lt;/span>transpose(&lt;span style="color:#ae81ff">0&lt;/span>, &lt;span style="color:#ae81ff">1&lt;/span>), us&lt;span style="color:#f92672">.&lt;/span>transpose(&lt;span style="color:#ae81ff">0&lt;/span>, &lt;span style="color:#ae81ff">1&lt;/span>)):
self&lt;span style="color:#f92672">.&lt;/span>zs &lt;span style="color:#f92672">=&lt;/span> z_t&lt;span style="color:#f92672">.&lt;/span>unsqueeze(&lt;span style="color:#ae81ff">1&lt;/span>)
self&lt;span style="color:#f92672">.&lt;/span>us &lt;span style="color:#f92672">=&lt;/span> u_t&lt;span style="color:#f92672">.&lt;/span>unsqueeze(&lt;span style="color:#ae81ff">1&lt;/span>)
x_pred, P_pred &lt;span style="color:#f92672">=&lt;/span> self&lt;span style="color:#f92672">.&lt;/span>project()
self&lt;span style="color:#f92672">.&lt;/span>correct(x_pred, P_pred)
xs&lt;span style="color:#f92672">.&lt;/span>append(self&lt;span style="color:#f92672">.&lt;/span>x&lt;span style="color:#f92672">.&lt;/span>detach()&lt;span style="color:#f92672">.&lt;/span>clone())
y_pred &lt;span style="color:#f92672">=&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>matmul(self&lt;span style="color:#f92672">.&lt;/span>H, x_pred)
pred_obs&lt;span style="color:#f92672">.&lt;/span>append(y_pred)
residuals&lt;span style="color:#f92672">.&lt;/span>append(self&lt;span style="color:#f92672">.&lt;/span>zs &lt;span style="color:#f92672">-&lt;/span> y_pred)
xs &lt;span style="color:#f92672">=&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>cat(xs, dim&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span>)
pred_obs &lt;span style="color:#f92672">=&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>cat(pred_obs, dim&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span>)
residuals &lt;span style="color:#f92672">=&lt;/span> torch&lt;span style="color:#f92672">.&lt;/span>cat(residuals, dim&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#ae81ff">1&lt;/span>)
&lt;span style="color:#66d9ef">return&lt;/span> xs, pred_obs, residuals
&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="differentiable-kalman-filters-learning-and-optimization">Differentiable Kalman Filters: Learning and Optimization&lt;/h2>
&lt;p>One of the most transformative aspects of implementing the Kalman filter in PyTorch is the ability to make the entire
filtering process differentiable. By treating the system matrices ($F$, $H$, $Q$, $R$) as learnable parameters,
one can optimize them using gradient-based methods, either to fit data or to tune the filter for specific tasks.
This approach blurs the line between classical estimation and machine learning, enabling hybrid models that combine
the structure of state-space models with the flexibility of data-driven learning.&lt;/p>
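&lt;p>A minimal sketch of this idea: expose the noise variances of a scalar filter as learnable parameters and fit them by gradient descent on the one-step prediction error. The toy model and the loss below are assumptions for illustration only:&lt;/p>

```python
import torch

torch.manual_seed(5)
# toy 1-D random walk observed with noise
true_x = torch.cumsum(0.1 * torch.randn(100), dim=0)
zs = true_x + 0.5 * torch.randn(100)

# learnable log-variances keep Q and R positive during optimisation
log_q = torch.nn.Parameter(torch.zeros(()))
log_r = torch.nn.Parameter(torch.zeros(()))
opt = torch.optim.Adam([log_q, log_r], lr=0.05)

for step in range(100):
    opt.zero_grad()
    q, r = log_q.exp(), log_r.exp()
    x, p = torch.zeros(()), torch.ones(())
    loss = torch.zeros(())
    for z in zs:
        p_pred = p + q                 # predict (F = 1 for a random walk)
        k = p_pred / (p_pred + r)      # scalar Kalman gain (H = 1)
        resid = z - x                  # innovation
        loss = loss + resid ** 2       # one-step prediction error
        x = x + k * resid              # correct
        p = (1 - k) * p_pred
    loss.backward()                    # gradients flow through the whole filter
    opt.step()
print(float(log_q.exp()), float(log_r.exp()))
```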
&lt;p>Recent research has focused on improving the efficiency of backpropagation through the Kalman filter.
While PyTorch&amp;rsquo;s automatic differentiation can compute gradients, it may incur significant computational overhead,
especially for large-scale problems. Novel closed-form expressions for the derivatives of the filter&amp;rsquo;s outputs with
respect to its parameters have been developed, offering substantial speed-ups (up to 38 times faster than PyTorch&amp;rsquo;s
autograd in some cases). These advances make it feasible to embed Kalman filters within deep learning pipelines,
trainable end-to-end, and responsive to the demands of modern applications.&lt;/p>
&lt;h2 id="pytorch-libraries-for-kalman-filtering">PyTorch Libraries for Kalman Filtering&lt;/h2>
&lt;p>Several open-source libraries have emerged to facilitate Kalman filtering in PyTorch:&lt;/p>
&lt;ul>
&lt;li>torch-kf: A fast implementation supporting batch filtering and smoothing, capable of running on both CPU and GPU. It is particularly efficient when filtering large batches of signals, leveraging PyTorch&amp;rsquo;s parallelism.&lt;/li>
&lt;li>DeepKalmanFilter: Implements deep variants of the Kalman filter, where neural networks parameterize parts of the state-space model. This enables modeling of nonlinear dynamics and observations, bridging the gap between classical filtering and deep generative models.&lt;/li>
&lt;li>Pyro: A probabilistic programming framework that supports differentiable Kalman filters and extended Kalman filters, with learnable parameters and integration with variational inference.&lt;/li>
&lt;li>torchfilter: Provides advanced filters such as the square-root unscented Kalman filter, supporting both state and parameter estimation in nonlinear systems.&lt;/li>
&lt;/ul>
&lt;h2 id="extensions-and-hybrid-models-beyond-the-classical-filter">Extensions and Hybrid Models: Beyond the Classical Filter&lt;/h2>
&lt;h3 id="nonlinear-and-non-gaussian-filtering">Nonlinear and Non-Gaussian Filtering&lt;/h3>
&lt;p>While the classical Kalman filter assumes linear dynamics and Gaussian noise, many real-world systems violate
these assumptions. Extensions such as the Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF) address
nonlinearities by linearizing the dynamics or propagating sigma points, respectively. Particle filters, in turn,
approximate arbitrary distributions via Monte Carlo sampling.&lt;/p>
&lt;p>Implementing these advanced filters in PyTorch follows the same principles: tensorized operations,
differentiability, and integration with neural modules. For example, the EKF can be implemented by computing
Jacobians using PyTorch&amp;rsquo;s autograd, while the UKF can leverage batched sigma point propagation for efficient parallelism.&lt;/p>
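&lt;p>As a sketch of this idea&amp;mdash;with made-up dynamics and observation functions standing in for whatever model one actually has&amp;mdash;an EKF step can obtain its Jacobians directly from autograd:&lt;/p>

```python
import torch
from torch.autograd.functional import jacobian

# Illustrative nonlinear dynamics and observation models (invented for the example).
def f(x):  # state transition
    return torch.stack([x[0] + 0.1 * x[1], 0.9 * torch.sin(x[1])])

def h(x):  # observation model
    return x[:1] ** 2

def ekf_step(x, P, y, Q, R):
    """One EKF predict/update cycle; the Jacobians come from autograd."""
    F = jacobian(f, x)            # linearize the dynamics around the current state
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    H = jacobian(h, x_pred)       # linearize the observation model
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ torch.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = (torch.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

x, P = torch.tensor([1.0, 0.5]), torch.eye(2)
Q, R = 0.01 * torch.eye(2), 0.1 * torch.eye(1)
x, P = ekf_step(x, P, torch.tensor([1.2]), Q, R)
```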
&lt;h3 id="deep-kalman-filters-and-latent-dynamics">Deep Kalman Filters and Latent Dynamics&lt;/h3>
&lt;p>The fusion of Kalman filtering with deep learning has given rise to deep Kalman filters, where neural networks
parameterize the transition and observation functions. This approach enables modeling of complex, nonlinear,
and high-dimensional systems, such as video sequences or sensor fusion in robotics. The deep Kalman filter retains
the probabilistic structure of the classical filter but augments it with the representational power of neural networks.&lt;/p>
&lt;p>In PyTorch, this is achieved by defining neural modules for the transition and observation models,
and using the filtering equations to propagate means and covariances through time. The entire model
can be trained end-to-end using stochastic gradient descent, with the Kalman filter acting as a differentiable
layer within the network.&lt;/p>
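&lt;p>A minimal sketch of this pattern might look like the following; the class name, dimensions, and the deliberately simplified correction step (in place of a full covariance recursion) are assumptions for illustration, not taken from any particular library:&lt;/p>

```python
import torch
import torch.nn as nn

class DeepKalmanCell(nn.Module):
    """Neural transition/observation models inside a filtering-style recursion."""
    def __init__(self, state_dim, obs_dim):
        super().__init__()
        self.trans = nn.Sequential(nn.Linear(state_dim, 16), nn.Tanh(),
                                   nn.Linear(16, state_dim))  # learned dynamics
        self.emit = nn.Linear(state_dim, obs_dim)             # observation model
        self.log_r = nn.Parameter(torch.zeros(obs_dim))       # learnable obs. noise

    def forward(self, ys):
        """Filter a sequence ys of shape (T, obs_dim); return a predictive NLL."""
        x = torch.zeros(self.emit.in_features)
        nll = 0.0
        for y in ys:
            x = self.trans(x)                     # predict the next latent mean
            mean, var = self.emit(x), self.log_r.exp()
            nll = nll + 0.5 * ((y - mean) ** 2 / var + var.log()).sum()
            # crude correction step: nudge the latent toward the observation error
            x = x + 0.1 * (y - mean) @ self.emit.weight
        return nll

cell = DeepKalmanCell(state_dim=2, obs_dim=2)
loss = cell(torch.randn(20, 2))
loss.backward()   # gradients flow through the whole recursion
```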
&lt;h3 id="hybrid-estimators-neural-networks-and-kalman-filters">Hybrid Estimators: Neural Networks and Kalman Filters&lt;/h3>
&lt;p>Hybrid models that combine neural networks and Kalman filters have demonstrated superior performance in
state estimation tasks, particularly in scenarios with complex dynamics or partial observability.
These models can be categorized into two main types:&lt;/p>
&lt;ul>
&lt;li>NN-KF: Neural networks learn the parameters or functions of the state-space model, which are then used by the Kalman filter for estimation.&lt;/li>
&lt;li>KF-NN: The Kalman filter provides state estimates or uncertainty measures that are used as inputs or features for a neural network.&lt;/li>
&lt;/ul>
&lt;p>Such hybridization leverages the strengths of both approaches: the interpretability and optimality of the Kalman filter,
and the flexibility and expressiveness of neural networks. In PyTorch, these models can be implemented as composite
modules, trained jointly or sequentially, and deployed in a wide range of applications from battery state-of-charge
estimation to autonomous navigation.&lt;/p>
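&lt;p>As an illustration of the NN-KF pattern&amp;mdash;all names and dimensions below are assumptions&amp;mdash;a small network can supply the transition matrix that a standard linear Kalman step then consumes:&lt;/p>

```python
import torch
import torch.nn as nn

class LearnedTransition(nn.Module):
    """Maps the current state to a state-transition matrix (illustrative)."""
    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.net = nn.Linear(dim, dim * dim)

    def forward(self, x):
        return self.net(x).view(self.dim, self.dim)

def kf_step(x, P, y, A, H, Q, R):
    """Standard linear Kalman predict/update with the supplied dynamics A."""
    x_pred, P_pred = A @ x, A @ P @ A.T + Q
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ torch.linalg.inv(S)
    return x_pred + K @ (y - H @ x_pred), (torch.eye(len(x)) - K @ H) @ P_pred

dim = 2
model = LearnedTransition(dim)
x, P = torch.zeros(dim), torch.eye(dim)
H, Q, R = torch.eye(dim), 0.01 * torch.eye(dim), 0.1 * torch.eye(dim)
for y in torch.randn(10, dim):
    A = model(x)              # the network supplies the dynamics for this step
    x, P = kf_step(x, P, y, A, H, Q, R)
```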
&lt;h2 id="philosophical-reflections-uncertainty-knowledge-and-learning">Philosophical Reflections: Uncertainty, Knowledge, and Learning&lt;/h2>
&lt;h3 id="the-epistemology-of-state-estimation">The Epistemology of State Estimation&lt;/h3>
&lt;p>At a deeper level, the Kalman filter embodies a philosophy of knowledge under uncertainty. It formalizes the process of
updating beliefs in the face of incomplete and noisy information, balancing prior expectations (the model) with new
evidence (the measurements). The recursive structure mirrors the Bayesian paradigm, where beliefs are continuously
revised as new data arrives.&lt;/p>
&lt;p>Yet, the filter&amp;rsquo;s optimality is contingent on its assumptions: linearity, Gaussianity, and known noise covariances.
When these assumptions are violated, as is often the case in complex systems, the filter&amp;rsquo;s estimates may become biased
or inconsistent. This raises fundamental questions: What does it mean to &amp;ldquo;know&amp;rdquo; the state of a system? How do we quantify
and manage uncertainty? Can we trust our models, or must we adapt them in light of new evidence?&lt;/p>
&lt;h3 id="the-fusion-of-model-based-and-data-driven-approaches">The Fusion of Model-Based and Data-Driven Approaches&lt;/h3>
&lt;p>The integration of Kalman filtering with PyTorch and neural networks reflects a broader trend in computational science:
the synthesis of model-based and data-driven approaches. Classical estimation theory offers structure, interpretability,
and guarantees of optimality. Machine learning provides flexibility, scalability, and the ability to discover patterns
from data.&lt;/p>
&lt;p>Hybrid models, differentiable filters, and end-to-end learning challenge the traditional dichotomy between &amp;ldquo;hard-coded&amp;rdquo;
models and &amp;ldquo;black-box&amp;rdquo; learning. They invite us to reconsider the boundaries between theory and data, deduction and
induction, certainty and doubt. In this sense, the Kalman filter is not just an algorithm, but a lens through which to
explore the nature of inference, prediction, and adaptation.&lt;/p>
&lt;h3 id="the-philosophy-of-differentiable-programming">The Philosophy of Differentiable Programming&lt;/h3>
&lt;p>The advent of differentiable programming—where algorithms are designed to be composed, differentiated,
and optimized—raises new philosophical questions. When we make the Kalman filter differentiable, we enable it to
learn from data, to adapt its parameters, and to participate in the broader ecosystem of neural computation.
But we also introduce new forms of uncertainty: about the correctness of gradients, the stability of optimization,
and the interpretability of learned models.&lt;/p>
&lt;p>Is the differentiable Kalman filter still a Kalman filter, or has it become something new? What are the implications of
treating classical algorithms as modules within a deep learning pipeline? How do we balance the desire for optimality
with the need for flexibility? These questions invite ongoing reflection and experimentation.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>The Kalman filter, once a symbol of control theory and aerospace engineering, has found new life in the era of PyTorch
and machine learning. Its recursive structure, principled handling of uncertainty, and optimality under Gaussian
assumptions remain as compelling as ever. Yet, its implementation and interpretation are evolving, shaped by the
demands of differentiability, scalability, and integration with neural computation.&lt;/p>
&lt;p>By exploring the mathematical foundations, practical coding strategies, extensions to nonlinear and hybrid models,
and the deeper philosophical questions that arise, we have sought to illuminate both the enduring relevance and
the transformative potential of Kalman filtering in the age of PyTorch. As we continue to blur the boundaries between
model-based and data-driven approaches, the filter serves as a bridge—not just between past and future, but between
certainty and doubt, theory and practice, knowledge and learning.&lt;/p>
&lt;p>The journey of the Kalman filter is far from over. Its recursive dance of prediction and correction, its geometry
of uncertainty, and its adaptability to new computational paradigms ensure that it will remain a central figure in the
ongoing dialogue between mathematics, engineering, and philosophy. Whether as a standalone estimator, a differentiable
module, or a component of a deep generative model, the Kalman filter challenges us to rethink what it means to know,
to predict, and to learn.&lt;/p>
&lt;h2 id="further-reading-and-resources">Further Reading and Resources&lt;/h2>
&lt;p>For those interested in diving deeper, consider exploring the following resources:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/raphaelreme/torch-kf" target="_blank" rel="noopener">torch-kf&lt;/a>: Fast PyTorch implementation of Kalman filters, supporting batch processing and GPU acceleration.&lt;/li>
&lt;li>&lt;a href="https://github.com/morim3/DeepKalmanFilter" target="_blank" rel="noopener">DeepKalmanFilter&lt;/a>: PyTorch implementation of deep Kalman filters, integrating neural networks with probabilistic state-space models.&lt;/li>
&lt;li>&lt;a href="https://pyro.ai/examples/ekf.html" target="_blank" rel="noopener">Pyro Tutorials&lt;/a>: Differentiable Kalman and extended Kalman filters with learnable parameters.&lt;/li>
&lt;li>&lt;a href="https://stanford-iprl-lab.github.io/torchfilter/_modules/torchfilter/filters/_square_root_unscented_kalman_filter/" target="_blank" rel="noopener">torchfilter&lt;/a>: Advanced filters including square-root unscented Kalman filter for nonlinear systems.&lt;/li>
&lt;li>Recent Research: &lt;a href="https://stanford-iprl-lab.github.io/torchfilter/_modules/torchfilter/filters/_square_root_unscented_kalman_filter/" target="_blank" rel="noopener">Closed-form gradients for efficient differentiable filtering&lt;/a>,
&lt;a href="https://www.semanticscholar.org/paper/A-review%3A-state-estimation-based-on-hybrid-models-Feng-Li/1f9d96407167c1bb894c4dec60a64bd31c00d1e8" target="_blank" rel="noopener">hybrid models for state estimation&lt;/a>,
and &lt;a href="https://arxiv.org/abs/2010.08196" target="_blank" rel="noopener">practical applications in robotics and sensor fusion&lt;/a>.&lt;/li>
&lt;/ul></description></item><item><title>Why Normalizing Flows (and Tensorizing Flows) deserve more attention</title><link>https://mahyar-osanlouy.com/post/tensorizing-flows/</link><pubDate>Fri, 04 Aug 2023 00:00:00 +0000</pubDate><guid>https://mahyar-osanlouy.com/post/tensorizing-flows/</guid><description>&lt;p>Other generative models like diffusion models and autoregressive LLMs tend to steal the spotlight, since they&amp;rsquo;re great
at producing stunning images or generating text. Normalizing Flows, on the other hand, aren&amp;rsquo;t the first choice for
those headline-grabbing tasks. But if you focus only on sample quality, you might overlook what makes Normalizing Flows
truly valuable.&lt;/p>
&lt;h2 id="why-normalizing-flows-deserve-more-attention">Why Normalizing Flows Deserve More Attention&lt;/h2>
&lt;p>Most generative models are black boxes. GANs, for example, can create high-quality samples, but you can&amp;rsquo;t compute the
likelihood of a given data point. Energy-based models often only give you unnormalized densities, so you can compare
samples but not get an actual probability.&lt;/p>
&lt;p>Normalizing Flows are different. They let you map a simple base distribution (like a Gaussian) through a sequence of
invertible transformations to model complex data. The kicker? You always have access to the exact, normalized probability
density for any sample. This is a huge deal for applications where you need to know the likelihood, not just generate
data.&lt;/p>
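&lt;p>The change-of-variables rule behind this property can be sketched with a single invertible affine map of a Gaussian base; the scale and shift values below are arbitrary placeholders:&lt;/p>

```python
import math
import torch

s = torch.tensor([0.5, -0.3])   # log-scales of the affine flow (placeholder values)
b = torch.tensor([1.0, 2.0])    # shifts (placeholder values)

def log_prob(x):
    """Exact log-density of x under the flow x = exp(s) * z + b, z ~ N(0, I)."""
    z = (x - b) * torch.exp(-s)                         # invert the transformation
    base = -0.5 * (z ** 2 + math.log(2 * math.pi)).sum()  # standard-normal log-density
    log_det = -s.sum()                                  # log|det dz/dx|
    return base + log_det                               # exact, normalized density

x = torch.tensor([1.5, 1.8])
print(log_prob(x))
```

For this simple flow the result matches a diagonal Gaussian with mean $b$ and scale $e^s$; deeper flows stack many such invertible layers, accumulating the log-determinants.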
&lt;h2 id="the-real-world-use-case-variational-inference">The Real-World Use Case: Variational Inference&lt;/h2>
&lt;p>One area where this property is crucial is Variational Inference (VI). Here, you want to approximate a complex target
distribution with a flexible, normalized family so you can do things like Bayesian inference efficiently.
NFs are a natural fit because you can both sample from them and compute exact densities—something most other models
can&amp;rsquo;t offer.&lt;/p>
&lt;h2 id="but-theres-a-catch">But There&amp;rsquo;s a Catch&amp;hellip;&lt;/h2>
&lt;p>Traditional NFs use a Gaussian as their base distribution. This works fine for unimodal targets, but if your true
distribution is multimodal (think: multiple peaks), NFs tend to &amp;ldquo;collapse&amp;rdquo; to just one mode. This limits their
expressiveness in VI, especially for challenging scientific or physics problems where multimodality is the norm.&lt;/p>
&lt;h2 id="enter-tensorizing-flows">Enter Tensorizing Flows&lt;/h2>
&lt;p>The paper &amp;ldquo;Tensorizing Flows: A Tool for Variational Inference&amp;rdquo; introduces a clever fix: replace the Gaussian base
with a tensor-train (TT) distribution, built using tools from tensor networks. This TT base can already capture much
of the structure (including multimodality) of the target distribution, so the flow only needs to handle the
&amp;ldquo;fine details.&amp;rdquo; The result is a model that&amp;rsquo;s both more expressive and easier to train for high-dimensional,
multimodal problems.&lt;/p>
&lt;h2 id="resources">Resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://arxiv.org/pdf/2305.02460" target="_blank" rel="noopener">Article&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/VincentStimper/normalizing-flows" target="_blank" rel="noopener">NormFlow&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Generalization of predictive coding model to dynamic stimuli</title><link>https://mahyar-osanlouy.com/post/tpc/</link><pubDate>Mon, 26 Apr 2021 00:00:00 +0000</pubDate><guid>https://mahyar-osanlouy.com/post/tpc/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Predictive coding is an established model of perceptual inference and learning in hierarchical networks of the brain.
It describes a network of neuron-like nodes, which can infer stimulus properties from noisy input using only
&lt;em>local computation&lt;/em>, i.e. the change in activity of each neuron in the model is determined only by its inputs and its
current activity level. Furthermore, the network encodes, in its synaptic connections, the estimated parameters of the
probabilistic model from which the stimuli are generated, and learns these parameters employing only
&lt;em>local plasticity&lt;/em>, where the changes in synaptic weights depend only on the activities of pre- and post-synaptic neurons.
In its original form the predictive coding model assumes static input stimuli. However, most of
the stimuli experienced by animals and humans change in time, and it is critical for survival to efficiently interpret
such stimuli.&lt;/p>
&lt;p>Very soon after the predictive coding model was developed, it was pointed out that it could be generalized to dynamic stimuli,
and that the Kalman filter could be employed to infer the states of the hidden variables represented by the model. However, that work did not describe how such computation could be implemented in a biologically
plausible network of neuron-like nodes. More recently, a generalization of predictive coding to dynamic stimuli has been
proposed, in which different neurons represent not only the hidden variables, but also their temporal derivatives.
Although it is possible to implement this model in a network only employing local computation and local plasticity,
this network requires a very intricate and specific pattern of connectivity between various neurons, and there is no evidence that such connectivity exists in cortical circuits.&lt;/p>
&lt;p>This report outlines a simple generalization of the predictive coding model to dynamic stimuli, which does not require a more
intricate network than the original predictive coding model. A simulation of the proposed generalization is shown for
a toy problem, and directions are suggested in which further work on the model could be conducted.&lt;/p>
&lt;h2 id="model">Model&lt;/h2>
&lt;h3 id="process-generating-stimuli">Process generating stimuli&lt;/h3>
&lt;p>In this report we assume that stimuli are generated from a very simple linear model, which parallels the assumptions
about the signal made by the Kalman filter. Let us denote an observed stimulus at time
$t$ by a vector with elements $y_i(t)$. Let us assume that the stimulus depends on values of hidden variables
denoted by $x_j(t)$ according to:&lt;/p>
&lt;p>$$
y_i(t) = \sum_j w_{i,j} x_j(t) + \epsilon_{y,i}(t)
\label{eq:gen_y}
$$&lt;/p>
&lt;p>In the above equation, $w_{i,j}$ form a matrix of parameters, and $\epsilon_{y,i}(t)$ is a noise process (with zero mean).
Furthermore, let us assume that the hidden variables evolve according to:&lt;/p>
&lt;p>$$
\dot{x}_j = \sum_k v_{j,k} x_k(t) + \epsilon_{x,j}(t)
\label{eq:gen_x}
$$&lt;/p>
&lt;p>Analogously as above, $v_{j,k}$ form a matrix of parameters, and $\epsilon_{x,j}(t)$ is a noise process.
A natural way for estimating $x_j$ from $y_i$ is to employ the Kalman filter, but it involves complex equations,
and it is not clear how such computation could be implemented in a network of neurons. Therefore, this report describes
a simpler method for estimating $x_j$ that has a more natural neural implementation.&lt;/p>
&lt;h3 id="computations-in-the-model">Computations in the model&lt;/h3>
&lt;p>Given observed stimuli $y_i$, we will seek to infer the hidden variables $x_j$ and estimate the parameters $w_{i,j}$
and $v_{j,k}$. In the remainder of this section, we will use $x_j$, $w_{i,j}$ and $v_{j,k}$ to denote the estimates of
the corresponding terms in Equations \ref{eq:gen_y} and \ref{eq:gen_x}. We wish to find $x_j$ such that the stimulus $y_i$ is
close to the predicted value $\sum_j w_{i,j} x_j$. Thus we define the error in the prediction
of $y_i$ as:&lt;/p>
&lt;p>$$
\varepsilon_{y,i} = y_i - \sum_j w_{i,j} x_j
\label{eq:error_y}
$$&lt;/p>
&lt;p>We wish to minimize the squared sum of these errors, which we denote by $E_y = \frac{1}{2} \sum_i \varepsilon_{y,i}^2$.
Hence we change $x_j$ in the direction opposite to the gradient of $E_y$, and we additionally augment this gradient
dynamics with the natural evolution of $x_j$:&lt;/p>
&lt;p>$$
\dot{x}_j = - \frac{\partial E_y}{\partial x_j} + \sum_k v_{j,k} x_k
$$&lt;/p>
&lt;p>Evaluating the gradient, we obtain the equation describing the dynamics of our estimate of hidden variables:&lt;/p>
&lt;p>$$
\dot{x}_j = \sum_i w_{i,j} \varepsilon_{y,i} + \sum_k v_{j,k} x_k
$$&lt;/p>
&lt;p>In order to learn parameters $w_{i,j}$, which describe how $y_i$ depends on $x_j$, we modify them to minimize $E_y$:&lt;/p>
&lt;p>$$
\dot{w}_{i,j} = - \alpha \frac{\partial E_y}{\partial w_{i,j}} = \alpha \varepsilon_{y,i} x_j
$$&lt;/p>
&lt;p>In the above equation $\alpha$ denotes a learning rate. In order to learn parameters $v_{j,k}$ describing the natural
dynamics of hidden variables, we need to define an error in prediction of this dynamics:&lt;/p>
&lt;p>$$
\varepsilon_{x,j} = \dot{x}_j - \sum_k v_{j,k} x_k
$$&lt;/p>
&lt;p>We wish to minimize the squared sum of these errors, $E_x = \frac{1}{2} \sum_j \varepsilon_{x,j}^2$,
and hence we modify the weights in the direction opposite to the gradient of $E_x$ with respect to $v_{j,k}$:&lt;/p>
&lt;p>$$
\dot{v}_{j,k} = \alpha \varepsilon_{x,j} x_k
$$&lt;/p>
&lt;p>In summary, this generalized predictive coding model continuously updates the hidden variables and parameters
according to the equations above, recomputing the prediction errors as it goes.&lt;/p>
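&lt;p>The update equations above can be sketched in a few lines of Python; the dimensions, step size, and input values are illustrative:&lt;/p>

```python
import numpy as np

def step(x, W, V, y, dt=0.1, alpha=0.01):
    """One Euler step of the generalized predictive coding model."""
    eps_y = y - W @ x                   # prediction error of the stimulus
    dx = W.T @ eps_y + V @ x            # gradient on E_y plus the natural dynamics
    eps_x = dx - V @ x                  # error in predicting the dynamics
    x_new = x + dt * dx
    W_new = W + dt * alpha * np.outer(eps_y, x)   # Hebbian update of w_{i,j}
    V_new = V + dt * alpha * np.outer(eps_x, x)   # Hebbian update of v_{j,k}
    return x_new, W_new, V_new

x, W, V = np.zeros(2), np.eye(2), np.zeros((2, 2))
for _ in range(5):
    x, W, V = step(x, W, V, y=np.array([1.0, 0.0]))
```

Note that every update here is local: each quantity is computed from the pre- and post-synaptic activities it connects, matching the constraints discussed above.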
&lt;h3 id="possible-neural-implementations">Possible neural implementations&lt;/h3>
&lt;p>Inference of the hidden variables $x_j$ from sensory input $y_i$ can easily be performed in the network shown in Figure 1A.
The bottom layer consists of sensory neurons representing the stimulus. They project to neurons computing the prediction
errors. These errors are then sent to the neurons encoding the hidden variables, which
change their activity according to the dynamics equation for $\dot{x}_j$ above. The weights of the connections between the neurons encoding errors
and hidden variables are symmetric, i.e. equal in both directions. This network has an architecture very similar to
a standard predictive coding model, but additionally includes recurrent connections with weights $v_{j,k}$ between the
neurons encoding hidden variables.&lt;/p>
&lt;img src="featured.png" alt="Receptive fields" width="800">
&lt;p>Learning the parameters $w_{i,j}$ corresponds to local Hebbian plasticity in the network
of Figure 1, analogous to that in standard predictive coding networks. However,
learning parameters $v_{j,k}$ is less straightforward because the
prediction error $\varepsilon_{x,j}$ is not explicitly represented in activity of any neurons in the network.
Nevertheless, it is possible to construct models in which $\varepsilon_{x,j}$ would be represented in internal
signals (e.g. concentrations of particular ions or proteins) within neurons encoding $x_j$,
and let us consider two such possible models.&lt;/p>
&lt;p>The first model is illustrated in Figure 1B.
In this network, the recurrent inputs from neurons representing hidden variables converge on a separate dendritic
branch, which sums them and thus can compute $\sum_k v_{j,k} x_k$. To compute the error $\varepsilon_{x,j}$,
the neuron would need to compute the difference between change in its activity and the membrane potential in the dendrite.
Since both of these quantities are encoded within the same neuron, it is plausible that such a computation may be performed,
and an error encoded in an internal signal. Such signal could then drive local synaptic plasticity.&lt;/p>
&lt;p>An alternative way of computing the prediction errors $\varepsilon_{x,j}$ relies on the observation that, by combining
the equations describing the dynamics $\dot{x}_j$ and the error $\varepsilon_{x,j}$, we see that these errors are equal to:&lt;/p>
&lt;p>$$
\varepsilon_{x,j} = \sum_i w_{i,j} \varepsilon_{y,i}
$$&lt;/p>
&lt;p>Such input from the previous layer of prediction error neurons could be computed in dendrites shown in
Figure 1C. The membrane potential of such dendrite would need to set level of an internal signal that would govern the
plasticity within the entire neuron. This mechanism could be considered biologically plausible as it is analogous to
observations that high membrane potential of apical dendrites of pyramidal neurons triggers plateau potentials via
calcium influx, leading to a burst of spikes by the neuron. Such bursts of spikes may subsequently
induce synaptic plasticity.&lt;/p>
&lt;h2 id="results">Results&lt;/h2>
&lt;p>I tested the model on a simple problem in which hidden variables and stimuli were 2-dimensional.
The hidden variables were generated according to $\dot{x}_j = \sum_k v_{j,k} x_k(t) + \epsilon_{x,j}(t)$ with parameters
$v_{j,k}$ set to a rotation matrix visualized in Figure 2C. The stimuli were generated according
to $y_i(t) = \sum_j w_{i,j} x_j(t) + \epsilon_{y,i}(t)$ with parameters $w_{i,j}$ set to the identity matrix,
so that the stimuli were simply noisy versions of the hidden variables. The stimuli are shown in Figure 2A; they are
noisy periodic signals because the parameters $v_{j,k}$ were set to a rotation matrix. The variables and stimuli were generated
with a sampling frequency of 10, by solving the above equations using the Euler method with an integration step of $0.1$.
During each step, noise with a variance of $0.01$ was added.&lt;/p>
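&lt;p>The data-generation process just described can be sketched as follows; the rotation speed and initial state are illustrative choices, not the exact values used in the report:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.1, 500                              # integration step and number of samples
theta = 0.5                                   # rotation speed (illustrative)
V = np.array([[0.0, -theta], [theta, 0.0]])   # rotation-generating dynamics matrix
W = np.eye(2)                                 # identity observation matrix

x = np.array([1.0, 0.0])
ys = []
for _ in range(T):
    x = x + dt * (V @ x) + rng.normal(0, np.sqrt(0.01), 2)  # Euler step + process noise
    ys.append(W @ x + rng.normal(0, np.sqrt(0.01), 2))      # noisy observed stimulus
ys = np.asarray(ys)
```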
&lt;img src="results.png" alt="Receptive fields" width="700">
&lt;p>At the start of the learning process, weights $w_{i,j}$ were initialized to an identity matrix,
while the weights between hidden units were all set to $v_{j,k}=0$. The hidden units were also initialized to $x_j=0$.
The hidden variables and parameters were updated according to the equations above using the Euler method with an
integration step of $0.1$ and a learning rate of $\alpha=0.01$.&lt;/p>
&lt;p>Figure 2B shows that as the learning progressed, the error in prediction of stimuli decreased, so the network was able
to better predict the stimuli. Figure 2D visualizes learned values of parameters $v_{j,k}$, which are very close to the
original parameters used to generate the training data (cf. Figure 2C).
Thus the network was able to discover the underlying process generating the stimuli.&lt;/p>
&lt;h2 id="discussion">Discussion&lt;/h2>
&lt;p>This report outlines a generalization of predictive coding to dynamic stimuli for linear and shallow generative models,
so more work would be required to extend it to more complex models and relate it to experimental data.
In particular, the work can be extended in the following directions:&lt;/p>
&lt;ul>
&lt;li>Introduce non-linear activation functions for the hidden units, and test whether the model can learn the dynamics of non-linear systems.&lt;/li>
&lt;li>Introduce multiple levels of hierarchy and investigate whether the model can extract the dynamics of stimuli generated by hierarchical dynamical systems.&lt;/li>
&lt;li>Test the model&amp;rsquo;s performance on real-world machine learning problems, e.g. predicting an EEG signal from its past history.&lt;/li>
&lt;li>Investigate whether, after training with natural stimuli, the receptive fields of neurons in the model have properties similar
to receptive fields in the visual system, analogous to neural networks trained with the back-propagation algorithm.&lt;/li>
&lt;/ul></description></item><item><title>Allostasis, Interoception, and the Free Energy Principle</title><link>https://mahyar-osanlouy.com/post/interoception-allostasis/</link><pubDate>Fri, 12 Mar 2021 00:00:00 +0000</pubDate><guid>https://mahyar-osanlouy.com/post/interoception-allostasis/</guid><description>&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>The intersection of biological regulation, predictive processing, and consciousness represents one of the most
fascinating frontiers in cognitive science today. After carefully reading Corcoran and Hohwy&amp;rsquo;s chapter &amp;ldquo;Allostasis,
interoception, and the free energy principle: Feeling our way forward,&amp;rdquo; I&amp;rsquo;m struck by both its ambitious scope and
its meticulous attention to conceptual clarity. This paper attempts to untangle a complex theoretical landscape that
has profound implications for how we understand the relationship between mind, body, and environment.&lt;/p>
&lt;h2 id="the-conceptual-maze-homeostasis-and-allostasis">The Conceptual Maze: Homeostasis and Allostasis&lt;/h2>
&lt;p>At its foundation, this paper addresses a fundamental question: how do biological organisms maintain their viability?
The traditional answer has been homeostasis - the concept developed by Claude Bernard and Walter Cannon emphasizing the
maintenance of stable internal conditions despite external fluctuations. The authors provide an excellent historical
overview of this concept, tracing its development from Bernard&amp;rsquo;s emphasis on the &amp;ldquo;milieu intérieur&amp;rdquo; to Cannon&amp;rsquo;s more
nuanced view of stability involving acceptable ranges rather than fixed setpoints.&lt;/p>
&lt;p>What makes this paper particularly valuable is its careful examination of allostasis - a concept introduced by Sterling
and Eyer in 1988 as &amp;ldquo;stability through change&amp;rdquo;. The authors meticulously document how this concept has evolved in multiple,
sometimes contradictory directions:&lt;/p>
&lt;ol>
&lt;li>Sterling and Eyer&amp;rsquo;s radical position that allostasis should entirely replace homeostasis&lt;/li>
&lt;li>McEwen&amp;rsquo;s view of allostasis as &amp;ldquo;the process for actively maintaining homeostasis&amp;rdquo;&lt;/li>
&lt;li>Schulkin&amp;rsquo;s perspective where homeostasis and allostasis are complementary mechanisms for maintaining biological viability&lt;/li>
&lt;/ol>
&lt;p>This historical excavation reveals something important: allostasis has been a contested concept from the beginning,
with no clear consensus about its precise meaning even 30+ years after its introduction.&lt;/p>
&lt;h2 id="free-energy-and-interoceptive-inference">Free Energy and Interoceptive Inference&lt;/h2>
&lt;p>The paper becomes even more interesting when it examines how these biological regulation concepts have been incorporated
into the free energy principle framework. The authors identify three distinct interpretations of allostasis within
recent free energy-inspired accounts:&lt;/p>
&lt;ol>
&lt;li>Behavioral allostasis: Focuses on behavioral actions on the external world to maintain internal states (Gu &amp;amp; FitzGerald, Seth)&lt;/li>
&lt;li>Teleological allostasis: Positions allostasis as the primary evolutionary design feature of the brain (Barrett and colleagues)&lt;/li>
&lt;li>Diachronic allostasis: Emphasizes allostasis as operating across various timescales (Pezzulo et al., Stephan et al.)&lt;/li>
&lt;/ol>
&lt;p>The authors' critique of the &amp;ldquo;behavioral&amp;rdquo; interpretation is particularly insightful. They point out that despite using
the term &amp;ldquo;allostasis,&amp;rdquo; these accounts describe what is essentially a reactive process - responding to homeostatic
perturbations rather than anticipating them. This seems to miss the core predictive emphasis that has been central
to allostasis from its inception.&lt;/p>
&lt;h2 id="strengths-of-the-analysis">Strengths of the Analysis&lt;/h2>
&lt;p>What I find most impressive about Corcoran and Hohwy&amp;rsquo;s analysis is its conceptual precision.
In a literature full of terminological confusion and competing definitions, they bring much-needed clarity.
Their systematic examination of different interpretations helps untangle what has become a rather messy theoretical landscape.&lt;/p>
&lt;p>The authors are also admirably even-handed in their assessment. While they ultimately favor a view that reconciles
homeostasis and allostasis as complementary strategies, they carefully consider the merits of alternative perspectives.
Their analysis of the &amp;ldquo;diachronic&amp;rdquo; interpretation of allostasis (particularly Stephan&amp;rsquo;s Bayesian implementation of
hierarchical allostatic control) is especially thoughtful.&lt;/p>
&lt;p>I also liked their recognition that &amp;ldquo;sustained biological viability (rather than some other criterion such as
internal stability) seems to us the most plausible target towards which physiological and behavioral regulatory mechanisms
are striving&amp;rdquo;. This shifts the focus from mechanism to purpose in a way that offers a principled resolution to some
of the conceptual tensions.&lt;/p>
&lt;h2 id="questions-for-further-investigation">Questions for Further Investigation&lt;/h2>
&lt;p>Reading this paper has sparked several questions that I believe could be subjects for further investigation:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Developmental Trajectory&lt;/strong>: How do homeostatic and allostatic regulatory mechanisms develop over the lifespan?
Are there critical periods for the development of predictive regulatory capacities?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Individual Differences&lt;/strong>: What accounts for the substantial variability in regulatory strategies across individuals?
Some people seem to rely more on anticipatory regulation, while others show more reactive patterns.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Artificial Systems&lt;/strong>: Could the complementary frameworks of homeostasis and allostasis inform the design of
artificial systems? Might robotic or AI systems benefit from implementing both reactive and anticipatory modes of self-regulation?&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Disorders of Regulation&lt;/strong>: How do disruptions in the relationship between homeostasis and allostasis contribute to
physical and mental health conditions? The concept of &amp;ldquo;allostatic load&amp;rdquo; is mentioned but deserves deeper exploration.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Consciousness and Regulation&lt;/strong>: If predictive regulation is indeed fundamental to biological systems,
what implications does this have for theories of consciousness? Could consciousness itself be understood partly as an extension of these regulatory processes?&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h4 id="link-to-the-paper-allostasis-interoception-and-the-free-energy-principle-feeling-our-way-forwardhttpsosfiopreprintspsyarxivzbqnx_v1">Link to the paper: &lt;a href="https://osf.io/preprints/psyarxiv/zbqnx_v1" target="_blank" rel="noopener">Allostasis, interoception, and the free energy principle: Feeling our way forward&lt;/a>&lt;/h4></description></item></channel></rss>