Continuous Thought Machines (CTMs) are a new architecture that puts temporal dynamics at the forefront of the training process.
Intuition
Neuronal Synchronization
We want the model's relationship with the data to depend not only on a single snapshot of the network state, but also on complex temporal dynamics through which time dependency can emerge as a variable in the latent space.
We do this by first collecting all the post-activations into a "post-activation history":

$$\mathbf{Z}^t = \left[\mathbf{z}^1, \mathbf{z}^2, \ldots, \mathbf{z}^t\right] \in \mathbb{R}^{D \times t}$$
It is important to note that the size of $\mathbf{Z}^t$ is not fixed; it instead grows with the current internal tick $t$.
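As a concrete illustration, here is a minimal sketch of how such a history could be accumulated, assuming post-activations for $D$ neurons arrive one vector per tick; all names and sizes here are illustrative, not from the original:

```python
import torch

D = 8           # number of neurons (illustrative size)
num_ticks = 5   # internal ticks elapsed so far

history = []                       # grows by one vector per tick
for t in range(num_ticks):
    z_t = torch.randn(D)           # stand-in for the model's post-activations
    history.append(z_t)

# Z^t has shape (D, t): one column per tick, so its size depends on t.
Z = torch.stack(history, dim=1)
print(Z.shape)                     # torch.Size([8, 5])
```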
We now define neuronal synchronization as the matrix yielded by the inner product between post-activation histories:

$$\mathbf{S}^t = \mathbf{Z}^t \left(\mathbf{Z}^t\right)^\top \in \mathbb{R}^{D \times D}$$
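A sketch of that inner product, under the same assumed shapes: each entry $\mathbf{S}^t_{ij}$ is the dot product between the histories of neurons $i$ and $j$.

```python
import torch

D, t = 8, 5
Z = torch.randn(D, t)    # post-activation history up to tick t

# S[i, j] = <Z[i, :], Z[j, :]>: inner products between every pair
# of neuron histories, giving the (D, D) synchronization matrix.
S = Z @ Z.T
print(S.shape)           # torch.Size([8, 8])
```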
However, it is important to note that this operation scales very poorly ($O(D^2)$ entries), so instead we will use random sampling to choose neuron pairs. We select $D_{\text{out}}$ and $D_{\text{action}}$ pairs $(i, j)$ from $\mathbf{S}^t$, yielding two synchronization representations:

$$\mathbf{S}^t_{\text{out}} \in \mathbb{R}^{D_{\text{out}}}, \qquad \mathbf{S}^t_{\text{action}} \in \mathbb{R}^{D_{\text{action}}}$$
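A sketch of the pair sampling under stated assumptions: the pairs are drawn once, up front, and only those entries of $\mathbf{S}^t$ are ever computed. The sizes `D_out` and `D_action` are hypothetical.

```python
import torch

D, t = 512, 5
Z = torch.randn(D, t)                 # post-activation history

# Sample fixed sets of (i, j) neuron pairs once, rather than
# forming the full (D, D) matrix. Sizes here are illustrative.
D_out, D_action = 64, 32
i_out = torch.randint(0, D, (D_out,))
j_out = torch.randint(0, D, (D_out,))
i_act = torch.randint(0, D, (D_action,))
j_act = torch.randint(0, D, (D_action,))

# Each entry is the inner product of one sampled pair's histories.
S_out = (Z[i_out] * Z[j_out]).sum(dim=1)     # shape (D_out,)
S_action = (Z[i_act] * Z[j_act]).sum(dim=1)  # shape (D_action,)
```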
We then project $\mathbf{S}^t_{\text{out}}$ onto an output space as:

$$\mathbf{y}^t = \mathbf{W}_{\text{out}} \cdot \mathbf{S}^t_{\text{out}}$$
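A minimal sketch of this projection, assuming a learned linear map and an illustrative class-logit output space:

```python
import torch

D_out, n_classes = 64, 10
S_out = torch.randn(D_out)     # sampled synchronization representation

# A learned linear map W_out projects the synchronization
# representation onto the output space (class logits here).
W_out = torch.nn.Linear(D_out, n_classes, bias=False)
y = W_out(S_out)
print(y.shape)                 # torch.Size([10])
```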
We can then compute neuronal synchronization at every tick by first building the post-activation history $\mathbf{Z}^t$, sampling the neuron pairs from it, and projecting the resulting representations onto the output space.
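Putting the pieces together, a compact end-to-end sketch under the same assumed names, computing a per-tick output as the history grows:

```python
import torch

D, ticks, D_out, n_classes = 128, 6, 32, 10
i_out = torch.randint(0, D, (D_out,))     # fixed sampled pairs
j_out = torch.randint(0, D, (D_out,))
W_out = torch.nn.Linear(D_out, n_classes, bias=False)

history = []
for t in range(ticks):
    z_t = torch.randn(D)                        # stand-in post-activations
    history.append(z_t)
    Z = torch.stack(history, dim=1)             # (D, t+1) history so far
    S_out = (Z[i_out] * Z[j_out]).sum(dim=1)    # sampled synchronization
    y_t = W_out(S_out)                          # per-tick output
    print(t, y_t.shape)
```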