Continuous Thought Machines (CTMs) are a new architecture that puts temporal dynamics at the forefront of the training process.
Intuition
Neuronal Synchronization
We want the model's relationship with the data to depend not only on a single snapshot of the network state, but also on complex temporal dynamics through which time dependency can emerge as a variable in the latent space.
We do this by first collecting all the post-activations into a "post-activation history":

$$\mathbf{Z}^t = \left[\mathbf{z}^1, \mathbf{z}^2, \ldots, \mathbf{z}^t\right] \in \mathbb{R}^{D \times t}$$
It is important to note that the size of $\mathbf{Z}^t$ is not fixed; it instead grows with the current internal tick $t$.
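As a concrete illustration, here is a minimal sketch of how such a history could be accumulated, assuming post-activations for $D$ neurons arrive one vector per tick; all names and sizes here are illustrative, not from the original:

```python
import torch

D = 8           # number of neurons (illustrative size)
num_ticks = 5   # internal ticks elapsed so far

history = []                       # grows by one vector per tick
for t in range(num_ticks):
    z_t = torch.randn(D)           # stand-in for the model's post-activations
    history.append(z_t)

# Z^t has shape (D, t): one column per tick, so its size depends on t.
Z = torch.stack(history, dim=1)
print(Z.shape)                     # torch.Size([8, 5])
```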
We now define neuronal synchronization as the matrix yielded by the inner product between post-activation histories:

$$\mathbf{S}^t = \mathbf{Z}^t \left(\mathbf{Z}^t\right)^\top \in \mathbb{R}^{D \times D}$$
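A sketch of that inner product, under the same assumed shapes: each entry $\mathbf{S}^t_{ij}$ is the dot product between the histories of neurons $i$ and $j$.

```python
import torch

D, t = 8, 5
Z = torch.randn(D, t)    # post-activation history up to tick t

# S[i, j] = <Z[i, :], Z[j, :]>: inner products between every pair
# of neuron histories, giving the (D, D) synchronization matrix.
S = Z @ Z.T
print(S.shape)           # torch.Size([8, 8])
```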
However, it is important to note that this operation scales very poorly ($O(D^2)$ entries), so instead we will use random sampling to choose neuron pairs. We select $D_{\text{out}}$ and $D_{\text{action}}$ pairs $(i, j)$ from $\mathbf{S}^t$, yielding two synchronization representations:

$$\mathbf{S}^t_{\text{out}} \in \mathbb{R}^{D_{\text{out}}}, \qquad \mathbf{S}^t_{\text{action}} \in \mathbb{R}^{D_{\text{action}}}$$
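A sketch of the pair sampling under stated assumptions: the pairs are drawn once, up front, and only those entries of $\mathbf{S}^t$ are ever computed. The sizes `D_out` and `D_action` are hypothetical.

```python
import torch

D, t = 512, 5
Z = torch.randn(D, t)                 # post-activation history

# Sample fixed sets of (i, j) neuron pairs once, rather than
# forming the full (D, D) matrix. Sizes here are illustrative.
D_out, D_action = 64, 32
i_out = torch.randint(0, D, (D_out,))
j_out = torch.randint(0, D, (D_out,))
i_act = torch.randint(0, D, (D_action,))
j_act = torch.randint(0, D, (D_action,))

# Each entry is the inner product of one sampled pair's histories.
S_out = (Z[i_out] * Z[j_out]).sum(dim=1)     # shape (D_out,)
S_action = (Z[i_act] * Z[j_act]).sum(dim=1)  # shape (D_action,)
```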
We then project $\mathbf{S}^t_{\text{out}}$ onto an output space as:

$$\mathbf{y}^t = \mathbf{W}_{\text{out}} \cdot \mathbf{S}^t_{\text{out}}$$
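A minimal sketch of this projection, assuming a learned linear map and an illustrative class-logit output space:

```python
import torch

D_out, n_classes = 64, 10
S_out = torch.randn(D_out)     # sampled synchronization representation

# A learned linear map W_out projects the synchronization
# representation onto the output space (class logits here).
W_out = torch.nn.Linear(D_out, n_classes, bias=False)
y = W_out(S_out)
print(y.shape)                 # torch.Size([10])
```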
We can then compute neuronal synchronization at every tick by first building the post-activation history $\mathbf{Z}^t$, sampling the neuron pairs from it, and projecting the resulting representations onto the output space.
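Putting the pieces together, a compact end-to-end sketch under the same assumed names, computing a per-tick output as the history grows:

```python
import torch

D, ticks, D_out, n_classes = 128, 6, 32, 10
i_out = torch.randint(0, D, (D_out,))     # fixed sampled pairs
j_out = torch.randint(0, D, (D_out,))
W_out = torch.nn.Linear(D_out, n_classes, bias=False)

history = []
for t in range(ticks):
    z_t = torch.randn(D)                        # stand-in post-activations
    history.append(z_t)
    Z = torch.stack(history, dim=1)             # (D, t+1) history so far
    S_out = (Z[i_out] * Z[j_out]).sum(dim=1)    # sampled synchronization
    y_t = W_out(S_out)                          # per-tick output
    print(t, y_t.shape)
```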