
MIT's CompreSSM Uses Control Theory to Compress AI Models During Training, Achieving 4x Speedups
A new technique from MIT CSAIL applies Hankel singular values to identify and remove unnecessary model components while training is still underway.
Researchers at MIT CSAIL, in collaboration with the Max Planck Institute, ETH Zurich, and Liquid AI, have developed a technique called CompreSSM that compresses AI models during training rather than afterward, a shift that could significantly reduce the time and compute required to produce efficient models.
How It Works
CompreSSM borrows a concept from control theory called Hankel singular values. In control systems engineering, these values measure how much each internal state of a dynamical system contributes to the system's overall input-output behavior. The MIT team applied this mathematical framework to AI models, using Hankel singular values to identify which components of a model contribute meaningfully to its outputs and which can be safely removed.
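To make the idea concrete, here is a minimal sketch of how Hankel singular values can be computed for a small linear state-space system. It is an illustration of the textbook definition, not CompreSSM's code. It assumes a stable discrete-time system with a diagonal state matrix (a common choice in modern state-space models), and the function names `gramians_diagonal` and `hankel_singular_values` are hypothetical.

```python
import numpy as np

def gramians_diagonal(a, B, C):
    """Gramians of x[k+1] = diag(a) x[k] + B u[k], y[k] = C x[k].

    Assumes every |a_i| < 1 (a stable system). For a diagonal state matrix
    the Lyapunov equations have the closed-form solution used below.
    """
    a = np.asarray(a, dtype=float)
    if np.any(np.abs(a) >= 1.0):
        raise ValueError("all entries of a must have magnitude below 1")
    denom = 1.0 - np.outer(a, a)   # entry (i, j) is 1 - a_i * a_j
    P = (B @ B.T) / denom          # controllability Gramian
    Q = (C.T @ C) / denom          # observability Gramian
    return P, Q

def hankel_singular_values(a, B, C):
    """Square roots of the eigenvalues of P @ Q, sorted largest first."""
    P, Q = gramians_diagonal(a, B, C)
    eigs = np.linalg.eigvals(P @ Q).real
    return np.sort(np.sqrt(np.clip(eigs, 0.0, None)))[::-1]

# Toy system (made-up numbers): the second state never receives any input.
a = np.array([0.9, 0.5])
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 1.0]])

hsv = hankel_singular_values(a, B, C)
print(hsv)
```

In this toy system the second Hankel singular value is zero, because a state that the input cannot reach has no effect on the output. That is the signal a compression method can use to decide that a state is safe to drop.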
The key innovation is that this analysis happens during training, not after. Traditional model compression techniques train a full-sized model first and then prune or distill it into a smaller version. CompreSSM integrates compression into the training process itself, which means smaller, faster models can be produced without first investing the full compute budget of a large model.
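The article does not describe the exact in-training procedure, so the sketch below shows only the classical, offline counterpart from control theory: balanced truncation, which keeps the states with the largest Hankel singular values and discards the rest. The diagonal state matrix, the toy numbers, and the names `balanced_truncation_diag` and `impulse_response` are assumptions made for illustration.

```python
import numpy as np

def balanced_truncation_diag(a, B, C, r):
    """Reduce x[k+1] = diag(a) x[k] + B u[k], y[k] = C x[k] to r states.

    Square-root balanced truncation. Assumes |a_i| < 1 and that both
    Gramians are positive definite so the Cholesky factors exist.
    """
    denom = 1.0 - np.outer(a, a)
    P = (B @ B.T) / denom                  # controllability Gramian
    Q = (C.T @ C) / denom                  # observability Gramian
    Lc = np.linalg.cholesky(P)             # P = Lc @ Lc.T
    Lo = np.linalg.cholesky(Q)             # Q = Lo @ Lo.T
    U, s, Vt = np.linalg.svd(Lo.T @ Lc)    # s holds the Hankel singular values
    scale = 1.0 / np.sqrt(s[:r])
    T = (Lc @ Vt.T[:, :r]) * scale                 # n x r, maps reduced -> full
    Tinv = (scale[:, None] * U[:, :r].T) @ Lo.T    # r x n, maps full -> reduced
    return Tinv @ np.diag(a) @ T, Tinv @ B, C @ T, s

def impulse_response(A, B, C, steps):
    """Output samples C A^k B for k = 0 .. steps-1 (single input, single output)."""
    x = B.copy()
    out = []
    for _ in range(steps):
        out.append((C @ x).item())
        x = A @ x
    return np.array(out)

# Toy 4-state system: two states are only weakly coupled to input and output.
a = np.array([0.9, 0.7, 0.5, 0.3])
B = np.array([[1.0], [1.0], [0.01], [0.01]])
C = B.T.copy()

A_r, B_r, C_r, s = balanced_truncation_diag(a, B, C, r=2)
full = impulse_response(np.diag(a), B, C, 50)
small = impulse_response(A_r, B_r, C_r, 50)
err = np.max(np.abs(full - small))
print("Hankel singular values:", s)
print("max impulse-response error:", err)
```

The two-state model reproduces the four-state model's impulse response closely because the discarded states carry very small Hankel singular values. For classical balanced truncation, the worst-case frequency-response error is bounded by twice the sum of the discarded values, which is what makes them a principled pruning criterion.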
Results
On the Mamba architecture (a state-space model alternative to transformers), CompreSSM achieved:
- Approximately 4x training speedup
- Compression from 128 dimensions to roughly 12 dimensions while maintaining functional equivalence
- On image classification tasks: near-equivalent accuracy at 1.5x faster training
- 40x faster than Hankel nuclear norm regularization, the previous best approach
Lead author Makram Chahine and co-author Daniela Rus (director of MIT CSAIL) published the findings on April 9, 2026.
Why It Matters
The cost of training frontier AI models has risen sharply, with the largest models now requiring compute budgets measured in hundreds of millions of dollars. Any technique that shortens training translates directly into cost savings, and CompreSSM's roughly 4x speedup, if it generalizes to larger models and different architectures, could meaningfully change the economics of AI development.
The approach is also significant because it applies to state-space models like Mamba, which are emerging as potential alternatives to the transformer architecture that dominates current AI systems. As the field explores post-transformer architectures, efficient training techniques specific to these new designs become increasingly valuable.
Limitations and Next Steps
CompreSSM has been demonstrated primarily on state-space models, and its applicability to transformer-based architectures, which power the majority of current frontier models, remains to be shown. The technique's effectiveness at the scale of frontier models (hundreds of billions of parameters) is also untested, though the underlying mathematics is not tied to any one architecture.
The research is part of a broader effort at MIT and its collaborators to make AI development more computationally efficient, an increasingly important goal as the gap between the compute haves and have-nots continues to widen.


