MIT's CompreSSM Uses Control Theory to Compress AI Models During Training, Achieving 4x Speedups

Researchers at MIT CSAIL, in collaboration with the Max Planck Institute, ETH Zurich, and Liquid AI, have developed a technique called CompreSSM that can compress AI models during training rather than after — a shift that could significantly reduce the time and compute required to produce efficient models.

How It Works

CompreSSM borrows a concept from control theory called Hankel singular values. In control systems engineering, these values measure how important each component of a dynamic system is to the overall input-output behavior. The MIT team applied this mathematical framework to AI models, using Hankel singular values to identify which components of a model are contributing meaningfully to its outputs — and which can be safely removed.
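To make the idea concrete, here is a minimal sketch of how Hankel singular values can be computed for a small linear state-space system. It uses the textbook definition (square roots of the eigenvalues of the product of the controllability and observability Gramians) for a discrete-time system x[k+1] = A x[k] + B u[k], y[k] = C x[k]. The toy matrices and the helper names `gramian` and `hankel_singular_values` are illustrative assumptions, not taken from the CompreSSM paper or code.

```python
import numpy as np

def gramian(A, M):
    """Solve X = A X A^T + M for a stable discrete-time A (spectral radius < 1)."""
    n = A.shape[0]
    # With row-major flattening, vec(A X A^T) = (A kron A) vec(X).
    lhs = np.eye(n * n) - np.kron(A, A)
    return np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)

def hankel_singular_values(A, B, C):
    """Return the Hankel singular values, largest first."""
    P = gramian(A, B @ B.T)      # controllability Gramian
    Q = gramian(A.T, C.T @ C)    # observability Gramian
    eig = np.linalg.eigvals(P @ Q)
    # Eigenvalues are real and non-negative in exact arithmetic; clip round-off.
    return np.sort(np.sqrt(np.clip(eig.real, 0.0, None)))[::-1]

# Hypothetical 3-state toy system (values chosen only for illustration).
A = np.diag([0.9, 0.5, 0.1])
B = np.array([[1.0], [1.0], [1.0]])
C = np.array([[1.0, 0.5, 0.1]])

hsv = hankel_singular_values(A, B, C)
print("Hankel singular values:", hsv)
```

States associated with small values contribute little to the input-output map and are the natural candidates for removal. The Kronecker-product solve is only practical for small systems; for larger ones a dedicated Lyapunov solver such as `scipy.linalg.solve_discrete_lyapunov` would be the usual choice.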

The key innovation is that this analysis happens during training, not after. Traditional model compression techniques train a full-sized model first and then prune or distill it into a smaller version. CompreSSM integrates compression into the training process itself, which means smaller, faster models can be produced without first investing the full compute budget of a large model.
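The article does not spell out the exact procedure applied at each compression step. As a rough illustration of what removing low-importance state dimensions can look like, the sketch below applies classical balanced truncation, a standard control-theory reduction method, to a small discrete-time linear system. It is not presented as the CompreSSM algorithm; the function names, the 8-state toy system, and the target size `r = 3` are all assumptions made for the example.

```python
import numpy as np

def gramian(A, M):
    """Solve X = A X A^T + M for a stable discrete-time A."""
    n = A.shape[0]
    lhs = np.eye(n * n) - np.kron(A, A)
    X = np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)  # symmetrize against round-off

def psd_factor(X):
    """Return L with L @ L.T approximately equal to the PSD matrix X."""
    w, V = np.linalg.eigh(X)
    return V * np.sqrt(np.clip(w, 0.0, None))

def balanced_truncation(A, B, C, r):
    """Square-root balanced truncation: keep the r states with the largest HSVs."""
    Lc = psd_factor(gramian(A, B @ B.T))      # controllability factor
    Lo = psd_factor(gramian(A.T, C.T @ C))    # observability factor
    U, s, Vt = np.linalg.svd(Lo.T @ Lc)       # s holds the Hankel singular values
    scale = 1.0 / np.sqrt(s[:r])
    T = (Lc @ Vt[:r].T) * scale               # n x r right projection
    W = (Lo @ U[:, :r]) * scale               # n x r left projection, W.T @ T = I
    return W.T @ A @ T, W.T @ B, C @ T, s

def impulse_response(A, B, C, steps):
    """Single-input, single-output impulse response C A^k B for k = 0..steps-1."""
    x, out = B.copy(), []
    for _ in range(steps):
        out.append((C @ x).item())
        x = A @ x
    return np.array(out)

# Hypothetical 8-state toy system reduced to 3 states.
n, r = 8, 3
A = np.diag(np.linspace(0.1, 0.9, n))
B = np.ones((n, 1))
C = np.linspace(1.0, 0.2, n).reshape(1, n)

Ar, Br, Cr, s = balanced_truncation(A, B, C, r)
h_full = impulse_response(A, B, C, 200)
h_red = impulse_response(Ar, Br, Cr, 200)
print("kept HSVs:", s[:r])
print("max impulse-response error:", np.max(np.abs(h_full - h_red)))
print("2 * sum of discarded HSVs:", 2.0 * s[r:].sum())
```

The last two printed numbers can be compared directly: for balanced truncation, the worst-case error of the reduced system is bounded by twice the sum of the discarded Hankel singular values, which is what makes those values a principled guide to what can be removed.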

Results

On the Mamba architecture (a state-space model alternative to transformers), CompreSSM achieved:

Approximately 4x training speedup
Compression from 128 dimensions to roughly 12 dimensions while maintaining functional equivalence
On image classification tasks: near-equivalent accuracy at 1.5x faster training
40x faster than Hankel nuclear norm regularization, the previous best approach
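A reduction from 128 to roughly 12 dimensions is about a 10.7x cut in state size (128 / 12). The article does not say how the target size is chosen. One simple possibility, shown here purely as an assumption, is to keep the smallest number of states whose Hankel singular values account for a chosen fraction of the total. The function name `choose_rank`, the threshold, and the synthetic spectrum below are hypothetical and are not results from the paper.

```python
import numpy as np

def choose_rank(hsv, keep_fraction):
    """Smallest r whose top-r Hankel singular values reach keep_fraction of the sum."""
    hsv = np.sort(np.asarray(hsv, dtype=float))[::-1]
    cumulative = np.cumsum(hsv) / hsv.sum()
    # searchsorted finds the first index where cumulative >= keep_fraction.
    r = int(np.searchsorted(cumulative, keep_fraction)) + 1
    return min(r, hsv.size)  # guard against round-off when keep_fraction is 1.0

# Synthetic, made-up spectrum with fast decay (not measured from any model).
synthetic_hsv = 0.6 ** np.arange(128)
rank = choose_rank(synthetic_hsv, keep_fraction=0.99)
print("states kept:", rank, "of", synthetic_hsv.size)
```

When the spectrum decays quickly, as in this synthetic case, a small number of states carries almost all of the total, which is the situation in which aggressive reduction costs little accuracy.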

Lead author Makram Chahine and co-author Daniela Rus (director of MIT CSAIL) published the findings on April 9, 2026.

Why It Matters

The cost of training frontier AI models has been rising exponentially, with the largest models now requiring compute budgets measured in hundreds of millions of dollars. Any technique that reduces training time translates directly into cost savings — and CompreSSM's 4x speedup, if it generalizes to larger models and different architectures, could meaningfully change the economics of AI development.

The approach is also significant because it applies to state-space models like Mamba, which are emerging as potential alternatives to the transformer architecture that dominates current AI systems. As the field explores post-transformer architectures, efficient training techniques specific to these new designs become increasingly valuable.

Limitations and Next Steps

CompreSSM has been demonstrated primarily on state-space models, and its applicability to transformer-based architectures, which power the majority of current frontier models, remains to be shown. The technique's effectiveness at the scale of frontier models (hundreds of billions of parameters) is also untested, though the underlying mathematics applies in principle to any model component that can be written as a linear state-space system.

The research is part of a broader effort by MIT and its collaborators to make AI development more computationally efficient, an increasingly important goal as the gap between the compute haves and have-nots continues to widen.
