Lagrange Engineering Update: August 2025
In August, Lagrange’s engineering and research teams delivered a series of critical refactorings, optimizations, and new capabilities that substantially advanced DeepProve’s performance and portability. Most notably, we proved full-sequence (1024-token) GPT-2 inference on the same benchmark hardware used in prior 10-token runs, demonstrating the scalability of our system and positioning DeepProve as the clear performance leader in verifiable machine learning.
Full-Sequence GPT-2 Proofs (1024 tokens)
This month, DeepProve successfully proved full-sequence GPT-2 inference at 1024 tokens. Importantly, this was achieved on the same benchmark machine previously used for 10-token proofs, showcasing the scalability of our proving pipeline.
One of GPT-2’s strengths in the DeepProve context is that inference can be batched into a single proof regardless of the number of forward passes performed. This “one-shot” proving design makes longer sequences more efficient on a per-token basis: the longer the sequence, the higher the per-token throughput.
- 10 tokens: 0.02 tokens/sec
- 1024 tokens: 0.5 tokens/sec (25× improvement)
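As a rough illustration of what the figures above imply, the snippet below converts the published per-token throughputs back into approximate wall-clock time per one-shot proof. The proof times are back-of-envelope values derived from the stated throughputs, not separately measured results.

```rust
fn main() {
    // (tokens in the sequence, published throughput in tokens/sec)
    let runs = [(10_u32, 0.02_f64), (1024_u32, 0.5_f64)];
    for (tokens, throughput) in runs {
        // One proof covers the whole sequence, so the implied wall-clock
        // time per proof is simply tokens / throughput.
        let implied_proof_time_s = tokens as f64 / throughput;
        println!("{tokens:>4} tokens at {throughput} tok/s -> ~{implied_proof_time_s:.0} s per proof");
    }
    // Per-token throughput gain between the two runs: 0.5 / 0.02.
    println!("per-token throughput gain: ~{:.0}x", 0.5_f64 / 0.02);
}
```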
For comparison, zkTorch recently announced support for GPT-style models but reported performance only at very short sequence lengths (1–2 tokens). Based on their published numbers, throughput is ~0.001 tokens/sec, placing DeepProve at up to 500× faster on a per-token basis.
Upgrade to Latest scroll/ceno
We completed a major refactor to rebase DeepProve onto the latest published version of scroll/ceno. This introduced breaking changes across the polynomial, sumcheck, and PCS APIs, requiring us to restructure internal components. These changes improved both proving speed and memory consumption, while setting the foundation for more scalable commitment strategies.
- Sumcheck API – Expressions can now be constructed symbolically as algebraic objects, with data injected at proving time (see the sketch below).
- PCS API – A new commitment/opening interface simplified integration and improved efficiency.
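To make the sumcheck change concrete, here is a minimal sketch of the symbolic-expression pattern it enables: the expression is built as an algebraic object up front, and witness values are bound only when the prover runs. The types and names are hypothetical and are not the actual ceno API.

```rust
use std::collections::HashMap;

// Hypothetical expression type: a symbolic algebraic object with no data in it.
enum Expr {
    Var(&'static str), // placeholder resolved at proving time
    Const(u64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Evaluate the expression once witness data is available.
    fn eval(&self, witness: &HashMap<&str, u64>) -> u64 {
        match self {
            Expr::Var(name) => witness[name],
            Expr::Const(c) => *c,
            Expr::Add(a, b) => a.eval(witness).wrapping_add(b.eval(witness)),
            Expr::Mul(a, b) => a.eval(witness).wrapping_mul(b.eval(witness)),
        }
    }
}

fn main() {
    // Built symbolically at construction time: out = in * weight + bias.
    let expr = Expr::Add(
        Box::new(Expr::Mul(Box::new(Expr::Var("in")), Box::new(Expr::Var("weight")))),
        Box::new(Expr::Var("bias")),
    );

    // Concrete data is injected only at proving time.
    let witness = HashMap::from([("in", 3_u64), ("weight", 5_u64), ("bias", 7_u64)]);
    assert_eq!(expr.eval(&witness), 22);
}
```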
Optimized Commitment Structure
Previously, DeepProve committed individually to each polynomial per layer (input, output, chunk tables, etc.), producing multiple Merkle trees per layer – a sequential and memory-heavy process.
With the latest Basefold API, we can now perform a single commitment per layer, regardless of the number of polynomials (sketched below). This change eliminates one of the largest bottlenecks in our proving pipeline.
- Proving time: ~2× reduction
- Memory use: ~10× reduction (fewer Merkle trees kept in memory)
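The following is a schematic illustration of the shape of this change, with a plain hash standing in for a real Merkle/PCS commitment; it is not the Basefold API, only the per-polynomial versus per-layer contrast.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a real Merkle/PCS commitment: here just a hash of the data.
fn commit(data: &[u64]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

fn main() {
    // A layer's polynomials: input, output, lookup/chunk tables, ...
    let layer_polys: Vec<Vec<u64>> = vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]];

    // Old approach: one commitment (one Merkle tree) per polynomial,
    // built sequentially and all retained in memory.
    let per_poly: Vec<u64> = layer_polys.iter().map(|p| commit(p)).collect();
    println!("{} commitments for one layer", per_poly.len());

    // New approach: combine the layer's polynomials and commit once,
    // so only a single tree per layer is built and kept around.
    let combined: Vec<u64> = layer_polys.concat();
    let per_layer = commit(&combined);
    println!("1 commitment for the layer: {per_layer:#x}");
}
```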
Memory Management Framework
To make DeepProve portable across devices – from embedded systems to computing clusters – we introduced a multi-tiered cache-based storage management framework. This design allows DeepProve to scale efficiently across constrained devices and distributed environments.
- Tensors are now lightweight wrappers pointing either to memory or to disk.
- Frequently used tensors are retained in memory; others are evicted to disk.
- Disk I/O overhead is negligible compared to inference and proving time on modern hardware.
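A minimal sketch of this handle-plus-eviction idea follows, using hypothetical types rather than the actual DeepProve framework: the tensor is a lightweight handle whose data lives either in memory or on disk, and cold tensors can be spilled and reloaded transparently.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

// Hypothetical storage tiers for illustration only.
enum TensorStorage {
    InMemory(Vec<f32>),
    OnDisk(PathBuf),
}

/// A tensor is a lightweight handle; its data may live in memory or on disk.
struct Tensor {
    storage: TensorStorage,
}

impl Tensor {
    /// Spill a cold tensor to disk, keeping only the path in the handle.
    fn evict(&mut self, path: PathBuf) -> io::Result<()> {
        if let TensorStorage::InMemory(data) = &self.storage {
            let bytes: Vec<u8> = data.iter().flat_map(|x| x.to_le_bytes()).collect();
            fs::write(&path, bytes)?;
            self.storage = TensorStorage::OnDisk(path);
        }
        Ok(())
    }

    /// Access the data, transparently reloading it from disk if it was evicted.
    fn load(&mut self) -> io::Result<&[f32]> {
        if let TensorStorage::OnDisk(path) = &self.storage {
            let bytes = fs::read(path)?;
            let data = bytes
                .chunks_exact(4)
                .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
                .collect();
            self.storage = TensorStorage::InMemory(data);
        }
        match &self.storage {
            TensorStorage::InMemory(data) => Ok(data.as_slice()),
            TensorStorage::OnDisk(_) => unreachable!(),
        }
    }
}

fn main() -> io::Result<()> {
    let mut t = Tensor { storage: TensorStorage::InMemory(vec![1.0, 2.0, 3.0]) };
    t.evict(PathBuf::from("tensor.bin"))?; // cold tensor spilled to disk
    assert_eq!(t.load()?, &[1.0_f32, 2.0, 3.0][..]); // reloaded on demand
    Ok(())
}
```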
GPU Inference with Burn
DeepProve requires custom inference logic to support re-quantization and special floating-point arithmetic. Previously, inference ran sequentially on CPU. We have now begun migrating inference modules to GPU via the Burn deep learning library, which provides unified CPU/GPU backends.
- ~70% of supported layers have already been ported.
- Some layers require new GPU kernels (e.g. our custom softmax, which uses LUTs and scaling factors; see the sketch below).
- Combined with cache management, this migration opens the door to large-scale distributed proving on heterogeneous hardware.
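To give a flavor of the kind of kernel the softmax bullet refers to, here is a simplified, self-contained sketch of an LUT-based softmax: exp() is replaced by a precomputed lookup table, and a scaling factor maps shifted logits to table indices. The table size and scale are assumed values, and this is neither DeepProve’s actual kernel nor Burn’s API.

```rust
const LUT_SIZE: usize = 256;
const SCALE: f32 = 16.0; // assumed scaling factor: index = round((max - x) * SCALE)

/// Precompute exp(-i / SCALE) for i in 0..LUT_SIZE.
fn build_exp_lut() -> [f32; LUT_SIZE] {
    let mut lut = [0.0; LUT_SIZE];
    for (i, e) in lut.iter_mut().enumerate() {
        *e = (-(i as f32) / SCALE).exp();
    }
    lut
}

/// Softmax where exp() is a table lookup on the shifted logits.
fn lut_softmax(logits: &[f32], lut: &[f32; LUT_SIZE]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::MIN, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&x| {
            // x - max <= 0, so (max - x) * SCALE is a non-negative index.
            let idx = ((max - x) * SCALE).round() as usize;
            lut[idx.min(LUT_SIZE - 1)]
        })
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    let lut = build_exp_lut();
    let probs = lut_softmax(&[1.0, 2.0, 3.0], &lut);
    println!("{probs:?}"); // close to a float softmax, roughly [0.09, 0.24, 0.66]
    assert!((probs.iter().sum::<f32>() - 1.0).abs() < 1e-6);
}
```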
Current Focus
- GPU Porting: Continue rewriting inference layers in Burn with custom kernels as needed.
- Computational Graph Prover: Transform prover logic into a graph representation to enable parallel and distributed execution.
- Accuracy Measurement: Develop robust evaluation for LLM accuracy under DeepProve’s requantization techniques, targeting parity with PyTorch benchmarks.
With full-sequence GPT-2 proving now in production, DeepProve has demonstrated both scalability and performance leadership in verifiable AI. By coupling upstream advances (Scroll’s Ceno), internal refactors, and GPU acceleration efforts, we are pushing toward a framework capable of proving inference for state-of-the-art LLMs at practical speeds across diverse hardware environments.