Lagrange Engineering Update: August 2025
In August, Lagrange’s engineering and research teams delivered a series of critical refactorings, optimizations, and new capabilities that substantially advanced DeepProve’s performance and portability. Most notably, we proved full-sequence (1024-token) GPT-2 inference on the same benchmark hardware used in prior 10-token runs, demonstrating the scalability of our system and positioning DeepProve as the clear performance leader in verifiable machine learning.
Full-Sequence GPT-2 Proofs (1024 tokens)
This month, DeepProve successfully proved full-sequence GPT-2 inference at 1024 tokens. Importantly, this was achieved on the same benchmark machine previously used for 10-token proofs, showcasing the scalability of our proving pipeline.
One of GPT-2’s strengths in the DeepProve context is that inference can be batched into a single proof regardless of the number of forward passes performed. This “one-shot” proving design makes longer sequences more efficient on a per-token basis: the longer the sequence, the higher the per-token throughput.
- 10 tokens: 0.02 tokens/sec
- 1024 tokens: 0.5 tokens/sec (25× improvement)
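As a rough illustration of what the figures above imply, the snippet below converts the published per-token throughputs back into approximate wall-clock time per one-shot proof. The proof times are back-of-envelope values derived from the stated throughputs, not separately measured results.

```rust
fn main() {
    // (tokens in the sequence, published throughput in tokens/sec)
    let runs = [(10_u32, 0.02_f64), (1024_u32, 0.5_f64)];
    for (tokens, throughput) in runs {
        // One proof covers the whole sequence, so the implied wall-clock
        // time per proof is simply tokens / throughput.
        let implied_proof_time_s = tokens as f64 / throughput;
        println!("{tokens:>4} tokens at {throughput} tok/s -> ~{implied_proof_time_s:.0} s per proof");
    }
    // Per-token throughput gain between the two runs: 0.5 / 0.02.
    println!("per-token throughput gain: ~{:.0}x", 0.5_f64 / 0.02);
}
```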
For comparison, zkTorch recently announced support for GPT-style models but reported performance only at very short sequence lengths (1–2 tokens). Based on their published numbers, throughput is ~0.001 tokens/sec, placing DeepProve at up to 500× faster on a per-token basis.
Upgrade to Latest scroll/ceno
We completed a major refactor to rebase DeepProve onto the latest published version of scroll/ceno. This introduced breaking changes across the polynomial, sumcheck, and PCS APIs, requiring us to restructure internal components. These changes improved both proving speed and memory consumption, while setting the foundation for more scalable commitment strategies.
- Sumcheck API – Expressions can now be constructed symbolically as algebraic objects, with data injected at proving time (see the sketch below).
- PCS API – A new commitment/opening interface simplified integration and improved efficiency.
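To make the sumcheck change concrete, here is a minimal sketch of the symbolic-expression pattern it enables: the expression is built as an algebraic object up front, and witness values are bound only when the prover runs. The types and names are hypothetical and are not the actual ceno API.

```rust
use std::collections::HashMap;

// Hypothetical expression type: a symbolic algebraic object with no data in it.
enum Expr {
    Var(&'static str), // placeholder resolved at proving time
    Const(u64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Evaluate the expression once witness data is available.
    fn eval(&self, witness: &HashMap<&str, u64>) -> u64 {
        match self {
            Expr::Var(name) => witness[name],
            Expr::Const(c) => *c,
            Expr::Add(a, b) => a.eval(witness).wrapping_add(b.eval(witness)),
            Expr::Mul(a, b) => a.eval(witness).wrapping_mul(b.eval(witness)),
        }
    }
}

fn main() {
    // Built symbolically at construction time: out = in * weight + bias.
    let expr = Expr::Add(
        Box::new(Expr::Mul(Box::new(Expr::Var("in")), Box::new(Expr::Var("weight")))),
        Box::new(Expr::Var("bias")),
    );

    // Concrete data is injected only at proving time.
    let witness = HashMap::from([("in", 3_u64), ("weight", 5_u64), ("bias", 7_u64)]);
    assert_eq!(expr.eval(&witness), 22);
}
```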
Optimized Commitment Structure
Previously, DeepProve committed individually to each polynomial per layer (input, output, chunk tables, etc.), producing multiple Merkle trees per layer – a sequential and memory-heavy process.
With the latest Basefold API, we can now perform a single commitment per layer, regardless of the number of polynomials (sketched below). This change eliminates one of the largest bottlenecks in our proving pipeline.
- Proving time: ~2× reduction
- Memory use: ~10× reduction (fewer Merkle trees kept in memory)
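The following is a schematic illustration of the shape of this change, with a plain hash standing in for a real Merkle/PCS commitment; it is not the Basefold API, only the per-polynomial versus per-layer contrast.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a real Merkle/PCS commitment: here just a hash of the data.
fn commit(data: &[u64]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

fn main() {
    // A layer's polynomials: input, output, lookup/chunk tables, ...
    let layer_polys: Vec<Vec<u64>> = vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]];

    // Old approach: one commitment (one Merkle tree) per polynomial,
    // built sequentially and all retained in memory.
    let per_poly: Vec<u64> = layer_polys.iter().map(|p| commit(p)).collect();
    println!("{} commitments for one layer", per_poly.len());

    // New approach: combine the layer's polynomials and commit once,
    // so only a single tree per layer is built and kept around.
    let combined: Vec<u64> = layer_polys.concat();
    let per_layer = commit(&combined);
    println!("1 commitment for the layer: {per_layer:#x}");
}
```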
Memory Management Framework
To make DeepProve portable across devices – from embedded systems to computing clusters – we introduced a multi-tiered cache-based storage management framework. This design allows DeepProve to scale efficiently across constrained devices and distributed environments.
- Tensors are now lightweight wrappers pointing either to memory or to disk.
- Frequently used tensors are retained in memory; others are evicted to disk.
- Disk I/O overhead is negligible compared to inference and proving time on modern hardware.
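A minimal sketch of this handle-plus-eviction idea follows, using hypothetical types rather than the actual DeepProve framework: the tensor is a lightweight handle whose data lives either in memory or on disk, and cold tensors can be spilled and reloaded transparently.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

// Hypothetical storage tiers for illustration only.
enum TensorStorage {
    InMemory(Vec<f32>),
    OnDisk(PathBuf),
}

/// A tensor is a lightweight handle; its data may live in memory or on disk.
struct Tensor {
    storage: TensorStorage,
}

impl Tensor {
    /// Spill a cold tensor to disk, keeping only the path in the handle.
    fn evict(&mut self, path: PathBuf) -> io::Result<()> {
        if let TensorStorage::InMemory(data) = &self.storage {
            let bytes: Vec<u8> = data.iter().flat_map(|x| x.to_le_bytes()).collect();
            fs::write(&path, bytes)?;
            self.storage = TensorStorage::OnDisk(path);
        }
        Ok(())
    }

    /// Access the data, transparently reloading it from disk if it was evicted.
    fn load(&mut self) -> io::Result<&[f32]> {
        if let TensorStorage::OnDisk(path) = &self.storage {
            let bytes = fs::read(path)?;
            let data = bytes
                .chunks_exact(4)
                .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
                .collect();
            self.storage = TensorStorage::InMemory(data);
        }
        match &self.storage {
            TensorStorage::InMemory(data) => Ok(data.as_slice()),
            TensorStorage::OnDisk(_) => unreachable!(),
        }
    }
}

fn main() -> io::Result<()> {
    let mut t = Tensor { storage: TensorStorage::InMemory(vec![1.0, 2.0, 3.0]) };
    t.evict(PathBuf::from("tensor.bin"))?; // cold tensor spilled to disk
    assert_eq!(t.load()?, &[1.0_f32, 2.0, 3.0][..]); // reloaded on demand
    Ok(())
}
```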
GPU Inference with Burn
DeepProve requires custom inference logic to support re-quantization and special floating-point arithmetic. Previously, inference ran sequentially on CPU. We have now begun migrating inference modules to GPU via the Burn deep learning library, which provides unified CPU/GPU backends.
- ~70% of supported layers have already been ported.
- Some layers require new GPU kernels (e.g. our custom softmax, which uses LUTs and scaling factors; see the sketch below).
- Combined with cache management, this migration opens the door to large-scale distributed proving on heterogeneous hardware.
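To give a flavor of the kind of kernel the softmax bullet refers to, here is a simplified, self-contained sketch of an LUT-based softmax: exp() is replaced by a precomputed lookup table, and a scaling factor maps shifted logits to table indices. The table size and scale are assumed values, and this is neither DeepProve’s actual kernel nor Burn’s API.

```rust
const LUT_SIZE: usize = 256;
const SCALE: f32 = 16.0; // assumed scaling factor: index = round((max - x) * SCALE)

/// Precompute exp(-i / SCALE) for i in 0..LUT_SIZE.
fn build_exp_lut() -> [f32; LUT_SIZE] {
    let mut lut = [0.0; LUT_SIZE];
    for (i, e) in lut.iter_mut().enumerate() {
        *e = (-(i as f32) / SCALE).exp();
    }
    lut
}

/// Softmax where exp() is a table lookup on the shifted logits.
fn lut_softmax(logits: &[f32], lut: &[f32; LUT_SIZE]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::MIN, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&x| {
            // x - max <= 0, so (max - x) * SCALE is a non-negative index.
            let idx = ((max - x) * SCALE).round() as usize;
            lut[idx.min(LUT_SIZE - 1)]
        })
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    let lut = build_exp_lut();
    let probs = lut_softmax(&[1.0, 2.0, 3.0], &lut);
    println!("{probs:?}"); // close to a float softmax, roughly [0.09, 0.24, 0.66]
    assert!((probs.iter().sum::<f32>() - 1.0).abs() < 1e-6);
}
```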
Current Focus
- GPU Porting: Continue rewriting inference layers in Burn with custom kernels as needed.
- Computational Graph Prover: Transform prover logic into a graph representation to enable parallel and distributed execution.
- Accuracy Measurement: Develop robust evaluation for LLM accuracy under DeepProve’s requantization techniques, targeting parity with PyTorch benchmarks.
With full-sequence GPT-2 proving now in production, DeepProve has demonstrated both scalability and performance leadership in verifiable AI. By coupling upstream advances (Scroll’s Ceno), internal refactors, and GPU acceleration efforts, we are pushing toward a framework capable of proving inference for state-of-the-art LLMs at practical speeds across diverse hardware environments.