Zero-knowledge Deep Learning
What is this?
This project focuses on the development of specialized zero-knowledge proof (ZKP) protocols for deep learning (DL). Unlike the approach of feeding neural networks into off-the-shelf, general-purpose ZKP backends, our method:
- Preserves tensor structures: Preserving tensor structure allows the proof generation process to be parallelized, which is essential for compressing the overhead (especially proving time) to feasible levels.
- Develops specialized protocols: We fully exploit the mathematical properties of each tensor operation when designing its specialized proof protocol, reducing the asymptotic overhead of the proofs.
- Implements CUDA acceleration: We implement the protocols in CUDA to achieve a high degree of parallelization, enabled by the preserved tensor structures, which reduces the empirical overhead of the proofs (a toy sketch of this parallelism follows this list).
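To make the parallelism concrete, here is a minimal Python sketch of one sumcheck prover round over a flattened tensor. Everything here is a simplifying assumption: a toy 31-bit Mersenne prime instead of a cryptographically sized field, NumPy standing in for the CUDA kernels, and proving only the sum of a single tensor rather than the actual zkDL/zkLLM relations. The point is that the per-entry fold is elementwise, so it maps directly onto one GPU thread per tensor entry.

```python
import numpy as np

P = 2**31 - 1  # toy Mersenne prime; real systems use ~256-bit fields

def sumcheck_round(table: np.ndarray, r: int):
    """One prover round for proving sum(table) of a multilinear polynomial.

    `table` holds the 2^k evaluations on the boolean hypercube. Returns the
    degree-1 prover message (g(0), g(1)) and the table with the current
    variable bound to the verifier's challenge `r`.
    """
    half = table.size // 2
    lo, hi = table[:half], table[half:]      # halves for x_i = 0 and x_i = 1
    g0, g1 = int(lo.sum() % P), int(hi.sum() % P)
    # Elementwise fold: every entry is independent of the others, so in a
    # CUDA implementation this is one thread per tensor entry.
    folded = (((1 - r) % P) * lo % P + (r % P) * hi % P) % P
    return (g0, g1), folded

# Example: prove the sum of a 2x4 tensor, flattened to 2^3 = 8 field elements.
rng = np.random.default_rng(0)
table = rng.integers(0, P, size=8, dtype=np.int64)
claim = int(table.sum() % P)
for _ in range(3):                           # k = 3 rounds
    r = int(rng.integers(1, P))              # verifier's random challenge
    (g0, g1), table = sumcheck_round(table, r)
    assert (g0 + g1) % P == claim            # verifier's consistency check
    claim = (((1 - r) % P) * g0 + (r % P) * g1) % P  # next round's claim: g(r)
assert int(table[0]) == claim                # final check at the random point
```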
Achievements
We have successfully developed the first operational ZKP schemes for:
- Large Language Models (LLMs) with up to 13 billion parameters: Our system generates proofs in just 15 minutes per inference. This is made possible by zkLLM, which introduces `tlookup` for the efficient handling of non-arithmetic tensor operations and `zkAttn` for the attention mechanism. zkLLM keeps the model parameters private and enables efficient zero-knowledge verifiable computation over LLMs. (A schematic illustration of the lookup idea follows this list.)
- Training of neural networks with 10 million parameters: We achieve a proving time of just 1 minute per training update. This is accomplished through zkDL, which initially focuses on verifying the ReLU activation and its backpropagation (now superseded by zkLLM's `tlookup`) and subsequently develops FAC4DNN for modeling neural networks as arithmetic circuits.
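As rough intuition for what a lookup argument buys, the Python sketch below checks that every entry of a tensor lies in a precomputed table by comparing two sums of modular inverses at a random field point (a logarithmic-derivative style identity). This is only a schematic illustration of the general lookup idea with made-up values, not the `tlookup` protocol itself.

```python
import random
from collections import Counter

P = 2**61 - 1  # a Mersenne prime, for illustration only

def inv(a: int) -> int:
    return pow(a, P - 2, P)  # modular inverse via Fermat's little theorem

def lookup_check(S, T, x):
    """Check sum_i 1/(x - s_i) == sum_j m_j / (x - t_j) mod P,
    where m_j counts how often table entry t_j occurs in S.
    Assumes x differs from every listed value (overwhelmingly likely)."""
    counts = Counter(S)
    lhs = sum(inv((x - s) % P) for s in S) % P
    rhs = sum(counts[t] * inv((x - t) % P) for t in T) % P
    return lhs == rhs

# Example: every tensor entry must come from a small precomputed table.
T = list(range(16))                          # toy table of admissible values
S = [0, 0, 3, 7, 15, 3]                      # tensor entries claimed to be in T
x = random.randrange(1, P)                   # random evaluation point
assert lookup_check(S, T, x)                 # all entries in the table: passes
assert not lookup_check(S + [99], T, x)      # an out-of-table entry fails
```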
Related publications
- zkLLM: Efficient Zero-Knowledge Proofs for Large Language Models
- zkDL: Efficient Zero-Knowledge Proofs of Deep Learning Training
Future directions
- Deep learning under fixed-point arithmetic: To apply cryptographic primitives, fixed-point arithmetic must be used in place of floating-point arithmetic. Specialized techniques that adapt models to fixed-point arithmetic can help reduce overhead while preserving accuracy; this is closely related to, but distinct from, model quantization. (A minimal fixed-point sketch appears after this list.)
- Implementation: Before any industrialization can take place, we believe a `torch` implementation over finite fields and elliptic curves is necessary. This would lay the groundwork for applying our ZKP systems in real-world scenarios. (A hedged sketch of what such a backend might look like also appears after this list.)
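On the fixed-point direction, here is a minimal Python sketch of the standard scheme of representing a real number x as the integer round(x * 2^f). The parameter names are made up for illustration; the point is that addition needs no rescaling, multiplication needs one rescaling step, and the choice of f trades precision against the bit-width the cryptographic protocol must handle.

```python
# Minimal fixed-point sketch: reals become integers that a field-based
# protocol can handle. Hypothetical illustration; names are ours.
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    return round(x * SCALE)

def to_float(xq: int) -> float:
    return xq / SCALE

def fx_add(aq: int, bq: int) -> int:
    return aq + bq                       # same scale, no rescaling needed

def fx_mul(aq: int, bq: int) -> int:
    # The product has scale 2^(2f); shift once to return to scale 2^f.
    # The arithmetic shift floors; a real scheme would pick a rounding rule.
    return (aq * bq) >> FRAC_BITS

a, b = 1.5, -2.25
print(to_float(fx_mul(to_fixed(a), to_fixed(b))), a * b)  # -3.375 vs -3.375
```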
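And on the `torch` direction, a hedged sketch of what a finite-field tensor type might look like: field elements stored in int64 tensors and reduced modulo a prime after every operation. All names here are hypothetical (no such library exists yet), the prime is deliberately tiny so that int64 accumulation in the matrix product cannot overflow, and a real backend would need multi-word arithmetic or custom CUDA kernels for cryptographically sized fields and elliptic-curve points.

```python
import torch

P = 2**19 - 1  # tiny Mersenne prime so int64 accumulation cannot overflow

class FieldTensor:
    """Toy tensor over the prime field Z_P, stored as int64."""

    def __init__(self, data: torch.Tensor):
        self.data = data.to(torch.int64) % P

    def __add__(self, other: "FieldTensor") -> "FieldTensor":
        return FieldTensor((self.data + other.data) % P)

    def __mul__(self, other: "FieldTensor") -> "FieldTensor":
        return FieldTensor((self.data * other.data) % P)  # elementwise product

    def matmul(self, other: "FieldTensor") -> "FieldTensor":
        # Integer matmul runs on CPU; with a ~19-bit prime the accumulator
        # stays far below 2^63, so a single reduction at the end suffices.
        return FieldTensor((self.data @ other.data) % P)

a = FieldTensor(torch.randint(0, P, (4, 8)))
b = FieldTensor(torch.randint(0, P, (8, 2)))
print(a.matmul(b).data.shape)  # torch.Size([4, 2])
```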
But unfortunately, I am no longer working on this…
Yes, this project has been terminated after more than a year of my struggles. The primary reason is that I cannot complete the future directions listed above on my own or with only a few collaborators (although they have been great). Additionally, the fundamental empirical overhead introduced by the cryptographic structures, despite my efforts to eliminate the asymptotic overhead, remains a significant challenge compared with, for example, `float16` arithmetic. Overcoming this would require revolutionary advances in cryptography. My expertise lies more in machine learning, so this is likely not my responsibility.