CryptoDB
Or Zamir
Publications
Year: 2024
Venue: RWC
Title: Watermarks for Language Models: a Cryptographic Perspective
Abstract
Recent progress in large language models (LLMs) has led to demand for measures to detect AI-generated text, as evidenced by Biden's recent executive order and by pledges from several major companies to embed watermarks in the outputs of their models. A promising and popular approach to detecting AI-generated content is watermarking, in which a hidden signal is embedded in the LLM's output.
Intuitively, the desirable properties of an LLM watermark are clear: it should not hurt the quality of the model's output, and human-generated text should not be falsely flagged as watermarked.
However, these properties are challenging to define formally because of idiosyncrasies in human text and the lack of a clear measure of text quality, especially when LLMs have a wide variety of downstream applications. In [CGZ23], we show how cryptography can be leveraged to formally capture these properties of quality preservation and absence of false positives, which we call undetectability and soundness. Undetectability requires that no efficient algorithm can distinguish the original LLM from the watermarked LLM. Soundness requires that any fixed text is detected as watermarked with only negligible probability. [CGZ23] constructs a fairly simple watermarking scheme that achieves both properties.
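One way these two notions are commonly formalized is sketched below in standard cryptographic notation; the symbols M (the original model), W_k (the watermarked model keyed by k), Detect_k, and the security parameter lambda are our placeholders for illustration, not taken verbatim from [CGZ23].

```latex
% Undetectability: no probabilistic polynomial-time distinguisher D,
% given oracle access to either the original model M or the watermarked
% model W_k (over a uniformly random key k), has noticeable advantage:
\left|\Pr_{k}\!\left[D^{\mathsf{W}_k}(1^{\lambda})=1\right]
      -\Pr\!\left[D^{\mathsf{M}}(1^{\lambda})=1\right]\right|
  \le \mathrm{negl}(\lambda)

% Soundness: every fixed text t, chosen independently of the key,
% is flagged by the detector with only negligible probability:
\Pr_{k}\!\left[\mathsf{Detect}_k(t)=\mathsf{true}\right] \le \mathrm{negl}(\lambda)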
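To make the general recipe concrete, here is a minimal toy sketch in Python of a PRF-based watermark over a binary token alphabet, in the spirit of the [CGZ23] construction but heavily simplified: the function names, the fixed context window, and the detection threshold are illustrative assumptions, and the real scheme additionally handles low-entropy text by accumulating empirical entropy before embedding.

```python
import hmac, hashlib, math

def prf_u(key: bytes, context: tuple[int, ...]) -> float:
    """Map (key, recent context) to a pseudorandom u in (0, 1).
    HMAC-SHA256 stands in for the PRF assumed by the scheme."""
    msg = ",".join(map(str, context)).encode()
    d = hmac.new(key, msg, hashlib.sha256).digest()
    return (int.from_bytes(d[:8], "big") + 0.5) / 2**64

def sample_bit(p1: float, key: bytes, context: tuple[int, ...]) -> int:
    """Emit the next binary token: 1 with model probability p1, using the
    PRF output as the sampling coin. Since u is computationally
    indistinguishable from uniform, the generated distribution matches the
    model's, which is the intuition behind undetectability."""
    return 1 if prf_u(key, context) < p1 else 0

def detect(bits: list[int], key: bytes, window: int = 4) -> bool:
    """Score each position with v = u if the bit is 1, else 1 - u.
    On key-independent text each -ln(v) is Exp(1) with mean 1, so a
    threshold well above n fires with negligible probability (soundness);
    on watermarked text the per-position mean is larger."""
    n, score = 0, 0.0
    for i in range(window, len(bits)):
        u = prf_u(key, tuple(bits[i - window:i]))
        v = u if bits[i] == 1 else 1.0 - u
        score += -math.log(v)
        n += 1
    # Loose Chernoff-style threshold; the constant is chosen for the toy.
    return n > 0 and score > n + 4.0 * math.sqrt(n)
```

Note that the detector needs only the key and the text, not the model's probabilities, which is what lets anyone holding the key run detection.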
In this talk, we begin with background on the policy discussion and media coverage surrounding the detection of AI-generated text. We then present our work in [CGZ23], covering in particular the model, the definitions, and the scheme. We conclude by discussing directions for future work, emphasizing interesting cryptographic questions.
Coauthors
- Miranda Christ
- Sam Gunn
- Or Zamir