SynthID Text
Watermarking LLM-generated text with SynthID Text
Check out the SynthID Text paper in Nature for the complete technical details of this algorithm, and Google’s Responsible GenAI Toolkit for more on how to apply SynthID Text in your products.
The primary goal of SynthID Text is to encode a watermark into AI-generated text
in a way that helps you determine if text was generated from your LLM without
affecting how the underlying LLM works or negatively impacting generation
quality. Google DeepMind has developed a watermarking technique that uses a
pseudo-random function, called a g-function, to augment the generation process
of any LLM such that the watermark is imperceptible to humans but is visible to
a trained model. This has been implemented as a
generation utility
that is compatible with any LLM without modification using the
model.generate() API, along with an
end-to-end example
of how to train detectors to recognize watermarked text. See the
research paper for
complete details of the SynthID Text algorithm.
Watermarks are configured using a dataclass that parameterizes the g-function and how it is applied in the tournament sampling process. Each model you use should have its own watermarking configuration, and that configuration should be stored securely and privately; otherwise, others may be able to replicate your watermark.
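As a rough illustration of how a g-function and tournament sampling fit together, here is a toy sketch. It is not the Transformers implementation: `g_value`, `tournament_sample`, the SHA-256-based scoring, and the three-key setup are all assumptions made for illustration. Each key drives one knockout round in which the candidate token with the higher pseudo-random score advances.

```python
import hashlib
import random


def g_value(token, context, key):
    """Toy g-function: a pseudo-random bit derived from (context, token, key).

    Hypothetical stand-in; the real g-function is defined by the library.
    """
    payload = repr((tuple(context), token, key)).encode()
    return hashlib.sha256(payload).digest()[0] & 1


def tournament_sample(probs, context, keys, rng):
    """Pick the next token with one knockout round per key.

    probs maps token id -> probability under the unwatermarked model.
    """
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    # Draw 2**len(keys) candidates from the model's own distribution.
    candidates = rng.choices(tokens, weights=weights, k=2 ** len(keys))
    for key in keys:
        winners = []
        # Pair up neighbours; the higher g-value wins, ties are broken randomly.
        for a, b in zip(candidates[0::2], candidates[1::2]):
            ga, gb = g_value(a, context, key), g_value(b, context, key)
            if ga == gb:
                winners.append(rng.choice((a, b)))
            else:
                winners.append(a if ga > gb else b)
        candidates = winners
    return candidates[0]


rng = random.Random(0)
next_token_probs = {0: 0.5, 1: 0.3, 2: 0.2}  # made-up distribution
token = tournament_sample(next_token_probs, context=(7, 8, 9, 10), keys=[11, 22, 33], rng=rng)
```

Because every candidate is drawn from the model's own distribution, the chosen token is always one the model could have produced, while tokens with high g-values are favored. The number of candidates doubles with each key, so this toy version is only practical with a handful of keys.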
You must define two parameters in every watermarking configuration:
The keys parameter is a list of integers that are used to compute g-function
scores across the model's vocabulary. Using 20 to 30 unique, randomly
generated numbers is recommended to balance detectability against generation
quality.
The ngram_len parameter is used to balance robustness and detectability; the
larger the value the more detectable the watermark will be, at the cost of
being more brittle to changes. A good default value is 5, but it needs to be
at least 2.
You can further configure the watermark based on your performance needs. See the
SynthIDTextWatermarkingConfig class
for more information.
The research paper includes additional analyses of how specific configuration values affect watermark performance.
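One possible way to prepare the two required parameters is sketched below. The helper name `make_watermark_settings`, the default of 25 keys, and the upper bound on key values are assumptions for illustration, not values prescribed by the library; the only constraints taken from the text above are that the keys are unique and randomly generated and that `ngram_len` is at least 2.

```python
import secrets
from typing import Dict, List, Union


def make_watermark_settings(
    num_keys: int = 25, ngram_len: int = 5, upper: int = 2**31
) -> Dict[str, Union[List[int], int]]:
    """Build a dict holding the two required watermarking parameters.

    num_keys and upper are illustrative choices, not library defaults.
    """
    if ngram_len < 2:
        raise ValueError("ngram_len needs to be at least 2")
    # SystemRandom draws from the OS entropy source; sample() guarantees
    # that the returned keys are unique.
    keys = secrets.SystemRandom().sample(range(1, upper), num_keys)
    return {"keys": keys, "ngram_len": ngram_len}


settings = make_watermark_settings()
```

The resulting dict could then be unpacked into the configuration class, for example `SynthIDTextWatermarkingConfig(**settings)`, and stored privately alongside the model it belongs to.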
Applying a watermark is a straightforward change to your existing generation
calls. Once you define your configuration, pass a
SynthIDTextWatermarkingConfig object as the watermarking_config= parameter
to model.generate() and all generated text will carry the watermark. Check out
the SynthID Text Space for
an interactive example of watermark application, and see if you can tell which text is watermarked.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)
# Standard model and tokenizer initialization
tokenizer = AutoTokenizer.from_pretrained('repo/id')
model = AutoModelForCausalLM.from_pretrained('repo/id')
# SynthID Text configuration
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, ...],
    ngram_len=5,
)
# Generation with watermarking
# return_tensors="pt" gives model.generate() the PyTorch tensors it expects
tokenized_prompts = tokenizer(["your prompts here"], return_tensors="pt")
output_sequences = model.generate(
    **tokenized_prompts,
    watermarking_config=watermarking_config,
    do_sample=True,
)
watermarked_text = tokenizer.batch_decode(output_sequences)
Watermarks are designed to be detectable by a trained classifier but imperceptible to humans. Every watermarking configuration you use with your models needs to have a detector trained to recognize the mark.
The basic detector training process is:
A Bayesian detector class is provided in Transformers, along with an end-to-end example of how to train a detector to recognize watermarked text using a specific watermarking configuration. Models that use the same tokenizer can also share a watermarking configuration and detector, and thus a common watermark, as long as the detector's training set includes examples from all models that share the watermark.
This trained detector can be uploaded to a private HF Hub to make it accessible across your organization. Google’s Responsible GenAI Toolkit has more on how to productionize SynthID Text in your products.
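If your detector returns a probability-like score rather than a yes/no answer, you need a decision threshold. A minimal sketch of one common approach follows: pick the threshold from scores measured on held-out unwatermarked text so that the false positive rate stays at or below a target. The function names and the example scores are hypothetical, and the right target rate depends on your application.

```python
def threshold_for_fpr(unwatermarked_scores, target_fpr):
    """Return a threshold t such that at most target_fpr of the given
    unwatermarked scores are strictly greater than t."""
    if not unwatermarked_scores:
        raise ValueError("need at least one score")
    if not 0 <= target_fpr < 1:
        raise ValueError("target_fpr must be in [0, 1)")
    ranked = sorted(unwatermarked_scores, reverse=True)
    # Number of unwatermarked texts we tolerate being flagged.
    allowed = min(int(target_fpr * len(ranked)), len(ranked) - 1)
    return ranked[allowed]


def is_watermarked(score, threshold):
    """Flag a text only when its score is strictly above the threshold."""
    return score > threshold


# Made-up detector scores for held-out text known to be unwatermarked.
held_out_scores = [0.1, 0.2, 0.3, 0.4, 0.9]
threshold = threshold_for_fpr(held_out_scores, target_fpr=0.2)
```

With these made-up scores the threshold is 0.4, so only the single held-out text scoring 0.9 would be flagged, which matches the 20% target.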
SynthID Text watermarks are robust to some transformations, such as cropping pieces of text, modifying a few words, or mild paraphrasing, but this method does have limitations.
SynthID Text is not built to directly stop motivated adversaries from causing harm. However, it can make it harder to use AI-generated content for malicious purposes, and it can be combined with other approaches to give better coverage across content types and platforms.
The authors would like to thank Robert Stanforth and Tatiana Matejovicova for their contributions to this work.
Hello,
I applied the watermark to Llama 2 and used the available trained detector named "joaogante/dummy_synthid_detector".
The output it returns is a probability, not 1 (watermarked) or 0 (unwatermarked).
Could you help me choose a threshold and explain how to train the detector?
from transformers import (
    AutoTokenizer,
    BayesianDetectorModel,
    SynthIDTextWatermarkLogitsProcessor,
    SynthIDTextWatermarkDetector,
)
# Load the detector. See examples/research_projects/synthid_text for training a detector.
detector_model = BayesianDetectorModel.from_pretrained("joaogante/dummy_synthid_detector")
logits_processor = SynthIDTextWatermarkLogitsProcessor(
    **detector_model.config.watermarking_config, device="cpu"
)
tokenizer = AutoTokenizer.from_pretrained(detector_model.config.model_name)
detector = SynthIDTextWatermarkDetector(detector_model, logits_processor, tokenizer)
# Test whether a certain string is watermarked
test_input = tokenizer(["This is a test input"], return_tensors="pt")
is_watermarked = detector(test_input.input_ids)
How can I get API access?