arxiv:2406.13348

Textual Unlearning Gives a False Sense of Unlearning

Published on Jun 10, 2025

Authors:

Abstract

Textual unlearning in language models fails to provide adequate privacy protection, as demonstrated by attacks that can still detect removed content and reveal information about unlearned texts.

AI-generated summary

Language Models (LMs) are prone to ''memorizing'' training data, including substantial sensitive user information. To mitigate privacy risks and safeguard the right to be forgotten, machine unlearning has emerged as a promising approach for enabling LMs to efficiently ''forget'' specific texts. However, despite the good intentions, is textual unlearning really as effective and reliable as expected? To address the concern, we first propose Unlearning Likelihood Ratio Attack+ (U-LiRA+), a rigorous textual unlearning auditing method, and find that unlearned texts can still be detected with very high confidence after unlearning. Further, we conduct an in-depth investigation on the privacy risks of textual unlearning mechanisms in deployment and present the Textual Unlearning Leakage Attack (TULA), along with its variants in both black- and white-box scenarios. We show that textual unlearning mechanisms could instead reveal more about the unlearned texts, exposing them to significant membership inference and data reconstruction risks. Our findings highlight that existing textual unlearning actually gives a false sense of unlearning, underscoring the need for more robust and secure unlearning mechanisms.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2406.13348

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2406.13348 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2406.13348 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2406.13348 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.