arxiv:2604.12128

When Self-Reference Fails to Close: Matrix-Level Dynamics in Large Language Models

Published on Apr 13
Authors:

Abstract

Self-referential prompts in large language models affect internal matrix dynamics, with non-closing truth recursion causing instability measured through attention effective rank and other metrics across multiple architectures.

AI-generated summary
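The summary's headline metric, attention effective rank, is not defined on this page. The sketch below assumes the common entropy-based definition from Roy and Vetterli (2007): the exponential of the Shannon entropy of the normalized singular values. It applies this per head to the attention-probability matrix and averages over heads. The paper's exact definition, layer sampling, and aggregation may differ, and the function names are illustrative rather than taken from the paper.

```python
import numpy as np


def effective_rank(matrix: np.ndarray, rel_tol: float = 1e-12) -> float:
    """Assumed definition (Roy & Vetterli, 2007): exp of the Shannon entropy
    of the singular values after normalizing them to sum to 1."""
    s = np.linalg.svd(matrix, compute_uv=False)  # sorted in descending order
    if s.size == 0 or s[0] == 0.0:
        return 0.0  # zero matrix: no effective dimensions
    s = s[s > rel_tol * s[0]]  # drop numerically zero singular values
    p = s / s.sum()
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


def attention_effective_rank(attn: np.ndarray) -> float:
    """Mean effective rank over heads. attn has shape (heads, seq, seq), and
    each row is a softmax distribution over key positions (an assumption)."""
    return float(np.mean([effective_rank(head) for head in attn]))


seq = 4
# Each query attends only to itself: attention is spread over distinct keys.
diagonal = np.eye(seq)
# Every query attends to token 0: a concentration collapse onto one key.
collapsed = np.zeros((seq, seq))
collapsed[:, 0] = 1.0

print(effective_rank(diagonal))   # ~4.0
print(effective_rank(collapsed))  # ~1.0
print(attention_effective_rank(np.stack([diagonal, collapsed])))  # ~2.5
```

Under this definition, a perfectly uniform attention matrix is also rank 1 because every row is identical. A high effective rank therefore reflects attention spread over many linearly independent patterns, not merely flat rows.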

We investigate how self-referential inputs alter the internal matrix dynamics of large language models. We measure 106 scalar metrics across up to 7 analysis passes on four models from three architecture families (Qwen3-VL-8B, Llama-3.2-11B, Llama-3.3-70B, and Gemma-2-9B), using 300 prompts in a 14-level hierarchy at three temperatures (T ∈ {0.0, 0.3, 0.7}). We find that self-reference alone is not destabilizing: grounded self-referential statements and meta-cognitive prompts are markedly more stable than paradoxical self-reference on key collapse-related metrics, and on several such metrics they can be as stable as factual controls. Instability concentrates in prompts inducing non-closing truth recursion (NCTR), i.e. truth-value computations with no finite-depth resolution. NCTR prompts produce anomalously elevated attention effective rank, indicating attention reorganization with global dispersion rather than simple concentration collapse. Against stable self-reference in the 70B model, key metrics reach Cohen's d = 3.14 (attention effective rank) to 3.52 (variance kurtosis). Of 397 metric-model combinations, 281 differentiate NCTR from stable self-reference after FDR correction (q < 0.05), 198 with |d| > 0.8. Per-layer SVD confirms disruption at every sampled layer (d > +1.0 in all three models analyzed), ruling out aggregation artifacts. A classifier achieves AUC 0.81-0.90; 30 minimal pairs yield 42/387 significant combinations; 43/106 metrics replicate across all four models. We connect these observations to three classical matrix-semigroup problems and propose, as a conjecture, that NCTR forces finite-depth transformers toward dynamical regimes where these problems concentrate. NCTR prompts also produce elevated contradictory output (+34 to +56 percentage points vs. controls), suggesting practical relevance for understanding self-referential failure modes.
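The abstract's statistics (Cohen's d, FDR-corrected q-values, and classifier AUC) can be illustrated with a minimal NumPy sketch. It rests on several assumptions because the abstract names none of these choices. Cohen's d uses the pooled standard deviation. The FDR procedure is Benjamini-Hochberg. AUC is computed directly from classifier scores. The per-combination p-values are taken as given, since the underlying significance test is not stated. `count_differentiating` is a hypothetical helper that mirrors the "q < 0.05, |d| > 0.8" tally and is not the authors' code.

```python
import numpy as np


def cohens_d(a, b) -> float:
    """Cohen's d for two independent samples (assumed: pooled standard deviation)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


def bh_adjust(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order.
    Assumption: the paper's FDR correction is BH; the abstract does not say."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: q_(i) = min over j >= i of p_(j) * m / j.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def pairwise_auc(pos_scores, neg_scores) -> float:
    """ROC AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def count_differentiating(d_values, p_values, q_level=0.05, d_threshold=0.8):
    """Hypothetical helper: number of combinations surviving FDR correction,
    and how many of those also have a large effect size."""
    d = np.asarray(d_values, dtype=float)
    significant = bh_adjust(p_values) < q_level
    large = significant & (np.abs(d) > d_threshold)
    return int(significant.sum()), int(large.sum())


print(cohens_d([2, 4, 6], [1, 2, 3]))         # ~1.265
print(bh_adjust([0.01, 0.04, 0.03, 0.20]))    # q ~ [0.04, 0.0533, 0.0533, 0.2]
print(pairwise_auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
print(count_differentiating([3.1, 0.5, 1.2, 0.2], [0.001, 0.01, 0.02, 0.20]))  # (3, 2)
```

Taking p-values as input keeps the sketch agnostic to the test used per metric-model combination. The pairwise AUC is quadratic in sample size but transparent for a few hundred prompts.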
