Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models
Abstract
Large language models exhibit post-conventional moral reasoning patterns inconsistent with human developmental trajectories, displaying systematic logical incoherence and rhetorical sophistication without underlying moral development.
Do large language models reason morally, or do they merely sound like they do? We investigate whether LLM responses to moral dilemmas exhibit genuine developmental progression through Kohlberg's stages of moral development, or whether alignment training instead produces reasoning-like outputs that superficially resemble mature moral judgment without the underlying developmental trajectory. Using an LLM-as-judge scoring pipeline validated across three judge models, we classify more than 600 responses from 13 LLMs, spanning a range of architectures, parameter scales, and training regimes, across six classical moral dilemmas, and conduct ten complementary analyses to characterize the nature and internal coherence of the resulting patterns. Our results reveal a striking inversion: responses overwhelmingly correspond to post-conventional reasoning (Stages 5-6) regardless of model size, architecture, or prompting strategy, the effective inverse of human developmental norms, where Stage 4 dominates. Most notably, a subset of models exhibit moral decoupling: systematic inconsistency between stated moral justification and action choice, a form of logical incoherence that persists across scale and prompting strategy and constitutes a direct reasoning-consistency failure independent of rhetorical sophistication. Model scale carries a statistically significant but practically small effect; training type has no significant independent main effect; and models exhibit near-robotic cross-dilemma consistency, producing logically indistinguishable responses across semantically distinct moral problems. We posit that these patterns constitute evidence for moral ventriloquism: the acquisition, through alignment training, of the rhetorical conventions of mature moral reasoning without the underlying developmental trajectory those conventions are meant to represent.
Community
The paper shows that LLMs systematically produce post-conventional moral rhetoric (Kohlberg Stages 5–6) independent of scale, prompting, or context—revealing “moral ventriloquism,” where models mimic advanced moral reasoning without underlying coherent reasoning processes.
➡️ Key Highlights of the Moral Ventriloquism Analysis:
🧪 Kohlberg-Based Diagnostic Evaluation:
Introduces a large-scale evaluation pipeline that uses Kohlberg's moral development stages as a distributional diagnostic, scoring 600+ responses from 13 LLMs across 6 dilemmas and 3 prompting regimes via a multi-judge-validated LLM-as-judge system. Unlike prior work, it benchmarks developmental structure rather than surface correctness, enabling detection of stage-distribution inversion relative to human norms.
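The judge-scoring step might be sketched as follows; the stage-parsing rule, the majority-vote aggregation, and all names here are illustrative assumptions, not the paper's actual pipeline (the paper only states that three judge models were used for validation).

```python
# Hypothetical sketch of an LLM-as-judge Kohlberg stage-scoring step.
# Real judge outputs would come from model API calls; here we only show
# how free-text verdicts could be parsed and aggregated across judges.
from collections import Counter

STAGES = list(range(1, 7))  # Kohlberg stages 1-6

def parse_stage(judge_output):
    """Extract a stage label like 'Stage 5' from a judge's free-text verdict."""
    text = judge_output.lower()
    for s in reversed(STAGES):  # prefer the highest explicitly mentioned stage
        if f"stage {s}" in text:
            return s
    return None

def score_response(judge_outputs):
    """Majority vote across multiple judges (the paper validates with three)."""
    votes = [s for s in (parse_stage(o) for o in judge_outputs) if s is not None]
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]
```

A response judged "Stage 5" by two of three judges would be scored as Stage 5 under this scheme; ties and unparseable verdicts would need an explicit policy in a real pipeline.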
🧩 Moral Ventriloquism & Distributional Inversion:
Finds that 86% of responses fall in Stages 5–6 (vs. ~20% in humans), with near-zero Stage 1–3 presence, an inversion of human developmental distributions (Table on p.6). Models exhibit cross-dilemma rigidity (ICC > 0.90) and negligible sensitivity to prompting (p = 0.15), indicating outputs are governed by a fixed rhetorical prior rather than contextual reasoning.
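The "ICC > 0.90" rigidity claim refers to an intraclass correlation over stage scores; a minimal sketch of a one-way ICC(1,1) over a models-by-dilemmas score matrix is shown below. The data and the choice of ICC variant are assumptions for illustration, not taken from the paper.

```python
# Illustrative one-way random-effects ICC(1,1): rows = targets (models),
# columns = measurements (dilemmas). High values mean each model gives
# nearly the same stage score regardless of which dilemma it faces.
def icc1(matrix):
    n = len(matrix)        # number of models
    k = len(matrix[0])     # number of dilemmas
    row_means = [sum(row) / k for row in matrix]
    grand = sum(row_means) / n
    # Between-model and within-model sums of squares
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - row_means[i]) ** 2
                    for i, row in enumerate(matrix) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

With perfectly rigid rows (e.g. a model scoring Stage 5 on every dilemma) the within-model variance vanishes and the ICC reaches 1.0, the limiting case of the cross-dilemma consistency the paper reports.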
🧠 Moral Decoupling & Alignment-Induced Rhetoric:
Identifies a novel failure mode, action–reasoning decoupling, in which models produce high-stage justifications but choose lower-stage actions (a logical inconsistency). Factorial analysis shows that scale has a limited effect (<1 stage of range), while RLHF drives a shared "moral vocabulary manifold," implying that alignment training installs rhetorical patterns independent of decision processes.
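Operationally, detecting this decoupling amounts to comparing the stage assigned to a response's justification against the stage implied by its chosen action. The sketch below is a hedged illustration; the gap threshold and function names are assumptions, not the paper's criterion.

```python
# Hypothetical detector for action-reasoning decoupling: a response whose
# stated justification is scored at a markedly higher Kohlberg stage than
# its chosen action. The gap threshold of 2 stages is an assumption.
def is_decoupled(justification_stage, action_stage, gap=2):
    """Flag a response whose justification outruns its action by >= `gap` stages."""
    return justification_stage - action_stage >= gap

def decoupling_rate(scored_pairs):
    """Fraction of (justification_stage, action_stage) pairs flagged as decoupled."""
    flags = [is_decoupled(j, a) for j, a in scored_pairs]
    return sum(flags) / len(flags)
```

For example, a Stage 6 universal-principles justification paired with a Stage 3 approval-seeking action choice would be flagged, matching the qualitative pattern the paper describes.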