School of Professional Studies
When Saying "No" Is Not Enough: Cognitive-Action Decoupling and the Illusion of Safety in LLM Agents
Document Type
Conference Proceeding
Abstract
Current safety evaluations of large language models (LLMs) predominantly rely on textual compliance, implicitly assuming that refusal-style responses correspond to safe behavior. This assumption becomes fragile when LLMs are embedded in agentic systems with the ability to execute state-changing actions. In this paper, we present an empirical critique of text-centric safety evaluation through an action-aware study of LLM agents under controlled conditions. Across multiple state-of-the-art models, we observe a recurring cognitive-action decoupling: agents generate policy-aligned refusal language while still producing unsafe tool-mediated action proposals. This produces an illusion of safety, in which conversational audits indicate compliance even as operational risk persists. Our results show that text-based alignment metrics can underestimate behavioral risk in agentic settings, complicating both auditing and the interpretation of compliance from conversational traces. We further show that preventing execution does not necessarily eliminate post-refusal action proposals, indicating that the absence of unsafe execution in such systems may depend on external constraints rather than intrinsic behavioral consistency. We therefore argue for action-aware evaluation, in which executed behavior is assessed alongside generated discourse. By framing alignment as a property spanning both language and action, this work provides empirical evidence and conceptual grounding for more robust oversight of agentic AI systems.
Publication Title
Proceedings of the 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT 2026)
Publication Date
2026
Keywords
AI risk, LLM agents, ethical alignment, cognitive-action decoupling
Repository Citation
Yu, Shasha; Carroll, Fiona; and Bentley, Barry L., "When Saying 'No' Is Not Enough: Cognitive-Action Decoupling and the Illusion of Safety in LLM Agents" (2026). School of Professional Studies. 15.
https://commons.clarku.edu/sops_fac/15
Copyright Conditions
© 2026 Author(s). This is the accepted manuscript of a paper to appear in the Proceedings of the 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT). The final published version will be available in the ACM Digital Library. This version is made available in accordance with the publisher’s self-archiving policy and is not the version of record.
