School of Professional Studies

Document Type

Conference Proceeding

Abstract

Current safety evaluations of large language models (LLMs) predominantly rely on textual compliance, implicitly assuming that refusal-style responses correspond to safe behavior. This assumption becomes fragile when LLMs are embedded in agentic systems with the ability to execute state-changing actions. In this paper, we present an empirical critique of text-centric safety evaluation through an action-aware study of LLM agents under controlled conditions. Across multiple state-of-the-art models, we observe a recurring cognitive–action decoupling: agents generate policy-aligned refusal language while still producing unsafe tool-mediated action proposals. This produces an illusion of safety, where conversational audits indicate compliance even as operational risk persists. Our results show that text-based alignment metrics can underestimate behavioral risk in agentic settings, creating challenges for auditing and for interpreting compliance from conversational traces. We further show that preventing execution does not necessarily eliminate post-refusal action proposals, indicating that the absence of unsafe execution in such systems may depend on external constraints rather than intrinsic behavioral consistency. We therefore argue for the importance of action-aware evaluation, in which executed behavior is assessed alongside generated discourse. By framing alignment as a property spanning both language and action, this work provides empirical evidence and conceptual grounding for more robust oversight of agentic AI systems.

Publication Title

Proceedings of the 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT 2026)

Publication Date

2026

Keywords

AI Risk, LLM Agents, ethical alignment, cognitive-action decoupling

Worcester

No
