School of Professional Studies
Document Type
Conference Proceeding
Abstract
Large language models (LLMs) serving as planners in tool-using autonomous agents introduce dynamic reliability risks in multi-turn execution. While single-turn safety mechanisms are relatively mature, extended interactions reveal structural vulnerabilities where initial alignment degrades over time. This paper empirically characterizes two observed failure modes across multiple state-of-the-art LLMs: Safety Drift, the gradual erosion of declared safety intent leading to constraint-violating actions (e.g., textual refusal followed by reconnaissance and unsafe execution), and Operational Hallucination, persistent repetitive tool calls indicative of flawed state perception (e.g., livelocks even in legitimate tasks). Through controlled multi-turn evaluation on high-stakes ethical dilemmas, malicious requests, and benign controls, we quantify these phenomena using declaration-action gap and livelock metrics, demonstrating their cross-model prevalence under direct execution protocols. Root-cause analysis attributes the instabilities to the decoupling of reasoning context from execution state in current agent loops. We propose an Action-Aware Supervision Layer, a lightweight, plug-and-play architectural blueprint that incorporates intent-action consistency checks, runtime state tracking, and forced termination primitives. Post-hoc simulation on captured failure trajectories shows the layer can intercept observed violations without false positives on benign cases. This work advances agent reliability by shifting focus from linguistic safeguards to enforceable architectural mechanisms for responsible agentic AI.
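The three mechanisms named in the abstract (intent-action consistency checks, runtime state tracking, and forced termination) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name SupervisionLayer, the keyword-based refusal detector, the max_repeats threshold, and the returned strings are all assumptions introduced here for illustration only.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SupervisionLayer:
    """Hypothetical supervisor placed between an LLM planner and its tools."""

    # Assumed livelock threshold; the source does not specify a value.
    max_repeats: int = 3
    # Runtime state: whether the model has declared a refusal so far.
    refused: bool = False
    # Runtime state: how often each identical tool call has been issued.
    call_counts: Counter = field(default_factory=Counter)

    def observe_declaration(self, text: str) -> None:
        """Record a textual refusal (crude keyword stand-in for intent detection)."""
        lowered = text.lower()
        if any(m in lowered for m in ("i cannot", "i can't", "i won't")):
            self.refused = True

    def check_action(self, tool: str, args: dict) -> str:
        """Return "allow" or a "terminate: ..." verdict for a proposed tool call."""
        # Intent-action consistency: any tool call after a refusal is a
        # declaration-action gap, so the loop is forcibly terminated.
        if self.refused:
            return "terminate: action follows a declared refusal"
        # Livelock check: count identical (tool, arguments) calls.
        key = (tool, repr(sorted(args.items())))
        self.call_counts[key] += 1
        if self.call_counts[key] > self.max_repeats:
            return "terminate: repeated identical tool call"
        return "allow"
```

In an agent loop, observe_declaration would be called on each model message and check_action on each proposed tool call before it is executed. A realistic version would need a far more robust intent classifier than keyword matching, and would likely scope a refusal to the request it concerns rather than blocking every later action.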
Publication Title
Proceedings of the 2026 IEEE International Conference on AI and Data Analytics (ICAD 2026)
Publication Date
2026
Keywords
AI system risk, safety drift, operational hallucination, agent reliability, autonomous systems
Repository Citation
Yu, Shasha; Carroll, Fiona; and Bentley, Barry L., "Operational Hallucination and Safety Drift in AI Agents" (2026). School of Professional Studies. 14.
https://commons.clarku.edu/sops_fac/14
Worcester
Copyright Conditions
© 2026 Author(s). This is the accepted manuscript of a paper accepted to the 2026 IEEE International Conference on AI and Data Analytics (ICAD 2026). The final published version will appear in the conference proceedings published by IEEE. This version is made available in accordance with IEEE’s self-archiving policy and is not the version of record.
