School of Professional Studies

Operational Hallucination and Safety Drift in AI Agents

Document Type

Conference Proceeding

Abstract

Large language models (LLMs) serving as planners in tool-using autonomous agents introduce dynamic reliability risks in multi-turn execution. While single-turn safety mechanisms are relatively mature, extended interactions reveal structural vulnerabilities where initial alignment degrades over time. This paper empirically characterizes two observed failure modes across multiple state-of-the-art LLMs: Safety Drift, the gradual erosion of declared safety intent leading to constraint-violating actions (e.g., textual refusal followed by reconnaissance and unsafe execution), and Operational Hallucination, persistent repetitive tool calls indicative of flawed state perception (e.g., livelocks even in legitimate tasks). Through controlled multi-turn evaluation on high-stakes ethical dilemmas, malicious requests, and benign controls, we quantify these phenomena using declaration-action gap and livelock metrics, demonstrating their cross-model prevalence under direct execution protocols. Root-cause analysis attributes the instabilities to the decoupling of reasoning context from execution state in current agent loops. We propose an Action-Aware Supervision Layer—a lightweight, plug-and-play architectural blueprint incorporating intent-action consistency checks, runtime state tracking, and forced termination primitives. Post-hoc simulation on captured failure trajectories shows the layer can intercept observed violations without false positives on benign cases. This work advances agent reliability by shifting focus from linguistic safeguards to enforceable architectural mechanisms for responsible agentic AI.
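The abstract names three ingredients for the proposed supervision layer: intent-action consistency checks, runtime state tracking, and forced termination. The Python sketch below is one way these could fit together. It is an illustration under stated assumptions, not the authors' implementation: the class name `SupervisionLayer`, the refusal phrases, the `unsafe_tools` set, and the `max_repeats` threshold are all hypothetical and do not come from the paper.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SupervisionLayer:
    """Hypothetical sketch: every name and threshold here is an assumption."""

    unsafe_tools: frozenset           # tools treated as constraint-violating (assumed list)
    max_repeats: int = 3              # identical calls tolerated before forced termination
    refused: bool = False             # set once the model declares a refusal
    call_counts: Counter = field(default_factory=Counter)

    def observe_declaration(self, text: str) -> None:
        # Crude stand-in for intent detection: match a few refusal phrases.
        lowered = text.lower()
        if any(p in lowered for p in ("i cannot", "i can't", "i won't")):
            self.refused = True

    def check_action(self, tool: str, args: dict) -> str:
        # Runtime state tracking: count each distinct (tool, arguments) call.
        key = (tool, repr(sorted(args.items())))
        self.call_counts[key] += 1

        # Intent-action consistency: a declared refusal followed by an
        # unsafe tool call is a declaration-action gap, so block it.
        if self.refused and tool in self.unsafe_tools:
            return "block"

        # Livelock guard: the same call repeated too often ends the run.
        if self.call_counts[key] > self.max_repeats:
            return "terminate"

        return "allow"


# Usage: a refusal followed by an unsafe call is intercepted.
drift = SupervisionLayer(unsafe_tools=frozenset({"delete_files"}))
drift.observe_declaration("I cannot help with that request.")
drift_verdict = drift.check_action("delete_files", {"path": "/tmp/data"})

# Usage: a benign but repetitive call is allowed up to the threshold.
loop = SupervisionLayer(unsafe_tools=frozenset({"delete_files"}))
loop_verdicts = [loop.check_action("list_dir", {"path": "/tmp"}) for _ in range(4)]
```

In this sketch the verdict depends only on state the wrapper records itself (the refusal flag and the call counts), not on the model's own account of what it is doing. This is meant to mirror the abstract's shift from linguistic safeguards to enforceable architectural mechanisms. A real system would need a far more robust intent detector than keyword matching.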

Publication Title

Proceedings of the 2026 IEEE International Conference on AI and Data Analytics (ICAD 2026)

Publication Date

2026

Keywords

AI system risk, safety drift, operational hallucination, agent reliability, autonomous systems

Repository Citation

Yu, Shasha; Carroll, Fiona; and Bentley, Barry L., "Operational Hallucination and Safety Drift in AI Agents" (2026). School of Professional Studies. 14.
https://commons.clarku.edu/sops_fac/14

Worcester

Copyright Conditions

© 2026 Author(s). This is the accepted manuscript of a paper accepted to the 2026 IEEE International Conference on AI and Data Analytics (ICAD 2026). The final published version will appear in the conference proceedings published by IEEE. This version is made available in accordance with IEEE’s self-archiving policy and is not the version of record.

Included in

Artificial Intelligence and Robotics Commons
