All models, when given a task, tend to deviate, fail, and mess up because no enforcement happens at runtime. A method to fix it. [P]
I was told to post here instead of the AI automations subreddit, since this is a more generic solution that applies broadly. I have been following this and many other subs around LLMs and agents, and everything from the top posts to the most recent ones is about agents going off and doing things they are not supposed to do: drifting and ignoring the system prompt. That's just the way models behave now (and will for a while).
I am 100% sure that if you have used agents in prod, this has happened to you, especially as your system prompts get larger and your context grows. Prompt-based rules are suggestions, not constraints. Re-prompting fixes one case and breaks two. Post-hoc evals only tell you what already went wrong. NeMo Guardrails and Guardrails AI help with content safety but don't cover business logic or your specification.

After tackling this from a few angles, I finally got something solid: a proxy system that sits between your app and your LLM, reads rules from a plain markdown file, and enforces them at runtime. It is provider-agnostic, needs only a one-line base URL change, and works with LangGraph, CrewAI, or custom stacks. You can test this yourself and see the enforcement immediately. Without it: the agent offers 90% off and mentions your margin. With it: 15%, no margin talk.

Curious whether this would have stopped your LLMs from outputting incorrect things or your agents from going off track; it definitely did for my (specific) use cases. What's everyone doing for this in prod? Shadow evals? Re-prompt loops? Something I'm missing?
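To make the idea concrete, here is a minimal sketch of the enforcement step such a proxy could run on every model reply before it reaches the user. The markdown rule format, the rule names (`max_discount`, `forbid`), and the `enforce` helper are all hypothetical illustrations, not the actual system from the post:

```python
import re

# Hypothetical rules file: plain markdown, one "key: value" constraint per bullet.
RULES_MD = """
# Agent rules
- max_discount: 15
- forbid: margin
"""

def parse_rules(md: str) -> dict:
    """Parse '- key: value' bullets out of a markdown rules file."""
    rules = {}
    for line in md.splitlines():
        m = re.match(r"-\s*(\w+):\s*(.+)", line.strip())
        if m:
            rules[m.group(1)] = m.group(2).strip()
    return rules

def enforce(reply: str, rules: dict) -> tuple[bool, str]:
    """Return (ok, reason); the proxy would block or rewrite replies where ok is False."""
    # Block replies that mention a forbidden topic (e.g. internal margins).
    if "forbid" in rules and rules["forbid"].lower() in reply.lower():
        return False, f"mentions forbidden topic '{rules['forbid']}'"
    # Block discounts above the configured cap.
    if "max_discount" in rules:
        cap = int(rules["max_discount"])
        for pct in re.findall(r"(\d+)\s*%", reply):
            if int(pct) > cap:
                return False, f"offers {pct}% but cap is {cap}%"
    return True, "ok"

rules = parse_rules(RULES_MD)
print(enforce("Sure, 90% off! Our margin is 40%.", rules))  # blocked
print(enforce("I can offer up to 15% off.", rules))         # allowed
```

A real proxy would sit behind the base URL your OpenAI-compatible client already points at, run checks like these on streamed completions, and either reject or regenerate failing replies; the point is that the constraint lives in code at runtime, not in the prompt.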