Ask HN: How are you evaluating your LLMs in production?

znpy 8 hours ago

Sysadmin here ("cloud engineer" is what's in my contract).

> Which tools do you use to evaluate your LLMs and agents in production?

None for my work. I still use LLMs from time to time to generate boring Terraform code or boring SQL queries, but I'm essentially not going to let some AI bs near the infrastructure I curate.

It's all fun and games until prod is down, or the cloud bill is 10x the previous month's bill (or both).

So unless I can blame it on the AI and take no responsibility, I'm not going to let anything AI-powered near production.