flowerthoughts a minute ago

No mention of mixture-of-experts. Seems related. They do list a DeepSeek R1 distillate as an SLM. The introduction starts with a sales pitch. And there's a call-to-action at the end. This seems like marketing with source references sprinkled in.

That said, I also think the "Unix" approach to ML is right. We should see more splits; however, currently all these tools rely on strong language comprehension. Sure, we might be able to train a model on only English and delegate translation to another model, but that will certainly lose (much needed) color. So if all of these agents will need comprehensive language understanding anyway, to be able to communicate with each other, is an SLM really better than an MoE?

What I'd love to "distill" out of these models is domain knowledge that is stale anyway. It's great that I can ask Claude to implement a React component, but why does the model that can do taxes so-so also try to write a React component so-so? Perhaps what's needed is a search engine to find agents. Now we're into expensive marketplace subscription territory, but that's probably viable for companies. It'll create a larger us-them chasm, though, and the winner takes it all.

bryant 3 hours ago

A few weeks ago, I processed a product refund with Amazon via agent. It was simple, straightforward, and surprisingly obvious that it was backed by a language model based on how it responded to my frustration about it asking tons of questions. But in the end, it processed my refund without ever connecting me with a human being.

I don't know whether Amazon relies on LLMs or SLMs for this and for similar interactions, but it makes tons of financial sense to use SLMs for narrowly scoped agents. In use cases like customer service, the intelligence behind LLMs is all wasted on the task the agents are trained for.

Wouldn't surprise me if down the road we start suggesting role-specific SLMs rather than general LLMs as both an ethics- and security-risk mitigation too.

  • automatic6131 17 minutes ago

    You can (used to?) get a refund on Amazon with normal CRUD app flow. Putting an SLM and a conversational interface over it is a backwards step.

  • torginus 11 minutes ago

    I just had my first experience with a customer service LLM. I needed to get my account details changed, and for that I needed to use the customer support chat.

    The LLM told me what information they needed and what the process was, and I followed it through.

    When I finished, it reassured me that everything was in order and my request was being processed.

    For two weeks, nothing happened. I emailed the (human) support staff, and they responded that they could see no such request in their system. Turns out the LLM had hallucinated the entire customer flow and was just spewing BS at me.

    • exe34 8 minutes ago

      That's why I take screenshots of anything that I don't get an email confirmation for.

moqizhengz 14 minutes ago

How can SLMs be the future of AI when we are not even sure whether LMs are the future of AI?

  • boxed 3 minutes ago

    "Future" maybe means "next two months"? :P

janpmz 3 hours ago

One could start with a large model for exploration during development, and then distill it down to a small model that covers the task's variety and fits on a USB drive. E.g. when I use a model for gardening purposes, I could prune knowledge about other topics.
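The "distill it down" step usually means training the small model to match the large model's output distribution rather than hard labels. A minimal sketch of that objective (the function names, logits, and temperature value here are illustrative, not any particular framework's API):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax; higher T exposes more of the
    teacher's 'dark knowledge' about near-miss classes."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    as is conventional so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T

teacher = [2.0, 0.5, -1.0]
print(distillation_loss(teacher, teacher))          # → 0.0 (perfect match)
print(distillation_loss(teacher, [0.0, 0.0, 0.0]))  # > 0 (student diverges)
```

In practice this loss would be minimized over a corpus restricted to the target domain, which is what lets the student forget everything outside it.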

  • loktarogar 2 hours ago

    Pruning is exactly what you're looking for in a gardening SLM