Local Agentic Flow with Mistral Small 3.2
At the beginning of May, I wrote in this blog post: “As someone who prefers running AI locally for reasons like privacy and control, I believe achieving an agentic flow like the one shown isn’t possible yet with local AI setups.”
Then came the recent announcement of Devstral by Mistral, which was supposed to enable just that. I also saw some promising-looking benchmarks. However, I wasn’t able to get an agentic workflow running with it. Most importantly, it actually faked tool calls that never happened. Maybe I used a suboptimal system prompt, maybe I did something else wrong; any hints are welcome.
But then I read that Mistral Small 3.2 has improved the robustness of function calling. So I made three tools available to it, list_files, read_file and write_file, and indeed: it works pretty well. I’ve only tested with Python so far, since the model has only been available for a few days, but the code quality also seems better than Devstral’s. Surprising.
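To give an idea of what “making tools available” means here, below is a minimal sketch of the three tools in the OpenAI-compatible function-calling format, together with a dispatcher that executes the calls the model requests. The schemas and the dispatcher are simplified illustrations, without error handling or sandboxing, not a verbatim copy of my setup:

```python
import json
from pathlib import Path

# Tool schemas in the OpenAI-compatible function-calling format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "list_files",
            "description": "List all files below the given directory.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Return the text content of a file.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write text content to a file, creating it if necessary.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["path", "content"],
            },
        },
    },
]

def dispatch(name: str, arguments: str) -> str:
    """Execute a tool call requested by the model and return the result."""
    args = json.loads(arguments)
    if name == "list_files":
        return "\n".join(str(p) for p in sorted(Path(args["path"]).rglob("*")))
    if name == "read_file":
        return Path(args["path"]).read_text()
    if name == "write_file":
        Path(args["path"]).write_text(args["content"])
        return f"wrote {len(args['content'])} characters to {args['path']}"
    return f"unknown tool: {name}"
```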
If you are using Ollama, take a look here: https://ollama.com/library/mistral-small3.2.
Something like this is better shown in a screencast than explained in words. I cut out a few waiting periods, but you can see the timestamps in the agentic progress boxes. Of course, you can’t expect the response times of high-performance servers, but it feels reasonably fast on an M2 Max.
As a test case, I had it create some FIR filter plots, an exercise I did in my last blog post with Claude, and it solved the problem pretty well. Funnily enough, it also made the mistake of passing the filter order as the number of taps (but the correct relationship is number of taps = filter order + 1). But that’s just a side note.
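For reference, this is how the relationship looks with scipy’s firwin, which expects the number of taps rather than the filter order (the cutoff value here is just a placeholder):

```python
from scipy.signal import firwin

order = 64                            # desired filter order
numtaps = order + 1                   # number of taps = filter order + 1
taps = firwin(numtaps, cutoff=0.25)   # cutoff normalized to Nyquist = 1
assert len(taps) == order + 1
```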
The memory requirements during operation can be seen in the screencast. I simply used Ollama, connected via http://localhost:11434/v1. I set the context window size to 16,384 tokens and the maximum number of generated tokens to 8,192. As with all Ollama operations, all available GPU capacity is used during the queries.
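Wired together, a single tool-enabled request against this endpoint could look roughly like the following sketch. Passing the context window size through the OpenAI-compatible endpoint is an assumption here; depending on the Ollama version, num_ctx may instead need to be set via a Modelfile or the native API:

```python
from openai import OpenAI

# Ollama ignores the API key, but the client insists on one.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="mistral-small3.2",
    messages=[{"role": "user", "content": "List the files in the project."}],
    tools=TOOLS,          # the tool schemas from the sketch above
    max_tokens=8192,      # maximum number of generated tokens
    # Assumption: forwarding the Ollama-specific num_ctx option via
    # extra_body; a Modelfile setting may be required instead.
    extra_body={"options": {"num_ctx": 16384}},
)

message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        result = dispatch(call.function.name, call.function.arguments)
        # In a full agentic loop, `result` would be appended to the
        # conversation as a tool message and the model queried again.
```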
I then had it generate not only the Python program, but also an installation shell script. This script should set up a Python virtual environment (venv), analyze the Python program just generated in the previous chat, determine all the dependencies it requires, and then install them into the virtual environment via pip. Finally, of course, I also executed the program. But see for yourself: