Retrospective, Lessons Learned, and v2 Roadmap

This post is part of the Voice Assistant on Raspberry Pi series.

Six articles and two evenings later, we have a French-language voice assistant running entirely on two Raspberry Pi 4s.

What we built

Final architecture:

┌─────────────────────────────────────────────────────┐
│                   Pi Client (2GB)                   │
│  GPIO Button → arecord → ffmpeg → Whisper.net       │
│  .NET 10 Worker Service                             │
│  Piper TTS → ffmpeg → aplay                        │
└──────────────────────┬──────────────────────────────┘
                       │ HTTP (local network)
┌──────────────────────▼──────────────────────────────┐
│                   Pi Brain (4GB)                    │
│  Ollama + llama3.2:1b                              │
└─────────────────────────────────────────────────────┘

Cloud mode: Pi Client → Claude API → Function calling → local tools

Full stack:

Component	Technology	Notes
Runtime	.NET 10 Worker Service	C# on Pi, it works
Speech-to-text	Whisper.net (base model)	Free, local, French supported
Local LLM	Ollama + llama3.2:1b	Fast on 4GB RAM
Cloud LLM	Claude API (Sonnet)	Better French quality
Text-to-speech	Piper TTS	Lightweight, decent French voice
Audio	arecord + ffmpeg + aplay	48kHz stereo conversion needed
GPIO	System.Device.Gpio	Simple, well documented
Weather	Open-Meteo API	Free, no key required
Tools	Anthropic.SDK function calling	Extensible pattern
Auto-start	systemd	Boots automatically

What went well

Honestly, .NET on Pi ARM64 surprised me. I expected friction (a missing runtime, something that wouldn’t compile), but none of that happened. One command to install, and the Worker Service runs cleanly. The DI container also gives the project a clean structure that I’d have had a harder time getting in Python.

Splitting the work across two Pis was also the right call. The Pi Client handles audio and transcription, the Pi Brain handles the LLM. That separation gave us the flexibility to swap components without touching the rest, which is exactly what we did in article #5 when we switched from Ollama to the Claude API.

The ILlmService abstraction seemed like overkill early on, but it’s what made that swap a single line in appsettings.json. The kind of decision you make “just in case” and never regret.
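To make the idea concrete, here is a minimal sketch of that kind of swap. The series implements it in C#; this version is in Python, and every name in it (`LlmService`, `OllamaService`, `ClaudeService`, the `LlmProvider` config key) is a hypothetical stand-in rather than the actual code from the repository. The backends return canned strings instead of making network calls.

```python
from typing import Callable, Dict, Protocol


class LlmService(Protocol):
    """Stand-in for the series' ILlmService: one method, any backend."""

    def ask(self, prompt: str) -> str: ...


class OllamaService:
    def ask(self, prompt: str) -> str:
        # A real implementation would call Ollama on the Pi Brain over HTTP.
        return f"[ollama] {prompt}"


class ClaudeService:
    def ask(self, prompt: str) -> str:
        # A real implementation would call the Claude API.
        return f"[claude] {prompt}"


# Hypothetical registry mapping a config value to a backend factory.
PROVIDERS: Dict[str, Callable[[], LlmService]] = {
    "ollama": OllamaService,
    "claude": ClaudeService,
}


def build_llm(config: dict) -> LlmService:
    """Pick the backend from config; callers only ever see LlmService."""
    name = config.get("LlmProvider", "ollama")  # assumed key name
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown LLM provider: {name!r}") from None
```

With this shape, `build_llm({"LlmProvider": "claude"})` is the only thing that changes between local and cloud mode; the rest of the pipeline keeps calling `ask`.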

Open-Meteo also deserves a mention: a free, key-free, registration-free weather API. Hard to find that combination.
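For a sense of how little is involved, here is a rough Python sketch of a current-temperature lookup against Open-Meteo's forecast endpoint. The coordinates are placeholders, and the helper names are made up for illustration; the series' own weather code is in C# and may be organized differently.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.open-meteo.com/v1/forecast"


def forecast_url(latitude: float, longitude: float) -> str:
    """Build a request URL for the current air temperature. No API key needed."""
    query = urllib.parse.urlencode(
        {
            "latitude": latitude,
            "longitude": longitude,
            "current": "temperature_2m",
        }
    )
    return f"{BASE_URL}?{query}"


def parse_temperature(payload: dict) -> float:
    """Extract the current temperature from a decoded response body."""
    return float(payload["current"]["temperature_2m"])


def fetch_temperature(latitude: float, longitude: float) -> float:
    """Perform the HTTP call (needs network access)."""
    with urllib.request.urlopen(forecast_url(latitude, longitude), timeout=5) as resp:
        return parse_temperature(json.load(resp))
```

Calling `fetch_temperature(46.81, -71.21)` (placeholder coordinates) returns a single float, which is all a weather tool needs to hand back to the LLM.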

The pain points

The AB13X USB audio adapter cost me a fair bit of time. I expected to plug in a USB mic and have it work, but the AB13X only records at 48kHz stereo, and Whisper needs 16kHz mono. Playback had the same problem: aplay was speeding up the audio because the sample rate didn’t match. The fix runs through ffmpeg in both directions, which works well now, but a better adapter that supports multiple formats from the start would have saved the detour.
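The two conversions can be sketched as a pair of ffmpeg invocations. This is a Python illustration of the idea, not the series' C# code: the function names and file paths are hypothetical, and the 16-bit PCM output format is an assumption. The `-ar` (sample rate) and `-ac` (channel count) options are standard ffmpeg flags.

```python
import subprocess
from typing import List


def ffmpeg_resample_cmd(src: str, dst: str, rate_hz: int, channels: int) -> List[str]:
    """Build an ffmpeg command that rewrites audio at a given rate and channel count."""
    return [
        "ffmpeg", "-y",        # overwrite dst if it already exists
        "-i", src,
        "-ar", str(rate_hz),   # output sample rate in Hz
        "-ac", str(channels),  # output channel count
        "-c:a", "pcm_s16le",   # assumed: 16-bit PCM WAV output
        dst,
    ]


def to_whisper_format(src: str, dst: str) -> List[str]:
    """Recording direction: 48 kHz stereo capture to 16 kHz mono for Whisper."""
    return ffmpeg_resample_cmd(src, dst, 16_000, 1)


def to_adapter_format(src: str, dst: str) -> List[str]:
    """Playback direction: TTS output to 48 kHz stereo so aplay matches the adapter."""
    return ffmpeg_resample_cmd(src, dst, 48_000, 2)


def run(cmd: List[str]) -> None:
    """Execute the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(cmd, check=True)
```

For example, `run(to_whisper_format("capture.wav", "whisper.wav"))` would convert a raw capture before transcription, assuming ffmpeg is on the PATH.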

The GPIO button also caught me off guard: a 4-pin tactile button has its pins internally connected in two pairs, and if you wire both leads to the same pair, pressing the button changes nothing. You need to wire across opposite sides, with one lead on each pair. I lost a solid hour before figuring that out.

Transcription quality is the most visible limitation. Whisper base handles quiet environments fine, but with background noise or a normal speaking pace in Québécois French, the results get creative:

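“Comment tu t’appelles?” (“What’s your name?”) → “Comment je te pète?” (roughly “How do I break you?”)
“Quel est ton nom?” (“What is your name?”) → “Quelle est-on non?” (ungrammatical word salad)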

And llama3.2:1b in French is imaginative. When I asked how to dress for cool weather, it responded with “la crinière devra s’abuffer” (roughly “the mane will have to”, followed by a verb that doesn’t exist). The Claude API fixes this, at the cost of a cloud dependency.

If I were starting over

I’d start with Whisper small instead of base: the quality difference is worth the extra latency. I’d pick an audio adapter that supports mono and multiple sample rates from the start, which would eliminate the ffmpeg complexity. And I’d validate GPIO with a standalone Python script before integrating into .NET.
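A standalone GPIO check could be as small as the sketch below. It assumes the gpiozero library and a button between BCM pin 17 and ground; both the pin number and the function names are illustrative, so adjust them to your wiring. The counting logic takes a plain callable so it can be exercised without a Pi.

```python
import time
from typing import Callable

BUTTON_PIN = 17  # assumption: BCM numbering, change to match your wiring


def count_presses(is_pressed: Callable[[], bool], samples: int, interval_s: float = 0.01) -> int:
    """Poll a button and count released-to-pressed transitions."""
    presses = 0
    previous = False
    for _ in range(samples):
        current = is_pressed()
        if current and not previous:
            presses += 1
        previous = current
        time.sleep(interval_s)
    return presses


def main() -> None:
    # Imported here so count_presses stays usable on a machine without gpiozero.
    from gpiozero import Button

    button = Button(BUTTON_PIN)  # gpiozero enables the internal pull-up by default
    print("Press the button a few times over the next 5 seconds...")
    n = count_presses(lambda: button.is_pressed, samples=500)
    print(f"Detected {n} press(es)")
```

Run `main()` on the Pi. If the count does not track the number of presses, the wiring is the first suspect, and no .NET code has been involved yet.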

v2 ideas

Wake word. Replacing the button with “Hey Alex” would be the most impactful change. The openWakeWord library runs on Pi ARM64 and is free. That’s a project on its own.

4-inch status display. I have a small HDMI display sitting around. Showing “Listening…”, “Processing…”, “Responding…” through a minimal ASP.NET Core interface on the Pi Client would be simple and useful.

API transcription. Sending the WAV to OpenAI’s Whisper API instead of transcribing locally would give much better quality and lower latency, at about $0.006 per minute of audio, which is negligible for personal use. The trade-off is one more cloud dependency.

Persistent memory. Conversation history disappears on restart. SQLite via EF Core would fix that cleanly.
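As a rough illustration of how little storage logic this needs, here is a sketch using Python's built-in `sqlite3` module in place of EF Core. The `messages` table and the function names are hypothetical; the v2 implementation would presumably be a C# `DbContext` with an equivalent entity.

```python
import sqlite3
from typing import List, Tuple


def open_history(path: str) -> sqlite3.Connection:
    """Open (or create) the history database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn


def add_message(conn: sqlite3.Connection, role: str, content: str) -> None:
    """Append one conversation turn."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (role, content) VALUES (?, ?)",
            (role, content),
        )


def recent_messages(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple[str, str]]:
    """Return the last `limit` turns, oldest first, ready to replay to the LLM."""
    rows = conn.execute(
        "SELECT role, content FROM messages ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return rows[::-1]
```

On startup, the assistant would call `recent_messages` once and seed the in-memory history with the result.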

Home Assistant. A HomeAssistantTool for lights, sensors, occupancy: a few dozen lines with the REST API.
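The core of such a tool is a single authenticated POST to Home Assistant's service endpoint. The sketch below is in Python rather than C#, and the base URL, token, and entity ID are placeholders; only the `/api/services/light/turn_on` path and the bearer-token header come from Home Assistant's REST API.

```python
import json
import urllib.request


def light_request(base_url: str, token: str, entity_id: str, turn_on: bool = True) -> urllib.request.Request:
    """Build the HTTP request that switches a light on or off."""
    service = "turn_on" if turn_on else "turn_off"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/light/{service}",
        data=json.dumps({"entity_id": entity_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code (needs network access)."""
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.status
```

A call such as `send(light_request("http://homeassistant.local:8123", token, "light.living_room"))` would then be wrapped in the same function-calling pattern as the existing tools.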

Google Calendar. A CalendarTool for today’s events. “What do I have today?” would become a genuinely useful question.

Series recap

Article	Topic	Estimated effort
#1	Setting up both Pis	~2h
#2	.NET 10 Worker Service + audio pipeline	~3h
#3	Ollama integration	~1h
#4	Memory, silence detection, systemd	~2h
#5	Real-time weather + Claude API swap	~2h
#6	Function calling + extensible tools	~3h

Two evenings of work for a complete voice assistant, in .NET, on hardware under $150.

The complete source code for this series is available on GitHub.

Articles in this series

Setting Up Both Raspberry Pis
.NET 10 Worker Service and Audio Pipeline
Ollama Integration and Home Context
Memory, Silence Detection, and systemd
Real-Time Weather and Swapping to the Claude API
Function Calling: Teaching Tools to the Assistant
Retrospective, Lessons Learned, and v2 Roadmap (this article)

The biggest surprise was how well .NET runs on Pi ARM64, and that a local LLM on a sub-$100 board is already usable. The weak link is Whisper base: fix that, and the assistant holds up for daily use.

This post was written with AI assistance and edited by me.