This post is part of the Voice Assistant on Raspberry Pi series.
Six articles and two evenings later, we have a French-language voice assistant running entirely on two Raspberry Pi 4s.
What we built
Final architecture:
┌─────────────────────────────────────────────────────┐
│                   Pi Client (2GB)                   │
│  GPIO Button → arecord → ffmpeg → Whisper.net       │
│  .NET 10 Worker Service                             │
│  Piper TTS → ffmpeg → aplay                         │
└──────────────────────┬──────────────────────────────┘
                       │ HTTP (local network)
┌──────────────────────▼──────────────────────────────┐
│                   Pi Brain (4GB)                    │
│  Ollama + llama3.2:1b                               │
└─────────────────────────────────────────────────────┘
Cloud mode: Pi Client → Claude API → Function calling → local tools
Full stack:
| Component | Technology | Rationale |
|---|---|---|
| Runtime | .NET 10 Worker Service | C# on Pi, it works |
| Speech-to-text | Whisper.net (base model) | Free, local, French supported |
| Local LLM | Ollama + llama3.2:1b | Fast on 4GB RAM |
| Cloud LLM | Claude API (Sonnet) | Better French quality |
| Text-to-speech | Piper TTS | Lightweight, decent French voice |
| Audio | arecord + ffmpeg + aplay | 48kHz stereo conversion needed |
| GPIO | System.Device.Gpio | Simple, well documented |
| Weather | Open-Meteo API | Free, no key required |
| Tools | Anthropic.SDK function calling | Extensible pattern |
| Auto-start | systemd | Boots automatically |
What went well
Honestly, .NET on Pi ARM64 surprised me. I expected friction (a missing runtime, something that wouldn’t compile) and hit none of it. One command to install, and the Worker Service runs cleanly. The DI container gives the project a structure I would have had a harder time achieving in Python.
Splitting the two Pis was also the right call. The Pi Client handles audio and transcription, the Pi Brain handles the LLM. That separation gave us the flexibility to swap components without touching the rest, which is exactly what we did in article #5 when we switched from Ollama to the Claude API.
The ILlmService abstraction seemed like overkill early on, but it’s what made that swap a single-line change in appsettings.json. The kind of decision you make “just in case” and never regret.
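The series implements this in C# behind ILlmService. As an illustration only, here is a minimal Python sketch of the same idea: one interface, two interchangeable implementations, and a setting that picks between them. The class names, the `Llm`/`Provider` setting keys, and the placeholder replies are all assumptions for the example, not the project's actual code.

```python
from typing import Protocol


class LlmService(Protocol):
    """Anything that can answer a prompt (stand-in for the C# ILlmService)."""

    def ask(self, prompt: str) -> str: ...


class OllamaService:
    def ask(self, prompt: str) -> str:
        # Placeholder: a real implementation would call the Pi Brain over HTTP.
        return f"[ollama] {prompt}"


class ClaudeService:
    def ask(self, prompt: str) -> str:
        # Placeholder: a real implementation would call the Claude API.
        return f"[claude] {prompt}"


# Hypothetical provider names; the real appsettings.json keys are not shown here.
PROVIDERS = {"ollama": OllamaService, "claude": ClaudeService}


def create_llm_service(settings: dict) -> LlmService:
    """Pick the implementation named in the settings."""
    provider = settings["Llm"]["Provider"].lower()
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown LLM provider: {provider!r}")
    return PROVIDERS[provider]()
```

The rest of the program only ever calls `ask`, so changing the provider value is the whole swap.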
Open-Meteo also deserves a mention: a free, key-free, registration-free weather API. Hard to find that combination.
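To show how little ceremony a key-free API needs, here is a short Python sketch that builds a forecast request and reads the current temperature. It assumes Open-Meteo's forecast endpoint with the `current=temperature_2m` parameter, so check the field names against the official docs. The function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(latitude: float, longitude: float) -> str:
    """Build the request URL; no API key or account is involved."""
    query = urlencode({
        "latitude": latitude,
        "longitude": longitude,
        "current": "temperature_2m",
    })
    return f"{BASE_URL}?{query}"


def fetch_current_temperature(latitude: float, longitude: float,
                              timeout: float = 10.0) -> float:
    """Call the API and return the current temperature in degrees Celsius."""
    with urlopen(build_forecast_url(latitude, longitude), timeout=timeout) as response:
        payload = json.load(response)
    # Assumed response shape: {"current": {"temperature_2m": ...}, ...}
    return payload["current"]["temperature_2m"]
```

Calling `fetch_current_temperature(45.5, -73.57)` would query a point near Montréal; the coordinates are only an example.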
The pain points
The AB13X USB audio adapter cost me a fair bit of time. I expected to plug in a USB mic and have it work, but the AB13X only records at 48kHz stereo, and Whisper needs 16kHz mono. Playback had the same problem: aplay was speeding up the audio because the sample rate didn’t match. The fix runs through ffmpeg in both directions, which works well now, but a better adapter that supports multiple formats from the start would have saved the detour.
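The two conversions can be sketched as a pair of ffmpeg invocations, shown here in Python for illustration (the series itself drives ffmpeg from .NET). The `-ar` and `-ac` options set the sample rate and channel count; the function and file names are hypothetical.

```python
import subprocess


def whisper_input_args(src: str, dst: str) -> list[str]:
    """Downsample a 48 kHz stereo recording to the 16 kHz mono Whisper expects."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]


def playback_args(src: str, dst: str) -> list[str]:
    """Convert TTS output to 48 kHz stereo so the adapter plays it at normal speed."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "48000", "-ac", "2", dst]


def convert(args: list[str]) -> None:
    """Run ffmpeg and raise if it exits with an error."""
    subprocess.run(args, check=True)
```

For example, `convert(whisper_input_args("recording.wav", "whisper.wav"))` before transcription, and `convert(playback_args("tts.wav", "speaker.wav"))` before handing the file to aplay.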
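The GPIO button also caught me off guard: a 4-pin tactile button has its pins in two pairs, and the two pins of each pair are permanently connected inside the switch. If you wire both leads to the same pair, pressing the button changes nothing. You need to wire across opposite sides (two diagonal pins always work). I lost a solid hour before figuring that out.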
Transcription quality is the most visible limitation. Whisper base handles quiet environments fine, but with background noise or a normal speaking pace in Québécois French, the results get creative:
- “Comment tu t’appelles?” → “Comment je te pète?”
- “Quel est ton nom?” → “Quelle est-on non?”
And llama3.2:1b in French is imaginative. When I asked how to dress for cool weather, it responded with “la crinière devra s’abuffer”. The Claude API fixes this, at the cost of a cloud dependency.
If I were starting over
I’d start with Whisper small instead of base: the quality difference is worth the added latency. I’d pick an audio adapter that supports mono and multiple sample rates from the start, which would eliminate the ffmpeg complexity. And I’d validate GPIO with a standalone Python script before integrating into .NET.
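A standalone check along those lines might look like the sketch below. It assumes the gpiozero library, a button wired between GPIO 17 and ground, and the internal pull-up; the pin number and function names are placeholders, not the wiring used in the series.

```python
import time


def count_presses(samples):
    """Count released-to-pressed transitions in a sequence of booleans."""
    presses = 0
    previous = False
    for pressed in samples:
        if pressed and not previous:
            presses += 1
        previous = pressed
    return presses


def sample_button(pin=17, seconds=5.0, interval=0.02):
    """Poll the button for a few seconds and return the sampled states."""
    from gpiozero import Button  # imported lazily: only installed on the Pi

    button = Button(pin, pull_up=True)
    samples = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        samples.append(button.is_pressed)
        time.sleep(interval)
    button.close()
    return samples
```

On the Pi, run `samples = sample_button()` while pressing the button a few times, then compare `count_presses(samples)` with the number of presses. If every sample is `True`, or the count stays at zero, suspect the wiring before suspecting the .NET code.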
v2 ideas
Wake word. Replacing the button with “Hey Alex” would be the most impactful change. The openWakeWord library runs on Pi ARM64 and is free. That’s a project on its own.
4-inch status display. I have a small HDMI display sitting around. Showing “Listening…”, “Processing…”, “Responding…” through a minimal ASP.NET Core interface on the Pi Client would be simple and useful.
API transcription. Sending the WAV to OpenAI’s Whisper API instead of transcribing locally would give much better quality and lower latency, at about $0.006 per minute, which is negligible for personal use.
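To put that rate in perspective, here is a back-of-the-envelope estimate in Python. The usage figures are invented for illustration, and the calculation ignores any rounding the provider may apply per request.

```python
PRICE_PER_MINUTE_USD = 0.006  # rate quoted above


def monthly_transcription_cost(requests_per_day, seconds_per_request, days=30):
    """Estimate the monthly cost in USD from total audio minutes."""
    minutes = requests_per_day * seconds_per_request * days / 60
    return minutes * PRICE_PER_MINUTE_USD
```

With a hypothetical 50 requests a day of 6 seconds each, that is 150 minutes of audio, or roughly $0.90 per month.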
Persistent memory. Conversation history disappears on restart. SQLite via EF Core would fix that cleanly.
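As a rough idea of how small that change is, here is a sketch using Python's built-in sqlite3 module rather than EF Core. The table layout and function names are assumptions for the example.

```python
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) the history database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "role TEXT NOT NULL, "
        "content TEXT NOT NULL)"
    )
    return conn


def add_message(conn, role, content):
    """Append one turn of the conversation and commit it."""
    with conn:
        conn.execute(
            "INSERT INTO messages (role, content) VALUES (?, ?)",
            (role, content),
        )


def last_messages(conn, limit=10):
    """Return the most recent turns, oldest first, ready to send to the LLM."""
    rows = conn.execute(
        "SELECT role, content FROM messages ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return list(reversed(rows))
```

Passing a file path such as `open_history("history.db")` instead of the in-memory default is what makes the history survive a restart.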
Home Assistant. A HomeAssistantTool for lights, sensors, occupancy: a few dozen lines with the REST API.
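A minimal sketch of the light part, in Python for illustration: Home Assistant's REST API accepts a POST to `/api/services/light/turn_on` with a long-lived access token as a bearer token. The base URL, token, entity ID, and function name below are placeholders.

```python
import json
from urllib.request import Request


def build_light_request(base_url, token, entity_id, turn_on=True):
    """Build the service call that switches a light on or off."""
    service = "turn_on" if turn_on else "turn_off"
    return Request(
        url=f"{base_url.rstrip('/')}/api/services/light/{service}",
        data=json.dumps({"entity_id": entity_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is one more call: `urllib.request.urlopen(build_light_request("http://homeassistant.local:8123", token, "light.kitchen"))`.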
Google Calendar. A CalendarTool for today’s events. “What do I have today?” would become a genuinely useful question.
Series recap
| Article | Topic | Estimated effort |
|---|---|---|
| #1 | Setting up both Pis | ~2h |
| #2 | .NET 10 Worker Service + audio pipeline | ~3h |
| #3 | Ollama integration | ~1h |
| #4 | Memory, silence detection, systemd | ~2h |
| #5 | Real-time weather + Claude API swap | ~2h |
| #6 | Function calling + extensible tools | ~3h |
Two evenings of work for a complete voice assistant, in .NET, on hardware under $150.
The complete source code for this series is available on GitHub.
Articles in this series
- Setting Up Both Raspberry Pis
- .NET 10 Worker Service and Audio Pipeline
- Ollama Integration and Home Context
- Memory, Silence Detection, and systemd
- Real-Time Weather and Swapping to the Claude API
- Function Calling: Teaching Tools to the Assistant
- Retrospective, Lessons Learned, and v2 Roadmap (this article)
The biggest surprise was how well .NET runs on Pi ARM64, and that a local LLM on a sub-$100 board is already usable. The weak link is Whisper base: fix that, and the assistant holds up for daily use.
This post was written with AI assistance and edited by me.