Retrospective, Lessons Learned, and v2 Roadmap

This post is part of the Voice Assistant on Raspberry Pi series.

Six articles and two evenings later, we have a French-language voice assistant running entirely on two Raspberry Pi 4s.

What we built

Final architecture:

┌─────────────────────────────────────────────────────┐
│                   Pi Client (2GB)                   │
│  GPIO Button → arecord → ffmpeg → Whisper.net       │
│  .NET 10 Worker Service                             │
│  Piper TTS → ffmpeg → aplay                         │
└──────────────────────┬──────────────────────────────┘
                       │ HTTP (local network)
┌──────────────────────▼──────────────────────────────┐
│                   Pi Brain (4GB)                    │
│  Ollama + llama3.2:1b                               │
└─────────────────────────────────────────────────────┘

Cloud mode: Pi Client → Claude API → Function calling → local tools

Full stack:

| Component      | Technology                     | Decision                        |
|----------------|--------------------------------|---------------------------------|
| Runtime        | .NET 10 Worker Service         | C# on Pi, it works              |
| Speech-to-text | Whisper.net (base model)       | Free, local, French supported   |
| Local LLM      | Ollama + llama3.2:1b           | Fast on 4GB RAM                 |
| Cloud LLM      | Claude API (Sonnet)            | Better French quality           |
| Text-to-speech | Piper TTS                      | Lightweight, decent French voice |
| Audio          | arecord + ffmpeg + aplay       | 48kHz stereo conversion needed  |
| GPIO           | System.Device.Gpio             | Simple, well documented         |
| Weather        | Open-Meteo API                 | Free, no key required           |
| Tools          | Anthropic.SDK function calling | Extensible pattern              |
| Auto-start     | systemd                        | Boots automatically             |

What went well

Honestly, .NET on Pi ARM64 surprised me. I expected friction (a missing runtime, something that wouldn’t compile), but none of that. One command to install, and the Worker Service runs cleanly. The DI container gives the project a structure I’d have a harder time achieving as cleanly in Python.

Splitting the two Pis was also the right call. The Pi Client handles audio and transcription, the Pi Brain handles the LLM. That separation gave us the flexibility to swap components without touching the rest, which is exactly what we did in article #5 when we switched from Ollama to Claude API.

The ILlmService abstraction seemed like overkill early on, but it’s what made that swap a single line in appsettings.json. The kind of decision you make “just in case” and never regret.

Open-Meteo also deserves a mention: a free, key-free, registration-free weather API. Hard to find that combination.

The pain points

The AB13X USB audio adapter cost me a fair bit of time. I expected to plug in a USB mic and have it work, but the AB13X only records at 48kHz stereo, and Whisper needs 16kHz mono. Playback had the same problem: aplay was speeding up the audio because the sample rate didn’t match. The fix runs through ffmpeg in both directions, which works well now, but a better adapter that supports multiple formats from the start would have saved the detour.
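As a rough illustration of the two conversions, here is a Python sketch that only builds the ffmpeg command lines. The series itself drives ffmpeg from the .NET Worker Service, so this is not the project's code. The file names are hypothetical, and the 48 kHz stereo playback target is an assumption based on what the adapter records.

```python
import shlex


def to_whisper_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command: captured audio -> 16 kHz mono 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", src,
            "-ac", "1",            # one channel (mono)
            "-ar", "16000",        # 16 kHz sample rate
            "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
            dst]


def to_playback_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command: TTS output -> 48 kHz stereo (assumed adapter format)."""
    return ["ffmpeg", "-y", "-i", src,
            "-ac", "2",            # two channels (stereo)
            "-ar", "48000",        # 48 kHz sample rate
            "-c:a", "pcm_s16le",
            dst]


if __name__ == "__main__":
    # Hypothetical file names, printed as copy-pasteable shell commands.
    print(shlex.join(to_whisper_cmd("capture.wav", "whisper.wav")))
    print(shlex.join(to_playback_cmd("piper.wav", "playback.wav")))
```

Passing either list to subprocess.run(cmd, check=True) would perform the conversion on a machine with ffmpeg installed.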

The GPIO button also caught me off guard: a 4-pin tactile button has its pins in two pairs, and if you wire both leads to the same pair, nothing happens. You need to wire across opposite sides. I lost a solid hour before figuring that out.

Transcription quality is the most visible limitation. Whisper base handles quiet environments fine, but with background noise or a normal speaking pace in Québécois French, the results get creative:

  • “Comment tu t’appelles?” (“What’s your name?”) → “Comment je te pète?”
  • “Quel est ton nom?” (“What is your name?”) → “Quelle est-on non?”

And llama3.2:1b in French is imaginative. When I asked how to dress for cool weather, it responded with “la crinière devra s’abuffer”. Claude API fixes this, at the cost of a cloud dependency.

If I were starting over

I’d start with Whisper small instead of base; the quality difference is worth the extra latency. I’d pick an audio adapter that supports mono and multiple sample rates from the start, which would eliminate the ffmpeg complexity. And I’d validate GPIO with a standalone Python script before integrating into .NET.
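A standalone check of that kind could look like the sketch below, using the gpiozero library. The pin number is hypothetical, and the script assumes the button connects the pin to ground when pressed (hence the internal pull-up). If nothing prints when you press the button, the wiring is the first suspect.

```python
import signal

BUTTON_PIN = 17  # hypothetical BCM pin number; use the pin your button is wired to


def format_event(pin: int, pressed: bool) -> str:
    """Return a one-line description of a button state change."""
    state = "pressed" if pressed else "released"
    return f"GPIO{pin} {state}"


def main() -> None:
    try:
        # Imported lazily so the file still loads on a machine without gpiozero.
        from gpiozero import Button
    except ImportError:
        print("gpiozero is not installed; run this on the Pi")
        return

    # pull_up=True assumes the button shorts the pin to GND when pressed.
    button = Button(BUTTON_PIN, pull_up=True, bounce_time=0.05)
    button.when_pressed = lambda: print(format_event(BUTTON_PIN, True))
    button.when_released = lambda: print(format_event(BUTTON_PIN, False))

    print(f"Watching GPIO{BUTTON_PIN}; press the button (Ctrl+C to quit)")
    signal.pause()  # block here; gpiozero fires the callbacks in the background


if __name__ == "__main__":
    main()
```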

v2 ideas

Wake word. Replacing the button with “Hey Alex” would be the most impactful change. The openWakeWord library runs on Pi ARM64 and is free. That’s a project on its own.

4-inch status display. I have a small HDMI display sitting around. Showing “Listening…”, “Processing…”, “Responding…” through a minimal ASP.NET Core interface on the Pi Client would be simple and useful.

API transcription. Sending the WAV to OpenAI’s Whisper API instead of transcribing locally would give much better quality and lower latency. At about $0.006 per minute of audio, the cost stays small for personal use.
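To put that rate in perspective, here is a back-of-the-envelope estimate in Python. The usage figures (30 requests a day, 5 seconds each) are assumptions for illustration, not measurements, and the calculation ignores any billing granularity the provider applies.

```python
PRICE_PER_MINUTE_USD = 0.006  # rate quoted above


def monthly_cost(requests_per_day: int, seconds_per_request: float, days: int = 30) -> float:
    """Estimated transcription cost in USD over `days` days."""
    minutes = requests_per_day * seconds_per_request * days / 60
    return minutes * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    # Assumed usage: 30 requests per day, 5 seconds of audio each.
    print(f"${monthly_cost(30, 5):.2f} per month")  # prints "$0.45 per month"
```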

Persistent memory. Conversation history disappears on restart. SQLite via EF Core would fix that cleanly.

Home Assistant. A HomeAssistantTool for lights, sensors, occupancy: a few dozen lines with the REST API.
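The call such a tool would wrap is a single authenticated POST to Home Assistant's service endpoint. The Python sketch below only builds the request so the shape is visible; the host name, token, and entity ID are placeholders, and a real HomeAssistantTool in this series would be written in C#.

```python
import json
import urllib.request


def build_service_call(base_url: str, token: str, domain: str,
                       service: str, entity_id: str) -> urllib.request.Request:
    """Build a POST to /api/services/<domain>/<service> for one entity."""
    url = f"{base_url.rstrip('/')}/api/services/{domain}/{service}"
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; replace with your own instance, token, and entity.
    req = build_service_call("http://homeassistant.local:8123",
                             "YOUR_LONG_LIVED_TOKEN",
                             "light", "turn_on", "light.living_room")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=5)
```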

Google Calendar. A CalendarTool for today’s events. “What do I have today?” would become a genuinely useful question.

Series recap

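| Article | Topic                                   | Estimated effort |
|---------|-----------------------------------------|------------------|
| #1      | Setting up both Pis                     | ~2h              |
| #2      | .NET 10 Worker Service + audio pipeline | ~3h              |
| #3      | Ollama integration                      | ~1h              |
| #4      | Memory, silence detection, systemd      | ~2h              |
| #5      | Real-time weather + Claude API swap     | ~2h              |
| #6      | Function calling + extensible tools     | ~3h              |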

Two evenings of work for a complete voice assistant, in .NET, on hardware under $150.

The complete source code for this series is available on GitHub.

Articles in this series

  1. Setting Up Both Raspberry Pis
  2. .NET 10 Worker Service and Audio Pipeline
  3. Ollama Integration and Home Context
  4. Memory, Silence Detection, and systemd
  5. Real-Time Weather and Swapping to the Claude API
  6. Function Calling: Teaching Tools to the Assistant
  7. Retrospective, Lessons Learned, and v2 Roadmap (this article)

The biggest surprise was how well .NET runs on Pi ARM64, and that a local LLM on a sub-$100 board is already usable. The weak link is Whisper base: fix that, and the assistant holds up for daily use.


This post was written with AI assistance and edited by me.

