Memory, Silence Detection, and systemd

This post is part of the Voice Assistant on Raspberry Pi series.

The assistant from article #3 works, but every exchange starts from scratch, recordings always run for a fixed duration, and you have to launch it by hand. This article fixes all three: conversational memory, automatic silence detection, and auto-start at boot with systemd.

The complete code for this article is available on GitHub.

Part 1: Conversational memory

How it works

Ollama supports a chat format with message history, the same as the OpenAI or Claude API. Instead of a single prompt, you send a list of messages, each with a role and a content field. Ollama keeps no state between requests: the client resends the whole history on every turn, and the model uses it to generate a response that is coherent with the previous turns.

Turn 1: [system] + [user: "What's your name?"]
Turn 2: [system] + [user: "What's your name?"] + [assistant: "My name is Alex."] + [user: "How old are you?"]
Turn 3: ...
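
To make the turn structure concrete, here is a minimal sketch of the request body for turn 2. The model name and message texts are only illustrative, and tuples stand in for ConversationMessage so that the snippet runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// History after one completed exchange, plus the new user question (illustrative texts)
var turns = new List<(string Role, string Content)>
{
    ("system", "You are a voice assistant named Alex."),
    ("user", "What's your name?"),
    ("assistant", "My name is Alex."),
    ("user", "How old are you?")
};

// Same anonymous-object shape that gets posted to /api/chat
var payload = new
{
    model = "llama3.2:1b",
    messages = turns.Select(m => new { role = m.Role, content = m.Content }),
    stream = false
};

// Indented only to make the structure easy to read
string json = JsonSerializer.Serialize(payload, new JsonSerializerOptions { WriteIndented = true });
Console.WriteLine(json);
```

The whole list goes out on every request, which is why the payload grows by two messages per exchange.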

Step 1.1: Update OllamaService

Switch from /api/generate to /api/chat, the Ollama endpoint that supports message history. Replace Services/OllamaService.cs:

using System.Net.Http.Json;
using Microsoft.Extensions.Options;

namespace AudioAssistant.Services;

public interface ILlmService
{
    Task<string> ChatAsync(List<ConversationMessage> history, CancellationToken cancellationToken);
}

public record ConversationMessage(string Role, string Content);

public class OllamaService : ILlmService
{
    private readonly HttpClient _http;
    private readonly AssistantOptions _options;
    private readonly ILogger<OllamaService> _logger;

    public OllamaService(HttpClient http, IOptions<AssistantOptions> options, ILogger<OllamaService> logger)
    {
        _http = http;
        _options = options.Value;
        _logger = logger;
    }

    public async Task<string> ChatAsync(List<ConversationMessage> history, CancellationToken cancellationToken)
    {
        _logger.LogInformation("Sending to LLM ({Count} messages)...", history.Count);

        var request = new
        {
            model = _options.OllamaModel,
            messages = history.Select(m => new { role = m.Role, content = m.Content }),
            stream = false
        };

        var response = await _http.PostAsJsonAsync(
            $"{_options.OllamaBaseUrl}/api/chat",
            request,
            cancellationToken);

        response.EnsureSuccessStatusCode();

        var result = await response.Content.ReadFromJsonAsync<OllamaChatResponse>(
            cancellationToken: cancellationToken);

        var text = result?.Message?.Content?.Trim() ?? "I don't have a response.";
        _logger.LogInformation("LLM response: \"{Text}\"", text);
        return text;
    }
}

internal record OllamaChatMessage(string Role, string Content);
internal record OllamaChatResponse(OllamaChatMessage Message);

Step 1.2: Update ContextService

ContextService becomes a conversation manager that maintains history. Replace Services/ContextService.cs:

using Microsoft.Extensions.Options;

namespace AudioAssistant.Services;

public interface IContextService
{
    List<ConversationMessage> AddUserMessage(string userInput);
    void AddAssistantMessage(string response);
    void Reset();
}

public class ContextService : IContextService
{
    private readonly AssistantOptions _options;
    private readonly List<ConversationMessage> _history = new();
    private readonly ILogger<ContextService> _logger;

    public ContextService(IOptions<AssistantOptions> options, ILogger<ContextService> logger)
    {
        _options = options.Value;
        _logger = logger;
        _history.Add(new ConversationMessage("system", _options.SystemPrompt));
    }

    public List<ConversationMessage> AddUserMessage(string userInput)
    {
        _history.Add(new ConversationMessage("user", userInput));
        _logger.LogInformation("History: {Count} messages", _history.Count);
        return _history;
    }

    public void AddAssistantMessage(string response)
    {
        _history.Add(new ConversationMessage("assistant", response));
    }

    public void Reset()
    {
        _history.Clear();
        _history.Add(new ConversationMessage("system", _options.SystemPrompt));
        _logger.LogInformation("History reset.");
    }
}

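History grows with every exchange, and every request resends all of it, so long conversations get slower and can eventually overflow the model's context window. If responses become incoherent after several turns, call Reset() manually, or enforce MaxConversationTurns in AddAssistantMessage. The code in this article only declares MaxConversationTurns; nothing trims the history until you add that check.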
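
One possible way to enforce the limit is sketched below. TrimHistory is a hypothetical helper, not part of the project, and it assumes that one turn means one user message plus one assistant reply. Tuples stand in for ConversationMessage so that the snippet runs on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps the system prompt plus the most recent turns.
// Assumes index 0 is the system prompt and each turn is 2 messages (user + assistant).
static void TrimHistory(List<(string Role, string Content)> messages, int maxTurns)
{
    int maxMessages = 1 + maxTurns * 2;
    int excess = messages.Count - maxMessages;
    if (excess > 0)
    {
        // Drop the oldest messages that follow the system prompt
        messages.RemoveRange(1, excess);
    }
}

// Simulate five exchanges with a limit of two turns
var log = new List<(string Role, string Content)> { ("system", "prompt") };
for (int turn = 1; turn <= 5; turn++)
{
    log.Add(("user", $"question {turn}"));
    log.Add(("assistant", $"answer {turn}"));
    TrimHistory(log, maxTurns: 2);   // trim right after the assistant reply
}

foreach (var (role, content) in log)
    Console.WriteLine($"{role}: {content}");
```

Trimming right after the assistant reply keeps user and assistant messages paired, so the remaining history still starts with a user message after the system prompt.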

Step 1.3: Add MaxConversationTurns to AssistantOptions

namespace AudioAssistant;

public class AssistantOptions
{
    public int GpioButtonPin { get; set; } = 17;
    public string AudioDevice { get; set; } = "hw:3,0";
    public int RecordingDurationSeconds { get; set; } = 10;
    public string WhisperModel { get; set; } = "ggml-base.bin";
    public string PiperBinary { get; set; } = "/home/gabriel/piper/piper/piper";
    public string PiperVoice { get; set; } = "/home/gabriel/piper-voices/fr_FR-siwis-low.onnx";
    public string AudioOutputDevice { get; set; } = "hw:3,0";
    public string OllamaBaseUrl { get; set; } = "http://pi-cerveau.local:11434";
    public string OllamaModel { get; set; } = "llama3.2:3b";
    public string SystemPrompt { get; set; } = "";
    public int MaxConversationTurns { get; set; } = 10;
}

Step 1.4: Update Worker.cs

using AudioAssistant.Services;

namespace AudioAssistant;

public class Worker : BackgroundService
{
    private readonly IGpioService _gpio;
    private readonly IAudioRecorderService _recorder;
    private readonly ITranscriptionService _transcription;
    private readonly ILlmService _llm;
    private readonly IContextService _context;
    private readonly ISpeechService _speech;
    private readonly ILogger<Worker> _logger;

    public Worker(
        IGpioService gpio,
        IAudioRecorderService recorder,
        ITranscriptionService transcription,
        ILlmService llm,
        IContextService context,
        ISpeechService speech,
        ILogger<Worker> logger)
    {
        _gpio = gpio;
        _recorder = recorder;
        _transcription = transcription;
        _llm = llm;
        _context = context;
        _speech = speech;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        _logger.LogInformation("Assistant started. Press the button to speak.");

        while (!stoppingToken.IsCancellationRequested)
        {
            _gpio.WaitForButtonPress(stoppingToken);
            if (stoppingToken.IsCancellationRequested) break;

            string? audioFile = null;
            try
            {
                audioFile = await _recorder.RecordAsync(stoppingToken);
                var text = await _transcription.TranscribeAsync(audioFile, stoppingToken);

                if (string.IsNullOrWhiteSpace(text))
                {
                    await _speech.SpeakAsync("Je n'ai pas bien entendu. Peux-tu répéter?", stoppingToken);
                }
                else
                {
                    // Append the user turn, send the full history, then store the reply
                    var history = _context.AddUserMessage(text);
                    var response = await _llm.ChatAsync(history, stoppingToken);
                    _context.AddAssistantMessage(response);
                    await _speech.SpeakAsync(response, stoppingToken);
                }
            }
            catch (Exception ex) when (!stoppingToken.IsCancellationRequested)
            {
                _logger.LogError(ex, "Error in pipeline");
                await _speech.SpeakAsync("Une erreur s'est produite.", stoppingToken);
            }
            finally
            {
                // Delete the recording even when a pipeline step throws
                if (audioFile is not null && File.Exists(audioFile))
                    File.Delete(audioFile);
            }
        }
    }
}

Step 1.5: Update PiperSpeechService

The direct pipe from Piper to aplay caused a rate mismatch with the USB adapter. The fix is to use intermediate files: Piper generates a 22050 Hz WAV, ffmpeg resamples to 48000 Hz stereo, and aplay plays the result. The finally block ensures temp files are always cleaned up.

Replace Services/PiperSpeechService.cs:

using System.Diagnostics;
using Microsoft.Extensions.Options;

namespace AudioAssistant.Services;

public interface ISpeechService
{
    Task SpeakAsync(string text, CancellationToken cancellationToken);
}

public class PiperSpeechService : ISpeechService
{
    private readonly AssistantOptions _options;
    private readonly ILogger<PiperSpeechService> _logger;

    public PiperSpeechService(IOptions<AssistantOptions> options, ILogger<PiperSpeechService> logger)
    {
        _options = options.Value;
        _logger = logger;
    }

    public async Task SpeakAsync(string text, CancellationToken cancellationToken)
    {
        _logger.LogInformation("Speaking: \"{Text}\"", text);

        var piperFile = Path.Combine(Path.GetTempPath(), $"tts_{Guid.NewGuid()}.wav");
        var resampledFile = Path.Combine(Path.GetTempPath(), $"tts_resampled_{Guid.NewGuid()}.wav");

        try
        {
            // 1. Piper generates a 22050 Hz mono WAV
            var piperPsi = new ProcessStartInfo
            {
                FileName = _options.PiperBinary,
                Arguments = $"--model {_options.PiperVoice} --output_file {piperFile}",
                RedirectStandardInput = true,
                UseShellExecute = false
            };

            using var piper = Process.Start(piperPsi)!;
            await piper.StandardInput.WriteLineAsync(text);
            piper.StandardInput.Close();
            await piper.WaitForExitAsync(cancellationToken);

            // 2. ffmpeg resamples to 48000 Hz stereo for the USB adapter
            var ffmpegPsi = new ProcessStartInfo
            {
                FileName = "ffmpeg",
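                // -loglevel error keeps stderr nearly empty: the redirected pipe is never read,
                // so verbose output could fill it and stall ffmpeg
                Arguments = $"-hide_banner -loglevel error -y -i {piperFile} -ar 48000 -ac 2 {resampledFile}",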
                RedirectStandardError = true,
                UseShellExecute = false
            };

            using var ffmpeg = Process.Start(ffmpegPsi)!;
            await ffmpeg.WaitForExitAsync(cancellationToken);

            // 3. aplay plays the resampled file
            var aplayPsi = new ProcessStartInfo
            {
                FileName = "aplay",
                Arguments = $"-D {_options.AudioOutputDevice} {resampledFile}",
                UseShellExecute = false
            };

            using var aplay = Process.Start(aplayPsi)!;
            await aplay.WaitForExitAsync(cancellationToken);
        }
        finally
        {
            if (File.Exists(piperFile)) File.Delete(piperFile);
            if (File.Exists(resampledFile)) File.Delete(resampledFile);
        }
    }
}
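
The commands above are built by string interpolation, which breaks if a path ever contains a space. A minimal sketch of an alternative uses ProcessStartInfo.ArgumentList, which passes each entry as one argument without manual quoting. BuildResampleCommand is a hypothetical helper and the paths are made up. Nothing is executed here.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: builds the ffmpeg resample command with one ArgumentList entry per argument
static ProcessStartInfo BuildResampleCommand(string input, string output)
{
    var command = new ProcessStartInfo { FileName = "ffmpeg", UseShellExecute = false };
    foreach (var arg in new[] { "-hide_banner", "-loglevel", "error", "-y", "-i", input, "-ar", "48000", "-ac", "2", output })
    {
        command.ArgumentList.Add(arg);
    }
    return command;
}

// Paths with a space survive intact because no command line is parsed
var resample = BuildResampleCommand("/tmp/my voice/tts.wav", "/tmp/my voice/tts_48k.wav");
Console.WriteLine(string.Join(" | ", resample.ArgumentList));
```

Arguments and ArgumentList cannot both be set on the same ProcessStartInfo, so a service would use one style per command.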

Testing memory

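Run dotnet run and try this sequence. Because the system prompt already contains the name Alex, also tell the assistant something it cannot know otherwise (your own first name, for example) and ask for it on the next turn: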

You  : "What's your name?"
Alex : "My name is Alex."

You  : "Say your name again."
Alex : "My name is Alex." ← it remembers!

Part 2: Automatic silence detection

The fixed 10-second timer means you have to wait it out even for short questions. Silence detection cuts the recording as soon as you stop speaking.

Step 2.1: New parameters in AssistantOptions

public int SilenceDurationMs { get; set; } = 1500;    // 1.5s of silence to stop
public string SilenceThreshold { get; set; } = "-40dB"; // Detection threshold

Step 2.2: Update AudioRecorderService

Replace the fixed timer with ffmpeg’s silencedetect filter. Replace Services/AudioRecorderService.cs:

using System.Diagnostics;
using Microsoft.Extensions.Options;

namespace AudioAssistant.Services;

public interface IAudioRecorderService
{
    Task<string> RecordAsync(CancellationToken cancellationToken);
}

public class AudioRecorderService : IAudioRecorderService
{
    private readonly AssistantOptions _options;
    private readonly ILogger<AudioRecorderService> _logger;

    public AudioRecorderService(IOptions<AssistantOptions> options, ILogger<AudioRecorderService> logger)
    {
        _options = options.Value;
        _logger = logger;
    }

    public async Task<string> RecordAsync(CancellationToken cancellationToken)
    {
        var rawFile = Path.Combine(Path.GetTempPath(), $"audio_raw_{Guid.NewGuid()}.wav");
        var outputFile = Path.Combine(Path.GetTempPath(), $"audio_{Guid.NewGuid()}.wav");

        _logger.LogInformation("Recording started (auto-silence)...");

        // ffmpeg records from ALSA and stops after SilenceDurationMs of silence
        // RecordingDurationSeconds is the safety max duration
        var ffmpegPsi = new ProcessStartInfo
        {
            FileName = "ffmpeg",
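            Arguments = string.Join(" ",
                "-f alsa",
                $"-i {_options.AudioDevice}",
                "-af",
                // Format the duration with the invariant culture: a French locale would print "1,5",
                // and ffmpeg reads the comma as a filter separator
                $"silencedetect=noise={_options.SilenceThreshold}:d={(_options.SilenceDurationMs / 1000.0).ToString(System.Globalization.CultureInfo.InvariantCulture)}",
                $"-t {_options.RecordingDurationSeconds}",
                "-ar 16000 -ac 1",
                $"-y {rawFile}"),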
            RedirectStandardError = true,
            RedirectStandardInput = true,
            UseShellExecute = false
        };

        using var ffmpegProcess = Process.Start(ffmpegPsi)!;

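        // Read stderr to detect the end of speech.
        // silencedetect logs "silence_start" once the input has been quiet for d seconds;
        // "silence_end" only appears when sound resumes, so it cannot signal that the user stopped.
        // Caveat: a pause longer than d right after the button press also triggers the stop.
        _ = Task.Run(async () =>
        {
            string? line;
            while ((line = await ffmpegProcess.StandardError.ReadLineAsync()) != null)
            {
                if (line.Contains("silence_start"))
                {
                    _logger.LogInformation("Silence detected, stopping recording.");
                    // Write 'q' rather than Kill() so ffmpeg finishes writing the file
                    ffmpegProcess.StandardInput.Write("q");
                    break;
                }
            }
        }, cancellationToken);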

        await ffmpegProcess.WaitForExitAsync(cancellationToken);

        var convertPsi = new ProcessStartInfo
        {
            FileName = "ffmpeg",
            // -loglevel error keeps the unread stderr pipe from filling up
            Arguments = $"-hide_banner -loglevel error -y -i {rawFile} -ar 16000 -ac 1 {outputFile}",
            RedirectStandardError = true,
            UseShellExecute = false
        };

        using var convertProcess = Process.Start(convertPsi)!;
        await convertProcess.WaitForExitAsync(cancellationToken);

        if (File.Exists(rawFile))
            File.Delete(rawFile);

        _logger.LogInformation("Recording done: {File}", outputFile);
        return outputFile;
    }
}
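
The silencedetect filter prints a timestamp with each silence_start line, and that timestamp can separate the pause before the user speaks from the silence that follows speech. The sketch below shows one way to do it. IsEndOfSpeech and its 0.5-second cutoff are assumptions for illustration, and the sample lines only imitate the shape of ffmpeg's stderr output.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: true when a log line reports silence that began after the start of the recording.
// leadingSilenceCutoff is an assumed value; silence starting before it is treated as the pre-speech pause.
static bool IsEndOfSpeech(string logLine, double leadingSilenceCutoff = 0.5)
{
    const string marker = "silence_start:";
    int idx = logLine.IndexOf(marker, StringComparison.Ordinal);
    if (idx < 0) return false;

    string value = logLine[(idx + marker.Length)..].Trim();
    // Invariant culture: the timestamp uses '.' as decimal separator whatever the system locale
    if (!double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out double start))
        return false;

    return start >= leadingSilenceCutoff;
}

// Sample lines (addresses and times are made up)
string[] samples =
{
    "[silencedetect @ 0x55d0c8a3e2c0] silence_start: 0",
    "[silencedetect @ 0x55d0c8a3e2c0] silence_end: 2.4 | silence_duration: 2.4",
    "[silencedetect @ 0x55d0c8a3e2c0] silence_start: 5.13"
};

foreach (var sample in samples)
    Console.WriteLine($"{IsEndOfSpeech(sample)}  <-  {sample}");
```

With a check like this, the recorder could keep listening through an initial pause and only send 'q' to ffmpeg once speech has been followed by silence.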

Update appsettings.json

{
  "Assistant": {
    "GpioButtonPin": 17,
    "AudioDevice": "hw:3,0",
    "RecordingDurationSeconds": 15,
    "SilenceDurationMs": 1500,
    "SilenceThreshold": "-40dB",
    "WhisperModel": "ggml-base.bin",
    "PiperBinary": "/home/gabriel/piper/piper/piper",
    "PiperVoice": "/home/gabriel/piper-voices/fr_FR-siwis-low.onnx",
    "AudioOutputDevice": "hw:3,0",
    "OllamaBaseUrl": "http://pi-cerveau.local:11434",
    "OllamaModel": "llama3.2:1b",
    "MaxConversationTurns": 10,
    "SystemPrompt": "You are a personal voice assistant named Alex. You help the Mongeon family in Blainville, Quebec. Always respond in French, naturally and concisely. Keep answers to 1-3 sentences — they will be read aloud."
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information"
    }
  }
}

RecordingDurationSeconds is now a safety cap. If silence is never detected, recording stops after that limit. Set it to 15-20 seconds.

On a 4 GB Pi 4 with around 800 MB of free memory, llama3.2:3b can exceed the 60-second timeout. llama3.2:1b brings latency down to 10-15 seconds, which is acceptable for short answers, so appsettings.json overrides the llama3.2:3b default from AssistantOptions.

Also bump the HTTP client timeout in Program.cs:

builder.Services.AddHttpClient<ILlmService, OllamaService>(client =>
{
    client.Timeout = TimeSpan.FromSeconds(120);
});
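
HttpClient reports an expired Timeout as a TaskCanceledException, the same exception family as a shutdown cancellation. The sketch below shows how the two cases can be told apart by checking the stopping token. Task.Delay stands in for a slow LLM call, and the 200 ms budget is only there to keep the demo short.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Stands in for the host's stoppingToken (never cancelled in this demo)
using var stopping = new CancellationTokenSource();

// Per-call time budget linked to shutdown: whichever fires first cancels the call
using var perCall = CancellationTokenSource.CreateLinkedTokenSource(stopping.Token);
perCall.CancelAfter(TimeSpan.FromMilliseconds(200));

bool timedOut = false;
try
{
    // Simulates a call that would need 5 seconds to complete
    await Task.Delay(TimeSpan.FromSeconds(5), perCall.Token);
}
catch (OperationCanceledException) when (!stopping.IsCancellationRequested)
{
    // Cancelled, but not by shutdown: the time budget ran out
    timedOut = true;
}

Console.WriteLine(timedOut ? "Timed out" : "Completed");
```

The exception filter is the same idea as the one in Worker.cs: a timeout is handled as an error, while a real shutdown falls through and ends the loop.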

Part 3: Auto-start at boot

Step 3.1: Publish the binary

cd ~/projects/AudioAssistant
dotnet publish -c Release -r linux-arm64 --self-contained false -o ~/assistant-publish

Step 3.2: Create the systemd unit

sudo nano /etc/systemd/system/assistant.service

[Unit]
Description=Voice Assistant
After=network.target sound.target

[Service]
Type=simple
User=gabriel
WorkingDirectory=/home/gabriel/assistant-publish
ExecStart=/home/gabriel/.dotnet/dotnet /home/gabriel/assistant-publish/AudioAssistant.dll
Restart=always
RestartSec=5
Environment=DOTNET_ROOT=/home/gabriel/.dotnet
Environment=HOME=/home/gabriel

[Install]
WantedBy=multi-user.target

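After=sound.target orders the assistant after the sound target when both are started during boot, which makes it less likely that the service starts before the audio devices exist. It is an ordering hint only and does not wait for a USB adapter that enumerates late. Without it, Piper playback can fail on the first boot.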

Step 3.3: Enable and start

sudo systemctl daemon-reload
sudo systemctl enable assistant
sudo systemctl start assistant
sudo systemctl status assistant

# Stream logs
journalctl -u assistant -f

Update workflow

cd ~/projects/AudioAssistant
dotnet publish -c Release -r linux-arm64 --self-contained false -o ~/assistant-publish
sudo systemctl restart assistant

From here, the assistant starts on its own at boot, remembers the conversation, and stops recording when you stop talking. No more manual dotnet run.