Atom S3R AI Assistant

A local-first voice assistant built around an M5Stack Atom S3R, local speech recognition, local LLM responses, and text-to-speech.

Project Summary

Atom S3R AI Assistant pairs an M5Stack Atom S3R with the M5 Echo Pyramid audio base. The device behaves like a small physical AI appliance: it listens through the microphone array, sends speech to a local backend, receives a generated spoken response, and plays it back through the Echo Pyramid speaker.

The project brings together embedded firmware, local speech recognition, local language models, text-to-speech, Node-RED orchestration, and a reusable backend tools service. The goal is not just a chatbot in a box, but a privacy-conscious assistant that can eventually interact with notes, tasks, home automation, notifications, and practical tools.

Project Pages

This project currently keeps its component notes, architecture, and roadmap on this main page. Dedicated component-list and installation pages can be split out when the hardware and backend setup documentation is ready.

  • GitHub repository: a public link is planned once the repositories have been reviewed for secrets and cleaned up for release.
  • Media and demo videos: planned.

What It Does Today

The current prototype supports a complete local voice loop:

  1. The user speaks into the Atom S3R device.
  2. The device streams microphone audio to a backend over WebSocket.
  3. Whisper transcribes the audio to text.
  4. Node-RED routes the request through the assistant pipeline.
  5. Ollama runs a local Gemma model to generate a concise response.
  6. Kokoro converts the response text into speech.
  7. Node-RED streams the generated PCM audio back to the Atom.
  8. The Atom plays the response through the Echo Pyramid speaker.
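
The backend half of this loop (steps 3 to 7) can be sketched as a chain of three stages followed by PCM framing. This is a minimal illustration, not the project's Node-RED flow: the function names, the stage callables, and the 1024-byte frame size are all assumptions made for the example.

```python
from typing import Callable, Iterator, List


def chunk_pcm(pcm: bytes, frame_bytes: int = 1024) -> Iterator[bytes]:
    """Split raw PCM audio into fixed-size frames; the last frame may be shorter."""
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


def run_voice_turn(
    mic_audio: bytes,
    transcribe: Callable[[bytes], str],   # stands in for Whisper
    generate: Callable[[str], str],       # stands in for Ollama + Gemma
    synthesize: Callable[[str], bytes],   # stands in for Kokoro
    frame_bytes: int = 1024,              # hypothetical frame size
) -> List[bytes]:
    """Run one request through ASR, LLM, and TTS, then frame the audio for streaming."""
    text = transcribe(mic_audio)
    reply = generate(text)
    pcm = synthesize(reply)
    return list(chunk_pcm(pcm, frame_bytes))


if __name__ == "__main__":
    # Fake stages so the sketch runs without any model installed.
    frames = run_voice_turn(
        mic_audio=b"\x00\x01" * 8000,
        transcribe=lambda audio: "what is the time",
        generate=lambda text: "It is noon.",
        synthesize=lambda reply: b"\x00" * 2500,
    )
    print(len(frames), [len(frame) for frame in frames])
```

Passing the stages in as callables keeps each service replaceable, which matches the way the pipeline treats Whisper, Ollama, and Kokoro as separate backend services.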

Current Features

  • Physical voice assistant built on Atom S3R and M5 Echo Pyramid.
  • Microphone audio streaming from the device to a local backend.
  • Local speech recognition using Whisper.
  • Local LLM response generation through Ollama and Gemma.
  • Local text-to-speech generation through Kokoro.
  • WebSocket-based audio streaming in both directions.
  • LED states for recording, thinking, speaking, and idle.
  • On-device display for connectivity and status feedback.
  • Atom-hosted web control panel for assistant settings.
  • Configurable assistant voice, volume, response length, role prompt, and user profile.
  • Backend orchestration through Node-RED.
  • Initial LXC-based FastAPI tools service with a working health endpoint.
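
The four LED states map naturally onto a small state machine. The sketch below models that idea in Python for readability; it is not the firmware, and the event names and colors are hypothetical placeholders rather than the values the device uses.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    THINKING = "thinking"
    SPEAKING = "speaking"


# Hypothetical RGB colors, chosen only for this example.
LED_COLORS = {
    State.IDLE: (0, 0, 0),
    State.RECORDING: (255, 0, 0),
    State.THINKING: (0, 0, 255),
    State.SPEAKING: (0, 255, 0),
}

# Hypothetical event names; each event advances the loop by one stage.
TRANSITIONS = {
    (State.IDLE, "speech_started"): State.RECORDING,
    (State.RECORDING, "speech_ended"): State.THINKING,
    (State.THINKING, "audio_received"): State.SPEAKING,
    (State.SPEAKING, "playback_done"): State.IDLE,
}


def next_state(state: State, event: str) -> State:
    """Return the new state, or stay in the current one if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = State.IDLE
    for event in ["speech_started", "speech_ended", "audio_received", "playback_done"]:
        state = next_state(state, event)
        print(event, "->", state.value, LED_COLORS[state])
```

Ignoring events that do not apply to the current state keeps the LEDs stable if, for example, a late audio frame arrives after playback has already finished.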

Architecture

Atom S3R / Echo Pyramid
-> Node-RED WebSocket receiver
-> Whisper ASR
-> Node-RED router and orchestration flow
-> Local tools / OpenClaw / Ollama as needed
-> Kokoro TTS
-> Node-RED WebSocket sender
-> Atom speaker playback

The architecture keeps the embedded device focused on physical interaction, audio I/O, display, LEDs, and local configuration. The backend handles orchestration, speech recognition, language model calls, speech synthesis, and future tool integrations.
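
As one example of a backend-side call, the language model step could be a single HTTP request to Ollama's generate endpoint. The sketch below assumes Ollama is listening on its default local port and that the model tag is `gemma3`; the source only says "Gemma", so the tag, the helper names, and the wording of the length instruction are assumptions.

```python
import json
import urllib.request

# Ollama's default local address; adjust for the real backend host.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(
    user_text: str,
    role_prompt: str,
    max_sentences: int = 2,
    model: str = "gemma3",  # assumed tag; use whichever Gemma model is installed
) -> dict:
    """Build a non-streaming generate request with a length-limited system prompt."""
    system = f"{role_prompt} Answer in at most {max_sentences} short sentences."
    return {"model": model, "system": system, "prompt": user_text, "stream": False}


def ask_ollama(payload: dict, url: str = OLLAMA_URL, timeout: float = 60.0) -> str:
    """POST the payload to Ollama and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]


if __name__ == "__main__":
    payload = build_generate_request(
        "What should I cook tonight?",
        "You are a friendly desk assistant.",
    )
    print(json.dumps(payload, indent=2))
    # print(ask_ollama(payload))  # requires a running Ollama instance
```

Keeping the role prompt and response length as parameters mirrors the settings exposed on the device's control panel, while the request itself stays on the backend.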

Main Components

  • Atom S3R Firmware: Wi-Fi, WebSocket audio streaming, playback, LEDs, touch input, display, and local configuration.
  • M5 Echo Pyramid Base: microphone input, audio codec, speaker output, RGB LED feedback, and touch input.
  • Node-RED: orchestration and routing for the voice pipeline.
  • Whisper: local speech recognition.
  • Ollama and Gemma: local LLM response generation.
  • Kokoro: local text-to-speech generation.
  • LXC FastAPI Tools Service: reusable backend tools for future integrations.

Services And Technologies Used

Current services and technologies:

  • M5Stack Atom S3R and M5 Echo Pyramid for the physical assistant hardware.
  • Node-RED for backend orchestration.
  • Whisper for local speech-to-text.
  • Ollama with Gemma for local LLM inference.
  • Kokoro for local text-to-speech.
  • FastAPI in an LXC container for reusable backend tools.
  • WebSocket audio streaming between the device and backend.

Planned integrations:

  • Google Calendar and Google Tasks.
  • Obsidian note operations through controlled tools.
  • Home Assistant queries and selected control actions.
  • Weather, topic tracking, and research tools.
  • OpenClaw or MCP-compatible workflows where useful.

Design Principles

  • Keep the physical device simple and reliable.
  • Keep credentials and integrations off the microcontroller.
  • Use local models and local services where possible.
  • Use deterministic tools for exact tasks and let the LLM handle language and phrasing.
  • Add confirmations for risky actions.
  • Make backend tools reusable outside the voice assistant.
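
Two of these principles, deterministic tools for exact tasks and confirmations for risky actions, can be illustrated with a small router. This is a sketch under simple assumptions: keyword matching stands in for real intent detection, and the tool names and replies are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Tool:
    name: str
    keyword: str                 # naive trigger; a real router would classify intent
    run: Callable[[str], str]
    risky: bool = False


def route(text: str, tools: List[Tool], confirmed: bool = False) -> Tuple[str, str]:
    """Return (kind, payload) where kind is 'tool', 'confirm', or 'llm'."""
    lowered = text.lower()
    for tool in tools:
        if tool.keyword in lowered:
            if tool.risky and not confirmed:
                # Risky actions are held until the user explicitly confirms.
                return ("confirm", f"Do you want me to run {tool.name}?")
            return ("tool", tool.run(text))
    # Nothing matched: leave the request to the language model.
    return ("llm", text)


if __name__ == "__main__":
    tools = [
        Tool("add_numbers", "add", lambda _: "2 + 2 = 4"),
        Tool("unlock_door", "unlock", lambda _: "Door unlocked.", risky=True),
    ]
    print(route("Please add two and two", tools))
    print(route("Unlock the front door", tools))
    print(route("Unlock the front door", tools, confirmed=True))
    print(route("Tell me a joke", tools))
```

The router never asks the model to compute or act. The model only sees requests that no tool claims, which keeps exact answers exact and leaves phrasing to the LLM.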

Roadmap

  • Add a currency exchange endpoint to the LXC tools service.
  • Add a Node-RED intent router after speech recognition.
  • Route practical requests to deterministic backend tools.
  • Add Obsidian note append and edit tools.
  • Add Google Tasks and Google Calendar integrations.
  • Add Home Assistant tools with a safety policy.
  • Add a smart notification queue and priority rules.
  • Improve the Atom web control panel with backend settings and tool preferences.
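
The notification queue item has no design yet, so the following is only one possible shape for it: a priority queue where lower numbers are more urgent and equal priorities keep arrival order. The class name and the priority scale are hypothetical.

```python
import heapq
from itertools import count
from typing import List, Optional, Tuple


class NotificationQueue:
    """Priority queue for spoken notifications; lower priority number = more urgent."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._order = count()  # tie-breaker that preserves arrival order

    def push(self, message: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), message))

    def pop(self) -> Optional[str]:
        """Return the most urgent message, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = NotificationQueue()
    queue.push("Newsletter arrived", priority=5)
    queue.push("Smoke alarm triggered", priority=0)
    queue.push("Package delivered", priority=5)
    while len(queue):
        print(queue.pop())
```

The arrival counter matters because without it two notifications of equal priority would be ordered by comparing their message text.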

Project Status

The core voice pipeline is working end to end. The next milestone is moving from a conversational prototype to a tool-using assistant by adding reusable backend APIs for real-world tasks.