Atom S3R AI Assistant is a local-first voice assistant built around an M5Stack Atom S3R and the M5 Echo Pyramid audio base. The device acts as a small physical AI appliance: it listens through the microphone array, sends speech to a local backend, receives a generated spoken response, and plays it back through the built-in speaker.
What It Does Today
- The user speaks to the Atom S3R device.
- The device streams microphone audio to a backend over WebSocket.
- Whisper transcribes the audio to text.
- Node-RED routes the request through the assistant pipeline.
- Ollama runs a local Gemma model to generate a concise response.
- Kokoro converts the response text into speech.
- Node-RED streams the generated PCM audio back to the Atom.
- The Atom plays the response through the Echo Pyramid speaker.
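The audio streaming steps above can be illustrated with a minimal sketch of how raw PCM might be split into fixed-size frames before being sent as binary WebSocket messages. The audio format here (16 kHz, 16-bit, mono, 20 ms frames) is an assumption made for illustration; the project's actual sample rate and frame size are not stated in this document, and the helper names are hypothetical.

```python
from typing import Iterator

# Assumed audio format, for illustration only (not confirmed by the project).
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit PCM
CHANNELS = 1
FRAME_MS = 20


def frame_size_bytes(
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    frame_ms: int = FRAME_MS,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
    channels: int = CHANNELS,
) -> int:
    """Return the number of bytes in one frame of PCM audio."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * bytes_per_sample * channels


def iter_pcm_frames(pcm: bytes, frame_bytes: int) -> Iterator[bytes]:
    """Yield consecutive frames of at most frame_bytes; the last may be shorter."""
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]
```

Each yielded frame would typically be sent as one binary WebSocket message. Small frames keep latency low, at the cost of more messages per second.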
Main Components
- Atom S3R Firmware: Wi-Fi, WebSocket audio streaming, playback, LEDs, touch input, display, and local configuration.
- M5 Echo Pyramid Base: microphone input, audio codec, speaker output, RGB LED feedback, and touch input.
- Node-RED: orchestration and routing for the voice pipeline.
- Whisper: local speech recognition.
- Ollama and Gemma: local LLM response generation.
- Kokoro: local text-to-speech generation.
- LXC FastAPI Tools Service: reusable backend tools for future integrations.
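As an illustration of the Ollama component, the following sketch shows one way a backend step could request a concise reply through Ollama's `/api/generate` HTTP endpoint. The host, the model tag `gemma3`, the system prompt, and the function names are assumptions for this example; in the project itself the call is made from the Node-RED flow, and its exact configuration is not shown in this document.

```python
import json
import urllib.request
from typing import Callable

# Assumptions: Ollama on its default port on the same host, hypothetical model tag.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "gemma3"

JsonPoster = Callable[[str, dict], dict]


def build_generate_request(transcript: str, model: str = MODEL_NAME) -> dict:
    """Build a non-streaming generate request that asks for a short answer."""
    return {
        "model": model,
        "system": "You are a voice assistant. Answer in one or two short sentences.",
        "prompt": transcript,
        "stream": False,  # return a single JSON object instead of a token stream
    }


def http_post_json(url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST a JSON payload and decode the JSON response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def generate_reply(transcript: str, post: JsonPoster = http_post_json) -> str:
    """Send the transcript to the model and return the reply text."""
    body = post(OLLAMA_URL, build_generate_request(transcript))
    return body.get("response", "").strip()
```

The `post` parameter is injectable so the function can be exercised without a running Ollama instance, for example `generate_reply("hi", post=lambda url, payload: {"response": "Hello."})`.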
Architecture
Atom S3R / Echo Pyramid
-> Node-RED WebSocket receiver
-> Whisper ASR
-> Node-RED router and orchestration flow
-> Local tools / OpenClaw / Ollama as needed
-> Kokoro TTS
-> Node-RED WebSocket sender
-> Atom speaker playback
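The flow above can be modelled as a chain of replaceable stages with a routing step in the middle. The sketch below is only a conceptual model in Python, not the Node-RED flow itself: the class, the route names, and the keyword rule are all hypothetical, and each stage is a plain callable standing in for the real service (Whisper, local tools, Ollama, Kokoro).

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VoicePipeline:
    """One conversational turn: microphone PCM in, speaker PCM out."""

    transcribe: Callable[[bytes], str]         # stands in for Whisper ASR
    choose_route: Callable[[str], str]         # stands in for the router flow
    handlers: Dict[str, Callable[[str], str]]  # e.g. "tools", "llm"
    synthesize: Callable[[str], bytes]         # stands in for Kokoro TTS

    def handle_turn(self, mic_pcm: bytes) -> bytes:
        text = self.transcribe(mic_pcm)
        route = self.choose_route(text)
        # Fall back to the LLM handler when the route is unknown.
        handler = self.handlers.get(route, self.handlers["llm"])
        reply = handler(text)
        return self.synthesize(reply)


def keyword_router(text: str) -> str:
    """Hypothetical routing rule: send timer requests to tools, the rest to the LLM."""
    return "tools" if "timer" in text.lower() else "llm"
```

Keeping each stage behind a simple callable means a backend can be swapped (for example, a different TTS engine) without changing the rest of the turn logic.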
Project Status
The core voice pipeline is working. The next major milestone is moving from a conversational prototype to a tool-using assistant by adding reusable backend APIs for real-world tasks.