Add Voice Interaction to LeKiwi Robot with reSpeaker Flex

Robots are becoming more intelligent, more autonomous, and more interactive. In this project, we combined the LeKiwi Robot platform with reSpeaker Flex to transform a Kiwi-drive robot into a voice-interactive embodied AI system capable of far-field wake word detection, natural language voice commands, real-time movement control, simultaneous audio playback and speech recognition, and hands-free human-robot interaction. By integrating voice AI with robotics, the system allows users to naturally speak to the robot and receive real-time responses in a much more intuitive and interactive way.
Why add the reSpeaker Flex microphone array to LeKiwi?
LeKiwi is already a highly flexible robotics platform. But after integrating reSpeaker Flex, the robot gains something fundamentally different: a true voice interaction interface.
This goes far beyond attaching a USB microphone.

The reSpeaker Flex 4-microphone array provides onboard acoustic intelligence, including:
- Acoustic Echo Cancellation (AEC)
- Beamforming
- Noise Suppression (NS)
- Direction of Arrival (DoA)
- Far-field voice pickup
These capabilities allow the robot to continue understanding voice commands even while:
- Playing music
- Moving with motor noise
- Operating in noisy environments
This is one of the key challenges in embodied AI: How can robots reliably “hear” humans in the physical world?
reSpeaker Flex addresses this by turning audio into a perception layer rather than simple sound capture.
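To give a feel for what Direction of Arrival means, here is a minimal sketch of the classic two-microphone idea: find the time delay between two channels by cross-correlation, then convert it to an angle with the far-field relation sin(theta) = c * delay / spacing. This is not the algorithm running on reSpeaker Flex, which is not described here. The function names, the brute-force correlation, and any microphone spacing you pass in are assumptions for illustration only.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature


def estimate_delay_samples(mic_a, mic_b, max_lag):
    """Return the lag (in samples) by which mic_b trails mic_a.

    Brute-force cross-correlation over lags in [-max_lag, max_lag].
    A positive result means the sound reached mic_a first.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(mic_a)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += mic_a[i] * mic_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def doa_degrees(delay_samples, sample_rate_hz, mic_spacing_m):
    """Convert an inter-microphone delay to an angle from broadside.

    0 degrees means the source is straight ahead of the microphone pair;
    positive angles lean toward mic_a. The ratio is clamped so that
    measurement noise cannot push asin() out of its domain.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_samples / (sample_rate_hz * mic_spacing_m)
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A real array combines more than two microphones and uses more robust estimators, but the geometry is the same: a larger delay between channels means the speaker is further off to one side.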
What Can the LeKiwi Robot Do?
Once integrated, the LeKiwi Robot becomes a fully voice-controlled robotic system.
Users can interact naturally through commands such as:
- “Hey Jarvis, move forward”
- “Turn left”
- “Strafe right”
- “Stop”
The system supports:
- Wake word activation
- Speech-to-text transcription
- LLM-based command understanding
- Voice feedback through TTS
- Real-time motor control
This creates a much more natural interaction loop between humans and robots.
Instead of pressing buttons or manually controlling movement, users simply speak to the robot.
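As a rough illustration of how a recognized phrase could end up as motor targets, the sketch below maps the example commands to body velocities and converts them to wheel speeds with the standard inverse kinematics for a three-wheel omni (Kiwi-drive) base. The keyword table is a simple stand-in for the LLM-based command understanding used in the project. The phrase table, velocity values, wheel angles, and base radius are all illustrative assumptions, not LeKiwi's actual parameters.

```python
import math

# Hypothetical phrase -> body velocity (vx forward m/s, vy left m/s, omega CCW rad/s).
COMMANDS = {
    "move forward": (0.2, 0.0, 0.0),
    "turn left": (0.0, 0.0, 1.0),
    "strafe right": (0.0, -0.2, 0.0),
    "stop": (0.0, 0.0, 0.0),
}

# Assumed geometry: three omni wheels spaced 120 degrees apart around the chassis.
WHEEL_ANGLES_DEG = (90.0, 210.0, 330.0)
BASE_RADIUS_M = 0.125  # assumed distance from chassis center to each wheel


def parse_command(text):
    """Return the body velocity for the first known phrase in text, else None."""
    lowered = text.lower()
    for phrase, velocity in COMMANDS.items():
        if phrase in lowered:
            return velocity
    return None


def wheel_speeds(vx, vy, omega):
    """Kiwi-drive inverse kinematics: body velocity -> wheel rim speeds (m/s).

    Each wheel drives along the tangent of the chassis circle at its
    mounting angle, so its speed is the body velocity projected onto that
    tangent plus the contribution of the rotation.
    """
    speeds = []
    for angle_deg in WHEEL_ANGLES_DEG:
        a = math.radians(angle_deg)
        speeds.append(-math.sin(a) * vx + math.cos(a) * vy + BASE_RADIUS_M * omega)
    return speeds


velocity = parse_command("Hey Jarvis, move forward")
if velocity is not None:
    print(wheel_speeds(*velocity))
```

Because the wheels are evenly spaced, a pure translation produces wheel speeds that sum to zero, while a pure rotation drives all three wheels at the same speed. This is what lets a Kiwi-drive base strafe sideways without turning first.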
System Architecture
The project combines multiple AI and robotics components into a single voice pipeline.
Hardware

The setup includes:
AI Pipeline
The interaction flow works as follows:
Wake Word → Speech Recognition → LLM Reasoning → TTS Response → Robot Movement
This architecture combines voice AI and robotics into a single embodied interaction system.
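The flow above can be sketched as a small orchestration class in which each stage is a pluggable callable. This is only a structural sketch: the class name, the stage signatures, and the shape of the LLM's result (a dict with "reply" and "action" keys) are assumptions made for illustration, and the lambdas in the usage lines are stubs rather than real wake word, ASR, LLM, or TTS engines.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePipeline:
    """Wake Word -> Speech Recognition -> LLM Reasoning -> TTS -> Movement."""

    detect_wake_word: Callable[[bytes], bool]  # True if the wake word was heard
    transcribe: Callable[[bytes], str]         # speech-to-text
    reason: Callable[[str], dict]              # returns {"reply": ..., "action": ...}
    speak: Callable[[str], None]               # TTS output
    move: Callable[[str], None]                # robot motion command

    def handle(self, wake_audio: bytes, command_audio: bytes) -> Optional[dict]:
        # Nothing downstream runs unless the wake word fires.
        if not self.detect_wake_word(wake_audio):
            return None
        text = self.transcribe(command_audio)
        decision = self.reason(text)
        self.speak(decision["reply"])   # voice feedback first
        self.move(decision["action"])   # then drive the robot
        return decision


# Usage with stub stages (placeholders, not real engines).
pipeline = VoicePipeline(
    detect_wake_word=lambda audio: audio == b"hey jarvis",
    transcribe=lambda audio: audio.decode(),
    reason=lambda text: {"reply": "OK, " + text, "action": text},
    speak=lambda reply: print("TTS:", reply),
    move=lambda action: print("MOVE:", action),
)
pipeline.handle(b"hey jarvis", b"turn left")
```

Keeping the stages as injected callables means each one can be swapped independently, for example a local speech recognizer for a cloud one, without touching the control flow.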
Why Far-Field Voice Pickup Matters
One of the most impressive aspects of the project is its far-field voice interaction capability.
In testing, the robot successfully responded to wake words and commands from distances of 5–7 meters.
This is extremely important for real robotics deployments because users are rarely standing directly next to the robot.
Far-field pickup allows robots to:
- Hear commands across rooms
- Interact naturally while moving
- Support hands-free operation
- Maintain responsiveness in dynamic environments
Combined with beamforming and noise suppression, reSpeaker Flex enables significantly more reliable interaction than conventional microphones.
Real-World Robotics Challenges
Voice interaction becomes much harder once robots begin moving and speaking simultaneously.
Robots generate noise themselves: motors spin and speakers play audio feedback, while users speak from different distances and directions. In real-world environments, a robot also has to pick up voices from far away and cope with room reverberation, background conversations, and acoustically complex spaces with reflections or partial occlusions. Its onboard speaker, in turn, must be powerful enough to give clear conversational feedback wherever the user is standing.
Without proper acoustic processing:
- The robot hears its own speaker output
- Motor noise interferes with ASR
- Wake word accuracy drops
- Speech recognition becomes unstable
In these scenarios, stable and natural human-robot interaction requires much more than simple audio input: it requires spatial audio perception, robust far-field voice capture, and real-time acoustic processing.
That’s why acoustic algorithms like beamforming, Acoustic Echo Cancellation (AEC), noise suppression, and Direction of Arrival (DoA) become essential for embodied AI systems.
With onboard AEC, reSpeaker Flex can suppress the robot’s own playback audio while preserving user speech.
Combined with beamforming, noise suppression, and far-field voice pickup, reSpeaker Flex enables the robot to continue understanding voice commands even while:
- Playing music
- Moving with motor noise
- Operating in noisy environments
This allows the robot to:
- Play music
- Speak through TTS
- Continue listening at the same time
This full-duplex interaction model is essential for natural embodied AI experiences.
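To illustrate the principle behind echo cancellation, the sketch below uses a textbook NLMS (normalized least mean squares) adaptive filter: it learns how the playback signal shows up in the microphone and subtracts that estimate, leaving whatever else the microphone heard. The AEC that runs on reSpeaker Flex is not documented here and may work quite differently. The filter length, step size, and the simulated two-tap echo path are assumptions chosen only to keep the example small.

```python
import random


def nlms_echo_cancel(mic, reference, taps=8, mu=0.5, eps=1e-8):
    """Remove an adaptive estimate of the playback echo from the mic signal.

    mic: samples captured by the microphone (echo plus any user speech)
    reference: samples sent to the loudspeaker at the same time
    Returns the residual signal after the estimated echo is subtracted.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # recent reference samples, newest first
    residual = []
    for mic_sample, ref_sample in zip(mic, reference):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate
        # Normalizing by the reference energy keeps the step size stable.
        norm = sum(x * x for x in history) + eps
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        residual.append(error)
    return residual


def simulate_echo(reference):
    """Hypothetical echo path: the playback arrives delayed and attenuated."""
    echo = []
    for n in range(len(reference)):
        direct = 0.6 * reference[n - 2] if n >= 2 else 0.0
        reflection = 0.3 * reference[n - 3] if n >= 3 else 0.0
        echo.append(direct + reflection)
    return echo


rng = random.Random(0)
playback = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = simulate_echo(playback)  # no user speech in this toy run
cleaned = nlms_echo_cancel(mic, playback)
```

In this toy run the microphone hears only the robot's own playback, so the residual shrinks toward zero as the filter converges. With a user speaking at the same time, the speech is uncorrelated with the playback and largely survives in the residual, which is what makes listening during playback possible.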
More Than Just a Robot Demo
Projects like this demonstrate a broader trend in robotics:
Robots are evolving from passive machines into interactive AI systems.
Voice becomes one of the most natural interfaces for embodied AI because it allows:
- Hands-free interaction
- Low-latency control
- Natural communication
- Multimodal perception
And microphone arrays are becoming increasingly important as the “hearing system” behind that interaction.
Rather than simply recording audio, systems like reSpeaker Flex provide:
- Spatial hearing
- Acoustic perception
- Noise-robust interaction
- Real-time audio intelligence
This is what enables robots to operate more naturally in human environments.
Start Building Your Own Voice Robot
This project provides a powerful starting point for developers interested in:
- Embodied AI
- Voice-enabled robotics
- Local AI agents
- Human-robot interaction
- Edge AI audio systems
By combining LeKiwi with reSpeaker Flex, developers can rapidly prototype robots capable of natural speech interaction and real-world responsiveness.
Whether you’re building:
- AI companion robots
- Educational robotics
- Interactive assistants
- Smart service robots
- Experimental embodied AI systems
voice interaction is quickly becoming a core part of the experience.
And it starts with giving robots the ability to hear intelligently.
Get the reSpeaker Flex here: reSpeaker Flex 4-Mic Array: Split Type For Embodied AI | Seeed Studio
Get the LeKiwi Kit here: LeKiwi Full Kit (12V Version)
Check the LeKiwi + reSpeaker Flex integration wiki here: Add Voice Interaction to Your LeKiwi Robot with reSpeaker Flex | Seeed Studio Wiki
Is your reSpeaker Flex upside down? Compared to the video and mic holes, it appears to be upside down.
Yes, you’re absolutely right, and thank you for pointing that out. You have a very sharp eye!
The microphone holes on reSpeaker Flex are intentionally placed on the back side of the PCB. This design allows the microphones to be positioned as close as possible to the sound source in embedded applications, while also minimizing the impact of other electronic components on the front side of the PCB.
In this particular LeKiwi setup, however, the reSpeaker Flex is mounted externally and elevated above the robot chassis, so the orientation has very little practical impact on voice pickup performance.
That said, your observation is completely correct. For desktop robots and similar applications, it is generally better to mount the reSpeaker Flex with the microphone holes facing upward or forward to achieve the best possible acoustic performance.
Thank you again for catching that and bringing it to our attention!