Building an AI Voice Assistant on NVIDIA Jetson – Voice Activation & Speech-to-Text

As voice interfaces become more prevalent in smart homes and robotics, developers are increasingly seeking low-latency, private, offline AI voice assistants. In this tutorial, we’ll show you how to build a fully local voice assistant on the NVIDIA Jetson platform. Unlike cloud-based services, this assistant runs entirely on edge hardware, offering real-time performance, enhanced data privacy, and no internet dependency. Here we’ll focus on how to:

  • Capture microphone input
  • Activate with a hotword (wake word)
  • Convert speech to text using Whisper, an open-source speech recognition model

Whether you’re building smart home devices, service robots, or edge AI prototypes, this guide will help you deploy a powerful voice pipeline using Jetson Orin and open-source tools.

🛠️ Hardware You’ll Need

To follow along, you’ll need:

  • An NVIDIA Jetson device (this guide uses a Jetson Orin)
  • A microphone (the sample code targets a ReSpeaker array, but any ALSA-compatible USB mic will work)
  • A speaker or headphones for audio playback

All the processing will be done entirely offline, directly on the Jetson device.

🎙️ Step 1: Capture Voice Input & Detect Wake Word

We’ll use a lightweight C++ implementation to handle microphone input and hotword detection.

🔧 Install Dependencies

Run the following commands on your Jetson:

sudo apt install nlohmann-json3-dev libcurl4-openssl-dev mpg123
git clone https://github.com/jjjadand/record-activate.git
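If your JetPack image doesn’t already include the build toolchain and ALSA development headers, you may also need them (an assumption; skip any packages you already have):

sudo apt install build-essential cmake libasound2-dev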

🔧 Configure Microphone Settings

Open respeaker.cpp and adjust these parameters based on your mic and use case:

#define SAMPLE_RATE 44100          // capture sample rate in Hz
#define CHANNELS 2                 // number of input channels
#define RECORD_MS 20000            // maximum recording length in ms
#define SILENCE_MS 4000            // stop recording after this much silence
#define ENERGY_VOICE 2000          // energy threshold for voice detection
#define DEVICE_NAME "plughw:2,0"   // ALSA device; use 'arecord -l' to find yours
  • ENERGY_VOICE controls how sensitive voice detection is (a lower threshold triggers on quieter audio)
  • Change DEVICE_NAME to match your mic’s ALSA card and device numbers, as shown below
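
To find the right value for DEVICE_NAME, list the available capture devices; the card and device numbers map directly into the plughw string:

arecord -l
# A line such as "card 2: ... device 0: ..." corresponds to
#   #define DEVICE_NAME "plughw:2,0"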

🏗️ Build the Recorder

cd record-activate
mkdir -p build && cd build
cmake .. && make

This will generate two binaries: one to detect voice (record_lite) and another to send audio to Whisper (wav2text).
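
Before moving on, it can help to sanity-check your microphone settings with a short test recording (assuming your mic is at plughw:2,0, as configured above):

# Record 3 seconds at the configured rate/channels, then play it back
arecord -D plughw:2,0 -f S16_LE -r 44100 -c 2 -d 3 test.wav
aplay test.wav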

🧠 Step 2: Run Whisper Locally for Speech-to-Text

We’ll use whisper.cpp, a lightweight and fast C++ implementation of OpenAI’s Whisper model.

🔽 Download & Compile

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
sh ./models/download-ggml-model.sh base.en
cmake -B build && cmake --build build -j
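
On Jetson you can also build whisper.cpp with CUDA acceleration for a significant speedup. The exact CMake flag depends on your whisper.cpp version (GGML_CUDA on recent releases; older ones used WHISPER_CUBLAS):

cmake -B build -DGGML_CUDA=1 && cmake --build build -j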

🎯 Optional: Quantize the Model

To reduce memory usage:

./build/bin/quantize models/ggml-base.en.bin models/ggml-base.en-q5_0.bin q5_0

This creates a smaller, faster version of the English Whisper model.
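
You can verify the model on the sample clip that ships with the repository before wiring up the microphone (on older whisper.cpp versions the CLI binary is named main rather than whisper-cli):

./build/bin/whisper-cli -m models/ggml-base.en-q5_0.bin -f samples/jfk.wav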

🔁 Step 3: Connect Voice Input with Whisper Transcription

🔌 Start the Whisper Server

./build/bin/whisper-server -m models/ggml-base.en-q5_0.bin -t 8

This runs a local HTTP server that accepts .wav files and returns the transcribed text. If you skipped the quantization step, point -m at models/ggml-base.en.bin instead.
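
By default the server listens on 127.0.0.1:8080 (check its startup output). You can test it with curl before starting the recorder; the /inference endpoint and multipart file field follow the whisper.cpp server example:

curl 127.0.0.1:8080/inference \
  -H "Content-Type: multipart/form-data" \
  -F file="@test.wav" \
  -F response_format="json"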

🗣️ Start the Voice Assistant Pipeline

Back in the record-activate build directory, start both binaries. pasuspender temporarily suspends PulseAudio so they can access the ALSA device directly:

pasuspender -- sudo ./wav2text
pasuspender -- sudo ./record_lite
  • When the hotword is detected, the system captures voice input and sends it to Whisper
  • Once transcribed, the assistant plays an activation sound (activate.mp3)
  • You can then use the transcribed command to trigger AI actions or send it to a local language model

✅ What’s Working Now

At this point, you’ve built a local voice assistant that:

  • Listens for a hotword
  • Captures your voice
  • Converts it to text in real time
  • Works completely offline, with no cloud dependencies

🧭 Next Steps

  • Running a quantized LLM to understand and respond to commands
  • Adding text-to-speech (TTS) output
  • Controlling smart home devices or APIs using voice
