Keeping Pace with Dynamic Content: Raspberry Pi Audio Source Localization & Computer Vision for Advanced Camera Tracking


Demand for smarter camera technology has grown with the rise of self-media, where dynamic content creation is paramount. One idea is to integrate audio source localization with computer vision (CV) to improve automatic camera tracking. The integration addresses a limitation of CV on its own: it struggles to track fast-moving subjects, especially when visual cues are insufficient. With sound localization as a complementary mechanism, the system can keep tracking a subject in real time, even in challenging conditions such as low visibility or rapid movement. The approach offers tracking that keeps pace with live broadcasting, video production, and interactive media.
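The complementary role of audio can be sketched as a simple fallback rule. This is a minimal illustration, not the project's actual fusion logic: the VisionEstimate type, the choose_pan_target function, and the 0.5 confidence threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VisionEstimate:
    pan_deg: float      # horizontal angle of the subject as seen by the camera
    confidence: float   # detector confidence in [0, 1]


def choose_pan_target(vision: Optional[VisionEstimate],
                      audio_pan_deg: Optional[float],
                      min_confidence: float = 0.5) -> Optional[float]:
    """Prefer the vision estimate when it is confident; otherwise use audio."""
    if vision is not None and vision.confidence >= min_confidence:
        return vision.pan_deg
    # Vision lost the subject or is unsure: fall back to the sound direction.
    # This is None when no sound source is available either.
    return audio_pan_deg
```

A real system would likely smooth between the two estimates rather than switch abruptly, but the rule above captures the idea of audio filling in when visual cues fail.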

Key Info

Seeed Solution: ReSpeaker USB Mic Array – Seeed Studio

Industry: Media / AI Camera

Key Components

For a simple example, here’s the list of hardware you’ll need:

Raspberry Pi 4B:

  • To serve as the central processing unit for handling audio and visual data.
  • To run tracking algorithms and data fusion logic.

2D Gimbal (Two-Axis Gimbal):

  • To provide vertical and horizontal movement capabilities for the camera.
  • To adjust the camera’s orientation based on the output of the tracking algorithm.

Microphone Array:

  • For capturing sound and performing audio source localization.
  • Must be compatible with and able to connect to the Raspberry Pi for processing.

Battery Pack:

  • To ensure mobility and an independent power source for the system.
  • The battery should have sufficient capacity to support the system’s operation for an adequate duration.

Cables and Connectors:

  • For connecting all components (Raspberry Pi, gimbal, microphone array).
  • Includes power cables, data cables, and any necessary adapters.

Mounting Hardware:

  • To securely attach the camera to the gimbal and the microphone array in an optimal position.
  • Includes brackets, screws, and any other mounting accessories.

Hardware used in the test build:

  1. STorM32 3-axis gimbals
  2. Raspberry Pi 4B
  3. ReSpeaker USB Mic Array & PiSugar 3 Plus battery module
  4. LiPo battery (1000 mAh)

Solution/How to build it?

Introlab ODAS
 Introlab ODAS is an open-source library for sound source localization, tracking, and separation, developed by F. Grondin and published on GitHub. The library is written entirely in C and optimized for low-power embedded systems. Its main advantages are low computational cost and fast execution.
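 To turn a localized source into camera angles, the direction can be converted to pan and tilt. The sketch below assumes ODAS is configured to emit tracked sources as JSON frames with a "src" list whose entries carry "x", "y", "z" (a direction vector) and "activity"; check the output of your own configuration before relying on these field names. The function names and the 0.5 activity threshold are hypothetical.

```python
import json
import math


def direction_to_angles(x, y, z):
    """Convert a direction vector to (azimuth, elevation) in degrees."""
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation


def loudest_source(frame_json, min_activity=0.5):
    """Return angles of the most active source in one frame, or None.

    Assumes the frame looks like:
    {"src": [{"x": ..., "y": ..., "z": ..., "activity": ...}, ...]}
    """
    frame = json.loads(frame_json)
    active = [s for s in frame.get("src", [])
              if s.get("activity", 0.0) >= min_activity]
    if not active:
        return None
    best = max(active, key=lambda s: s["activity"])
    return direction_to_angles(best["x"], best["y"], best["z"])
```

 The azimuth maps naturally to the gimbal's pan axis and the elevation to its tilt axis, after accounting for how the microphone array is mounted relative to the camera.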

Gimbals control
 The STorM32 controller board is designed mainly for aerial imaging. A pin interface on the board accepts remote control input in the S-Bus, Sum-PPM, and PWM protocols. The board can also be connected to a computer, where its graphical user interface (GUI) is used to adjust the pan mode, rotation speed, PID parameters, and other settings.
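 If the PWM input is used, a target angle has to be expressed as a pulse width. The sketch below assumes the common RC convention of 1000 to 2000 µs pulses centered on 1500 µs, and a hypothetical ±90° range; how the STorM32 actually interprets a given pulse (as an angle or as a rotation speed) depends on the settings chosen in its GUI. The function name and defaults are illustrative only.

```python
def angle_to_pulse_us(angle_deg, max_angle_deg=90.0,
                      center_us=1500, span_us=500):
    """Map an angle in [-max_angle_deg, +max_angle_deg] to a pulse width.

    Angles outside the range are clamped so the pulse always stays
    within center_us - span_us .. center_us + span_us microseconds.
    """
    clamped = max(-max_angle_deg, min(max_angle_deg, angle_deg))
    return round(center_us + (clamped / max_angle_deg) * span_us)
```

 On the Raspberry Pi, the resulting value could be sent to a GPIO pin with a servo-capable library such as pigpio; that wiring is left out here because it depends on the specific setup.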

ReSpeaker USB Mic Array

  • The ReSpeaker mic array carries 4 MEMS microphones that pick up sound from different angles. The captured sound is converted into multi-channel audio signals and transmitted to the Raspberry Pi over the USB cable. To learn more, refer to the official wiki.
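Why several microphones reveal direction can be illustrated with two channels: sound reaches the nearer microphone first, and the delay between the channels gives the arrival angle. The sketch below is a simplified time-delay illustration, not the algorithm ODAS uses; the function names are hypothetical, and the sample rate and microphone spacing in the usage note are assumed values, not measurements of the ReSpeaker.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, at roughly 20 °C


def best_lag(a, b, max_lag):
    """Lag in samples that maximises the cross-correlation of a and b.

    A positive result means channel b hears the sound later than a.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += sample * b[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle_deg(lag_samples, sample_rate, mic_spacing_m):
    """Angle from broadside for a far-field source, in degrees."""
    delay_s = lag_samples / sample_rate
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past ±1
    return math.degrees(math.asin(ratio))
```

For example, with an assumed 16 kHz sample rate and 6 cm spacing, arrival_angle_deg(best_lag(ch_a, ch_b, 3), 16000, 0.06) gives the angle for two hypothetical sample lists ch_a and ch_b. With four microphones, several such pairs can be combined to resolve direction over the full circle.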

More Resources

“At the very beginning, I also considered several microphone arrays available on the market. After comparing them, I found that ReSpeaker had more comprehensive documentation and was easier to develop with. Moreover, Seeed’s wiki website provided some simple tutorials and demos that were enlightening for users. Combined with the user-friendly development ecosystem of the Raspberry Pi, I was able to implement the desired functionalities quickly. As a result, I chose the ReSpeaker array. Buying the gimbal directly from DJI was also one of my initial options, intending to utilize their SDK for developing sound source localization functionality. However, I discovered that the gimbal functions in the Osmo SDK at that time were not sufficiently open for development and posed challenges. Therefore, I switched to an open-source STorM32 gimbal. Based on these factors, I created the current case,” said Weijie Yu, the project’s creator and a Raspberry Pi enthusiast from China, who also brought the project to the Raspberry Pi Meetup at Maker Faire Shenzhen and showcased it to the Raspberry Pi team there.

Seeed Studio's Raspberry Pi Ecosystem

Seeed Studio has been serving the Raspberry Pi user community since 2013 and was among the first to become an approved reseller and design partner. Since the first version of reTerminal in 2021, we have released a series of products, including reRouter, the edge controller series, and this year reTerminal DM, serving creators, makers, enthusiasts, students, engineers, enterprises, and industries, in any scenario that calls for a Raspberry Pi.



December 2023