How to Choose Hardware for Edge ML!

It’s almost common knowledge that machine learning requires more computational power than your average day-to-day tasks. With a variety of offerings from Google, NVIDIA, Intel and others that range from TPUs, tensor cores to GPUs, it’s becoming increasingly difficult to choose hardware for Edge ML tasks. In this article, I aim to shed some light on the technologies that are available on the market and your options when it comes to hardware for both machine learning training and inferencing on the edge. Let’s go!

What goes on behind Machine Learning?

Machine learning is a broad field that has seen tremendous progress in recent years. It is based on the principle that a computer can autonomously improve its own performance on a given task by learning from data – sometimes even beyond the capabilities of humans.

Today, machine learning can perform many advanced tasks, including but not limited to:

  • Image Classification / Object Detection
  • Audio Scene / Speech Recognition
  • Forecasting (e.g. Weather & Stock Markets)
  • Anomaly Detection

Neural Networks & the Hardware Crunch

Some of these latest machine learning advancements are thanks to the use of Neural Networks, which are modeled after the human brain.

Each artificial neuron in the network applies a nonlinear function to a weighted sum of its inputs, and the neurons are arranged in multiple layers that each transform the data on the way to the model’s output. Modern Deep Neural Networks (DNNs) can contain thousands to millions of neurons arranged in tens or even hundreds of layers, but why is this important for hardware?

Well, the secret lies in what happens during machine learning training and inferencing. Without going into the technical specifics, the training process of neural networks involves continuously passing data forwards and backwards through the model while progressively adjusting the weights of each of the neurons. This may only be a few calculations for a small network, but this quickly grows to a massive scale when we are talking about DNNs!

For running inferences (making predictions) with a model, only the forward pass is required, which eases the computational requirement. Nonetheless, without suitable hardware, your ML model can still run too slowly for real-time applications!
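
To make the difference between training and inferencing concrete, here is a minimal sketch of a single artificial neuron in plain Python. It is only an illustration under simplifying assumptions: one neuron, a sigmoid activation, a squared-error loss and a made-up training sample, none of which come from a real framework. The point is that inference calls only forward(), while every training step needs a forward pass, a backward pass and a weight update.

```python
import math

def sigmoid(x):
    """Nonlinear activation that squashes any value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, bias, inputs):
    """Forward pass: weighted sum of the inputs, then the nonlinearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def train_step(weights, bias, inputs, target, lr=0.5):
    """One training step: forward pass, backward pass, weight update."""
    y = forward(weights, bias, inputs)
    # Backward pass: gradient of 0.5 * (y - target)^2 through the sigmoid.
    grad_z = (y - target) * y * (1.0 - y)
    new_weights = [w - lr * grad_z * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * grad_z
    return new_weights, new_bias

# Hypothetical starting weights and a single made-up training sample.
weights, bias = [0.1, -0.2], 0.0
sample, target = [1.0, 2.0], 1.0

before = forward(weights, bias, sample)      # inference only
for _ in range(100):                         # training repeats both passes
    weights, bias = train_step(weights, bias, sample, target)
after = forward(weights, bias, sample)

print(f"prediction before training: {before:.3f}, after: {after:.3f}")
```

A real DNN repeats this for every neuron, every layer and every training sample, which is where the hardware demand comes from.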

To read more about the math behind machine learning with neural networks, Gavril Ognjanovski has a great article for getting started!

Note: Because of the sheer amount of computation required to train machine learning models, training on the edge is rarely practical. Instead, Edge ML applications typically perform only inferencing (producing outputs from a trained model).

The Solution: Specialised Hardware

As machine learning has advanced, it’s no surprise that the industry has tried to keep up in hardware to support the increasing computational workloads. These include advancements to CPUs (Central Processing Units), GPUs (Graphics Processing Units), as well as new ASICs (Application Specific Integrated Circuits) that have been specially customised to support the calculations needed for DNNs!

Today, I will be talking about the following pieces of hardware for machine learning:

  • Google’s TPUs
  • NVIDIA’s GPUs
  • Intel’s FPGAs & VPUs
  • TinyML: Microcontroller Units for Inferencing
  • Cloud Computing for Training & Deployment

Google’s Tensor Processing Units

Tensor Processing Units (TPUs) are Google’s purpose-built ASICs designed specially for neural network machine learning. In particular, they are intended for use with Google’s own TensorFlow machine learning framework.

Where are Google’s TPUs found?

Google uses TPUs to accelerate machine learning workloads in its data centres as part of its cloud infrastructure. The first TPU was introduced in 2016 for these data centre workloads, and the design later evolved into the Edge TPU, a low-power variant for edge applications that is widely popular today.

The Edge TPU has been integrated into a variety of products under the Coral brand, so that it can handle machine learning workloads in different hardware configurations. For example, the Edge TPU has been implemented in single board computers (SBCs), systems on modules (SoMs), USB accessories, mini PCI-e, and more!

What can Google’s Edge TPU do?

The Edge TPU can only run TensorFlow Lite models; TensorFlow Lite is a performance- and resource-optimised version of the full TensorFlow for edge devices. Take note that only forward-pass operations can be accelerated, which means that the Edge TPU is useful for machine learning inferencing rather than training. In addition, the Edge TPU only supports 8-bit operations, so any model that you want to run on it must first be quantized to 8-bit values.
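
To give a feel for what 8-bit quantization means, here is a rough sketch in plain Python. It assumes a simple affine scheme (a scale plus a zero point) applied to a made-up list of weights. This is the general idea only, not the exact procedure used by Google’s tooling, and the function names are my own.

```python
def quantize_int8(values):
    """Map floats to int8 using q = round(v / scale) + zero_point."""
    lo = min(min(values), 0.0)   # the representable range must include 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = round(-128 - lo / scale)
    quantized = [
        max(-128, min(127, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point

def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the 8-bit integers."""
    return [(q - zero_point) * scale for q in quantized]

# Hypothetical float32 weights from a trained model.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]

quantized, scale, zero_point = quantize_int8(weights)
restored = dequantize(quantized, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("int8 values:", quantized)
print("largest rounding error:", max_error)
```

Each weight now takes one byte instead of four, at the cost of a small rounding error that is bounded by the scale.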

Despite these caveats, Google’s TPUs offer substantial power for edge machine learning while remaining power efficient. For example, Google states that the Edge TPU will enable users to execute state-of-the-art mobile vision models such as MobileNetv2 at nearly 400 FPS!

For those who are more concerned about processor numbers, that’s:

  • 4 Trillion (fixed-point) operations per second (4 TOPS),
  • at 2 watts of power (2 TOPS per watt)!
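
The figures above can be sanity-checked with some quick arithmetic. The sketch below uses only the numbers quoted in this section (4 TOPS, 2 watts, 400 FPS); the function names are my own, and the per-frame budget is a theoretical ceiling based on the peak figure rather than what a real model would achieve.

```python
def tops_per_watt(tops, watts):
    """Efficiency: trillions of operations per second for each watt."""
    return tops / watts

def frame_time_ms(fps):
    """Time available per frame at a given frame rate, in milliseconds."""
    return 1000.0 / fps

def peak_ops_per_frame(tops, fps):
    """Upper bound on operations that could be spent on a single frame."""
    return tops * 1e12 / fps

print(tops_per_watt(4, 2))          # 2.0 TOPS per watt
print(frame_time_ms(400))           # 2.5 ms per frame
print(peak_ops_per_frame(4, 400))   # 1e10 operations per frame at most
```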

Getting Started with a Google Edge TPU

If you’re raring to explore some of the options powered by a Google Edge TPU, you’re in luck as I have some of the most popular recommendations for you today!

Coral Dev Board – 1GB RAM Version

The Coral Dev board is an all-in-one platform that allows you to quickly prototype on-device machine learning products and applications. You can even easily scale to production through its flexible SoM design! Capable of running Debian Linux and paired with rich peripherals, the Coral Dev Board is well equipped to take advantage of its on-board Edge TPU for any ML purpose.

To learn more about the Coral Dev Board, please visit its product page on the Seeed Online Store!

Coral USB Accelerator

If you want to use Google’s Edge TPU with your existing development board, the Coral USB Accelerator is your best bet. Packaged as a USB accessory, it lets you easily interface with and accelerate the machine learning workloads on any Linux SBC like your Raspberry Pi!

Keen to learn more about the Coral USB Accelerator? Visit its product page on the Seeed Online Store now!

Coral M.2 Accelerator with Dual Edge TPU

The Coral M.2 Accelerator is equipped with not one, but two Edge TPUs to handle your larger machine learning workloads. With the flexible and common M.2 connector, you can use this module with most Debian-based Linux or Windows 10 systems!

Visit the Seeed Online Store to learn more about the Coral M.2 Accelerator with Dual Edge TPU now!

NVIDIA’s Graphics Processing Units

NVIDIA’s graphics cards (or GPUs) have been a big part of the AI industry for a long time, but what do graphics have to do with machine learning? Well, the answer lies in the hardware that supports graphical rendering. Compared to a CPU, a GPU has far more cores (arithmetic logic units), each simpler than a CPU core, which allows it to run many operations in parallel! Neural network workloads consist largely of matrix calculations that can be split into many independent operations, so GPUs can drastically speed up machine learning.
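
Why does parallel hardware suit neural networks so well? The following sketch, written in plain Python purely for illustration, shows that every element of a matrix product is an independent dot product, so the work can be handed out to many workers at once. Note that Python threads will not actually make this faster; the thread pool only stands in for the many cores of a GPU, and the function names are my own.

```python
from concurrent.futures import ThreadPoolExecutor

def dot(row, col):
    """Multiply-accumulate over one row and one column."""
    return sum(a * b for a, b in zip(row, col))

def matmul_serial(a, b):
    """Compute the product one output element after another."""
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]

def matmul_parallel(a, b, workers=4):
    """Compute every output element as an independent task."""
    cols = list(zip(*b))
    tasks = [(row, col) for row in a for col in cols]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flat = list(pool.map(lambda task: dot(*task), tasks))
    width = len(cols)
    return [flat[i:i + width] for i in range(0, len(flat), width)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

print(matmul_serial(a, b))     # [[19, 22], [43, 50]]
print(matmul_parallel(a, b))   # same result, computed as 4 separate tasks
```

Since no task depends on the result of another, a chip with thousands of cores can work on thousands of output elements at the same time.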


What can NVIDIA’s GPUs do?

NVIDIA has an impressive line of enterprise and desktop class GPUs based on their own Turing Architecture, as well as a selection of developer kits that support machine learning on the edge. Most of them offer features such as CUDA cores, Tensor cores and the cuDNN library to accelerate machine learning performance.

CUDA Cores are the general-purpose parallel processing cores in NVIDIA’s GPUs. CUDA, short for Compute Unified Device Architecture, is the platform that lets developers use these cores for general-purpose computing and take advantage of their parallel processing capabilities.

Tensor Cores are the primary differentiator from NVIDIA in scaling up deep learning, delivering up to 125 TFLOPS (trillion floating point operations per second) of optimised performance for matrix calculations in Neural Network training and inferencing.

The cuDNN Library (CUDA Deep Neural Network library) provides developers with GPU-accelerated building blocks for deep learning, such as convolutions and activation functions, which can make use of Tensor Cores.

NVIDIA’s Edge ML Offerings

While NVIDIA has had great success with their more powerful GPU lines, their Edge ML offerings in the NVIDIA Jetson Nano and NVIDIA Jetson Xavier NX developer kits are nothing to scoff at either. They are both compatible with NVIDIA’s JetPack SDK, which includes:

  • Bootloader, Linux kernel, Firmware & Drivers
  • TensorRT – NVIDIA’s high performance deep learning inference runtime
  • cuDNN Library, CUDA Toolkit
  • VisionWorks – Software development package for Computer Vision

NVIDIA Jetson Xavier NX Developer Kit

The Jetson Xavier NX is the most powerful platform offered by NVIDIA for edge development. Equipped with an NVIDIA GPU with 384 CUDA cores and 48 Tensor cores, it can offer astounding performance of 6 TFLOPS (FP16) and 21 TOPS (INT8)!

Keen to learn more about the NVIDIA Jetson Xavier NX? Visit the Seeed Online Store!

NVIDIA Jetson Nano 2GB Developer Kit

If you’re looking for a more affordable option, consider the NVIDIA Jetson Nano Developer Kit instead! While it doesn’t have Tensor cores, it’s still powered by a 128-core NVIDIA Maxwell GPU (128 CUDA cores) and the JetPack SDK. This is a perfect entry-level option for educators, students and enthusiasts that won’t break the bank!

Pick up your very own NVIDIA Jetson Nano Developer Kit on the Seeed Online Store!

Intel’s Edge ML Product Lineup

In the field of AI research, Intel continues to push the limits of computing by producing hardware for both data centre workloads and low-power machine learning on the edge. Amongst their diverse lineup, we will be focusing on their FPGAs and Movidius VPU!

Intel’s AI FPGAs

FPGAs, or Field Programmable Gate Arrays, are essentially blank canvas integrated circuits made up of configurable blocks. This means that you can specifically customise them for high-performance, low-latency applications, such as for machine learning!

Intel offers a wide range of FPGAs with various features and benefits. For example, their Intel® Stratix® 10 NX FPGAs feature AI Tensor Blocks that allow them to perform up to 143 INT8 TOPS or 286 INT4 TOPS – very impressive performance! To learn more about FPGAs, their uses and how they compare to ASICs, kindly visit my previous article on the subject.

Intel Movidius VPU

Intel’s Movidius™ Myriad™ X Vision Processing Unit (VPU) is specially designed to power visual intelligence on the edge. It features a Neural Compute Engine, which enables the VPU to reach over 1 TOPS of compute performance.

In addition, the Intel Movidius™ Myriad™ X VPU is equipped with:

  • 16 Programmable 128-bit VLIW Vector Processors for Concurrent Imaging & Vision Application Pipelines
  • Enhanced Vision Accelerators for Optimised Computation without Additional Overhead

Intel Movidius MA245X AI Kit Compatible w/ Intel Movidius Stick [HornedSungem]

The HornedSungem AI kit is based on Intel’s Movidius MA245X VPU, and designed with plug-and-play functionalities in mind. It is highly compatible with various PCs and even SBCs like the Raspberry Pi, and can be integrated to implement computer vision functionalities like face detection, recognition, object detection, and more!

If you’re curious to learn more about the Intel Movidius MA245X AI Kit, visit the Seeed Online Store today!

TinyML: Edge ML Inferencing on Microcontrollers?

So far, we’ve talked about specialised hardware for training and inferencing, and we know that ML inferencing requires far less computation than training. Thanks to TinyML, we’re beginning to see applications that use ML inferencing on the tiniest of edge devices.

Yes – even including microcontrollers!

TinyML, short for Tiny Machine Learning, is a subset of machine learning that employs optimisation techniques to reduce the memory, computation and power required by machine learning models. Specifically, it aims to bring ML inference applications to compact, power-efficient, and most importantly affordable microcontroller units (MCUs).

Further fuelling the TinyML movement, companies like Edge Impulse & OpenMV are helping to make Edge AI more accessible to everyone through user-friendly platforms. With the versatility of MCUs, it’s no surprise that they’re one of the most popular areas of machine learning research and development today!

Best Microcontrollers for Edge AI

Meeting software improvements halfway, microcontrollers are also enjoying more powerful hardware performance with better and stronger microprocessors every year. With many microcontroller choices on the market, it can be difficult to make an informed decision. Today, I’ve got you covered with the following microcontroller recommendations for machine learning on the edge!

Seeeduino XIAO

The Seeeduino XIAO is the smallest Arduino compatible board in the Seeeduino Family. Despite its small size, the Seeeduino XIAO is equipped with the powerful SAMD21 microchip and a variety of hardware interfaces – truly putting the tiny in TinyML!

Product Features:

  • ARM Cortex-M0+ 32bit 48MHz microcontroller (SAMD21G18) with 256KB Flash, 32KB SRAM
  • Compatible with Arduino IDE & MicroPython
  • Easy Project Operation: Breadboard-friendly
  • Small Size: As small as a thumb (20×17.5mm) for wearable devices and small projects.
  • Multiple development interfaces: 11 digital/analog pins, 10 PWM Pins, 1 DAC output, 1 SWD Bonding pad interface, 1 I2C interface, 1 UART interface, 1 SPI interface.
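
With only 256KB of flash on board, it helps to check whether a model can fit at all before trying to deploy it. Here is a back-of-the-envelope sketch in Python. The flash size comes from the feature list above, while the 64KB reserved for firmware and the 100,000-parameter model are hypothetical numbers that I picked for illustration.

```python
FLASH_BYTES = 256 * 1024   # flash size from the feature list above

def model_bytes(num_params, bits_per_param):
    """Storage needed for the model weights alone."""
    return num_params * bits_per_param // 8

def fits_in_flash(num_params, bits_per_param, firmware_bytes=64 * 1024):
    """Check the weights plus an assumed firmware footprint against flash."""
    return model_bytes(num_params, bits_per_param) + firmware_bytes <= FLASH_BYTES

# Hypothetical model with 100,000 parameters.
params = 100_000
print("float32:", model_bytes(params, 32), "bytes, fits:", fits_in_flash(params, 32))
print("int8:   ", model_bytes(params, 8), "bytes, fits:", fits_in_flash(params, 8))
```

In this example the float32 model is too large, while the same model quantized to 8 bits fits comfortably. Keep in mind that this only checks flash; the working memory needed during inference also has to fit in the 32KB of SRAM.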

Keen to learn more about the Seeeduino XIAO? Visit its product page on our Seeed Online Store now!

Wio Terminal

The Wio Terminal is a complete Arduino development platform based on the ATSAMD51, with wireless connectivity powered by Realtek RTL8720DN. As an all-in-one microcontroller, it has an onboard 2.4” LCD Display, IMU, microphone, buzzer, microSD card slot, light sensor & infrared emitter. The Wio Terminal is officially supported by Edge Impulse, which means that you can easily use it to collect data, train your machine learning model, and finally deploy an optimised ML application!

Product Features:

  • Powerful MCU: Microchip ATSAMD51P19 with ARM Cortex-M4F core running at 120MHz
  • Reliable Wireless Connectivity: Equipped with Realtek RTL8720DN, dual-band 2.4GHz / 5GHz Wi-Fi (supported only by Arduino)
  • Highly Integrated Design: 2.4” LCD Screen, IMU and more practical add-ons housed in a compact enclosure with built-in magnets & mounting holes
  • Raspberry Pi 40-pin Compatible GPIO
  • Compatible with over 300 plug&play Grove modules to explore with IoT
  • USB OTG Support
  • Supports Arduino, CircuitPython, MicroPython, ArduPy, AT Firmware, Visual Studio Code
  • TELEC Certified

If you’re interested in picking up a Wio Terminal, please visit its product page on the Seeed Online Store!

Cloud Computing

On the flip side, if you’re looking for hardware that can do heavyweight machine learning training, cloud computing might be for you. The general idea behind Cloud Computing in Machine Learning is that the processing power required for training or running inferences is abstracted away from the local device.

Cloud computing is very flexible – you can use it to perform both machine learning training and deployment!

Training – Perform resource-intensive calculations for model training on large datasets.

Deployment – Host machine learning applications with trained models in the cloud, respond to input data from edge devices and return the output of the models.
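
To illustrate the deployment case, here is a minimal sketch of a cloud-style inference service and an edge-side client, using only Python’s standard library. Everything in it is a stand-in: predict() is a made-up threshold rule rather than a trained model, the JSON format is my own choice, and a real deployment would use one of the cloud platforms discussed later with proper authentication.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for a trained model: flags readings with a mean above 30."""
    mean = sum(features) / len(features)
    return {"anomaly": mean > 30.0, "mean": mean}

class InferenceHandler(BaseHTTPRequestHandler):
    """Cloud side: receive input data, run the model, return the output."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

def query_cloud(url, features):
    """Edge side: send sensor readings and read back the model output."""
    data = json.dumps({"features": features}).encode()
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    # Bypass any system proxy, since this demo only talks to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())

def demo_round_trip():
    """Run the 'cloud' on a free local port and query it once."""
    server = HTTPServer(("127.0.0.1", 0), InferenceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/predict"
        return query_cloud(url, [28.0, 31.5, 35.0])
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo_round_trip() starts the server locally, sends three sample readings and returns the model’s answer. The edge device only needs enough power to collect data and make a network request; the heavy lifting happens on the server.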

Advantages of Cloud Computing

The most straightforward benefit of Cloud Computing for the average consumer or enthusiast is affordability and accessibility. For example, purchasing a powerful GPU for modern machine learning applications can cost several thousand dollars.

However, if we use cloud computing where the capital costs are covered by the service provider, we will only have to pay for the computing power we used. In general, this means lower costs for the individual! You can also count on cloud services to provide relatively recent and powerful hardware, which means you won’t have to fork out additional cash to stay updated.
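
Whether pay-as-you-go really works out cheaper depends on how many hours of compute you need. The sketch below works through the break-even point in Python. The prices are hypothetical placeholders (a 3,000 dollar GPU and a cloud rate of 1.50 dollars per hour), and the comparison ignores electricity, resale value and other running costs.

```python
def break_even_hours(gpu_price, cloud_rate_per_hour):
    """Hours of cloud usage that cost as much as buying the GPU outright."""
    return gpu_price / cloud_rate_per_hour

def cheaper_option(gpu_price, cloud_rate_per_hour, expected_hours):
    """Compare the two options for an expected amount of usage."""
    cloud_cost = cloud_rate_per_hour * expected_hours
    return "cloud" if cloud_cost < gpu_price else "buy"

# Hypothetical prices, for illustration only.
GPU_PRICE = 3000.0
CLOUD_RATE = 1.5

print(break_even_hours(GPU_PRICE, CLOUD_RATE))       # 2000.0 hours
print(cheaper_option(GPU_PRICE, CLOUD_RATE, 500))    # cloud
print(cheaper_option(GPU_PRICE, CLOUD_RATE, 4000))   # buy
```

With these assumed numbers, occasional users come out well ahead with the cloud, while someone training models around the clock would eventually be better off owning the hardware.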

The other advantage is access to data. Many IoT devices are connected through the cloud, so these centralised computers can take advantage of many streams of data. Hence, we can train more robust models and gain more insights into, for example, supply chain performance. In addition, the centralised nature of cloud computing also makes it easier to update workflows and frameworks.

Popular Cloud Computing Platforms

As the demand for cloud and IoT grows, many cloud computing platforms are starting to appear on the market, such as those from Amazon, Google, & Microsoft. While they largely offer similar services, there are key differences in user experience and capabilities. In my opinion, your choice of cloud computing platform will ultimately boil down to cost and personal preference!

Amazon Web Services

Amazon Web Services (AWS) is the cloud computing platform run by Amazon that provides on-demand services including data storage and analysis. Perhaps what makes AWS popular is the ability to allocate actual hardware such as CPUs, GPUs, storage and RAM to create highly specific cloud computing instances for specialised tasks!

For machine learning, AWS offers the Amazon EC2 P4d instances that are equipped with the latest NVIDIA A100 Tensor Core GPUs. They also offer various AI-powered applications that can be readily deployed, such as their search and recommendation engines and tools for data extraction / business analysis.

Google Cloud Platform

Google Cloud Platform (GCP) is run by Google and offers an extensive suite of services for computing, network, storage, machine learning and IoT. Similar to AWS, the Google Cloud ML Engine also allows you to provision hardware for offloading machine learning workloads to the cloud. GCP also gives you access to Google’s machine learning APIs like Google’s Speech-to-Text, Vision AI, and Dialogflow.

To handle large datasets, GCP employs BigQuery which allows you to analyse petabytes of data efficiently with an SQL syntax. You can even directly and easily build machine learning models inside of BigQuery with BigQuery ML in SQL.

Special Mention: Google Colaboratory

Google also offers Colaboratory, which is an online runtime that allows you to run Jupyter notebooks that are commonly used in data science and machine learning. Here’s the amazing part: You will be provisioned with a GPU for machine learning workloads, even as a free user! If you need a more powerful GPU, you can also upgrade to Google Colab Pro at 10 dollars a month.

Microsoft Azure

Microsoft Azure is Microsoft’s cloud computing service, and similarly offers a wide range of cloud products and services ranging from virtual machines to data streams to IoT. For machine learning, Azure Machine Learning allows you to build, train and deploy machine learning models easily while taking advantage of cloud resources. You’ll also be able to use Microsoft’s own suite of AI products, such as Azure Cognitive Search, Azure Bot Services & Azure Cognitive Services.

Wrapping Up

To summarise this article, there are many ways to meet the hardware requirements of machine learning. First, we covered the specialised hardware and products offered by Google, NVIDIA and Intel for both training and inferencing. Then, we talked about new possibilities surrounding Edge AI inferences with TinyML for microcontrollers. Finally, we discussed the various cloud computing options as a means to save costs and take advantage of cloud services for our machine learning projects.

Ultimately, which product or service you choose will depend greatly on your needs and your budget. Nonetheless, I hope that you’ve gotten a clearer picture of the tools at your disposal, and are better prepared to tackle your machine learning projects!


April 2021