Which AI accelerator should I pick for my next device?

--

Artificial Intelligence at the edge is booming. We see more and more categories of devices getting some sort of intelligence these days: from vacuum cleaners getting smarter with every generation and personal assistants popping up in all sorts of flavors, all the way to industrial solutions bringing automation to various businesses. Before any of these devices hit the shelves, their designers need to make at least a few not-so-easy choices. Besides the question of deployment strategy (which I briefly covered in one of my previous posts), here is another interesting one:

which hardware accelerator should I embed into my device?

At first glance, the question might not sound interesting from the end user's perspective, but it really does affect them significantly. It is a decision that gives the end product certain features or takes them away. It might determine things like:

  • will the device be able to learn new things, or will it mainly execute pre-defined tasks?
  • how big is the device going to be?
  • how power efficient does it have to be?
  • how complex are the workloads or calculations it is going to handle?

And of course, some of these can be worked around regardless of the choice we make. For instance, I can imagine a device that by design cannot learn new things on its own but can gain new skills with a software update. Likewise, a device capable of handling small to medium-size jobs can also execute larger tasks, only it might take forever… etc. Therefore, let's set the stage here: performance and near-real-time learning capabilities sometimes do matter.

Anyway, my team at byteLAKE took the two most common AI accelerators (at least those that popped up most often in our projects to date, as of Jul'18) and benchmarked them. You can read the complete study in our presentation on SlideShare. However, let me briefly summarize the conclusions here.

We tested: an NVIDIA GPU and an Intel Movidius (Myriad 2) VPU.

Both were attached to an example edge device (an edge server), which in our case was a Lenovo Tiny PC. The models we used were an NVIDIA Quadro P1000 and a configuration with two Movidius cards (PCIe).

On the software side, we used both the Caffe and TensorFlow frameworks. We also tested the performance of both solutions with some of the most common pre-trained computer vision models.
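
As a hint of how such measurements can be set up, here is a minimal, hypothetical sketch of a throughput test around a pre-trained model. It uses TensorFlow's Keras MobileNet and random frames purely as stand-ins; the actual models and test harness from our study are the ones described in the SlideShare deck.

```python
# Rough throughput measurement around a pre-trained computer vision model.
# Hypothetical sketch: MobileNet and random frames stand in for the real setup.
import time
import numpy as np
import tensorflow as tf

# Load a pre-trained ImageNet classifier (weights download on first run).
model = tf.keras.applications.MobileNet(weights='imagenet')

# Dummy input batch standing in for camera frames (224x224 RGB).
frames = np.random.rand(32, 224, 224, 3).astype(np.float32)

# Warm-up run so one-time initialization does not skew the timing.
model.predict(frames, batch_size=1)

runs = 10
start = time.time()
for _ in range(runs):
    model.predict(frames, batch_size=1)
elapsed = time.time() - start

fps = (runs * len(frames)) / elapsed
print('Approximate throughput: %.1f frames/s' % fps)
```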

The results of our study show that using a GPU for object detection allows data to be analyzed in real time. At the same time, neither a single Intel Movidius chip nor two Intel Movidius chips provide the desired performance in this scenario. However, Movidius can still be used successfully in applications where real-time processing is not necessary and near-real-time is enough.

Based on the knowledge gained during this study, we concluded that the advantage of the NVIDIA GPU over the Intel Movidius VPU lies not only in raw computational performance. The GPU allows for both training of DNNs and inference, whereas Movidius is designed to work only with pre-trained models.
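
To illustrate that inference-only nature, below is a rough sketch of the Movidius workflow, assuming the NCSDK v1 Python API (mvncapi) and a network that has already been trained elsewhere and compiled offline (e.g. with the mvNCCompile tool) into a binary graph file; call names differ in later SDK versions.

```python
# Inference-only flow on Intel Movidius (Myriad 2), sketched against NCSDK v1.
# There is no training path here: the 'graph' file is a pre-trained,
# pre-compiled network produced offline from a Caffe/TensorFlow model.
import numpy as np
from mvnc import mvncapi as mvnc

devices = mvnc.EnumerateDevices()          # list attached Movidius devices
device = mvnc.Device(devices[0])
device.OpenDevice()

with open('graph', 'rb') as f:             # pre-compiled network binary
    graph_blob = f.read()
graph = device.AllocateGraph(graph_blob)   # upload the network to the VPU

frame = np.random.rand(224, 224, 3).astype(np.float16)  # stand-in camera frame
graph.LoadTensor(frame, 'frame-0')         # queue the input for inference
output, _ = graph.GetResult()              # blocking call; returns predictions
print(output.shape)

graph.DeallocateGraph()
device.CloseDevice()
```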

Another difference between the two accelerators is their support for various AI libraries/frameworks. While Movidius provides support for two popular frameworks (Caffe and TensorFlow), the GPU supports more AI libraries and frameworks, e.g. cuDNN or Theano.

The difference between these two accelerators can also be noticed in the programming process. In many cases, the implementation of an application that uses a GPU does not require any special knowledge about the accelerator itself: most AI frameworks provide built-in support for GPU computing (both training and inference) out of the box. In the Movidius case, however, you also need to learn its SDK. It is not a painful process, but it is yet another tool in the chain.
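
To illustrate the "out of the box" point, here is a minimal TensorFlow 1.x sketch (the major version current around the time of the study): there is nothing accelerator-specific in the code, yet the framework places the computation on the GPU whenever one is visible.

```python
# With a GPU, most frameworks handle device placement transparently:
# the same code runs on CPU or GPU with no accelerator-specific SDK involved.
# Sketch assuming TensorFlow 1.x.
import tensorflow as tf

a = tf.random_normal([1024, 1024])
b = tf.random_normal([1024, 1024])
c = tf.matmul(a, b)   # placed on the GPU automatically if one is available

# log_device_placement prints which ops landed on /device:GPU:0 vs. the CPU.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c).shape)
```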

When comparing both accelerators, another difference is their intended area of usage. While the GPU is a powerful accelerator for AI computations, its power consumption and size can be an obstacle in many areas. A GPU offers notably high computational performance (on the order of a few TFLOPS or more), but it is usually dedicated to HPC solutions. Intel Movidius, at the same time, is a low-power AI solution dedicated to on-device computer vision. Its size and power consumption make it attractive for many use cases, e.g. IoT solutions, drones or smart security.

Given the context above, here are some additional remarks one might consider when deciding which accelerator is a better fit for a given design. However, it is important to emphasize that:

the comparison of Movidius and the NVIDIA GPU as two competing accelerators for AI workloads leads to the conclusion that these two are meant for different tasks.

Therefore, looking at them only through the lens of the performance benchmarking results might be misleading. To properly choose between Movidius and an NVIDIA GPU, one should first and foremost take into account the intended application rather than the benchmark results alone. Movidius is primarily designed to execute AI workloads based on trained models (inference). NVIDIA's GPU, on the other hand, can do that plus training. So it really depends on whether the planned device is meant to work in execute-only mode or to be capable of updating/re-training its models (its brains) as well. And of course, all of this makes sense only as long as such tasks can be executed within a reasonable time frame.

Let me know what other accelerators you are using in your embedded designs. Reach out to me directly and let's exchange experiences around the subject.

--

Co-Founder @ byteLAKE | Turning Data Into Information for Manufacturing, Automotive, Paper, Chemical, Energy sectors | AI-accelerated CFD | Self-Checkout for Retail