Is my CPU-only HPC cluster any good for AI-accelerated CFD?


This is a typical question my team at byteLAKE hears whenever we so much as mention the acronym AI (Artificial Intelligence). In the CFD (Computational Fluid Dynamics) context it is a very valid concern, because the CFD world mostly runs its simulations on CPU-only architectures. There are, however, efforts and products that can accelerate such simulations with accelerators like GPUs or FPGAs. We tried that route at byteLAKE ourselves (a story you can read in my previous posts, e.g. www.byteLAKE.com/en/AI4CFD-pt1). Nevertheless, the effort required, the costs, and the resulting level of acceleration did not add up to a compelling case for us, so we eventually abandoned that path. Seeing very promising results with machine learning in some of our HPC (High-Performance Computing) projects, we made a strategic decision to invest in leveraging AI to significantly accelerate CFD simulations and cut time to results. That is how byteLAKE’s CFD Suite was born. If you want to jump straight to a post where I describe those efforts in more detail, click here: www.bytelake.com/en/AI4CFD-pt5.


CFD Suite is a collection of innovative AI Models for Computational Fluid Dynamics (CFD) acceleration. Learn more at www.byteLAKE.com/en/CFDSuite.

Going back to the question from the headline: is my CPU-only HPC cluster (or any other machine: a laptop/PC, a single-node server) any good for AI-accelerated CFD and byteLAKE’s CFD Suite in particular? The reasoning behind the question, as I understand it, goes like this: since CFD Suite uses AI to accelerate CFD simulations, it also requires training of the AI models; training typically happens on GPUs; so if I do not have any, will I be able to train the models within a reasonable time?

Well, the quick answer is yes. byteLAKE will do it for you. To get started with CFD Suite, you just need to follow these 3 simple steps:

CFD Suite deployment

We should add that in a growing number of cases we offer off-the-shelf AI models that have already been generalized for certain use cases. Generalization means that the model can take any combination of input parameters and ranges, just like a regular solver, and still produce quality outcomes. Let me emphasize one more aspect: CFD Suite has been designed to act as an add-on to your existing workflow and CAE tools, so there is no need for any data reformatting or changes. CFD Suite takes the data in the same format (i.e. 3D fields) as your traditional solver, and the results are compatible with traditional solvers as well.
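To make the “no reformatting” point more tangible, here is a minimal Python sketch of what such a drop-in data path could look like. Everything in it (the file name pressure.raw, the 128x128x128 shape, the load_solver_field helper) is an illustrative assumption, not CFD Suite’s actual I/O interface.

```python
import numpy as np

# Hypothetical example: a raw binary dump of a 3D field, as a traditional solver might write it.
# The file name, shape and dtype are illustrative assumptions, not CFD Suite's actual I/O format.
shape = (128, 128, 128)
np.random.rand(*shape).astype(np.float32).tofile("pressure.raw")  # stand-in for solver output

def load_solver_field(path, shape=shape, dtype=np.float32):
    """Read a raw 3D field exactly as the solver wrote it; no reformatting step in between."""
    return np.fromfile(path, dtype=dtype).reshape(shape)

pressure = load_solver_field("pressure.raw")
print(pressure.shape)  # (128, 128, 128)
```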

Nevertheless, we have performed a comprehensive benchmark to check whether it would be possible and if so, how long it would take to train the CFD Suite’s AI models on a CPU-only HPC cluster.

Is my CPU-only HPC cluster any good for AI-accelerated CFD and byteLAKE’s CFD Suite in particular? YES! First of all, you do not need to worry about the training part, as byteLAKE can do it for you. However, if you still want to do the AI training on your own, below is a performance benchmark comparing CPU-based training with GPU-based training. SPOILER: at the end of the day, what you care about is inference, i.e. predictions (the AI-accelerated simulations themselves), and there a single Intel Xeon Gold CPU delivers results about 10x faster than an NVIDIA V100. Details below.


CFD Suite: AI models training on CPU-only vs. CPU+GPU architectures

CFD Suite deployment starts with training the AI model(s). Past CFD simulations (usually between 8 and 50 of them) are therefore required to produce accurate predictions and to reach the proper level of generalization. Generalization enables predictions across a wide range of input parameter values, including geometries.
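As a rough illustration of what “training on past simulations” can mean in practice, the hedged Python sketch below pairs each historical run’s input parameters with the 3D field it produced. The directory layout, file names, and parameter files are assumptions made up for this example; they are not the real CFD Suite data layout.

```python
import json
from pathlib import Path
import numpy as np

def build_training_set(root="past_simulations"):
    """Collect (input parameters, resulting 3D field) pairs from past solver runs.
    The directory layout and file names are illustrative assumptions."""
    samples = []
    for run in sorted(Path(root).glob("run_*")):
        params = json.loads((run / "params.json").read_text())  # e.g. inlet velocity, geometry id
        field = np.load(run / "result_field.npy")                # 3D array written by the solver
        samples.append((params, field))
    return samples

# With roughly 8-50 past simulations available, this yields the training set
# from which the AI models learn to generalize over the input parameter space.
dataset = build_training_set()
print(f"{len(dataset)} past simulations collected for training")
```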

CFD acceleration with AI (CFD Suite, AI models training)

We executed such training on various architectures, including:

1. Single HPC node: 2x Intel Xeon Gold 6148 CPU @ 2.40 GHz, 2x NVIDIA V100 16 GB, 400 GB RAM (referred to as Gold and V100, respectively)

2. Intel CPU-only HPC cluster: the BEM supercomputer (860 TFLOPS); 22,000 cores; 724 nodes; 1,600 Intel Xeon E5-2670 CPUs @ 2.30 GHz, 12 cores each; 74.6 TB RAM (BEM for the cluster, E5-2670 for a single node)

3. Desktop platform: Intel Core i7-3770 CPU @ 3.40 GHz, 4 cores (Core-i7 or i7) + NVIDIA GeForce GTX TITAN GPU (TITAN)

4. Single HPC node: Intel Xeon E5-2695 CPU @ 2.30 GHz, 12 cores (E5-2695)

The results were as follows:

CFD Suite’s AI modules training duration across various hardware configurations

CFD models are generally memory-bound algorithms, and here we work with 3D meshes. For the AI acceleration, this means a relatively large amount of data has to be fed into the model, which forces us to use reduced AI model architectures so that the network can be fed at all. Consequently, we have a lot of data processed by a relatively small network (up to 16 layers).
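To give a feel for what “a lot of data through a small network” can look like, here is an illustrative PyTorch sketch of a shallow 3D convolutional model applied to a large 3D field. The depth, channel counts, and mesh size are placeholder assumptions and do not describe CFD Suite’s actual architecture.

```python
import torch
import torch.nn as nn

class SmallCFDNet(nn.Module):
    """Illustrative shallow 3D CNN: a few convolutional layers processing a large 3D mesh.
    Channel sizes and depth are assumptions, not byteLAKE's actual model."""
    def __init__(self, channels=8, depth=6):
        super().__init__()
        layers = [nn.Conv3d(1, channels, kernel_size=3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv3d(channels, channels, kernel_size=3, padding=1), nn.ReLU()]
        layers += [nn.Conv3d(channels, 1, kernel_size=3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# A single 128^3 input field: roughly two million cells per sample,
# pushed through a network with comparatively few parameters.
x = torch.randn(1, 1, 128, 128, 128)
model = SmallCFDNet()
print(model(x).shape)  # torch.Size([1, 1, 128, 128, 128])
```

Even with only a handful of layers, each sample carries millions of cells, which is why memory traffic rather than raw compute tends to dominate such training.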

As a result, we can observe that:

· There is no meaningful speedup within a single node (1 Gold, 20 cores vs. 2 Golds, 40 cores). OpenMP (shared-memory) parallelization brings only a poor improvement when using more than 20 threads within a single node (a speedup factor of ~1.11x), because the training is not compute-intensive enough.

· There is also no big difference between a single V100 and the TITAN, again because the training is not compute-intensive enough.

· The performance improvement is much better in the case of distributed training (1x V100 vs. 2x V100, or BEM with up to 64 nodes).

· The cluster implementation (BEM), based on the Horovod framework, overtakes the performance of a single V100 at 8 nodes, and 16 nodes are 1.3x faster than 2x V100. Comparing the Gold results with a single BEM node (a single E5-2670 CPU), we can assume that a cluster with 8x Gold would deliver results comparable to 2x V100. (A minimal sketch of such a Horovod setup follows this list.)
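For readers who want to see roughly what a Horovod-based, CPU-only distributed training setup looks like, here is a minimal sketch. The model, batch, thread count, and learning-rate scaling are placeholder assumptions; only the general Horovod pattern (wrap the optimizer, broadcast the initial state) reflects how the framework is typically used.

```python
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()                  # one Horovod process per CPU node (or per socket)
torch.set_num_threads(20)   # e.g. one Xeon Gold socket's worth of cores per worker

model = nn.Sequential(      # trivial stand-in for the shallow 3D CNN sketched earlier
    nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(8, 1, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3 * hvd.size())

# Horovod averages gradients across all workers and syncs the initial state
# so that every node starts training from the same weights.
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

loss_fn = nn.MSELoss()
inputs = torch.randn(2, 1, 64, 64, 64)   # placeholder batch; real data would be solver fields
targets = torch.randn(2, 1, 64, 64, 64)
for _ in range(3):                        # placeholder training loop
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
```

Launching it with, for example, horovodrun -np 8 python train.py would spread the work across 8 such workers; the worker count here is a placeholder, not a recommendation.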

In summary, for AI model training, we can assume that 4 nodes powered by Intel Xeon Gold CPUs will give performance comparable to a single NVIDIA V100 GPU.

Verdict

The answer is YES, you can still perform CFD Suite training on CPU-only machines. And if you happen to have a cluster powered by Intel Xeon Gold CPUs, 4 of its nodes will give performance comparable to a single V100 GPU. That is it for the training part. However, if you consider upgrading your CFD tools with byteLAKE’s CFD Suite, what really matters is the inference part, i.e. prediction. And here I have some great news for you: due to the complexity of the AI models we are using, CFD Suite generates the CFD simulation results almost 10 times faster on a single Intel Xeon Gold CPU than on a V100 GPU.
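If you want to get a first feel for inference speed on your own CPU hardware, a simple timing loop like the one below is enough for a rough sanity check. The model here is a trivial placeholder, so the absolute numbers will say nothing about CFD Suite itself; only the measurement pattern (warm-up pass, then averaging several timed runs) carries over.

```python
import time
import torch
import torch.nn as nn

model = nn.Sequential(  # placeholder for a trained CFD surrogate model
    nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(8, 1, kernel_size=3, padding=1),
).eval()

x = torch.randn(1, 1, 128, 128, 128)  # one 3D input field

with torch.no_grad():
    model(x)                          # warm-up pass
    start = time.perf_counter()
    for _ in range(10):
        model(x)
    elapsed = (time.perf_counter() - start) / 10

print(f"average inference time per field: {elapsed:.3f} s")
```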

If you want to take a deeper dive into the benchmark, have a look at my other post where I summarized the procedure and results. Link: www.bytelake.com/en/AI4CFD-pt7.



