GPU Servers
Several servers equipped with Nvidia GPUs (Tesla V100, RTX6000, RTX A6000, Tesla A100) are currently available. CUDA, OpenCL and TensorFlow (Python) are installed on all of them.
The pool computers in the pools "Berlin" and "Brandenburg" are also equipped with an Nvidia GPU and can be used with CUDA, OpenCL and TensorFlow. The pool computers in all remaining pool rooms have an onboard Intel GPU, which can be used for computations via OpenCL.
The servers are also accessible through SSH or RDP (gruenau[9-10]). Please open a VPN connection when working from outside the HU network.
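Once logged in, a quick way to confirm that the installed TensorFlow build sees the GPUs is a short check like the following (a minimal sketch; it assumes the system-wide TensorFlow installation mentioned above):

```python
# Minimal sketch: confirm that TensorFlow was built with CUDA and can
# see the GPUs on the current machine.
import tensorflow as tf

print("Built with CUDA:", tf.test.is_built_with_cuda())
for gpu in tf.config.list_physical_devices("GPU"):
    print("Found GPU:", gpu.name)
```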
Overview of the GPU Servers and Computers
The following table lists the currently available servers and computers with a GPU, including additional information on the GPU type.
| Server/PCs | GPU | CUDA | Miscellaneous | Slurm gres |
|---|---|---|---|---|
| gruenau1 | 2x Nvidia Tesla V100, 1x Nvidia RTX6000 | Y (11.8) | OpenCL, TensorFlow | |
| gruenau2 | 3x Nvidia RTX6000 | Y (11.8) | OpenCL, TensorFlow | RTX6000 |
| gruenau7 | 4x Nvidia RTX A6000 | Y (11.8) | OpenCL, TensorFlow | |
| gruenau8 | 4x Nvidia RTX A6000 | Y (11.8) | OpenCL, TensorFlow | |
| gruenau9 | 3x Nvidia Tesla A100 | Y (11.8) | OpenCL, TensorFlow | A10080GB |
| gruenau10 | 3x Nvidia Tesla A100 | Y (11.8) | OpenCL, TensorFlow | A10080GB |
| PC-Pool (Berlin/Brandenburg) | 1x GeForce GTX 745 | Y (11.8) | OpenCL, TensorFlow | GTX745 |
| remaining PC pools | 1x Intel Skylake GT2 | N | OpenCL | |
Current usage values of the GPUs can be queried directly on the respective system (e.g. with nvidia-smi, see below).
Further information about the GPUs
The following table provides more detailed specifications for the available GPUs:
| GPU | RAM (GB) | RAM Bandwidth (GB/s) | GPU Speed (MHz) | CUDA Cores | Tensor Cores | Ray Tracing Cores | Compute Capability |
|---|---|---|---|---|---|---|---|
| GeForce GTX 745 | 4 | 28.8 | 1033 | 384 | / | / | 5.0 |
| Nvidia Tesla V100 | 32 | 897.0 | 1530 | 5120 | 640 | / | 7.0 |
| Nvidia Tesla T4 | 16 | 320.0 | 1515 | 2560 | 320 | 40 | 7.5 |
| Nvidia RTX6000 | 24 | 672.0 | 1770 | 4608 | 576 | 72 | 7.5 |
| GeForce RTX 3090 | 24 | 936.2 | 1695 | 10496 | 328 | 82 | 8.6 |
| Nvidia RTX A6000 | 48 | 768.0 | 2100 | 10752 | 336 | 84 | 8.6 |
| Nvidia Tesla A100 | 80 | 1600.0 | 1410 | 6912 | 432 | / | 8.0 |
Please also use the tools clinfo and nvidia-smi to obtain additional information that can help you choose the best fit for your project.
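clinfo prints this information on the command line; a similar overview can also be obtained from Python with pyopencl (a minimal sketch, assuming the pyopencl package is available in your environment):

```python
# Minimal sketch: list all OpenCL platforms and devices, similar to a
# condensed clinfo output. Assumes pyopencl is installed.
import pyopencl as cl

for platform in cl.get_platforms():
    for device in platform.get_devices():
        mem_mib = device.global_mem_size // 2**20
        print(f"{platform.name}: {device.name} ({mem_mib} MiB global memory)")
```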
Selection Guide
Depending on the workload, it can pay off to prefer one system over another. The tables below provide an overview of the throughput of the different GPU types for different data types (a short mixed-precision example follows the legend below).
Comparison of the high-end GPU systems:
| GPU | FP16 (TFLOPS) | FP32 (TFLOPS) | FP64 (TFLOPS) | Deep Learning (TOPS) | Ray Tracing (TFLOPS) |
|---|---|---|---|---|---|
| Nvidia Tesla V100 | 30.0 | 15.0 | 7.5 | 120 | / |
| Nvidia Tesla T4 | 16.2 | 8.1 | 0.25 | 65 | / |
| Nvidia RTX6000 | 32.6 | 16.3 | 0.5 | 130 | 34 |
| GeForce RTX 3090 | 35.58 | 35.58 | 1.11 | 142 / 284* | 58 |
| Nvidia RTX A6000 | 38.7 | 38.7 | 1.21 | 309.7 | 75.6 |
| Nvidia Tesla A100 | 77.97 | 19.49 | 9.746 | ? | / |
Recommended choices for certain scenarios are listed in the "Recommended Scenario" column of the server comparison below.
Legend:
TFLOPS = tera floating point operations per second
TOPS = tera operations per second
INTX = integer variable with X bits
FPX = floating point variable with X bits
GRays = giga rays per second
* = double performance when the sparsity feature is used
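Where the table shows a large FP16 advantage (e.g. Tesla V100 or Tesla A100), deep learning workloads can exploit it via mixed precision. A minimal sketch using the TensorFlow Keras API (assuming TensorFlow 2.4 or newer):

```python
# Minimal sketch: enable mixed precision so matrix multiplications run
# in FP16 on the Tensor Cores while variables stay in FP32.
from tensorflow.keras import mixed_precision

mixed_precision.set_global_policy("mixed_float16")
print("Compute dtype:", mixed_precision.global_policy().compute_dtype)
```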
Comparison of Servers:
| Server | Geekbench5 CPU (Single) | Geekbench5 CPU (Multi) | GPUs | Recommended Scenario |
|---|---|---|---|---|
| gruenau1 | 1078 | 25239 (36/72 cores) | 2x RTX6000 | Multi-GPU, ray tracing, deep learning, max. CPU |
| gruenau2 | 1078 | 25239 (36/72 cores) | 2x RTX6000 | Multi-GPU, ray tracing, deep learning, max. CPU |
| gruenau9 | 854 | 14169 (16/32 cores) | 3x T4 | FP64 computation, max. RAM |
| gruenau10 | 1078 | 25239 (36/72 cores) | 2x V100 | FP64 computation |
| PC-Pool (Berlin/Brandenburg) | 1109 | 4308 (4C/8T) | GeForce GTX 745 | / |
| gruenau[5-8] | 695 | 27451 (60C/120T) | / | / |
For additional information on the specifications of the compute servers, please click on the respective server name in the table.
General Remark:
Since all resources are shared among the currently active users, it may be beneficial, after careful consideration, to choose the second- or third-best option for the envisaged project, provided that option currently has considerably lower usage.
For better load balancing, the use of SLURM is recommended.
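A GPU job can be submitted as a batch script; because Slurm runs the script through its shebang line and treats #SBATCH lines as comments, the body can be plain Python. A minimal sketch (the gres name is taken from the overview table above; job name and time limit are placeholders):

```python
#!/usr/bin/env python3
#SBATCH --job-name=gpu-example      # placeholder job name
#SBATCH --gres=gpu:A10080GB:1       # gres name from the overview table above
#SBATCH --time=00:10:00             # placeholder time limit

# Slurm reads the #SBATCH directives above and then executes this file
# with Python; the job simply reports which GPU it was allocated.
import tensorflow as tf

print(tf.config.list_physical_devices("GPU"))
```

Submit the script with sbatch and check its state with squeue.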