Writing highly parallelized code is non-trivial. To ease this difficulty, a variety of "high-level" models and programming languages have been built to bridge sequential and parallel execution. In fact, one of the great achievements of modern C++ is the C++11 memory model, which provided a definition of ordering (constructs such as
atomic). This allows a compiler to look at multi-threaded code and optimize it aggressively based on how the developer defined the memory ordering (instead of abusing the
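As a rough illustration of what the C++11 memory model lets a developer express, here is a minimal message-passing sketch using `std::atomic` with release/acquire ordering. The names `payload`, `ready`, `producer`, `consumer`, and `run_message_passing` are made up for this example and do not come from any particular project.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Plain (non-atomic) data, published through the atomic flag below.
int payload = 0;
std::atomic<bool> ready{false};

// Producer: write the data first, then publish it with a release store.
void producer() {
    payload = 42;
    ready.store(true, std::memory_order_release);
}

// Consumer: spin until an acquire load observes the flag, then read the data.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    // The release store synchronizes with the acquire load that saw `true`,
    // so the write to `payload` happens-before this read.
    return payload;
}

// Runs one producer and one consumer thread and returns what the consumer saw.
int run_message_passing() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

Calling `run_message_passing()` returns 42. Because the ordering is stated explicitly, the compiler remains free to optimize everything the release/acquire pair does not constrain.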
Of course, C++ really only applies to CPU-based systems. GPGPUs, on the other hand, are highly threaded machines with a fundamentally different architecture. So instead of native C++, toolkits and libraries are used to target them. CUDA is one such example: it provides C/C++ language extensions and a runtime library to describe computation to be performed on Nvidia GPUs. This gives a nice way to define compute workloads. Beyond compute workloads there are graphics APIs, which run on a variety of hardware and are used primarily for gaming, but generally offer much less power for compute than CUDA. Unfortunately, beyond CUDA, the tooling for compute workloads is terrible. CUDA only works with Nvidia products, and OpenCL is officially deprecated by Apple (and was never really supported by vendors in the first place).
Vulkan has gained a lot of traction and support, and has been advancing in GPU features such as subgroups, pointers, and even a memory model. But this universality and these features come at the cost of requiring developers to be more careful about the details (transferring data, specifying layouts and groups, etc.). There is now an even higher-level framework: WebGPU. WebGPU follows a similar approach to Vulkan, requiring explicit details, but allows for universal coverage (Vulkan, DirectX, Metal, etc.).
But once again, the amount of detail a programmer needs to specify to do a simple compute operation (which would have been easy with CUDA) makes WebGPU and Vulkan look unattractive. Well...
alkomp is a GPGPU library written in Rust for performing compute operations. It's designed to work over WebGPU, enabling compute code to work on DirectX, Vulkan, Metal, and eventually OpenCL and the browser. This project offers a convenient way to write compute operations while supporting many GPUs and vendors. In addition, alkomp exposes a Python API to work with numpy arrays on a GPU. Although this project is at its humble beginnings, it will hopefully one day provide a feasible open-source toolkit to define compute workloads on a wide variety of GPUs.
Head on over to alkomp to see examples of Rust and Python compute code :)