Taiga Cloud is overcommitment-free.
Cloud-based GPUs and no look-in for noisy neighbors
Many cloud users are already familiar with “noisy neighbors”. These are other customers whose projects the provider, for example a hyperscaler, has placed on the same machine. However, these “noisy” users consistently consume the major share of the computing power that has been booked and allocated. As a result, other customers who draw their computing cycles from the same machine run into performance problems.
Performance is the bottleneck in the stack for users’ ambitious projects, especially in HPC. That is why Taiga Cloud rigorously applies the principle of “overcommitment-free” and has designed the allocation of resources to customers accordingly. Users are allocated physical CPUs and GPUs, attached through PCIe passthrough and backed by reserved memory, as well as high-speed storage via NVMe over Fabrics. This enables them to run their jobs without interruption.
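To make the term concrete, overcommitment is commonly expressed as the ratio of virtual CPUs handed out on a host to the physical cores that host actually has. The following is a minimal sketch of that arithmetic, not a description of Taiga Cloud's scheduler; the function names and the example numbers are hypothetical.

```python
def overcommit_ratio(physical_cores, vcpus_per_tenant):
    """Return allocated vCPUs divided by physical cores on one host.

    Both parameter names are illustrative assumptions, not a provider API.
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_tenant) / physical_cores


def is_overcommitment_free(physical_cores, vcpus_per_tenant):
    """A host is overcommitment-free when no core is promised twice."""
    return overcommit_ratio(physical_cores, vcpus_per_tenant) <= 1.0


# Hypothetical 64-core host with three tenants: every vCPU maps to its own core.
dedicated = overcommit_ratio(64, [16, 16, 32])

# The same host with four 32-vCPU tenants: each core is promised twice.
shared = overcommit_ratio(64, [32, 32, 32, 32])
```

A ratio above 1.0 means that tenants can only get their full allocation as long as their neighbors stay idle, which is the situation the noisy-neighbor problem arises from.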
Performance just like in your own on-prem data center.
Many HPC tasks on the user side require a great deal of computing power. Running tasks in parallel across several CPU threads can provide a solution, but the software deployed must be optimized for such strategies. Optimizations of this nature are a rarity, however, especially in the field of HPC, where customized software is the norm. If such software encounters shared resources, performance problems are likely to occur. When several single-thread optimized applications use the same core simultaneously, performance drops, and users are frequently unable to clearly identify the cause. In such cases, getting to the bottom of issues is often time-consuming and expensive. Debugging ties up internal resources, which are diverted from the actual purpose of the application, such as developing and training an AI model.
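One place to start when the cause is unclear is the CPU "steal" counter that Linux guests report in the aggregate `cpu` line of `/proc/stat`: it counts time in which the virtual CPU was ready to run but the hypervisor was serving someone else. The sketch below computes the share of steal time between two samples. It is a diagnostic aid under stated assumptions, not a vendor tool: the helper names are hypothetical, it only covers CPU contention (not GPU, memory, or storage), and it only works where the hypervisor exposes steal time to the guest.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    if len(values) < len(FIELDS):
        raise ValueError("kernel does not report a steal counter on this line")
    return dict(zip(FIELDS, values))


def steal_fraction(before, after):
    """Return the share of CPU time lost to steal between two samples (0.0 to 1.0)."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    return deltas["steal"] / total if total > 0 else 0.0


def measure_steal(interval=1.0, path="/proc/stat"):
    """Sample /proc/stat twice, `interval` seconds apart (Linux only)."""
    def sample():
        with open(path) as f:
            return parse_cpu_line(f.readline())
    first = sample()
    time.sleep(interval)
    return steal_fraction(first, sample())
```

Calling `measure_steal()` on a Linux guest returns a value between 0.0 and 1.0. A value that stays well above zero while the application is slow points toward contention on the host rather than a bug in the application itself.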
With Taiga Cloud’s overcommitment-free approach, however, such issues are eliminated from the outset, because each user of the HPC infrastructure is allocated physical assets which they alone use. In contrast to virtual shared resources, this performance is available to customers without restriction and can be scaled in real time. Consequently, software engineers benefit from a high-performance foundation for their development projects with an open design and complete cost transparency.