+------------------------------------------------------------------------------+
|                                                                              |
| Dhananjaya D R                                                               |
|                                                                              |
+------------------------------------------------------------------------------+

OpenBLAS pthread Resource Exhaustion
________________________________________________________________________________

An AI/ML application running in Kubernetes had one pod failing for 3 days and
4 hours straight - over 27K failed attempts - while the others worked fine.
The error message was

________________________________________________________________________________

OpenBLAS blas_thread_init: pthread_create failed for thread 11 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: ensure that your address space and process count limits are big enough (ulimit -a)
OpenBLAS blas_thread_init: RLIMIT_NPROC 1048576 current, 1048576 max

The confusing and interesting part - `RLIMIT_NPROC` showed over 1 million
processes available, yet thread creation was failing with "Resource temporarily
unavailable."

RCA
________________________________________________________________________________

The issue wasn't hardware resources or VM limitations. It was container-level
resource constraints that OpenBLAS couldn't see.

OpenBLAS tries to create multiple threads for optimal performance - in this
case, 16 threads. It does not check any limits up front: the `RLIMIT_NPROC`
value (1M+ processes) is only printed after `pthread_create` has already
failed, and it suggests there is plenty of room for those threads. However, the
container itself has much stricter limits imposed by Kubernetes resource
constraints, and `RLIMIT_NPROC` does not reflect them.

Reproduction
________________________________________________________________________________

I created a test to reproduce the exact error with a container that had limited
resources, numpy with OpenBLAS, a high thread count, and process limit
restrictions.
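
One way to see the mismatch from inside a container is to print the rlimit next
to the cgroup limits. The following is a minimal sketch, not part of the
reproduction setup: it assumes a cgroup v2 hierarchy mounted at /sys/fs/cgroup
(files `pids.max`, `pids.current`, `cpu.max`), and the helper names are
hypothetical. A limit set on a parent cgroup may not be visible from inside the
container, so a missing or "max" value here does not prove there is no limit.

```python
import os
import resource
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumption: cgroup v2 unified hierarchy


def read_cgroup_value(name, root=CGROUP_ROOT):
    """Return the stripped contents of a cgroup file, or None if unreadable."""
    try:
        return (Path(root) / name).read_text().strip()
    except OSError:
        return None


def parse_pids_max(text):
    """pids.max holds an integer or the literal 'max' (no limit)."""
    if text is None or text == "max":
        return None
    return int(text)


def limits_report(root=CGROUP_ROOT):
    """Collect the rlimit that the error message prints and the cgroup values."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "rlimit_nproc": (soft, hard),
        "detected_cpus": os.cpu_count(),
        "cgroup_pids_max": parse_pids_max(read_cgroup_value("pids.max", root)),
        "cgroup_pids_current": read_cgroup_value("pids.current", root),
        "cgroup_cpu_max": read_cgroup_value("cpu.max", root),
    }


if __name__ == "__main__":
    for key, value in limits_report().items():
        print(f"{key}: {value}")
```

If `cgroup_pids_max` comes back far below `rlimit_nproc`, the cgroup limit is
the one thread creation will hit first.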
The reproduction setup is available at k8pthreadfail

Why This Happens
________________________________________________________________________________

[1] OpenBLAS automatically detects CPU cores and creates threads accordingly.
    On a 16 core host, it tries to create 16 threads.

[2] Kubernetes uses cgroups and namespaces to isolate containers. A container
    might only be allowed 0.5 CPU cores and 10 processes (threads count toward
    that limit), regardless of host resources.

[3] `RLIMIT_NPROC` reports a host-sized value, but thread creation runs into
    the container's actual limits.

The Fix
________________________________________________________________________________

[1] Set explicit thread limits for OpenBLAS and other math libraries using
    environment variables (for example `OPENBLAS_NUM_THREADS`)

[2] Review and adjust container resource limits to match application
    requirements

[3] Match thread count to allocated CPU resources rather than relying on auto
    detection

+------------------------------------------------------------------------------+
Thanks to Henna Rose Joshi for bringing this problem to my attention.
+------------------------------------------------------------------------------+
________________________________________________________________________________
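
Fixes [1] and [3] above can be combined in a few lines of startup code. This is
a minimal sketch under stated assumptions, not the fix that was deployed: it
assumes cgroup v2 with the CPU quota in /sys/fs/cgroup/cpu.max, the function
names are hypothetical, and the conservative fallback of 1 thread when no quota
is readable is my choice. The variables need to be set before numpy is first
imported, since the failing `blas_thread_init` runs when the library loads.

```python
import math
import os
from pathlib import Path


def cpu_quota_threads(cpu_max_text, fallback):
    """Turn a cgroup v2 cpu.max line ('<quota> <period>' or 'max <period>')
    into a thread count, rounding up and never going below 1."""
    try:
        quota, period = cpu_max_text.split()
        if quota == "max":  # no CPU quota configured
            return fallback
        return max(1, math.ceil(int(quota) / int(period)))
    except (AttributeError, ValueError, ZeroDivisionError):
        return fallback  # missing or malformed input


def configure_blas_threads(cpu_max_path="/sys/fs/cgroup/cpu.max", env=os.environ):
    """Cap math-library thread pools; values already set by the user win."""
    try:
        text = Path(cpu_max_path).read_text()
    except OSError:
        text = None
    threads = cpu_quota_threads(text, fallback=1)
    for var in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        env.setdefault(var, str(threads))
    return threads


if __name__ == "__main__":
    n = configure_blas_threads()
    print(f"BLAS thread cap: {n}")
    # import numpy  # import only after the variables are set
```

Setting the same variables in the pod spec's `env` section achieves the same
result without touching application code.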