+------------------------------------------------------------------------------+
| Dhananjaya D R                                                               |
+------------------------------------------------------------------------------+

Linux Virtual Memory Overcommit Blocked Java Native Thread Creation
________________________________________________________________________________

I ran into an interesting case at work: one of the Java services was failing to
start threads even though the host had plenty of free memory and CPU.

The error from the logs
________________________________________________________________________________

Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for
attributes: stacksize: 2048k, guardsize: 0k, detached.

At first it looked like resource exhaustion, but the system metrics did not
support that assumption:

+------------------------------------------------------------------------------+
| bash                                                                         |
+------------------------------------------------------------------------------+
| [root@ip]# cat /proc/sys/vm/overcommit_memory                                |
| 0                                                                            |
|                                                                              |
| [root@ip]# free -g                                                           |
|             total       used       free     shared buff/cache  available     |
| Mem:          185         78         99          0          7        105     |
| Swap:           0          0          0                                      |
|                                                                              |
| [root@ip]# ps -eLf | grep java | wc -l                                       |
| 16596                                                                        |
|                                                                              |
| [root@ip]# cat /proc/meminfo | grep -E 'CommitLimit|Committed_AS'            |
| CommitLimit:     97115032 kB                                                 |
| Committed_AS:   145024196 kB                                                 |
|                                                                              |
| [root@ip]# vmstat                                                            |
| procs --------memory--------- ---swap-- -----io---- -system-- ------cpu----- |
|  r  b swpd      free buff   cache si so  bi  bo  in  cs us sy id wa st       |
|  0  0    0 101642848 3100 7649696  0  0   0  38  10  26 29  4 67  0  0       |
+------------------------------------------------------------------------------+

Bottom to top approach
________________________________________________________________________________

The kernel's commit limit (CommitLimit) is 97115032 kB, about 92 GiB, but the
system has already committed (Committed_AS) 145024196 kB, about 138 GiB. That is
roughly 49% over the limit. overcommit_memory is set to 0, the heuristic mode,
which allows some overcommit but refuses allocations the kernel considers
unsafe. So it is the virtual memory accounting, not physical memory, that is
preventing the new thread stack from being allocated.

The thread limit (/proc/sys/kernel/threads-max) is likely not the issue here.
The OOM killer did not kick in, the system still has 99 GB of free physical
memory, and CPU usage is low (about 30%), which confirms we are not starved of
physical resources.

Top to bottom approach
________________________________________________________________________________

[1] The memory hierarchy

    * JVM memory (Heap + JVM Stack) <- Java manages this
    * Native thread stacks          <- OS manages this (by OS I mean kernel +
                                       system tools/libraries/apps)
    * Linux virtual memory system   <- Kernel manages this (the core part of
                                       the OS that manages hardware and
                                       resources)

[2] Thread creation chain

    * Application          -> new Thread(() -> doWork()).start()
    * JVM calls OS         -> pthread_create()
    * OS tries to allocate -> mmap() for 2MB native stack
    * Kernel checks        -> current_committed + 2MB > CommitLimit?
    * Kernel says          -> "NO! Already overcommitted" -> EAGAIN
    * pthread_create()     -> fails
    * JVM logs             -> "Failed to start thread... stacksize: 2048k"

[3] Memory command chain on Linux

    Application -> Virtual memory -> Actual memory -> Storage
                         ^
                         |
                  We are blocked here!

[4] Observations

    * Container memory limit != virtual memory overcommit limit
    * Native thread stacks reserve virtual memory ("I might need it later"),
      but they do not consume physical memory until the pages are actually
      touched.

Summary of the findings
________________________________________________________________________________

[1] The JVM stack and heap are managed by the JVM, inside the JVM process memory
[2] Native thread stacks are managed by the OS, a separate 2MB per thread
[3] The container limit applies to actual memory usage
[4] The virtual memory overcommit limit applies to virtual memory reservations
[5] You can have, let's say, an 8GB container and still hit virtual memory
    limits!

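The commit figures from the bottom to top section can be double-checked with a
few lines of code. This is a minimal Python sketch, not part of the original
investigation: the function names are made up for illustration, and the sample
text is simply the two /proc/meminfo lines captured in the bash session.
Working from the raw kB values keeps both numbers in the same unit.

```python
# The two lines captured from /proc/meminfo in the bash session.
SAMPLE_MEMINFO = """\
CommitLimit:    97115032 kB
Committed_AS:   145024196 kB
"""


def parse_meminfo(text):
    """Return /proc/meminfo style fields as a dict of integers (kB)."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name:  <number> kB" style lines.
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def overcommit_percent(fields):
    """How far Committed_AS exceeds CommitLimit, in percent of the limit."""
    limit = fields["CommitLimit"]
    committed = fields["Committed_AS"]
    return (committed - limit) / limit * 100


def kb_to_gib(kb):
    """Convert the kB values used by /proc/meminfo to GiB."""
    return kb / (1024 * 1024)


if __name__ == "__main__":
    fields = parse_meminfo(SAMPLE_MEMINFO)
    print(f"CommitLimit : {kb_to_gib(fields['CommitLimit']):.1f} GiB")
    print(f"Committed_AS: {kb_to_gib(fields['Committed_AS']):.1f} GiB")
    print(f"Over limit  : {overcommit_percent(fields):.0f}%")
```

To run it against a live host, pass the contents of /proc/meminfo to
parse_meminfo() instead of the sample text.
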
We can fix this by
________________________________________________________________________________

[1] Reducing the thread stack size (for example with the JVM -Xss option)
[2] Increasing the commit ratio: "echo 75 > /proc/sys/vm/overcommit_ratio"

You can test these solutions with the reproduction scripts at pthreadfail
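
To get a rough feel for what each fix buys, here is a small Python sketch of the
arithmetic. It leans on assumptions that are not in the captured output: that
CommitLimit follows the formula given in the kernel documentation, (MemTotal -
huge pages) * overcommit_ratio / 100 + swap; that overcommit_ratio was at its
default of 50 with no huge pages, so MemTotal is back-computed as twice the
CommitLimit (about 185 GiB, which matches free -g); and that every one of the
16596 lines counted by ps is a thread with a 2048k stack. The 512k stack size is
only an example value, and the function names are hypothetical. Whether the
kernel actually enforces the limit depends on the overcommit mode; the sketch
only does the arithmetic.

```python
COMMIT_LIMIT_KB = 97115032    # CommitLimit from /proc/meminfo
COMMITTED_AS_KB = 145024196   # Committed_AS from /proc/meminfo
THREADS = 16596               # line count from: ps -eLf | grep java | wc -l
SWAP_KB = 0                   # free -g shows no swap

# Assumption: overcommit_ratio was the default 50 and no huge pages were
# reserved, so MemTotal is roughly CommitLimit * 100 / 50.
ASSUMED_RATIO = 50
ASSUMED_MEMTOTAL_KB = COMMIT_LIMIT_KB * 100 // ASSUMED_RATIO


def commit_limit_kb(memtotal_kb, ratio, swap_kb=0, hugepages_kb=0):
    """CommitLimit in kB, following the formula in the kernel docs."""
    return (memtotal_kb - hugepages_kb) * ratio // 100 + swap_kb


def stack_reservation_kb(threads, stack_kb):
    """Virtual memory reserved for native thread stacks, in kB."""
    return threads * stack_kb


if __name__ == "__main__":
    # Fix [1]: smaller thread stacks reserve less virtual memory.
    before = stack_reservation_kb(THREADS, 2048)
    after = stack_reservation_kb(THREADS, 512)  # example value only
    print(f"stacks at 2048k: {before / 1024**2:.1f} GiB")
    print(f"stacks at  512k: {after / 1024**2:.1f} GiB")

    # Fix [2]: a higher overcommit_ratio raises CommitLimit.
    new_limit = commit_limit_kb(ASSUMED_MEMTOTAL_KB, 75, SWAP_KB)
    headroom = new_limit - COMMITTED_AS_KB
    print(f"CommitLimit at ratio 75: {new_limit} kB")
    print(f"headroom over Committed_AS: {headroom / 1024:.0f} MiB")
```

Under these assumptions a ratio of 75 lifts CommitLimit only slightly above the
current Committed_AS, so it is worth recomputing with the real MemTotal before
settling on a value.
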