+------------------------------------------------------------------------------+
|                                                                              |
|  Dhananjaya D R                                                              |
|                                                                              |
+------------------------------------------------------------------------------+


Linux Virtual Memory Overcommit Blocked Java Native Thread Creation
________________________________________________________________________________

                                                                    /\
                                                                    \/
                                                                  /           
                                                           Q    /
                                                           |-+/
                                                          / \   Go, kite, fly!
________________________________________________________________________________

I found an interesting case at work, where one of the Java services was failing 
to start a thread, despite having abundant memory and CPU resources. 


The error from the logs
________________________________________________________________________________

Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) 
for attributes: stacksize: 2048k, guardsize: 0k, detached.

At first it looked like a resource exhaustion issue, but the system metrics 
didn't support that assumption:

+------------------------------------------------------------------------------+
| bash                                                                         |
+------------------------------------------------------------------------------+
| [root@ip]# cat /proc/sys/vm/overcommit_memory                                |
| 0                                                                            |
|                                                                              |
| [root@ip]# free -g                                                           |
|            total        used        free      shared  buff/cache   available |
| Mem:         185          78          99           0           7         105 |
| Swap:          0           0           0                                     |
|                                                                              |
| [root@ip]# ps -eLf | grep java | wc -l                                       |
| 16596                                                                        |
|                                                                              |
| [root@ip]# cat /proc/meminfo | grep -E 'CommitLimit|Committed_AS'            |
| CommitLimit:     97115032 kB                                                 |
| Committed_AS:   145024196 kB                                                 |
|                                                                              |
| [root@ip]# vmstat                                                            |
| procs --------memory--------- ---swap-- -----io---- -system-- ------cpu----- |
|  r  b  swpd  free  buff cache   si   so    bi    bo   in   cs us sy id wa st |
|  0  0     0 101642848 3100 7649696  0   0    0   38   10   26 29  4 67  0  0 |
+------------------------------------------------------------------------------+


Bottom to top approach
________________________________________________________________________________

The kernel's commit limit (CommitLimit) is about 92 GiB, but the committed 
address space (Committed_AS) is about 138 GiB, roughly 49% over the limit, 
which is a dangerous level of overcommitment. overcommit_memory is set to 0, 
which allows some overcommit but refuses requests the kernel considers unsafe. 
So it is virtual memory accounting, not a shortage of physical memory, that 
prevents the new thread stack allocation.
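
To see how far past the limit the box is, the two meminfo fields can be 
compared directly. A minimal sketch, assuming awk is available; overcommit_pct 
is a helper name made up for illustration, and the sample values are the ones 
from the output above:

```shell
# Print how far Committed_AS exceeds CommitLimit, in percent.
# Reads /proc/meminfo-formatted text on stdin.
overcommit_pct() {
    awk '
        $1 == "CommitLimit:"  { limit = $2 }
        $1 == "Committed_AS:" { committed = $2 }
        END {
            if (limit == 0) {
                print "CommitLimit not found" > "/dev/stderr"
                exit 1
            }
            printf "%.1f\n", (committed / limit - 1) * 100
        }
    '
}

# Values copied from the output above.
overcommit_pct <<'EOF'
CommitLimit:     97115032 kB
Committed_AS:   145024196 kB
EOF
# prints 49.3
```

On a live box, feed it the real file instead: overcommit_pct < /proc/meminfo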

The thread limit (/proc/sys/kernel/threads-max) is likely not the issue here. 
The OOM killer didn't kick in, the system still has free physical memory 
(99GB), and CPU usage is low (~30%), which confirms we are not resource 
starved.


Top to bottom approach
________________________________________________________________________________

[1] The memory hierarchy
    * JVM memory (Heap + JVM Stack)  <-  Java manages this
    * Native thread stacks           <-  OS manages this (by OS I mean the 
                                         kernel plus system libraries/tools)
    * Linux virtual memory           <-  Kernel manages this (the core part of 
                                         the OS that manages hardware and 
                                         resources)

[2] Thread creation chain
    * Application           -> new Thread(() -> doWork()).start()
    * JVM calls OS          -> pthread_create()
    * OS tries to allocate  -> mmap() for 2MB native stack
    * Kernel checks         -> current_committed + 2MB > CommitLimit?
    * Kernel says           -> "NO! Already overcommitted" -> EAGAIN
    * pthread_create()      -> fails
    * JVM logs              -> "Failed to start thread... stacksize: 2048k"

[3] Memory command chain on Linux
    Application -> Virtual memory -> Actual memory -> Storage
                         ^
                         |
                We got blocked here!

[4] Observations
    * Container memory limit != Virtual memory overcommit limit
    * Native thread stacks reserve virtual memory ("I might need it later") 
      but consume physical memory only once they are actually touched.
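
To put a number on those reservations, here is a rough sketch that multiplies 
the thread count by the per-thread stack size. stack_reservation_gib is a 
hypothetical helper. The inputs are the 16596 lines from the ps output (every 
thread of every matching process, plus the grep process itself) and the 2048k 
stacksize from the error message. I am assuming every thread uses that stack 
size and that each stack counts toward Committed_AS.

```shell
# Virtual memory reserved by thread stacks, in GiB.
# $1 = thread count, $2 = stack size per thread in KiB
stack_reservation_gib() {
    awk -v threads="$1" -v stack_kib="$2" \
        'BEGIN { printf "%.1f\n", threads * stack_kib / 1024 / 1024 }'
}

stack_reservation_gib 16596 2048   # prints 32.4
```

So stacks alone could account for about 32 GiB of reserved address space, 
even though most of it is never backed by physical pages.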

  
Summary of the findings
________________________________________________________________________________
  
[1] JVM stack and heap are managed by the JVM, inside the JVM process memory
[2] Native thread stacks are managed by the OS, a separate 2MB per thread
[3] The container limit applies to actual memory usage
[4] Virtual memory overcommit applies to virtual memory reservations
[5] You can have, say, an 8GB container and still hit virtual memory limits!


We can fix this by
________________________________________________________________________________
      
[1] Reducing the thread stack size (for example with the JVM -Xss option)
[2] Increasing the commit ratio: "echo 75 > /proc/sys/vm/overcommit_ratio"
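
Why 75? The kernel documentation defines CommitLimit as swap plus 
overcommit_ratio percent of RAM (huge pages excluded), and describes it as 
strictly enforced in overcommit mode 2. A small sketch of that arithmetic 
follows. commit_limit_kib is a made-up helper, and the MemTotal value is an 
assumption: it is not shown in the post, so I inferred it from the CommitLimit 
above, assuming the default ratio of 50, no swap and no huge pages.

```shell
# CommitLimit in KiB for a given overcommit_ratio, ignoring huge pages.
# $1 = MemTotal in KiB, $2 = SwapTotal in KiB, $3 = overcommit_ratio (percent)
commit_limit_kib() {
    awk -v mem="$1" -v swap="$2" -v ratio="$3" \
        'BEGIN { printf "%d\n", mem * ratio / 100 + swap }'
}

# 194230064 KiB is inferred, not measured (see the assumptions above).
commit_limit_kib 194230064 0 50   # prints 97115032
commit_limit_kib 194230064 0 75   # prints 145672548
```

Under those assumptions a ratio of 75 lifts the limit to just above the 
current Committed_AS (145024196 kB), which leaves only about 0.6 GiB of 
headroom, so reducing the stack size is still worth doing.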

You can test these solutions with the reproduction scripts at pthreadfail


                            \
                            |\
                           ,| \
                         ,',| /
                       ,',' |/
               .o    ,','   /  ----( Bad doggy!  Go home!  Shoo! )
    ,___o'       \--8 '
  ___>_>_________/\_______________________________________________

________________________________________________________________________________