Interpreting Linux OOM Logs (1/2)

· 1067 words · 3 minute read

本文描述如何分析 Linux OOM 日志。下面的日志去掉了开头的时间戳。

HOST1 kernel: prometheus invoked oom-killer: gfp_mask=0x0, order=0, oom_score_adj=0
HOST1 kernel: prometheus cpuset=docker-f55e8d749684a0746aa1cb3d186df370848ba502cc19d249bf732450de5a2f30.scope mems_allowed=0-3
HOST1 kernel: CPU: 88 PID: 49521 Comm: prometheus Tainted: G           OE  ------------   3.10.0-327.36.1.el7.x86_64 #1
HOST1 kernel: Hardware name: Dell Inc. PowerEdge R930/0Y0V4F, BIOS 2.4.3 07/07/2017
HOST1 kernel: ffff88bf0612f300 00000000cbfaa699 ffff88ff0e857ce0 ffffffff81636301
HOST1 kernel: ffff88ff0e857d70 ffffffff8163129c 0000000000000297 ffff88ff0e857db0
HOST1 kernel: 0000000000000001 00000000fc107000 0000000000000001 0000000000000010
HOST1 kernel: Call Trace:
HOST1 kernel: [<ffffffff81636301>] dump_stack+0x19/0x1b
HOST1 kernel: [<ffffffff8163129c>] dump_header+0x8e/0x214
HOST1 kernel: [<ffffffff8116d21e>] oom_kill_process+0x24e/0x3b0
HOST1 kernel: [<ffffffff8116cd86>] ? find_lock_task_mm+0x56/0xc0
HOST1 kernel: [<ffffffff8116da46>] out_of_memory+0x4b6/0x4f0
HOST1 kernel: [<ffffffff8116daf1>] pagefault_out_of_memory+0x71/0x90
HOST1 kernel: [<ffffffff8162f6e5>] mm_fault_error+0x68/0x12b
HOST1 kernel: [<ffffffff81642192>] __do_page_fault+0x3e2/0x450
HOST1 kernel: [<ffffffff81642223>] do_page_fault+0x23/0x80
HOST1 kernel: [<ffffffff8163e508>] page_fault+0x28/0x30

第一行表明 OOM 发生在 prometheus 申请内存失败,触发了 oom-killer。后面有三个值:gfp_mask, order 以及 oom_score_adj.

gfp_mask, order and oom_score_adj 🔗

gfp_mask 🔗

代表申请内存时候的参数。

// c.f. include/linux/gfp.h
/* Plain integer GFP bitmasks. Do not use this directly. */
#define ___GFP_DMA              0x01u
#define ___GFP_HIGHMEM          0x02u
#define ___GFP_DMA32            0x04u
#define ___GFP_MOVABLE          0x08u
#define ___GFP_RECLAIMABLE      0x10u
#define ___GFP_HIGH             0x20u
...

order 🔗

如果是 -1 的话,代表 OOM 是通过SysRq 触发。

这里是 0,我们得到 2 ^^ 0 = 1,也就是请求分配 1 页(或者更多)。

// c.f. include/linux/oom.h
/*
 * order == -1 means the oom kill is required by sysrq, otherwise only
 * for display purposes.
 */
const int order;

oom_score_adj 🔗

代表进程的 badness score 的调整值。

OOM killer 选择被杀进程时,会把这个调整值加到进程的 badness score 上作为最终的分值。这个调整值的取值范围是 -1000 (OOM_SCORE_ADJ_MIN) 到 1000 (OOM_SCORE_ADJ_MAX)。进程的 badness 分值越高越容易被杀死。

$ cat /proc/$$/oom_score
0
$ cat /proc/$$/oom_score_adj
0

此外,我们还看到了进程对应的 CPU 编号,进程号,内核版本以及硬件描述等信息。

内存信息 🔗

OOM killer 杀掉进程(如下,19434)之前,会打印出系统的进程信息。其中, 内存占用是以页(默认 4 kB)为单位的,比如:

HOST1 kernel: [ 2535]     0  2535    29188 ...

这里的 29188 是指 29188 * 4 kB.

HOST1 kernel: Mem-Info:                                                                                                                                                                     HOST1 kernel: Node 0 DMA per-cpu:
HOST1 kernel: CPU    0: hi:    0, btch:   1 usd:   0
HOST1 kernel: CPU    1: hi:    0, btch:   1 usd:   0
HOST1 kernel: CPU    2: hi:    0, btch:   1 usd:   0
HOST1 kernel: CPU    3: hi:    0, btch:   1 usd:   0
...
HOST1 kernel: active_anon:136788940 inactive_anon:627992 isolated_anon:0#012
 active_file:6127082 inactive_file:8045807 isolated_file:0#012
 unevictable:1832 dirty:187 writeback:16 unstable: 0#012
 free:104942660 slab_reclaimable:4803157 slab_unreclaimable:257603#012
 mapped:1804194 shmem:1395209 pagetables:297437 bounce:0#012
 free_cma:0
...
HOST1 kernel: 15569613 total pagecache pages
HOST1 kernel: 0 pages in swap cache
HOST1 kernel: Swap cache stats: add 0, delete 0, find 0/1
HOST1 kernel: Free swap  = 4194300kB
HOST1 kernel: Total swap = 4194300kB
HOST1 kernel: 268408898 pages RAM
HOST1 kernel: 0 pages HighMem/MovableOnly
HOST1 kernel: 4261955 pages reserved
HOST1 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
HOST1 kernel: [ 1922]     0  1922    45185    26567      90        0             0 systemd-journal
HOST1 kernel: [ 1950]     0  1950    10973      461      21        0         -1000 systemd-udevd
HOST1 kernel: [ 2579]    81  2579     6686      479      17        0          -900 dbus-daemon
HOST1 kernel: [ 2604]     0  2604     4952      458      13        0             0 irqbalance
HOST1 kernel: [ 3359]     0  3359    20641      898      42        0         -1000 sshd
HOST1 kernel: [19434]     0 19434  8729888  8434255   16732        0             0 xxxx
...
HOST1 kernel: Out of memory: Kill process 19434 (xxxx) score 31 or sacrifice child                                                                                                      HOST1 kernel: Killed process 19434 (xxxx) total-vm:34919552kB, anon-rss:33698680kB, file-rss:38340kB

pid and tgid 🔗

pid 是指 process id,而 tgid 是指 thread group id。这个涉及到 Linux 的线程模型和调度模型。下面一张很好的解释了两者的关系。

                      USER VIEW
 <-- PID 43 --> <----------------- PID 42 ----------------->
                     +---------+
                     | process |
                    _| pid=42  |_
                  _/ | tgid=42 | \_ (new thread) _
       _ (fork) _/   +---------+                  \
      /                                        +---------+
+---------+                                    | process |
| process |                                    | pid=44  |
| pid=43  |                                    | tgid=42 |
| tgid=43 |                                    +---------+
+---------+
 <-- PID 43 --> <--------- PID 42 --------> <--- PID 44 --->
                     KERNEL VIEW

total_vm and rss 🔗

total_vm 即进程已经申请的虚拟内存总大小。rss 是驻留在内存页数。

这里我们看到进程 19434 被杀掉了。被杀掉前,它占用的虚拟内存是 8729888 * 4 kB,34 GB 多一点。占用的物理内存是 8434255 * 4 kB,差不多 33 GB。OOM 之前,系统总的页数:268408898,可用页数:104942660,占比 39.1%。

那么问题来了,为啥会发生 OOM 呢?毕竟 free 还剩下将近 40%.

mm
comments powered by Disqus