How is a soft lockup detected ?
The Linux kernel creates a watchdog process for each CPU in the system. This should be visible using in the standard "ps" command, this is shown as [watchdog/N], where N is the number of the logical CPU.
This watchdog process/thread wakes up once a second, gets the current time stamp for the CPU it is running on and saves it into a per-CPU structure.
The softlockup_tick() function that gets called from the timer interrupt(). This function gets the current time stamp for the specified CPU and compares it to the saved one in the per-CPU structure.
If the current time stamp is more than softlockup_thresh seconds later than the saved time stamp then it's because the watchdog thread has not been run recently and a soft lockup message is generated on the console. As the timer counter on each CPU can be slightly slow or fast the counter is compared to the previous tick.
Why was the process/task allowed to hog the CPU ?
Not all code that runs on the CPU is considered to be a process. Tasklets / Interrupt handlers and blah are kernel functions that do not show up in the standard process listing (with the ps command).
The kernel still allocates them CPU time, but trusts them to relinquish control of the CPU.
Common causes of not relinquishing control.
Software bugs can cause the process/code to not relinquish the control of the CPU. The code could be waiting on a lock, or may be running code which continues in an infinite loop.
The other problem may be that the scheduler has ignored the process and the process has not been removed from the CPU for some timeframe. There are a number of kernel options that can isolate a cpu and run a single process on it. As the process "hogs" the CPU it may be seen as causing a soft lockup.
Example of a user space "soft lockup"
Pid: 2642, comm: pausetest Not tainted 2.6.18-164.el5 #1
RIP: 0033:[<00000000004005c2>] [<00000000004005c2>]
RSP: 002b:00007fff8a5d7028 EFLAGS: 00000246
RAX: 0000000093338370 RBX: 0000000000000028 RCX: 0000000093338340
RDX: 0000000000000140 RSI: 00002b56f6317000 RDI: 0000000000000001
RBP: 00002b56f6317000 R08: 00000000ffffffff R09: 000000302f919840
R10: 0000000000000000 R11: 0000000000000246 R12: 000000302fb51780
R13: 0000000000000034 R14: 00000000004007da R15: 0000000000000001
FS: 00002b56f6329210(0000) GS:ffff81001fc147c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000302f8c5f70 CR3: 000000000db79000 CR4: 00000000000006e0
Note the lack of values beneath "Call Trace:" above. If a userspace process is the process that has not been scheduled/removed from the cpu no backtrace will be shown, only the registers and the process name.
Example of a kernel space "lockup".
BUG: soft lockup detected on CPU#0!
DWARF2 unwinder stuck at apic_timer_interrupt+0x66/0x6c
Leftover inexact backtrace:
[ ] :ext3:htree_dirblock_to_tree+0xa8/0xc9
If a kernel thread is responsible the backtrace for the kernel thread will be shown in the soft lockup message, however the backtrace includes code that shows the stack message. The DWARF2 undwinder attempts to show the actual stack trace. (shown above in bold).
Sometimes the call trace may only show the symptom of the soft lockup, and other kernel tasks will need to be investigated to show the cause.