Why I like Solaris - Ed's journal
Why I like Solaris
We had a problem on a Solaris machine. It was 'running badly'.
The load average was around 20-30 ('normal' is about 1 per processor), but nothing was showing up as 'hogging' the processors. The highest process was running at 30% CPU, with the occasional jump to 70% when it re-ran its indexing. This was normal, but just in case, we disabled it and tested again.

Same problem.
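
As an aside, the 'normal is 1 per processor' rule of thumb is easy to turn into a quick check. This is only a rough sketch, not what we ran at the time: the function name load_per_cpu is made up for illustration, and os.getloadavg() is only available on Unix-like systems.

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Divide a load average by the processor count (hypothetical helper)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


if __name__ == "__main__":
    # Assumption: a Unix-like host, where os.getloadavg() is available.
    one_minute, _five, _fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"1-minute load per CPU: {load_per_cpu(one_minute, cpus):.2f}")
```

By this measure a load of 30 on, say, a two-processor box works out at 15 per processor, which is a long way from 'normal'.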

mpstat indicated that whilst CPU 1 was running fine, CPU 2 was 100% busy. Unusually, though, it was 100% 'system' time, which meant the kernel was working hard rather than any of the processes.
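
The same check can be scripted. What follows is a minimal sketch, assuming the usual Solaris mpstat column headers (CPU, usr, sys, wt, idl and so on). The numbers in SAMPLE are invented to resemble the symptom described above; they are not the real output from this machine, and busy_in_kernel is a hypothetical helper name.

```python
# Invented sample in the shape of Solaris mpstat output (NOT real data).
SAMPLE = """\
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  1   12   0   40   310  105  420   11    6    3    0   900   25   5   0  70
  2    3   0   12  9800 9650 5100   80    4   90    0   150    0 100   0   0
"""


def busy_in_kernel(mpstat_text, threshold=90):
    """Return the CPU ids whose 'sys' percentage is at or above threshold."""
    header = None
    flagged = []
    for line in mpstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            # Look columns up by name rather than by position.
            header = fields
            continue
        if header is None:
            continue
        row = dict(zip(header, fields))
        if int(row["sys"]) >= threshold:
            flagged.append(int(row["CPU"]))
    return flagged


if __name__ == "__main__":
    print(busy_in_kernel(SAMPLE))
```

Reading the columns by header name means the sketch does not care exactly which columns a given mpstat prints, as long as 'CPU' and 'sys' are among them.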

Looking closer (and with reference to the Sunmanagers mailing list), we picked out that there was an unusually high number of interrupts and context switches on processor 2. The context switches were caused by a dodgy process executing illegal instructions. The interrupts didn't appear to have any immediate cause.

However, a nice chap pointed me at 'lockstat', specifically 'lockstat -kgIW -D 20 -s 40 sleep 5', which showed me that a bunch of kernel objects prefixed 'se_' were generating interrupts regularly.

Googling led to /kernel/drv/se, which is the serial port device driver.

Once someone had unplugged the console server from the serial port, the machine returned to normal.

This is the way machine diagnostics should be. All of the above was accomplished without needing a reboot. I suspect we'll need to schedule an outage to check the exact source of the problem, but that can be done at a more convenient time. (Although, as a point of order, the 'owner' of the machine had rebooted it before they spoke to me about it, to see if it was a transient error.)
