Tracking down system freeze cause

Hi!

About twice a month I come home to find my system completely frozen. Please suggest ways to track down the cause.

I suspect a power management or display issue. The box turns off the display and idles until I return. Usually I just move the mouse and I’m back in business, but once in a while, no dice. I’ve tried plugging and unplugging wired keyboards, the HDMI cable, USB sticks… My only recourse is to hold the power button on the box until it resets.

First, I’m running the backport updates just now, so maybe Plasma updates will fix it.

I’ve tried looking at the logs, but nothing stands out: no big sign saying “Here’s your problem, newb!”, at least not one I recognize yet.

I think my hardware is mainstream, Gigabyte Celeron-based BRIX. I’ve poked around looking for Intel driver updates, but they all seem latest-and-greatest.

Intermittent bugs. Yikes!

I’ve set intel_idle.max_cstate=1 on the kernel command line.

I’ll update if I get another freeze.

Keep us updated on this. This sounds like an interesting problem that could be solved with a newer kernel.

It’s been more than a week with no freeze since I set intel_idle.max_cstate=1 on the kernel command line in GRUB2. See https://bugzilla.kernel.org/show_bug.cgi?id=109051
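
For anyone following along who wants to confirm that the parameter actually took effect after a reboot, here is a minimal Python sketch that parses the running kernel's command line from /proc/cmdline. The helper names parse_cmdline and max_cstate_limit are made up for illustration, and the parsing is simplified (quoted values containing spaces are not handled). On Debian-based systems the parameter is usually added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub; that is an assumption about the setup here, not something stated above.

```python
from pathlib import Path


def parse_cmdline(text):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (e.g. 'quiet') map to None. Simplified sketch:
    quoted values containing spaces are not handled.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def max_cstate_limit(text):
    """Return intel_idle.max_cstate as an int, or None if it is not set."""
    value = parse_cmdline(text).get("intel_idle.max_cstate")
    return int(value) if value else None


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    cmdline = Path("/proc/cmdline").read_text()
    print("intel_idle.max_cstate =", max_cstate_limit(cmdline))
```

If this prints None, the parameter never reached the kernel, which usually means the GRUB configuration was not regenerated.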

My box has a Gigabyte motherboard with the latest UEFI, version F8 (2015), and an Intel® Celeron® N2807 CPU (Bay Trail/Silvermont).

The monitor can no longer sleep, but I can live with that. Maybe I’ve changed some setting, or I’ll find a workaround.

The kernel has since updated to 4.12.0-1-amd64, a couple of minor versions newer than when I set max_cstate=1. Inspired by leszek, I’ll let the CPU change C-states again to check whether the issue has been resolved. However, a recent post on the bug report https://bugzilla.kernel.org/show_bug.cgi?id=109051 reports the issue still occurring on 4.13-rc4.
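
When switching the limit on and off like this, it helps to see which idle states the kernel actually exposes and how often each one has been entered. The sketch below reads the cpuidle entries under /sys/devices/system/cpu/cpu0/cpuidle; the function name read_idle_states is hypothetical, and it assumes the standard sysfs layout with name and usage files in each stateN directory. With the limit in place I would expect only the shallow states to show up, but treat that as an expectation to verify rather than a guarantee.

```python
from pathlib import Path

# Standard cpuidle sysfs location for the first CPU (assumed layout).
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def read_idle_states(base=CPUIDLE_DIR):
    """Return a list of (state_dir, name, usage) tuples sorted by state index."""
    state_dirs = sorted(
        Path(base).glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),
    )
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        usage = int((state_dir / "usage").read_text().strip())
        states.append((state_dir.name, name, usage))
    return states


if __name__ == "__main__":
    for state, name, usage in read_idle_states():
        print(f"{state}: {name:<12} entered {usage} times")
```

Running it once with the parameter set and once without gives a before/after record to post alongside any new freeze report.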

Another recent post on the bug report claims that the kernel should be custom built to include specific microcode that the BIOS/UEFI has omitted. I’ll try to track this down as well.
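
Before building anything custom, it seems worth recording which microcode revision the CPU currently reports, so there is something to compare against afterwards. On x86 Linux, /proc/cpuinfo normally includes a microcode line for each logical CPU. The following is a rough sketch; the function name microcode_revisions is made up, and the script only reports the revision, it says nothing about whether that revision contains the fix mentioned in the bug report.

```python
import re
from pathlib import Path


def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo text."""
    return set(
        re.findall(r"^microcode\s*:\s*(\S+)", cpuinfo_text, flags=re.MULTILINE)
    )


if __name__ == "__main__":
    revisions = microcode_revisions(Path("/proc/cpuinfo").read_text())
    print("microcode revision(s):", ", ".join(sorted(revisions)) or "not reported")
```

If the revision is unchanged after a microcode change was supposed to be applied, the update was probably not loaded.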

Not sure this is relevant: I’ve recently moved to Netrunner backports repos as well as stretch-stable and stretch-updates. Before I found this bug report, my sense was that the freeze had to do with a driver not waking up from suspend. It’s possible a driver was patched to fix my issue, but I think the Intel drivers have not been updated in years. I realize I’ve introduced too many variables.