[sf-lug] Overheating and CPU throttling

Michael Paoli Michael.Paoli at cal.berkeley.edu
Sat Jan 12 13:06:34 PST 2019


Maybe I'm mistaken, but ...
I thought for most non-ancient x86 hardware - notably CPUs, much/most of
that overtemp protection is built into the CPU itself - past certain
temperature threshold(s), it will significantly drop into (increasingly)
lower performance modes to keep it from getting too hot ... and worst
case at some critical threshold of temperature, CPU will suspend
operations.  But I'm definitely not fully sure about that and
the details and such.  I seem to recall that in the early Pentium
days, there wasn't such protection/throttling in the CPU - or it was
highly crude and limited.  But successor CPU models built in such
"features"/safeguards - mostly to prevent CPUs from self-destructing
from the heat of their own extreme performance & power density when
not otherwise adequately cooled.
Perhaps the specs from Intel etc. give much more detail, not only on
their thermal operating tolerances, but on what types of thermal
protections (e.g. throttling vs. just a simple halt) have been added
with what models and what capabilities.  Not sure about GPUs, but there
may be similar protections for those too (I'd guess fairly likely so on
the modern ones).
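
On Linux, one way to see whether the CPU's own thermal throttling has
actually kicked in is to look at the throttle counters the kernel exposes
under sysfs.  A minimal sketch in Python, assuming an Intel x86 system
where the kernel provides /sys/devices/system/cpu/cpuN/thermal_throttle/
(not every CPU or kernel has it); the function name and the cpu_root
parameter are just illustrative:

```python
from pathlib import Path


def read_throttle_counts(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: {counter_name: count}} for CPUs that expose
    a thermal_throttle directory; CPUs without one are skipped."""
    counts = {}
    for cpu_dir in sorted(Path(cpu_root).glob("cpu[0-9]*")):
        throttle_dir = cpu_dir / "thermal_throttle"
        if not throttle_dir.is_dir():
            continue
        per_cpu = {}
        for name in ("core_throttle_count", "package_throttle_count"):
            try:
                per_cpu[name] = int((throttle_dir / name).read_text())
            except (OSError, ValueError):
                # Counter missing or unreadable on this system: skip it.
                continue
        if per_cpu:
            counts[cpu_dir.name] = per_cpu
    return counts


if __name__ == "__main__":
    for cpu, per_cpu in read_throttle_counts().items():
        print(cpu, per_cpu)
```

If the counts stay at zero right up until a thermal shutdown, that would
suggest the hardware throttling isn't engaging (or isn't enough) before
the critical threshold is hit.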

Also, check BIOS settings and such - sometimes there may be applicable
settings there - e.g. "performance" and other settings ... sometimes
battery life vs. performance, energy efficiency vs. performance,
fan noise vs. performance, etc.

As for Operating System tweaks, daemons, etc.: seems like those would
mostly be "nice to have" (and may often be provided) additional controls -
and may be more fine-grained - than the (I presume) built-in CPU throttling
that keeps the CPU from cooking itself (or worse) in the more extreme
cases.  Anyway, seems the *hardware* should be at least minimally capable
of protecting itself ... but sure, Operating System stuff is good for more
fine-grained control - and of course monitoring.
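
For the monitoring part, the Linux thermal sysfs interface reports each
zone's current temperature and its trip points (in millidegrees C).  A
rough sketch, assuming the usual /sys/class/thermal/thermal_zoneN layout
with temp, type, and trip_point_N_temp / trip_point_N_type files; the
function name and root parameter are hypothetical, and which zones and
trip points exist varies by machine:

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Return a list of dicts, one per readable thermal zone, with the
    current temperature and trip points converted to degrees C."""
    zones = []
    for zone_dir in sorted(Path(root).glob("thermal_zone*")):
        try:
            temp_c = int((zone_dir / "temp").read_text()) / 1000.0
        except (OSError, ValueError):
            continue  # zone exists but cannot be read right now
        try:
            zone_type = (zone_dir / "type").read_text().strip()
        except OSError:
            zone_type = "unknown"
        trips = []
        for type_file in sorted(zone_dir.glob("trip_point_*_type")):
            temp_file = zone_dir / type_file.name.replace("_type", "_temp")
            try:
                trip_type = type_file.read_text().strip()
                trip_c = int(temp_file.read_text()) / 1000.0
            except (OSError, ValueError):
                continue
            trips.append((trip_type, trip_c))
        zones.append({"zone": zone_dir.name, "type": zone_type,
                      "temp_c": temp_c, "trips": trips})
    return zones


if __name__ == "__main__":
    for z in read_thermal_zones():
        print(z["zone"], z["type"], z["temp_c"], z["trips"])
```

Comparing temp_c against the "critical" trip point shows how much
headroom is left before the kernel would shut the machine down.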

Oh, and dust bunnies.  Check not only around the fan, but the entire air
flow path from intake to out.  I've seen multiple cases where there's
roughly 95% or more blockage ... not at the fan, but at the finer heat
sink fins by the air outlet, on the end of the heat piping away from
CPU/GPU.
And also check that the fan rotates freely - it should take only the
tiniest of push/pressure for it to turn, and if given a little manual
flick, it should spin quite freely and at least for some moderate bit.
It's also possible - but not as likely - that the fan works fine but for
some reason it's not being commanded to (adequately) spin.
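
To check whether the fan is actually being driven, one can compare the
reported fan speed under load against idle.  A small sketch, assuming the
machine exposes fan tachometer readings through the Linux hwmon sysfs
interface (fanN_input files, in RPM); not all laptops do, and the function
name and root parameter here are made up for illustration:

```python
from pathlib import Path


def read_fan_rpms(root="/sys/class/hwmon"):
    """Return {"hwmonN/fanM_input": rpm} for every readable fan sensor."""
    rpms = {}
    for hwmon_dir in sorted(Path(root).glob("hwmon*")):
        for fan_file in sorted(hwmon_dir.glob("fan*_input")):
            try:
                rpm = int(fan_file.read_text())
            except (OSError, ValueError):
                continue  # sensor present but not readable
            rpms[f"{hwmon_dir.name}/{fan_file.name}"] = rpm
    return rpms


if __name__ == "__main__":
    fans = read_fan_rpms()
    if not fans:
        print("no fan sensors found via hwmon")
    for name, rpm in fans.items():
        print(name, rpm, "RPM")
```

A reading that stays low (or at 0) while temperatures climb would point
at the fan not being commanded to spin up, rather than at blocked air
flow.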

> From: "Akkana Peck" <akkana at shallowsky.com>
> Subject: [sf-lug] Overheating and CPU throttling
> Date: Sat, 12 Jan 2019 12:49:44 -0700

> My Thinkpad X201 laptop has developed an overheating problem.
> Randomly, when I'm doing something lengthy and CPU intensive
> like building Firefox, it will shut down without warning. Afterward,
> I have messages like this in /var/log/kern.log:
> thermal_zone0: critical temperature reached (100 C), shutting down
>
> I've found lots of pages with people with similar problems,
> getting lots of responses like "Any modern Linux computer should
> automatically throttle its CPU when temperatures get high". No one
> explains how this automatic throttling is supposed to happen, or how
> to enable it if it's not happening, or what "modern" means (is it
> the CPU that needs to be modern? The BIOS? The kernel? How modern?)
>
> What I'd really like is a daemon or kernel setting that monitors
> the temperature and, if it exceeds max (well before it reaches
> critical), scales down the CPU frequency, or kills or (preferably)
> suspends whatever process is running away with the CPU, or suspends
> the machine rather than shutting down. I have started down the path
> of writing such a daemon, but it's complicated by not wanting to
> suspend certain processes like X even if their CPU usage looks high
> due to some other app. And it's hard to believe Linux doesn't
> already offer a solution to this problem.
>
> More system details:
>
> This X201 has been my main workhorse for 5+ years and never had
> temperature problems until a few weeks ago. I have opened it
> and don't see any dust bunnies around the fan.
>
> Processor is a quad-core Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz.
> Distro is Debian Testing. Kernel was 4.18.0-2-amd64, which I was
> stuck on because of a modeset bug in 4.18.0-3, but it looks like
> 4.19.0-1 has fixed it so now I've upgraded.
>
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is
> "ondemand", if that matters; though it doesn't seem from
> https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt
> like any of the governors look at temperature at all.
>
> Any suggestions? Any good articles I could read on how this
> scaling/governor/thermal/cpufreq stuff is supposed to work?



