Rescuing A Frozen Linux System With Some Magic

We’ve all had to face it: the frozen, hanging, crashed system. No fun for sure, especially if you were in the middle of a game before saving (or, I don’t know, doing work? no judgment). If you’ve been around Linux for a while you’ve probably picked up some favorite ways of dealing with a less than responsive system, but if you are new you may reach for Ctrl-Alt-Delete or that reset button right away.

To help out new Linux users, and hopefully show some tricks for the more experienced hands, here is a quick guide and tips on trying to save an unresponsive program or system. My experience is mostly with lightweight window managers rather than desktop environments like Gnome, and on X rather than Wayland, but I’ll try to cover those as well. I’ll present these roughly in order, from the easy and more typical to the less common or more drastic.

Likewise, the problems these are targeted to fix go from the common to less likely but more serious:

  1. frozen or unresponsive program
  2. the GUI or display being frozen
    • In this latter case, the less disruptive problem is the window manager (what coordinates displaying programs)
    • and more serious is a crash in the graphics drivers
    • or the windowing server (X or Wayland) itself. Thankfully, these last two crashes are least likely, but unfortunately more common with games.

In any event, the underlying processes, at least the Linux kernel, are still there ticking away. As long as you can talk to them, you can recover the system or limit any data loss. Very rarely will the kernel itself have fallen over. So the general advice is to “not be a Windows user”: we have a robust and communicative system with Linux that can survive a lot. A forced reboot should be rare and of absolutely last resort.

One last general piece of advice: make sure you have enough memory. Physical RAM is best, of course, but having swap space (memory on disk) is an important fail-safe. Linux window managers and the kernel do not like running out of memory. While there are built-in methods for emergency recovery of memory, this can be extremely slow or get stuck in a fight with something constantly trying to use up the newly freed memory. I’ve learned this recently with Red Dead Redemption 2 which is very memory hungry, as well as compiling things like Firefox, even with 16 gigs of good RAM. I was used to living without swap when I had 24 gigs of RAM, so you may be okay when you have lots, but best to have the fallback of swap space. See your distro’s documentation on setting this up, though most will have done that at install.
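If you are not sure whether you have swap at all, here is a minimal sketch for checking, assuming a Linux system with /proc/meminfo. The function name swap_total_kib is my own invention for illustration, not a standard tool.

```shell
# Hypothetical helper: print the SwapTotal value (in KiB) from a meminfo-style file.
# Defaults to /proc/meminfo; the optional argument only exists to make testing easy.
swap_total_kib() {
    awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Usage (on a running Linux system):
#   swap_total_kib      # 0 means no swap is configured
#   free -h             # human-readable overview of RAM and swap
#   swapon --show       # lists active swap devices and files
```

If the number is 0, that is your cue to go read your distro’s swap documentation.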

Now, let’s learn some magic! (You’ll know what I mean by the end.)

Ask nicely

First, you want to see if whatever seems unresponsive can still be put to rest gracefully. The standard quit shortcut is Ctrl-q, while Windows programs run through Wine or Proton will often exit with Alt-F4. Just make sure the window you want to close has focus, e.g. by clicking on it, before you hit those keys.

Sometimes a game may not let you switch windows with a keyboard shortcut, especially if it has locked up. It is a good idea to try other shortcut keys, like switching to another desktop, pulling up a Quake-style drop-down terminal like yakuake (thanks Patola for this tip!), or an application launcher (like rofi, dmenu, or something built-in to your desktop environment).

Who are you?

If that doesn’t work, perhaps a more direct “please shut down” message will. This gives the program a chance to do any cleanup it needs to do before quitting. In a terminal or a virtual console (reached with Ctrl-Alt-F#, like Ctrl-Alt-F2; some distros only use F2 and F3), you will want to send the troublesome process a signal to terminate: SIGTERM (numeric signal 15).

How do you find out what the process is? There are a couple of options to find the process number to send the signal to:

## List all processes and find by name
ps aux | grep <name>
## Find all processes matching name (can press TAB for completion)
pgrep <name>

And then send the signal by process number (pid from above) or directly by name:

## Or use -15 instead of -TERM
kill -TERM <pid>
## Can use tab completion for the next two
## pkill will send to all (partial) matches of name
pkill -TERM <name>
## Send to all processes exactly matching name
killall -TERM <name>

The default signal to send if you don’t specify it is actually SIGTERM, so you can leave that off above. I’ve included it here to be extra clear and to contrast for what comes next. You can find out more information by reading the manual, e.g. man kill (see, Linux games!) or using online man pages, or searching the internet, of course.
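To see the default signal in action without risking a real program, here is a small sketch that uses a background sleep as a stand-in for the stuck process. The 300 second sleep is an arbitrary choice of mine.

```shell
# Start a harmless stand-in for a stuck program.
sleep 300 &
pid=$!

# Ask it to terminate; with no signal named, kill sends SIGTERM.
kill "$pid"

# Collect the exit status: a process ended by a signal reports 128 + signal number.
status=0
wait "$pid" || status=$?
echo "exit status: $status"   # 143 = 128 + 15 (SIGTERM)
```

That 128-plus-signal convention is handy when a script needs to know whether something exited on its own or was told to.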

If they won’t come quietly…

Yes, killing, like in your favorite shooter but more practical. This is the general name for ending processes in Linux, and there are quite a few ways to go about it, just as we saw above. In this case we send the signal SIGKILL (number 9), which doesn’t give the program a chance to do any cleanup; it just dies. In other words, anything unsaved will likely be lost (autosave files to the rescue, hopefully).

Repeat the above commands, but with this signal, like so:

kill -9 <pid>
pkill -9 <name>
## Or use KILL if you want to yell at it
killall -KILL <name>
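The “ask nicely, then insist” routine can be wrapped up in a function. This is only a sketch: the name stop_gently, the 5 second default, and the return codes are my own choices, so adjust to taste.

```shell
# Hypothetical helper: stop_gently <pid> [seconds]
# Sends SIGTERM, waits up to <seconds> (default 5), then falls back to SIGKILL.
# Returns 0 if SIGKILL was not needed, 1 if it was.
stop_gently() {
    sg_pid=$1
    sg_wait=${2:-5}

    # If the signal cannot be delivered, the process is gone (or not ours to signal).
    kill -TERM "$sg_pid" 2>/dev/null || return 0

    while [ "$sg_wait" -gt 0 ]; do
        # kill -0 sends nothing; it only checks that the process still exists.
        kill -0 "$sg_pid" 2>/dev/null || return 0
        sleep 1
        sg_wait=$((sg_wait - 1))
    done

    # One last check, then stop being polite.
    kill -0 "$sg_pid" 2>/dev/null || return 0
    kill -KILL "$sg_pid" 2>/dev/null
    return 1
}

# Usage:
#   stop_gently "$(pgrep -o <name>)" 10
```

The pgrep -o in the usage line picks the oldest matching process, which is usually, but not always, the one you are after.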

There’s of course more you can do and some differences between these commands, but those are the basics that you need 99% of the time. There is also the more advanced issue of “zombie” processes, which are the undead: you can’t kill them directly because they are already dead, often leftovers of some other process. You’ll want to deal with the parent process, which you can see detailed in these guides.
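To find zombies and the parents responsible for them, you can filter the output of ps. A minimal sketch, where list_zombies is a name I made up and the column layout assumes you call ps exactly as shown in the usage comment:

```shell
# Hypothetical filter: reads `ps -eo pid,ppid,stat,comm` output on stdin and
# prints "pid ppid command" for every process whose state starts with Z (zombie).
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage:
#   ps -eo pid,ppid,stat,comm | list_zombies
# The second column is the parent process to investigate (or signal).
```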

Another option is to use the Xorg (windowing server that displays everything) utility xkill. When this is run, your mouse cursor will change and you can click on a window to sever the connection to Xorg. This could be handy to bind to a shortcut key for easy use. As far as I know, there is not an equivalent for Wayland. (Thanks to the fine folks in our Matrix room for telling me about this utility. A fountain of knowledge there!)

When the window manager stops managing

The above commands may not solve your problem if it is the window manager itself that has frozen or stopped responding. This is often the case with, for example, Gnome Shell (for Gnome) or KWin/Plasma Shell (for KDE): the shell has the problem while the underlying X or Wayland process is still okay. In this case you can kill or restart these processes to recover, giving the programs running on top a chance to clean up.

For Gnome, use Alt+F2 followed by r to restart Gnome Shell. Or, in a virtual console, use

gnome-shell --replace

For KDE, the commands to restart the relevant processes are

## on X
plasmashell --replace
kwin_x11 --replace
## on Wayland
killall kwin_wayland
## alternatively (or for older versions of Plasma)
kquitapp5 plasmashell
kstart5 plasmashell

One important difference between X and Wayland is that (currently, at least) on Wayland you can’t restart Gnome Shell without losing your application state, while on KDE the window compositor crashing or restarting also kills all applications.

Another thing to try is to restart the display manager with

sudo systemctl restart <dm>

where <dm> is gdm for Gnome Display Manager or lightdm for Light Display Manager (or whatever you use; take a look at what your distro uses by default if you haven’t set it up manually). This will bring you back to the initial login screen, which means anything you had running will have closed (hopefully you saved or have autosaves handy).
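If you don’t remember which display manager you have, systemd can usually tell you. This sketch assumes a systemd distro where the enabled display manager is aliased through a symlink at /etc/systemd/system/display-manager.service; the helper name dm_unit is hypothetical, and your distro may arrange things differently.

```shell
# Hypothetical helper: print the unit that display-manager.service points to.
# Assumes the alias symlink described above; the optional argument is for testing.
dm_unit() {
    dm_link=${1:-/etc/systemd/system/display-manager.service}
    [ -L "$dm_link" ] || return 1
    basename "$(readlink "$dm_link")"
}

# Usage:
#   dm_unit                                  # e.g. gdm.service or lightdm.service
#   sudo systemctl restart "$(dm_unit)"
```

On such systems, restarting the display-manager alias directly should amount to the same thing.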

(Thanks to Ekianjo for the above info, as I haven’t lived with Gnome, KDE, or other desktop environments in a long time.)

X/Wayland bites the dust

Ah, but what if that display server is the one that is all frozen? Well, you can kill the whole X (or Wayland compositor) process with the above commands, looking for X or Xorg (or your Wayland compositor) as the name. There is also a shortcut key to shut down (or restart) X, Ctrl-Alt-Backspace, but that has been disabled by default for several years now. Of course, you can re-enable it by adding a few options to your configuration. (Note that this is a potential security concern, depending on how X is run and whether killing it leaves you in a logged-in console.)

By the way, if you are not sure if you use X or Wayland, probably the simplest way to check is to use the command echo $WAYLAND_DISPLAY. If it prints something, you are using Wayland.
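The same check can be turned into a tiny function for scripts. A sketch, with session_kind being my own name for it. Wayland is tested first because DISPLAY is typically also set on Wayland sessions (for XWayland), so checking DISPLAY alone would mislead you.

```shell
# Hypothetical helper: report which kind of graphical session the environment suggests.
session_kind() {
    if [ -n "${WAYLAND_DISPLAY:-}" ]; then
        echo wayland
    elif [ -n "${DISPLAY:-}" ]; then
        echo x11
    else
        echo none    # e.g. a virtual console or a plain SSH login
    fi
}

# Usage:
#   session_kind
```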

Note that killing X or the Wayland compositor will mean everything running on it (probably everything you’ve run since logging in to the session) will be killed with it. As per killing any process, that means it won’t have a chance to save anything and you may lose data.

While we’re on that note, sometimes you might just want to reboot or shut down at this point (if you are forced to log out anyway). This is very easy, just

sudo reboot
## or
sudo shutdown now

Advanced aside: ssh

A more advanced tip is to set up SSH (Secure Shell) access to your computer. This is not advanced as in difficult, but there are important security concerns to be aware of. Namely, for the purposes of just system recovery, you should only allow access from your local network, with SSH not exposed to the wider Internet. While generally this will be the default when behind most routers, it is something you want to be sure of.

Setting up SSH access can be as easy as just running sshd or using systemctl enable --now sshd after tweaking the configuration in /etc/ssh/sshd_config. For example, see the Arch wiki page on SSH, or the relevant page in your distro’s documentation. Then as long as sshd is running and responsive, you have yet another path to talk to your machine, even if it seems locked up or the keyboard doesn’t work.

With SSH access, you can then connect to your machine from another computer to run the above commands. You can even use an SSH client from your phone, like with Termux on Android. Note that you may need to specify the DISPLAY or WAYLAND_DISPLAY variable for some commands relating to X/Wayland to work, like DISPLAY=:1 kquitapp5 plasmashell. DISPLAY is usually :0 or :1, but depends on your distro and software setup. You can find this on your machine with the command echo $DISPLAY or echo $WAYLAND_DISPLAY when in a graphical session.

Of course, you can use the commands from the previous section to reboot or shutdown the machine remotely as well, which can be quite handy.

When you need some magic

What if those steps haven’t helped regain control of your computer? Unfortunately crashes with games and display drivers can mean the above steps won’t help, or you can’t get to a virtual console to kill X. You’re in luck, because our final option is some kernel magic, communing directly with Linus, er, Linux.

We can use the “magic SysRq key”! The actual key is Alt-SysRq where SysRq is probably PrtScr (“Print Screen”), though this may differ based on region and hardware. The Wikipedia page has some info on this. These keys let you directly talk to the kernel no matter what it is doing, unless it has truly gone out to lunch. (You may need to enable it or change the settings available, see the Arch wiki.)
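Which SysRq functions are enabled is controlled by the value in /proc/sys/kernel/sysrq: 0 disables everything, 1 enables everything, and larger values are a bitmask of allowed functions (per the kernel’s SysRq documentation). Here is a sketch for checking a value against the bits that matter for recovery. The helper name sysrq_allows is hypothetical, and 176 is just an example value that some distros ship as their default.

```shell
# Hypothetical helper: sysrq_allows <sysrq value> <function bit>
# Succeeds if the given kernel.sysrq value permits the function with that bit.
sysrq_allows() {
    [ "$1" -eq 1 ] && return 0      # 1 enables every function
    [ $(( $1 & $2 )) -ne 0 ]        # otherwise test the bitmask
}

# Bits relevant here (from the kernel's SysRq documentation):
#     4  keyboard control          64  signalling of processes
#    16  sync                     128  reboot/poweroff
#    32  remount read-only
#
# Usage:
#   sysrq_allows "$(cat /proc/sys/kernel/sysrq)" 128 && echo "reboot key enabled"
#   sudo sysctl kernel.sysrq=1      # enable everything until the next boot
```

With a value of 176 (128 + 32 + 16), sync, remount, and reboot work from the keyboard, but the process-signalling keys do not.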

There are quite a few commands you can do, appended to Alt-SysRq (or you can let go of SysRq and keep holding Alt as you press the next key). A common sequence for a stuck system often includes REISUB, or, if you like mnemonics: “Raising Elephants Is So Utterly Boring,” “Reboot Even If System Utterly Broken,” or “busier” backwards. The commands, with Alt+SysRq+[command key], are:

command key   function
r             regain control of keyboard from X
e             end processes with SIGTERM
i             kill all processes with SIGKILL
s             sync all file systems
u             remount file systems as read-only
b             reboot system

The idea here being to safely try to stop processes before forcefully killing, and then make sure data is written to disks before making them ready for a reboot. There are many command keys, but here are a few that I’ve used or seem more useful

command key   function
v             restore the virtual console
f             call the out of memory killer
o             shutdown
h             display help

(These keys can be selectively enabled in kernel configuration as this can also be a security risk, allowing someone to kill processes, reboot, or shutdown a system, for instance.)
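If the keyboard itself is unresponsive but you still have a root shell (over SSH, say), the kernel accepts the same command keys written to /proc/sysrq-trigger, and according to the kernel documentation this works for root regardless of the keyboard setting. Below is a sketch of sending a sequence that way. The function sysrq_sequence, its arguments, and the 2 second default pause are all my own assumptions, not a standard tool. Be careful: with the real trigger file, the b key reboots immediately.

```shell
# Hypothetical helper: sysrq_sequence <trigger file> <keys> [delay seconds]
# Writes each key in <keys> to the trigger file, pausing between them.
# The trigger file is normally /proc/sysrq-trigger (root only).
sysrq_sequence() {
    sq_trigger=$1
    sq_keys=$2
    sq_delay=${3:-2}
    while [ -n "$sq_keys" ]; do
        sq_key=${sq_keys%"${sq_keys#?}"}    # first character
        sq_keys=${sq_keys#?}                # the rest
        printf '%s' "$sq_key" > "$sq_trigger"
        echo "sent $sq_key"
        sleep "$sq_delay"
    done
}

# Usage (as root; this syncs, remounts read-only, then reboots):
#   sysrq_sequence /proc/sysrq-trigger sub
```

I would leave e and i out when doing this remotely, since they would end your SSH session along with everything else.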

At the end of your rope

And if none of that works…then sadly you are left only with the manual option of the power or reset key. You may have to hold down the power key for several seconds to do a manual off. Since this is the harshest way to shutdown a system, when you restart do look for any messages on booting about a file system check. Unfortunately, this can result in some files being lost, yet another good reason to have backups. That said, file systems these days will often be pretty robust to these unclean shutdowns, doing their best to limit the damage.

As my dad likes to remind me, sometimes you just need to “reflect, repent, and reboot.” Or, if you are waxing poetic, in haiku form:

Chaos reigns within.
Reflect, repent, and reboot.
Order shall return.