Lecture Notes/Outline by Rick Moen
Understanding a Unix system's init architecture lets you follow its startup and shutdown in detail, shows you how to configure, adjust, and diagnose its services, and explains how processes get started.
On any Unix system, an "init" process (usually, but not inevitably, named that) is the "parent of all processes": it runs as a daemon with PID 1 and controls system startup, shutdown, and changes of operating mode. By tradition, another name for the same program, "telinit", is used to change runlevels.
Specifically: the Linux (or other Unix) kernel, running as process zero, forks and starts process 1, /sbin/init (or a specified substitute). /sbin/init is hard-coded to read and process /etc/inittab ("init table"), and to run the scripts for the initdefault runlevel unless you specify otherwise at boot time.
Consequently, just about everything of interest occurs in /etc/inittab, which we must thus explain.
The System V Init, aka SysVInit (used, these days, by almost all Linux and other Unix implementations), works as follows: the bootloader starts the kernel, which spawns init, which parses inittab, which instructs init to run the script "rc" to run the "S" (start a service) symlinks in the boot runlevel and then in the default operating runlevel. Subsequently, if/when init gets invoked to enter a different runlevel, the same mechanism uses "rc" to run the "K" (kill = stop a service) and "S" (start a service) symlinks in the new runlevel, stopping and starting services per local policy.
Don't worry if the above quick summary seems fuzzy: It will be clarified below.
/etc/inittab's per-line syntax is: id:runlevels:action:process
Each of these elements will be explained below, with particular emphasis on "action", which is the truly interesting one.
"id" = identity label is a simple two-letter identifier, used as a literal for reference purposes but having no other function.
"runlevels" are one or more single digit or letter: 0 through 9, plus letters a through c, plus S = s -- denoting several operating states in each of which some group of identified actions (e.g., stopping of some services and startup of others) should occur whenever the system is put into that operating state -- all of which values will be detailed below, to suggest ways to better use this facility.
Through Version 7 UNIX (released 1979), init just ran a simple /etc/rc script.
Primordial init logic was sort of like this:
id:1:initdefault:
rc::bootwait:/etc/rc
1:1:respawn:/etc/getty 9600 tty1
2:1:respawn:/etc/getty 9600 tty2
3:1:respawn:/etc/getty 9600 tty3
4:1:respawn:/etc/getty 9600 tty4
In the BSD init scheme, init calls the script /etc/rc, which reads the master configuration file /etc/rc.conf (and also additional ones like rc.conf.local, rc.local, rc.securelevel and, at shutdown, rc.shutdown). It then calls a small number of rc ("run control") scripts, e.g., rc1, rc2, which start up services. There are no "runlevels" as such. Drawback: because boot is driven by a few shared scripts, the BSD init is prone to causing catastrophic system failure if some package makes an error while installing itself into the boot process, or if the sysadmin does likewise. I.e., it has a scalability problem.
However, almost all Unixes shifted in the 1980s to the alternative init style, that of AT&T's System V Unix. (Notable holdouts are FreeBSD, NetBSD, OpenBSD, and DragonFly BSD.)
Current Linux init was written by Miquel van Smoorenburg of the Netherlands, and reimplements AT&T's SysVinit.
Runlevels: 0 through 6 (numeric), plus S (aka s) and the on-demand runlevels a, b, and c. A runlevel is a software configuration in which a specified group of processes runs. S = startup. 1 = single-user (for local-console-based maintenance). 6 = reboot. 0 = halt. The other numbers (2 through 5) are used in diverse ways across distributions to define normal operating states; on Red Hat and SUSE, 5 = graphical operation. Runlevels 7, 8, and 9 are also valid, but almost never used.
Red Hat convention (theoretical):
2 - Multiuser, without NFS
3 - Full multiuser mode
4 - Unused
5 - Full multiuser mode with an X11 display manager
(In practice, RH really just uses runlevel 5, or 3 on machines without X11, and disregards 2 and 4.)
I should mention that the practice of automatically starting an X11 display manager (if one is installed) at the end of entry into a dedicated X11 runlevel is a Red Hat/SUSE-ism, which you will not see on other Unixes or on most other Linux distributions, where the notion of a distinct X11 runlevel is simply not present.
SUSE convention:
2 - Full multi-user with no networking
3 - Full multi-user NO display manager
4 - Not used/User definable
5 - Full multiuser mode with an X11 display manager
HP-UX convention
2 - multi-user with most daemons started and Common Desktop Environment launched
3 - multi-user, nearly identical to runlevel 2 with NFS exported
4 - multi-user with VUE started instead of CDE
5 - user-defined
AIX convention
AIX defines runlevels 0 through 9: 0 and 1 are reserved, 2 is the default normal multiuser mode, and 3 through 9 are defined by the administrator.
Solaris convention (prior to Solaris 10, which departs from SysVInit)
2 - multi-user with most daemons started.
3 - multi-user, identical to 2 (runlevel 3 runs both /sbin/rc2 and /sbin/rc3), with filesystems exported, plus some other network services started.
4 - alternative multi-user, user defined
5 - shut down, power-off if hardware supports it
Debian and Ubuntu convention (theoretical)
Debian/Ubuntu make no real distinction among runlevels 2 through 5: by default all four are configured identically. In practice, they use 2, and disregard 3-5.
Each of the following can be requested from the command line (using init/telinit); the boot-time ones (S, -b, -a) can instead be passed as arguments to the booting kernel from the bootloader, e.g., GRUB.
Runlevel S (=s) goes directly to a single-user session.
"init q" (or Q) forces re-read of inittab.
"init u" (or U) forces init to re-execute itself, preserving state.
"init -b" (or emergency) bypasses inittab, going into an emergency single-user shell.
"init -a" (or auto) sets the AUTOBOOT environment variable to "yes".
The "action" field (the third of each line's four) is where a number of the really clever tricks can be performed.
initdefault: The runlevel that will be entered at the end of the boot process (the initial default runlevel). When /sbin/init starts, it searches /etc/inittab for this line before anything else; if the line is absent, /sbin/init prompts the user for a runlevel at the end of boot.
sysinit: Process will be executed immediately during system boot, before any boot or bootwait entries, ignoring runlevels. Used, e.g., on Debian to run /etc/init.d/rcS at startup.
boot: Process will be executed during system boot, ignoring runlevels. "boot" entries are run before "wait" entries upon arrival at multiuser runlevels.
bootwait: Process will be executed during system boot, ignoring runlevels; init will wait for its termination before proceeding. "bootwait" entries are run before "wait" entries upon arrival at multiuser runlevels.
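The boot-time ordering just described can be modeled in a few lines. This is a simplified, hypothetical Python sketch (boot_order and its constants are made-up names): sysinit entries first, then boot/bootwait, then the default runlevel's respawn/wait/once entries, each phase in file order. It also assumes init waits for sysinit entries as it does for bootwait and wait, and it ignores everything init does after reaching the default runlevel.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, str, str, str]  # (id, runlevels, action, process)

# Phase 0 = sysinit, phase 1 = boot/bootwait, phase 2 = default runlevel.
BOOT_PHASE = {"sysinit": 0, "boot": 1, "bootwait": 1}
RUNLEVEL_ACTIONS = {"respawn", "wait", "once"}
WAITED_FOR = {"sysinit", "bootwait", "wait"}  # init blocks until these exit


def boot_order(entries: List[Entry]) -> Optional[List[Tuple[str, bool]]]:
    """Return (id, init_waits) pairs in boot order, or None if init would prompt."""
    default = next((lv for _i, lv, act, _p in entries if act == "initdefault"), "")
    if not default:
        return None  # no initdefault line: init asks for a runlevel instead
    planned = []
    for pos, (ident, levels, action, _process) in enumerate(entries):
        if action in BOOT_PHASE:
            phase = BOOT_PHASE[action]
        elif action in RUNLEVEL_ACTIONS and default in levels:
            phase = 2
        else:
            continue  # other runlevels, ctrlaltdel, ondemand, off, ...
        planned.append((phase, pos, ident, action in WAITED_FOR))
    planned.sort()  # by phase, then by position in the file
    return [(ident, waits) for _phase, _pos, ident, waits in planned]
```

Note that a bootwait entry placed after the runlevel's wait entry in the file still runs first, because the phase outranks file position.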
ctrlaltdel: Handler for the Ctrl-Alt-Del keypress on the console (the kernel reports it to init as the signal SIGINT); comparable to the kbrequest action. Usually mapped to either single-user mode or reboot (e.g., "/sbin/shutdown -t1 -a -r now").
respawn: Process will be restarted whenever it terminates. Indispensable for keeping essential processes running, e.g., getty instances or emergency sshd access. Example of the latter:
ss:12345:respawn:/usr/sbin/sshd -D
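The "-D" option means "don't detach": sshd stays in the foreground as init's child, so init can tell when it exits and respawn it. (A daemon that forked into the background would look to init as if it had exited immediately, and would be respawned over and over.)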
Don Marti of Linux Journal suggests elaborations on this scheme here, e.g., starting a second, statically compiled sshd instance with its own locked-down configuration:
http://www.linuxjournal.com/article/6526
Out-of-control respawn syndrome: If a respawn process is exec'd more than 10 times in two minutes, respawns are suspended for five minutes (or until receipt of a signal), and the console user is notified, to prevent exhausting system resources. If you ever see a console message that respawn of service X is being suspended for five minutes, you have a serious system problem that must be debugged soon.
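The throttling rule is easy to model. A minimal sketch, assuming a hypothetical RespawnGuard class fed with timestamps in seconds; the limits are the ones quoted above, and it ignores the "until receipt of a signal" escape hatch and the console notification.

```python
from collections import deque
from typing import Deque

MAX_SPAWNS = 10        # spawns allowed inside the window
WINDOW_SECONDS = 120   # two minutes
SUSPEND_SECONDS = 300  # five-minute back-off


class RespawnGuard:
    """Tracks spawn times for one respawn entry and decides whether to (re)start it."""

    def __init__(self) -> None:
        self.spawn_times: Deque[float] = deque()
        self.suspended_until = 0.0

    def may_spawn(self, now: float) -> bool:
        if now < self.suspended_until:
            return False
        # Forget spawns that have fallen out of the two-minute window.
        while self.spawn_times and now - self.spawn_times[0] >= WINDOW_SECONDS:
            self.spawn_times.popleft()
        if len(self.spawn_times) >= MAX_SPAWNS:
            # This would be the 11th spawn inside the window: back off.
            self.suspended_until = now + SUSPEND_SECONDS
            self.spawn_times.clear()
            return False
        self.spawn_times.append(now)
        return True
```

A service that crashes once a second trips the guard on its 11th start; one that crashes every 13 seconds never does, since at most 10 of its starts fit in any two-minute window.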
once: Process will be executed once upon entry to the cited runlevel(s). Used to implement rc.local correctly, e.g.:
lo:2:once:/etc/rc.local
Unlike the usually recommended solution of creating a /etc/rcN.d/S99local symlink, this gets run once only, after each boot, on first-time entry to runlevel N (2, or whatever), but not subsequent entries.
wait: Process will be executed once upon entry to the cited runlevel(s); init will wait for its termination before proceeding. Used to implement the regular runlevels.
kbrequest: Special invocation of a process when a specified key is pressed on the console keyboard (only), where that key has been mapped to the KeyboardSignal action in your keymaps file (/etc/kbd/default.map.gz). Keycodes are as used by the kbd package (keyboard handler). You can thus define a "panic button" or whatever, using this mapping.
Specifically, the kernel's keyboard handler notices the mapped keypress and sends init the signal SIGWINCH; init notices that signal and accordingly runs the process named in the kbrequest line.
Example: Defining Alt UpArrow to halt the system.
In your rc.local script:
loadkeys << EOF
alt keycode 103 = KeyboardSignal
EOF
This modifies the active keymap at boot time, mapping Alt UpArrow ("alt keycode 103") to the KeyboardSignal action.
In /etc/inittab:
kb::kbrequest:/sbin/shutdown -h now
(You will often see KeyboardSignal referred to by an alias name, Spawn_Console, which you can also use, but which is less clear in this context. The alias suggests one of the signal's other uses, that of popping open a new console process, e.g., if otherwise the system seems hung. To do that, include "spawn_login &" at the end of your keymap -- and don't bother handling the signal in inittab.)
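The signal plumbing behind ctrlaltdel and kbrequest can be imitated in an ordinary Python process on Linux. In this hypothetical sketch (SIGNAL_TO_ACTION, triggered, and on_signal are made-up names), a handler records which inittab action init would consult for each signal: SIGINT for Ctrl-Alt-Del, SIGWINCH for a KeyboardSignal key. A real init would then start the matching entries. It must run in the main thread (a Python requirement for signal.signal) and needs Python 3.8+ for signal.raise_signal.

```python
import signal

# Which inittab action init consults when each signal arrives.
SIGNAL_TO_ACTION = {
    signal.SIGINT: "ctrlaltdel",   # kernel's report of Ctrl-Alt-Del
    signal.SIGWINCH: "kbrequest",  # a KeyboardSignal key was pressed
}

triggered = []  # actions this toy "init" has been asked to run


def on_signal(signum, _frame):
    # A real init would start every inittab entry with this action.
    triggered.append(SIGNAL_TO_ACTION[signum])


for sig in SIGNAL_TO_ACTION:
    signal.signal(sig, on_signal)

# Stand in for the kernel's keyboard handler by signalling ourselves.
signal.raise_signal(signal.SIGWINCH)
print(triggered)
```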
powerwait: What to do if it's been detected that power is going down, e.g., when init is notified of this by a UPS monitor, and thus there is only limited battery power available. Init waits for termination before proceeding.
powerfailnow: What to do when informed that battery power from the UPS is almost exhausted.
powerokwait: What to do when informed that power's been restored.
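The notes above don't say how init learns about power events. The sketch below assumes the sysvinit scheme as documented in init(8): a UPS monitor writes a one-character status (F = failing, L = battery low, O = power OK) to a power-status file and then sends init SIGPWR. Treat that protocol, and the rule that an unrecognised code counts as a failure, as assumptions to check. The function name is hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[str, str, str, str]  # (id, runlevels, action, process)

# Assumed one-character codes a UPS monitor reports before sending SIGPWR.
STATUS_TO_ACTION = {
    "F": "powerwait",     # power failing; running on battery
    "L": "powerfailnow",  # battery almost exhausted
    "O": "powerokwait",   # power restored
}


def processes_for_power_status(status: str, entries: List[Entry]) -> List[str]:
    """Return the process fields init would run for a given status code."""
    # Assumption: anything unrecognised is treated as a power failure.
    action = STATUS_TO_ACTION.get(status, "powerwait")
    return [process for _id, _levels, act, process in entries if act == action]
```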
ondemand: Process that can be invoked without changing runlevel by typing "telinit x" where x=a, b, or c.
off: Used to declare a null action.
Field "process" is most commonly /etc/init.d/rc, which in turn runs the scripts for the numbered runlevels, but can be any other script or executable. (Remember that any library dependencies must be satisfiable at that point in the boot, e.g., possibly before /usr is mounted.)
Environment variables that init sets for every process it spawns:
PATH - normally set to /bin:/usr/bin:/sbin:/usr/sbin
INIT_VERSION - test for this being non-null, to determine if a script is being run from init
RUNLEVEL - current runlevel
PREVLEVEL - previous runlevel
CONSOLE - normally passed along unchanged from the kernel, but init will set it to /dev/console if for some reason it's unset.
init (or telinit) -e VARIABLE=VALUE will set additional variables as desired.
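A script can use these variables to tell whether init started it, and from which runlevel. A minimal Python sketch (the function names are hypothetical); each function takes an optional mapping so it can be tested without actually being run from init:

```python
import os
from typing import Mapping, Optional


def started_by_init(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if INIT_VERSION is present and non-empty, as init sets it."""
    env = os.environ if env is None else env
    return bool(env.get("INIT_VERSION"))


def runlevel_change(env: Optional[Mapping[str, str]] = None) -> str:
    """Describe the transition using PREVLEVEL and RUNLEVEL ("?" if unset)."""
    env = os.environ if env is None else env
    return f"{env.get('PREVLEVEL', '?')} -> {env.get('RUNLEVEL', '?')}"
```

Called with no argument, both functions read the real process environment.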
Bonus tidbit: init also listens for messages on a FIFO in /dev, /dev/initctl. This is very obscure and you'll rarely touch it directly, though telinit uses it behind the scenes to pass its requests to a running init. (See initreq.h in the Linux init package, if by some weird chance you're interested.)
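For the curious, a request on /dev/initctl is a fixed-size binary record. The Python sketch below only builds the bytes for a "change runlevel" request and does not write them anywhere. The layout is assumed to match sysvinit's initreq.h, so verify it against your copy: four native ints (magic number 0x03091969, command code 1 for a runlevel change, the runlevel character's code, and the delay between SIGTERM and SIGKILL) followed by a 368-byte union, 384 bytes in all. The function name is hypothetical.

```python
import struct

# Assumed values and layout; check against initreq.h before relying on them.
INIT_MAGIC = 0x03091969
INIT_CMD_RUNLVL = 1
INIT_REQUEST_FORMAT = "=iiii368s"  # magic, cmd, runlevel, sleeptime, union


def build_runlevel_request(runlevel: str, sleeptime: int) -> bytes:
    """Pack a 'change runlevel' request like the one telinit sends to init."""
    if len(runlevel) != 1:
        raise ValueError("runlevel must be a single character")
    # The runlevel travels as the character's code, e.g. ord("3");
    # the unused union is zero-filled by struct's "s" padding.
    return struct.pack(INIT_REQUEST_FORMAT, INIT_MAGIC, INIT_CMD_RUNLVL,
                       ord(runlevel), sleeptime, b"")
```

Actually sending it would mean opening /dev/initctl for writing as root on a system whose init still reads that FIFO.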
Red Hat's init scripts source /etc/rc.d/init.d/functions for useful bash functions (daemon, killproc, pidofproc, status). An init script is expected to be able to carry out all of these standard actions, selected by the parameter passed to it:
start
stop
status
restart
reload
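Real init scripts are shell scripts that dispatch on their first argument; the same control flow is sketched below in Python, for illustration only. The service hooks are hypothetical stubs, and the exit codes (0 success, 2 bad argument, 3 and 7 meaning "not running") follow the LSB init-script conventions rather than anything stated in these notes.

```python
import sys
from typing import Callable, Dict, List

# Hypothetical stand-in for the daemon's real state (e.g., a pid file).
state = {"running": False}


def start() -> int:
    state["running"] = True  # a real script would launch the daemon here
    return 0


def stop() -> int:
    state["running"] = False  # ...and signal it to exit here
    return 0


def status() -> int:
    return 0 if state["running"] else 3  # LSB: 3 = not running


def restart() -> int:
    stop()
    return start()


def reload() -> int:
    # A real script would typically send the daemon SIGHUP to re-read config.
    return 0 if state["running"] else 7  # LSB: 7 = not running


ACTIONS: Dict[str, Callable[[], int]] = {
    "start": start,
    "stop": stop,
    "status": status,
    "restart": restart,
    "reload": reload,
}


def main(argv: List[str]) -> int:
    if len(argv) != 2 or argv[1] not in ACTIONS:
        choices = "|".join(ACTIONS)
        print(f"Usage: {argv[0]} {{{choices}}}", file=sys.stderr)
        return 2  # LSB: invalid or excess argument(s)
    return ACTIONS[argv[1]]()
```

A standalone script would end with sys.exit(main(sys.argv)), so that, e.g., running it with "status" exits 3 while the service is down.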
Runlevel directories /etc/rcN.d/, where N = runlevel, are populated solely by symlinks of the form {S|K}NNname, where S means start the service and K means stop (kill) it, NN is a two-digit relative order (absolute values having no meaning), and name identifies what's being started or stopped. E.g.:
K20nfs -> ../init.d/nfs
K50inet -> ../init.d/inet
S60lpd -> ../init.d/lpd
S80sendmail -> ../init.d/sendmail
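The rc script's handling of these links can be sketched as follows. This simplified, hypothetical Python model (rc_plan is a made-up name) runs all K links with "stop", then all S links with "start", each group in ascending numeric order; real rc scripts add refinements such as skipping services that are already in the desired state.

```python
import re
from typing import List, Tuple

# S or K, a two-digit order, then the service name.
LINK_RE = re.compile(r"^([SK])(\d\d)(.+)$")


def rc_plan(link_names: List[str]) -> List[Tuple[str, str]]:
    """Return (service, argument) pairs in the order rc would run them."""
    kills, starts = [], []
    for name in link_names:
        m = LINK_RE.match(name)
        if not m:
            continue  # ignore anything that isn't an S/K link, e.g. README
        kind, order, service = m.groups()
        (kills if kind == "K" else starts).append((int(order), service))
    plan = [(svc, "stop") for _, svc in sorted(kills)]
    plan += [(svc, "start") for _, svc in sorted(starts)]
    return plan
```

Applied to the four example links, this stops nfs and then inet, and afterwards starts lpd and then sendmail.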
The main way one controls services on a Unix machine (that uses SysVInit) is by invoking, creating, rearranging, and deleting these symlinks and the scripts they point to. There are various front-end utilities to do this on your behalf, or you can do it directly.
On Red Hat-derived Linux distributions, typing "halt", "reboot", or "shutdown" invokes a wrapper program (/usr/sbin/consolehelper) that ties into both the PAM layers (to do authentication) and the SysVInit system, to do an orderly system halt, reboot, or shutdown. After authenticating the user for the requested action, consolehelper invokes init to change to runlevel 0 or 6 (for halt or reboot), blocks logins, sends any remaining processes the SIGTERM signal to make them exit, and only at the end of that process invokes /sbin/halt, /sbin/reboot, or /sbin/shutdown to actually stop or reboot the system.
SysVInit's drawbacks: slow, bloated, inelegant. Possible namespace collisions in the scripts. Assigning order via numbering is unintuitive to those who aren't already trained to do so, and thus requires some getting used to.
Processes on modern systems, especially desktops and laptops, need to start and stop in an event-driven fashion rather than in a predictable time sequence: e.g., triggered by plugging in or removing hot-swappable hardware, availability or unreachability of network filesystems, firmware initialisation, negotiation with WAPs, receipt of IP addresses from networks, and login sequences. Attempting to make all this happen in a predetermined order keeps startup slow, prevents parallel startup, and copes poorly with resources that appear or disappear after boot.
SysVInit's lack of facilities to make sure that services remain running (beyond inittab's respawn action) is also a particularly grievous gap. The utility "monit" is one of several add-ons that retrofit that function:
http://www.tildeslash.com/monit/
http://db.assam-glug.org/documentations/Howto/Process-Monitor-HOWTO.html
boot-scripts: Event-driven. Each script declares its dependencies. Run order gets sorted out.
initNG: asynchronous startup
Apple launchd: init-substitute (PID 1) launchd runs /etc/rc, scans through /System/Library/LaunchDaemons and /Library/LaunchDaemons and acts on the plists ("property lists", which are XML goo) as needed, and starts the login window. (Something called SystemStarter was used prior to OS X v. 10.4.)
Solaris SMF (Service Management Facility): better service monitoring, debugging, and automated recovery, event-driven w/dependency tracking, better support for delegating some administrative tasks to non-privileged users, parallel startup.
runit: provides a reliable interface for daemon startup, shutdown, control, and supervision. Provides clean process state, reliable logging, fast startup/shutdown. Small, portable.
svscan: by Daniel J. Bernstein, provided in daemontools
Ubuntu upstart: asynchronous/parallel and fast startup/shutdown.
eINIT: "full replacement of init designed to start processes asynchronously, but with the potential of doing things without shell scripts". Uses an XML configuration file.
All of these share the trait of being able to run parallel startups/shutdowns, and in aggregate they are faster and more flexible. One or more of them will probably be the long-term replacement for the SysVInit design. In the shorter term, most suffer from a lack of transparency to the sysadmin, and from simply not yet being standard.
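The dependency-declaring approach of boot-scripts (and, in various forms, of several of the others) amounts to a topological sort. A hypothetical Python sketch using the standard graphlib module (Python 3.9+), with made-up service names, groups services into "waves" whose members could all start in parallel:

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Hypothetical services and their declared dependencies (service -> needs).
DEPENDS: Dict[str, Set[str]] = {
    "network": set(),
    "syslog": set(),
    "nfs": {"network"},
    "sshd": {"network", "syslog"},
    "cron": {"syslog"},
}


def startup_waves(depends: Dict[str, Set[str]]) -> List[List[str]]:
    """Group services into waves; everything in one wave can start in parallel."""
    ts = TopologicalSorter(depends)
    ts.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)  # pretend every service in this wave started fine
    return waves
```

Here network and syslog come up together, then cron, nfs, and sshd. A real event-driven init wouldn't wait for a whole wave; each service would start as soon as its own dependencies are up. The waves just make the available parallelism visible.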