init is the driving force that keeps our Linux box alive and is also the one that can put it to death. This article summarizes why init is so powerful and how you can instruct it to behave differently from its default behaviour. (Yes, init is powerful, but the superuser rules over init.)
by Alessandro Rubini
In UNIX parlance, the word ``init'' doesn't identify a specific program, but rather a class of programs. The name ``init'' is used generically to call the first process executed at system boot--actually, the only process that is executed at system boot. When the kernel is finished setting up the computer's hardware, it invokes init and gives up controlling the computer. From that point on, the kernel processes only system calls without taking any decisional role in system operation. After the kernel mounts the root file system, everything is controlled by init.
Currently, several choices of init are available. You can use the now-classic program that comes with the SysVinit package by Miquel van Smoorenburg, simpleinit by Peter Orbaek (found in the source package of util-linux), or a simple shell script (such as the one shown in this article, which has a lot less functionality than any C-language implementation). If you set up embedded systems, you can even run the target application as if it were init. Masochistic people who dislike multitasking could even port command.com to Linux and run it as the init process, although you won't ever be able to restrict yourself to 640KB when running a Linux kernel.
No matter which program you choose, it needs to be accessed with a path name of /sbin/init, /etc/init or /bin/init, because these path names are compiled in the kernel. If none of them can be executed, then the system is severely broken, and the kernel will spawn a root shell to allow interactive recovery (i.e., /bin/sh is used as the init process).
To achieve maximum flexibility, kernel developers offer a way to select a different path name for the init process. The kernel accepts a command line option of init= exactly for that purpose. Kernel options can be passed interactively at boot time, or you can use the append= directive in /etc/lilo.conf. Silo, Milo, Loadlin and other loaders allow specifying kernel options as well.
As you may imagine, the easiest way to get root access to a Linux box is by typing init=/bin/sh at the LILO prompt. Note that this is not a security hole per se, because the real security hole is physical access to the console. If you are concerned about the init= option, LILO can prevent interaction using its own password protection.
Now we know that init is a generic naming, and almost anything can be used as init. The question is, what is a real init supposed to do?
Being the first (and only) process spawned by the kernel, the task of init consists of spawning every other process in the system, including the various daemons used in system operation as well as any login session on the text console.
init is also expected to restart some of its child processes as soon as they exit. This typically applies to login sessions running on the text consoles. As soon as you log out, the system should run another getty to allow starting another session.
init should also collect dead processes and dispose of them. In the UNIX abstraction of processes, a process can't be removed from the system table unless its death is reported to its parent (or another ancestor in case its parent doesn't exist anymore). Whenever a process dies by calling exit or otherwise, it remains in the state of a zombie process until someone collects it. init, being the ancestor of any other process, is expected to collect the exit status of any orphaned zombie process. Note that every well-written program should reap its own children--zombies exist only when some program is misbehaving. If init didn't collect zombies, lazy programmers could easily consume system resources and hang the system by filling the process table.
The last task of init is handling system shutdown. The init program must stop any process and unmount all the file systems when the superuser indicates that shutdown time has arrived. The shutdown executable doesn't do anything, it only tells init that everything is over.
As we have seen, the task of init is not too difficult to implement, and a shell script could perform most of the required tasks. Note that every decent shell collects its dead children, so this is not a problem with shell scripts.
What real init implementations add to the simple shell script approach is a greater control over system activity, and thus a huge benefit in overall flexibility.
As suggested above, the shell can be used as an init program. Using a bare shell (init=/bin/sh) simply causes a root shell to open in a completely unconfigured system. This section shows how a shell script can perform all of the tasks you need to have in a minimal running system. This kind of tiny init can be used in embedded systems or similar reduced environments, where you must squeeze every single byte out of the system. The most radical approach to embedded systems is directly running the target application as the init process; this results in a closed system (no way for the administrator to interact, should problems arise), but it sometimes suits the setup. The typical example of a non-init-driven Linux system is the installation environment of most modern distributions, where /sbin/init is a symbolic link to the installation program.
Listing 1 shows a script that can perform acceptably as init. The script is short and incomplete; in particular, note that it runs only one getty, which isn't restarted when it terminates. Be careful if you try to use this script, as each Linux distribution chooses its own flavour of getty. Type grep getty /etc/inittab to know what you have and how to call it.
The script has another problem: it doesn't deal with system shutdown. Adding shutdown support, however, is fairly easy; just bring everything down after the interactive shell terminates. Adding the text shown in Listing 2 does the trick.
Whenever you boot with a plain init=/bin/sh, you should at least remount the root file system before you do anything; you should also remember to do umount -a before pressing ctrl-alt-del, because the shell doesn't intercept the three-finger salute.
The util-linux package includes a C version of an init program. It has more features than the shell script and can work well on most personal systems, although it doesn't offer the huge amount of configurability offered by the SysVinit package, which is the default on modern distributions.
The role of simpleinit (which should be called init to work properly) is very similar to the shell script just shown, with the added capability of managing single-user mode and iterative invocation of console sessions. It also correctly processes shutdown requests.
simpleinit is interesting to look at, and well-documented too, so you might enjoy reading the documentation. I suggest using the source distribution of util-linux to get up-to-date information.
The implementation of simpleinit truly is simple, as its name suggests. The program executes a shell script (/etc/rc) and parses a configuration file to determine which processes need to be respawned. The configuration file is called /etc/inittab, just like the one used by the full-featured init; note, however, that its format is different.
If you plan to install simpleinit on your system (which most likely already includes SysVinit), you must proceed with great care and be prepared to reboot with a kernel argument of ``init=/bin/sh'' to recover from unstable situations.
Most Linux distributions come with the version of init written by Miquel van Smoorenburg; this version is similar to the approach taken by System V UNIX.
The main idea is that the user of a computer system may wish to operate his box in one of several different ways (not just single-user and multi-user). Although this feature is not usually exploited, it is not so crazy as you might imagine. When the computer is shared by two or more people in one family, different setups may be needed; a network server and a stand-alone playstation can happily coexist in the same computer at different runlevels. Although I'm the only user of my laptop, I sometimes want a network server (through PLIP) and sometimes a netless environment to save resources when I'm working on the train.
Each operating mode is called a ``runlevel'', and you can choose the runlevel to use at either boot or runtime. The main configuration file for init is called /etc/inittab, which defines what to do at boot, when entering a runlevel or when switching from one runlevel to another. It also tells how to handle the three-finger salute and how to deal with power failure, although you'll need a power daemon and a UPS to benefit from this feature.
The inittab file is organized by lines, where each line is made up of several colon-separated fields: id:runlevel:action:command.
The inittab(5) man page is well written and comprehensive as a man page should be, and I feel it is worth repeating one of its examples--a stripped-down /etc/inittab that implements the same features and misfeatures of the shell script shown above:
id:1:initdefault: rc::bootwait:/etc/rc 1:1:respawn:/sbin/getty 9600 tty1This simple inittab tells init that the default runlevel is ``1'', at system boot it must execute /etc/rc, and when in runlevel 1 it must respawn forever the command /sbin/getty 9600 tty1. You're not expected to test this script out, because it doesn't handle the shutdown procedure.
Before proceeding further, however, I must fill in a couple of gaps. Let's answer two common questions:
Naturally, the typical /etc/inittab file has many more features than the three-line script shown above. Although bootwait and respawn are the most important actions, several other actions exist in order to deal with issues related to system management. I won't discuss them here.
Note that SysVinit can deal with ctrl-alt-del, whereas the versions of init shown earlier didn't catch the three-finger salute (i.e., the machine would reboot if you pressed the key sequence). Those interested in how this is done can check sys_reboot in /usr/src/linux/kernel/sys.c. (If you look in the code, you'll note the use of a magic number 672274793: can you imagine why Linus chose this number? I think I know the answer, but you'll enjoy discovering it yourself.)
Let's look at how a fairly complete /etc/inittab can take care of everything required to handle the needs of a system's lifetime, including different runlevels. Although the magic of the game is always on display in /etc/inittab, several different approaches to system configuration can be taken, the simplest being the three-line inittab shown above. In my opinion, two approaches are worth discussing in some detail; I'll call them ``the Slackware way'' and ``the Debian way'' from two renowned Linux distributions that chose to follow them.
Although it has been quite some time since I last installed Slackware, the documentation included in SysVinit-2.74 tells me that it still works the same. It has fewer features but is much faster than the Debian way. My personal 486 box runs a Slackware-like /etc/inittab just for the speed benefit.
The core of an /etc/inittab as used by a Slackware system is shown in Listing 3. Note that the runlevels 0, 1 and 6 have a predefined meaning. This is hardwired into the init command, or better, into the shutdown command part of the same package. Whenever you want to halt or reboot the system, init is told to switch to runlevel 0 or 6, thus executing /etc/rc.d/rc.0 or /etc/rc.d/rc.6.
This works flawlessly because whenever init switches to a different runlevel, it stops respawning any task not defined for the new runlevel; actually, it kills the running copy of the task. In this case, the active task is /sbin/agetty.
Configuring this setup is fairly simple, as the roles of the different files are clear:
By the way, if you haven't guessed what ``rc'' means, it is the short form of ``run command''. I had been editing my .cshrc and .twmrc files for years before being told what this arcane ``rc'' suffix meant--some things in the UNIX world are handed down only by oral tradition. I feel I'm now saving someone from years of being in the dark--and I hope I won't be punished for defining it in writing.
Although simple, the Slackware way to set up /etc/inittab doesn't scale well when adding new software packages to the system.
Let's imagine, for example, that someone distributes an ssh package as a Slackware add-on (not unlikely, as ssh can't be distributed on official disks due to the illogical U.S. rules about cryptography). The program sshd is a stand-alone server that must be invoked at system boot; this means the package should patch /etc/rc.d/rc.M or one of the scripts it invokes to add ssh support. This is clearly a problem in a world where packages are typically archives of files. In addition, you can't assume that rc.local is always unchanged from the stock distribution, so even a post-install script that patches the file will fail miserably when run in the typical user-configured computer.
You should also consider that adding a new server program is only part of the job; the server must also be stopped in rc.K, rc.0 and rc.6. Things are now getting quite tricky.
The solution to this problem is both clean and elaborate. The idea is that each package which includes a server must provide the system with a script to start and stop the service; each runlevel will then start or stop the services associated with that runlevel. Associating a service and a runlevel can be as easy as creating files in a runlevel-specific directory. This setup is common to Debian and Red Hat, and possibly other distributions that I have never run.
The core of the /etc/inittab used by Debian 1.3 is shown in Listing 4. The Red Hat setup features exactly the same structure for system initialization, but uses different path names; you'll be able to map one structure to the other. Let's list the roles of the different files:
#!/bin/sh level=$1 cd /etc/rc.d/rc$level.d for i in K*; do ./$i stop done for i in S*; do ./$i start doneWhat does this mean? It means that /etc/rc2.d (for example) includes files called K* and S*; the former identifies services that must be killed (or stopped), and the latter identifies services that must be started.
Okay, but I didn't explain where the K* and S* files come from. Each software package that must run for a particular runlevel adds itself to all the /etc/rc?.d directories, as either a start entry or a kill entry. To avoid code duplication, the package installs a script in /etc/init.d and several symbolic links from the various /etc/rc?.d directories.
To show a real-life example, lets's see what is included in two rc directories of Debian:
rc1.d: K11croni K20sendmail K12kerneld K25netstd_nfs K15netstd_init K30netstd_misc K18netbase K89atd K20gpm K90sysklogd K20lpd S20single K20ppp rc2.d: S10sysklogd S20sendmail S12kerneld S25netstd_nfs S15netstd_init S30netstd_misc S18netbase S89atd S20gpm S89cron S20lpd S99rmnologin S20pppThe contents of these two directories show how entering runlevel 1 (single-user) kills all the services and starts a ``single'' script, and entering runlevel 2 (the default level) starts all the services. The number that appears near the K or the S is used to order the birth or death of the various services, as the shell expands wild cards appearing in /etc/init.d/rc in alphanumeric order. Invoking an ls -l command confirms that all of these files are symbolic links, such as the following:
rc2.d/S10sysklogd -> ../init.d/sysklogd rc1.d/K90sysklogd -> ../init.d/sysklogdTo summarize, adding a new software package in this environment means adding a file in /etc/init.d and the proper symbolic link from each of the /etc/rc?.d directories. To make different runlevels behave differently (2, 3, 4 and 5 are configured in the same way by default), just remove or add symbolic links in the proper /etc/rc?.d directories.
If this seems too difficult and discouraging, all is not lost. If you use Red Hat (or Slackware), you can think of /etc/rc.d/rc.local like it was autoexec.bat--if you are old enough to remember the pre-Linux age. If you run Debian, you could create /etc/rc2.d/S95local and use it as your own rc.local; note, however, that Debian is very clean about system setup, and I would rather not suggest such heresy. Powerful and trivial seldom match--you have been warned.
At the time of writing, Debian 2.0 is being released to the public, and I suspect it will be in wide use by the time you read this article.
Although the structure of system initialization is the same, it is interesting to note that the developers managed to make it faster. Instead of executing the files in /etc/rc2.d, the script /etc/init.d/rc can now source (read) them, without spawning another shell. Whether to execute or source them is controlled by the file name: executables whose name ends in .sh are sourced, the other ones are executed. The trick is shown in the following few lines:
case "$i" in *.sh) # Source shell script for speed. ( trap - INT QUIT TSTP set start; . $i ) ;; *) # No sh extension, so fork subprocess. $i start ;; esacThe speed benefit is quite noticeable.
Alessandro is a Stone Age guy who runs old hardware, rides an old bike and drives an old car. He enjoys hunting (using grep) through his (old) file system for information that can be converted from C language to English or Italian. If he's not reading e-mail at rubini@linux.it, then he's doing something else.