Customize Linux from the Bottom--Building Your

Own Linux Base System

Can't find a system that has everything you want? Build your own.

by He Zhu

We have seen various Linux distributions, and yet many others continue to appear. Some are as small as DLX which sits on a single floppy; others are as big as Red Hat 6.2, packed in five CDs. Things seem to become more complex and harder to manage as systems grow. How is a Linux system put together from pieces of free code? How can we assemble and customize our own system for a particular purpose? It seems to be a hard task.

However, from the view of the base system, in principle, all distributions are assembled in a similar way. The difference is that big ones are armed with more packages and more fancy stuff targeted at more general audiences, and small ones have fewer goodies and are aimed at relatively narrow and specific user groups.

Fully featured Linux distributions are usually unnecessarily big for specialized situations. For example, embedded applications need a slim base for their particular situations, and there are already small Linux distributions available for these purposes. However, because there are so many factors to consider, no one can claim that their distribution is comprehensive and satisfies all customers.

Usually an application needs a customized base system to work efficiently. You can pay for a solution from many Linux service providers, or you can do it yourself. Sometimes knowing how to build the base system on your own is more beneficial. With such skills and knowledge, engineers can easily control and improve the system's behavior for the needs of their customers. DIY (Do-It-Yourself) is not just fun, but strategically important in some cases, and more people are recognizing the value Linux offers. By doing so, you acquire the ability to customize not only the Linux kernel but also all other components of your system to achieve the optimizations best suited for your requirements.

This article tries to tell readers that building your own base Linux is not a daunting task. It tells our experiences and gives a brief introduction to building and customizing a Linux system. It is a base system: small, clean and ready to go. We try to make the complex simple, without losing generality and effectiveness. We show how to make the building steps as easy as 1-2-3, and how to customize this system to be minimal, accommodated in one floppy. After all, it should be good enough to be used as a start point for a base system to run any typical applications. This is possible because we build the system directly from unchanged sources. This allows us to always use the latest stable versions and make all required kernel services available.

Linux System

A system is simply defined as a combination of inherently connected parts. A Linux system (only considering software in this article) is a combination of a Linux kernel and other components which make the kernel useful. All software components except the kernel need to be resident in a root file system (there may be a few other file systems, but they must be mounted under the root tree to become visible). So, technically, we can simply consider a Linux system as a combination of a kernel and a root file system. All Linux distributions are arranged in this way. For example, a fully installed Linux system is a kernel plus a big root file system. A Linux installation disk and a rescue system are used to install a full system and to repair problem systems, respectively. They are also organized in the same way; that is, consisting of a kernel and an initial root file system, but the initial root file system is small and only holds a few basic components necessary to do limited jobs (see Figure 1). The Linux kernel has been coded to take special measures to locate the root file system, either from a normal file system or from a compressed image of an initial root file system.

Figure 1. A Linux System

The Approach

Given an application, we want to run it above the kernel using shared libraries on a box. Assume the application doesn't require any rare features which Linux doesn't provide at the moment. We want a base system that can run this application and provide some basic control and management as well. A typical case of the base system is shown in Figure 2. Note that this figure is not complete because a utility may be statically linked, which doesn't require a shared library and a utility may be an a.out style, which doesn't use a dynamic loader.

Figure 2. A Base System

If the application is self-sufficient, that is, statically linked with everything required at runtime, it can run right over the kernel without any support from shared libraries. The base system in this case may mean only the Linux kernel itself. However, almost all systems need support from one or more utilities to manage things like file operations and system monitoring, using commands like mount and ps. We consider a base system to be a combination of the kernel, the dynamic loader, a set of libraries and a set of utilities.

Like many other systems, our goal is to show how to create the system on a floppy with both the kernel and a compressed image of an initial root file system, as shown in Figure 3. This compressed initial root file system will be uncompressed by the kernel and put into a ramdisk, that is, a space of RAM set aside to hold a small Linux file system. Creating such a base system is often straight-forward, but rather tedious for most people. We simplify the procedure. The idea is to design a well-organized hierarchy of makefiles which will extract sources, compile, and setup the contents of the initial ramdisk, then prepare and pack the whole system.

Figure 3. Arrangement of System Components

Customization of the above base system depends on the requirements of the application that will run on the box. Choose a configuration for the kernel, a dynamic loader, a set of libraries that are necessary for the application and utilities, and a set of basic utilities which are required to control and manage the system. Then, compile all these things in a consistent environment. After that, package the results and make it bootable. If something is missing, you are free to add it to the list, and make it again.

Our Modest Goal

A curious reader might ask why we do this and what it is good for. Like many others, we want to run some complex software on a box, something that can be generalized as a multitasking application on a typical PC-like machine without hard disk or monitor. We need an OS kernel and some elements as the base experimental platform. This platform should be robust, maintainable and customizable. Writing a good OS kernel for this purpose is too scaring for many. Thanks to Linux and the Open Source community, we now have an excellent option.

Basic materials are ready and available for free. Now it is time to pick up pieces we need, assemble our own engine and control it. Then, it is time to enjoy.

Before we start, we need to know the answer to some key questions: How to compile the kernel? How to compile a shared library? How to create an initial root file system? How to put the kernel image and compressed file system onto a floppy or EPROM? How to run an application using shared libraries? How to debug? There are many questions like these. The answers are already documented, not, as far as we know, in a single place, but scattered over a wide range of documents. We don't want to write a comprehensive document for these questions but, rather, tell our story and major part of our answers.

Steps

Once the plan has been made for customization, detailed steps can be put into action. General steps in our work are described in the following.

1. Setup the development environment Install a full Linux distribution such as Red Hat 6.0 as the development platform. Make sure it's gcc-supported. To make things easy, assume the target machine on which we run our customized system and the host machine, that is, the development machine, are using the same type of CPUs, in our case Intel x86. Otherwise, we have to prepare a cross-compiler.

2. Customize the kernel Get the latest stable kernel source (version 2.2.13 at this article's writing). How to configure and compile the kernel are well documented in the source files. We don't want to repeat the details here. But we need to choose support for initial ramdisk, loadable modules and other necessary options. If we use the serial console port, choose serial console support. If we have a piece of hardware we want to set up, like an Ethernet card, we can select them as modules. Then we can install and use these modules in our base system.

3. Prepare the standard libraries Get the latest stable standard library glibc (versions 2.2.1). This includes almost everything we want: the dynamic loader, the standard C library and math lib, etc. Although we don't need all that glibc provides, it's better to build them all together, then choose what we want. The documents in the glibc source tree tell in detail how to compile it. Because we use it for our target machine, we have to compile it using the kernel header files consistent with Step 2. This can be specified using --with-headers during configuration. Also, we have to install all library header files so that the compilation of other components for the target machine can use them.

4. Let the compiler know how to cross-compile After installing kernel and glibc header files, we need to prepare the compiler, gcc, to use them. Glibc2-HOWTO describes this in more detail. Briefly, we need to tell gcc where to find specifications using the option -b. In our case, because the target and the host machine are basically the same, we need to use only the host machine specs. This can be found by using the command gcc -v. For example, on my box, the reply is:

Reading specs from 
/usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/specs
gcc version egcs-2.91.66 19990314/Linux 
(egcs-1.1.2 release)

Compile other components of the system by adding the option -b as:

gcc -b i386-redhat-linux

5. Select and prepare utilities and other libraries To compile a utility program such as mount or libraries other than those in glibc, such as termcap, consistently, we have to tell the compiler where to find all the header (include) files and the libraries. First, we tell the compiler not to search header files in default paths by specifying --nostdinc. Second, tell the compiler we are compiling sources for the target machine by using option -b $MACHINE, as described in Step 2. Third, state exactly where to find the kernel header files and the standard library header files by specifying option -I. Fourth, tell the loader what libraries to use and where to find them by using -L and -l options.

6. Build applications Like compiling for utilities and libraries, we need to cross-compile our applications in order to use them on the base system. There are no particular considerations needed from the system view, except to make sure everything on which the application is dependent is installed already.

7. Package things together Once all the components are ready, we need to arrange them in such a way that the system be booted, and any everything can be correctly located. This issue is discussed in more detail in the following section.

8. Move on Use the base system as a start point for whatever you want to do. More features can be added, as needed, over the base system. A separate section is devoted to this issue.

Creating a Base System

As far as we know, it is hard to find a document that tell us in detail how to put images, executables, binaries and scripts together; in other words, to package things together, to assemble a system. Different systems may take different approaches to packaging, although components can be created in the same way. The easiest and most popular way is packaging on floppies. The general steps of packaging a bootable system on a single floppy, that is a boot/root floppy, can be summarized in the following few steps:

To show more details, we wrote a set of makefiles. They are actually instructions to implement the above steps and to create a small base system from freely available packages. We run a simple application called a netperf sever which tests TCP/IP stack performance and is also freely available from the Web. We provide these instructions in Resources. It may not be simple to run, but curious readers can dig into the lines and find how to build a base system from scratch.

An application can be started in different ways depending on how the base system is configured. In most of cases, a Linux kernel is configured to run a startup script or a binary executable, called init or linuxrc, in the initial root file system after the kernel is up. This init program usually does things like remount the root file system to allow read/write permissions, mount other file systems like proc, and initialize other parts of the system, such as starting a shell interface or running the application immediately. The SysVInit program is very popular in most Linux distributions for this purpose.

For our base system, we don't need a complex init sequence to demonstrate. So, we simply write a shell script like the following. Anyone is free to change and add more commands to it:

mount -n -o remount,rw /
mount /proc /proc -t proc
echo  MyCompanyName, Version X.Y. Built Z, August 2000
exec /bin/sh

As an exercise, it might look better if you have the above echo line in your application, and start the application at the end of the script instead of running the standard shell. An example in C++:

cout << COMPANY << VERSION_NO << BUILD_NO <<
__DATE__ << __TIME__;

In our case, the system prompts after it is up.

pipe-elinux> MyCompanyName, Version X.Y,
Build Z, August 2000
pipe-elinux>

Making It More Attractive

After the base system is up, you might think it is not much use without any interesting applications. But it is a base from which you could start your big project. One by one, you can gradually add things into this base system, making it more and more attractive. The following examples might be worth considering:

a init program : SysVInit is a good choice, but it seems too big for simple applications
a security facility: add login support
an editor : vi or emacs
more networking services: telnet or ftp dæmons
a GUI : X is a choice
non-volatile storage: flash memory and hard disk support.
add more loadable modules
use rpm to manage packages

We don't really need to recompile every package we choose, because we can easily find a binary already compiled for a processor. Like our system, the host and the target machines are the same type, so, we can just use most of the binaries found on the host machine. For example, for the utility top, we just copy the binary into our base system; and then it runs. Things are not always that simple, however. Because most often we have to sort out the dependencies for an executable, that is, the shared libraries it needs and configuration files it reads, usually we don't know until we run it. However, some tools can help us with this. ldd and strace can tell us these dependencies. For example, once I tried successfully to run Emacs on the base system by simply copying the executable (emacs-nox), a few shared libs and configuration files to the base system. This usually helps us a lot during the development and saves a lot of time.

You might not be satisfied with booting from floppies. Instead, you can implement booting from EPROM or others. To do this, you have to redesign your packaging approach, but the components are mostly unchanged. What's specific here is the kernel image loader. Booting can be implemented like:

boot from EPROM
boot from Networks
boot from devices like flash, hard disk partition, CD-ROM and ZIP disks; that is, any devices other than floppies
boot from other OS

If you like and want to spend more time, you can make your system as fancy as many other successful systems already in the field.

Problem Solving

One of the advantages of using Linux is that there are many documents and tools to help you customize your system and solve problems. The code is no secret. Everything inside and outside is open. Besides, there are many other useful sources in print and on the Web. There is no other system which can compare to Linux in this respect, not even Free BSD, let alone any proprietary operating systems.

Typically, for a problem, we might work out a few different solutions. We always want to pick the best one, of course, but it is not easy to know which is the best until we have tried each of them. To solve a problem, in many cases, we can find answers by consulting Linux HOWTOS and docs, or asking Linux guys in our organization. As an alternative, you can post a message on a Linux newsgroup and hope someone on there can give you a quick reply. If you want to pay, there are many Linux-related companies providing technical services as well. (If the problem is stubborn, as the last straw, kick your buggy box a few times, as I did sometimes. You must be careful--don't break it and then reboot. That should work; otherwise repeat the problem-solving sequence from the beginning again.)

gdb is an excellent debugging support for applications on the base system. If we don't want or are unable to run a full gdb on the target system, that is, the base system, we can run small remote gdb facilities as either a gdb stub or a gdb server on the target. Besides, things like the syslogd dæmon can also help debugging on the target system.

There are many good problem-solving strategies. Whatever approaches we use, the goal is to find the proper solution. It is usually safe to follow a successful example. For example, we learn something by checking things inside a Red Hat rescue system. We can do this simply with the following few commands:

cat rescue.img | gzip -d > rescue_root.img
mkdir rescue_root
mount -o loop rescue_root.img rescue_root

Here rescue.img is the compressed rescue floppy image found in the Red Hat distribution's images directory. Then we can check its contents by:

ls rescue_root

It displays:

bin dev etc lib lost+found mnt proc sbin tmp usr

You get all the detail in the floppy.

Conclusion

This article is only an introduction to customization of the Linux base system. For a particular situation, it could be rather complex, especially when modifications at the code level are required, such as to support specialized hardware. But, we have shown that it is a manageable task. Our purpose is to make things simple in order to encourage people to take the challenge. By creating our own customized base system with a moderate effort, we get a power engine which can drive us into the bright future.

Resources

He Zhu (hezhu@yahoo.com) is interested in system software and networking. He is currently working for Bell Labs, New Jersey.