Kernel Korner

An introduction to block device drivers

Last month, we inaugurated a column on Linux kernel programming with an article on how to write Linux device drivers without doing any kernel programming. This month we touch the kernel as we explore block device drivers.

by Michael K. Johnson

It is customary for authors explaining device drivers to start with a complete explanation of character devices, saving block device drivers for a later chapter. To explain why this is, I need to briefly introduce character devices as well. To do that, I'll give a little history.

When Unix was written 25 years ago, its design was eclectic. One unusual design feature was that every physical device connected to the computer was represented as a file. This was a bold decision, because many devices are very different from one another, especially at first glance. Why use the same interface to talk to a printer as to talk to a disk drive?

The short answer is that while the devices are very much different, they can be thought of as having most of the same characteristics as files. The entire system is then kept smaller and simpler by only using one interface with a few extensions.

This is fine, except that it hides important differences between devices. For example, it is possible to read any byte on a disk at any time, but it is only possible to read the next byte from a terminal.

There are other differences, but this is the most fundamental one: Some devices (like disks) are random-access, and others (like terminals) are sequential-access. Of course, it is possible to pretend that a random-access device is a sequential-access device, but it doesn't work the other way around.
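The distinction can be observed from user space. The following is a minimal sketch, assuming a POSIX system; it uses an anonymous temporary file as a stand-in for a disk and a pipe as a stand-in for a terminal, and the helper names (make_scratch_disk, is_seekable, read_byte_at) are hypothetical, not part of any kernel interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Create an anonymous temporary file holding `len` bytes of `contents`.
   It stands in for a random-access device. Returns a descriptor or -1. */
int make_scratch_disk(const char *contents, size_t len)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    if (write(fileno(f), contents, len) != (ssize_t)len) {
        fclose(f);
        return -1;
    }
    return fileno(f);
}

/* Returns 1 if the descriptor can seek (random access), 0 if it is
   sequential-only (lseek fails with ESPIPE), -1 on any other error. */
int is_seekable(int fd)
{
    if (lseek(fd, 0, SEEK_CUR) != (off_t)-1)
        return 1;
    return errno == ESPIPE ? 0 : -1;
}

/* Read the single byte stored at an arbitrary offset.
   Returns the byte value (0..255), or -1 if the seek or read fails. */
int read_byte_at(int fd, off_t offset)
{
    unsigned char byte;

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    if (read(fd, &byte, 1) != 1)
        return -1;
    return byte;
}
```

On the scratch file, read_byte_at() can fetch any byte in any order; on the read end of a pipe, the seek fails and the only option left is to read the next byte.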

A practical effect of the difference is that filesystems can only be mounted on block devices, not on character ones. For example, most tapes are character devices. It is possible to copy the contents of a raw, quiescent (unmounted and not being modified) filesystem to a tape, but you will not be able to mount the tape, even though it contains the same information as the disk.

Most textbooks and tutorials start by explaining character devices, the sequential-access ones, because a minimal character device driver is easier to write than a minimal block device driver. My own Linux Kernel Hackers' Guide (the KHG) is written the same way.

My reason for starting this column with block devices, the random-access devices, is that the KHG explains simple character devices better than it does block devices, and I think that there is a greater need for information on block devices right now. Furthermore, real character device drivers can be quite complex, just as complex as block device drivers, and fewer people know how to write block device drivers.

I am not going to give a complete example of a device driver here. I am going to explain the important parts, and let you discover the rest by examining the Linux source code. Reading this article and the ramdisk driver (drivers/block/ramdisk.c), and possibly some parts of the KHG, should make it possible for you to write a simple, non-interrupt-driven block device driver, good enough to mount a filesystem on. To write an interrupt-driven driver, read drivers/block/hd.c, the AT hard disk driver, and follow along. I've included a few hints in this article, as well.

The Heart of the Driver

Whereas character device drivers provide procedures for directly reading and writing data from and to the device they drive, block devices do not. Instead, they provide a single request() procedure which is used for both reading and writing. There are generic block_read() and block_write() procedures which know how to call the request() procedure, but all you need to know about those functions is to place a reference to them in the right place, and that will be covered later.

The request() procedure (perhaps surprisingly for a function designed to do I/O) takes no arguments and returns void. Instead of explicit input and return values, it looks at a queue of requests for I/O, and processes the requests one at a time, in order. (The requests have already been sorted by the time the request() function reads the queue.) When it is called, if it is not interrupt-driven, it processes requests for blocks to be read from or written to the device, until it has exhausted all pending requests. (Normally, there will be only one request in the queue, but the request() procedure should keep checking until the queue is empty. Note that other requests may be added to the queue by other processes while the current request is being processed.)

On the other hand, if the device is interrupt-driven, the request() procedure will usually schedule an interrupt to take place, and then let the interrupt handling procedure call end_request() (more on end_request() later) and then call the request() procedure again to schedule the next request (if any) to be processed.
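The control flow just described can be imitated in user space. The following is only a sketch of the pattern, not kernel code: every name in it (mock_request, queue_head, io_in_flight, mock_do_request, mock_interrupt, mock_end_request) is hypothetical, and the "interrupt" is an ordinary function call made by whoever is playing the hardware.

```c
#include <assert.h>
#include <stddef.h>

struct mock_request {
    int sector;
    int done;                     /* set by mock_end_request(): 1 ok, 0 failed */
    struct mock_request *next;
};

struct mock_request *queue_head;  /* plays the role of CURRENT */
int io_in_flight;                 /* 1 while the pretend hardware is busy */
int completed_count;

/* Record the result and remove the head request from the queue. */
void mock_end_request(int uptodate)
{
    assert(queue_head != NULL);
    queue_head->done = uptodate;
    queue_head = queue_head->next;
    completed_count++;
}

/* request(): start the hardware on the head request and return at once.
   A real driver would program the controller here. */
void mock_do_request(void)
{
    if (queue_head == NULL || io_in_flight)
        return;
    io_in_flight = 1;
}

/* Interrupt handler: the hardware has finished the head request.
   Finish it, then call request() again to start the next one, if any. */
void mock_interrupt(int success)
{
    if (!io_in_flight)
        return;                   /* spurious interrupt: nothing pending */
    io_in_flight = 0;
    mock_end_request(success);
    mock_do_request();
}
```

The point of the arrangement is that mock_do_request() never waits: each completed request is what triggers the start of the next one.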

An idealized non-interrupt-driven request() procedure looks something like this:

static void do_foo_request(void) {
repeat:
  /* returns from this function if the queue is empty */
  INIT_REQUEST;
  /* check to make sure that the request is for a
     valid physical device */
  if (!valid_foo_device(CURRENT->dev)) {
     end_request(0);
     goto repeat;
  }
  if (CURRENT->cmd == WRITE) {
     if (foo_write(
          CURRENT->sector,
          CURRENT->buffer,
          CURRENT->nr_sectors << 9)) {
        /* successful write */
        end_request(1);
        goto repeat;
     } else {
        end_request(0);
        goto repeat;
     }
  }
  if (CURRENT->cmd == READ) {
     if (foo_read(
          CURRENT->sector,
          CURRENT->buffer,
          CURRENT->nr_sectors << 9)) {
        /* successful read */
        end_request(1);
        goto repeat;
     } else {
        end_request(0);
        goto repeat;
     }
  }
  /* neither READ nor WRITE: abort the request */
  end_request(0);
  goto repeat;
}

The first thing you notice about this function may be that it never explicitly returns. It does not run off the end and return, and there is no return statement. This is not a bug; the INIT_REQUEST macro takes care of this for us. It checks the request queue and, if there are no requests in the queue, it returns from the function. Otherwise, the request at the head of the queue is the new CURRENT, and INIT_REQUEST does some simple sanity checks on it.
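The trick of returning from inside a macro can be shown with a small user-space mock. This is a sketch of the idea only: MOCK_INIT_REQUEST, mock_current, mock_end_request, and served are hypothetical stand-ins, and the sanity checks that the real INIT_REQUEST performs are left out.

```c
#include <assert.h>
#include <stddef.h>

struct mock_request {
    int cmd;
    int ok;                        /* result recorded by mock_end_request() */
    struct mock_request *next;
};

struct mock_request *mock_current; /* plays the role of CURRENT */
int served;                        /* number of requests processed */

/* Stand-in for INIT_REQUEST: the return statement expands inside the
   function that uses the macro, so it leaves that function when the
   queue is empty. */
#define MOCK_INIT_REQUEST \
    do { if (mock_current == NULL) return; } while (0)

/* Record the result and advance to the next queued request. */
void mock_end_request(int uptodate)
{
    assert(mock_current != NULL);
    mock_current->ok = uptodate;
    mock_current = mock_current->next;
}

/* Same shape as the idealized request() procedure: no visible return,
   yet the loop ends as soon as the queue has been drained. */
void mock_do_request(void)
{
repeat:
    MOCK_INIT_REQUEST;
    served++;
    mock_end_request(1);
    goto repeat;
}
```

Calling mock_do_request() with three linked requests processes all three and then returns; calling it again on the empty queue returns immediately.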

CURRENT is defined by default as

blk_dev[MAJOR_NR].current_request
in drivers/block/blk.h. (We will cover MAJOR_NR and blk.h later.) This is the current request, the one at the head of the request queue that is being processed. The request structure includes all the information needed to process the request, including the device, the command (read or write; we'll assume read here), which sector is being read, the number of sectors to read, a pointer to memory to store the data in, and a pointer to the next request. There is more than that, but that's all we are concerned with.

The sector variable contains the block number. The length of a sector is specified when the device is initialized (more later), and the sectors are numbered consecutively, starting at 0. If the physical device is addressed by some means other than sectors, it is the responsibility of the request() procedure to translate.
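As an illustration of such a translation, here is a minimal sketch that converts a linear sector number into a cylinder/head/sector address, the scheme used by many older disk controllers. The struct chs type, the function name, and the geometry passed in are assumptions made for the example; a real driver would obtain the geometry from its hardware.

```c
#include <assert.h>

struct chs {
    unsigned cylinder;
    unsigned head;
    unsigned sector;   /* sectors within a track are numbered from 1 */
};

/* Translate a linear sector number (starting at 0) into a
   cylinder/head/sector triple for the given drive geometry. */
struct chs sector_to_chs(unsigned long lba, unsigned heads,
                         unsigned sectors_per_track)
{
    struct chs pos;

    assert(heads > 0 && sectors_per_track > 0);
    pos.sector   = (unsigned)(lba % sectors_per_track) + 1;
    pos.head     = (unsigned)((lba / sectors_per_track) % heads);
    pos.cylinder = (unsigned)(lba / ((unsigned long)sectors_per_track * heads));
    return pos;
}
```

With 16 heads and 63 sectors per track, sector 0 maps to cylinder 0, head 0, sector 1, and sector 1008 (16 x 63) is the first sector of cylinder 1.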

In some cases, a command may read or write more than one sector. In those cases, the nr_sectors variable contains the number of contiguous sectors to read or write.
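The idealized request() procedure passes nr_sectors << 9 as the transfer length, which amounts to assuming 512-byte sectors (2^9 = 512). The following user-space sketch shows what read and write helpers with that calling convention might look like for a tiny in-memory disk. The mock_ names and the 8-sector disk size are hypothetical; the helpers return 1 on success and 0 on failure, matching the way the listing tests foo_read() and foo_write().

```c
#include <assert.h>
#include <string.h>

#define MOCK_SECTOR_SHIFT 9    /* 2^9 = 512 bytes per sector (assumed) */
#define MOCK_DISK_SECTORS 8    /* a tiny 4096-byte pretend disk */

unsigned char mock_disk[MOCK_DISK_SECTORS << MOCK_SECTOR_SHIFT];

/* Check that nbytes starting at the given sector fit on the disk. */
static int mock_range_ok(unsigned long sector, unsigned long nbytes)
{
    unsigned long offset;

    if (sector >= MOCK_DISK_SECTORS)
        return 0;
    offset = sector << MOCK_SECTOR_SHIFT;
    return nbytes <= sizeof(mock_disk) - offset;
}

/* Copy nbytes from the disk, starting at `sector`, into buffer. */
int mock_foo_read(unsigned long sector, char *buffer, unsigned long nbytes)
{
    assert(buffer != NULL);
    if (!mock_range_ok(sector, nbytes))
        return 0;
    memcpy(buffer, mock_disk + (sector << MOCK_SECTOR_SHIFT), nbytes);
    return 1;
}

/* Copy nbytes from buffer onto the disk, starting at `sector`. */
int mock_foo_write(unsigned long sector, const char *buffer,
                   unsigned long nbytes)
{
    assert(buffer != NULL);
    if (!mock_range_ok(sector, nbytes))
        return 0;
    memcpy(mock_disk + (sector << MOCK_SECTOR_SHIFT), buffer, nbytes);
    return 1;
}
```

A two-sector write at sector 2 therefore covers bytes 1024 through 2047 of the pretend disk, and a request that would run past the last sector is refused rather than truncated.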

end_request() is called whenever the CURRENT request has been processed, either satisfied or aborted. If it has been satisfied, it is called with an argument of 1 and, if it has been aborted, it is called with an argument of 0. It complains if the request was aborted, does magic with the buffer cache, removes the processed request from the queue, "ups" a semaphore if the request was for swapping, and wakes up all processes that were waiting for a request to complete. It may allow a task switch to occur if one is needed.

end_request() is a static function defined in blk.h. A separate version is compiled into each block device driver, using special #define'd values that are used throughout blk.h and the block device driver. This brings us to...

blk.h

We have already seen several macros which are very helpful in writing block device drivers. Many of these are defined in drivers/block/blk.h, and have to be specially set up.

At the top of the device driver, after including the standard include files your driver needs (which must include linux/major.h and linux/blkdev.h), you should write the following lines:

#define MAJOR_NR FOO_MAJOR
#include "blk.h"
This, in turn, requires that you define FOO_MAJOR, in linux/major.h, to be the major number of the device your driver handles.

Now you need to edit blk.h. One section of blk.h, right near the top, includes definitions of macros that depend on the definition of MAJOR_NR. Add an entry to the end which looks like this:

#elif (MAJOR_NR == FOO_MAJOR)
#define DEVICE_NAME "foobar"
#define DEVICE_REQUEST do_foo_request
#define DEVICE_NR(device) (MINOR(device) >> 6)
#define DEVICE_ON(device)
#define DEVICE_OFF(device)
#endif

These are the required macros for each block device driver. There are more macros that can be defined; they are explained in the KHG.

DEVICE_NAME is the name of the driver. The AT hard drive driver uses the abbreviation "hd" in most places; for example, the request() procedure is called do_hd_request(). However, its DEVICE_NAME is "harddisk". Similarly, the floppy driver, "fd", has a DEVICE_NAME of "floppy". Other drivers are even more descriptive; read blk.h and follow suit.

DEVICE_REQUEST is the request() procedure for the driver.

DEVICE_NR is used to determine the actual physical device. For example, the standard AT hard disk driver uses 64 minor devices for each physical device, so DEVICE_NR is defined as (MINOR(device)>>6). The SCSI disk driver uses 16 minor device numbers per physical device, so for it, DEVICE_NR is defined as (MINOR(device)>>4). If you have only one minor device number per physical device, define DEVICE_NR as (MINOR(device)).
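The arithmetic behind these definitions can be checked in user space. The sketch below assumes the classic 16-bit device number layout, with the minor number in the low 8 bits; MOCK_MINOR and the three function names are hypothetical stand-ins for the kernel's MINOR() and the per-driver DEVICE_NR definitions.

```c
/* Assumed layout: minor number in the low 8 bits of the device number. */
#define MOCK_MINOR(dev) ((dev) & 0xff)

/* 64 minors per physical drive: shift right by 6 (2^6 = 64). */
unsigned hd_device_nr(unsigned dev)
{
    return MOCK_MINOR(dev) >> 6;
}

/* 16 minors per physical drive: shift right by 4 (2^4 = 16). */
unsigned sd_device_nr(unsigned dev)
{
    return MOCK_MINOR(dev) >> 4;
}

/* One minor per physical device: the minor number is the device index. */
unsigned plain_device_nr(unsigned dev)
{
    return MOCK_MINOR(dev);
}
```

With 64 minors per drive, minors 0 through 63 all belong to drive 0 and minor 64 is the first minor of drive 1; with 16 minors per drive, minor 33 belongs to drive 2.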

DEVICE_ON and DEVICE_OFF are only used for devices that have to be turned on and off. The floppy driver is the only driver that uses this capability. You will most likely want to define these to be nothing at all.

All these macros, as well as many others, can be used in your driver where appropriate. blk.h includes a lot of macros, and studying how they are used in other drivers will help you use them in your own driver. I won't document them fully here, but I will briefly mention some of them to make your life easier.

DEVICE_INTR, SET_INTR, and CLEAR_INTR make support for interrupt-driven devices much easier. DEVICE_TIMEOUT, SET_TIMER, and CLEAR_TIMER help you set limits on how long a request may take to be satisfied.

The First Shall Be the Last

I've saved the first, and perhaps most important, thing for last. Before you can read or write a single block, the kernel has to be notified that the device exists. All device drivers are required to implement an initialization function, and there are some special requirements for block device drivers. Here is a sample idealized initialization function:

long foo_init(long mem_start, int length)
{
  if (register_blkdev(FOO_MAJOR, "foo", &foo_fops)) {
    printk("FOOBAR: Unable to get major %d.\n",
           FOO_MAJOR);
    return 0;
  }
  if (!foo_exists()) {
    /* the foobar device doesn't exist */
    return 0;
  }
  /* initialize hardware if necessary */
  /* notify user device found */
  printk("FOOBAR: Found at address %d.\n",
         foo_addr());
  /* tell buffer cache how to process requests */
  blk_dev[FOO_MAJOR].request_fn = DEVICE_REQUEST;
  /* specify the blocksize */
  blksize_size[MAJOR_NR] = 1024;
  return(size_of_memory_reserved);
}

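The three things here that are specific to block device drivers are the call to register_blkdev(), which claims the major number for the driver and registers foo_fops; the assignment to blk_dev[FOO_MAJOR].request_fn, which tells the buffer cache how to process requests; and the assignment to blksize_size[MAJOR_NR], which specifies the block size.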

It is worth noting that the hardware device detection and initialization, which I have denoted as foo_exists() here, is very delicate code. If you can rely on a string somewhere in the BIOS of the computer to determine whether the device exists and where it is, it's relatively easy. However, if you have to check various I/O ports, you can hang the computer by writing the wrong value to the wrong port, or even reading the wrong port. Check only well-known ports if you must check ports, and provide kernel command-line arguments for other ports. To do this, read init/main.c and add a section of your own. If you can't figure out how to do it, an explanation is forthcoming in the next version of the KHG.

Of course, none of this initialization will happen if foo_init() is never called. Add a prototype to the top of blk.h with the other prototypes, and add a call to foo_init() in ll_rw_blk.c in the blk_dev_init() function. That call should be protected by #ifdef CONFIG_FOO like the rest of the *_init() functions there, and a corresponding line should be added to the config.in file:

bool 'Foobar disk support' CONFIG_FOO y

drivers/block/Makefile should have a section added that looks like this:

ifdef CONFIG_FOO
OBJS := $(OBJS) foo.o
SRCS := $(SRCS) foo.c
endif

This done, configuration should work correctly. Your device driver file does not need to have any references to CONFIG_FOO; the only specific reference to it is the #ifdef in ll_rw_blk.c, which compiles the call out when the driver is not configured, and the makefile only builds the driver if it has been configured in.

Now all you have to do is write and debug your own new block device driver. I wish you the best of luck, and I hope that this whirlwind tour has given you a head start.

Michael K. Johnson is the editor of Linux Journal and author of the Linux Kernel Hackers' Guide (the KHG). He is using this column to develop and expand on the KHG.