Ever since I started using Linux two years ago, I've wanted to build a cluster. During the early days of Linux self-instruction, I scoured the Internet for clustering information, looking forward to the day that I would be able to run a parallel computer in my own home. Everything I read on the subject made it sound like clustering with Linux was straightforward, almost simple. But it was slow going for a newbie, and I didn't get very far. Clustering information was dispersed all over the Internet, and I couldn't make a cohesive whole out of it. I was elated when Sterling, Salmon, Becker and Savarese published How to Build a Beowulf. I figured I was set. I bought five computers with AMD 400MHz K6s in anticipation of my homemade supercomputer. Alas, the book was very informative but lacked the one thing I had trouble finding everywhere else: the software for the cluster installation. I was stuck.
The moment I discovered that O'Reilly was going to publish Building Linux Clusters I ordered the book. The advertisement claimed that the book would contain the software to build a cluster. I was saved. But I had months to wait for its arrival, so I planned and schemed about the really cool things I would do with my cluster, and I don't mean just crunching DES keys.
I was at the O'Reilly Open Source Software conference in Monterey, California, July 2000, when I discovered that O'Reilly was going to ship the title to the meeting. Two days later, I had what I hoped was the Beowulf grail in my greedy little hands, with a 20% off promotional discount to boot. In Monterey, separated from the boxes I wanted to turn into a cluster, I jammed through the 280-odd pages of Building Linux Clusters in under 24 hours. It had everything I, or anyone else, would ever need. I was ecstatic.
The material covered in the first four chapters is well written and designed to ease newbies into the waters. Topics covered include: the history of parallel computing, concepts of networks and parallelism, parallel programming systems and libraries, cluster types, cluster design, construction and assembly considerations, and hardware. Expectant cluster builders, like myself, will be familiar with most of the introductory material but may glean some useful bits on advanced topics such as meshes, hypercubes and symmetric multiprocessing (SMP). The author's recommendations for entry-level, mid-level and advanced-level cluster configurations will seem unrealistic to most Linux users. I felt that the hardware I had assembled for my cluster was pretty extravagant for a hobbyist to be experimenting with. However, my hardware didn't measure up to the seven 450MHz Pentium processors suggested for the entry-level cluster. At least I had that configuration beat on Ethernet bandwidth and memory. The author is aware of the low to no-budget, roll-your-own approach taken with most Beowulf clusters in the past, but he clearly stated his intention was to cover the construction of "serious" systems. Of course, the "seriousness" of a cluster is clearly relative, and the author's recommendations should be regarded as guidelines, not rules. However, if you do install the clustering software in this book, you will need about 1G of hard drive space per node, as the software requires about 946M of disk space.
The last three chapters contain excellent information and resources on parallel programming development environments, tools, libraries and applications. Common tools like GNU Emacs, GCC, G77 (the GNU Fortran compiler) and GDB (the GNU debugger) are included on the CD. Two sets of parallel programming tools, PADE (Parallel Applications Development Environment) and XPVM, are also found there. These packages are incredibly helpful GUI tools that allow users to create and modify parallel virtual machines (PVMs) on their cluster. The Local Area Multicomputer (LAM) package is also included. LAM helps users execute programs that utilize the message passing interface (MPI) libraries (also included) by lending a hand in creating a processing environment for MPI-based programs. All of these packages come with on-line documentation as well. Several parallel programming libraries like SMARTS (Shared Memory Asynchronous Run time System), SILOON (Scripting Interface Languages for Object-Oriented Numerics), PAWS (Parallel Application WorkSpace), POOMA (Parallel Object-Oriented Methods and Applications), as well as the math libraries PETSc, PLAPACK and ScaLAPACK, are included on the CD. Several debugging and performance tools such as TAU (Tuning and Analysis Utilities), PCL (Performance Counter Library) and PDT (Program Database Toolkit) are also included. For those hard-chargers who want to move beyond the typical Beowulf and build more tightly integrated systems, two sets of kernel patches, BPROC (Beowulf-distributed process space) and MOSIX, are included on the CD.
As if that weren't enough, the CD includes two applications, mp3pvm and PVMPOV, to test your cluster with.
Clearly, the CD is loaded with copious and formidable tools to build and run parallel programs. The coverage of the topics found in the seven chapters I've summarized above is extensive. I consider this book serious reference material for the beginner as well as the advanced clustering enthusiast. That said, let's talk about why I can never recommend this book to anyone.
The editing and proofreading done on this book were nearly nonexistent. Eight of the first eleven figures in the book were either missing or out of order. The number of spelling and grammatical mistakes is unacceptable. We've all come to expect more from O'Reilly, and this sloppy offering came as a shock. Through lack of editing, enough stumbling blocks are encountered to quickly befuddle the Linux beginner. Those familiar with Linux and the introductory material can probably get by with a few assumptions and guesses as to what the author meant. I did just that and moved to the installation phase.
In a nutshell, the author's methodology for building a Linux cluster involves the Kickstart installation method developed by Red Hat, DHCP and installation of Linux over a network. The software is provided to create two types of boot floppies. One floppy is created for a master node. The master node boot floppy is used to perform a CD installation of Red Hat Linux 6.2 on the master node. Once the master node is up and running, it is configured to act as a DHCP server for the rest of the nodes in the cluster. The commands to enter the information the master node requires to serve DHCP requests for IP addresses are clearly noted. Once the master node is ready, the second type of boot floppy is used to perform automated Kickstart installations of Linux onto the remaining nodes from the master node CD over the network. In theory, it's simple and straightforward. So much so that the author mentions he was able to complete software installations on several clusters in about 20 minutes. Just the kind of software solution I needed.
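To make the DHCP step above a little more concrete, here is a minimal Python sketch of the kind of configuration a master node needs in order to hand out fixed addresses. It is not the book's software and does not reproduce the book's commands. The function name dhcpd_conf, the private subnet, the node names and the MAC addresses are all hypothetical, and the output assumes the ISC dhcpd configuration format, with the master node taking the first usable address on the subnet.

```python
import ipaddress


def dhcpd_conf(subnet, nodes):
    """Render a minimal ISC-style dhcpd.conf pinning each node's MAC to an IP.

    subnet: CIDR string for the private cluster network (hypothetical value).
    nodes:  list of (hostname, mac_address) pairs for the compute nodes.
    """
    net = ipaddress.ip_network(subnet)
    # Usable addresses exclude network and broadcast; one more goes to the master.
    if len(nodes) > net.num_addresses - 3:
        raise ValueError("subnet too small for the requested number of nodes")

    hosts = iter(net.hosts())
    master = next(hosts)  # assumption: master node holds the first usable address

    lines = [
        f"subnet {net.network_address} netmask {net.netmask} {{",
        f"    option routers {master};",
    ]
    # Each compute node gets the next free address, in the order listed.
    for (name, mac), ip in zip(nodes, hosts):
        lines += [
            f"    host {name} {{",
            f"        hardware ethernet {mac};",
            f"        fixed-address {ip};",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical two-node cluster on a private /24 network.
    print(dhcpd_conf("192.168.1.0/24", [
        ("node1", "00:50:56:00:00:01"),
        ("node2", "00:50:56:00:00:02"),
    ]))
```

Tying each MAC address to a fixed IP means a node comes back with the same address after every reboot or reinstall, which is what an unattended network installation needs if the master is to tell the nodes apart.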
There are so many mistakes, including errors in the commands to create the boot floppies, configure the DHCP server and perform the installation, that my software installation took eight hours. The very first command line I needed to enter in order to create the first installation boot disk was incorrect. It was to be one of many such obstacles. Scripts were also frequently mislabeled and often were not where they were supposed to be. The disparity between what is written and what is on the CD is appalling.
After hours and hours of making assumptions, cruising the file system, editing the boot disk software and a little black magic, I completed the installation. Now, I just had the simple task of initializing the cluster database using the cluster management software provided, and I would be able to run the parallel applications provided. You would think that I had endured the worst of it. But it's always darkest before dawn.
The cluster management software (another feature I was thrilled to see) included on the CD simply doesn't work. Essentially a bunch of Perl scripts, it was designed to work through both a web browser interface and at the command line. However, several links are missing on the cluster management package home page, making the browser interface moot. I possess a decent amount of Apache knowledge, but I couldn't make things work. Maybe I don't know enough Perl. In any event, when I tried to skirt the browser interface and populate the cluster database manually with the provided script, I was greeted with what I consider the death knell of this project. Running the script returned the message: "This command will be provided in an updated Cluster Management Package." What? Not only was the software not working, it was incomplete. Things accelerated steeply downhill from there.
Refusing to give up, I tried adding nodes to my "cluster" (the single master node) by setting up a PVM using XPVM: no luck. I tried adding nodes manually at the pvm> command line prompt: no luck. My experience up to that point suggested that nothing was going to work without a struggle. So I reluctantly quit before giving the LAM package a try.
Apparently, I'm not alone in my suffering and disappointment. As I write this article, eight of the nine reader reviews posted at oreilly.com convey sentiments similar to my own. Other readers have submitted error reports that cover in greater detail what I've mentioned here. To O'Reilly's credit, they are posting confirmed errata on their website. It's quite the uphill battle for them in this case.
I'm dumbfounded at how this work made it out of any publishing house, let alone O'Reilly's. To avoid this kind of debacle in the future, let me go on the record as stating that I will personally review any upcoming revisions/editions of Building Linux Clusters and test the software within, for free.
Building Linux Clusters should be considered a beta release. Much work needs to be done on the next version before I will be able to recommend it to anyone. Technically, I am ahead of where I was with my cluster before I got the book. But it sure doesn't feel like it.
Glen Otero has a PhD in Immunology and Microbiology and runs a consulting company called Linux Prophet in San Diego, California. He can be reached at gotero@linuxprophet.com. Surfing, in the ocean that is, is his favorite pastime.