return to first page linux journal archive
keywordscontents

Introducing Modula-3

One of the main tenets of the Unix philosophy is using the right tool for the right job. Here is a well-crafted tool well-suited for many large jobs that are difficult to do well in C.

by Geoff Wyant

Suppose you want to develop a large, complex application for Linox. The application is going to be multiprocess, perhaps distributed, and definitely has to have a GUI. You want to build this application fairly quickly, and you want it to be relatively bug free.

One of the first questions you might ask yourself is "What programming language and environment should I use?" C might be a good choice, but probably not for this project. It doesn't scale as well as you'd like, and the tools for doing multiprocess/distributed programming for C just aren't there. You might consider C++, but the language is fairly complex. Also, you and others have discovered from past experience that a fair amount of time goes into debugging subtle memory management problems.

There is an alternative, the Modula-3 programming system from Digital Equipment Corporation's Systems Research Center (SRC). Modula-3 is a modern, modular, object-oriented language. The language features garbage collection, exception handling, run-time typing, generics, and support for multithreaded applications. The SRC implementation of this language features a native-code compiler; an incremental, generational, conservative, multithreaded garbage collector (whew!); a minimal recompilation system; a debugger; a rich set of libraries; support for building distributed applications; a distributed objectoriented scripting language; and, finally, a graphical user interface builder for distributed applications. In short, the ideal environment for the type of application outlined above. Moreover, the system is freely available in source form, and pre-built Linux binaries are available as well.

The remainder of this article will touch on the pertinent features of the language and provide an overview of the libraries and tools.

The Basics of the Modula-3 Language

One of the principal goals for the Modula-3 language was to be simple and comprehensible, yet suitable for building large, robust, long-lived applications and systems. The language design process was one of consolidation and not innovation; that is, the goal was to consolidate ideas from several different languages, ideas that had proven useful for building large sophisticated systems.

Features of Modula-3

You can think of Modula-3 as starting with Pascal and re-inventing it to make it suitable for real systems development. Beginning with a Pascal-like base, features were integrated that were deemed necessary for writing real applications. These features fall roughly into two areas: those which make the language more suitable for structuring large systems, and those which make it possible to do "machine-level" programming. Real applications need both of these.

Supporting Large Systems Development

There are several features in Modula-3 that support structuring of large systems. First is the separation of interface from implementation. This allows for system evolution as implementations evolve without affecting the clients of those interfaces; no one is dependent on how you implement something, only what you implement. As long as the what stays constant, the how can change as much as is needed.

Secondly, it provides a simple single-inheritance object system. There is a fair amount of controversy over what the proper model for multiple inheritance (MI) is. I have built systems that use multiple-inheritance extensively and have implemented programming environments for a language that supports MI. Experience has taught me that MI can complicate a language tremendously (both conceptually and in terms of implementation) and can also complicate applications.

Modula-3 has a particularly simple definition of an object. In Modula-3, an object is a record on the heap with an associated method suite. The data fields of the object define the state and the method suite defines the behavior. The Modula-3 language allows the state of an object to be hidden in an implementation module with only the behavior visible in the interface. This is different than C++ where a class definition lists both the member data and member function. The C++ model reveals what is essentially private information (namely the state) to the entire world. With Modula-3 objects, what should be private can really be private.

One of the most important features in Modula-3 is garbage collection. Garbage collection really enables robust, long-lived systems. Without garbage collection, you need to define conventions about who owns a piece of storage. For instance, if I pass you a pointer to a structure, are you allowed to store that pointer somewhere? If so, who is responsible for de-allocating the structure in the future? You or me? Programmers wind up adopting such conventions as the explicit use of reference counts to determine when it is safe to deallocate storage. Unfortunately, programmers are not very good about following conventions. The net result is that programs develop storage leaks or the same piece of storage is mistakenly used for two different purposes. Also, in error situations, it may be difficult to free the storage. In C, a longjmp may cause storage to be lost if the procedure being unwound doesn't get a chance to clean up. Exception handling in C++ has the same problems. In general, it is very difficult to manually reclaim storage in the face of failure. Having garbage collection in the language removes all of these problems. Better yet, the garbage collector that is provided with the SRC implementation of Modula-3 has excellent performance. It is the result of several years of production use and tuning.

Most modern systems and applications have some flavor of asynchrony in them. Certainly all GUI-based applications are essentially asynchronous. Inputs to a GUI-based application are driven by the user. Multiprocess and multi-machine applications are essentially asynchronous as well. Given this, it is surprising that very few languages provide any support at all for managing concurrency. Instead, they "leave it up to the programmer". More often than not, programmers do this through the use of timers and signal handlers. While this approach suffices for fairly simple applications, it quickly falls apart as applications grow in complexity or when an application uses two different libraries, both of which try to implement concurrency in their own way. If you have ever programmed with Xt or Motif, then you are aware of the problems with nested event loops. There needs to be some standard mechanism for concurrency.

Modula-3 provides such a standard interface for creating threads. In addition, the language itself includes support for managing locks. The standard libraries provided in the SRC implementation are all thread-safe. Trestle, which is a library providing an interface to X, is not only thread-safe, but itself uses threads to carry out long operations in the background. With a Trestle-based application, you can create a thread to carry out some potentially long-running operation in response to a mouse-button click. This thread runs in the background without tying up the user interface. It is a lot simpler and less error prone than trying to accomplish the same thing with signal handlers and timers.

Generic interfaces and modules are a key to reuse. One of the principal uses is in defining container types such as stacks, lists, and queues. They allow container objects to be independent of the type of entity contained. Thus, one needs to define only a single "Table" interface that is then instantiated to provide the needed kind of "Table", whether an integer table, or a floatingpoint table or some other type of table is needed. Modula-3 generics are cleaner than C++ parameterized types, but provide much of the same flexibility.

Supporting Machine-Level Programming

One of the important lessons from C was that there are times that real systems need to be programmed essentially at the machine level. This power has been nicely integrated into the Modula-3.

Any module that is marked as unsafe has full access to machine-dependent operations such as pointer arithmetic, unconstrained allocation and de-allocation of memory, and machine-dependent arithmetic. These capabilities are exploited in the implementation of the Modula-3 I/O system. The lowest levels of the I/O system are written to make heavy use of machine-dependent operations to eliminate bottlenecks.

In addition, existing (non-Module-3) libraries can be imported. Many existing C libraries make extensive use of machine-dependent operations. These can be imported as "unsafe" interfaces. Then, safer interfaces can be built on top of these while still allowing access to the unsafe features of the libraries for those applications that need them.

The Modula-3 Programming System

How often have you avoided changing a base header file in a C/C++ system because you didn't want to recompile the world? How many times have you restructured your header files, not because it was the right thing to do, but because you needed to cut down on the number of recompilations after each change?

The SRC implementation of Modula-3 has a rather elegant solution to this problem. If an item in an interface is changed, only those units that depend on that particular item will be recompiled. That is, dependencies are recorded on an item basis, not on an interface file basis. This means much less recompilation after each set of changes.

m3gdb is a version of GDB that has been modified to understand and debug Modula-3 programs. One of the nice features of m3gOb is that it understands M3 threads and allows you to switch from thread to thread when debugging a problem.

Also very exciting is Siphon. Siphon is a set of servers and tools to support multi-set development. The basic idea is that you can create a set of packages. A package is just a collection of source files and documentation. Siphon provides a simple model for checking out and checking in a package. Checking out a package locks it so that no one else can check it out and modify the contents. This probably doesn't sound that exciting. The exciting thing that Siphon does is to automatically propagate modified files to other sites when the package is checked back in. It does this in such way that packages are never seen in a "half-way" state; that is where part of the sources have been copied but not yet all of them. Further, it does this in the face of failure. One of the really interesting parts of multi-site development is making sure that everyone has the most recent copy of the sources. This is especially hard when communication links can go down. Siphon takes care of all of these problems for you. A system like Siphon can save you considerable amounts of work if you are involved in multi-site development. By the way, Siphon is not restricted to Modula-3 source files. It can manage any type of source or documentation file.

The Libraries

A good, simple object-oriented language makes a nice starting point, but that in itself probably doesn't provide sufficient motivation for considering a new language. Real productivity comes about when there are good reusable libraries. This is one of the real strengths of SRC Modula-3 system. It provides a large set of "industrial strength" libraries. Most of these libraries are the result of a number of years of use and refinement. They are as well- or better-documented than most commercially available libraries.

Libm3

Libm3 is the workhorse library for Modula-3. It is the Modula-3 equivalent of libc (the standard C library), but it is considerably richer.

Libm3 defines a set of abstract types for I/O; these are called "readers" and "writers". Readers and writers present an abstract interface for writing to "streams". Streams represent buffered input and output. Stdin, stdout, and stderr represent represent streams that are familiar to most programmers. The streams package was designed to make it easy to add new kinds of streams.

In addition to the standard I/O streams, one can open file streams and text streams (that is, streams over character strings). There is also a set of abstractions for unbuffered I/O. In addition to the File type, there are Terminal and Pipe. The Fmt interface provides a type-safe version of C's printf. A big source of errors in C programs is passing one kind of data into a printf, but trying to format it as a different kind of data. The Fmt interface was designed to have the flexibility of printf, but without introducing its problems.

Libm3 also defines a simple set of "container" types as generic interfaces. The basic container types include tables, lists, and sequences. A table is an associatively indexed array. The list type is the familiar "lisp" style list. A sequence is an integer (CARDINAL, actually) addressed array which can grow in size.

Finally, Libm3 provides a simple persistence mechanism called Pickles. Writing code to convert complex data structures to and from some disk format is tedious and error prone. Many programmers don't do it unless they absolutely have to. With the Pickle package, you no longer need to write this kind of code. Since the runtime knows the layout of every object in memory, it can use this information to walk a set of structures and read them from or write them to a stream. The programmer does not have to write object-specific code for writing an object to a stream, although he or she can if a better representation is known. For example, the programmer of a hash table may choose to write out individual entries if the table is below a certain size.

Trestle and VBTKit

Most user interfaces (UI) are a spaghetti of event handlers, timers, and signals. This is because they need to deal with user input coming in at arbitrary times, they need to deal with refreshing the screen, and they have to make sure that long running operations don't cause the application's windows to freeze up. All of these constraints make developing user interfaces in traditional languages and libraries very difficult.

The SRC Modula-3 implementation provides a UI library known as Trestle. The notable thing about Trestle is that it is highly concurrent. It was written to make extensive use of threads and to be used in a multithreaded environment.

This simplifies the development of user interfaces considerably, since you don't have deal with the event loop any more. An event loop is essentially "poor man's multithreading". Since the language and libraries support first-class threads, these can be used instead. If the action associated with a button may take a long time, the action can merely fork off a thread to handle the bulk of the action. This thread can make arbitrary calls into Trestle to update the screen with new results. Trestle protects itself through judicious use of locks.

Trestle provides two sorts of objects: graphics objects such as paths and regions, and a base set of user-interface objects. These user-interface objects are known as "VBT"s. These play the same role in Trestle as the X intrinsics play in the world of the X toolkit. They define how screen-space is allocated among different "widgets". Trestle provides a simple set of buttons and menus in its set of base UI items.

VBTKit is a higher-level toolkit built on top of Trestle. VBTKit provides a much wider array of UI object kinds. It also provides a Motif-like 3-D "look and feel" (on color displays). The same interface can be used on monochrome displays without change, but without the appropriate visual appearance. VBTKit provides the usual complement of scrollbars, buttons, menu items, numeric I/O objects, and the like.

FormsVBT and FormsEdit

FormsVBT is a User Interface Management System (UIMS) structured as a library and is built on top of VBTKit. FormsVBT provides a simple language for describing the layout of a user interface and an event interpreter for that language. The layout language follows a "boxes and glue" model. Boxes hold some set of UI objects. A VBox arranges those objects in a vertical display, while an HBox arranges them in a horizontal display. Glue is used to force a certain amount of space between items in a visual display. As you might expect, boxes can be nested arbitrarily.

The FormsVBT library allows you to specify callbacks to handle input. The FormsVBT specification that you write specifies the "syntax" of your user interface; the event handlers that you write provide the "semantics".

FormsEdit is a simple UI creation tool built on top of FormsVBT. It reads and displays graphically, and in source text form, a UI specification written in the FormsVBT language. It also allows for interactive modification of the source.

m3tk

There are times when you just need to develop some language-specific tools for a project. The problem is that very few language implementations give you any support in doing this. Many times, all you have is a public domain YACC grammar that you have to modify and then build from there. This is where m3 tk comes in. It provides a complete toolkit for parsing M3 source files; and generating and manipulating abstract syntax tree representations of M3 sources. Thus a M3 specific tool can be built with relative ease. In fact, the Network Objects (see below) stub generator was built using it.

Network Oblects

Distributed systems are quickly becoming commonplace in the '90s. Most languages provide little or no support for distributed programming. Most distributed applications are still built directly on top of sockets or use libraries that provide a simple stream or RPC interface. These libraries are poorly integrated into the language and introduce a severe impediment between the language and the distributed system.

Network Objects is a facility in the SRC implementation of Modula-3 that allows Modula-3 objects to be exported across address spaces and machines. With Network Objects, a program can't tell if the object it is operating on is one that it created in its own address space or was one that was created and exists in another address space. This provides a very powerful mechanism for developing distributed applications.

To turn an object type into a network object, that object must inherit (either directly or indirectly) from the type NetObj.T. The object cannot contain any data fields. The interface containing the declaration is then run through a tool called a stub compiler. This generates all the coding necessary to handle network interactions. That's all that is required to allow an object to be passed around the network. Pretty simple. Below is an example of a network object. It defines an interface called "File" that defines the operations on a file, and an implementation of that interface.

INTERFACE File;
TYPE T = NetObj.T OBJECT 
METHODS 
getChar(): CHARACTER; 
putChar(c: CHARACTER); 
END;

END File;

INTERFACE FileServer; 
IMPORT File;
TYPE T = NetObj.T OBJECT 
METHODS 
create(name: Text): File.T 
open(name: Text): File.T
END;

END FileServer;

The above code defines two types: File.T, which is an object with two methods to get and put a single character; and FileServer.T, an object which manages file objects. A server someplace defines a concrete implementation of these abstract types.

MODULE FileServerImpl; 
IMPORT File, FileServer;

TYPE FileImpl = File.T
...state for a file... 
OVERRIDES 
getChar := GetChar; 
putChar := PutChar;
END;

TYPE FileServerImpl = FileServer.T OBJECT 
...state for a file server... 
OVERRIDES 
create := Create; 
open := 0pen;
END;

VAR 
fileServer := NEW(FileServerImpl); 
BEGIN 
NetObj.Export("FileServer",
END;

MODULE FileServerClient; 
IMPORT File, FileServer;

VAR
fileServer := NetObj.Import("FileServer file");
file := fileServer.Create("someFile"); 
BEGIN 
file.putChar('a'); 
END FileServerClient;

In the above code, FileServerImpl creates an instance of a file server and puts it into a name server. (The NetObj.Export call does this.) The module FileServerClient (which would be running in a different address space or machine) imports the file server implementation. This gives a valid Modula-3 object back to the client. From that point on, the client invokes methods on it as if it were local. It then creates a File object which it begins adding characters to.

If you have done any development with SunRPC or DCE, you will immediately appreciate how much simpler this is than programming on top of either of these systems. Network Objects is similar in scope to these systems, but is tightly integrated into the programming model instead of being a poorly integrated adjunct.

Two interesting systems have been built on top of network objects. The first is Obliq which is an object-oriented, distributed scripting language. Obliq can call into existing Modula-3 packages. You can also create Obliq objects and hand them to other programs running on other machines. Obliq is similar in scope to Telescript or TCL-DP (TCL with Distributed Programming extensions). The other system is Visual Obliq, which can be thought of as 'distributed Visual Basic for Modula-3'. It includes an interactive, graphical application builder. Callbacks are handled by Obliq scripts. This makes it a very powerful tools for prototyping and building distributed applications. It can also be used as the basis of interesting collaborative applications.

Some Personal Experiences with Modula-3

Our group has been using Modula-3 for about six months now, although I have been involved with it since 1989 or so. Our group consists of experienced C/C++ programmers. Two of have been involved with C++ since version 1.2 and two of us worked on the implementation of a C/C++ programming environment.

Our experience with Modula-3 has been completely positive. The group members feel that the language, libraries, and supporting tools have made us far more productive than we were when using C++. The libraries are of higher quality and have better documentation than many commercially available libraries. To accomplish a given task, we write considerably less code than we used to and we believe the code is of higher quality. We attribute this to two things. The first is that the language is clean and simple; far less mental effort is required to understand how to accomplish something. The second contributing factor is much heavier use of libraries. Instead of writing some piece of functionality, we first see if the standard libraries provide it or something close to it. Most of the time we find something close enough that we can take it as a starting point.

On a more personal level, I have rarely seen a language, tools, and set of libraries that so neatly combined simplicity, elegance, and power.

Conclusions

Modula-3 and the implementation from SRC provides an excellent basis for developing Linux applications. It is a system designed to meet the programming challenges of the '90s. The language is clean, simple, and powerful. The provided libraries are almost unequalled. The support for distributed programming is among the best available.

One way to think of Modula-3 and the SRC implementation is bringing a "NeXTStep-like" environment to Linux. They both start with a simple object-oriented language (though M3 is both safer and more powerful) and build useful and sophisticated libraries on top of it. Of course, Modula-3 has the advantage of being freely available and running on Linux!

Geoff Wyant is a researcher in distributed systems with SunMicrosystems Laboratories. In his past, he has built programming environments for C++, worked on distributed file systems and RPC systems, and hacked operating systems kernels. He can be reached at geoff.wyant@east.sun.com.