What is Distributed Programming?
For this article, distributed programming
means using different autonomous computers connected over
a network to solve a single problem. Such a hardware configuration is called
a multi-computer; these computers don't have direct access to each other's
memory and peripherals. Programming multi-computers therefore requires models
different from those of ordinary systems, in which the programmer can transfer
data between different parts of the program through shared memory, either in
global variables or on the stack.
Among the reasons for writing distributed programs are the following:
- To achieve more processing speed by using more than one
computer at a time.
- To improve reliability by using more than one computer.
If one computer fails, the others might be able to compensate for
the loss.
- To solve problems, like remote data acquisition, that by their nature
require a distributed environment.
To run a distributed application, the following problems
should be addressed:
- It should be possible to start processes on remote
computers.
- The necessary data for these processes must be provided
before they can do any work.
- Some mechanism for synchronizing these processes should be
available, so that they know when to access the data and when to produce results.
Starting a program on another computer is not very hard using
programs like telnet or rsh. Exchanging data and synchronizing,
however, can be quite difficult and complicated. These problems can distract
the programmer from the original project and can be the source of numerous
bugs.
Linux already has some mechanisms that let processes on the same computer
exchange data and synchronize with one another. This is called Inter-Process
Communication or IPC. One prominent example is System V IPC, first
introduced in AT&T's System V UNIX. It is a set of mechanisms that
includes:
- Shared memory segments can be attached by more than
one process and can be read or written at any time. When one process writes
some information to a shared memory segment, other processes can immediately use
it.
- Messages are data structures which can be exchanged
between different processes. Unlike access to shared memory, sending and receiving
messages requires the use of system calls.
- Semaphores can be used by different processes for
synchronization purposes.
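
As a rough illustration of two of these mechanisms working together, the following C sketch has a child process write a value into a shared memory segment and then raise a semaphore, while the parent waits on that semaphore before reading the value. The names shm_sem_demo and sem_change are hypothetical helpers invented for this example; only shmget, shmat, shmdt, shmctl, semget, semop and semctl belong to System V IPC itself. The sketch assumes Linux, where the program has to define union semun on its own, and it keeps error handling to a minimum.

```c
#define _XOPEN_SOURCE 700   /* expose the System V IPC and POSIX declarations */
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* On Linux the calling program must define union semun itself (see semctl(2)). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical helper: add delta to semaphore 0 of the set.
 * A negative delta blocks until the semaphore value is large enough. */
static int sem_change(int semid, short delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = 0 };
    return semop(semid, &op, 1);
}

/* Hypothetical demo: the child writes value into shared memory, the parent
 * waits on a semaphore and then reads it back.
 * Returns the value read, or -1 on error (so value must not be negative). */
int shm_sem_demo(int value)
{
    union semun arg;
    int result = -1;
    int *shared;
    pid_t pid;
    int shmid, semid;

    assert(value >= 0);             /* -1 is reserved for the error return */

    /* IPC_PRIVATE creates fresh objects that only related processes share. */
    shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    arg.val = 0;                    /* semaphore starts at 0: no data yet */
    if (shmid == -1 || semid == -1 || semctl(semid, 0, SETVAL, arg) == -1)
        goto cleanup;

    shared = shmat(shmid, NULL, 0); /* map the segment into this process */
    if (shared == (void *)-1)
        goto cleanup;

    pid = fork();                   /* the child inherits the attachment */
    if (pid == 0) {
        *shared = value;            /* plain memory write, no system call */
        sem_change(semid, 1);       /* signal: the data is ready */
        _exit(0);
    }
    if (pid > 0) {
        if (sem_change(semid, -1) == 0)   /* wait for the child's signal */
            result = *shared;
        waitpid(pid, NULL, 0);
    }
    shmdt(shared);

cleanup:
    /* System V objects outlive their processes unless removed explicitly. */
    if (shmid != -1)
        shmctl(shmid, IPC_RMID, NULL);
    if (semid != -1)
        semctl(semid, 0, IPC_RMID);
    return result;
}
```

A caller might simply write int v = shm_sem_demo(42); and, if every IPC call succeeds, expect v to be 42. The semaphore is what tells the parent when the shared value is safe to read; without it, the parent could read the segment before the child has written anything.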
These mechanisms are documented in many UNIX programming books, and provide
a familiar interface for IPC in many flavors of UNIX.
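
To give a feel for that interface, here is a minimal sketch of the message mechanism in C: a child process sends a short text message through a System V message queue and the parent receives it. The function msg_demo, the struct text_msg, and the constant MSG_TEXT_MAX are hypothetical names chosen for this example; msgget, msgsnd, msgrcv and msgctl are the actual System V calls. It is a simplified sketch, not a complete program.

```c
#define _XOPEN_SOURCE 700   /* expose the System V IPC and POSIX declarations */
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_TEXT_MAX 64     /* hypothetical payload limit for this example */

/* Layout expected by msgsnd/msgrcv: a positive long type, then the payload. */
struct text_msg {
    long mtype;
    char mtext[MSG_TEXT_MAX];
};

/* Hypothetical demo: the child sends text through a message queue and the
 * parent copies the received text into out.
 * Returns 0 on success, -1 on error. */
int msg_demo(const char *text, char *out, size_t out_size)
{
    struct text_msg msg;
    int result = -1;
    pid_t pid;
    int qid;

    /* The text must fit in the payload, and out must hold a full payload. */
    assert(strlen(text) < MSG_TEXT_MAX && out_size >= MSG_TEXT_MAX);

    qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1)
        return -1;

    pid = fork();
    if (pid == 0) {
        msg.mtype = 1;                        /* message type must be > 0 */
        strcpy(msg.mtext, text);
        /* The size argument counts only the payload, not the mtype field. */
        msgsnd(qid, &msg, strlen(msg.mtext) + 1, 0);
        _exit(0);
    }
    if (pid > 0) {
        /* Block until a message of type 1 arrives. */
        if (msgrcv(qid, &msg, sizeof(msg.mtext), 1, 0) != -1) {
            strcpy(out, msg.mtext);
            result = 0;
        }
        waitpid(pid, NULL, 0);
    }
    msgctl(qid, IPC_RMID, NULL);              /* remove the queue explicitly */
    return result;
}
```

Unlike the shared memory case, every transfer here goes through a system call, and the blocking msgrcv also synchronizes the two processes, so no separate semaphore is needed in this sketch.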