Using Linux as a base to implement advanced telephony applications.
by David Sugar
In ``Let Linux Speak'' (LJ, January, 1997), I demonstrated some fun applications for the SPO256 text-to-speech board. Buried in that article was a brief discussion on the potential for using text-to-speech as a telephony resource and for using Linux as a telephony services platform.
Talking about ``telephony services'', or computer telephony in general, can mean many things. The history of computer telephony, while an interesting subject, is not our primary focus. Instead, I will discuss the use of Linux as a platform for voice response and path switching, for PBX integration and switch call control, and for the extension of traditional voice applications onto the Internet.
Voice response includes many applications such as traditional voice mail and interactive voice response (IVR) systems utilized as automatic ``dial and survey'' machines. These applications are typically built around multi-channel voice telephony boards that capture and play back digitized speech and that generate and listen for DTMF digits and call progress tones. The more advanced of these boards offer on board DSP resources, emulate FAX/modem services and perform speech recognition. The largest single vendor of this kind of board is the Dialogic Corporation.
PBX integration involves direct computer control of a PBX switching system. Many vendors have specialized boards and/or serial interfaces which run proprietary protocols for gaining access to different switch features. Generally, PBX integration is implemented as a telephony server or API, such as Microsoft TAPI or TSAPI (currently supported only under Novell Netware). Usually, these APIs implement first party call control for desktop applications (such as putting a telephone image and dialer on a desktop, controlling a digital telephone directly as a ``terminal device'') or as third party call control for server applications (such as ACD: automatic call distributors).
The whole area of Internet telephony is vastly interesting and intriguing. Most often, the first thing that comes to mind when one says ``Internet telephony'' are those nifty programs that allow computer users to place low-grade international telephone calls for free over the Internet. This same technology, when applied on a private corporate LAN with sufficient bandwidth, could provide a cheap means of inter-office switching (much like tie line services and expensive private T-1 networks) and a better solution for ACD agent positions.
Traditional low-cost telephony solutions have historically been implemented either under MS-DOS (with, perhaps, a custom real-time kernel) or under OS/2. The need for highly specialized real-time operating systems to drive multi-channel voice applications has disappeared as a result of increased CPU power, and equally important, the increased sophistication and power of add-on telephony boards. Many of these boards now manage I/O and call state largely on their own with only an occasional need for direct intervention. In the past guaranteed maximum interrupt latency was the mantra for evaluating real-time performance in complex voice processing systems; I now find support for real-time predictable scheduling policies more important.
While Windows NT is commonly touted as a ``telephony operating system'' these days, there are several serious considerations. The first is simply expense; a Windows NT machine means a machine with a video display. More often, telephony applications are deployed on dedicated stand-alone machines which sit in phone closets and, ideally, require only remote management.
Some of the same optimizations that make NT work better than Windows 95 as a desktop machine get in the way of using it for telephony applications, or for use as a telephony server and workstation concurrently. For example, one finds strange scheduling quirks that occur as NT optimized video drivers, which are now given the highest priority, update large areas of the screen.
Finally, even in today's world of cheap RAM, NT requires a minimum of 32MB, while Linux runs smoothly in 8MB or less. Even in the low-end commodity voice market, where MS-DOS-based voice mail systems predominate and a $50.00 change in margins can make or break a product, these costs are very important. Now, imagine a $2000 voice mail system, or, even better, a $1000 voice mail ``machine'', with desktop integration, multi-site networking and voice/email exchange, and what that system would do to the bottom of the commercial voice mail market.
Why not OS/2? Well, first there is always the question of ``will it be around?'' Second, the last non-desktop optimized release of OS/2 was 1.3, and it is still the most commonly used and supported release of OS/2 in voice processing products today. OS/2 driver support still exists in the voice response OEM marketplace, but it is not a rising star.
Why not DOS? Simply put, one cannot easily run network services from a DOS machine. In tomorrow's world, voice mail will have to present voice messages on the desktop, whether through proprietary means or through a web server and standard e-mail protocols. Other advanced user applications and networking services will need to be leveraged onto these once dedicated stand-alone voice processing machines.
And what about Unix? For many years, some variants of Unix have been used successfully in voice processing, typically in vertical market applications. The complete failure of the major Unix vendors to understand the CTI market and create appropriate software licensing terms or stripped down embedded releases have kept the cost of using these systems prohibitively expensive as a general purpose CTI platform. For example, a Unix machine for voice processing may not need NFS, many user utilities or X Windows. However, it does need sockets and a web server for administration and desktop telephony. No major Unix vendor seems to know how to properly license such a stripped down, embedded configuration.
So what are we left with? An inexpensive operating system capable of using inexpensive hardware, of running a mix of user and real-time scheduled processes, of remote management without the need for a local console, of integrated networking and of giving months of reliable service unattended. Only Linux and Free-BSD fit these criteria.
As mentioned earlier, most vendors support some form of direct PBX integration. By far the most common method of integration, in the broadest sense, is ``call detail''. Virtually every PBX on the market is capable of generating call detail, also available as serial data, that can be used for call accounting systems.
Call detail is often represented by different formats from different manufacturers, and may change within different releases of the same base product. The kind of information varies from simple information on in-bound/out-bound trunk usage (for example, number called, date and time, duration and extension called from) to complete call traces, whereby every aspect of a call's passage through a PBX is detailed.
Most PBXs support ``programming ports'', which are essentially serial ports hooked up to a dumb terminal or PC running a terminal program like ProComm. These programming ports are typically connected to modems to allow remote maintenance of the PBX.
The programming port is quite useful and is often overlooked. From it one can find and change every aspect of a typical PBX system, such as hunt groups, directory number assignments, ringing and restriction patterns, forwarding, feature buttons on digital sets, etc. However, each piece of telephony equipment is different in the way that it handles programming. The Fujitsu F9600 has perhaps one of the easiest command languages to integrate, in that the programming interface is represented as a shell, with a shell-like command language and predictable response strings for both successful and failed commands. This makes it ideal for use with dumb serial terminals or automatic program control.
Some systems, such as the Nitsuko 384i, implement their program port as a menu system on the assumption that the tech maintaining the switch is on a form of ANSI terminal. Others that have a serial programming interface, such as the Panasonic DBS, behave and accept commands as if they were a digital telephone set, and are more unusual. These systems require a lot of work to be usable from an application standpoint.
One universal form of integration, though not widely implemented, is SMDI. SMDI is a protocol originally developed by Bell Labs for central office automated call response equipment. SMDI, when available, is commonly used for voice mail integration.
SMDI is carried on a serial port, and involves a simple, standardized protocol for sending commands and receiving messages from a PBX. Unfortunately, SMDI is pretty much limited to identifying the origin of a call, when it appears on a ``message detail'' extension (e.g., the extension of an attached voice mail system), and control of ``message waiting'' indicator lamps on phones. However, as this functionality is useful to basic voice mail systems, we will discuss SMDI again later in this article.
Systems that do not provide SMDI use tones for signaling and integration of third party voice mail systems. These tones are DTMF, and appear on the port(s) where the attached voice mail or IVR application handles its voice path. Since this form of integration is very limited and involves voice boards, I won't discuss it further.
This leaves us with a broad class of ``other'' integration ports. These are typically serial ports with a custom API or communication board. They use proprietary protocols to indicate signaling and control activity as calls pass in to, or are generated out of, a telephone switch. The nature of these messages vary widely from one vendor to another.
Some vendors are very keen on providing support for third parties to develop call control applications. Harris, in particular, not only documents their primary serial control protocols (known as Harris HIL and WIL respectively), but also provide third parties with actual source code that demonstrates how to implement their protocols. Other vendors, such as Fujitsu, provide full technical documentation and specifications for their protocols to interested third parties. The English is a little poor in the Fujitsu documentation, but at least they make some effort to promote their interface (TCSI).
Alas, many vendors feel there is need to keep such information secret. Doing so has slowed the widespread development of general purpose computer telephony application interfaces and restricted development to a very limited set of players; namely, Novell TSAPI and Microsoft TAPI. This wall of secrecy limits most third parties from creating more advanced integrations around the strengths of a specific brand of equipment's feature set and permits the manufacturer to pick and choose who will be allowed to directly integrate their product.
Let's take a look at the Fujitsu F9600 PBX in detail. It includes a serial port for programming. Fault messages and system status and call detail also flow out of this port. It has an optional board with a synchronous 19.2Kbps TCSI interface known as TCSI. Considering the fact that it is a slow speed link, it seems unnecessary.
In addition, Fujitsu has their own telephony architecture built around a telephony server. This telephony server runs under SCO (ODT 3) Unix and provides a separate server for each port, as well as some application servers. A primary telephony server is used to control the TCSI interface through an Eicon HDLC/X.25 board. A second, separate server is used to control the programming port. Several applications can reside on the telephony server, each of which uses a separate server and communicates via TCP/IP sessions to the MML (PBX programming interface) and TCSI servers. In addition, an interface driver can be loaded under a separate machine running Novell Netware, along with Novell Netware Telephony Services, to provide a TSAPI interface to the PBX via the SCO machine.
The multi-tiered server structure Fujitsu uses enables applications to directly utilize TCSI services, the MML interface or a generic universal API set (TSAPI). However, it uses two separate machines and operating systems to build this structure, and each low-level server is not directly aware of potentially useful information contained in the other.
Other vendors hide the low-level interface to the PBX completely and provide only a TSAPI (or TAPI) interface for their switch. These interfaces use only the information found on one port (the call control port) and ignore information available on the programming port and the potential to create remote network administration through the universal API.
In fact, the universal API concept as expressed in TAPI and TSAPI is based on creating a model PBX call control system as a server. A single API and single protocol is used between the application and this universal server. The server is then implemented by providing a lower level driver (service provider interface) to figure out how this generic PBX model applies to the peculiarities of the vendor's specific hardware and call control model.
The single-model API fails in one of two ways. First, some features of the model API are not available under a specific vendor's low-level SPI driver due to the inability to represent and recreate specific functionality within the universal call control model. Second, really neat and unique features the vendor may support beyond the basic universal model are unavailable unless, like Fujitsu, a second direct protocol server is provided.
To solve these and other problems in advanced integration for free and open systems, I completely abandoned the universal API and single protocol hierarchical model used in TSAPI and TAPI. I feel a better solution is a multi-protocol ``TServer'' used as a single unified server and network point of entry for access to all services that can be made network accessible from a telephony switch:
The Panasonic DBS is a widely used digital Hybrid Key system (not to be confused with the Panasonic KXT). The choice of creating a model server around the DBS was predicated on both immediate familiarity with the internals of this switch and availability of hardware on which to test.
The DBS has two interfaces that can be used. The first, a built-in serial port, is used for both programming and call detail recording. For this reason, the server must be sensitive to the current ``operating state'' of the port (i.e., whether it is currently being used for programming). As mentioned earlier, commands to the DBS programming port are represented in a manner analogous to the way one would program the switch from a digital key telephone.
To simplify the DBS serial programming for application use, I created a simple protocol to send the ``change value'', ``get value'' and ``clear value'' requests which make up the conceptual heart of the operations used for programming the DBS from a telephone. These commands, and the specification of an address for the values being affected, make up the essential elements of my DBS programming protocol.
The DBS also includes an optional API card. This card has a serial port used for call control and switching. This low-level protocol is unidirectional and involves a message packet and access negotiation protocol. The interface is serial, with no provision for hardware flow control. Messages are essentially exchanged between the host machine and the switch, similar to Fujitsu TCSI, Harris HIL or other equipment. Later versions of the DBS API include two serial ports, one of which has extended features to support a Netware TSAPI server.
With all these protocol interfaces to maintain and the need to connect a single point of network entry, a monolithic server cannot be used. Since the points of connection (API interface and programming interface) can be used as a shared resource by multiple applications simultaneously, a simple ``fork()ing'' server common in Unix is inappropriate. Instead, there is a set of protocol servers connected to a single ``traffic cop''. This multi-protocol traffic cop binds itself to a single network port address providing universal access from a single network session and acts as the primary TServer.
In the simplest approach, each protocol (interface) server is run as a child process using anonymous pipes. The TServer accepts and manages user application sessions. Anonymous pipes for the protocol server are very fast and require low overhead, especially with Linux.
In the DBS are several layers of servers for the API portion of the interface. This layering behaves a little like a stream driver stack, and it could probably be implemented as such under a streamable Unix.
At the lowest layer is the DBS packet link driver (papild). This driver negotiates the unidirectional packet exchange with the DBS and normalizes DBS messages to a single fixed packet. The reason for a fixed size packet is to allow transfer of all messages to occur between the different protocol layers by performing atomic read() and write() calls. This method allows clean tracking of ``split'' messages on a non-blocking pipe that has been overwritten past its buffer.
The DBS packet channel state driver (papicsd), is a second stream layer built from pipes above papild. This layer is aware of the peculiar nature of the DBS wherein all message events are initiated with regard to one or more channels, which are virtual station ports associated with the DBS API card. Requests that may generate reply messages are queued within papicsd. If a request with an expected reply fails, the papicsd layer can generate timeout events. Similarly, the papicsd layer is responsible for glare resolution, a situation that occurs when attempting to take a station off hook at the same time that an incoming call occurs. Finally, the papicsd layer can trap ridiculous or meaningless commands, such as hanging up a port that is already known to be on hook and returning generated error messages back to the application.
By having a separate process for papild and papicsd, they become easier to write and debug. Also, since the DBS has no flow control, papild can be dedicated to I/O specific operations and, perhaps, even run within real-time scheduling constraints in Linux (2.0 and above). This, in conjunction with the non-blocking atomic pipe as a means to drive packets up and down the protocol layers, requires less overhead and works much more smoothly than some of Panasonic's original QNX application products for the DBS.
On the programming interface (labeled SMDR on the DBS), a single process is used. When no programming requests are active, it simply spools CDR data for call accounting purposes. When a programming request occurs, the server changes state to a programming port interface server and remains in that state until 30 seconds pass without any program requests being made.
In addition to the TServer itself, there is one more protocol layer on the papi stack. This utility layer maps DBS API channel states to station card and trunk states based on the live DBS API data. This mapping layer, papiltu, is essentially a SPI-like layer, in that it represents the DBS's current state in an intermediary form in terms of line and trunk activity that may make more sense than the DBS API channel architecture. This layer also holds the DBS station directory map, which is learned during initialization from the programming interface. The information it maps and maintains is shared with other DBS related protocol servers (such as the SMDI server) through a shared memory block.
By using the programming port and papiltu, the papi protocol stack is able to automatically configure itself by reading current DBS programming without user application programming and setup. This replaces the clumsy manual approach of a ``channel configuration'' screen to separately program the API port configuration, as used in earlier Panasonic application products. The papiltu functionality was originally an integral part of the primary TServer application server, but with the restrictions on publication of DBS API material, it was made into a separate layer to facilitate publication of the primary TServer source code, if none of the papi stack sources can be published. This matter is still under discussion. (See comments at the very end of article.)
The TServer line/trunk state maps actually make the TServer behave a little like the SPI layer of TAPI or TSAPI, as it represents an intermediate form between a generic PBX model and the actual physical model. In this case, in addition to driving direct hardware interface protocols, our TServer can communicate with logical protocol drivers which are not directly connected to any physical hardware.
One such logical protocol driver is the DBS SMDI driver. Connected through anonymous pipes, like the papicsd and smdr layer drivers, the smdi protocol driver talks to the shared papicsd by going back through its connection to the TServer. Since it does not depend on a single shared hardware resource, as does a real (physical) driver, an instance of the SMDI driver is created (forked) for each client application. This driver inherits the TServer socket accepted for the client connection and the pipe back to TServer; hence, communications between the SMDI emulation driver and the client application is always direct (i.e., it has no TServer routing overhead).
Based on the SMDI example, one can easily create protocol modules and matching application interfaces to fully emulate TSAPI or TAPI services directly. One can also talk to a vendor-specific hardware protocol directly (such as the papi layers or SMDR interface) or create entirely new arbitrary integration protocols. The TServer, as a single point of contact traffic cop, allows the client to perform authentication and to select the interface protocol that the client application wishes to use.
An interesting feature of the DBS is the ability to turn a large display telephone into a simple kind of terminal device. This allows the display content to be controlled by the application, and the application to receive input events when keys are pressed on the telephone. Another feature is a special ``hot key'' (also known as the ACD key), that functions as an ``attention'' key that generates an API event when pressed, regardless of the current telephone state.
One of the first DBS applications I created is a simple menu program for attaching ``applets'' to a telephone. When the attention key is pressed, a simple menu of applications appears from which one can be selected. One such application is used to immediately show status information for my server (how many users on-line, uptime, etc.) along with a soft key menu item to force a server reboot.
Another digital telephone application of mine is a more advanced speed dialer that has no capacity limit and is programmable from the telephone. This application resembles the Fujitsu ``Dial-by-Name'' server application in concept. The DBS has its own internal speed dialing directory. Since alphanumeric text is hard to enter through the phone, I wrote a simple Visual Basic program to connect to the SMDR programming protocol in order to program DBS speed dialing.
A possible future application that comes to mind is empowering users to program their own phones from the desktop or perhaps from web pages.
Many opportunities exist for the use of free and open systems in computer telephony, especially for those telephone vendors wise enough to expand their marketing opportunities by allowing third parties to freely address issues and applications beyond their own immediate scope. While I choose to use the Panasonic DBS and, with it, have accepted restrictions on disclosure and source publication, several other vendors have expressed interest in having their equipment featured in a follow-up article.
When I started this article, I became aware of the effort to create a standard and open Internet protocol for telephony integration, known as ``stp'', Simple Telephony Protocol. After some debate, I have chosen to fully embrace stp, and the current software described in this article is being rewritten to support and comply with the evolving stp standard. The name ``SwitchLink'' has also been adopted for it (swilink for short). My intention is that swilink will become widely available as a free and open software package for release with all mainstream Linux distributions. Thus, free and open telephony will become the norm rather than the exception for Linux.
Recently, there has been considerable change in the attitudes of several key hardware vendors in the telephony business with regard to Linux. I now believe opportunities to write on the use of Linux as a general purpose and high performance Internet telephony platform may be possible much earlier than I anticipated.
Best known for WorldVU, a public BBS system for Linux, David Sugar is currently employed as Director of Software Engineering for Fortran Corp. and uses Linux for commercial telephony development. He maintains his own Internet server under Linux, and may be reached for comment via http://www.tycho.com/. He can be reached via e-mail at dyfet@tycho.com.