return to first page linux journal archive

Home Entertainment Linux MP3 Player

Here's a way to store your CDs and tapes forever, while still enjoying the music.

by Goran Devic

Imagine lying back on your sofa in the living room with your remote control unit. You press the channel selection button, and a synthesized voice speaks out, ``Alternative''. You keep pressing the channel-up button and the voice speaks out different music categories: ``Children's'', ``Classical'' and others. You choose the ``Classical'' category and press ENTER on your remote control, and the voice starts listing the albums in the same manner. You select ``John Williams, Spanish Guitar Music'', and the near CD-quality guitar music starts playing.

No, the music is not playing off some CD changer, and the synthesized voice is actually interactive, responding to your remote controller actions. The music stored is your complete CD collection, plus those old tapes (somehow still around), digitized and compressed as MP3 files. The whole show is controlled and played by a Linux server hidden in your closet, connected by an audio cable to your home entertainment amplifier.

As soon as I realized the scenario described above is already within our reach, I decided to code the missing pieces and somehow glue it all together.

MouseREMOTE from X-10

The remote control unit is the central part of my design. The controller I used, called ``MouseREMOTE'', is part of the ``BigPicture'' package that lets you transmit audio/video signals from your computer remotely to your TV. Being a ``control freak'', I purchased the whole package some time ago and set up all the house lights to be remotely controlled. The remote mouse is especially useful, as it performs like any other controller for audio/video components (it can also control X-10-based home automation devices). In addition to every other button commonly found on a universal remote, it has a rubber mouse pad on the face and two buttons on the back of the unit. When its buttons are pressed, the controller sends RF signals to the receiver unit, which has a regular computer mouse pass-through, so you can still use your regular serial or PS/2 mouse. The remote-mouse packets are inserted into the stream of packets sent by your regular computer mouse. Unfortunately, the MouseRemote comes only with MS Windows software for assigning actions to its keys. I installed a high-quality audio cable from the Linux sound card line-out connector in the server room down to the living room amplifier unit's RCA input.

That was all I had to do on the hardware side. For software support, I had to modify the mouse-server program to accept the codes sent by the MouseRemote unit and pipe them to an MP3 player program, which will perform different functions depending on the remote-mouse selections.

Table 1

I determined the MouseRemote specification by reading codes using the modified gpm program. MouseRemote is detected as a bare-mouse type, and it returns packets of three bytes per event. Table 1 shows the codes returned for the events. As you can see, almost all of the keys return some code, giving this remote unit tremendous potential versatility.

There are a few peculiarities: mouse button codes (left/right buttons located at the bottom of the controller unit) are transmitted three times in a row. Only one button can be pressed at a time; the other one is ignored while the first one is being pressed. You can press and hold one mouse button and press any key. If you hold down a mouse button, its code will not start repeating; only mouse pad (mouse move) codes repeat with no apparent delay. All other keys repeat with a short delay of approximately 1/20 of a second.

The mouse movement pad detects three levels of pressure that can be used by the mouse driver software to accelerate movement. These levels set bits 0, 1 and 2 of the second and third byte in the mouse packet. Those bytes are used to determine the difference in movement between two consecutive packets, so the firmer the pressure, the faster the movement appears.
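How the second and third bytes turn into movement can be sketched in a few lines. This is a minimal illustration, not the gpm code: it assumes that gpm's bare-mouse type uses the standard Microsoft serial mouse layout, in which the first byte carries the button bits and the two high bits of each delta, and the second and third bytes carry the low six bits of the X and Y deltas. The function name decode_packet is hypothetical.

```python
def decode_packet(data):
    """Decode one 3-byte packet, assuming the Microsoft serial mouse layout.

    Returns (left_button, right_button, dx, dy).
    """
    b0, b1, b2 = data
    left = bool(b0 & 0x20)
    right = bool(b0 & 0x10)
    # The two high bits of each delta travel in the first byte; the low
    # six bits travel in the second (X) and third (Y) bytes.
    dx = ((b0 & 0x03) << 6) | (b1 & 0x3F)
    dy = ((b0 & 0x0C) << 4) | (b2 & 0x3F)
    # Deltas are 8-bit two's complement values.
    if dx >= 128:
        dx -= 256
    if dy >= 128:
        dy -= 256
    return left, right, dx, dy
```

Under this layout, the more of the low bits a packet sets in its second or third byte, the larger the decoded delta, which is why firmer pressure on the pad shows up as faster pointer movement.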

Modifying gpm

I decided to modify the mouse server gpm (version 1.13). The key function added is x10codes in the file gpm.c (see Listing 1). The function argument data is an array of three bytes making up the current mouse packet. The detection of the MouseRemote device is not deterministic. I rely on the property that all its buttons return codes in a specific range; that is, a range that normal mouse movement could produce, but is unlikely to. In particular, the first code is always in the range of 44 to 47, with the third code always being 0x3F. A regular mouse would have to be moved quickly at just the right speed in order to get the same code out. In the unlikely case that this happens, the second level of safety is the existence of a named pipe, /dev/x10. If the pipe does not exist, the packet will be passed on for normal processing. I used that named pipe to connect to a reader of remote-mouse codes.

Listing 1
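The two-level check described above can be illustrated independently of the author's listing. The sketch below is not the x10codes function from gpm.c; it is a minimal Python model of the same decision. It assumes the 44 to 47 range is hexadecimal, like the 0x3F given for the third byte, and the names looks_like_remote, pipe_exists and route_packet are hypothetical.

```python
import os
import stat

X10_PIPE = "/dev/x10"  # the named pipe mentioned in the article


def looks_like_remote(data):
    """First-level check: the byte pattern that remote keys produce.

    Assumption: the first-byte range is hexadecimal (0x44..0x47).
    """
    return len(data) == 3 and 0x44 <= data[0] <= 0x47 and data[2] == 0x3F


def pipe_exists(path=X10_PIPE):
    """Second-level check: the path must exist and be a named pipe."""
    try:
        return stat.S_ISFIFO(os.stat(path).st_mode)
    except OSError:
        return False


def route_packet(data, pipe_path=X10_PIPE):
    """Return "remote" if the packet should go to the pipe, else "mouse"."""
    if looks_like_remote(data) and pipe_exists(pipe_path):
        return "remote"
    return "mouse"
```

If the range is indeed hexadecimal and the packet follows the Microsoft serial layout, the signature corresponds to a vertical delta of 127 in a single packet, which would fit the remark that a regular mouse must move at just the right speed to imitate it.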

Transforming mp3blaster into Remote Player

Of all MP3 player programs available for Linux, only mp3blaster has a user interface nice enough for easy control of directories and files to be played. The program supports multiple groups and can interactively select among directories. I decided to use both features. All albums are stored in separate groups, and at any time, I can toggle to directory browsing and select albums hierarchically. The prerecorded voice files are good guides for where you are and what you are doing (see Speech Synthesis).

Given that all your MP3 files reside in some hierarchical directory structure, say at /home/mp3, you need to set the environment variable MP3_ROOT to it. That way, the player will know where the files are, and during the directory browse, it will not allow you to accidentally change directories to one above it. (Remember, we are physically too far from the keyboard and display to fix any mishap.)
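The guard against leaving the music tree amounts to a path comparison. The following is a minimal sketch of that idea in Python rather than the code added to mp3blaster; the function name change_dir and the fallback to /home/mp3 when MP3_ROOT is unset are assumptions made for illustration.

```python
import os


def change_dir(current, target, root=None):
    """Return the new browse directory, or `current` if the move is refused.

    A move is refused when it would end up outside the root directory.
    """
    root = os.path.realpath(root or os.environ.get("MP3_ROOT", "/home/mp3"))
    candidate = os.path.realpath(os.path.join(current, target))
    # commonpath() equals root only when candidate is root or lies below it.
    if os.path.commonpath([root, candidate]) != root:
        return current
    return candidate
```

Resolving both paths with realpath before comparing them means that a ``..'' component or a symbolic link cannot be used to step above the root.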

The mp3blaster is invoked with the option -x which I added to activate all the remote features.

In order to get full use of the groups features of mp3blaster, you need to manually set the current working directory to the MP3_ROOT directory (where your music hierarchy starts), start mp3blaster and press the F1, F5 keys. The F5 function key will add all directories as groups, thus effectively listing all your albums. Then, you can save the list by pressing the F6 key. So now, you would start mp3blaster with the following syntax:

/usr/bin/mp3blaster -l 
You can run the program from the init script if you wish, or from an idle console; it doesn't matter, since it will connect to the remote control unit and perform its function in the background. Now, using your remote controller, you are able to browse the directories and play songs. As you will see, the speech synthesis is also coded in to give you feedback.

mp3blaster has two modes of operation: group and file. The group mode of operation accepts the following remote controller keys:

The file selection mode is more complicated, as we are allowed to traverse directory structure and play arbitrary albums. The remote keys accepted in this mode are:

During play, the following keys are available:

Technically speaking, I added two threads to the mp3blaster program. One thread is always busy waiting for the remote codes from the /dev/x10 pipe. As soon as it gets them, it sets some variables used by the player class. The other thread looks for the sound files (I call them voice files) that need to be ``spoken''. In essence, via some mutexes and signals, the player requests presynthesized sound waves to be sent to /dev/audio, and this thread makes sure they actually end up there. (All voice files are pre-recorded and stored in a known location.) Voice files are spoken representations of directory names; the directories are named after the music categories and artists and contain their songs.
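The division of labor between the two threads can be modeled in a few lines. This is a hedged sketch, not the code in src/main.cc: it is written in Python, it assumes that codes arrive on the pipe as three-byte packets, and it lets queue.Queue stand in for the mutexes and signals mentioned above. The names RemoteState, remote_reader and voice_speaker are hypothetical.

```python
import queue
import threading


class RemoteState:
    """Shared variables that the player loop would poll."""

    def __init__(self):
        self.lock = threading.Lock()
        self.last_code = None


def remote_reader(pipe, state, packet_size=3):
    """Block on the pipe and publish each complete packet to the player."""
    while True:
        packet = pipe.read(packet_size)
        if len(packet) < packet_size:  # the writer closed the pipe
            break
        with state.lock:
            state.last_code = bytes(packet)


def voice_speaker(requests, audio_out):
    """Copy each requested voice file to the audio device, in order."""
    while True:
        path = requests.get()  # blocks until the player asks for a file
        if path is None:       # None is the shutdown request
            break
        with open(path, "rb") as voice:
            audio_out.write(voice.read())
        audio_out.flush()
```

In a real setup, pipe would be /dev/x10 opened for binary reading and audio_out would be /dev/audio opened for binary writing, with each function started through threading.Thread. The player would then put a voice file path on the queue whenever the selection changes.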

Looking at the code, the process of inserting actions into the input loop of mp3blaster can be viewed as somewhat hacky, but most codes are just inserted as keys that would be pressed for an equivalent action from the keyboard anyway. The number of changes is too large to be printed here, so please see the file src/main.cc for details.

Speech Synthesis

When you want to browse your MP3 music albums with no computer monitor to guide you, the natural substitute for vision is the voice. I decided to use festival, an excellent speech synthesis package. It is not only a current research development project that is growing and improving daily, but one you can actually use as soon as you install it.

festival can generate speech on the fly, as you type any text interactively, or you can pipe in a text file and it will synthesize it. Neither of these real-time approaches seemed fast enough for interactive menu selection. I needed immediate voice response, and generation on the fly introduced a delay proportional to the length of the album names, noticeable and annoyingly long for normal use. The solution was to create a subdirectory containing all voice files to be used during the browsing. This way, the MP3 player program does not have to call festival to generate each album name as we browse it, but can use wave files cached in that specific directory. A drawback to this approach is the disk space taken up by the voice files, but that space is negligible in comparison to the actual MP3 files, which amount to 50 to 60MB per album.
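A rough calculation shows how small that overhead is. The sketch below assumes the voice files use the classic /dev/audio format (8kHz, 8-bit, mono u-law, or 8000 bytes per second) and that a spoken album name lasts about three seconds; both figures are illustrative assumptions, not measurements from this system.

```python
ULAW_BYTES_PER_SECOND = 8000  # assumed: 8 kHz, 8-bit, mono u-law


def voice_file_bytes(seconds):
    """Approximate size of a u-law voice file of the given duration."""
    return int(seconds * ULAW_BYTES_PER_SECOND)


def overhead_fraction(seconds, album_megabytes):
    """Voice file size as a fraction of the album's MP3 data."""
    return voice_file_bytes(seconds) / (album_megabytes * 1024 * 1024)
```

With these assumptions, a three-second name takes 24,000 bytes, which is well under a tenth of a percent of a 55MB album.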

Once you generate voice files using the festival program, you can test each of them by simply piping it to /dev/audio. Also, you may want to change the diphone set for some albums (I found that the Spanish diphone gives much better pronunciation for the groups of International albums). Alternatively, you could manually record all your voice files, thus eliminating the need for a speech synthesis program.

The Perl script in Listing 2 traverses all the subdirectories under the MP3 files root directory and creates all the necessary voice files used by the mp3blaster player.

Listing 2

In order to generate necessary voice files, you would run this script every time you add an album or change the directory structure. You can run the script with the option -clean to ensure all old files are deleted before creating a new set.

All voice files are stored in your root mp3 directory under the subdirectory .vocals. They are vocalized interpretations of all subsequent subdirectories, and thus all the album names as well (they are just subdirectories at some terminal node, and they contain only MP3 files).

The Perl script first creates text files (original subdirectory name with the extension .txt). They contain a slightly modified name stripped of all non-alpha characters. This is done to help the speech synthesis program generate more precise sounds. Lastly, the u-law audio files are created based on the content of those files. If you are not satisfied with how it sounds, you can change the phonetics inside the text files, delete the voice file and rerun the script in order to get the optimal pronunciation.
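The first step of that process, turning directory names into text for the synthesizer, can be sketched as follows. This is not the Perl script from Listing 2 but a minimal Python model of the same idea. It assumes that each run of non-alphabetic characters is replaced by a single space and that all text files sit directly in the .vocals subdirectory; the function names spoken_text and plan_voice_texts are hypothetical.

```python
import os
import re


def spoken_text(dirname):
    """Replace every run of non-alphabetic characters with one space."""
    return re.sub(r"[^A-Za-z]+", " ", dirname).strip()


def plan_voice_texts(root):
    """Yield (text_file_path, text) for every subdirectory below root."""
    vocals = os.path.join(root, ".vocals")
    for _dirpath, dirnames, _files in os.walk(root):
        # Skip the voice-file directory itself and keep the order stable.
        dirnames[:] = sorted(d for d in dirnames if d != ".vocals")
        for name in dirnames:
            yield os.path.join(vocals, name + ".txt"), spoken_text(name)
```

For example, a directory called John_Williams-Spanish_Guitar_Music would be paired with the text ``John Williams Spanish Guitar Music''. One limitation of this flat layout is that two directories with the same name in different categories would share one text file.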

Conclusion

The technology to do almost anything with the music is already available. In my opinion, without Linux and open-source software, the task of building such a remote-directed MP3 player would be much more difficult. Since my Linux server is up 24/7, it makes sense to use it any time I want to listen to music as well. Although the system I dedicated for it is a rather modest one (Cyrix 6x86 running at 166MHz), the MP3 player is using around 40% of the CPU time when playing, and there are no audible interruptions even when concurrently serving web pages. In addition to compressing all my CD collections and storing them on my Linux server as MP3 files, I also digitized my old tapes and, after some processing of the sound in order to improve it, stored them in the same music hierarchy tree. Finally, I could remove all the CDs and tapes from my living room and store them away for good. Now, every musical piece is quickly accessible by a touch of a remote-controller button.

Resources

Goran Devic (goran@3dfx.com) has a BS in computer science from the University of Texas in Austin. He has worked on developing Cirrus Logic's Laguna 3D graphics accelerator (5464/65) and has three patents issued and ten pending in the area of graphics accelerators. He is currently working on new generation graphics accelerators at 3dfx Interactive, Inc. He spends his free time with his year-old son, Siddartha. His hobbies include playing classical guitar, photography and Eastern spirituality.