Let us say we are given a text file containing Academy Award (Oscar) winners by year and category, formatted as follows:
1995:Actor:Nicholas Cage 1995:Picture:Braveheart 1995:Supporting Actor:Kevin Spacey 1994:Actor:Tom Hanks 1994:Picture:Forrest Gump 1928:Picture:WINGS
We would like to provide the following services:[2]
[2] To see real historical databases for the Oscars, look at http://oscars.guide.com/. (Illustra, an object-oriented database from Informix, is used for the grunge work.)
Since we would like to retrieve entries by category or by year, we use a double indexing scheme, as shown in Figure 2.2.
Each entry includes a category, a year, and the name of the corresponding winner. We choose to keep this information in an anonymous array (an anonymous hash would do just as well). The two indices %year_index and %category_index map the year and category to anonymous arrays containing references to the entries. Here is one way to build this structure:
open (F, "oscar.txt") || die "Could not open database: $:"; %category_index = (); %year_index = (); while ($line = <F>) { chomp $line; ($year, $category, $name) = split (/:/, $line); create_entry($year, $category, $name) if $name; } sub create_entry { # create_entry (year, category, name) my($year, $category, $name) = @_; # Create an anonymous array for each entry $rlEntry = [$year, $category, $name]; # Add this to the two indices push (@{$year_index {$year}}, $rlEntry); # By Year push (@{$category_index{$category}}, $rlEntry); # By Category }
Notice that each push statement does a fair bit of work. It creates an entry in the index (if required), hangs an anonymous array off that entry (if required), and pushes the reference to the entry into that array.
Another important thing to notice is how braces have been used to specify the correct precedence in the expression @{$year_index{$year}}
. If we had omitted the braces, the expression @$year_index would be evaluated first and then indexed as a hash, according to the rules explained in the section "Confusion About Precedence" in Chapter 1.
This is a simple matter of traversing the %year_index hash:
sub print_entries_for_year { my($year) = @_; print ("Year : $year \n"); foreach $rlEntry (@{$year_index{$year}}) { print ("\t",$rlEntry->[1], " : ",$rlEntry->[2], "\n"); } }
We already know how to print entries for a given year. Find out all years for which we have data, sort them, and call print_entries_for_year for each year:
sub print_all_entries_for_year { foreach $year (sort keys %year_index) { print_entries_for_year($year); } }
We can traverse either index, and we choose to traverse the %year_index index, since there are substantially fewer categories per year than the number of years for which a category is valid:
sub print_entry { my($year, $category) = @_; foreach $rlEntry (@{$year_index{$year}}) { if ($rlEntry->[1] eq $category) { print "$category ($year), ", $rlEntry->[2], "\n"; return; } } print "No entry for $category ($year) \n"; }