You want to see whether a value consists only of alphabetic characters.
The obvious character class for matching regular letters isn't good enough in the general case:
    if ($var =~ /^[A-Za-z]+$/) {
        # it is purely alphabetic
    }
That's because it doesn't respect the user's locale settings. If you need to match letters with diacritics as well, add use locale and match against a negated character class:

    use locale;
    if ($var =~ /^[^\W\d_]+$/) {
        print "var is purely alphabetic\n";
    }
Perl can't directly express "something alphabetic" independent of locale, so we have to be more clever. The \w regular expression notation matches one alphabetic, numeric, or underscore character (an "alphanumunder"), so \W matches any character that is not one of those. The negated character class [^\W\d_] therefore specifies a byte that must not be a non-alphanumunder, a digit, or an underscore. That leaves us with nothing but alphabetics, which is what we were looking for.
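The subtraction described above can be checked on a few plain ASCII values, where the result does not depend on the locale. This is only a minimal sketch: the helper names all_word and all_alpha and the sample strings are made up for illustration and are not part of the recipe.

```perl
use strict;
use warnings;

# Hypothetical helpers: each one tests a single character class.
sub all_word  { return $_[0] =~ /^\w+$/       ? 1 : 0 }  # letters, digits, underscore
sub all_alpha { return $_[0] =~ /^[^\W\d_]+$/ ? 1 : 0 }  # \w minus digits and underscore

# ASCII-only samples, so no locale needs to be in effect.
for my $var ('silly', 'abc123', 'snake_case', 'two words') {
    printf "%-12s all_word=%d all_alpha=%d\n",
        $var, all_word($var), all_alpha($var);
}
```

Only silly passes both tests. abc123 and snake_case satisfy \w but fail the alphabetic test, and two words fails both because of the space.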
Here's how you'd use this in a program:
    use locale;
    use POSIX 'locale_h';

    # the following locale string might be different on your system
    unless (setlocale(LC_ALL, "fr_CA.ISO8859-1")) {
        die "couldn't set locale to French Canadian\n";
    }

    while (<DATA>) {
        chomp;
        if (/^[^\W\d_]+$/) {
            print "$_: alphabetic\n";
        } else {
            print "$_: line noise\n";
        }
    }

    __END__
    silly
    façade
    coöperate
    niño
    Renée
    Molière
    hæmoglobin
    naïve
    tschüß
    random!stuff#here
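If you would rather not hardcode a locale name, one possible variation is to adopt whatever locale the environment specifies and wrap the test in a subroutine. The sketch below assumes that approach; is_alphabetic is a hypothetical helper name, and which accented letters count as alphabetic depends entirely on the locale your system ends up selecting.

```perl
use strict;
use warnings;
use locale;
use POSIX qw(locale_h);

# Hypothetical helper: true if the value is non-empty and purely alphabetic
# according to the locale currently in effect.
sub is_alphabetic {
    my ($value) = @_;
    return 0 unless defined $value;
    return $value =~ /^[^\W\d_]+$/ ? 1 : 0;
}

# An empty locale string asks setlocale to use the environment's settings
# (LC_ALL, LC_CTYPE, LANG). It returns undef on failure; this sketch only
# warns and carries on with whatever locale was already in effect.
my $locale = setlocale(LC_CTYPE, "");
warn "couldn't set locale from environment\n" unless defined $locale;

# ASCII samples behave the same way in any locale.
for my $word (qw(silly abc123 snake_case)) {
    printf "%s: %s\n", $word, is_alphabetic($word) ? "alphabetic" : "line noise";
}
```

Warning instead of dying on a failed setlocale is a design choice for this sketch. A program that must handle accented input correctly may prefer to die, as the recipe's example does.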
See also: the treatment of locales in Perl in perllocale(1); your system's locale(3) manpage; the more detailed discussion of locales in Recipe 6.12; the "Perl and the POSIX Locale" section of Chapter 7 of Mastering Regular Expressions.
Copyright © 2001 O'Reilly & Associates. All rights reserved.