Daily Archives: August 24, 2007

Wherein Avaragado donates code to a grateful world

For a recent project I wrote some simple client-side form validation. Most
of it was trivial; but I also wanted to check for a syntactically
valid email address (I didn’t care whether it reached someone’s
inbox or not).

I performed an exhaustive Google search (no I didn’t) and found
one or two bits of code, but most of them annoyed me. I didn’t
want something that encoded a list of allowed top-level domains,
ferchrissakes, or some script kiddie idea of an email address that
disallowed subdomains. I wanted a regular expression that
referenced the appropriate RFC, that appeared to know what it was
talking about.

Of course I can find one now, now that I’ve written the
regular expression myself, but then I was dissatisfied. I
invoked the holy rite of Not Invented Here with a
soupçon of Never Knowingly Underengineered, and
settled down with a cup of tea, a copy of RFC 2822 and a
pile of shortbread.

The object of my desire was a regular expression that matched
the RFC 2822 ‘mailbox’ token, minus a few things. RFC 2822, not
the most gripping of reads, concerns itself with the format of
email messages for transmission over the net and so has to worry
itself ragged about line lengths and CRLF, and dealing with what
might charitably be called prior misunderstandings (those who
deployed software that didn’t conform to RFC 822, the previous
version of this specification). I decided to wave a Dilbertian
hand at all that nonsense. I mean, does anyone actually put
comments inside their email address? (This is a valid mailbox,
according to RFC 2822:
Pete(A wonderful \) chap) <pete(his account)@silly.test(his host)>
– the bits in parentheses are comments, which you can nest to an
arbitrary and pointless level.)
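
Nesting is what puts comments out of reach of a plain regular expression, since matching balanced parentheses to any depth needs a counter. For the curious, a minimal sketch of stripping comments before validating might look like the following. The name stripComments is made up for illustration, and the treatment of quoted strings and backslash escapes is a simplified reading of the RFC, not a full parser.

```javascript
// Sketch only: remove RFC 2822 style (possibly nested) comments from a string.
// stripComments is a hypothetical helper, not part of any library.
function stripComments(s) {
  let out = '';
  let depth = 0;        // current comment nesting level
  let inQuotes = false; // true while inside a "quoted string"
  for (let i = 0; i < s.length; i++) {
    const c = s[i];
    if (c === '\\' && i + 1 < s.length && (depth > 0 || inQuotes)) {
      // Backslash escape: keep it inside quotes, drop it inside a comment.
      if (depth === 0) out += c + s[i + 1];
      i++; // skip the escaped character either way
    } else if (depth === 0 && c === '"') {
      inQuotes = !inQuotes;
      out += c;
    } else if (!inQuotes && c === '(') {
      depth++;
    } else if (!inQuotes && c === ')' && depth > 0) {
      depth--;
    } else if (depth === 0) {
      out += c;
    }
  }
  return out;
}

console.log(stripComments(
  'Pete(A wonderful \\) chap) <pete(his account)@silly.test(his host)>'
)); // Pete <pete@silly.test>
```

Parentheses inside a quoted string are left alone, so a display name such as "a(b)" survives intact.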

Here’s what I ended up with (with added line breaks for the web – remove before use). I’m reasonably confident it’s
correct; I used test-driven development techniques to derive it.
It’s licensed as cc-attrib mainly to reduce annoyance – GPL is
overkill for a regular expression, I feel. This licence allows
people to port it to their language of choice as long as they
credit me, and to incorporate it into their own code without any
additional licence burden. The copious comments are there to annoy
Roger.

function bValidMailbox(s) {
// This function (but not any surrounding code) is copyright
// (c) 2007 David Smith (dave a t sheepshank d o t org).
// This work is licensed under a Creative Commons Attribution
// 2.0 UK: England & Wales License.
// http://creativecommons.org/licenses/by/2.0/uk/

// The regular expression below is based on RFC 2822. It matches
// the 'mailbox' token defined in that RFC, with the following
// changes: no obsolete parts; no comments; no domain literals;
// no spaces within or around the domain; no unquoted spaces in
// the local-part; at least one dot in the domain; no CRLF
// allowed.

// It is believed to be accurate but YMMV. Use at your own risk.

// Examples that PASS this test (one example per line):
// jdoe@example.org
// <boss@nil.test>
// John Doe <jdoe@machine.example>
// Who? <one@y.test>
// "Joe Q. Public" <john.q.public@example.com>
// Joe "Q." Public <john.q.public@example.com>
// "Giant; \"Big\" Box" <sysservices@example.net>
// Giant 'Big' Box <sysservices@example.net>
// "john q. doe"@machine.example
// John "Q." Doe <"john q. doe"@machine.example>

// Examples that FAIL this test (reason after the dash):
// me  - no domain
// me@you  - domains must have a dot
// me@you.  - the dot can't be at the end
// me@.you  - or the beginning
// me me@example.com  - address spec not within <>
// my.name <me@example.com>  - no unquoted dots allowed there
// me@example . com  - no spaces allowed there
// me@ example.com  - or there
// me @example.com  - or there
// me < me@example.com >  - or there
// me@[1.2.3.4]  - domain literals not supported
return /^(([\x20\x09]*[\x21\x23-\x27\x2a\x2b\x2d\x2f\x30-\x39\x3d\x3f
\x41-\x5a\x5e-\x7e]+[\x20\x09]*|[\x20\x09]*\x22([^\x00\x0a\x0d\x22\x5c\
x80-\xff]|\x5c[\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0b\x0c\x0e-\x7f])*
[\x20\x09]*\x22[\x20\x09]*)*[\x20\x09]*\x3c([\x21\x23-\x27\x2a\x2b\x2d\
x2f\x30-\x39\x3d\x3f\x41-\x5a\x5e-\x7e]+(\x2e[\x21\x23-\x27\x2a\x2b\x2d
\x2f\x30-\x39\x3d\x3f\x41-\x5a\x5e-\x7e]+)*|[\x20\x09]*\x22([^\x00\x0a\
x0d\x22\x5c\x80-\xff]|\x5c[\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0b\x0c
\x0e-\x7f])*[\x20\x09]*\x22[\x20\x09]*)\x40[\x21\x23-\x27\x2a\x2b\x2d\x
2f\x30-\x39\x3d\x3f\x41-\x5a\x5e-\x7e]+(\x2e[\x21\x23-\x27\x2a\x2b\x2d\
x2f\x30-\x39\x3d\x3f\x41-\x5a\x5e-\x7e]+)+\x3e[\x20\x09]*|([\x21\x23-\x
27\x2a\x2b\x2d\x2f\x30-\x39\x3d\x3f\x41-\x5a\x5e-\x7e]+(\x2e[\x21\x23-\
x27\x2a\x2b\x2d\x2f\x30-\x39\x3d\x3f\x41-\x5a\x5e-\x7e]+)*|[\x20\x09]*\
x22([^\x00\x0a\x0d\x22\x5c\x80-\xff]|\x5c[\x01\x02\x03\x04\x05\x06\x07\
x08\x09\x0b\x0c\x0e-\x7f])*[\x20\x09]*\x22[\x20\x09]*)\x40[\x21\x23-\x2
7\x2a\x2b\x2d\x2f\x30-\x39\x3d\x3f\x41-\x5a\x5e-\x7e]+(\x2e[\x21\x23-\x
27\x2a\x2b\x2d\x2f\x30-\x39\x3d\x3f\x41-\x5a\x5e-\x7e]+)+)$/.test(s);
}
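
For anyone who would rather not paste a thousand characters of hex escapes on trust, here is a sketch of the same pattern assembled from named pieces. The constant names (WSP, ATEXT, QUOTED and so on) and the function name isMailbox are illustrative inventions, loosely following the RFC 2822 token names. The quoted-pair class is written with a range (\x01-\x09) in place of the nine individual escapes, which should describe the same set. Treat it as a reading aid under those assumptions rather than a replacement for the function above.

```javascript
// Sketch: the mailbox pattern built from named parts. All names are illustrative.
const r = String.raw; // keeps backslashes literal so the escapes reach RegExp intact

// Optional run of spaces and tabs.
const WSP = r`[\x20\x09]*`;
// One or more 'atext' characters: letters, digits and !#$%&'*+-/=?^_`{|}~
const ATEXT = r`[\x21\x23-\x27\x2a\x2b\x2d\x2f\x30-\x39\x3d\x3f\x41-\x5a\x5e-\x7e]+`;
// Dot-separated atoms for the local-part: a, a.b, a.b.c ...
const DOT_ATOM = r`${ATEXT}(\x2e${ATEXT})*`;
// Quoted string: ordinary characters or backslash-escaped pairs in double quotes.
const QTEXT = r`[^\x00\x0a\x0d\x22\x5c\x80-\xff]`;
const QPAIR = r`\x5c[\x01-\x09\x0b\x0c\x0e-\x7f]`;
const QUOTED = r`${WSP}\x22(${QTEXT}|${QPAIR})*${WSP}\x22${WSP}`;
// Domain: atoms separated by dots, with at least one dot.
const DOMAIN = r`${ATEXT}(\x2e${ATEXT})+`;
// local-part @ domain
const ADDR_SPEC = r`(${DOT_ATOM}|${QUOTED})\x40${DOMAIN}`;
// Display name: any mix of atoms and quoted strings.
const PHRASE = r`(${WSP}${ATEXT}${WSP}|${QUOTED})*`;
// Either "phrase <addr-spec>" or a bare addr-spec.
const MAILBOX = new RegExp(
  r`^(${PHRASE}${WSP}\x3c${ADDR_SPEC}\x3e${WSP}|${ADDR_SPEC})$`
);

function isMailbox(s) {
  return MAILBOX.test(s);
}

console.log(isMailbox('John Doe <jdoe@machine.example>')); // true
console.log(isMailbox('me@you'));                          // false
```

Building the pattern this way also makes it easier to loosen one rule at a time, for example swapping DOMAIN for DOT_ATOM if dotless domains should be accepted.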
