Regex validation, was Re: Programmers with network engineering skills

You're using your OS wrong if you are quoting/escaping the arguments.
You do not need a shell to run another program: fork() + exec() + wait()
never involves one (assuming Unix; I also suspect libc has a
nice packaged function for this that is not insecure like system(),
but it's not all that hard to roll your own). In Perl, use the
multi-argument form of system(), not the single-argument version.
In both cases you should clear the environment as well prior to the
exec()/system() unless you know nobody can play with LD_PRELOAD, IFS,
etc.
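
To make the point concrete, here is a minimal sketch in Python rather than
C or Perl (my choice for illustration). The helper name run_without_shell
and the PATH value are assumptions, not anything standard. The argument
vector goes straight to exec(), so nothing needs quoting, and the child
gets a fresh environment instead of inheriting LD_PRELOAD, IFS, etc.

```python
import subprocess
import sys


def run_without_shell(argv, extra_env=None):
    """Run argv directly, with no shell and a minimal environment.

    run_without_shell is a hypothetical helper name. The PATH below is an
    assumed safe minimum; adjust it for the system at hand.
    """
    env = {"PATH": "/usr/bin:/bin"}  # nothing inherited from the caller
    if extra_env:
        env.update(extra_env)
    # With a list and shell=False, each element becomes exactly one argv
    # entry of the child; no shell ever parses the string.
    return subprocess.run(
        argv,
        env=env,
        shell=False,
        capture_output=True,
        text=True,
        check=True,
    )


# Input that would be dangerous if a shell ever saw it.
hostile = "x; echo INJECTED $(id) `id` 'quote"

# The child just echoes its first argument back.
result = run_without_shell(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile]
)
print(result.stdout, end="")
```

The child receives the hostile string as one literal argument, metacharacters
and all, because no shell was there to interpret it.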

This is one of my pet peeves about programming - programmers calling
out to insecure functions when secure alternatives are available.

The same goes for SQL statements - if you need to quote things to
prevent SQL injection, you're using your SQL database wrong. Look up
prepared statements. Generally, it's very bad practice to dynamically
build SQL strings. It's also very common practice, which is why so many
applications have SQL injection vulnerabilities. It's the Perl/PHP
equivalent of the buffer overflow that simply wouldn't exist if
developers, instead of trying to figure out how to quote everything,
simply used prepared statements and placeholders.
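
A minimal sketch of placeholders, assuming Python's sqlite3 module and a
made-up users table (the table, the helper names add_user and find_users,
and the sample values are all illustrative). The SQL text stays constant
and the values travel separately, so there is nothing to quote.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def add_user(conn, name):
    """Insert a name; the '?' placeholder keeps the value out of the SQL text."""
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def find_users(conn, name):
    """Look up rows by exact name, again binding the value as a parameter."""
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


# Values that break naively concatenated SQL are stored as plain data here.
add_user(conn, "O'Brien")
add_user(conn, "Robert'); DROP TABLE users; --")

print(find_users(conn, "O'Brien"))
print(find_users(conn, "' OR '1'='1"))
```

The apostrophe in O'Brien needs no escaping, and the classic injection
strings are just unusual names as far as the database is concerned.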

As for checking for bogus email addresses, read the RFC and code it
right. That's not with a too-simple regex, nor is it with a complex
regex. You need a parser, which is the right tool for the job. Regex
is not. But there is value in not passing utter garbage to another
program (it has a tendency to clog mail queues, if for no other
reason) - just make sure you do it right.
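
To show what "a parser" means here, below is a rough sketch in Python of a
hand-written parser for a deliberately limited subset of the RFC 5322
addr-spec grammar: a dot-atom or quoted-string local part, and a dot-atom
domain. It does not handle comments, folding whitespace, obsolete syntax,
domain literals, or length limits, so it is an illustration of the approach
and not a complete implementation. The names parse_addr_spec and
AddrSpecError are made up for this example.

```python
import string

# atext per RFC 5322 section 3.2.3: letters, digits and these symbols.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


class AddrSpecError(ValueError):
    """Input is not an addr-spec in the subset this sketch supports."""


def _dot_atom(text, pos):
    """Consume 1*atext *("." 1*atext) from pos; return the end position."""
    while True:
        start = pos
        while pos < len(text) and text[pos] in ATEXT:
            pos += 1
        if pos == start:
            raise AddrSpecError(f"expected an atom at offset {start}")
        if pos < len(text) and text[pos] == ".":
            pos += 1  # a dot must be followed by another atom
            continue
        return pos


def _quoted_string(text, pos):
    """Consume a quoted-string whose opening quote is at pos."""
    pos += 1  # skip the opening quote
    while pos < len(text):
        ch = text[pos]
        if ch == '"':
            return pos + 1
        if ch == "\\":
            pos += 1  # quoted-pair: the next character is taken literally
            if pos >= len(text):
                break
            ch = text[pos]
        if not (ch in " \t" or "!" <= ch <= "~"):
            raise AddrSpecError(f"invalid character at offset {pos}")
        pos += 1
    raise AddrSpecError("unterminated quoted-string")


def parse_addr_spec(text):
    """Return (local_part, domain), or raise AddrSpecError."""
    if text.startswith('"'):
        pos = _quoted_string(text, 0)
    else:
        pos = _dot_atom(text, 0)
    if pos >= len(text) or text[pos] != "@":
        raise AddrSpecError(f"expected '@' at offset {pos}")
    end = _dot_atom(text, pos + 1)
    if end != len(text):
        raise AddrSpecError(f"unexpected character at offset {end}")
    return text[:pos], text[pos + 1:]


print(parse_addr_spec("o'brien+tag@example.org"))
```

Each grammar rule is one small function, and every rejection names the
offset where parsing failed, which is hard to get out of a regex.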

I might add that the same goes for names. People don't just have a
first name and a last name - some people just have one name, some
people have three or four names, some people have surnames with
spaces, hyphens, or apostrophes (remember what I said about SQL?!),
etc. Yet most systems I see assume people have two names with no
spaces, apostrophes, hyphens, etc. Big mistake. And don't get me
started on addresses, which might have one address line, two address
lines, even 5 address lines, to say nothing of the fact that international
addresses may or may not put the "street" part first. It's certainly
not easily regex-able.
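
One way to avoid baking those assumptions into a system is to store less
structure, not more. The sketch below is a hypothetical Python record (the
class Contact and its field names are invented for illustration) with a
single free-form name and any number of address lines, checked only for
being non-empty.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contact:
    """Hypothetical record that avoids first/last-name and street assumptions."""

    full_name: str  # one free-form field: mononyms, spaces, apostrophes all fit
    address_lines: List[str] = field(default_factory=list)  # 1, 2, 5... lines

    def __post_init__(self):
        # Only reject what is certainly wrong: a name that is empty.
        self.full_name = self.full_name.strip()
        if not self.full_name:
            raise ValueError("name must not be empty")
        # Keep the lines in the order given; drop only blank ones.
        self.address_lines = [ln.strip() for ln in self.address_lines if ln.strip()]


print(Contact("Teller"))
print(Contact("Mary-Jane O'Neil van der Berg", ["Flat 2", "12 High Street", "Ely"]))
```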

Okay, I'll step off the soap box and let the next person holler about
how I was wrong about all this!

Joel Maslak wrote:

> But there is value in not passing utter garbage to another
> program (it has a tendency to clog mail queues, if for no other
> reason) - just make sure you do it right.

I fail to see why you wouldn't be able to throttle any abuse of your webform so it wouldn't clog a mail queue. Besides, it's very hard to clog or otherwise overload an MTA, since it's purpose-built to handle that kind of thing.

I also fail to see why it would be so hard to install an MTA listening on localhost whose sole purpose is to validate email addresses, and which simply dumps any outgoing email to /dev/null.

If you're afraid of clogging the mail queue, then only hand the message off to the sending MTA after validation has succeeded. But to be honest, why would you care? MTAs are purpose-made to handle such things.

I can't really think of a scenario where validating an email address using a separate service would create such a performance bottleneck. If you have robots flooding your web forms thousands of times a second (still peanuts for the average MTA), you need to rethink your security and abuse prevention, not your email validation, I would say. :)

People use a separate database instance for database queries, and the database server has its own code to validate input. We don't code our own database server as part of the web form handling code. Why not hand off email validation the same way?

> Okay, I'll step off the soap box and let the next person holler about
> how I was wrong about all this!

You're mostly right, but I disagree about the email validation part. I just don't see a point in re-inventing the wheel when there are perfectly capable free alternatives that can do it for you with no noticeable performance penalty.

Greetings,
Jeroen