Wikipedia describes Regular Expressions like this: “In computing, regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters.” That’s almost as clear as a fresh glass of one of the finest Belgian beers.
In this post I’ll try my best to get you familiarized with the basics of Regular Expressions.
What?
Regular Expressions (regex or regexp) are a very powerful “tool” that’s available in most common programming languages (PHP, ActionScript, JavaScript, C#, …) but can be quite complex to grasp. Simply put: it’s a unified way of finding or replacing complex parts of a string. Well, it can be much more complex than that, but I guess this is the most common reason why we use regex.
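To make “finding or replacing” a bit more concrete, here is a minimal JavaScript sketch. The sample sentence and the variable names are made up for illustration; only `String.prototype.match` and `String.prototype.replace` are standard.

```javascript
// Hypothetical sample text, used only for illustration.
const text = "Order 66 shipped on 2024-01-15, order 67 is pending.";

// Finding: collect every run of one or more digits (the g flag means "all matches").
const numbers = text.match(/\d+/g);

// Replacing: swap every single digit for a "#".
const masked = text.replace(/\d/g, "#");

console.log(numbers); // [ '66', '2024', '01', '15', '67' ]
console.log(masked);  // Order ## shipped on ####-##-##, order ## is pending.
```

The same pattern syntax works for both jobs; only the method you call decides whether you search or replace.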
How?
Some smart bloke just decided to use patterns to define Regular Expressions. These patterns are predefined and you just have to learn them by heart (yeah, of course… there are tools available, more on that later). Some pattern examples:
\d searches for a digit
[a-z] searches for a single character in the range from a to z
(a|b) searches for either “a” or “b”
a{2,4} searches for at least 2 and at most 4 “a” characters next to each other (for example, when the whole string has to match: “a” and “aaaaa” won’t match, but “aa”, “aaa” and “aaaa” would)
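The four patterns above can be tried out with a few lines of JavaScript. This is just a sketch: the sample strings and the `checks` array are my own, and the ^ and $ anchors are added to show the difference between “the whole string must match” and “a match somewhere in the string”.

```javascript
// Each entry: [description, result of testing a pattern against a sample string].
const checks = [
  ["\\d finds a digit in 'room 7'", /\d/.test("room 7")],
  ["[a-z] finds a lowercase letter in 'ABC'", /[a-z]/.test("ABC")],
  ["(a|b) finds 'a' or 'b' in 'cab'", /(a|b)/.test("cab")],
  // ^ and $ anchor the pattern to the start and end of the string.
  ["^a{2,4}$ matches 'a'", /^a{2,4}$/.test("a")],
  ["^a{2,4}$ matches 'aaa'", /^a{2,4}$/.test("aaa")],
  ["^a{2,4}$ matches 'aaaaa'", /^a{2,4}$/.test("aaaaa")],
  // Without anchors the pattern finds four of the five a's inside the string.
  ["a{2,4} finds a match inside 'aaaaa'", /a{2,4}/.test("aaaaa")],
];

for (const [label, result] of checks) {
  console.log(label + " -> " + result);
}
```

Notice the last two lines: whether “aaaaa” counts as a match depends entirely on whether the pattern is anchored.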
E-mail validation
Although the examples above are quite clear, the whole concept of Regular Expressions might still seem a little vague to you. So to fully explain regex I think it’s better to give you a real-life example of probably the most common use of regex: e-mail validation. E-mail addresses always have a similar structure: <part.one>@<part.two>.<tld> (I deliberately used a point in the first two parts because they can contain a point, but they might just as well contain a dash or an underscore). This could be a good example of a regex for e-mail validation:
([a-zA-Z0-9_\.\-])+@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+
If you’re new to regex you might find the code above pretty unreadable, but I think it’s probably the most readable version of a regex for e-mail validation.
This one is not:
/^\S+@\S+\.\S+$/
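To see how the two patterns differ in practice, here is a small JavaScript sketch that runs both against the same input. The names `strict`, `loose` and `compare` and the sample addresses are hypothetical, and I’m assuming the first pattern gets ^ and $ anchors like the second one already has.

```javascript
// First pattern from above, with ^ and $ added (an assumption, see the text).
const strict = /^([a-zA-Z0-9_\.\-])+@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/;
// Second pattern: anything without whitespace, an @, more non-whitespace, a point, more non-whitespace.
const loose = /^\S+@\S+\.\S+$/;

// Hypothetical helper: report what each pattern thinks of one address.
function compare(address) {
  return { strict: strict.test(address), loose: loose.test(address) };
}

console.log(compare("john.doe@example.com"));   // both patterns accept this one
console.log(compare("jane!@example.com"));      // "!" is not in the strict character list
console.log(compare("a@b@c.de"));               // \S also matches "@", so the loose pattern lets this through
console.log(compare("no-at-sign.example.com")); // both patterns reject this one
```

The short pattern is easier to type, but it accepts a lot more than the longer one.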
Now you know that there are always multiple solutions to get the same effect, but don’t let that scare you off. Let’s go through the first example step by step:
- ([a-zA-Z0-9])+ will match any character in the range from a to z and A to Z (case sensitive, which is why both ranges are listed) and 0 to 9. Because of the + after the parentheses it will match one or more of these characters (no upper limit).
- ([a-zA-Z0-9_\.\-])+ to allow an underscore, point and dash, you just add these characters. Notice the backslashes: they are used because a point and a dash have a predefined regex meaning too (inside square brackets the point would also work without a backslash, but escaping it does no harm).
- @ this is the actual @ character from the e-mail address
- (([a-zA-Z0-9\-])+\.)+ same here, the a to z, A to Z and 0 to 9 allow all chars in those ranges plus the dash, and at the end a literal point. Notice the plus at the end: with this the whole group can be matched multiple times.
- ([a-zA-Z0-9]{2,4})+ the last bit, same story a to z, A to Z and 0 to 9, but instead of unlimited chars each match of the group is limited to 2, 3 or 4. That’s done with the curly brackets. Be aware that the + after the group lets it repeat, so longer endings can still match as several groups in a row; drop that + if you want a hard limit of 4.
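The steps above can be sketched in JavaScript by gluing the pieces together one at a time. The variable names are hypothetical and the ^ and $ anchors are my addition (the real-life examples further down use them too). Because the pieces are plain strings here, every backslash has to be doubled.

```javascript
// One string per step of the walkthrough (backslashes doubled inside string literals).
const localPart = "([a-zA-Z0-9_\\.\\-])+";      // letters, digits, underscore, point, dash
const domainLabels = "(([a-zA-Z0-9\\-])+\\.)+"; // one or more "label." groups
const ending = "([a-zA-Z0-9]{2,4})+";           // groups of 2 to 4 letters or digits

// Glue the steps together; ^ and $ make the whole string count.
const emailPattern = new RegExp("^" + localPart + "@" + domainLabels + ending + "$");

console.log(emailPattern.test("first.last@mail.example.org")); // true
console.log(emailPattern.test("first.last@example"));          // false: no point after the @
console.log(emailPattern.test("user@example.c"));              // false: the ending needs at least 2 chars
console.log(emailPattern.test("user@example.museum"));         // true: "muse" + "um" are two groups
```

The last line shows the effect of the + behind the final group: each group is 2 to 4 characters, but several groups in a row add up to a longer ending.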
Now you know how it’s done, but it’s going to take a while to get used to and even longer to write them yourself. When you are making Regular Expressions, you often look at the result and are almost amazed that you wrote them yourself, because they look so complex when they are done.
The idea of regex is simple, but you have to write them one step at a time. That’s the same when you are learning regex: one step at a time. First you’ll be reading them, understanding them, making some modifications of your own, until you end up writing them yourself. It’s very important that you know exactly what everything does, for example the curly braces (to limit how often the preceding element may repeat) or the dash (used inside square brackets to define a range of chars). To learn the pattern syntax, check out this page or Google for regex examples.
Tools
There are numerous tools available to make Regular expressions. One of the best is this RegExr app by gskinner. He made an Adobe AIR desktop app out of it too.
Real life examples (ActionScript, JavaScript and PHP)
As I said before, there are always multiple solutions for making a regex. Below are some real-life examples of how to use e-mail validating Regular Expressions in AS3, JS and PHP:
AS3
private function validMail(mail:String):Boolean {
    // ^ and $ anchor the pattern, so the whole string has to be an e-mail address
    var emailExpression:RegExp = /^([a-zA-Z0-9_\.\-])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/i;
    return emailExpression.test(mail);
}
JavaScript version:
function isEmail(v) {
    // test() returns true when the whole string v matches the pattern
    return /^([a-zA-Z0-9_\.\-])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/.test(v);
}
PHP version:
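function isEmail($email) {
    // The $ before the closing slash anchors the end, just like in the AS3 and JS versions
    $regex = '/^([a-zA-Z0-9_\.\-])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/';
    // preg_match() returns 1 when the pattern matches and 0 when it doesn't
    return preg_match($regex, $email) === 1;
}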