Tuesday 12 February 2013

Centralising Regular Expressions

Regular expressions are exceptionally useful, and learning how to read and write them is an invaluable skill. When creating a website you will often find yourself using the same expression in multiple locations. For example, on one website I help maintain, the username is validated in four separate scripts. Without centralisation, if we wanted to change the rules for the username we would have to open all four scripts and update each expression to the new rules. With centralised expressions, we can open one script and change one expression, and the change applies throughout the site. We also don't have to remember where the username is being validated; we just need to know we are using the correct method to validate it. I'm sure no one needs convincing of the pros of centralisation, though, so here is how I have centralised the regular expressions.

The following class is contained within a file called 'regex.php':

class RegEx {
 
    /*** USERNAME ***/
    public static function username()
        {
            return "/^[a-zA-Z0-9](?=[\w\-.]{5,19}$)[\w\-]*\.?[\w\-]*$/i";
        }
  
    /*** EMAIL basic email checker ***/
    public static function email()
        {
            return "/^[\w][\w.-]+?(\w@)[\w\-]+\.[A-Za-z]{2,10}(\.[\w]{2,5})?$/";
        }
  
    /*** DATE FORMAT d-m-yy up to dd-mm-yyyy where - can also be / or . ***/
    public static function dateFormat()
        {
            return "~^[0-9]{1,2}[./-][0-9]{1,2}[./-]([0-9]{2}|[0-9]{4})$~";
        }
  
    /*** CURRENCY FORMAT d+.dd ***/
    public static function currencyFormat()
        {
            return "/^\d+\.\d{2}$/";
        }

    /*** FULL POSTCODE ***/
    public static function postcode()
        {
            return "/^(?=[a-z0-9 ]{6,8}$)[a-z]{1,2}\d{1,2}[a-z]? \d[a-z]{2}$/i";
        }
}

If we find ourselves needing a regular expression we include the file:

if(!class_exists('RegEx'))
    {
        include 'path/to/regex.php';
    }

Because the methods are static, we do not need to create an instance of the class. So let's say we are checking the username; what we would write is:

if(preg_match(RegEx::username(), $u_name))
    {
        //succeed
    }

If you need a new expression, you simply create a new function in the format shown.
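
As a minimal sketch of that step, the block below adds one new rule next to an existing one. The hexColour() method and its pattern are hypothetical and only there for illustration; currencyFormat() uses the same pattern as the class above, and the class is repeated in trimmed form so that the snippet runs on its own.

```php
<?php
class RegEx {

    /*** CURRENCY FORMAT d+.dd (same pattern as the class above) ***/
    public static function currencyFormat()
        {
            return '/^\d+\.\d{2}$/';
        }

    /*** HEX COLOUR #rgb or #rrggbb (hypothetical new rule, not from the original class) ***/
    public static function hexColour()
        {
            return '/^#(?:[0-9a-f]{3}|[0-9a-f]{6})$/i';
        }
}

// Every caller asks the class for the pattern instead of repeating it inline.
foreach (['#fff', '#1A2b3C', 'fff', '#ffff'] as $colour) {
    $result = preg_match(RegEx::hexColour(), $colour) ? 'valid' : 'invalid';
    echo $colour, ' is ', $result, "\n";
}
```

If the colour rule ever changes, only hexColour() needs editing; the calling code stays as it is.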

Pretty straightforward really.

Saturday 9 February 2013

Regular Expression Length Validation

There's no need to use a regular expression for length validation if length validation is the only thing you are doing. For that, you would use strlen(). However, you will likely find yourself faced with length and content validation at the same time. A prime example of this is username validation. Let's set out some really basic rules for a username:
  • Must be between 3 and 20 characters in length.
  • Must start with a letter and must end with a letter or number.
  • Can contain at most one period or hyphen.
  • Other than a period or hyphen, it must only contain alphanumeric characters.
These conditions could be achieved by using strlen to check the length and then a regular expression to check the other conditions.

Strlen and preg_match
if(strlen($username) >= 3 && strlen($username) <= 20)
   {
        if(preg_match("/^[a-z][a-z0-9]*(?:[.-][a-z0-9]+)?$/i", $username))
            {
                 //continue
            }
   }

However, since you are already using a regular expression, you may as well add length validation to the pattern.

To do this, you use a lookahead near the beginning of the pattern. As we have a specific first-character requirement, we place the lookahead after the first character class. This is an optimisation: if we ran the lookahead first and then found that the first character did not match, we would have wasted time processing the lookahead. By checking the first character first, a string that starts with the wrong character fails immediately and the lookahead is never processed. It is a small optimisation, but an optimisation nonetheless.

To match all the username conditions above, including the length, with one regular expression we would do the following.

Just preg_match
if(preg_match("/^[a-z](?=[a-z0-9.-]{2,19}$)[a-z0-9]*(?:[.-][a-z0-9]+)?$/i", $username))
    {
         //continue
    }

Lookaheads do not consume any characters, but if one fails to match, the whole match fails. The lookahead here checks that the allowed characters are repeated between 2 and 19 times (we've already matched the first character) and that the string then ends. If there are fewer than 2 or more than 19 remaining characters, the lookahead fails and therefore the match fails. And that is length validation. The optional group at the end requires at least one letter or number after a period or hyphen, so the username cannot end with either.

A breakdown of the expression:
/
^                      #start of string
[a-z]                  #match a single letter
(?=                    #lookahead for:
    [a-z0-9.-]{2,19}       #match the allowed characters between 2 and 19 times
    $                      #end of string
)
[a-z0-9]*              #match a letter or number between 0 and infinity times
(?:                    #optional non-capturing group:
    [.-]                   #match a single dot or dash
    [a-z0-9]+              #match a letter or number between 1 and infinity times
)?
$                      #end of string
/i                     #turn case insensitivity on
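
A commented breakdown like this can also be run more or less as written. PCRE's x (extended) modifier ignores whitespace outside character classes and treats # as the start of a comment. The sketch below is one way to do that, not a definitive implementation: the helper name isValidUsername() is an assumption of mine, and the pattern it uses requires a letter or number after any period or hyphen so that a username cannot end with one. Each sample carries the result I expect from the four rules, and any disagreement is flagged in the output.

```php
<?php
// Hypothetical helper, not part of the original post.
function isValidUsername($username)
{
    // x = extended mode (whitespace and # comments ignored), i = case insensitive
    $pattern = '/
        ^                        # start of string
        [a-z]                    # first character: a single letter
        (?=                      # lookahead for:
            [a-z0-9.-]{2,19}     #   2 to 19 more allowed characters
            $                    #   then end of string
        )
        [a-z0-9]*                # zero or more letters or digits
        (?:                      # optionally:
            [.-]                 #   one period or hyphen
            [a-z0-9]+            #   then at least one letter or digit
        )?
        $                        # end of string
    /ix';

    return preg_match($pattern, $username) === 1;
}

// Sample usernames mapped to the result the four rules call for.
$samples = [
    'abc'         => true,   // minimum length
    'ab'          => false,  // too short
    'john.smith'  => true,   // one period
    'john-smith9' => true,   // one hyphen, ends with a number
    'john.'       => false,  // ends with a period
    'j.o.e'       => false,  // more than one period
    '9lives'      => false,  // starts with a number
];
$samples[str_repeat('a', 20)] = true;   // maximum length
$samples[str_repeat('a', 21)] = false;  // too long

foreach ($samples as $username => $expected) {
    $actual = isValidUsername($username);
    echo str_pad($username, 22), $actual ? 'valid' : 'invalid',
        ($actual === $expected ? '' : '  <-- unexpected'), "\n";
}
```

One thing to watch in extended mode: the pattern delimiter must not appear inside the comments, because PHP looks for the closing delimiter before PCRE strips the comments out.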

It's as simple as that.