Tuesday 12 February 2013

Centralising Regular Expressions

Regular expressions are exceptionally useful, and learning how to read and write them is an invaluable skill. When creating a website you will often find yourself using the same expression in multiple places. For example, on one website I help maintain, the username is validated in four separate scripts. Without centralisation, if we wanted to change the rules for the username, we would have to open all four scripts and update each expression to the new rules. With centralised expressions, we open one script and change one expression, and the change applies throughout the site. We also don't have to remember which scripts validate the username; we just need to know we are using the correct method to validate it. I'm sure no one needs convincing of the benefits of centralisation, though, so here is how I have centralised my regular expressions.

The following class is contained within a file called 'regex.php':

<?php

class RegEx {

    /*** USERNAME 6 to 20 characters, starts with a letter or number, at most one period ***/
    public static function username()
        {
            return '/^[a-zA-Z0-9](?=[\w\-.]{5,19}$)[\w\-]*\.?[\w\-]*$/i';
        }

    /*** EMAIL basic email checker ***/
    public static function email()
        {
            return '/^[\w][\w.-]+?(\w@)[\w\-]+\.[A-Za-z]{2,10}(\.[\w]{2,5})?$/';
        }

    /*** DATE FORMAT dd-mm-yyyy (or dd-mm-yy) where - can also be / or . ***/
    public static function dateFormat()
        {
            return '~^[0-9]{1,2}[./-][0-9]{1,2}[./-]([0-9]{2}|[0-9]{4})$~';
        }

    /*** CURRENCY FORMAT d+.dd ***/
    public static function currencyFormat()
        {
            return '/^\d+\.\d{2}$/';
        }

    /*** FULL UK POSTCODE with a space, e.g. SW1A 1AA ***/
    public static function postcode()
        {
            return '/^(?=[a-z0-9 ]{6,8}$)[a-z]{1,2}\d{1,2}[a-z]? \d[a-z]{2}$/i';
        }
}
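To sanity-check the class, it can help to run a few sample values through each pattern. The sketch below assumes PHP 7 or later and repeats four of the patterns from regex.php so that it runs on its own; the sample values are made up for illustration, and the output lists which ones match.

```php
<?php
// Trimmed copy of the RegEx class from regex.php, so this sketch runs on its own.
class RegEx {
    public static function email()
        {
            return '/^[\w][\w.-]+?(\w@)[\w\-]+\.[A-Za-z]{2,10}(\.[\w]{2,5})?$/';
        }

    public static function dateFormat()
        {
            return '~^[0-9]{1,2}[./-][0-9]{1,2}[./-]([0-9]{2}|[0-9]{4})$~';
        }

    public static function currencyFormat()
        {
            return '/^\d+\.\d{2}$/';
        }

    public static function postcode()
        {
            return '/^(?=[a-z0-9 ]{6,8}$)[a-z]{1,2}\d{1,2}[a-z]? \d[a-z]{2}$/i';
        }
}

// Made-up sample values: method name => (input => whether it should match).
$samples = array(
    'email'          => array('jane.doe@example.co.uk' => true, 'not-an-email' => false),
    'dateFormat'     => array('12/02/2013' => true, '2013-02-12' => false),
    'currencyFormat' => array('19.99' => true, '19.9' => false),
    'postcode'       => array('SW1A 1AA' => true, 'SW1A1AA' => false),
);

$results  = array();
$failures = 0;
foreach ($samples as $method => $cases) {
    foreach ($cases as $input => $expected) {
        // RegEx::$method() calls the static method whose name is held in $method.
        // preg_match() returns 1 for a match, 0 for no match and false on error.
        $matched = preg_match(RegEx::$method(), $input) === 1;
        $results[$method][$input] = $matched;
        if ($matched !== $expected) {
            $failures++;
        }
        echo $method, ' ', $input, ': ', $matched ? 'match' : 'no match', PHP_EOL;
    }
}
echo $failures === 0 ? 'all samples behaved as expected' : $failures . ' unexpected result(s)', PHP_EOL;
```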

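If we find ourselves needing a regular expression, we include the file:

// only include regex.php if the RegEx class has not been loaded already
// (require_once 'path/to/regex.php'; would have a similar effect)
if(!class_exists('RegEx'))
    {
        include 'path/to/regex.php';
    }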

Because the functions are static, we do not need to create an instance of the class. So, let's say we are checking the username; we would write:

if(preg_match(RegEx::username(), $u_name))
    {
        //succeed
    }
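To see the benefit in one place, here is a minimal, self-contained sketch in which two validators stand in for two of those separate scripts. The function names registrationUsernameOk() and profileUsernameOk() are hypothetical, and only the username expression is copied into the stand-in class. Because both call RegEx::username(), editing that one pattern changes what both accept.

```php
<?php
// Minimal stand-in for regex.php: only the username expression is copied here.
class RegEx {
    /*** USERNAME ***/
    public static function username()
        {
            return '/^[a-zA-Z0-9](?=[\w\-.]{5,19}$)[\w\-]*\.?[\w\-]*$/i';
        }
}

// Hypothetical validators standing in for two separate scripts.
// Neither holds its own copy of the pattern; both defer to RegEx::username().
function registrationUsernameOk($u_name)
    {
        return preg_match(RegEx::username(), $u_name) === 1;
    }

function profileUsernameOk($u_name)
    {
        return preg_match(RegEx::username(), $u_name) === 1;
    }

foreach (array('john.smith', 'abc', 'john..smith') as $u_name) {
    echo $u_name, ': ', registrationUsernameOk($u_name) ? 'valid' : 'invalid', PHP_EOL;
}
```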

If you need a new expression, simply add a new static function in the same format.
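For example, a 24-hour time check could sit alongside the existing expressions. The time24() method and its pattern are a hypothetical addition, not part of the original class, and assume times are written as hh:mm.

```php
<?php
class RegEx {
    // ... the existing expressions such as username() and email() stay as they are ...

    /*** TIME FORMAT hh:mm, 24-hour clock (hypothetical addition) ***/
    public static function time24()
        {
            // hour 00-19 or 20-23, a colon, then minutes 00-59
            return '/^([01]\d|2[0-3]):[0-5]\d$/';
        }
}

// Used exactly like the other expressions.
var_dump(preg_match(RegEx::time24(), '09:30')); // int(1)
var_dump(preg_match(RegEx::time24(), '24:00')); // int(0)
```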

Pretty straightforward really.

Saturday 9 February 2013

Regular Expression Length Validation

There's no need to use a regular expression for length validation if length is the only thing you are checking. For that, you would use strlen(). However, you will often find yourself validating length and content at the same time. A prime example of this is username validation. Let's set out some really basic rules for a username:
  • Must be between 3 and 20 characters in length.
  • Must start with a letter and must end with a letter or number.
  • Can contain at most one period or hyphen.
  • Other than a period or hyphen, it must only contain alphanumeric characters.
These conditions could be met by using strlen() to check the length and then a regular expression to check the other conditions.

Strlen and preg_match
if(strlen($username) >= 3 && strlen($username) <= 20)
    {
        //a dot or dash, if present, must be followed by at least one letter or number
        if(preg_match('/^[a-z][a-z0-9]*(?:[.-][a-z0-9]+)?$/i', $username))
            {
                 //continue
            }
    }

However, since you are already using a regular expression, you may as well add length validation to the pattern.

To do this, you need to use a lookahead near the beginning of the pattern. As we have a specific requirement for the first character, we will place the lookahead after the first character class. This is an optimisation. If we ran the lookahead straight away and then found that the first character did not match, we would have wasted time processing the lookahead. If we check the first character straight away instead, a string with the wrong first character fails immediately and the lookahead is never processed. A small optimisation, but an optimisation nonetheless.
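The sketch below compares the two placements on a handful of made-up usernames, assuming PHP 7 or later and a pattern written for the rules above. Only the quantifier changes between them: when the lookahead comes first it has to measure the whole string (3 to 20 characters), whereas after the first letter it only needs to cover the remaining 2 to 19. Both patterns should give the same answers; the difference is only in how much work is done before a bad first character is rejected.

```php
<?php
// Lookahead first: it has to measure the whole string, so 3 to 20 characters.
$lookaheadFirst = '/^(?=[a-z0-9.-]{3,20}$)[a-z][a-z0-9]*(?:[.-][a-z0-9]+)?$/i';
// Lookahead after the first letter: one character is already matched, so 2 to 19.
$lookaheadAfter = '/^[a-z](?=[a-z0-9.-]{2,19}$)[a-z0-9]*(?:[.-][a-z0-9]+)?$/i';

// For '1joe', $lookaheadAfter fails on the first character without running the
// lookahead, while $lookaheadFirst scans the whole string before failing.
$names = array('jo', 'joe', 'joe.bloggs', '1joe', str_repeat('a', 21));

foreach ($names as $name) {
    $first = preg_match($lookaheadFirst, $name);
    $after = preg_match($lookaheadAfter, $name);
    printf("%-22s %d %d\n", $name, $first, $after);
}
```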

To match all the username conditions above, including the length, with a single regular expression, we would do the following:

Just preg_match
if(preg_match('/^[a-z](?=[a-z0-9.-]{2,19}$)[a-z0-9]*(?:[.-][a-z0-9]+)?$/i', $username))
    {
         //continue
    }

A lookahead does not consume any characters or capture anything by itself, but if it fails to match, the whole match fails. The lookahead here checks that the allowed characters are repeated between 2 and 19 times (we've already matched the first character) and that the string then ends. If fewer than 2 or more than 19 characters follow the first one, the lookahead fails and therefore the whole match fails. And that is length validation.

A breakdown of the expression:
/
^                      #start of string
[a-z]                  #match a single letter
(?=                    #lookahead for:
    [a-z0-9.-]{2,19}       #match the allowed characters between 2 and 19 times
    $                      #end of string
)
[a-z0-9]*              #match a letter or number between 0 and infinity times
(?:                    #non-capturing group for:
    [.-]                   #match a single dot or dash
    [a-z0-9]+              #match a letter or number between 1 and infinity times, so the name cannot end on a dot or dash
)?                     #match the group between 0 and 1 times
$                      #end of string
/i                     #turn case insensitivity on
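The breakdown above is an annotation rather than a pattern PHP will run, but PCRE's x (extended) modifier gets close to it: unescaped whitespace outside character classes is ignored and # starts a comment that runs to the end of the line. A sketch of the same username pattern written that way follows; the variable name is illustrative, and the comments deliberately avoid the slash character, because PHP would treat it as the closing delimiter.

```php
<?php
// With the x modifier, whitespace outside character classes is ignored and
// # starts a comment, so the pattern can carry its own breakdown.
$username_pattern = '/
    ^                       # start of string
    [a-z]                   # match a single letter
    (?=                     # lookahead for:
        [a-z0-9.-]{2,19}    #   the allowed characters between 2 and 19 times
        $                   #   end of string
    )
    [a-z0-9]*               # a letter or number between 0 and infinity times
    (?:[.-][a-z0-9]+)?      # optionally one dot or dash followed by at least one letter or number
    $                       # end of string
/ix';

var_dump(preg_match($username_pattern, 'joe.bloggs')); // int(1)
var_dump(preg_match($username_pattern, 'joe.'));       // int(0)
```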

It's as simple as that.
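If you are swapping an existing strlen() check for the single pattern, it is worth confirming that the two agree before relying on it. This is a rough sketch rather than a test suite: usernameTwoStep() and usernameOnePattern() are illustrative names, and both encode the rules listed above, including the requirement that the name ends with a letter or number.

```php
<?php
// Two-step check: strlen() for the length, preg_match() for the content.
function usernameTwoStep($username)
    {
        $length = strlen($username);
        return $length >= 3 && $length <= 20
            && preg_match('/^[a-z][a-z0-9]*(?:[.-][a-z0-9]+)?$/i', $username) === 1;
    }

// Single pattern: the lookahead after the first letter enforces 3 to 20 characters in total.
function usernameOnePattern($username)
    {
        return preg_match('/^[a-z](?=[a-z0-9.-]{2,19}$)[a-z0-9]*(?:[.-][a-z0-9]+)?$/i', $username) === 1;
    }

// Made-up samples covering the length limits, a trailing hyphen, two periods and a leading digit.
$names = array('ab', 'abc', 'Joe-Bloggs', 'joe-', 'joe.b.loggs', '9lives', str_repeat('x', 20), str_repeat('x', 21));

foreach ($names as $name) {
    printf("%-22s two-step: %-6s one pattern: %s\n", $name,
        var_export(usernameTwoStep($name), true),
        var_export(usernameOnePattern($name), true));
}
```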

Friday 29 June 2012

Noting Scripts

Programmers do not note their code adequately.

From the scripts I have seen, people seem to note for themselves only. Their notes are pitched at their own experience level. They seem oblivious to the fact that someone else, with much less experience, might one day need to tweak the script they were working on. Because of the insufficient notes, that person then takes about four times as long as they otherwise would to make the alteration.

I'm no exception to this. My noting is far too often done for me. I often leave long gaps in my notes because, to me, it seems painfully obvious what is happening. The me of a year ago, though, would probably have had quite some difficulty understanding what was happening, and would have wasted time figuring it out because of these inadequate notes.

How Bad is it Really?

I'm writing this blog post because earlier I went to make a simple addition to a class and ended up spending over an hour following three interlinking scripts, trying to understand what the previous programmer had done.

This previous programmer is my brother, business partner and mentor. I have worked with him for the past year and a half, and I learnt PHP from him before he even had a solid understanding of the language. After a few months we were teaching each other, and even though we have practically identical coding standards, I still had to work out what he had done because of insufficient noting.

A task that should have taken 15 minutes took over an hour.

What I Suggest We Start Doing...

Use Longhand and Suitable Explanations

Explain, in more than one or two words, what something is trying to achieve. I don't understand your abbreviations, and neither will the next person.

A note that reads "//chs item" above a switch statement is not wanted. A note that reads "//switch to choose item based on the product ID (pid) sent via the GET" is much more informative and far preferable.
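In context, the fuller note might look like the sketch below. The product IDs and item names are invented, and the switch is wrapped in a hypothetical chooseItem() function so it can also be run from the command line, where $_GET is empty.

```php
<?php
/*
 * Look up the item name for a product ID (pid).
 * The pid normally arrives via the GET, e.g. page.php?pid=2.
 * Unknown IDs return 'unknown item' rather than an error.
 */
function chooseItem($pid)
    {
        //switch to choose item based on the product ID (pid) sent via the GET
        switch ((int) $pid) {
            case 1:
                return 'mug';
            case 2:
                return 't-shirt';
            default:
                return 'unknown item';
        }
    }

//$_GET is empty when run from the command line, so fall back to a pid of 0
$pid = isset($_GET['pid']) ? $_GET['pid'] : 0;
echo chooseItem($pid), PHP_EOL; // prints "unknown item" from the command line
```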

Use Large Paragraphs of Notes if Needed

If someone is about to dissect your code, hopefully to improve it, they are going to find it very handy to have an introduction explaining what it is you were trying to achieve.

Classes and functions are the main targets of this heading. I have written, and been presented with, functions that were over 300 lines long and classes spanning more than ten times that amount.

Unless you are there to explain what it all means, they are going to waste time figuring it out. A short paragraph, perhaps five to ten sentences long, explaining the purpose of these large functions and classes would be very useful and would save a lot of time.
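As an illustration, the introductory note for a function might read something like this. The function, its name and its pricing rules are hypothetical and kept deliberately short; the note earns its keep on the much longer functions and classes described above.

```php
<?php
/*
 * Basket total calculator.
 *
 * Works out what the customer pays for everything in their basket.
 * Each item is an array with a 'price' (in pence) and a 'qty'.
 * Prices are held in pence as integers, so the subtotal involves no floating-point arithmetic.
 * A percentage discount is applied once to the subtotal, not per item, and the result is rounded to the nearest penny.
 * Returns the total in pence; an empty basket returns 0.
 */
function basketTotal(array $items, $discountPercent = 0)
    {
        //add up price x quantity for every item in the basket
        $subtotal = 0;
        foreach ($items as $item) {
            $subtotal += $item['price'] * $item['qty'];
        }

        //apply the discount to the whole subtotal and round to the nearest penny
        return (int) round($subtotal * (100 - $discountPercent) / 100);
    }

echo basketTotal(array(
    array('price' => 250, 'qty' => 2),
    array('price' => 1000, 'qty' => 1),
), 10), PHP_EOL; // prints 1350
```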

Note Repeated Sections

Do not skip writing a note somewhere just because an identical block of code appears 100 lines above it or in another script. The next person to read your scripts isn't going to read them in the order you created them, so when they reach a 30-line block of code with no notes, they won't know what its purpose is.

Groups

If you have a group of functions all relating to the same purpose, group those functions together in a note. E.g.

// *** Ajax handlers for anonymous users *** \\
    function a(){}
    function b(){}
    function c(){}
    function d(){}
// *** End of Ajax handlers for anonymous users *** \\

Don't forget to note those individual functions too. I find the ending note necessary as it clearly separates the grouped functions from the non-grouped functions. You might find indentation enough?

Not only does plentiful noting help others learn your code, I have found it helps you in several ways too. Returning to old, forgotten scripts is much more pleasant if you don't have to remember or relearn what it was you did. Bug finding is quicker because you can scan notes rather than code. And when learning something new, you can annotate the code to your own standards. Who can understand you better than you?

So I reiterate, notes please!

What noting techniques do you adhere to?