regular expressions

Regular Expressions, PHP and Newline Characters

Ahhhh, regular expressions. They are so handy, but can be such a pain in the ass to use. While coding up a basic script to do a small amount of screen scraping, I remembered a problem I encountered a couple of years ago; one that I was unable to solve at the time. It involves using regular expressions to match data in a string that contains newline characters. For the uninitiated, a newline character ("\n" on *NIX) splits text into multiple lines, i.e., it tells the browser or software program to begin a new line. In PHP, you can use echo "\n" to start a new line in the output (it is not visible on the rendered page, only when viewing source), which can be handy when you are iterating through an array and spewing out lots of data to the screen.

Back to the problem at hand! If you are using PHP's PCRE functions (Perl Compatible Regular Expressions, e.g. preg_match) to match text, you need to realize that by default the period (.) matches any character except a newline, so a pattern like .* will not match across lines, even if you pass in a string that contains many lines.
<?php
$string = "<div>\n<b>This is the second line</b>\n</div>";
echo $string;
?>

The above code shows this. If you look at your browser, you will see the bold text. If you view source, you will see three separate lines.
<?php
$string = "<div>\n<b>This is the second line</b>\n</div>";
preg_match('|<div>.*</div>|', $string, $matches);
print_r($matches);
?>


If we try to match the entire div like the code above, it fails and outputs an empty array. This is because the period does not match newline characters by default, so .* cannot get past the end of the first line and the pattern never reaches the closing div on the third line.
<?php
$string = "<div>\n<b>This is the second line</b>\n</div>";
preg_match('|<div>.*</div>|s', $string, $matches);
print_r($matches);
?>

The code above adds a pattern modifier (a trailing option). There are a variety of modifiers available, but the "s" above (after the closing delimiter of the pattern) tells the regular expression to make periods match any character, including newlines. Now, $matches[0] contains the code for the entire div.
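
The same default shows up outside PHP. As a rough sketch in JavaScript (not part of the original script; the helper name matchDiv is made up for illustration), the s flag plays the same role as the PCRE "s" modifier, assuming an engine that supports it (ES2018 or later):

```javascript
// Same three-line string as in the PHP examples above.
const html = "<div>\n<b>This is the second line</b>\n</div>";

// Hypothetical helper: returns the matched text, or null when nothing matches.
function matchDiv(text, dotAll) {
  // With the "s" flag the period also matches "\n"; without it, it does not.
  const pattern = dotAll ? /<div>.*<\/div>/s : /<div>.*<\/div>/;
  const result = text.match(pattern);
  return result === null ? null : result[0];
}

console.log(matchDiv(html, false));         // null
console.log(matchDiv(html, true) === html); // true
```

On engines without the s flag, a character class such as [\s\S] in place of the period is a common substitute.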

Posted In: PHP, regular expressions | 15 comments

SEO Friendly URLs with PHP

I had to come up with a quick little snippet of code to create some SEO friendly URLs for dynamic content. What constitutes an SEO friendly URL? Basically, just a URL with a bunch of words and dashes, nothing too scientific about that. But it does make sense, as the URL of this page will have a better chance of ranking well than a corresponding URL of www.robertlbolton.com/articleId=21.

I could have hunted through WordPress’s code to find what they use, but I figured it would be quicker to just write it myself. I was trying to get away with using just one regular expression, but couldn’t make it happen. Without further ado,

$temp = "This blog is really lame, and I mean lame!";
$temp = preg_replace('/(\W){1,}/','-',$temp);
$temp = preg_replace('/-$/','',$temp);
echo $temp;


Here is the regular expression breakdown:

'/
(\W) //match any non-word character
{1,} //match it one or more times (greedy matching)
/'


The second regular expression takes care of the trailing dash that results when the string ends with spaces or non-word characters like exclamation points. I think you could use a lookahead in the first regular expression to detect the end of the string and then not add a dash, but that is above my head.
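
For comparison, here is a minimal sketch of the same idea in JavaScript. The function names are hypothetical, and the single-pattern version uses a replacer callback rather than the lookahead mentioned above, so treat it as one possible way to get down to one regular expression, not the definitive one:

```javascript
// Two-step version, mirroring the PHP above.
function slugTwoStep(text) {
  return text
    .replace(/\W+/g, '-') // each run of non-word characters becomes one dash
    .replace(/-$/, '');   // drop the dash left at the very end, if any
}

// Single-pattern version: a replacer function decides what each run becomes.
function slugOnePass(text) {
  return text.replace(/\W+/g, function (run, offset, whole) {
    // A run that reaches the end of the string is removed instead of dashed.
    return offset + run.length === whole.length ? '' : '-';
  });
}

console.log(slugTwoStep("This blog is really lame, and I mean lame!"));
// This-blog-is-really-lame-and-I-mean-lame
console.log(slugOnePass("This blog is really lame, and I mean lame!"));
// This-blog-is-really-lame-and-I-mean-lame
```

Neither version lowercases the text or trims a leading dash, which matches the behavior of the PHP snippet.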

Posted In: PHP, regular expressions | 4 comments

Calculations in JavaScript

I just finished creating a “value” calculator for a company using JavaScript. It computed the overall value a person would save by joining the organization – nothing fancy, just a couple of input boxes, some simple equations and 10 values to display. What was tricky, at least for a JavaScript neophyte like me, was getting the numbers in the correct format - both on output and input.

For instance, something as trivial as grabbing the number from an input box and using it in a calculation took me some time to figure out. First, I didn’t want to limit what the user could put into the text box. I.e., I wanted to allow “$30,000”, “30,000”, “30000” or “ 30,000.00” and not force the user to enter some contrived number for the sake of computational ease. After some mucking around, I came up with this:

// get the value from the form field
var cop = document.getElementById('cop').value;
//strip out all non-numbers, except a decimal and turn into a float
cop = parseFloat(cop.replace(/[^.0-9]/g,''));
// if it still isn’t a number (user had nothing in the field), make it 0
cop = isNaN(cop) ? 0 : cop;
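
To try the same two steps without a form, they can be wrapped in a small function. This is only a sketch; the name parseAmount is made up here, and the inputs are the ones listed above plus an empty field:

```javascript
// Hypothetical helper wrapping the strip-and-parse steps from the snippet above.
function parseAmount(text) {
  // Strip everything except digits and the decimal point, then convert.
  const value = parseFloat(text.replace(/[^.0-9]/g, ''));
  // An empty or digit-free field yields NaN; treat that as 0.
  return isNaN(value) ? 0 : value;
}

// Every formatted variant of thirty thousand parses to 30000; "" parses to 0.
["$30,000", "30,000", "30000", " 30,000.00", ""].forEach(function (input) {
  console.log(JSON.stringify(input), "->", parseAmount(input));
});
```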


I remember when I was very new to JavaScript and programming in general, and didn’t know to use parseFloat and parseInt. If you forget to do that for a value you have pulled from an input box, JavaScript will typically concatenate the value, as it treats it as a string:


// num = 6 (as a string)
var num = document.getElementById('cop').value;
//outputs “65” instead of 11
alert(num + 5);
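
The fix is to convert before adding. A small sketch, with a plain string standing in for the value read from the input box:

```javascript
// Form fields always hand back strings, so "6" stands in for the input's value here.
const raw = "6";

console.log(raw + 5);               // "65": + concatenates when one side is a string
console.log(parseInt(raw, 10) + 5); // 11: convert first, then add
console.log(parseFloat("6.5") + 5); // 11.5: parseFloat keeps the decimal part
```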


Next, I had to display the damn thing like this: “$30,000.06”. I started with a regular expression I have used in PHP before: /(?<=\d)(?=(?:\d\d\d)+$)/ . But JavaScript complained about “?<=”, which is a positive lookbehind. Apparently, JavaScript didn’t support it (lookbehind only arrived later, in ES2018). Digging into my books, I found toLocaleString(), which seemed to meet my criteria. In fact, it works without a problem in Internet Explorer (at least 6.0), but it exhibits some strange behaviors in Firefox. In Firefox, it doesn’t display the “.00” if the number doesn’t already contain it, and it will only display one decimal place when the second would be a trailing “0”. It wasn’t a major problem, but an annoyance nonetheless. I tried using toFixed(2) and then toLocaleString(), but it still didn’t work in Firefox, and rather than writing extra code, I let it be.
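
For what it's worth, the "extra code" does not have to be long. The sketch below is not what the calculator ended up using; it is one possible approach that avoids both lookbehind and toLocaleString() by combining toFixed(2) with a lookahead-only pattern. The name formatDollars is made up, and it assumes non-negative numbers:

```javascript
// Hypothetical helper: formats a non-negative number as "$30,000.06".
function formatDollars(amount) {
  // toFixed(2) always yields exactly two decimal places, e.g. "30000.06".
  const parts = amount.toFixed(2).split('.');
  // Insert a comma at every non-boundary position that is followed by
  // a whole number of three-digit groups up to the end of the string.
  const whole = parts[0].replace(/\B(?=(\d{3})+$)/g, ',');
  return '$' + whole + '.' + parts[1];
}

console.log(formatDollars(30000.06)); // $30,000.06
console.log(formatDollars(30000));    // $30,000.00
```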

One final note. Styling with the CSS selector :disabled doesn’t work in Internet Explorer. I wanted to hide the input boxes I used to display the numbers and disable them. But Internet Explorer ignores any “color” declarations on disabled inputs and outputs grey text, so it looks different from the rest of the text. I just hid the input boxes and left them enabled…

Posted In: JavaScript, regular expressions | No Comments

Credit Card Validation with JavaScript and Regular Expressions

Continuing from the last post: for the JavaScript, I checked the credit card number on form submittal (along with a lot of other things, like required fields, correct phone numbers, etc…), and if it isn’t correct, I throw an alert and prevent the form from submitting. (Ignore the PHP tags; I haven't gotten around to adding a generic code highlighting plugin yet...)



[php]
// ccNum is not defined in the original snippet; the 'cc_number' field id is an assumption
var ccNum = document.getElementById('cc_number').value;
var ccType = document.getElementById('credit_card').value;
var goodCC;
switch (ccType) {
    case 'AMEX':
        goodCC = /^3[47][0-9]{13}$/;
        break;
    case 'Visa':
        goodCC = /^4[0-9]{15}$/;
        break;
    case 'Mastercard':
        goodCC = /^5[1-5][0-9]{14}$/;
        break;
    case 'Discover':
        goodCC = /^6011[0-9]{12}$/;
        break;
    default:
        goodCC = /^[0-9]{15,16}$/;
}
// problem is assumed to hold the messages collected by the earlier checks
if (!goodCC.test(ccNum)) {
    problem += "Please enter a valid credit card number (only use numbers)\n";
}
if (problem != '') {
    alert(problem);
    //code to stop the form from submitting
}
[/php]
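
The check itself can also be pulled out of the form code so it can be tried on its own. The following is a sketch under a few assumptions: the function and constant names are made up, the patterns are the ones from the switch statement (with the redundant {1} dropped), and a lookup table stands in for the switch:

```javascript
// Patterns taken from the switch statement above, keyed by card type.
const CARD_PATTERNS = {
  AMEX: /^3[47][0-9]{13}$/,
  Visa: /^4[0-9]{15}$/,
  Mastercard: /^5[1-5][0-9]{14}$/,
  Discover: /^6011[0-9]{12}$/
};
// Fallback for any other card type: 15 or 16 digits.
const DEFAULT_PATTERN = /^[0-9]{15,16}$/;

// Hypothetical helper: true when the number fits the pattern for the chosen type.
function isCardNumberValid(cardType, cardNumber) {
  const known = Object.prototype.hasOwnProperty.call(CARD_PATTERNS, cardType);
  const pattern = known ? CARD_PATTERNS[cardType] : DEFAULT_PATTERN;
  return pattern.test(cardNumber);
}

console.log(isCardNumberValid('Visa', '4111111111111111'));    // true
console.log(isCardNumberValid('Visa', '4111 1111 1111 1111')); // false (spaces)
```

Note that this only checks the shape of the number for the selected card type; it says nothing about whether the card actually exists.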

Posted In: JavaScript, regular expressions | No Comments

Credit Card Validation In PHP With Regular Expressions

I had to do some credit card validation in both JavaScript and PHP recently. Looking around the net, I found some decent information but no scripts that fit the business rules I was given. There are a lot of different ways to approach validating a credit card. You can go with the loose rule of making sure it is only digits and 15 or 16 of them (or only digits, with no length check, if you are accepting a wide array of cards). Or you can make the user select the card type, and then make sure the number matches a regular expression for that type of card. That is basically what I did:
$credit_card holds the type of card the user selected, and $cc_number has the credit card number the user entered. It then runs a switch statement to assign a regular expression based on the card, and has a default of 15 to 16 digits if there is no card match.

[php]
switch ($credit_card) {
    case 'AMEX':
        $goodCC = '/^3[47]{1}[0-9]{13}$/';
        break;
    case 'Visa':
        $goodCC = '/^4[0-9]{15}$/';
        break;
    case 'Mastercard':
        $goodCC = '/^5[1-5]{1}[0-9]{14}$/';
        break;
    case 'Discover':
        $goodCC = '/^6011[0-9]{12}$/';
        break;
    default:
        $goodCC = '/^[0-9]{15,16}$/';
}
if (!preg_match($goodCC, $cc_number)) {
    $errors = 'Your credit card number does not seem to match the card type you selected';
}
[/php]

Posted In: PHP, regular expressions | No Comments