Preventing Contact Form Spam with PHP

Gone are the days of posting email addresses on websites with “mailto:” for spiders to pick up. I typically use a contact form instead, or sometimes encode the email address with special characters. But a new problem has arisen with forms: form spam. From what I can tell, almost all of the spam comes from automated programs trying to obtain links, i.e., comment spam.



The messages typically have some random text with copious amounts of href or [url] tags sprinkled in. What I have come up with is pretty simple, and could easily be defeated with a little ingenuity. However, it seems to do the trick in stopping the vast majority of contact form spam.

Since I am dealing with contact forms, I don’t expect any tags to be submitted through the form. I suppose it is possible someone would have a legitimate reason to submit a tag, but I haven’t seen it happen yet. Therefore, I check each field to see if any href or [url] tags exist. I also distinguish fields that should be one line, like email and name, from fields that can contain multiple lines, like a comment text box. I have noticed the spammers typically use newline characters, and then insert the link payload. A normal person would not have newline characters in an input textbox, so I don’t allow a form to go through if it contains them. Enough of the talk, here is the code. It is for a simple form with two text inputs, email and name, and a textarea, comments:

[php]
// Strings that must not appear in single-line fields. The double quotes
// matter: "\n" is a real newline character, not a backslash followed by "n".
$singleLine = array("\n", "[url=", "href=");
$singleFields = array("name", "email");
// The comments field may contain newlines, so only link markers are checked.
$multiLine = array("[url=", "href=");

foreach ($singleLine as $needle) {
    foreach ($singleFields as $field) {
        if (strpos($_POST[$field], $needle) !== false) {
            echo 'There was a problem sending your email, please try again';
            exit();
        }
    }
}

foreach ($multiLine as $needle) {
    if (strpos($_POST['comments'], $needle) !== false) {
        echo 'There was a problem sending your email, please try again';
        exit();
    }
}
[/php]

It isn’t very sophisticated, but it seems to work. One other thing: checking for newline characters is good practice in general, especially for the email address. If you use PHP’s mail() function and put an unfiltered email address in the headers, you are opening yourself up to email injection problems (where spammers use newline characters to add cc: and bcc: recipients to your email).
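As a rough illustration of the header-injection point, here is a minimal sketch of a check you could run on an address before it goes anywhere near a mail header. The function name safe_header_email is made up for this example and is not part of the form code above; filter_var() with FILTER_VALIDATE_EMAIL requires PHP 5.2 or later.

```php
<?php
// Hypothetical helper (not from the original form code): returns the address
// if it looks safe to place in a mail header, or null if it does not.
function safe_header_email($value)
{
    // Reject carriage returns and line feeds, which could start a new header line.
    if (preg_match('/[\r\n]/', $value)) {
        return null;
    }
    // filter_var() returns the address on success and false on failure.
    $email = filter_var(trim($value), FILTER_VALIDATE_EMAIL);
    return $email === false ? null : $email;
}

// The second value tries to smuggle in a Bcc header and is rejected.
var_dump(safe_header_email('jane@example.com'));
var_dump(safe_header_email("jane@example.com\r\nBcc: victim@example.org"));
```

Checking for the line breaks first keeps the intent obvious, even though the email validation would most likely reject such a value on its own.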

Happy spam blocking!

Posted In: PHP | 2 comments

Regular Expressions, PHP and Newline Characters

Ahhhh, regular expressions. They are so handy, but can be such a pain in the ass to use. While coding up a basic script to do a small amount of screen scraping, I remembered a problem I encountered a couple of years ago; one that I was unable to solve at the time. It involves using regular expressions to match data in a string with newline characters. For the uninitiated, newline characters ("\n" on *NIX) create multiple lines, i.e., the newline character tells the browser or software program to begin a new line. In PHP, you can use echo "\n" to create a new line in the output (it is not visible in the rendered page, only when viewing source), which can be handy when you are iterating through an array and spewing out lots of data to the screen.

Back to the problem at hand! If you are using PHP's PCRE functions (Perl Compatible Regular Expressions, e.g. preg_match) to match text, you need to realize that by default the period in a pattern does not match newline characters. A pattern like .* will therefore only match within a single line, even if you pass in a string that contains many lines.
<?php
$string = "<div>\n<b>This is the second line</b>\n</div>";
echo $string;
?>

The above code shows this. If you look at your browser, you will see the bold text. If you view source, you will see three separate lines.
<?php
$string = "<div>\n<b>This is the second line</b>\n</div>";
preg_match('|<div>.*</div>|', $string, $matches);
print_r($matches);
?>


If we try to match the entire div like the code above, it fails and outputs an empty array. This is because the .* cannot get past the newline at the end of the first line, and therefore never reaches the closing div on the third line.
<?php
$string = "<div>\n<b>This is the second line</b>\n</div>";
preg_match('|<div>.*</div>|s', $string, $matches);
print_r($matches);
?>

The code above adds a trailing option (PCRE calls these pattern modifiers). There are a variety of modifiers available, but the "s" above (at the end of the pattern, after the closing delimiter) tells the regular expression to make periods match any character, including newline. Now, $matches[0] contains the code for the entire div.
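To make the difference concrete, here is a small sketch that runs the same string through three patterns. The "m" modifier is included only for contrast, since it is easy to confuse with "s": it changes where ^ and $ match, not what the period matches. The variable names are just for this example.

```php
<?php
$string = "<div>\n<b>This is the second line</b>\n</div>";

// No modifier: "." stops at the first newline, so the closing tag is never reached.
$plain = preg_match('|<div>.*</div>|', $string, $m1);

// "s" modifier: "." also matches newlines, so the whole div matches.
$dotall = preg_match('|<div>.*</div>|s', $string, $m2);

// "m" modifier: "^" and "$" match at the start and end of each line,
// so a pattern anchored to the second line can match.
$multiline = preg_match('|^<b>.*</b>$|m', $string, $m3);

// preg_match() returns 1 on a match and 0 on no match.
var_dump($plain, $dotall, $multiline); // int(0), int(1), int(1)
```

If the goal is to grab something that spans lines, "s" is the one you want; "m" only helps when the pattern is anchored to individual lines.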

Posted In: PHP, regular expressions | 18 comments

JavaScript, Tables & InsertBefore

I was trying to dynamically add rows to a table and ran into a problem with both Firefox and Internet Explorer. I had a table with a couple rows of data and there was a link to add a new row at the bottom of the table (in its own row).



The JavaScript simply cloned the first row, cleaned out some values from the tds and inserted the new row into the table before the last row (i.e., the inserted row would be the second to last row). That's where the problems arose. In Firefox, table.appendChild(newRow) functioned fine, but when I tried table.insertBefore(newRow,lastRow) it threw a DOMException saying it could not replace the child node because it didn't exist. In Internet Explorer, I could use appendChild or insertBefore. It turns out you need to append or insert to the tbody element instead of the table element, because the rows are children of the tbody, not of the table itself. A real basic example:

<html>
<body>
<table>
<tbody id="useThis">
<tr id="row1"><td>Row One</td></tr>
<tr id="row2"><td>Row Two</td></tr>
</tbody>
</table>
<script type="text/javascript">
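//clone row1 (pass the boolean true for a deep copy, not the string 'true')
var newRow = document.getElementById('row1').cloneNode(true);
//drop the copied id so the page does not end up with two "row1" elements
newRow.removeAttribute('id');
//get row2
var row2 = document.getElementById('row2');
//insert the cloned row before row2, using the tbody as the parent
document.getElementById('useThis').insertBefore(newRow,row2);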
</script>
</body>
</html>

Posted In: JavaScript, firefox, Internet Explorer | 6 comments

Zend Framework 0.2, Incubator, RewriteRouter and FrontController

After time away from the Zend Framework, I dove back in yesterday to play around with the 0.2 version. I decided to use the incubator, as there are some fairly significant changes in the works, and I wanted to see what was in store for later releases. By setting your include path to check the incubator directory first in your index.php file, you can make use of the newer classes:
[php]
set_include_path('/__shared/incubator/library' . PATH_SEPARATOR . '/__shared/lib');
[/php]



I copied over the general structure from another site that was using v0.1.5 and immediately ran into problems with the controller. Using the incubator means I can't rely on the manual, so I searched through some emails from the Zend mailing list and found some new code instantiating the FrontController and RewriteRouter (FYI - the emails are archived, and this is an extremely useful resource to search if you are having trouble and can't find the answers in the manual).

I hadn't used the RewriteRouter before and I must say it is nice. Before, I was employing hackish code to simplify my URLs, like:
[php]
//for url www.example.com/article/view/article-about-zend
$params = $this->_getAllParams();
$url = key($params); //$url is now 'article-about-zend'
[/php]

Now, I can use the RewriteRouter in index.php to simplify my urls and code:
[php]
//for url www.example.com/article/article-about-zend
$router = new Zend_Controller_RewriteRouter();
$article = new Zend_Controller_Router_Route(
"article/:url",
array( "url" => null,
"controller"=>"article",
"action"=>"view"
)
);
$router->addRoute("article",$article);
[/php]

And in my controller method (article controller, view method per the code above), grabbing the params is easy, because the ":url" sets it up:

[php]
$url = $this->_getParam('url');
[/php]

I still have some questions about the RewriteRouter, but I'm sure I'll figure them out as I continue to use it. There are a bunch of other neat classes I am playing with right now, like the caching and logging. I am really impressed with the framework so far and look forward to using other parts of it!

Posted In: PHP | No Comments

Google Analytics & Secure Connections with HTTPS

I ran into a small problem with Google Analytics and secure connections. I worked with a company to get a secure certificate installed and noticed that the Google Analytics tracking code placed on the pages was calling a script from http and not https. This raised an alert in Internet Explorer asking if you “want to display non-secure items in the page”. Firefox also had a red icon, indicating everything was not secure on the page.



A quick search on Google turned up nothing useful, so I visited Google Analytics and quickly found the answer to my problem - http://www.google.com/support/analytics. The instructions are straightforward and you can place the code on non-secure and secure pages. Looking at the new tracking code and both JavaScript files on Google’s server, it looks like the only difference is that the new tracking code calls the script over https. This leads me to believe you could update only the secure pages on your site with the new tracking code and be fine.

Posted In: firefox, Internet Explorer, Google Analytics | No Comments

Gmail Not Working

I have been having problems accessing my Gmail account this morning. Mostly, I haven't been able to bring up the webpage to log in. I did log in and send a couple of emails before it went down again. I have another email account, so it isn't that big of a deal. In fact, I kind of like it in a perverse sort of way - knowing that the omnipresent Google has problems with their website and web applications. Although I don't think a client would accept that as an excuse if one of my applications goes down...



One area of concern is how much of your information is stored with Google, or any other company. As more and more applications and data move to the Internet, service outages become a big deal. Not being able to make a purchase from Amazon for a couple of hours typically doesn't halt your day. But not having access to email, your calendar and contacts can make life difficult. Even though Gmail is a free service, people expect it to work and rely on it for communication.

Update: Gmail is working fine and now other sites are not coming up. I guess it must be my internet connection or something else strange. I can't ping a couple of sites or bring them up in either IE or Firefox, although the vast majority of other sites come up without a problem. I logged into a server in California and routed my web browser through that, and the sites are coming up fine. My girlfriend can see them at work in Seattle, so something is amiss with my connection and not the websites. If it persists, I'll have to call Speakeasy and find out what the problem is.

Posted In: Uncategorized | 4 comments

Fedora Core 6

Fedora Core 6 was released the other day and it seems like a lot of people are downloading it. At least that is the impression I get, as the Fedora site was down for a while and has now been up in an altered state for the last couple of days.

Some cool new things in 6 (like the completely unproductive yet incredibly cool Compiz - see here on Ubuntu), but I’m not sure if I will upgrade just yet. I have to buy a server for my office though, and when that happens, I will purchase the disk. For some reason I find it easier to just buy the CDs rather than downloading the image. My desktop doesn’t have a DVD player, so I needed CDs the last time. And I feel somewhat good about buying it, since a portion of the money goes to the Linux causes. It ain’t cheap for the companies that provide the bandwidth to download the image.

Posted In: Fedora Core 6 | No Comments

Testing for IE7

With IE7 about to be pushed out to millions of computers, I realized I need to test some sites and make sure everything looks okay. But I don't want to lose IE6 in the process. I found a standalone version of IE7 RC1. For some reason, I am not too worried about Firefox 2.0. They haven't made huge changes in CSS, which is what scares me about IE7. Surprisingly, all the sites I have built recently looked fine in IE7. I haven't employed any IE-specific hacks, mostly just adding extra divs or other unnecessary markup.



While researching this, I also stumbled across IEs4Linux. I haven't had a chance to install it yet, but will in the next couple of weeks. Now, if the developer adds IE7 to the list, that is one less reason for me to fire up my Windows box.

Posted In: firefox, Internet Explorer | No Comments

Syncing Thunderbird Contacts and Calendar Across Multiple Computers

I found a web page today that talks about syncing your contacts and calendar (via the Lightning extension) through the use of extra folders on an IMAP email account. I installed the Lightning extension and the SyncKolab extension and tried to get going but ran into the same error on both Windows and Linux - Mailaccount “not found”. Lightning is a fairly new extension and the SyncKolab page says it was just made to work with Lightning, so apparently there is more work to be done...



I guess I'll look at Google Calendar, although I still need to figure out a way to sync my contacts. Maybe I should just install SugarCRM and use one of the extensions for Thunderbird. Probably overkill for my needs, but it might be the best option.

Posted In: Uncategorized | 1 comment

301 Redirects with Apache .htaccess file

I've just spent more time than I care to in the last couple of days trying to figure out rewriting urls with Apache, both through .htaccess files and directly in the conf file. Today was relatively painless, as I just had to come up with a way to 301 redirect all “extra domains” and some specific pages. For good measure, I also wanted to redirect non “www” domains to the canonical domain and also allow for future, yet unknown domains to be redirected. Here is what I came up with for an .htaccess file:



RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [R=301]
redirect 301 /log.php http://www.yourdomain.com/fog.php

Pretty simple: the RewriteCond looks for any domain other than the one specified and, upon matching, invokes the RewriteRule. The RewriteRule simply uses a 301 redirect to the domain. The last line just redirects to a specific page. Looking at the output of Live HTTP Headers for Firefox, I noticed that it issues two 301 redirects if someone were to type http://www.extradomain.com/log.php. One redirect to get the proper domain, http://www.yourdomain.com/log.php, and then a second redirect for the actual page. Not a big deal, but I thought it would handle it all in one redirect since I wasn't using the L flag. I know, I have a lot to learn when it comes to mod_rewrite...

On a side note, why is it soooo difficult to get 301 redirects in place from other people? Domain registrars seem to only use 302 redirects, and whenever I request a 301 redirect for a domain from another company, I either get silly responses like, “You can use the PHP header function to do a 301 redirect” (these are sites with lots of individual html and php files, not an app running with a front controller) or it ends up being a 302 redirect.
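For reference, the PHP approach from those responses would look roughly like the sketch below. It mirrors the host check from the .htaccess file, but it would have to be included at the top of every individual PHP file (and does nothing for plain html files), which is exactly why it is a poor fit for this kind of site. The function name canonical_redirect_url is made up for this example, and www.yourdomain.com is the same placeholder used above.

```php
<?php
// Hypothetical helper: returns the URL to redirect to, or null if the
// request already uses the canonical host (compared case-insensitively).
function canonical_redirect_url($host, $requestUri, $canonicalHost = 'www.yourdomain.com')
{
    if (strcasecmp($host, $canonicalHost) === 0) {
        return null;
    }
    return 'http://' . $canonicalHost . $requestUri;
}

// Only act when running under a web server, where these values are set.
if (isset($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI'])) {
    $target = canonical_redirect_url($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI']);
    if ($target !== null) {
        // The third argument sets the status code; without it, a Location
        // header makes PHP send a 302.
        header('Location: ' . $target, true, 301);
        exit;
    }
}
```

Leaving out that third argument is an easy way to end up with a 302 instead of the 301 you asked for.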


While I am complaining, I might as well mention the bad behavior I experienced while cooking up some redirects. On my Fedora Core 5 desktop, I edited my /etc/hosts to include the domains and then set up a virtual host with aliases for the domains on my Apache server. This allowed me to use my browser and test the redirects locally with Live HTTP Headers. It seemed that the redirects would occasionally get cached somehow. I am not sure whether it was my browser or the server, but I was changing the .htaccess file and the changes would not take effect. If I cleared my browser cache/cookies AND restarted Apache, things seemed to clear up. But I thought Apache reads the .htaccess file on every request, and thus changes should take effect instantly? If it was Firefox, then why didn't clearing the cache/personal data have any reliable effect? Maybe it was my desktop? Who knows...

Posted In: Linux, Fedora Core 5, Apache | 2 comments