In my last post, I refreshed your memory about how whitelists are a great way to protect very focused Java web applications like financial systems. They’re not so great if you need to allow a wide range of input types, maybe even to the point of allowing customers to enter HTML tags.
So what about PHP applications?
If a whitelist won’t do, you could write a module to filter all possible types of input, but you’d need to:
- Identify all safe tags you want to include (to maintain the spirit of a whitelist), or abandon that principle and try to screen out only the malicious code, which is almost impossible.
- Write code to enforce the whitelist or blacklist that you built.
- Find sources of information about new vulnerabilities, like the Common Vulnerabilities and Exposures (CVE) list, and monitor them on a regular basis.
- Update your code as needed to protect against new threats.
In other words, you could do it, but a) your users would hate you for constraining them or b) you’d be forever tweaking it. We need a tool that’s flexible and easily updated by someone who’s focused on that effort.
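To see why a hand-rolled filter is harder than it looks, here’s a minimal sketch that uses PHP’s built-in strip_tags() as a naive tag whitelist. The function name naiveWhitelist and the list of allowed tags are my own inventions for illustration, and I’m assuming PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: a naive "whitelist" built on PHP's strip_tags().
// The allowed-tag list is an assumption chosen for illustration only.
function naiveWhitelist(string $html): string
{
    // Keep <b>, <i>, and <p>; strip every other tag.
    return strip_tags($html, '<b><i><p>');
}

// Disallowed tags are removed, but the text inside them is kept.
echo naiveWhitelist("<script>alert('Owned!');</script>"), "\n";

// Allowed tags keep ALL of their attributes, including event handlers.
echo naiveWhitelist('<b onmouseover="alert(1)">Hi there!</b>'), "\n";
```

The second call is the problem: the tag is on the whitelist, so its onmouseover attribute comes through untouched. Closing that gap means filtering attributes too, and that’s exactly the kind of forever-tweaking I’d rather hand off.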
We’re lucky. There’s already a great, well-maintained product that’ll do this for our PHP applications. It’s called HTMLPurifier.
How to Get Started with HTMLPurifier
The HTMLPurifier project’s downloads page offers three distinct distributions. I picked the Standalone Distribution, because from a coding perspective, it’s a little easier to implement. That’s not to say that the Standard Distribution or the Lite Distribution are a pain to implement. They aren’t! But the Standalone Distribution packages what you’ll need in a single PHP file (mostly!). I like simple implementations.
Speaking of implementation, the HTMLPurifier download page includes easy-to-follow installation instructions. The short version is: I downloaded the file onto my Linux server. I used Red Hat Enterprise Linux 7.2 this time around, but I’ve used HTMLPurifier under other Linux distributions, too, like Linux Mint and Ubuntu Server. Once downloaded, I extracted the files in my Downloads folder from a terminal session:
cd Downloads
tar -zxvf htmlpurifier-4.8.0-standalone.tar.gz
My web directory is /var/www/html. I wanted to copy HTMLPurifier to my web directory, so I ran:
sudo mkdir /var/www/html/htmlpurifier
sudo cp -r htmlpurifier-4.8.0-standalone/* /var/www/html/htmlpurifier
The installation instructions mention changing some default permissions, so I ran:
sudo chmod -R 0755 /var/www/html/htmlpurifier/standalone/HTMLPurifier/DefinitionCache/Serializer
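If you want to confirm that the permissions change did what you expected, a quick check like the one below can help. It’s only a sketch: the function name cacheDirIsUsable is my own, the path is copied from the chmod command above, and I’m assuming PHP 7 or later.

```php
<?php
// Hypothetical helper: reports whether PHP can use a directory as a cache.
function cacheDirIsUsable(string $dir): bool
{
    // The directory must exist and be writable by the user running PHP.
    return is_dir($dir) && is_writable($dir);
}

// Path taken from the chmod command above; adjust it if you installed elsewhere.
$serializerDir = '/var/www/html/htmlpurifier/standalone/HTMLPurifier/DefinitionCache/Serializer';

echo cacheDirIsUsable($serializerDir)
    ? "Cache directory is writable\n"
    : "Cache directory is missing or not writable by this user\n";
```

Keep in mind that the answer depends on who runs the script. From the command line it checks your own account; loaded through the web server, it checks the web server’s account, which is the one that matters here.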
At this point, unless a typo stops me, I’m ready to start using HTMLPurifier.
HTMLPurifier in Action
I’ll use an example similar to the one I built for the previous blog post. I’ll use an HTML form to prompt for two fields. We’ll use HTMLPurifier on one and just pass through the other. This probably looks familiar to you:
<form name='input_03' method='POST' action='/asfg_protectinginput_php/input_03_results.php'>
  <table width='50%'>
    <tr>
      <td width="25%">Enter HTML to be scrubbed:</td>
      <td width="75%">
        <input type="text" name="scrubbedyes"/>
      </td>
    </tr>
    <tr>
      <td width="25%">Enter HTML <b>not</b> to be scrubbed:</td>
      <td width="75%">
        <input type="text" name="scrubbedno"/>
      </td>
    </tr>
    <tr>
      <td width="25%"></td>
      <td width="75%">
        <input type="submit" value="Test it!"/>
      </td>
    </tr>
  </table>
</form>
Notice that the form above posts to a module called input_03_results.php. Here’s some of the important code:
include('htmlpurifier/HTMLPurifier.standalone.php');
$strScrubbedyes = $_POST['scrubbedyes'];
$strScrubbedno = $_POST['scrubbedno'];
$purifier = new HTMLPurifier();
$strCleanerHTML = $purifier->purify($strScrubbedyes);
Remember how I chose the Standalone Distribution? I did so because no matter how I want to use HTMLPurifier, I just need that one include statement (for HTMLPurifier.standalone.php). That’s easy!
The next two lines store the HTML form fields to PHP variables.
The two lines after that show HTMLPurifier in action. I create the new HTMLPurifier() object, then invoke its purify method and pass in the variable $strScrubbedyes. That’s really all there is to using HTMLPurifier. No fuss, no muss!
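The defaults worked fine for my example, but if you want to keep the spirit of a whitelist, HTMLPurifier also accepts a configuration object. Here’s a minimal sketch of that idea. The function name buildPurifier and the particular tag list are my own assumptions, not something from the installation instructions; HTML.Allowed is the configuration directive that limits output to the elements and attributes you list. I’m also assuming PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: builds a purifier that only lets a few tags through.
// $libraryPath is wherever HTMLPurifier.standalone.php lives on your server.
function buildPurifier(string $libraryPath): HTMLPurifier
{
    require_once $libraryPath;

    // Start from the library defaults, then narrow the allowed markup.
    $config = HTMLPurifier_Config::createDefault();

    // Assumed whitelist for illustration: basic formatting plus links.
    $config->set('HTML.Allowed', 'b,i,em,strong,p,a[href]');

    return new HTMLPurifier($config);
}

// Usage, assuming the same include path as the example above:
// $purifier = buildPurifier('htmlpurifier/HTMLPurifier.standalone.php');
// $strCleanerHTML = $purifier->purify($strScrubbedyes);
```

I wrapped the setup in a function so the whitelist lives in one place. Whether you need it at all depends on how much markup you want your customers to use.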
Later in input_03_results.php, I display the results:
<table width='50%'>
  <tr>
    <td width="25%">Here's the scrubbed HTML:</td>
    <td width="75%">
      <?php echo $strCleanerHTML; ?>
    </td>
  </tr>
  <tr>
    <td width="25%">Here's the HTML that was <b>not</b> scrubbed:</td>
    <td width="75%">
      <?php echo $strScrubbedno; ?>
    </td>
  </tr>
</table>
If I enter something like “<b>Hi there!</b>” in both fields on input_03.php, input_03_results.php will display Hi there! in bold for both the scrubbed and unscrubbed fields. However, if I try to enter nefarious code like “<SCRIPT>alert('Owned!');</SCRIPT>”, the scrubbed HTML will be blank. If you’re running this with Chrome or macOS Safari, the unscrubbed field will also look empty, but if you use “Show Page Source,” you’ll see that the script code actually got through. And if you use Firefox, you’ll actually see the alert dialog box. That’s because, at the time of writing, both Chrome and Safari have implemented some anti-Cross-Site Scripting (XSS) protections.
That’s an interesting tangent: if browsers are going to implement XSS protections, why should you? Putting aside for a moment the fact that Firefox and Microsoft Edge don’t implement them, that’s not a bad question. From the perspective of security, there is an answer: Defense in Depth. XSS is a seriously dangerous attack type. Even if some browsers offer some level of protection, there are still some drawbacks:
- Not all browsers offer protection.
- Even if they did, it’s difficult to quantify the level of protection, and it could vary by browser.
- That gray area could hurt your customers, and your reputation.
Methods to protect your code (like HTMLPurifier) still need to be tested and quantified, but once you establish a baseline, you can control it to some extent by not deploying upgrades until you can test them. Solutions like HTMLPurifier offer robust functionality, are lightweight, and don’t impede user functionality, so there’s not much of a cost for providing the extra protection. So, the question really is, why wouldn’t you use something like HTMLPurifier?
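In the same Defense in Depth spirit, here’s one more inexpensive layer for fields that should never contain markup at all, like the unscrubbed field in my example: escape the value on output with PHP’s built-in htmlspecialchars(). This is a sketch rather than a replacement for HTMLPurifier; the function name escapeForHtml is my own, and I’m assuming PHP 7 or later and UTF-8 pages.

```php
<?php
// Hypothetical helper: escape a value before echoing it into an HTML page.
function escapeForHtml(string $value): string
{
    // ENT_QUOTES converts single and double quotes as well as <, >, and &.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// The browser shows this as harmless text instead of running it.
echo escapeForHtml("<SCRIPT>alert('Owned!');</SCRIPT>"), "\n";
```

The trade-off is that escaping displays every tag as literal text, so it only suits fields where no HTML is wanted. For fields where customers are allowed to enter HTML, purifying is still the tool for the job.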
We’re lucky to have tools like HTMLPurifier that are available and well-maintained. Back in the Old Days (you know — when the Internet was young, and people actually used Internet Explorer*), we’d be on our own. We’d have to code something like that from scratch, and there was a lot less how-to information around. We’d end up annoying customers and not writing effective code. We truly live in a golden age!
Of course, as developers, that means that we don’t have any excuse not to protect our web applications’ input! So, what are you waiting for?
Are you using HTMLPurifier or similar? What’s your experience? Are you not using anything? Why not? Let me know in the comments!
* I am not ashamed to say that I still have my free Internet Explorer 3.0 t-shirt from Microsoft. The back says, “I DOWNLOADED. MIDNIGHT, AUGUST 13, 1996.” The Midnight Madness logo on the front still glows in the dark! Ahhh, good times. Good times.
by Terrance A. Crow
Terrance has been writing professionally since the late 1990s — yes, he’s been writing since the last century! Though he started writing about programming techniques and security for Lotus Notes Domino, he went on to write about Microsoft technologies like SQL Server, ActiveX Data Objects, and C#. He now focuses on application security for professional developers because… Well, you’ve watched the news. You know why!