
Protecting Input: Implementing Whitelists in PHP


In my last post, I talked about implementing whitelists in Java. We finally got to see some actual code! Not actual cannibal SHIA LABEOUF level stuff, but kinda cool nonetheless. As promised, in this post, I’ll show you how to implement the same whitelist with the same philosophical underpinnings in PHP.

But first…

Why do we care about whitelisting again? Yeah, we all know sanitizing input’s a good idea. It’s become a mantra. But let’s say we want to save time and skip it. What could possibly go wrong?

Characters Have Meaning

Let’s say you have a database application, and let’s say you’re feeling pretty good because you’ve implemented something called parameterized queries on the database. Why? Because you know that parameterized queries protect your application against SQL injection (we’ll talk about that in a future post). Yay you! Your database has an important layer of protection!

Your database application lets users contribute episode guides for science fiction series. Other users can search for information about the series, and you’ve built a nice little community. You’re starting to generate ad revenue!

Then, one day, Joe Evil logs into your site.

You know JavaScript, or you at least know that it often runs in browsers, right? Consider a simple code snippet like this:

<SCRIPT type="text/javascript">
alert('Hi there!');
</SCRIPT>


This is a little bit of code that will display a pop-up with the message, “Hi there!” That’s harmless, right?

Don't let Joe Evil destroy your reputation. Implement whitelists to protect yourself!

But remember, Joe Evil is evil, so he decides to mess with you and your community. For an episode title, he enters this string, and you have no whitelist:

<SCRIPT type="text/javascript">alert('Hi there!');</SCRIPT>

Except, he doesn’t stop with the JavaScript alert() function. He writes an entire snippet of code. The code could do something like steal cookie information for your domain; it could mine your data and send it to another website in a country that does not have an extradition treaty with the US (i.e., a country not on this list). If your application doesn’t have a whitelist and allows in the characters that define JavaScript code, your application might become a server of malware.

That would put serious downward pressure on your community’s trust! And your ad revenue.

Are there other ways to protect yourself? Sure! We could carefully consider the target output environment and encode our output appropriately. We can use other tools to go beyond whitelisting to intelligently accept some HTML statements and reject others. We’ll talk about these options in more detail later. But security is best in layers, and the first layer can be something like a whitelist.
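As a small taste of the output-encoding layer, here’s a minimal sketch. It assumes the episode title is being written into an HTML page as plain text. The helper name encodeForHtml is made up for this example; htmlspecialchars is PHP’s built-in function.

```php
<?php
// Hypothetical helper: encode a value before writing it into HTML text.
function encodeForHtml(string $strValue): string
{
    // ENT_QUOTES encodes both single and double quotes.
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of
    // returning an empty string.
    return htmlspecialchars($strValue, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Joe Evil's episode title
$strTitle = "<SCRIPT type=\"text/javascript\">alert('Hi there!');</SCRIPT>";

// The angle brackets and quotes come out as HTML entities, so a browser
// displays the text instead of running it.
$strSafe = encodeForHtml($strTitle);
```

Encoding like this depends on where the value ends up (HTML text, an attribute, a URL, JavaScript), which is why it complements a whitelist rather than replacing it.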

Why do you implement a whitelist?

To help build your customers’ acceptance and trust. To make sure only the kind of data you approve of gets into your application. To delight your customers and attract new customers.

It’s easy to lose a customer over an incident like this. It’s a lot easier to prevent these incidents so your customers don’t leave.

Whitelist in PHP

Make sure your application sees no evil. Use whitelists!

The considerations for implementing a whitelist in PHP, like the kinds of suitable applications and the whitelist’s anatomy, are the same as they were for the Java application. The implementation’s a bit different, though.

We’ll start with an HTML form within a PHP file. That form has the following fields:

  1. transfer_description: A text-only description of the money transfer
  2. transfer_amount: A number-only field to hold the amount of the transfer
  3. the_submitter: The submit button
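For reference, here’s a rough sketch of what such a form might look like. The helper name renderTransferForm and the action URL process.php are assumptions for illustration; only the three field names come from the list above.

```php
<?php
// Hypothetical helper that returns the form markup.
// The action URL passed in below ("process.php") is an assumption.
function renderTransferForm(string $strAction): string
{
    $strSafeAction = htmlspecialchars($strAction, ENT_QUOTES, 'UTF-8');

    return <<<HTML
<form method="post" action="{$strSafeAction}">
  <input type="text" name="transfer_description" maxlength="20">
  <input type="text" name="transfer_amount" maxlength="20">
  <input type="submit" name="the_submitter" value="Submit">
</form>
HTML;
}

$strForm = renderTransferForm('process.php');
```

The maxlength attributes are only a convenience for honest users. Joe Evil can post whatever he likes without using the form at all, so the server-side whitelist still has to check everything.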

We’ll submit the form to a PHP file that includes this code:

$arrWhiteList = loadDefinitionsFromTable("input-01-php~index");

The application’s called input-01-php, and the form in question is named index.

The loadDefinitionsFromTable function contains code like this:

function loadDefinitionsFromTable($strNameOfAppAndForm)
{
  $arrWhiteList = array();
  switch ($strNameOfAppAndForm) {
    case "input-01-php~index":
      // Rules for the text-only description field
      $arrRow1 = array();
      $arrRow1['fieldName'] = "transfer_description";
      $arrRow1['minLength'] = "0";
      $arrRow1['maxLength'] = "20";
      // Notice that this includes a blank
      $arrRow1['charsAllowed'] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ";
      $arrRow1['nullsAllowed'] = "No";
      $arrWhiteList[] = $arrRow1;
      // Rules for the number-only amount field
      $arrRow2 = array();
      $arrRow2['fieldName'] = "transfer_amount";
      $arrRow2['minLength'] = "0";
      $arrRow2['maxLength'] = "20";
      $arrRow2['charsAllowed'] = "0123456789,.";
      $arrRow2['nullsAllowed'] = "No";
      $arrWhiteList[] = $arrRow2;
      // Rules for the submit button
      $arrRow3 = array();
      $arrRow3['fieldName'] = "the_submitter";
      $arrRow3['minLength'] = "0";
      $arrRow3['maxLength'] = "6";
      $arrRow3['charsAllowed'] = "Submit,.";
      $arrRow3['nullsAllowed'] = "No";
      $arrWhiteList[] = $arrRow3;
      break;
  }
  return $arrWhiteList;
}

This code snippet builds an array of arrays. The outermost array makes up all of the rules for the index form. Each individual array (there are three in this example) contains a field definition. For example, the transfer_description field can be up to 20 characters long and can contain Latin alphabet characters plus a space.
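If the nesting is hard to picture, here’s the same structure sketched with array literals (short array syntax, PHP 5.4 and later). The $arrSummary variable is something I made up for this example to show how you’d walk the outer array and read each field’s rules.

```php
<?php
// The same three field definitions, written as array literals.
$arrWhiteList = [
    [
        'fieldName'    => 'transfer_description',
        'minLength'    => '0',
        'maxLength'    => '20',
        'charsAllowed' => 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ',
        'nullsAllowed' => 'No',
    ],
    [
        'fieldName'    => 'transfer_amount',
        'minLength'    => '0',
        'maxLength'    => '20',
        'charsAllowed' => '0123456789,.',
        'nullsAllowed' => 'No',
    ],
    [
        'fieldName'    => 'the_submitter',
        'minLength'    => '0',
        'maxLength'    => '6',
        'charsAllowed' => 'Submit,.',
        'nullsAllowed' => 'No',
    ],
];

// Outer array: one entry per field. Inner array: that field's rules.
// Here we map each field name to its maximum length.
$arrSummary = [];
foreach ($arrWhiteList as $arrRow) {
    $arrSummary[$arrRow['fieldName']] = (int) $arrRow['maxLength'];
}
```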

When we want to check the fields, we execute this code:

$booReturn = isIncomingFormValid($arrWhiteList, $_POST);

We take the returned value from loadDefinitionsFromTable and pass it, along with the POST data (contained in the array $_POST), into the function isIncomingFormValid. Its first bit of code looks like this:

function isIncomingFormValid(array $arrWhiteList, array $arrPost)
{
  $booReturn = true;
  foreach ($arrPost as $key => $value) {
    // The per-field checks go here

Inside the function, the $_POST data goes by the name $arrPost. The foreach statement visits each submitted field in turn: the field name lands in $key and its value in $value.

First, we have to make sure that the field even exists in the whitelist. An unexpected field might be part of an attack, so we only want to process allowed fields. To do that, we build this function:
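To see what the loop hands us on each pass, here’s a tiny sketch. The sample values and the $arrSeen variable are made up for illustration; the array stands in for $_POST so the loop can run outside a web request.

```php
<?php
// Stand-in for $_POST.
$arrPost = [
    'transfer_description' => 'Rent for March',
    'transfer_amount'      => '1,250.00',
    'the_submitter'        => 'Submit',
];

$arrSeen = [];
foreach ($arrPost as $key => $value) {
    // $key is the field name; $value is what the user submitted.
    $arrSeen[] = $key . ' has ' . strlen($value) . ' characters';
}
```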

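function find_the_field($strKey, $arrWhiteList)
{
  // Walk the whitelist and return the definition whose name matches
  for ($x = 0; $x < count($arrWhiteList); $x++) {
    $arrRow = $arrWhiteList[$x];
    if ($arrRow['fieldName'] === $strKey) {
      return $arrRow;
    }
  }

  // No definition matched, so the field isn't allowed
  return false;
}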

Then we invoke it with this code:

$arrWhiteListField = find_the_field($key, $arrWhiteList);
if ($arrWhiteListField === false) {
  // If the field's not allowed, reject the form
  return false;
}

If find_the_field returns false, the field doesn’t exist in the whitelist, so we return false and reject the whole form.
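If you’d rather not loop through the whitelist for every incoming field, one possible alternative is to index the definitions by field name once and then look them up by key. This is a sketch of a different technique from the one in this post: the helper indexWhiteListByField is hypothetical, and array_column is a PHP built-in (5.5 and later).

```php
<?php
// Hypothetical alternative to find_the_field(): build a lookup table
// keyed by field name.
function indexWhiteListByField(array $arrWhiteList): array
{
    // With null as the column key, array_column keeps each whole row
    // and uses the row's 'fieldName' value as the array key.
    return array_column($arrWhiteList, null, 'fieldName');
}

// Trimmed-down definitions, just enough for the lookup.
$arrWhiteList = [
    ['fieldName' => 'transfer_description', 'maxLength' => '20'],
    ['fieldName' => 'transfer_amount', 'maxLength' => '20'],
];

$arrByField = indexWhiteListByField($arrWhiteList);

// A field is allowed only if it has an entry in the lookup table.
$booKnown   = array_key_exists('transfer_amount', $arrByField);
$booUnknown = array_key_exists('is_admin', $arrByField);
```

For a three-field form the difference is negligible; the lookup table mostly pays off for large forms.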

If we’re confident the field should exist, we load the whitelist definition, then check the conditions against the data:

$strMinLength = $arrWhiteListField['minLength'];
$strMaxLength = $arrWhiteListField['maxLength'];
$strCharsAllowed = $arrWhiteListField['charsAllowed'];

if (strlen($value) > $strMaxLength) {
  // If the value's too long, reject it
  error_log("Field " . $key . " was too long.");
  return false;
}

if ($strMinLength > "0") {
  if (strlen($value) < $strMinLength) {
    // If there's a value configured for minimum length and the
    // value's too short, reject it
    error_log("Field " . $key . " was too short. Minimum length is " . $strMinLength);
    return false;
  }
}

for ($x = 0; $x < strlen($value); $x++) {
  // strpos returns false when the character isn't in the allowed list
  if (strpos($strCharsAllowed, substr($value, $x, 1)) === false) {
    error_log("Field " . $key . " detected an illegal string.");
    return false;
  }
}

The code makes sure the value’s not too long. If it’s not too long and there’s a minimum length specified, the code checks to be sure the value is long enough. Finally, the code loops through the value, one character at a time, and makes sure that each character is in the whitelist. Only if the value meets all of those conditions will the function return a true, meaning the value’s okay to process. Otherwise? Too bad, Joe Evil. Today is not your day!
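The character-by-character loop can also be expressed with PHP’s built-in strspn function. Here’s a minimal sketch; the helper name containsOnlyAllowedChars is made up for this example, and like the loop, it works on bytes, so it assumes single-byte characters in the allowed list.

```php
<?php
// Hypothetical helper: true when every character of $strValue appears
// somewhere in $strCharsAllowed.
function containsOnlyAllowedChars(string $strValue, string $strCharsAllowed): bool
{
    // strspn returns the length of the leading run made up only of
    // allowed characters. If that run covers the whole string, nothing
    // illegal was found.
    return strspn($strValue, $strCharsAllowed) === strlen($strValue);
}

$strLetters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ';

$booAmountOk = containsOnlyAllowedChars('1,250.00', '0123456789,.');
$booScriptOk = containsOnlyAllowedChars("<SCRIPT>alert('Hi there!');</SCRIPT>", $strLetters);
```

Here $booAmountOk is true and $booScriptOk is false, because the angle bracket stops the run at the very first character.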

I’d like you to notice something: At no point do I echo the actual value to the error log. Some attacks target the error logging subsystem, so keeping possible injection attacks away from the log is important. I’m not aware of any such attacks in the wild right now, but a) there may be attacks out and about that I know nothing of and b) malice tech advances at the same rate as (and sometimes faster than) the general tech world, so who knows what tomorrow will bring?
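One way to keep that habit consistent is to build every log message through a single helper that never sees fit to include the raw value. This is only a sketch: the helper name buildRejectionMessage and the message format are assumptions, not code from the original application.

```php
<?php
// Hypothetical helper: describe a rejection without the raw value.
// $strFieldName should be the name from the whitelist definition,
// not an unchecked key from the request.
function buildRejectionMessage(string $strFieldName, string $strValue, string $strReason): string
{
    // Only the field name, the reason, and the value's length are
    // included. The value itself never reaches the message.
    return sprintf(
        'Field %s rejected (%s); length was %d.',
        $strFieldName,
        $strReason,
        strlen($strValue)
    );
}

$strEvil = "<SCRIPT>alert('x');</SCRIPT>";
$strMessage = buildRejectionMessage('transfer_description', $strEvil, 'illegal character');

// In the application, this is where the message would be logged:
// error_log($strMessage);
```

The length is usually enough to tell a typo from an attack when you’re reading the log later, and it can’t carry a payload.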

At this point, whether you’re a Java or a PHP developer, your arsenal includes a slightly eccentric whitelist program. Is it perfect? Good gravy, no! But under some circumstances, it can give you a lot of protection. In other circumstances, it can become one component in a more sophisticated defense.
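To pull the pieces from this post together, here’s a condensed sketch you can run from the command line. It follows the same steps (field lookup, length checks, character check) but leaves out the error logging to stay short, skips nullsAllowed because the checks in this post never read it, and adds one assumption of my own: a value that isn’t a string is rejected.

```php
<?php
function loadDefinitionsFromTable(string $strNameOfAppAndForm): array
{
    $arrWhiteList = [];
    switch ($strNameOfAppAndForm) {
        case 'input-01-php~index':
            $arrWhiteList[] = ['fieldName' => 'transfer_description', 'minLength' => '0', 'maxLength' => '20',
                'charsAllowed' => 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz '];
            $arrWhiteList[] = ['fieldName' => 'transfer_amount', 'minLength' => '0', 'maxLength' => '20',
                'charsAllowed' => '0123456789,.'];
            $arrWhiteList[] = ['fieldName' => 'the_submitter', 'minLength' => '0', 'maxLength' => '6',
                'charsAllowed' => 'Submit,.'];
            break;
    }
    return $arrWhiteList;
}

// Returns the matching definition, or false if the field isn't allowed.
function find_the_field(string $strKey, array $arrWhiteList)
{
    foreach ($arrWhiteList as $arrRow) {
        if ($arrRow['fieldName'] === $strKey) {
            return $arrRow;
        }
    }
    return false;
}

function isIncomingFormValid(array $arrWhiteList, array $arrPost): bool
{
    foreach ($arrPost as $key => $value) {
        $arrWhiteListField = find_the_field((string) $key, $arrWhiteList);
        if ($arrWhiteListField === false) {
            return false; // Unknown field: reject the form
        }
        if (!is_string($value)) {
            return false; // Assumption: arrays (field[]=x) are rejected
        }
        $intLength = strlen($value);
        if ($intLength > (int) $arrWhiteListField['maxLength']) {
            return false; // Too long
        }
        if ($intLength < (int) $arrWhiteListField['minLength']) {
            return false; // Too short
        }
        for ($x = 0; $x < $intLength; $x++) {
            if (strpos($arrWhiteListField['charsAllowed'], $value[$x]) === false) {
                return false; // Character not in the whitelist
            }
        }
    }
    return true;
}

$arrWhiteList = loadDefinitionsFromTable('input-01-php~index');

// A well-behaved submission
$booGood = isIncomingFormValid($arrWhiteList, [
    'transfer_description' => 'Rent for March',
    'transfer_amount'      => '1,250.00',
    'the_submitter'        => 'Submit',
]);

// Joe Evil's submission
$booEvil = isIncomingFormValid($arrWhiteList, [
    'transfer_description' => "<SCRIPT>alert('Hi there!');</SCRIPT>",
    'transfer_amount'      => '1,250.00',
    'the_submitter'        => 'Submit',
]);
```

With these inputs, $booGood comes back true and $booEvil comes back false.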

And sophisticated defenses are both cool and fun!

In the next post, I’ll discuss what such a defense might look like.

by Terrance A. Crow

Terrance has been writing professionally since the late 1990s — yes, he’s been writing since the last century! Though he started writing about programming techniques and security for Lotus Notes Domino, he went on to write about Microsoft technologies like SQL Server, ActiveX Data Objects, and C#. He now focuses on application security for professional developers because… Well, you’ve watched the news. You know why! 

Terrance A. Crow is the Senior Security Engineer at a global library services company. He holds a CISSP and has been writing applications since the days of dBASE III and Lotus 1-2-3 2.01.
