Interstell, Inc.

Security, Application Security, and Software Research

Protecting Input: Implement Whitelists


We all know how important it is to keep malicious content from getting into our websites. Many of the most common attacks the Open Web Application Security Project (OWASP) lists, like Injection, Cross-site Scripting, or Unvalidated Redirects and Forwards, are only possible if the application doesn’t disallow bad content. There’s more than one way to accomplish this, and I’ll present several over the next few blog posts. We’ll start with what sounds like the easiest: whitelists.

Implementing a whitelist means only allowing characters, or combinations of characters, that you know are safe. Sounds easy, right? It can be — if your application’s scope allows it!

Use whitelists to completely control what characters get through your UI -- and into your application.

Suitable Applications

Let’s say you’re working on a banking application. Its scope is narrow, and it only allows these types of data:

  1. Numbers to indicate amounts (0-9)
  2. Punctuation for numbers, like commas (1,200) and periods (1.97)
  3. Plain text, with no punctuation, for the strings that capture the reason for a transfer (A-Z, a-z, and spaces)

Implementing a whitelist for this kind of application would be pretty easy. Just make sure numeric fields only have 0-9, a period, or a comma. Make sure that strings only have A-Z, a-z, or a space. Before you discount this kind of application as being too simplistic for the real world, consider: there are a lot of financial applications out there, and they need protecting, too.
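As a minimal sketch of the two rules just described, the checks might look like the following. The class and method names (BankingWhitelistSketch, containsOnly, isValidAmount, isValidReason) are hypothetical and not part of the sample application.

```java
// Hypothetical helper, not from the sample application: checks banking-style
// fields against fixed sets of allowed characters.
final class BankingWhitelistSketch {
    // Digits plus the two punctuation marks allowed in amounts
    static final String AMOUNT_CHARS = "0123456789.,";
    // Upper- and lowercase letters plus a space for transfer reasons
    static final String REASON_CHARS =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ";

    // Returns true only if every character of input appears in allowed
    static boolean containsOnly(String input, String allowed) {
        if (input == null) {
            return false;
        }
        for (int i = 0; i < input.length(); i++) {
            if (allowed.indexOf(input.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    static boolean isValidAmount(String input) {
        return containsOnly(input, AMOUNT_CHARS);
    }

    static boolean isValidReason(String input) {
        return containsOnly(input, REASON_CHARS);
    }
}
```

Note that this only controls which characters get through. It does not check that an amount is well formed, so a string like "1,,2" would still pass and needs a separate format check.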

On the other hand, let’s say you’re writing an application that allows people to comment using some basic HTML and more robust strings. For example, you want to allow non-destructive HTML tags (like <b> to start bolding) and you want to allow contractions, possessives, and quotes (like apostrophes and double quotes). At first glance, you might be tempted to just add those characters (<, >, ‘, and “) to the whitelist.

That could be catastrophic.

Hyperbole aside, those characters can be very dangerous in the wrong combination. In this case, a whitelist alone can’t protect you. But a whitelist can still be part of the protection for this kind of application. I’ll go into more detail in future posts.

For now, let’s take a look at a basic whitelist.

Whitelist Anatomy

There’s an easy and very common way to implement a whitelist, and then there’s my paranoid way of implementing one. Can you guess which way I’m going to show you? Before I subject you to my paranoia, let me get one thing out of the way.

No matter what kind of input sanitization or protection you implement, build it into both the client side (e.g., in JavaScript in the browser) and the server side (the backend).

Some of you who have built secure applications in the past might be saying, “But why? The client is inherently unsafe, and validations there should only be considered a client convenience!”

This used to be my mindset, and it’s true you can be wildly successful following that approach. But then I read a thread on the OWASP Application Verification Standard mailing list. A developer asked if there was any value to client-side validation, and to my surprise, Jim Manico presented a genius observation. In part, it read:

Client side validation is a valuable intrusion detection technique for defense as well.

If you do both client side and server side validation, then how many server side validation errors should you get? None. And if you do, you’re under attack or someone is using an interceptor to mess with your app.

Early intel on attacks is your friend.

Simple, elegant, and brilliant. In short, genius! I’ve adopted this as my approach.
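To make the idea concrete, here is a minimal sketch of treating a server-side validation failure as a signal rather than a routine error. Everything in it is an assumption for illustration: the class TamperSignalSketch, the method acceptOnServer, and the failure counter are hypothetical, and a real application would feed this into whatever monitoring it already has.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.logging.Logger;

// Hypothetical sketch: if the browser already enforces the same rule, a
// failure on the server suggests the client-side check was bypassed.
final class TamperSignalSketch {
    private static final Logger LOG =
        Logger.getLogger(TamperSignalSketch.class.getName());

    // Running count of server-side failures, usable as an early-warning metric
    static final AtomicInteger serverSideFailures = new AtomicInteger();

    // fieldName is supplied by the application, not taken from the request
    static boolean acceptOnServer(String fieldName, String value,
                                  Predicate<String> serverRule) {
        boolean valid = value != null && serverRule.test(value);
        if (!valid) {
            serverSideFailures.incrementAndGet();
            // Log the field name only, never the rejected value itself
            LOG.warning("Server-side validation failed for field '"
                + fieldName + "'; possible tampering");
        }
        return valid;
    }
}
```

With matching rules on both sides, the counter should stay at zero during normal use, so any increase is worth a look.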


It might seem shocking (it seemed so to me!), but client-side validation can actually serve a legitimate security purpose!

Now, back to my paranoia. Conventional wisdom says that you can enforce whitelists by using regular expressions (regex). You take the input string and execute a regex against it. If the input string violates the regex, you consider the input string unsafe. For example, according to this StackExchange post, this regex string would only allow Roman alphabet characters:
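The pattern from that post is not reproduced here. As a stand-in, the sketch below assumes a pattern of the same kind, [A-Za-z]+, and shows how the conventional regex check is typically written in Java. The class name RegexWhitelistSketch and the method isLettersOnly are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the conventional regex-based whitelist check.
final class RegexWhitelistSketch {
    // Assumed pattern, not the one from the StackExchange post: one or more
    // unaccented Latin letters and nothing else
    private static final Pattern LETTERS_ONLY = Pattern.compile("[A-Za-z]+");

    static boolean isLettersOnly(String input) {
        // matches() requires the entire input to match the pattern,
        // so no ^ or $ anchors are needed
        return input != null && LETTERS_ONLY.matcher(input).matches();
    }
}
```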


I admit that this works in every situation I’ve seen. It works in PHP and it works in Java. It’s widely accepted as a valid, secure approach.

And yet…

In ancient days, Paleolithic developers worried about attacks called buffer overflows. I’m being facetious, because these attacks are still a danger today (and no modern languages existed in the Paleolithic), but consider: these attacks work because the input is larger than the buffer set aside to hold it, so the excess overwrites adjacent memory. In other words, just taking the input and comparing it to a regex could trigger the vulnerability.

Java is highly resistant to this kind of assault. Many modern languages have countermeasures. But I ask myself, why take the chance, when there’s a safe way to avoid it?

Why compare the whole input string to a regex? Why not compare the input string one character at a time to a whitelist? It’s safer, because I know of no way a single character can trigger an exploit. Of course, the act of bringing the input into the code at all could trigger the exploit, which is why it’s important to control POST/PUT sizes. But I think it’s prudent to check incoming data against a whitelist, one character at a time.

Whitelist in Java

Figure 1: General overview of the whitelist solution.

My goal was to implement a whitelist in a way that is a) easy to configure, b) easy to implement, and c) easy to support. This is the best I’ve come up with so far! The class stores the whitelists for retrieval by a key made up of the application name plus the HTML form name. For example, let’s say the application input-01 (the sample application for this post; I’ll make it available in its entirety at some point) has an HTML form in the file index.html. WhiteListAnalyzer would set up a HashMap that contained the definitions for the fields within index.html, and the application could retrieve that HashMap with the key “input-01~index.html.”

Here’s some sample code that shows what a whitelist definition looks like. First, we have to import the package containing the HashMap:

import java.util.HashMap;

Next, we define the HashMap to hold the individual definitions:

// Keyed by field name; each value is that field's definition array
HashMap<String, String[]> hmScratchDefinition = new HashMap<>();

For each HTML form field, we’ll define the following characteristics:

  1. Minimum length
  2. Maximum length
  3. A String containing the characters allowed in this field; in other words, the actual whitelist
  4. An indicator of whether or not the field can be null (for future use)

We’ll build one array per field, and we’ll store that array in the HashMap. The HashMap entry’s key will be the name of the field. Given a field called “transfer_description,” the code looks like this:

String minLength = "0";
String maxLength = "20";
// Notice that this includes a blank
String charsAllowed = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ";
String nullAllowed = "No";
String[] arrScratch = {minLength, maxLength, charsAllowed, nullAllowed};
hmScratchDefinition.put("transfer_description", arrScratch);

We can repeat that kind of logic for all of the fields in the form. Then, we add that HashMap to the master HashMap that contains all of the form definitions:

hmWhiteListDefinitions.put("input-01~index.html", hmScratchDefinition);
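Putting those pieces together, here is a minimal, self-contained sketch of the two-level lookup. The class name WhitelistDefinitionsSketch, the helper methods key and lookup, and the second field transfer_amount are hypothetical additions for illustration. The array layout and the "~" key separator follow the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the nested definition maps described in the post.
final class WhitelistDefinitionsSketch {
    // Outer key: "application~form"; inner key: field name;
    // value: {minLength, maxLength, charsAllowed, nullAllowed}
    static final Map<String, Map<String, String[]>> DEFINITIONS = new HashMap<>();

    static {
        Map<String, String[]> form = new HashMap<>();
        form.put("transfer_description", new String[] {
            "0", "20",
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ",
            "No"});
        // Hypothetical second field, added only to show a form with two entries
        form.put("transfer_amount", new String[] {
            "1", "12", "0123456789.,", "No"});
        DEFINITIONS.put(key("input-01", "index.html"), form);
    }

    // Builds the lookup key in the "application~form" shape
    static String key(String appName, String formName) {
        return appName + "~" + formName;
    }

    // Returns the field's definition, or null if the form or field is unknown
    static String[] lookup(String appName, String formName, String fieldName) {
        Map<String, String[]> form = DEFINITIONS.get(key(appName, formName));
        return form == null ? null : form.get(fieldName);
    }
}
```

Returning null for an unknown form or field lets the caller reject anything that was never defined, which is the whitelist idea applied to field names as well as characters.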

When a user submits the form in index.html to a servlet, the servlet instantiates WhiteListAnalyzer and invokes the method isIncomingFormValid, which accepts the name of the application/form (input-01~index.html) and the HTTP request. Here are the imports for this section of code:

import java.util.Enumeration;
import javax.servlet.http.HttpServletRequest;

And here’s the code itself!

HashMap hmAppFormName = (HashMap) hmWhiteListDefinitions.get(strAppAndFormName);

Enumeration<String> enumParameterNames = request.getParameterNames();

// Validate every parameter that arrived with the request
while (enumParameterNames.hasMoreElements()) {
  String strParameterName = enumParameterNames.nextElement();
  String strParameterData = request.getParameter(strParameterName);
  // Check to see if the parameter/field even exists
  String[] arrValidation = (String[]) hmAppFormName.get(strParameterName);

  if (arrValidation == null) {
    System.out.println("Attempt to validate illegal field name in app and form " + strAppAndFormName);
    return false;
  }
  if (strParameterData.length() < Integer.valueOf(arrValidation[0])) {
    System.out.println("Attempt to validate a parameter in " + strAppAndFormName + " failed; string too short");
    return false;
  }
  if (strParameterData.length() > Integer.valueOf(arrValidation[1])) {
    System.out.println("Attempt to validate a parameter in " + strAppAndFormName + " failed; string too long");
    return false;
  }
  // Cycle through each character in the string and make sure it's in the whitelist
  for (int x = 0; x < strParameterData.length(); x++) {
    String strToCheck = strParameterData.substring(x, x + 1);
    if (!arrValidation[2].contains(strToCheck)) {
      System.out.println("Illegal character found for app/form " + strAppAndFormName);
      return false;
    }
  }
}
// If we get this far, it means the form has valid data
return true;

Pretty simple, huh? Here’s the code that actually implements the whitelist:

for (int x = 0; x < strParameterData.length(); x++) {
  String strToCheck = strParameterData.substring(x, x + 1);
  if (!arrValidation[2].contains(strToCheck)) {
    System.out.println("Illegal character found for app/form " + strAppAndFormName);
    return false;
  }
}

The array containing a field’s definition is called arrValidation. Element 2 contains the list of whitelist characters. The for loop pulls the first character off the input field’s data, makes sure it’s in the whitelist, and then pulls the next character out of the string. If the loop can’t find a character in the string of allowed characters, the method returns false so that the calling application knows there’s a nefarious character in the input.
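To try the same checks outside a servlet container, here is a minimal sketch that takes the form parameters as a plain Map instead of an HttpServletRequest. The class name FormValidatorSketch and the method isFormValid are hypothetical. The definition layout {minLength, maxLength, charsAllowed, nullAllowed} is the one described above, and the logging is left out to keep the sketch short.

```java
import java.util.Map;

// Hypothetical, servlet-free variant of the validation walk-through.
final class FormValidatorSketch {
    // formDefinition maps a field name to
    // {minLength, maxLength, charsAllowed, nullAllowed}
    static boolean isFormValid(Map<String, String[]> formDefinition,
                               Map<String, String> params) {
        for (Map.Entry<String, String> param : params.entrySet()) {
            String[] rule = formDefinition.get(param.getKey());
            String data = param.getValue();
            // Reject fields that were never defined, and missing values
            if (rule == null || data == null) {
                return false;
            }
            // Enforce the minimum and maximum length
            if (data.length() < Integer.parseInt(rule[0])
                    || data.length() > Integer.parseInt(rule[1])) {
                return false;
            }
            // Check one character at a time against the allowed set
            for (int x = 0; x < data.length(); x++) {
                if (rule[2].indexOf(data.charAt(x)) < 0) {
                    return false;
                }
            }
        }
        // Every submitted field passed every check
        return true;
    }
}
```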

Notice that I don’t echo the actual illegal character to the log. Since it’s illegal, I have no idea what it is, and it could be dangerous. Yep, that’s overly paranoid, but I couldn’t think of a reason to take the risk, so I just log that there was a problem with a specific field. And yes, I know that System.out.println isn’t suitable for a live production application! I’d use the logging built into Tomcat or something like Log4j in a real application.
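As one possible shape for that, here is a minimal sketch using java.util.logging from the standard library in place of System.out.println. The class WhitelistLoggingSketch, its methods, and the idea of logging the character's position are assumptions for illustration. The point it preserves is that the rejected character itself never reaches the log.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: log that a whitelist check failed without echoing
// the rejected character.
final class WhitelistLoggingSketch {
    private static final Logger LOG =
        Logger.getLogger(WhitelistLoggingSketch.class.getName());

    // Builds the message from values the application controls: the
    // app/form key and the position of the failure
    static String describeFailure(String appAndFormName, int position) {
        return "Illegal character found for app/form " + appAndFormName
            + " at position " + position;
    }

    static void logFailure(String appAndFormName, int position) {
        LOG.log(Level.WARNING, describeFailure(appAndFormName, position));
    }
}
```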

In my next post, I’ll show you how to implement a whitelist in PHP.

by Terrance A. Crow

Terrance has been writing professionally since the late 1990s — yes, he’s been writing since the last century! Though he started writing about programming techniques and security for Lotus Notes Domino, he went on to write about Microsoft technologies like SQL Server, ActiveX Data Objects, and C#. He now focuses on application security for professional developers because… Well, you’ve watched the news. You know why! 

Categories: Application Security, Java
