
Can’t Use Whitelists for your Java Web App? Don’t Abandon Hope!

We all know protecting input is important. A few posts ago, I talked about one option: using whitelists. Implementing a whitelist means you only allow a certain (generally small) set of characters into your input fields. I pointed out that they’re great for web applications that are very, very focused in their functionality, like a financial application.
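To make the idea concrete, here’s a minimal sketch of a character whitelist in plain Java. The class name, the method name, and the digits-only rule (1 to 12 ASCII digits, as you might use for an account number field) are assumptions for illustration; your own fields will need their own rules.

```java
import java.util.regex.Pattern;

final class AccountNumberWhitelist {
    // Hypothetical rule: the field may contain 1 to 12 ASCII digits and nothing else.
    private static final Pattern ALLOWED = Pattern.compile("[0-9]{1,12}");

    // Returns true only when the entire input consists of whitelisted characters.
    static boolean isAllowed(String input) {
        // matches() tests the whole string, so a partial match is not enough.
        return input != null && ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("123456"));     // prints "true"
        System.out.println(isAllowed("<b>12</b>"));  // prints "false"
    }
}
```

Anything outside the whitelist is rejected outright, which is exactly why this approach suits narrowly focused input fields and struggles with free-form text.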

What if your application isn’t so simple or focused? What if you have to allow “safe” HTML tags like bold or italic? As you probably already guessed, for more complex applications, whitelists are next to impossible to implement in a way that favors simplicity over complexity. To obtain the best results, you’d need to:

  1. Identify all safe tags you want to include (to maintain the spirit of a whitelist) or abandon that principle and try to screen out only the malicious code.
  2. Write code to enforce the whitelist or blacklist that you built.
  3. Find sources of information about new vulnerabilities like the Common Vulnerabilities and Exposures (CVE) list and monitor those on a regular basis.
  4. Update your code as needed to protect against new threats.

In other words, you could do it, but a) your users would hate you for constraining them or b) you’d be forever tweaking it. We need a tool that’s flexible and easily updated by someone who’s focused on that effort.
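To show what steps 1 and 2 might look like if you rolled your own, here’s a deliberately naive sketch. It escapes every HTML-significant character and then restores only the exact, attribute-free tags on a tiny allowlist. The class name, the method names, and the choice of `b` and `i` as the allowed tags are all assumptions for illustration, not anything OWASP provides.

```java
import java.util.List;

final class NaiveTagAllowlist {
    // Step 1 (hypothetical allowlist): exact, attribute-free tags only.
    private static final List<String> ALLOWED_TAGS = List.of("b", "i");

    // Escape every character that has special meaning in HTML.
    static String escapeHtml(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Step 2: escape everything, then restore only the exact allowed tags.
    static String allowSimpleTags(String input) {
        String result = escapeHtml(input);
        for (String tag : ALLOWED_TAGS) {
            result = result.replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                           .replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "I <b>really</b> enjoyed it!"
        System.out.println(allowSimpleTags("I <b>really</b> enjoyed it!"));
        // prints "&lt;script&gt;alert(1)&lt;/script&gt;"
        System.out.println(allowSimpleTags("<script>alert(1)</script>"));
    }
}
```

Even this small sketch has gaps: it doesn’t balance tags, it treats `<B>` differently from `<b>`, and every new tag or attribute you want to allow means more code to write and review. That ongoing effort is the burden described in steps 3 and 4.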

Just as there’s no need to reinvent a security architecture, there’s no need to build such a tool. The Open Web Application Security Project (OWASP) already has two of them! The hardest thing you’ll have to do is choose between them.

It’s hard to choose because both options work great. That’s a wonderful problem to have!

Choices, Choices!

The two tools that OWASP offers are:

OWASP AntiSamy

Of the two solutions I’m presenting, AntiSamy is the older. It’s still well-maintained (the last release was published in December 2016). It’s designed to let your Java web application pass it a String representing a user’s input, apply a security policy to that String, and get back a sanitized version of the String that doesn’t contain any malicious HTML or CSS. Yes, AntiSamy can scan both HTML and cascading style sheets, which is one of its two main differences from the HTML Sanitizer. The other difference is performance: AntiSamy is the slower of the two. It’s not cripplingly slow, and I’ve used it successfully for more than three years. It’s solid, it’s reliable, and I’ve never seen it allow malicious code.

OWASP HTML Sanitizer

HTML Sanitizer is the newer of the two solutions. It’s also well-maintained, with its last update being in the same timeframe as AntiSamy’s. It, too, is designed to take user input, apply a security policy, and return a String that’s safe for your Java web app to consume and safe for your users to consume (in terms of HTML). Unlike AntiSamy, it does not support CSS. That’s the main drawback. A minor drawback is that HTML Sanitizer is newer, having been released in September 2013, which means in theory that it hasn’t had as much combat experience as AntiSamy. However, two things have recently inclined me to look at HTML Sanitizer. First, my initial testing shows that it is as effective as AntiSamy, which is a high bar to meet. Second, its performance is a little better. It’s not by a lot, but for those of us who are writing code that we hope someday becomes explosively popular, this is a real benefit. The example I present below uses HTML Sanitizer.

HTML Sanitizer In Action

Installing HTML Sanitizer is simple. Just follow these steps:

  1. Go to HTML Sanitizer’s GitHub site
  2. Click on Clone or Download
  3. Click on Download Zip
  4. Unzip the downloaded file (
  5. Change into this directory: java-html-sanitizer-master/distrib/lib

You’ll only need four jar files:

  1. guava.jar
  2. owasp-java-html-sanitizer-javadoc.jar
  3. owasp-java-html-sanitizer-sources.jar
  4. owasp-java-html-sanitizer.jar

I prefer to put jars like these in a common library, so I placed them in Tomcat’s lib/ subdirectory. Pointing my application at that library also picks up the Tomcat jars (like servlet-api.jar), and that’s all the setup I had to do.

For our example, we’ll use an HTML form to prompt for two fields. We’ll use HTML Sanitizer on one and just pass through the other. This probably looks familiar to you:

<form name='input_03' method='POST' action='/asfg_protectinginput_java/processinput_03'>
 <table width='50%'>
  <tr>
   <td width="25%">Enter HTML to be scrubbed:</td>
   <td width="75%"><input type="text" name="scrubbedyes"/></td>
  </tr>
  <tr>
   <td width="25%">Enter HTML <b>not</b> to be scrubbed:</td>
   <td width="75%"><input type="text" name="scrubbedno"/></td>
  </tr>
  <tr>
   <td width="25%"></td>
   <td width="75%"><input type="submit" value="Test it!"/></td>
  </tr>
 </table>
</form>

The servlet called processinput_03 includes these imports:

import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

We store the two HTML form fields to variables:

String strScrubbedYes = request.getParameter("scrubbedyes");
String strScrubbedNo = request.getParameter("scrubbedno");

Using HTML Sanitizer is about as easy as installing it. Here’s the code that sanitizes “scrubbedyes” for us:

PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.BLOCKS);
String strSaferHTML = policy.sanitize(strScrubbedYes); 

All we have to do is pass the string to the policy’s sanitize method, and it returns safe-ish HTML. Notice that I named the variable “strSaferHTML.” I’m a cautious fellow, so I try to maintain an awareness that no matter what techniques I apply, and no matter how scrupulous I am in applying the best tenets of the principles of security, something’s going to get by. I want even my variable names to help me remember not to be complacent.

Overkill? Not if I don’t let that approach interfere with delivering good applications on time!

HTML Sanitizer makes it very easy to strip dangerous HTML from strings.

After invoking “policy.sanitize,” we can then safely and confidently use code like this to display the results back to the browser:

try (PrintWriter out = response.getWriter()) {
 out.println("<!DOCTYPE html>");
 out.println("<title>INPUT-03 Can't Whitelist? Hope's Not Lost!</title>");
 out.println("<meta charset=\"UTF-8\">");
 out.println("<meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">");
 // Write only the sanitized value back to the page
 out.println("<div>" + strSaferHTML + "</div>");
}

What would this do for us? Consider these two strings:

I <b>really</b> enjoyed seeing the movie!

<SCRIPT>alert("Owned!"); </SCRIPT>

The first is an innocent comment where someone thoroughly enjoyed a movie. The second is a basic demonstration of how to conduct a Cross Site Scripting (XSS) attack. XSS attacks can completely compromise your customer’s data and ruin your reputation. HTML Sanitizer will allow the first line of code. It will delete the second. In other words:

  • “I <b>really</b> enjoyed seeing the movie!” -> HTML Sanitizer -> “I <b>really</b> enjoyed seeing the movie!”
  • “<SCRIPT>alert("Owned!"); </SCRIPT>” -> HTML Sanitizer -> “”

Best of all? You don’t have to maintain the library or routines that can tell safe from hazardous HTML. The HTML Sanitizer maintainers do that.

Keep evil data out of your applications. Customers who trust you are more likely to purchase your services!


I really don’t think anyone would seriously argue against making sure data coming into a web application is safe. Generally speaking, what people object to is a protection mechanism that makes it hard for the application to perform its tasks. A whitelist that’s too restrictive, for example, could make it hard for someone to post a vibrant comment, and something like that can make it harder to build community. So it’s in our best interests as developers who are trying to build safe applications to make it easy to stay safe. Something like HTML Sanitizer does that.

Do you think you’d find it easy to implement this in your site? Do you have any examples of having implemented this? Let me know in the comments!

by Terrance A. Crow

Terrance has been writing professionally since the late 1990s — yes, he’s been writing since the last century! Though he started writing about programming techniques and security for Lotus Notes Domino, he went on to write about Microsoft technologies like SQL Server, ActiveX Data Objects, and C#. He now focuses on application security for professional developers because… Well, you’ve watched the news. You know why! 

Terrance A. Crow is the Senior Security Engineer at a global library services company. He holds a CISSP and has been writing applications since the days of dBASE III and Lotus 1-2-3 2.01.
