If your website has to accept file uploads, you might already have read my previous blog posts about protecting your Java-based site or your PHP-based site. Those posts talked about how to make sure that the uploaded file was the kind of file you were expecting — a PDF, an Excel worksheet, a JPEG, and the like. That protects your site against exploits that target how the web server handles files.
Unfortunately, that’s not really enough. A JPEG that’s really a JPEG might contain malware; a PDF that’s really a PDF might contain a trojan. Fortunately, depending on the volume of traffic to your site, there are two ways to further protect yourself.
Regardless of which of the two ways presented below works for you, you’ll start with ClamAV. I’ve read varying reports about its effectiveness, from the dismal rating by Network World to claims that it’s as effective as commercial products. I’ll leave the final decision up to you, but I’ve learned to be leery of publications like Network World (which I generally like) that take advertising revenue from commercial vendors in the same space as their comparison articles. I’m not saying Network World’s article is not objective; I’m just suggesting that a grain of salt is in order.
You should install ClamAV per the source site’s documentation. Be sure you set it up to update regularly. There’s no point in installing AV if its signatures are out of date!
Where should you install it? Unless your site’s large enough to be using a farm of servers (see Second Way for Higher Volume Sites, below), you should install ClamAV on your web server. Once it’s installed, please be sure to test it using the EICAR test file. It’s a harmless file used to confirm anti-virus and anti-malware programs are working.
How can you tell if your site classifies as low or higher volume? It’s hard to say with certainty, but here are some factors that affect your decision:
- Number of submissions a day/hour: If your site receives a handful of uploads a day (or even an hour), and if the files are 100 MB or less, then you qualify for Low Volume.
- If your site receives uploads that are large, that overlap in time (i.e., several files regularly upload at the same time), or both, then you qualify for a Higher Volume Site.
Other factors that might affect your decision:
- Running your site on a shared server with dozens or hundreds of other sites pushes you towards the Higher Volume Site
- Running on virtualized hardware like Amazon Web Services (AWS), Microsoft’s Azure, or DigitalOcean means performance will be 10-20% less than if you ran on dedicated hardware (this figure fluctuates wildly depending on the kind of virtual machine you’re running); that pushes you towards the Higher Volume Site
- Larger uploads, of course, take more time to scan than smaller uploads: a single JPG will scan much more quickly than a 30-minute MP4 video. Regularly accepting large files like that video pushes you towards the Higher Volume Site
Note: I didn’t develop the implementations described below; I decided instead to highlight those who have already blazed the trail. As a developer, I hate to reinvent the wheel, so I decided to add value by describing how to decide which of these two paths could work best for you.
First Way: Low Volume Sites
This method works for Java sites. If your users upload files to a Java servlet, the NS.Infra Team has published a post describing how to write a Java class to encapsulate ClamAV’s functionality. This solution would work like this:
- Customer uploads a file from a page (JSP, HTML, or whatever) and submits it to your servlet (perhaps running under Apache Tomcat)
- The servlet saves the file to a temporary location (maybe in /tmp)
- The servlet instantiates the ClamAVUtil class from the NS.Infra team solution and passes in the location of the temporary file via the fileScanner method
- The ClamAVUtil method fileScanner returns false if ClamAV detected a virus, meaning the file is not safe and the servlet should reject it
This is a very linear, non-scalable solution, but it should work well for sites that are relatively small/low volume.
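The steps above can be sketched in a few lines of Java. To be clear, this is not the NS.Infra team’s ClamAVUtil class; it’s a minimal illustration of the same idea, assuming the `clamscan` command-line tool is installed and on the PATH of the account running your servlet container. The class name `UploadScanner`, the `Verdict` enum, and the 120-second timeout are all hypothetical choices of mine. The one thing taken from ClamAV itself is the exit code convention: 0 means no virus found, 1 means a virus was found, and anything else means the scan itself failed. The sketch needs Java 9 or later.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration only; not the NS.Infra ClamAVUtil class. */
final class UploadScanner {

    enum Verdict { CLEAN, INFECTED, ERROR }

    /** Maps clamscan's exit code: 0 = no virus, 1 = virus found, anything else = error. */
    static Verdict verdictFor(int exitCode) {
        switch (exitCode) {
            case 0:  return Verdict.CLEAN;
            case 1:  return Verdict.INFECTED;
            default: return Verdict.ERROR;
        }
    }

    /** Scans one file by running clamscan. Assumes "clamscan" is on the PATH. */
    static Verdict scan(Path file) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("clamscan", "--no-summary", file.toString())
                .redirectErrorStream(true)
                // We only need the exit code, so throw the textual report away.
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        // The timeout is an arbitrary illustrative value; tune it to your file sizes.
        if (!process.waitFor(120, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            return Verdict.ERROR;
        }
        return verdictFor(process.exitValue());
    }

    /** Fail closed: a file is safe only if the scan positively reported it clean. */
    static boolean isSafe(Verdict verdict) {
        return verdict == Verdict.CLEAN;
    }
}
```

In a servlet, you would call `UploadScanner.scan(tempFile)` right after saving the upload, and delete the temporary file and reject the request whenever `isSafe` returns false. Treating a failed scan the same as an infected file is a deliberate choice here: if the scanner is broken or missing, you’d rather turn away a good file than accept a bad one.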
Second Way: Higher Volume Sites
If you have a higher volume site, you should consider using something like the REST service that SOLITA dev/solita describes in their post Virusscanner as a REST Service. Most likely, this means you’ll install the solution on a server other than your main web server. This should work whether your main web server is running PHP or Java (or any other server that can make a REST, or Representational State Transfer, call). The solution seems to assume that Java’s the source, though.
The solution would work like this:
- Customer uploads a file to your server
- Your web server passes the input stream representing the file to the REST service (ClamAVClient class, method scan)
- Your web server checks for isCleanReply; if it’s false, the REST service detected a virus
This solution is scalable from two perspectives:
- Being able to handle more scans at once than the previous solution (how many depends on the RAM in the REST server)
- Being able to add multiple scanning servers behind a load balancer (maybe Apache mod_jk if your web server is using Apache Tomcat); as your load increases, you can increase the number of scanning servers
This version scales well, though at the cost of spinning up more virtual machines.
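To make the upload-and-check flow described above more concrete, here is a sketch of what streaming a file to a remote scanner can look like. This is not the Solita code, and it doesn’t use their ClamAVClient class. It’s a hand-rolled illustration that assumes the scanning server runs ClamAV’s clamd daemon with a TCP socket enabled, and it speaks clamd’s INSTREAM command directly: the client sends `zINSTREAM` followed by a null byte, then the file in chunks that are each prefixed with a 4-byte big-endian length, then a zero-length chunk to finish. The class name `ClamdStreamSketch`, the chunk size, and the timeout are my own hypothetical choices. The sketch needs Java 9 or later.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a clamd INSTREAM client; not the Solita ClamAVClient. */
final class ClamdStreamSketch {

    // Illustrative chunk size; clamd rejects streams larger than its StreamMaxLength setting.
    static final int CHUNK_SIZE = 2048;

    /** Writes the INSTREAM command, the length-prefixed chunks, and the zero-length terminator. */
    static void writeInstream(InputStream file, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.write("zINSTREAM\0".getBytes(StandardCharsets.US_ASCII));
        byte[] buffer = new byte[CHUNK_SIZE];
        int read;
        while ((read = file.read(buffer)) != -1) {
            data.writeInt(read);          // writeInt emits 4 bytes, big-endian
            data.write(buffer, 0, read);
        }
        data.writeInt(0);                 // a zero-length chunk ends the stream
        data.flush();
    }

    /** A reply such as "stream: OK" is clean; "... FOUND" or "... ERROR" is not. */
    static boolean isClean(String reply) {
        String trimmed = reply.replace("\0", "").trim();
        return trimmed.endsWith("OK") && !trimmed.contains("FOUND");
    }

    /** Sends the stream to clamd at host:port and reports whether it came back clean. */
    static boolean scan(String host, int port, InputStream file) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            socket.setSoTimeout(60_000);  // illustrative read timeout, in milliseconds
            writeInstream(file, socket.getOutputStream());
            byte[] reply = socket.getInputStream().readAllBytes();
            return isClean(new String(reply, StandardCharsets.US_ASCII));
        }
    }
}
```

From your web server, a call might look like `ClamdStreamSketch.scan("scanner.internal", 3310, request.getInputStream())`, where the host name is made up and 3310 is the port clamd conventionally listens on. Because the file is streamed rather than saved first, the web server never has to write a suspect upload to its own disk, which is a nice side benefit of moving the scanning elsewhere.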
As you saw in my previous posts about protecting your PHP-based web server from malicious uploads and protecting your Java-based web server from malicious uploads, it can be a lot of work to reduce the chances an uploaded file will infect your server or harm your users. Unfortunately, that’s just the cost of doing business in the world we live in. Malicious actors can hide malware in a JPG’s comments; financially-motivated syndicates can hide trojans in video files. Your job, as a security-conscious developer, is to make things as hard as possible for those seeking to harm you, your website, and your users. If you design and engineer the solution right, it won’t take any more time to operate than a solution without those protections. So isn’t a little work up-front worth it to maintain your users’ trust?
by Terrance A. Crow
Terrance has been writing professionally since the late 1990s — yes, he’s been writing since the last century! Though he started writing about programming techniques and security for Lotus Notes Domino, he went on to write about Microsoft technologies like SQL Server, ActiveX Data Objects, and C#. He now focuses on application security for professional developers because… Well, you’ve watched the news. You know why!