Email and malware dissection, Part II

If you’re running a mail server on the Internet, you will surely be familiar with spam and malicious email. This post is the last part of a series on how to identify and isolate malware in email while respecting your users’ email privacy. Since identification and isolation of suspected malware was partly discussed in Part I, this part deals with how to extract suspected malicious binaries from attachments, create ClamAV signatures, use VirusTotal from the command line and sort suspected malicious emails into a quarantine directory.

After splitting every suspicious email into body text and attachments, as done in the first part of this series with the help of MIME::Parser and a little Perl code, there are now a lot of “msg-*” directories to examine, one directory per email to be exact. If you look at the contents of a msg-* directory, you will notice a msg-*.txt and/or a msg-*.html file. This is the email body text, which may be useful for training a Bayesian spam filter so that it raises the “spamminess” of similar email. More on that later.

The most interesting, however, are the attachments:

# find /tmp/qd -type f -exec file -N {} \; | grep -v empty | \
    sed -e 's/^.*: //' -e 's/ \{2,\}//g' | sort | uniq -c | sort -nr

This command shows how many attachments there are of each type. Unlike the files in the /var/clamav/q directory, these are likely to contain a lot of HTML and ASCII text (very likely the raw email texts) and ZIP archives! You can unzip them all at once with the commands shown below, but please note that this is not without risk:

# mkdir /tmp/qz
# cd /tmp/qz
# chmod 700 /tmp/qz
# find ../qd -type f -exec file -N {} \; | grep 'Zip archive' | awk -F: '{print $1}' | \
      while read l; do unzip -n "$l"; done

To prevent the unlikely event that an unaware user stumbles upon some malicious Windows executable in the /tmp directory of your mail server, I’ve added a chmod command to the commands above to lock down the qz directory from other users, as /tmp is usually world readable. I did this because, after unzipping the attachments, there will be malicious binary executables in this directory. You might want to consider doing this for anything placed in /tmp, but since most of my users do not have access to this system, I left the other contents at their default file access rights.
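A slightly different way to get the same lock-down is to let mktemp create the directory, so that it never exists with looser permissions, not even for a moment. This is only a sketch: the helper name make_private_dir and the qz.XXXXXX template are my own choices for illustration, not something the commands above depend on.

```shell
#!/bin/sh
# Sketch: create a private scratch directory for extracted attachments.
# make_private_dir is a hypothetical helper name used for illustration.
make_private_dir() {
    # The umask is set inside a subshell so the caller's umask is untouched;
    # mktemp -d creates a uniquely named directory that only the owner can
    # read, write or enter.
    ( umask 077 && mktemp -d "${TMPDIR:-/tmp}/qz.XXXXXX" )
}

# Usage:
#   qz=$(make_private_dir) && cd "$qz"
```

The random suffix also means that a directory name prepared in advance by another local user will not be reused by accident.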

Back to the unzipped attachments: the command below sorts and counts the output of the file command, giving an overview of the file types extracted from the zip files:

# file -b * | sort | uniq -c | sort -nr

Chances are that you are now presented with far fewer files than before. Since unzip -n skips any file whose name already exists, this means several emails carried binaries with the same name, most likely the same binaries. The vast majority of them are of type PE32 executable (GUI) Intel 80386, for MS Windows!
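If you want to know how many of the extracted files really differ in content, rather than just in name, you can count checksums instead. The following is a minimal sketch that assumes GNU md5sum is available (BSD systems ship md5 instead); count_unique_files is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count how many distinct files (by content) a directory tree holds.
# Assumes GNU md5sum; count_unique_files is an illustrative name.
count_unique_files() {
    dir=$1
    # One checksum per file, keep only the hash column, count unique hashes.
    find "$dir" -type f -exec md5sum {} + |
        awk '{print $1}' | sort -u | wc -l | tr -d ' '
}

# Usage:
#   count_unique_files /tmp/qz
```

Comparing this number with a plain file count shows how much duplication is left after unzipping.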

Depending on what you consider acceptable use for email and email attachments, you could argue that binary executables should not be in email at all, regardless of whether they are malicious or not. This is where ClamAV is really awesome, as it allows you to write your own malware signatures. As this has been documented by at least a dozen others, I won’t go into a lot of detail, but the commands below will create the hex patterns for such signatures from the first 56 bytes of every PE32 type of file in /tmp/qz:

# cd /tmp
# mkdir qh
# cd qh
# find ../qz -type f -exec file -N {} \; | \
     grep PE32 | awk -F: '{print $1}' | while read l; do \
          head -c 56 "$l" | sigtool --hex-dump; echo; \
     done | sort -u > /tmp/PE32.hex
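To get a feel for what ends up in /tmp/PE32.hex, the same kind of hex string can be produced with standard tools. This is a sketch for inspection only, assuming od and tr behave as on common Linux and BSD systems; hex_head is a hypothetical helper, and sigtool remains the tool used for the real signatures.

```shell
#!/bin/sh
# Sketch: print the first N bytes of a file as one lowercase hex string,
# comparable to the output of `head -c 56 FILE | sigtool --hex-dump`.
# hex_head is an illustrative name; the default of 56 bytes follows the text.
hex_head() {
    file=$1
    bytes=${2:-56}
    # od prints one two-digit hex value per byte; tr joins them into one line.
    head -c "$bytes" "$file" | od -An -v -tx1 | tr -d ' \t\n'
    echo
}

# Usage:
#   hex_head /tmp/qz/some-file.exe
```

A PE32 file starts with the bytes “MZ”, which is why every pattern for such files begins with 4d5a.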

Then you can create a ClamAV database by prepending a unique “virusname”, a target type (0 for any file) and an offset (* for anywhere in the file) to each line:

PE32.General.Block.By.Policy.1:0:*:4d5a50000200000004000f00ffff0000b80000000000000040001a0000000000000000000000000000000000000000000000000000000000
PE32.General.Block.By.Policy.2:0:*:4d5a90000300000004000000ffff0000b8000000000000004000000000000000000000000000000000000000000000000000000000000000
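With more than a handful of hex strings, numbering the lines by hand gets tedious. The sketch below does the prepending with awk; it assumes an input file with one hex string per line, such as /tmp/PE32.hex, and make_ndb is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: turn a file of hex strings (one per line) into .ndb signature lines
# of the form Name:TargetType:Offset:HexSignature, using target type 0 and
# offset * as in the two examples above.
# make_ndb and its optional second argument (the name prefix) are illustrative.
make_ndb() {
    prefix=${2:-PE32.General.Block.By.Policy}
    # NF skips empty lines; n numbers the signatures 1, 2, 3, ...
    awk -v p="$prefix" 'NF { n++; printf "%s.%d:0:*:%s\n", p, n, $1 }' "$1"
}

# Usage:
#   make_ndb /tmp/PE32.hex > PE32.ndb
```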

Name the resulting file PE32.ndb and copy it into your ClamAV signature directory. On my system this is /var/db/clamav. Please note that clamd must be allowed to load unofficial signatures, but it should do this by default. For testing, you could copy PE32.ndb to a directory, say /tmp/c, then check with:

# clamscan -d /tmp/c /tmp/qz

If everything went as planned, you will see matching binaries being detected as ‘PE32.General.Block.By.Policy.[12].UNOFFICIAL FOUND’.

That should settle most malicious Windows executables. In theory you could use ClamAV to scan for any pattern, and thus block any file to comply with your email policy. However, in my case not every detected suspicious email contained a PE32 executable, which is why I’d like to verify those with VirusTotal. Up until now I have not read any suspicious email to verify the contents for myself, and although I do not expect VirusTotal to do that either, I would like to avoid uploading a file to them. Checking an MD5 hash will do just fine, as it is (more or less) unique and doesn’t expose private information from the contents of a file. The downside is that the file is not scanned by any other scanner; the lookup only tells you whether it has been scanned before. After browsing through a few examples, I wrote the following Perl code:

#!/usr/bin/env perl
# Descr: Lookup the MD5 hash of a file on https://www.VirusTotal.com/
# Usage: ./this-script.pl file [file...] [...]
# Output format: Comma Separated Value (csv)
# Output example: "eicar.com","EICAR-Test-File","44d88612fea8a8f36de82e1278abb02f","53","55","2014-09-06 02:43:40"
# Note: If the md5 hash is not found it will be silently marked as unknown.
# Author: Jacco van Buuren

use warnings;
use strict;
use LWP::UserAgent;
use vars qw(@ARGV);
use Digest::MD5::File qw(file_md5_hex);
use File::Basename;
use JSON;

# -- ALTER THE FOLLOWING TO MATCH YOUR VIRUSTOTAL APIKEY -->
my($KEY) = '';

# -- NO EDITING BEYOND THIS POINT --

my($SLEEP) = 4;

my($VT);
my($JSON);
my($sleep,$valid,$resp,$response,$results);
my($file,$decjson,$md5);
my($positives,$tot_engines,$scan_date,$virname);
my($URL) = "https://www.virustotal.com/vtapi/v2/file/report";

$sleep = $SLEEP;

foreach(@ARGV) {
	$file = $_;
	next if ( ! -r $file );
	$md5 = file_md5_hex($file);
	$valid = 0;
	while( $valid == 0 ) {
		$VT = LWP::UserAgent->new();
		$response = $VT->post(
			$URL,
			Content => [
				'resource' => $md5,
				'apikey' => $KEY
			]
		);
		$results=$response->decoded_content;
		$JSON = JSON->new->allow_nonref;
		eval {
			$decjson = $JSON->decode($results);
		};
		$resp = 0;
		$resp = $decjson->{"response_code"} if ( $decjson->{"response_code"} );
		if ( $resp == 0 ) {
			#print "Received invalid response. Slowing down and retrying... " . $sleep . "\n";
			if ( $sleep < 128 ) {
				$sleep = $sleep * 2;
			}
			else {
				# Waited way too long. Let's just report this as UNKNOWN.
				#print "Giving up. This is UNKNOWN.\n";
				$valid = 1;
				$sleep = $SLEEP;
			}
		}
		else {
			$valid = 1;
			$sleep = $SLEEP;
		}
		sleep $sleep;
	}
	#print $JSON->pretty->encode($decjson);
	$positives = 0;
	$positives = $decjson->{"positives"} if ( $decjson->{"positives"} );
	$tot_engines = 0;
	$tot_engines = $decjson->{"total"} if ( $decjson->{"total"} );
	$virname = "UNKNOWN";
	$virname = $decjson->{"scans"}->{"Kaspersky"}->{"result"} if ( $decjson->{"scans"}->{"Kaspersky"}->{"result"} );
	$scan_date = '1970-01-01 00:00:00';
	$scan_date = $decjson->{"scan_date"} if ( $decjson->{"scan_date"} );
	print '"' . basename($file) . '","' . $virname . '","' . $md5 . '","'. $positives . '","' . $tot_engines . '","' . $scan_date . '"' . "\n";
	undef $decjson;
}

Note that the URL variable starts with https! This requires LWP::Protocol::https to be installed (using cpan, example in the previous post). In theory it’s possible to verify the entire /tmp/qz directory, containing all unique files from suspicious zip archives, with VirusTotal, but online scanning is (very) slow, so I recommend doing this only when you really need a second opinion. The Perl code above produces its output as comma-separated values:

1. The filename of the suspected malware (as-is from your system).
2. The “common” name of the virus/malware as defined by Kaspersky. I selected Kaspersky pretty much at random; I guess any engine will do.
3. The MD5 checksum of the suspected malware.
4. The number of malware engines that reported the file as being malicious.
5. The total number of engines that were queried.
6. The last date that the file was scanned at VirusTotal.
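Because every field is quoted and separated by a comma, the output is easy to post-process. As a sketch, the following filter keeps only the rows where at least a given number of engines flagged the file; vt_flagged is a hypothetical helper, and it assumes that no field contains the sequence "," itself.

```shell
#!/bin/sh
# Sketch: read the script's CSV output on stdin and print only the files
# that at least MIN engines reported as malicious (default: 1).
# vt_flagged is an illustrative name.
vt_flagged() {
    min=${1:-1}
    # Splitting on "," (quote, comma, quote) leaves a stray quote only at
    # the start of the first field and at the end of the last one.
    awk -F'","' -v min="$min" '
        NF == 6 && ($4 + 0) >= (min + 0) {
            name = $1; sub(/^"/, "", name)
            printf "%s %s/%s %s\n", name, $4, $5, $2
        }'
}

# Usage:
#   ~/vt-md5.pl /tmp/qz/* | vt_flagged 5
```

Each printed line holds the filename, the positives out of the total number of engines, and the virus name.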

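The script slows down and retries automatically if VirusTotal does not provide a valid response to the query, so scanning may go VERY slowly, depending on VirusTotal’s responses. Any response without a non-zero response_code is retried in this way, doubling the delay each time, before the file is finally reported as UNKNOWN.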
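The retry pattern itself is not specific to Perl. Here is a minimal shell sketch of the same idea: run a command, double the delay after every failure, and give up once the delay has reached a cap. The names retry_with_backoff, BACKOFF_START, BACKOFF_CAP and SLEEP_CMD are all hypothetical; SLEEP_CMD only exists so the waiting can be stubbed out while experimenting.

```shell
#!/bin/sh
# Sketch: retry a command with a doubling delay (8, 16, 32, ... seconds,
# starting from BACKOFF_START=4) and give up once the delay has reached
# BACKOFF_CAP (default 128). All names here are illustrative.
retry_with_backoff() {
    delay=${BACKOFF_START:-4}
    cap=${BACKOFF_CAP:-128}
    while :; do
        "$@" && return 0                      # success: stop retrying
        [ "$delay" -ge "$cap" ] && return 1   # waited long enough: give up
        delay=$((delay * 2))
        ${SLEEP_CMD:-sleep} "$delay"
    done
}

# Usage:
#   retry_with_backoff some-lookup-command "$hash" || echo "UNKNOWN"
```

With the default values a command that keeps failing is tried six times in total, which adds up to a little over four minutes of waiting.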

In my case, the last few unknown suspicious files were ‘data’ according to the file command. These can now be checked with the Perl script (saved as vt-md5.pl) as follows:

# cd /tmp/qz && ls -1 | \
    while read f; do file -b "$f" | grep -v PE32 >/dev/null && ~/vt-md5.pl "$f"; done

Things become really interesting when VirusTotal doesn’t return a useful answer. Since the number of files now left over is a mere fraction of the total number of suspicious files, you could of course upload these manually to VirusTotal and see if that gives a more satisfying result. If not, you could try an online analyzer or take matters into your own hands.

Epilogue

Looking back at the whole process of identifying malware at rest on a mail server, there are a few things to take note of:

First of all, from a purist security point of view, malware scanning will only detect malware that it has learned to be malicious. If it does not recognize a binary, and heuristic and/or sandbox analysis does not detect a threat, a malware scanner will assume the binary to be safe until its maintainer has recognized and fixed this. This gives the attacker an advantage for a certain amount of time: find a way to go undetected and the attack will be successful.

Second, there are a number of things in this series of posts that are dubious:

1. ClamAV. Using ClamAV to identify suspicious email is limited by the quality of its signatures and its malware detection engine. It is highly likely that neither is perfect.

2. Detecting potentially malicious email is best done before a user can access it in an environment where it can do harm. If you are the owner of a domain and receive email directly on your server, and you want to use ClamAV, you may want to use ClamSMTP or an equivalent that allows you to scan while the email is not yet delivered. Otherwise you are required to use some other form of scanning, either from the mail client directly or triggered by local delivery (procmail) or in a batch job (getmail/fetchmail).

3. Just detecting binaries is not enough these days. The msg-* directories contain a wealth of information on how adversaries are trying to attack you. In my opinion, training a Bayesian filter with this information is a necessity. You could dive into this a lot deeper and extract the IP addresses of sending systems from the email headers, check them with an online reputation service, perhaps even add them to a blacklist on the outside firewall or, if you have time, redirect any form of communication from these addresses to a honeypot, so you can analyse their intentions even further.
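As a starting point for the header analysis mentioned in the last item, here is a rough sketch that counts the IPv4 addresses found in Received: headers. It assumes you feed it raw messages with their full headers (for example the quarantined copies); received_ips is a hypothetical helper, it ignores folded continuation lines, and it does not check whether the octets are in range. Keep in mind that only the Received: lines added by your own server can be trusted.

```shell
#!/bin/sh
# Sketch: read raw email on stdin and count the bracketed IPv4 addresses
# that appear on lines starting with "Received:".
# received_ips is an illustrative name; folded header lines are not handled.
received_ips() {
    grep -i '^Received:' |
        grep -Eo '\[[0-9]{1,3}(\.[0-9]{1,3}){3}\]' |
        sed 's/[][]//g' | sort | uniq -c | sort -nr
}

# Usage:
#   cat /var/clamav/quarantine/* | received_ips | head
```

The output has the same shape as the other reports in this post: a count followed by the address, most frequent first.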

And finally, if you would like to have some idea of what was malicious, you’d need some form of reporting, which you can do with yet another script that you can find here. That script is best run from cron on the mail server, assuming you have ClamAV enabled and are running clamd. It creates a sorted quarantine directory inside /var/clamav and makes a symbolic link to it named ‘quarantine’. It holds a copy of every suspicious email(!!), named after its md5 hash, and creates an index file of the suspected malware that was placed in there. Using the index file, reporting can then be done like this:

# awk -F, '{print $NF}' /var/clamav/quarantine/index.txt | sort | uniq -c | sort -nr

…which can be run from cron just the same, resulting in a periodic email report 🙂

And if you’re curious, this is my most recent malware report. Note the UNOFFICIAL detections! The binaries from those emails are candidates to check with VirusTotal.

6023 "Zip.Suspect.WinDoubleExtension-zippwd-1"
  85 "Heuristics.Phishing.Email.SpoofedDomain"
  18 "Suspect.Bredozip-zippwd-6"
  14 "HTML.Phishing.Bank-1001"
  13 "HTML.Phishing.Bank-863"
  13 "HTML.Phishing.Bank-477"
  10 "PE32.General.Block.By.Policy.2.UNOFFICIAL"
   7 "HTML.Phishing.Auction-157"
   3 "PUA.Phishing.Bank"
   3 "Heuristics.Phishing.Email.SSL-Spoof"
   3 "Email.Trojan-407"
   3 "Email.Trojan-303"
   3 "Email.Phishing.Pay-46"
   2 "Suspect.Bredozip-zippwd-2"
   2 "PUA.Win32.Packer.Upx-48"
   2 "Email.Trojan-359"
   2 "Email.Trojan-333"
   2 "Email.Trojan-279"
   2 "Email.Trojan-277"
   2 "Eicar-Test-Signature"
   1 "PUA.Win32.Packer.Upx-3"
   1 "PUA.Win32.Packer.Upx-28"
   1 "PUA.OLE.EmbeddedPDF"
   1 "PUA.HTML.Crypt-11"
   1 "PE32.General.Block.By.Policy.1.UNOFFICIAL"
   1 "HTML.Phishing.Bank-1259"
   1 "HTML.Phishing.Auction-291"
   1 "Email.Trojan-432"
   1 "Email.Trojan-395"
   1 "Email.Trojan-384"
   1 "Email.Trojan-367"
   1 "Email.Trojan-348"
   1 "Email.Trojan-342"
   1 "Email.Trojan-304"
   1 "Email.Trojan-300"
   1 "Email.Trojan-293"
   1 "Email.Trojan-291"
   1 "Email.Trojan-288"
   1 "Email.Trojan-285"
   1 "Email.Phishing.Card-31"

Oh, by the way, did you know you can easily import a CSV file in MySQL?
