Email and malware dissection, Part I

Phishing, Trojans, worms and other malicious activity in email: this series of posts is a fairly lengthy write-up on how to use Unix shell scripts and Perl to dissect large amounts of email all at once without disrespecting email privacy, and how to verify suspected malicious results with VirusTotal using its API and LWP::UserAgent. This first part covers how to get all suspicious emails from a mail server and strip out all the attachments.

An email containing malware usually carries a worm, a Trojan, or a downloader for one of these, which basically installs a backdoor on your computer system, allowing an intruder full access to anything you have on it and to use it for any nefarious purpose. Sounds nasty, doesn’t it? Which is why you really should have at least one up-to-date antivirus product on your system to protect you from such evil. After having installed one, you might wonder whether it is working as advertised, or at all. This is where the EICAR test file comes in handy. It’s a harmless file that antivirus products should detect and report when they come across it. Dropping it on the system is exactly what I do when I try out a new anti-malware solution.

Things are a little different in an office environment: email is likely to be handled by one or more email servers at the company’s Internet access gateway. But malware is not only transported by email; a lot of it arrives through malicious advertising on the web these days. At the gateway level that is likely to be handled by a different server, proxy or firewall. Running antivirus not only on desktop computers but also at the gateway level, with a different malware scanner, creates a layered barrier of malware defenses, which is generally referred to as a defense-in-depth security strategy. When these malware scanners are carefully selected, this may improve the detection rate of malware. Even if the result is only a few percentage points better than a single product at the desktop, the cost of such additional infrastructure is easily justified when set against the cost of recovering from a computer virus outbreak and the damage such a disaster causes. No software product is perfect, and that applies to anti-malware products just the same: do not rely on just one. Today, with a lot of data and computing power in the cloud, this theater of layered barriers is shifting towards what you can secure within the cloud, but in my opinion the principle of defense in depth (not relying on a single defense mechanism) is still valid.

Comparing the results of the malware scanner at the desktop with those of the email gateway is a task for the typical computer geek. This is where I step in 😉 Checking malware detection across multiple scanners takes quite a bit more than the EICAR test file: it needs a collection of real samples.

I receive a lot of email. It’s not that I use email that much; it’s just that I have a few email addresses that have been around for more than a decade. These addresses end up in mailing lists and on the computers of friends, family and co-workers, get forwarded to other friends and acquaintances, and finally wind up on some computer or server that has been compromised and has perhaps even been part of a botnet. Computer criminals are known to harvest email addresses from compromised computers, and so… I eventually end up receiving spam and malware by email… just from using email regularly <sigh>. On the bright side: the resulting malware is a treasure chest when testing antivirus products!

If you’re looking for high volumes of spam and malware, just post to USENET and public mailing lists and bad things will end up in your mailbox, unless your email provider or ISP filters them out on your behalf. Another way to get the bad stuff quickly is to run a fake open SMTP relay on the Internet. This SMTP server should accept all mail from and to anyone but never actually send it, so that it behaves as if it has accepted and forwarded any mail it got. The cheapest and safest way to do that is probably to run a VPS at a provider and have the MX records of your spam-eating domains point to its IP address. Beware that doing this is generally frowned upon and is likely to be forbidden by the provider. Even if the provider does not take the VPS offline, it may get you blacklisted within a few hours, making the domain and server useless. I strongly recommend doing this only(!) after written and signed approval by all parties involved.

Since this blog will remain ‘mostly harmless’ I will only look at my own mail archives, and when looking for malware in an email archive you’ll need a malware scanner. I’m using ClamAV because it’s practically the only product you can use for free on a Linux/BSD box, which is what my mail server runs on. Since I only want a copy of the infected email, and not the entire mail archives of my users (respect privacy!), this almost requires me to run it on the mail server itself. Otherwise I would have to export the mail directories via Samba or NFS to an additional (virtual) computer, which I do not have right now, nor the resources to create one. Running a malware scanner on the mail server has quite a big impact on performance, even more so on a virtual instance on the same box, so do it when the users will not be using email for a few hours. If you’re in an office environment, this means after business hours (yep, there goes your evening, better order some pizza).

ClamAV can copy infected files, which is a great way to start filling a directory with unsorted malware. You may need root privileges to do the following:

# mkdir -p /var/clamav/q
# nohup clamdscan -z --fdpass --copy=/var/clamav/q /home/*/Maildir &

These two commands assume you have enabled and are running the clamd daemon. The first command creates the quarantine directory, and the second scans the Maildir directory of every user on the system, provided that they have a home directory located in /home and that it contains a Maildir directory. Maildir directories are common for IMAP email services (POP3 being the other popular way to access mail). Do note that this will not work for mbox-style mailboxes. The scan will probably take a while to complete, which is why I started it with ‘nohup’. This allows me to log off and return later without interrupting the scan. Another very popular trick to achieve the same is to use ‘screen’.
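Before starting a scan that may run for hours, it can help to preview which mailboxes the /home/*/Maildir glob will actually cover. The following is a minimal sketch, not part of ClamAV: the function name list_maildirs is hypothetical, and it assumes that a real Maildir always contains the cur, new and tmp subdirectories.

```shell
#!/bin/sh
# list_maildirs BASE  (hypothetical helper, not a ClamAV tool)
# Print every BASE/<user>/Maildir that looks like a real Maildir,
# i.e. one that contains the cur, new and tmp subdirectories.
list_maildirs() {
    base=${1:-/home}
    for d in "$base"/*/Maildir; do
        # If the glob matches nothing, $d is the literal pattern
        # and the directory tests below simply fail.
        if [ -d "$d/cur" ] && [ -d "$d/new" ] && [ -d "$d/tmp" ]; then
            printf '%s\n' "$d"
        fi
    done
}
```

Running `list_maildirs /home` prints one directory per line; users without a Maildir are silently skipped, which is also what the scan would do with them.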

After the scan is complete, you have a copy of every suspicious email that ClamAV found in the mailboxes on your mail server! Great! If you look at the files, you will very likely have quite a few to examine:

# cd /var/clamav/q; ls -1 | xargs file -N | sed -e 's/^.*: //' -e 's/ \{2,\}//g' | sort | uniq -c | sort -nr

The previous command tells you how many files of which type you’ve just harvested. Chances are that they are almost all ‘SMTP mail’, in either ASCII or ISO-8859 text. ClamAV also detects phishing attempts, and although strictly speaking those aren’t malware, they are malicious; in general they do not contain binary executables. So the Unix file command is correct: these are text files. But some percentage of them is likely to have executable binaries embedded in them, and those still need to be separated out. MIME, the standard layout and encoding of email, has been around for quite some time now and is not trivial to process from the Unix command line alone. I’m pretty sure most people these days would use Python, but I still use Perl for anything too complex to solve in shell scripts. Perl has CPAN, a vast archive of really cool Perl modules that you can use in your own programs to quickly get things done. If you’ve never used CPAN before you will be asked a lot of questions, but the default answers worked for me.
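To get a rough idea of how many of the quarantined messages carry an encoded attachment at all, a quick grep can help before reaching for Perl. This is only a sketch under stated assumptions: count_base64_mails is a hypothetical name, and it simply counts files that contain a ‘Content-Transfer-Encoding: base64’ header line. That is a heuristic: a base64 part is not necessarily an executable, and folded or unusually written headers will be missed.

```shell
#!/bin/sh
# count_base64_mails DIR  (hypothetical helper, heuristic only)
# Count the files under DIR that declare at least one MIME part
# as base64-encoded.
count_base64_mails() {
    dir=${1:-.}
    # grep -l prints each matching file name once; wc -l counts them.
    # tr strips the padding that some wc implementations add.
    find "$dir" -type f \
        -exec grep -l -i '^Content-Transfer-Encoding:[[:space:]]*base64' {} + |
        wc -l | tr -d ' '
}
```

Calling `count_base64_mails /var/clamav/q` prints a single number, which gives a feel for how much work is left for the MIME parser.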

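#!/usr/bin/env perl
# Decompose every email named on the command line into its MIME parts.
use strict;
use warnings;
use MIME::Parser;

my $dir = ".";    # base directory for the msg-* output directories

foreach my $file (@ARGV) {
        my $ATT = MIME::Parser->new();
        # The original listing breaks off after new(). The rest of this
        # loop is a reconstruction, based on the msg-<number> directories
        # that the script is described as producing.
        $ATT->output_under($dir);
        eval { $ATT->parse_open($file); 1 }
                or warn "Cannot parse $file: $@";
}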

To use this little Perl code pasted above, you will need to install MIME::Parser, which can be done like this:

$ sudo cpan -f install MIME::Parser
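On a box with more than one perl installed, the module may end up in a different perl than the one your script runs under. A small sketch to check this up front, assuming nothing beyond a POSIX shell: the helper names perl_has_module and check_perls are hypothetical, and the check merely tries to load the module with each perl binary found in PATH.

```shell
#!/bin/sh
# perl_has_module PERL MODULE  (hypothetical helper)
# Succeed if the perl binary PERL can load MODULE.
perl_has_module() {
    "$1" -M"$2" -e 1 >/dev/null 2>&1
}

# check_perls MODULE  (hypothetical helper)
# Report, for every perl found in PATH, whether it can load MODULE.
check_perls() {
    # The subshell keeps the changed IFS from leaking into the caller.
    ( IFS=:
      for dir in $PATH; do
          [ -x "$dir/perl" ] || continue
          if perl_has_module "$dir/perl" "$1"; then
              echo "$dir/perl: $1 found"
          else
              echo "$dir/perl: $1 MISSING"
          fi
      done )
}
```

Running `check_perls MIME::Parser` prints one line per perl binary, so you can see at a glance which interpreter should go in the first line of the script.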

Once you’ve done all that, saved the Perl snippet as ‘’ and made it executable (chmod +x), you can use it to strip the attachments out of the emails:

# mkdir /tmp/qd
# cd /tmp/qd
# find /var/clamav/q -type f -exec ~/ {} \;

This will create an enormous number of subdirectories under /tmp/qd; each email having attachments is decomposed into a directory named msg-<number>. If you end up getting a zillion errors from Perl complaining about ‘Can’t locate MIME/’, the module was probably installed for a different perl than the one running the script, so check which perl you are using with ‘/usr/bin/which -a perl’. You might want to change the first line of the Perl code to ‘#!/usr/local/bin/perl’.
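With that many msg-* directories, a quick tally of what was extracted is a useful first look. The sketch below is an illustration rather than part of the toolchain: tally_extensions is a hypothetical name, and it assumes the extracted parts keep recognizable file extensions, which will not hold for every attachment.

```shell
#!/bin/sh
# tally_extensions DIR  (hypothetical helper)
# Count the files under DIR per lower-cased file extension,
# most frequent extension first.
tally_extensions() {
    dir=${1:-.}
    find "$dir" -type f |
        sed -e 's|.*/||' |    # keep only the file name
        awk -F. '{
            if (NF > 1) print tolower($NF)
            else        print "(none)"
        }' |
        sort | uniq -c | sort -nr
}
```

Running `tally_extensions /tmp/qd` gives a list of counts and extensions; anything like exe, scr or zip near the top is a good candidate for a closer look.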


As this is the first part of a series, there really is no epilogue. The next and last part will introduce a wrapper script to sort the results of ClamAV and will discuss how to use Perl with the VirusTotal API. To wrap it all up, it will briefly sum up what needs to be done to automate all this work and get a regular report on malicious code occurring in email.

