Tracking email using web bugs

“Did you get my email?” If you have ever asked anyone this question, then this post might interest you. I had my doubts about sharing this, since the trick described here does invade the privacy of the recipient somewhat, but spammers have been using it ever since HTML email became possible, almost two decades ago if I recall correctly. Security awareness is important, so why not expose an old trick?

Email originally had no message status tracing. To the best of my knowledge, one of the first efforts to get information about a user on a (large) Unix system was an additional tool and protocol called ‘finger’. It showed when a user was on the system, and whether she or he had read mail or had new mail waiting. You could leave additional messages for anyone viewing your status using special files. I remember users having animated ASCII art in their .plan and .project files. Finger was far from perfect, and a lot of administrators banned it from their systems. The finger tool and protocol pretty much died somewhere around the mid-nineties.

The need to check whether someone has received a message remained nonetheless, and it was added as an extension to SMTP, the protocol used for transporting email messages. The earliest efforts for SMTP resulted in Delivery Status Notification, which has since been revised and improved upon. But the Internet being as diverse as it is today, not everybody makes use of these standards. And even if every system along the path of an email followed the standards, some users simply do not want you to know whether they’ve received or read an email. This explains why it is difficult to know if an email was received at the other end, regardless of the wishes of the sender. There was quite a debate about WhatsApp’s decision to include read notifications in the latest version of its popular messaging app, while other instant messaging applications have been doing this for some time now.

Nevertheless, suppose I’m stuck with good ol’ email and I really want to know if you have received and read my email. As everyone using email knows, most email client programs can use HTML to “enhance” your email reading experience a bit. HTML is of course not native to email; it is the language used to describe and define most of the World Wide Web that we know today. HTML is served by a web server, from a website much like this one. If you’ve ever installed a web server and looked at it in a little more detail, you surely must have noticed that its logs hold quite some information about the clients visiting the website. You can extend the functionality of your website quite easily with CGI scripts. The most basic examples show you just what information is available about a client that is visiting your website.

Here’s an example of just what a web server knows about the connecting client / web browser:

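<?php
// Dump everything the web server knows about the current request.
print "<html>\n<head>\n<title>Environment</title>\n</head>\n<body>\n";
$i = 0;
foreach ($_SERVER as $name => $val) {
	// Some entries (argv, for example) are arrays; flatten them first.
	if (is_array($val)) {
		$val = implode(' ', $val);
	}
	// Escape the output: headers such as User-Agent are client-controlled.
	print $i . ": " . htmlspecialchars($name) . " = " . htmlspecialchars((string)$val) . "<br>\n";
	++$i;
}
print "</body>\n</html>";
?>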

The output of this little PHP snippet is context dependent; it will be different if you access it with a web browser or run it on the command line. The former is the most interesting in this particular case. There are at least three distinct name-value pairs that are interesting: HTTP_USER_AGENT, REMOTE_ADDR and REQUEST_TIME. A special case is HTTP_REFERER. The value of this last variable points to the page the visitor came from, which in this case is empty, because the user never clicked on a link to get to this page. In a more normal web environment, this value is most important as it tells you where all your visitors are coming from 😉
All these variables together can be used to fingerprint the user’s browser, a.k.a. device fingerprinting. And if you’re running a web server, it is highly likely that this information is already in the web server’s logs. You could extend that information by using JavaScript to look for plugins in the browser, which increases the accuracy of the fingerprint. JavaScript, however, works only in a normal web browser and should not run in an email client. If you find a way to do that in email, then you have found a huge security issue.
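
To make the fingerprinting idea concrete, here is a minimal sketch, not taken from the original script. It assumes the fingerprint is simply a hash over a few request properties; the function name fingerprint(), the use of SHA-256, and the extra HTTP_ACCEPT_LANGUAGE field are my own choices for illustration. REQUEST_TIME is left out on purpose, because it changes with every request.

```php
<?php
// Sketch: reduce a few request properties to one short fingerprint.
// The selection of fields and the hash are assumptions for illustration.
function fingerprint(array $server): string
{
    $parts = [
        $server['HTTP_USER_AGENT'] ?? '',
        $server['REMOTE_ADDR'] ?? '',
        $server['HTTP_ACCEPT_LANGUAGE'] ?? '',
    ];
    // A newline cannot occur inside a header value, so it is a safe separator.
    return substr(hash('sha256', implode("\n", $parts)), 0, 16);
}

// In a web context you would pass $_SERVER; here a hand-made example is used.
$example = [
    'HTTP_USER_AGENT' => 'Mozilla/5.0 (Macintosh)',
    'REMOTE_ADDR' => '192.0.2.10',
];
print fingerprint($example) . "\n";
```

Two requests with the same user agent, address and language produce the same value, so a fingerprint like this is only as unique as those fields are.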

So, let’s blend these two ingredients together and create an email that uses HTML pointing to content (an image) that is served from a hidden corner of a website, and on that website a CGI script (PHP, actually) that serves the image and at the same time registers when and by what kind of client it was accessed. If the recipient opens the email with the HTML, the email client will instantly connect to the website serving the image, and the script will take note of some specific and hopefully unique enough properties of the client. And although I don’t have a name or email address from the recipient, I do have quite some detail on the computer and web browser that was used by the recipient of the email. Because this trick has been around forever, most email clients block HTTP requests from HTML email by default and require the reader to specifically allow any connection that might expose the fact that the email was displayed.

First of all, how to create an HTML signature depends on which email client you use. If you use Mac Mail this is quite extensively documented here. The recipe is quite elaborate, but it boils down to a couple of steps:

  1. Create a temporary signature in Mail.app via the preferences item of the pulldown menu.
  2. Look in the ~/Library/Mail/V2/MailData/Signatures directory for a file named AllSignatures.plist, in which you will find the name of the file containing the signature you’ve just created.
  3. Stop Mail.app. Not just close it. Stop it. And make sure it is not active.
  4. Alter the file containing the signature. This means replacing it partly with HTML of your choice. Save the file.
  5. Get root privileges and set that file to immutable using the chflags command (can also be done from the GUI).

The HTML in the signature should point to a website that you can modify. I recommend making the HTML signature as simple as possible. In fact, no more than a few simple lines, as in the example below:

<body ..>
	<div>
		<img src="http://...your-website.../somedirectory/signature.php">
	</div>
	<div>..Your name..</div>
	<div>..you@your-website..</div>
</body>

As you can see, the image tag in the example above is not pointing to a regular JPEG or GIF image, but to a PHP file. That PHP file will read “environment” variables from the client along with a unique HTTP cookie. The cookie is used to pinpoint a revisiting client, thus making device fingerprinting more accurate. Those values are stored in a MySQL database named signature.
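
The image-plus-cookie part could look roughly like the sketch below. This is not the actual signature.php, only an illustration under assumptions: the cookie name trace_id, the one-year lifetime, the 40-character hex format of the id and the 1x1 transparent GIF are all my own choices. The helper functions are hypothetical as well.

```php
<?php
// Sketch of a tracking-image endpoint; not the original signature.php.
// Cookie name, lifetime and id format are assumptions.

// Create a random 40-character uppercase hex id.
function new_trace_id(): string
{
    return strtoupper(bin2hex(random_bytes(20)));
}

// Reuse the id from the cookie if it looks valid, otherwise create one.
function trace_id_from(array $cookies): string
{
    $tid = $cookies['trace_id'] ?? '';
    if (is_string($tid) && preg_match('/^[0-9A-F]{40}$/', $tid) === 1) {
        return $tid;
    }
    return new_trace_id();
}

// A 1x1 transparent GIF, stored base64-encoded.
function pixel_gif(): string
{
    return base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
}

// Only send headers and the image when running under a web server.
if (PHP_SAPI !== 'cli') {
    $tid = trace_id_from($_COOKIE);
    setcookie('trace_id', $tid, time() + 365 * 24 * 3600, '/');
    header('Content-Type: image/gif');
    header('Cache-Control: no-store');
    print pixel_gif();
}
```

The Cache-Control header is there so that a client asks for the image again on every view instead of reusing a cached copy.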

The signature.php script actually does quite a lot of things. First of all, it prepares the image that will be served to the client. It checks whether a cookie was present, and it sets a cookie with a unique value. That value is a trace id, used to identify a unique client. If the client sends it back when it revisits the website, you can assume it to be the same client. To use the script, you need to create a MySQL database with the following tables so it can store these values:

CREATE TABLE `visits` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `tid` varchar(256) DEFAULT NULL,
  `ip` varchar(40) DEFAULT NULL,
  `date` datetime DEFAULT NULL,
  `url` varchar(4096) DEFAULT NULL,
  `referer` varchar(4096) DEFAULT NULL,
  `useragent` varchar(1024) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

CREATE TABLE `traceids` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `tid` varchar(256) DEFAULT NULL,
  `tcount` bigint(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

The script includes another php-file, global.php. This contains the connection details and authentication information. So if you stored the table definitions as tables.sql, you can create the tables in the database with:

$ mysql -u root -p signature < tables.sql 

…Which assumes you created a database named “signature”. To use this from the script, global.php should contain:

<?php
// Connection details used by the script; adjust these to your own setup.
$db = 'signature';
$db_host = '127.0.0.1';
$db_user = 'root';
$db_pass = 'your_password';

But this depends of course on how you configured your database, or how it was configured for you if you’re using a service provider.
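
With the tables and connection details in place, storing a visit could be sketched as follows. Again, this is an illustration and not the original code: the mapping of request variables to columns (url from REQUEST_URI, and so on) is an assumption, and visit_row() and record_visit() are hypothetical helper names. It uses mysqli prepared statements and trims each value to the column width from the table definitions.

```php
<?php
// Sketch: store one visit using mysqli prepared statements.
// The mapping of request fields to columns is an assumption.

// Pick the request fields for the `visits` table, cut to the column widths.
function visit_row(array $server, string $tid): array
{
    return [
        'tid' => substr($tid, 0, 256),
        'ip' => substr((string)($server['REMOTE_ADDR'] ?? ''), 0, 40),
        'url' => substr((string)($server['REQUEST_URI'] ?? ''), 0, 4096),
        'referer' => substr((string)($server['HTTP_REFERER'] ?? ''), 0, 4096),
        'useragent' => substr((string)($server['HTTP_USER_AGENT'] ?? ''), 0, 1024),
    ];
}

function record_visit(mysqli $conn, array $row): void
{
    $stmt = $conn->prepare(
        'INSERT INTO visits (tid, ip, date, url, referer, useragent)
         VALUES (?, ?, NOW(), ?, ?, ?)'
    );
    $stmt->bind_param('sssss', $row['tid'], $row['ip'], $row['url'],
                      $row['referer'], $row['useragent']);
    $stmt->execute();

    // Bump the counter for a known trace id, or create it on first sight.
    $upd = $conn->prepare('UPDATE traceids SET tcount = tcount + 1 WHERE tid = ?');
    $upd->bind_param('s', $row['tid']);
    $upd->execute();
    if ($upd->affected_rows === 0) {
        $ins = $conn->prepare('INSERT INTO traceids (tid, tcount) VALUES (?, 1)');
        $ins->bind_param('s', $row['tid']);
        $ins->execute();
    }
}

// Usage, assuming global.php defines $db, $db_host, $db_user and $db_pass
// and $tid holds the trace id for this client:
//   require 'global.php';
//   $conn = new mysqli($db_host, $db_user, $db_pass, $db);
//   record_visit($conn, visit_row($_SERVER, $tid));
```

Because the user agent and referer come straight from the client, binding them as parameters keeps odd input from ever being interpreted as SQL.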

If you’ve done all that, and sent an HTML email with a signature which you can trace, then this should end up in the database:

select date, useragent, ip from traceids as t, visits as v
     where t.tid = v.tid order by date desc limit 10;

As an ultimate test to see if things are working, you can easily fool this using wget or curl:

yourhost:~ youasauser$ wget --user-agent='# ; // -- fooquu; `uname -a`' --referer='blah' --save-cookies=haha.txt http://example.com/hidden_directory/signature.php
--2014-10-11 22:20:52--  http://example.com/hidden_directory/signature.php
Resolving example.com (example.com)... 128.32.137.13
Connecting to example.com (example.com)|128.32.137.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 55 [image/gif]
Saving to: ‘signature.php’

100%[=====================================================================================================================================>] 55          --.-K/s   in 0s      

2014-10-11 22:20:53 (2.50 MB/s) - ‘signature.php’ saved [55/55]

yourhost:~ youasauser$ file signature.php 
signature.php: GIF image data, version 87a, 500 x 1

yourhost:~ youasauser$ cat haha.txt 
# HTTP cookie file.
# Generated by Wget on 2014-10-11 22:20:53.
# Edit at your own risk.

.example.com	TRUE	/	FALSE	2147483647	example_com	B1CCB7577D715AF1A574283DC509A057F6F9DD04

Which should be in the database:

select distinct(useragent),date from traceids as t, visits as v 
     where t.tid = v.tid and v.useragent not like '%mozilla%' limit 10;

Or:

select date, ip, useragent from traceids as t, visits as v 
     where t.tid = v.tid and t.tid like '%DD04';

Epilogue

There are several imperfections here:

  1. Device fingerprinting as described here is rather limited. Although a web browser, IP address and a timestamp do provide some information, this is hardly unique enough if several of your emails are being read around the same time. This is especially true for business users, where it is likely that everybody is using the same company-issued web browser and the actual IP address may be hidden because the desktop is behind a proxy or NAT firewall. A way to improve this, along with the cookie, would be to add a unique identifier within the signature, so you could pinpoint a single email more accurately. Cron and additional scripts may be useful here.
  2. Assuming that the recipient read your email within a few hours, you could send a second message, which should end up in the database with the same user agent and IP address. That would provide some more certainty that the first message was received and read, but it is by no means proof that this actually happened; it may have been received and read by someone else, or have been shown in a preview window without being read.
  3. If the recipient has more than one device to read email on, tracing it back will be difficult, since the cookie is unique to the user agent/web browser. The cookie will only help in case the IP address changes, which could happen if the recipient went offline and the recipient’s ISP dynamically assigns an IP address from a pool of IP addresses.
  4. The signature script contains a lot of “wrong” PHP code. It works, but can be improved upon. Quite some of the work could be done in SQL, which would clean this up a bit. One improvement would be to use prepared statements to access and modify the database.
  5. The tables are not optimal as both id columns aren’t really useful, and the primary key could just as well be the ‘tid’ column.
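
The unique identifier suggested in the first point could be sketched like this. It is only an illustration: the query parameter name m, the token length and the helper names are hypothetical, and it assumes that whatever sends the email can rewrite the signature for each message.

```php
<?php
// Sketch: give every outgoing message its own image URL.
// The parameter name 'm' and the token format are hypothetical.
function message_token(): string
{
    return bin2hex(random_bytes(8));
}

function tagged_image_tag(string $baseUrl, string $token): string
{
    $url = $baseUrl . '?m=' . rawurlencode($token);
    return '<img src="' . htmlspecialchars($url, ENT_QUOTES) . '">';
}

$token = message_token();
print tagged_image_tag('http://example.com/hidden_directory/signature.php', $token) . "\n";
```

If the script stores the full request URI in the url column of visits, the token ends up in the database with each hit and separates one email from another, even when user agent and IP address are identical.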

On the other hand, regardless of tracing specific emails, you can now have tons of fun exploring your “visitors” from the database, using raw SQL power:

Top visits:

select max(b.tcount) as tcount, a.ip from visits as a, traceids as b 
     where a.tid = b.tid and b.tcount > 1 group by a.ip order by tcount desc limit 10;

Top useragents:

select distinct(useragent),count(*) as b from visits 
     group by useragent order by b desc limit 10;

Which demonstrates that a database is quite a bit more flexible and faster in this case than using grep on a bunch of log-files from somewhere in /var/log on the command line 😉
