I was conjuring up a physical world which had the same level of tracking and logging as the Internet does. It was a preposterous world, because nobody would expect the same level of tracking to occur once they left their computer and went outside, right?
Little did I know that my preposterous world isn’t that far from becoming a reality, and that there could soon be any number of people who know what you did last summer.
October is Cyber Security Awareness Month (ironically, a fact which I’m not sure many people are aware of), and I wanted to write something to try to increase people’s awareness of Internet security and some of the things that go bump on the net (and not just during the night, I might add). I also left it to the last minute.
So I decided to ramble on for two thousand-odd words by reviving an old idea of mine. I’m going to describe a physical world which implements the same level of tracking as the Internet does, in an attempt to increase awareness of how much information your web browser and other Internet activities can reveal about you.
Are you sitting comfortably (this goes on for a bit, so you might want to get a coffee or something)? Then I shall begin…
Imagine a world in which anyone you visited would know where you came from and what type (including the make and model) of vehicle you used to get there.
Any computer that you connect to will get your IP address: four numbers (in the case of IPv4) that are essentially equivalent to your street address, in that they let the other computer know where on the Internet you are so that it can send the web page, email, or any other data that you request (and in many cases didn’t request) back to you. It is the same as having to include your return address in a letter when writing to someone, so that they know where to send their reply.
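To make the ‘four numbers’ a little more concrete, here is a minimal Python sketch that pulls an IPv4 address apart. The function name describe_ipv4 is my own, and 192.0.2.10 is an address from a range reserved for documentation rather than anybody’s real address.

```python
import ipaddress

def describe_ipv4(text):
    """Break a dotted-quad IPv4 address into the four numbers it is made of."""
    address = ipaddress.IPv4Address(text)  # raises ValueError if text is not valid IPv4
    return {
        "octets": list(address.packed),    # the four numbers, each between 0 and 255
        "as_integer": int(address),        # the same address as a single 32-bit number
    }

# 192.0.2.10 is a documentation-only example address.
print(describe_ipv4("192.0.2.10"))
```

Every server you talk to sees this value, because without it there would be nowhere to send the reply.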
When you use a web browser to connect to web sites, the web browser will send a piece of information known as the ‘User-Agent’. This information identifies the type of browser, the browser version, and often the type and version of operating system that the browser is running on. This is how many web sites are able to offer you versions of software that are applicable to the operating system that you are running, and how they are able to redirect you to a different version of their site that is more suitable for mobile phones if you visit their web site from your mobile.
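As a rough illustration of how much a web site can read out of that one header, here is a small Python sketch. The sample string is only an example of the general shape of a ‘User-Agent’ value, summarise_user_agent is a name I made up, and real sites tend to use far more thorough parsing than these two regular expressions.

```python
import re

# Illustrative value only; your own browser will send something different.
SAMPLE_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

def summarise_user_agent(ua):
    """Pull the platform details and product/version pairs out of a User-Agent."""
    platform = re.search(r"\(([^)]*)\)", ua)              # text inside the first brackets
    products = re.findall(r"([A-Za-z]+)/([\d.]+)", ua)    # name/version tokens
    return {
        "platform": platform.group(1) if platform else None,
        "products": products,
        "looks_mobile": "Mobile" in ua,  # crude hint a site might use to pick a mobile layout
    }

print(summarise_user_agent(SAMPLE_UA))
```

The operating system falls out of the bracketed section, and the browser name and version out of the last name/version pair, which is all a site needs in order to offer you the ‘right’ download.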
Now imagine that, in this world, whenever you went from one shop to the next, you had to tell them which shop you’d just come from and what you were looking at in that shop.
Web browsers send another piece of (misspelled) information called a ‘Referer’ (the name is misspelled in the standard documentation and as such is implemented as ‘Referer’ rather than ‘Referrer’). The ‘Referer’ contains information about the referrer, or which page referred you to the page that you are visiting. The referrer information is sent when you visit a web page as a result of clicking on a link, and contains the web address of the page that contained the link which you just followed.
For example, if you go to the front page of my blog, http://malwaremusings.com/ and happen to click on an interesting-sounding blog in the Blogroll section, then your browser will take you to the blog’s web site and, whilst chatting with said web site, just happen to slip into the conversation Referer: http://malwaremusings.com/. It is pretty much the equivalent of filling in a survey which asks ‘How did you hear about us?’ at every place you go to.
This is how a lot of web statistics and paid advertising work: when you click on an advertisement, the advertising company’s web site logs the ‘Referer’ information, which tells the advertisers the web site and page that contained the advertisement upon which you clicked.
It also means that web site owners can see if you arrived at their site as a result of an Internet search and, if the Internet search engine uses the web address to store the search query (which most of them do), then the web site owner can also see the search query used to find their site.
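Here is a minimal sketch of what a web site owner could do with a logged ‘Referer’ value. The search engine address is invented, and the assumption that the query lives in a parameter called q is just that, an assumption; different search engines may use different parameter names.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms_from_referer(referer, param="q"):
    """Return (referring host, search query) from a Referer value.

    The query is None when the address carries no such parameter.
    """
    parts = urlsplit(referer)
    values = parse_qs(parts.query).get(param)  # parse_qs also decodes '+' and %xx
    return parts.hostname, (values[0] if values else None)

# Hypothetical Referer value, as it might appear in a web server log.
print(search_terms_from_referer("https://search.example/search?q=malware+analysis+blog"))
```

A few lines like these, run over a web server log, are enough to list the searches that brought people to a site.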
Now, unlike your street address, an IP address doesn’t contain much information in itself. However, this is a good example of a piece of information being mostly harmless by itself, but having the potential to reveal information about you when combined with other information.
The Internet has what is known as the Domain Name System, or DNS. This system is used to find the IP address corresponding to a given host name, or web site name, and is what saves you from having to remember the IP address of every web site you want to visit. The process of finding the IP address corresponding to a name (or finding a name corresponding to an IP address) is known as resolving.
IP addresses will usually resolve back to a name, and that name will usually contain the domain that the IP address is assigned to. This isn’t so bad if you are at home, as the name will usually contain the domain name of your ISP (Internet Service Provider).
However, if you are at work, then the name will usually contain the domain name of the company you are working for (or more precisely, whose network you are using). This means that simply by using a computer at work, you’ve added an extra piece of information to that given away by your browser — your employer (or at least an association between yourself and that company).
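The lookup described above can be sketched in a few lines of Python. The helper names are mine, the host name in the example is invented, and keeping the last two labels of a name is a naive shortcut that gives the wrong answer for domains such as example.com.au, where three labels would be needed.

```python
import socket

def reverse_name(ip):
    """Ask DNS for the name that an IP address resolves back to, if any."""
    try:
        name, _aliases, _addresses = socket.gethostbyaddr(ip)
        return name
    except OSError:  # covers socket.herror and socket.gaierror: no name found
        return None

def trailing_domain(hostname, labels=2):
    """Naively keep the last few labels of a host name as 'the domain'."""
    return ".".join(hostname.rstrip(".").split(".")[-labels:])

# Invented host name of the kind a workplace address might resolve to.
print(trailing_domain("host-10.office.example.com"))
```

Calling reverse_name on the addresses in a log, and then trailing_domain on the results, is roughly how a site owner could end up with a list of the organisations that their visitors were browsing from.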
If you log in (authenticate yourself) to any kind of Internet service, then you are identifying yourself. If you later log in to the same account from somewhere else, then you are creating a trail of locations associated with you: for instance, your home ISP, your employer’s network, and a friend’s ISP.
Now think about this: information that is harmless for people to have now may not be so harmless later, particularly as technology advances. The introduction of geolocation databases is an example.
Geolocation databases, in the case of IP addresses anyway, allow a physical location to be determined from an IP address. The accuracy varies: a lookup may name only the country, but it can name a city or a street (I have heard of one returning the actual street address of a business).
Geolocation databases highlight that you don’t just have to worry about what can be done with your data now, but also how the data can be used (and misused) in the future.
For instance, when you uploaded your profile photo to a social media site, did you think that they’d end up with facial recognition technology giving them the ability to link your face with your name (through your account details), and then identify you if you appeared in any other photos? You can then be associated with anyone else in those photos, and the person who uploaded them.
If the person uploading the photo didn’t remove the Exif information from the image before uploading it, then you could potentially be associated with a GPS location (some phones and possibly cameras with GPSes can include the GPS coordinates of where the picture was taken) and time — a problem that didn’t exist (back when I was a lad — sorry, I just had to sneak that in) in the days of film, or when photos were printed and published.
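Exif stores a GPS position as degrees, minutes and seconds, plus a reference letter (N, S, E or W). Turning that into the decimal coordinates that a mapping site expects takes very little effort, as this sketch shows. The function name is mine, the coordinates are made up, and reading the values out of an actual image file would need an Exif library, which I have left out.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert an Exif-style degrees/minutes/seconds position to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return -value if ref in ("S", "W") else value

# Made-up coordinates of the kind a phone might embed in a photo.
latitude = dms_to_decimal(33, 52, 0, "S")
longitude = dms_to_decimal(151, 12, 36, "E")
print(latitude, longitude)
```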
Something else to think about is software and systems that contact servers on the Internet. For instance, web browsers configured to keep you ‘safe’ on the Internet by checking to see if you are trying to go to a known malicious site, will need to send details of where you are going, to a server on the Internet so that it can be compared with a database.
If it didn’t send the name of the site, nor any information that identified where you were going, then it couldn’t tell you whether it is a known malicious site. This ‘safe’ site check will give the operator of that server a trail of the sites that you have visited.
The same goes for software that has an ‘automatically check for updates’ option. It is revealing the fact that you have the software installed, any information that can be gleaned from your IP address, and who knows what other information it is sending out. If the software doesn’t use an SSL connection (such as HTTPS) and authenticate the server when connecting, then it potentially opens you up to a man-in-the-middle attack during which a malicious third party could substitute a rogue ‘updated’ version of the software.
How many of you have configured your mobile devices to automatically connect to networks and update email/RSS feeds/other content? Again, the change in IP address can allow the server operators (and anyone else with access to the log data) to infer when you leave home, when you arrive at work say, when you leave work, and when you arrive back home. Plus whether you went anywhere else during the day, say for lunch.
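To show how little work that inference takes, here is a sketch that turns a server log into a trail of movements. Everything in it is hypothetical: the log entries, the account names, the addresses (all from documentation-only ranges) and the labels that stand in for whatever a reverse lookup or geolocation database might return.

```python
# Hypothetical log entries: (timestamp, account, client IP address).
LOG = [
    ("2013-10-28T07:10", "alice", "203.0.113.7"),
    ("2013-10-28T08:15", "alice", "203.0.113.7"),
    ("2013-10-28T08:55", "alice", "198.51.100.20"),
    ("2013-10-28T09:30", "bob", "198.51.100.21"),
    ("2013-10-28T12:35", "alice", "192.0.2.44"),
    ("2013-10-28T13:20", "alice", "198.51.100.20"),
    ("2013-10-28T18:05", "alice", "203.0.113.7"),
]

# Hypothetical results of looking each address up.
LABELS = {
    "203.0.113.7": "home ISP",
    "198.51.100.20": "employer",
    "192.0.2.44": "cafe wifi",
}

def location_trail(entries, account, labels):
    """List the moments at which an account's IP address changed."""
    trail = []
    last_ip = None
    for stamp, user, ip in sorted(entries):  # ISO timestamps sort chronologically
        if user != account or ip == last_ip:
            continue                         # someone else, or no movement
        trail.append((stamp, labels.get(ip, ip)))
        last_ip = ip
    return trail

for stamp, place in location_trail(LOG, "alice", LABELS):
    print(stamp, place)
```

The output reads like a diary: at home first thing, at work just before nine, out for lunch, back at work, and home again in the evening.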
If you’ve hidden the name of your wireless network, then I think you’ll find that your device is constantly broadcasting that network name to see if you are within range of your network. Incidentally, this is why a hidden network name can still be found when a client connects to it, even if you use encryption.
This opens you up to another type of attack where someone sets up a rogue access point with the same name as the one which your device is looking for. They can then let you get to the Internet, but will obviously have access to all of your Internet traffic. Note that most Internet browsing and email (including non-webmail) is sent in clear text and is readable by anyone on the network who can see it.
Our Internet activity produces a lot of log data. The log data is a bit like an electronic equivalent of our body language, in that it can be used to infer information which we didn’t necessarily want to give out (or didn’t necessarily realise that we were giving out).
Whilst doing some fact-checking for this post, I found a three-part series on the Electronic Frontier Foundation’s web site:
- New Cookie Technologies: Harder to See and Remove, Widely Used to Track You
- How Online Tracking Companies Know Most of What You Do Online (and What Social Networks Are Doing to Help Them)
- Browser Versions Carry 10.5 Bits of Identifying Information on Average
I used to think that this problem was mainly a problem on the Internet. Granted, there are things like credit cards and whatnot that leave a trail in the physical world, but I thought of it as largely an Internet problem.
Enter the rewards programs, discount vouchers and, as I found out from a Four Corners documentary (‘In Google We Trust’), automatic number plate recognition systems and CCTV!
Rewards programs take your personal details such as name and address (and possibly more; I’ve never signed up to one). I believe, judging by the fact that I keep getting asked at the checkout, that the membership card is then scanned at the checkout when you make your purchases. This gives them the potential to track your purchases, the store(s) in which you make them (which leaks location information), and the times at which they were made.
Another example: a supermarket fuel discount voucher that I received contained a bar code with the following information encoded in it:
- Store number
- Register number
- Receipt number
- Expiry date of the voucher
- Amount of the fuel discount
The store number will tie it to a place, and the register and receipt numbers will tie it to a time and presumably a list of items purchased. If you paid by credit card, or with a rewards program, then all of that information can be linked to you. The expiry date and discount amount are reasonably harmless (as far as I know).
The scary thing though, is that after watching the Four Corners documentary ‘In Google We Trust’, I realised that the preposterous world that I was imagining while driving home the other night isn’t that far from becoming a reality.
The documentary was showing police-car-mounted automatic number plate recognition systems that take images of vehicles, extract the registration number from the number plate, and store that registration number along with a time stamp and GPS location information. This information is apparently stored for five years.
It was also reporting on shopping centres adding tracking software to their CCTV systems to allow them to track shoppers’ movements.
I’ve long been aware that just about anything electronic can be logged and tracked. It is usually (or used to be) logged for troubleshooting purposes, because (sensible, and trust me, I’ve seen some daft ones) log messages can make it a lot easier to find problems when things go wrong.
The problem is though, that anything that is logged can also be searched and correlated with other logged information. This correlation of logged data with other log data and/or context, can make it pretty easy to deduce or to infer knowledge which wouldn’t have otherwise been available.
I’ve just spent almost six months working (and continue to work) with a product designed to do just that — search and index log data and make it easy to correlate logged events.
In some cases, such as troubleshooting problems and looking for unusual circumstances, this is awfully useful. In other cases, such as when you’re going about your everyday life, it’s worrisome.
It is especially worrisome when you also consider the security side of things, including how securely all of this information is being stored, the security level of the software and systems implementing and storing the information, and the security awareness of the people looking after the data/systems.
Data breaches happen more often than I think most people realise. Information about you can be used in social engineering attacks to make you believe that someone must be legitimate because they know information about you, when in fact they just bought it off a total stranger on the Internet (see Four Corners: Your Money and Your Life).
This is where security awareness (cue Security Awareness Month) comes into the picture: people generally won’t bother protecting things if they don’t believe that there is a threat.
For instance, you’re not going to bother buying earthquake insurance if you don’t think that you live in an earthquake-prone area. This is fine, as long as the threat doesn’t actually exist; that is, you really don’t live in an earthquake-prone area. I saw a great quote on a lanyard/security pass (this is my recollection of it, as I’ve been unable to find it since):
The biggest threat to security is the belief that there is no threat to security.
It would also seem that this preposterous world doesn’t have many (if any) privacy settings, or at least, not many/any that we are given control of — we’re just expected to go along with whatever tracking and information sharing other people feel that they want to do.
All this (and believe me, this stuff is just the tip of the iceberg) scares me more than any Halloween ghosts and monsters because, unlike my preposterous world it would seem, ghosts and monsters remain imaginary.