Posts tagged ‘location’

A Cryptographic Approach to Location Privacy

I have a new paperLocation Privacy via Private Proximity Testing” with Narendran Thiagarajan, Mugdha Lakhani, Mike Hamburg and Dan Boneh. Mike spoke about it at NDSS earlier this week, where it won a distinguished paper award.

What is Private Proximity Testing?

The premise behind our paper is simple: smartphone-based location services today require you to reveal your location to the service provider. Is it possible to have at least a limited set of location services without revealing your location?

One might ask why this is useful since your carrier tracks your location anyway. The answer is that while you might (grudgingly) trust your carrier with your location, your might not trust Facebook, Loopt, Foursquare, or whatever the newest location startup is.

We show that it is indeed possible to provide location functionality in a private manner: specifically, it is possible to do proximity testing with privacy. What this means is that a pair of friends will be automatically notified when they are nearby, but otherwise no information about their locations will be revealed to anyone.

This is a strong notion of privacy—not only does the service provider never get to learn your location, your friends don’t learn your location either except that when you are nearby, the learn the fact that you’re nearby. This is appropriate given the loose notion of ‘friend’ in online social networking.

Note that our concept is a natural fit for the background-service model, where the location app sits in the background and constantly monitors your location, whereas most commercial apps today use the check-in model, where explicit user action is required to transmit data or provide service. We will return to this point later.

Tessellations

Three overlapping hexagonal grids. A blue grid cell is highlighted

The way we detect when two friends are nearby is by dividing the plane[1] into a system of 3 overlapping hexagonal grids. Cryptographic protocols for “Private Equality Testing” allow a pair of users to compare if they are within the same grid cell, but otherwise reveal nothing. By repeating this protocol for each of the 3 grids, they learn if they are close to each other.

For details of how this works, and why simpler methods won’t work, you’ll have to read the paper.

[1] The curvature of the Earth can be ignored since the distances across which our app is intended to work are small.

Theory and Practice

My favorite aspect of this paper is that our research spans the spectrum from math to implementation. This is something that Stanford CS is especially good at.

On the theory front, our contributions were mainly new Private Equality Testing algorithms. Not quite brand-new, but optimizations of existing algorithms. At one point we were really excited about having come up with an algorithm based on an improvement to an arcane complexity-theoretic result called Barrington’s theorem, and were looking forward to what would almost certainly have been the first time ever that it had been implemented in actual software! Unfortunately we later found a more efficient algorithm that used much more prosaic math.

Location tags: because every point in space-time has a fingerprint

On to a completely different part of the paper. Think about all the electromagnetic waves and signals floating around us all the time, varying from point to point, constantly changing and carrying data—GPS, GSM, Bluetooth, WiFi, and many, many others. By extracting entropy from these signals, everyone at a given place at a given time has a shared secret—unpredictable if you’re not at the right place at the right time. Think of the possibilities!

We call these shared secrets location tags. The catch is that the tags extracted by two people are largely equal, but not exactly. What we show in the paper is a cryptographic version of error correction that enables using these approximately-equal secrets as if they were exactly equal. Location tags were introduced by my co-author Boneh and others in an earlier paper; we adapted their work to enable the idea of a shared secret for each time and place.

There are many possible uses for location tags. We use them to ensure that it isn’t possible to spoof your location and try to “cheat.” This is a big problem for Foursquare for example. Here’s another possible use: let’s say a conference wants to have an encrypted chatroom. Instead of handing out keys or passwords—insecure and inconvenient—how about automatically extracting the key from the audio of the conference room! This restricts access to those in the room, and also has forward secrecy, since there are no long-term keys.

This part of our paper is theoretical. We did the math but didn’t build it. The main limitation is the ability of phone hardware to extract location tags. Currently the main viable method is using WiFi traffic; we showed experimentally that robust tags can be extracted within a few seconds. We’re confident that as hardware improves, location tag-based cryptography will become more practical.

Adoption. We talked to both Google and Facebook about adopting our technology in their products. Their responses were lukewarm at best. One barrier seemed to be that current services are committed to the check-in model, whereas our method only works in the background-service model. Ironically, I believe that a major reason the check-in model won (even though Loopt, which took the early lead, was a background app), was privacy—users weren’t comfortable broadcasting their location to their service provider and their friends all the time.

While that was somewhat disappointing, the applicability of our research extends well beyond the consumer web, for example in enterprise or even military settings. Imagine a product manager who wants to track who is attending which events, but wants to guarantee employees that no other information is being collected. Our app is a perfect fit for this scenario.

We’re happy that our ideas are out there, and are always looking to talk to people in the industry who might be interested in making our concept and prototype a reality.

Special thanks to students Frank Wang and Kina Winoto for helping us with the implementation.

There are more blog posts in the pipeline related to this paper. For one, I learnt a lot about the challenges of trying to get crypto adopted in the real world. For another, I’m very excited about a sub-project of this paper called SocialKeys that aims to make encryption transparent, largely eliminating the idea of key management from the user perspective. Stay tuned!

To stay on top of future posts, subscribe to the RSS feed or follow me on Twitter.

February 14, 2011 at 4:28 pm 3 comments

Your Morning Commute is Unique: On the Anonymity of Home/Work Location Pairs

Philippe Golle and Kurt Partridge of PARC have a cute paper (pdf) on the anonymity of geo-location data. They analyze data from the U.S. Census and show that for the average person, knowing their approximate home and work locations — to a block level — identifies them uniquely.

Even if we look at the much coarser granularity of a census tract — tracts correspond roughly to ZIP codes; there are on average 1,500 people per census tract — for the average person, there are only around 20 other people who share the same home and work location. There’s more: 5% of people are uniquely identified by their home and work locations even if it is known only at the census tract level. One reason for this is that people who live and work in very different areas (say, different counties) are much more easily identifiable, as one might expect.

The paper is timely, because Location Based Services  are proliferating rapidly. To understand the privacy threats, we need to ask the two usual questions:

  1. who has access to anonymized location data?
  2. how can they get access to auxiliary data linking people to location pairs, which they can then use to carry out re-identification?

The authors don’t say much about these questions, but that’s probably because there are too many possibilities to list! In this post I will examine a few.

GPS navigation. This is the most obvious application that comes to mind, and probably the most privacy-sensitive: there have been many controversies around tracking of vehicle movements, such as NYC cab drivers threatening to strike. The privacy goal is to keep the location trail of the user/vehicle unknown even to the service provider — unlike in the context of social networks, people often don’t even trust the service provider. There are several papers on anonymizing GPS-related queries, but there doesn’t seem to be much you can do to hide the origin and destination except via charmingly unrealistic cryptographic protocols.

The accuracy of GPS is a few tens or few hundreds of feet, which is the same order of magnitude as a city block. So your daily commute is pretty much unique. If you took a (GPS-enabled) cab home from work at a certain time, there’s a good chance the trip can be tied to you. If you made a detour to stop somewhere, the location of your stop can probably be determined. This is true even if there is no record tying you to a specific vehicle.

ScreenshotLocation based social networking. Pretty soon, every smartphone will be capable of running applications that transmit location data to web services. Google Latitude and Loopt are two of the major players in this space, providing some very nifty social networking functionality on top of location awareness. It is quite tempting for service providers to outsource research/data-mining by sharing de-identified data. I don’t know if anything of the sort is being done yet, but I think it is clear that de-identification would offer very little privacy protection in this context. If a pair of locations is uniquely identifying, a trail is emphatically so.

The same threat also applies to data being subpoena’d, so data retention policies need to take into consideration the uselessness of anonymizing location data.

I don’t know if cellular carriers themselves collect a location trail from phones as a matter of course. Any idea?

Plain old web browsing. Every website worth the name identifies you with a cookie, whether you log in or not. So if you browse the web from a laptop or mobile phone from both home and work, your home and work IP addresses can be tied together based on the cookie. There are a number of free or paid databases for turning IP addresses into geographical locations. These are generally accurate up to the city level, but beyond that the accuracy is shaky.

A more accurate location fix can be obtained by IDing WiFi access points. This is a curious technological marvel that is not widely known. Skyhook, Inc. has spent years wardriving the country (and abroad) to map out the MAC addresses of wireless routers. Given the MAC address of an access point, their database can tell you where it is located. There are browser add-ons that query Skyhook’s database and determine the user’s current location. Note that you don’t have to be browsing wirelessly — all you need is at least one WiFi access point within range. This information can then be transmitted to websites which can provide location-based functionality; Opera, in particular, has teamed up with Skyhook and is “looking forward to a future where geolocation data is as assumed part of the browsing experience.” The protocol by which the browser communicates geolocation to the website is being standardized by the W3C.

The good news from the privacy standpoint is that the accurate geolocation technologies like the Skyhook plug-in (and a competing offering that is part of Google Gears) require user consent. However, I anticipate that once the plug-ins become common, websites will entice users to enable access by (correctly) pointing out that their location can only be determined to within a few hundred meters, and users will leave themselves vulnerable to inference attacks that make use of location pairs rather than individual locations.

Image metadata. An increasing number of cameras these days have (GPS-based) geotagging built-in and enabled by default. Even more awesome is the Eye-Fi card, which automatically uploads pictures you snap to Flickr (or any of dozens of other image sharing websites you can pick from) by connecting to available WiFi access points nearby. Some versions of the card do automatic geotagging in addition.

If you regularly post pseudonymously to (say) Flickr, then the geolocations of your pictures will probably reveal prominent clusters around the places you frequent, including your home and work. This can be combined with auxiliary data to tie the pictures to your identity.

Now let us turn to the other major question: what are the sources of auxiliary data that might link location pairs to identities? The easiest approach is probably to buy data from Acxiom, or another provider of direct-marketing address lists. Knowing approximate home and work locations, all that the attacker needs to do is to obtain data corresponding to both neighborhoods and do a “join,” i.e, find the (hopefully) unique common individual. This should be easy with Axciom, which lets you filter the list by  “DMA code, census tract, state, MSA code, congressional district, census block group, county, ZIP code, ZIP range, radius, multi-location radius, carrier route, CBSA (whatever that is), area code, and phone prefix.”

Google and Facebook also know my home and work addresses, because I gave them that information. I expect that other major social networking sites also have such information on tens of millions of users. When one of these sites is the adversary — such as when you’re trying to browse anonymously — the adversary already has access to the auxiliary data. Google’s power in this context is amplified by the fact that they own DoubleClick, which lets them tie together your browsing activity on any number of different websites that are tracked by DoubleClick cookies.

Finally, while I’ve talked about image data being the target of de-anonymization, it may equally well be used as the auxiliary information that links a location pair to an identity — a non-anonymous Flickr account with sufficiently many geotagged photos probably reveals an identifiable user’s home and work locations. (Some attack techniques that I describe on this blog, such as crawling image metadata from Flickr to reveal people’s home and work locations, are computationally expensive to carry out on a large scale but not algorithmically hard; such attacks, as can be expected, will rapidly become more feasible with time.)

devicesSummary. A number of devices in our daily lives transmit our physical location to service providers whom we don’t necessarily trust, and who keep might keep this data around or transmit it to third parties we don’t know about. The average user simply doesn’t have the patience to analyze and understand the privacy implications, making anonymity a misleadingly simple way to assuage their concerns. Unfortunately, anonymity breaks down very quickly when more than one location is associated with a person, as is usually the case.

May 13, 2009 at 6:42 am 24 comments


About 33bits.org

I’m an associate professor of computer science at Princeton. I research (and teach) information privacy and security, and moonlight in technology policy.

This is a blog about my research on breaking data anonymization, and more broadly about information privacy, law and policy.

For an explanation of the blog title and more info, see the About page.

Me, elsewhere

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Join 266 other subscribers