Oklahoma Abortion Law: Bloggers get it Wrong

October 9, 2009 at 6:24 pm 10 comments

The State of Oklahoma just passed legislation requiring that detailed information about every abortion performed in the state be submitted to the State Department of Health. Reports based on this data are to be made publicly available. The controversy around the law gained steam rapidly after bloggers revealed that even though names and addresses of mothers obtaining abortions were not collected, the women could nevertheless be re-identified from the published data based on a variety of other required attributes such as the date of abortion, age and race, county, etc.

As a computer scientist studying re-identification, I had this brought to my attention. I was as indignant on hearing about it as the next smug Californian, and I promptly wrote up a blog post analyzing the serious risk of re-identification based on the answers to the 37 questions that each mother must anonymously report. Just before posting it, however, I decided to give the text of the law a more careful reading, and realized that the bloggers had been misinterpreting the law all along.

While it is true that the law requires submitting a detailed form to the Department of Health, the only information made public is a set of annual reports with statistical tallies of the number of abortions performed under very broad categories, which present a negligible to non-existent re-identification risk.
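
To make the distinction between tallies and individual records concrete, here is a minimal sketch in Python of the usual small-cell check: count how many records share each combination of attributes, and flag the combinations shared by fewer than some threshold of people. The records, the field names, and the threshold of 5 are all made up for illustration; none of them come from the law or from any real data.

```python
from collections import Counter

# Hypothetical, made-up records: (age group, race, county). Not real data.
AGE, RACE, COUNTY = 0, 1, 2
records = (
    [("20-24", "X", "North")] * 6
    + [("20-24", "Y", "South")] * 5
    + [("25-29", "X", "South")] * 5
    + [("25-29", "Y", "North")] * 1
)


def small_cells(records, columns, threshold=5):
    """Return the attribute combinations shared by fewer than `threshold` records."""
    counts = Counter(tuple(r[c] for c in columns) for r in records)
    return {combo: n for combo, n in counts.items() if n < threshold}


# A broad, one-dimensional tally: every bin holds at least 5 records.
print(small_cells(records, [AGE]))

# The full combination of attributes: one bin holds a single record.
print(small_cells(records, [AGE, RACE, COUNTY]))
```

In this toy data the one-dimensional tally has no small cells, while the three-way combination isolates a single record. That is roughly why statewide tallies over broad categories are far less worrying than row-level data, although an empty result here is only a rough heuristic and not a privacy guarantee.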

I’m not defending the law; that is outside my sphere of competence. There do appear to be other serious problems with it, outlined in a lawsuit aimed at stopping the law from going into effect. The text of this complaint, as Paul Ohm notes, does not raise the “public posting” claim. Besides, the wording of the law is very ambiguous, and I can certainly see why it might have been misinterpreted.

But I do want to lament the fact that bloggers and special interest groups can start a controversy based on a careless (or less often, deliberate) misunderstanding, and have it amplified by an emerging category of news outlets like the Huffington Post, which have the credibility of blogs but a readership approaching that of traditional media. At this point the outrage becomes self-sustaining, and the factual inaccuracies become impossible to combat. I’m reminded of the affair of the gay sheep.


10 Comments

  • 1. PublicRecords  |  October 9, 2009 at 9:58 pm

    So a FOIA request is all you need and then you CAN get all this data and put it online, right? Since there are no names to redact, it’s “private”, right?

    And the Dept. of Health is not staffed by religious persons who might misuse the data, or “leak” it, right?

    And, incidentally, you may be reading it wrong. You’re referring to Section 7, while Section 5 merely says they won’t include name, address, or information “specifically identifying” any patient/female. Just because they ARE DEFINITELY going to release aggregate reports (Section 7) DOESN’T mean they aren’t going to release INDIVIDUAL reports, which they consider would not “reasonably lead to the identification of any individual female”.

    The ONLY reason for the gov’t to have this info is either to act on it or publicize it. When have you ever seen expanding bureaucracy NOT misuse its power?

    “But I do want to lament the fact that bloggers and special interest groups can start a controversy based on a careless (or less often, deliberate) misunderstanding”

    Special interest groups? Where have you been for the last decade of government encroachment? Trying to stop government from intimidating patients and physicians is not a special interest group.

    • 2. Arvind  |  October 9, 2009 at 10:39 pm

      Your outrage is charming, but misdirected.

      “So a FOIA request is all you need and then you CAN get all this data and put it online, right?”

      The law specifically states that the information is exempt from the Oklahoma Open Records Act (the FOIA doesn’t apply here since the data isn’t being held by the federal government).

      “And the Dept. of Health is not staffed by religious persons who might misuse the data, or “leak” it, right?”

      I have no opinion on that. I already said I’m not defending the law. Could you please try to understand that fact? I’m only saying that the claim about re-identifiability doesn’t check out.

      “Just because they ARE DEFINITELY going to release aggregate reports (Section 7) DOESN’T mean they aren’t going to release INDIVIDUAL reports, which they consider would not “reasonably lead to the identification of any individual female”.”

      That may be true, I don’t know; it’s for lawyers to figure out. But again, it is beside my point, which is that the public outrage has been premised on the interpretation that the law mandates the release of individual records.

      “Special interest groups? Where have you been for the last decade of government encroachment? Trying to stop government from intimidating patients and physicians is not a special interest group.”

      Holy crap, you need a dictionary (or Wikipedia). That is in fact exactly what special interest group means.

      • 3. Anonymous  |  October 12, 2009 at 4:02 pm

        I’d have to disagree about your special interest group comment too. The idea of trying to restrain the power of government touches a lot more people than what you’d just describe as a special interest group, unless you want to call thousands and thousands of unconnected people an organization. PublicRecords rightly called you on that one.

        Mostly though, I think I don’t like your ‘lamenting’ of bloggers’ actions. Even if it’s overstated, this is definitely an issue that deserves the attention. You might be understating what someone can do with information that gets slipped, or the information that’s readily available.

        And even beyond that, you mention that the law has some pretty obvious flaws, so…again, the outrage comes from where?

        • 4. Arvind  |  October 12, 2009 at 4:12 pm

          I’m not calling “thousands of unconnected people” a special-interest group. I’m calling Feminists for Choice, the organization that first raised the re-identification claim in their blog, a special-interest group. I should have made that clearer in my post.

          “Mostly though, I think I don’t like your ‘lamenting’ of bloggers’ actions.”

          Well, we’ll just have to disagree on that one, then. That comment isn’t just from this incident, but rather something that I’ve felt for years, and it is my considered opinion. I feel that more misinformation than information has been spread through these periodic outcries and outrages in the blogosphere.

          I really like this quote by Clay Shirky (about the #amazonfail incident):

          “Though the event initially triggered enormous moral outrage, evidence that it didn’t actually happen didn’t quell that outrage. Moral judgment is harder to reverse than other, less emotional forms; when an event precipitates the cleansing anger of righteousness, admitting you were mistaken feels dirty. As a result, there can be an enormous premium put on finding rationales for continuing to feel aggrieved, should the initial rationale disappear. Call it ‘conservation of outrage.’”

  • 5. kamalika  |  October 10, 2009 at 9:17 pm

    “the only information that is made public are annual reports with statistical tallies of the number of abortions performed under very broad categories, which presents a negligible to non-existent re-identification risk.”

    Do you know how broad these categories are? If they are releasing a multidimensional histogram of values, and several of the bins contain just a few people, it could still be a re-identification risk.

    • 6. Arvind  |  October 11, 2009 at 1:18 am

      Hi Kamalika,

      That’s a good question. Fortunately, the law spells out the categories. Most of the histograms are 1-dimensional, and none have more than 2 or 3 variables. And the multi-dimensional ones look pretty innocuous, like (age, marital status, race). This is for the whole state, not on a county-by-county basis.

  • 7. Spotlight OK » HB 1595  |  October 13, 2009 at 7:54 am

    […] Left hypocrisy aside, upon further scrutiny, 33 Bits of Entropy, a data analysis blog, argues that the way the data is collected will not serve to identify the […]

  • 8. Vidyaguy  |  October 20, 2009 at 9:26 pm

    Semi-accidentally ran across your website. It is rare to find a mind that actually works, rather than careening from one emotional fix to another. I suspect that I will return from time to time, just to find some refreshing and relatively logical commentary.

    • 9. Arvind  |  October 21, 2009 at 12:27 am

      Thanks :-) I only write when I have something to say, and so occasionally go months without a post. Subscribing to the feed might be easier than visiting from time to time.

  • 10. Jeffrey Stubs  |  October 21, 2009 at 6:37 am

    I think that people are still afraid of Big Brother (I mean the original one, from George Orwell’s book), who will observe their every move and use information about them to make them more submissive. Every innovation that involves data collection makes people more nervous, and I don’t blame them. There is always something to remind us that our data could someday be stolen, or used against us by someone.


About 33bits.org

I'm an assistant professor of computer science at Princeton. I research (and teach) information privacy and security, and moonlight in technology policy.

This is a blog about my research on breaking data anonymization, and more broadly about information privacy, law and policy.

For an explanation of the blog title and more info, see the About page.
