Tuesday, June 11, 2013

Metadata and Constitutional Fornication

The current buzzword on the 'net: metadata.

That's what the NSA--and probably the CIA, the FBI, and other agencies--has been collecting, via phone records, search data, and the like. Defenders of these activities--and make no mistake, there are many on both sides of the aisle--insist such collections are nothing for the typical citizen to worry about, that because it's just "metadata" the government is not learning anything personal or private about individual citizens. As this piece in the New Yorker notes, Dianne Feinstein--for instance--has no problem with the collection of metadata:
Dianne Feinstein, a Democrat from liberal Northern California and the chairman of the Senate Select Committee on Intelligence, assured the public earlier today that the government’s secret snooping into the phone records of Americans was perfectly fine, because the information it obtained was only “meta,” meaning it excluded the actual content of the phone conversations, providing merely records, from a Verizon subsidiary, of who called whom when and from where. In addition, she said in a prepared statement, the “names of subscribers” were not included automatically in the metadata (though the numbers, surely, could be used to identify them). “Our courts have consistently recognized that there is no reasonable expectation of privacy in this type of metadata information and thus no search warrant is required to obtain it,” she said, adding that “any subsequent effort to obtain the content of an American’s communications would require a specific order from the FISA court.”  
She said she understands privacy—“that’s why this is carefully done”—and noted that eleven special federal judges, the Foreign Intelligence Surveillance Court, which meets in secret, had authorized the vast intelligence collection. A White House official made the same points to reporters, saying, “The order reprinted overnight does not allow the government to listen in on anyone’s telephone calls” and was subject to “a robust legal regime.” The gist of the defense was that, in contrast to what took place under the Bush Administration, this form of secret domestic surveillance was legitimate because Congress had authorized it, and the judicial branch had ratified it, and the actual words spoken by one American to another were still private. So how bad could it be?
But such a head-in-the-sand point of view ignores the reality of metadata: very specific things can be learned from it via simple extrapolation. Kurt Opsahl at EFF provides some examples in this regard:
What they are trying to say is that disclosure of metadata—the details about phone calls, without the actual voice—isn't a big deal, not something for Americans to get upset about if the government knows. Let's take a closer look at what they are saying: 
  • They know you rang a phone sex service at 2:24 am and spoke for 18 minutes. But they don't know what you talked about. 
  • They know you called the suicide prevention hotline from the Golden Gate Bridge. But the topic of the call remains a secret. 
  • They know you spoke with an HIV testing service, then your doctor, then your health insurance company in the same hour. But they don't know what was discussed. 
  • They know you received a call from the local NRA office while it was having a campaign against gun legislation, and then called your senators and congressional representatives immediately after. But the content of those calls remains safe from government intrusion. 
  • They know you called a gynecologist, spoke for a half hour, and then called the local Planned Parenthood's number later that day. But nobody knows what you spoke about. 
Sorry, your phone records—oops, "so-called metadata"—can reveal a lot more about the content of your calls than the government is implying. Metadata provides enough context to know some of the most intimate details of your lives.
Now, let's be clear on terminology here. "Metadata" is properly defined as data about data. It's data collected about specific groups of data. Thus for something like phone calls, the data would be all of the individual phone calls in toto: who made them, who was called, what was said, etc. But if we were to then look at this set of data from above (in a manner of speaking), we could collect a whole new set of data--times calls were made, locations made from, durations of calls, etc.--and compile that data with reference to those criteria alone. The internals of the calls wouldn't matter. That's metadata. And really, given a large enough number of collections in this regard, we could also have another new set of data with reference to the metadata; this would be meta-metadata. Sound complicated and somewhat esoteric? It is.
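To make the distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Call` record, the phone numbers, and the two helper functions are hypothetical, not anything the NSA or Verizon is known to use. It only shows that the metadata keeps the facts about each call while dropping what was said, and that the metadata can itself be summarized one level up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Call:
    """One phone call: the underlying data (hypothetical record layout)."""
    caller: str
    callee: str
    start: str    # timestamp of the call
    minutes: int  # duration
    content: str  # what was actually said


def metadata(call):
    """Keep the facts about the call; drop the content entirely."""
    return {
        "caller": call.caller,
        "callee": call.callee,
        "start": call.start,
        "minutes": call.minutes,
    }


def meta_metadata(records):
    """Summarize metadata records: how often each caller rang each number."""
    return Counter((r["caller"], r["callee"]) for r in records)


# Made-up calls; the content never leaves this list.
calls = [
    Call("555-0100", "555-0199", "2013-06-10T02:24", 18, "(private)"),
    Call("555-0100", "555-0142", "2013-06-10T09:05", 4, "(private)"),
    Call("555-0100", "555-0199", "2013-06-11T02:31", 22, "(private)"),
]

records = [metadata(c) for c in calls]
summary = meta_metadata(records)
```

Nothing in `records` or `summary` contains a word of conversation, yet `summary` already shows a repeated late-night pattern between two numbers.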

When we speak of "meta" anything, we are speaking about viewing the subject matter from one level up, for lack of a better way to say it. For instance, there is the field of metahistory (a term popularized by Hayden White in his 1973 book of the same name), more commonly referred to as historiography, which is the study of the history of the field of history. There are also metamathematics and metalogic (closely related fields which also impinge on the world of computer programming), whose definitions should be easy enough to glean. Metamathematics takes us into the world of Douglas Hofstadter's Gödel, Escher, Bach: An Eternal Golden Braid, a world I've broached before.

Gödel's work is key here, as it produced not only his incompleteness theorems but also the technique of arithmetization (Gödel numbering), in which statements about a formal system are encoded as numbers within that system. That technique became a cornerstone of metamathematics. In this world, mathematical theorems can be grouped into sets and then studied as an entirely new mathematical system. And again, the new system can be treated similarly, allowing for a meta-metamathematics.

Why is any of this significant? Well, first, the fact of the matter is that once properly organized, the "meta" field of inquiry is simpler and more limited in scope as a matter of course. Each "step-up" creates an even simpler version; thus the complexity of the initial field limits how many such step-ups are theoretically possible. But here's the thing: knowledge--including truth-values--undiscoverable on the initial level can become easily discoverable on the meta-level. At the same time--when it comes to the individual and privacy--the collection of data at each step-up appears to be less intrusive and is easily justified and defended by those doing the collecting.

But it's not.

And the problem faced by those who would protest such data-gathering is that such metadata has been gathered and analyzed for decades, both by the government and private interests. Consider polling data. Once upon a time, polls--the history of which extends back nearly two hundred years with regard to organized third-party efforts and even further with regard to casual ones--were simple things. And of course, due to their limited nature, they could easily be very, very wrong (see "Alf Landon"). As the science of polling improved, however, so did the analysis of polling data. Today, polling data--meta polling data, actually--can be used to tell us things about specific subsets of the population far beyond the scope of the actual questions in various polls. And we as a nation have embraced, or at least accepted, this trend. Indeed, the media as a whole uses metadata from polls on a near-daily basis as background for news stories and opinion pieces, as well as to better orient their own business models.

Things don't end there, however. Companies with large numbers of customers use metadata--gleaned from their own activities with those customers--too. Health insurance companies are especially likely to go down this road. And the U.S. government agencies dealing with healthcare issues, even before the advent of Obamacare, are no different. Neither is the IRS nor the SSA. The targeting of right-wing orgs by word choice was, after all, an exercise in the use of metadata.

The amount of data held by the government on individual citizens is staggering. Yet, the analysis of such data has always been constrained, not because of rules or laws, but because of limitations due to methodology. But limitations no more! The world has changed in the past thirty or forty years. Digital storage--for all kinds of information--means ease of analysis, even with regard to data from generally unrelated sources.

There is a legitimate issue of ownership here: metadata is created as a matter of course from everyday activities between people, companies, and/or government agencies. Looking at polling data as an example, once I answer a pollster's questions, I don't have much to say about how that data--and the corresponding metadata--is used. Things like my location may even be a matter of public record or easily ascertained by something as simple as an area code or zip code, while other background data--if freely given--is freely usable. But what about data--or metadata--collected via a nebulous agreement between two entities then handed over to a third on demand (though without the consent of one of the first two)? That's what we are talking about here.

The government's own collection of data by agencies like the IRS--and of course the Census Bureau--produces data and metadata because of process alone. The issue of how this data should be used can be a thorny one, as the IRS targeting scandal makes quite clear. But...in general the existence in government hands of such data and metadata is a given. Analysis--when possible--is something of a foregone conclusion. Hence, there is a system of checks and balances within government. This system has proven to have severe limitations, due to the process of bureaucratization, wherein bureaucracies act outside of the control of elected officials. Still, there remain avenues to control such behavior even now (though some--mostly on the left, but not wholly--seem to want to limit these avenues).

Data demanded by the government from a third party for the purposes of data or metadata analysis is a whole different ball game. The courts have rightly limited the scope of such demands when it comes to basic--and therefore personal--data. Warrants are supposed to be based on probable and just cause, with reference to specific individuals or organizations, at the very least. Metadata, by its very nature, cannot be collected under such warrants. Instead, the target becomes the person or company already in possession of the metadata (like Verizon). But this is problematic too, since the target is not being alleged to have done something wrong or even abetted in that regard. Thus, there is no true target in such a warrant. It's a very clear example of a fishing expedition, something the Fourth Amendment is usually seen to prohibit. The Fourth in full (my boldface):
The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no warrants shall issue, but upon probable cause, supported by oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.
The legal argument disallowing warrants of the type used to gather metadata is a simple one to make: there is no probable cause with regard to the "where" of the search (i.e. Verizon's records). Moreover, since the "what" of the search, though in the possession of Verizon, is not clearly owned wholly by Verizon, it can be argued that a separate warrant--with a valid probable cause--would be needed for each data point. After all, if I am a Verizon customer, Verizon may possess a record of all of my phone calls, but theoretically so might I. Such records--with respect to myself and Verizon--are jointly owned; Verizon does not share them with other subscribers but will do so with me.

There are counters to these arguments, running the gamut from the data being so non-specific as to be completely non-intrusive (Feinstein's argument) to this kind of data-mining being an issue of national security (the FISA court's rationale for allowing the warrants), but note that none of these counterarguments is actually grounded in the Constitution. All make assumptions about what is permissible; all are about exceptions, not standards.

And let's be clear: there are exceptions to constitutional issues. Always. The question is, is this reflective of one? I would very much like to answer that with an emphatic "no," but the problem is that--again--we have tacitly accepted the collection and usage of metadata nearly across the board in our daily lives. As much as I might not want the government getting its hands on this particular batch of metadata--phone records and the like--it actually bothers me a great deal more that other government agencies already have loads of data on me and my family with regard to health, income, and the like--as they do on every American--and that such data is routinely used to produce metadata as well.

To drive this point home, consider this wonderful essay by Kieran Healy, "Using Metadata to Find Paul Revere." Professor Healy bases his analysis on nothing more than group membership; there is no usage of actual activities, words spoken, or articles written by various people in the period, just their avowed membership in various organizations/groups (info that--assuming these groups applied for tax-exempt status--the IRS might be expected to have these days). The final "person by person" table (Healy dealt with 254 distinct personages) yields a central figure in the rebellion and, amazingly, that figure is none other than Paul Revere:
Once again, I remind you that I know nothing of Mr Revere, or his conversations, or his habits or beliefs, his writings (if he has any) or his personal life. All I know is this bit of metadata, based on membership in some organizations. And yet my analytical engine, on the basis of absolutely the most elementary of operations in Social Networke Analysis, seems to have picked him out of our 254 names as being of unusual interest. We do not have to stop here, with just a picture. Now that we have used our simple “Person by Event” table to generate a “Person by Person” matrix, we can do things like calculate centrality scores, or figure out whether there are cliques, or investigate other patterns.
Now, some might view this as a testament to the power of metadata. And it is. But take it in context with Feinstein's claims that there was nothing to worry about here. She may be right; with regard to the phone data, there may be sufficient safeguards in place. So what? Healy didn't need a single bit of such data to identify Revere as a potential leader of the rebellion in 1772. And the government already has access to that same kind of data--metadata--now; it doesn't need a warrant, and there is little standing in the way of its using such data for all sorts of purposes, some of them none too good.
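For the curious, the kind of operation Healy describes can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not Healy's code or data: the people, the clubs, and the function names are all hypothetical, and the score used is a simple count of shared memberships rather than any particular centrality measure from his essay. It turns a membership table into a "person by person" matrix and then ranks people by how many ties they have.

```python
# Hypothetical membership table: who belongs to which group.
membership = {
    "Person A": {"Club 1", "Club 2", "Club 3"},
    "Person B": {"Club 1"},
    "Person C": {"Club 2"},
    "Person D": {"Club 3"},
    "Person E": {"Club 2", "Club 3"},
}


def co_membership(table):
    """Person-by-person matrix: number of groups each pair has in common."""
    people = sorted(table)
    return {
        p: {q: len(table[p] & table[q]) for q in people if q != p}
        for p in people
    }


def tie_strength(matrix):
    """Crude centrality score: total shared memberships with everyone else."""
    return {person: sum(row.values()) for person, row in matrix.items()}


matrix = co_membership(membership)
scores = tie_strength(matrix)
most_central = max(scores, key=scores.get)
```

With this made-up table, `most_central` comes out as "Person A", who bridges all three clubs, even though the code knows nothing about what anyone said or did.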

And we've walked ourselves down this road, for the most part.

Cheers, all.