The NSA’s Unreliable Substitute for Real Investigation

Published September 1, 2006

Last December, the Washington Post reported that the National Security Agency (NSA) had been conducting warrantless eavesdropping on Americans who make overseas calls. In May, USA Today described another secret NSA program, this one aimed at building a government database containing the source and destination of every domestic phone call in America.

Questions have been raised about the legality of these programs, and their defenders have argued that rules against warrantless surveillance should be relaxed to explicitly permit mass automated monitoring of electronic communications.

That would be a mistake. There is reason to doubt that such open-ended electronic surveillance would be a useful tool in the fight against terrorism. And there is good reason to worry that loosening warrant requirements would effectively eviscerate judicial supervision of surveillance activities, dealing a serious blow to our constitutional rights.

Investigative Techniques

Details about the NSA programs are sparse, but it’s widely believed the NSA uses techniques known as data mining and predictive analytics to scan the domestic calling records it collects.

In layman’s terms, data mining uses special computer programs to sift through bits of data in records, documents, and spreadsheets and assemble them, like pieces of a big jigsaw puzzle, into an understandable picture.

Data mining and predictive analytics are increasingly used in manufacturing industries to gain insight into the quality and performance of products and to predict potential defects before products enter the distribution chain. Credit card companies and banks use the same techniques to detect fraud.

The government maintains that data mining is a mature technology dependable enough for use in anti-terror investigations. This is somewhat misleading. While today’s data mining techniques are quite sophisticated, the accuracy of what they can reveal also depends on the quantity of relevant raw data available, and in particular on how many documented examples exist of the pattern being sought.

For example, data mining techniques in the credit card industry work well because there are enough documented examples of fraudulent use that the software can “learn” to identify the most common patterns. To accurately predict a component failure, manufacturers mine thousands of warranty reports and repair records.

Not Enough Pieces

In the NSA’s data mining puzzle, however, there just aren’t enough pieces to create a comprehensible picture.

The difficulty of using data mining techniques to predict terrorist acts in the United States is that there have been (thankfully) far too few terrorist attacks to establish useful terrorist profiles. The behaviors of Oklahoma City bomber Timothy McVeigh, 1996 Olympics bomber Eric Rudolph, and Unabomber Theodore Kaczynski were all strikingly different. It’s unlikely that software trained on data from those attacks would have detected the September 11 hijackers.

The agency’s international eavesdropping program faces similarly daunting challenges. The NSA doubtless has technology to automatically create transcripts of phone conversations, but the artificial intelligence technologies necessary to accurately predict who is a terrorist are still a very long way off. Simply searching for suspicious keywords or phrases won’t solve the problem, because innocent people frequently use words such as “bomb,” “hijack,” “airplane,” or “Osama.”
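To make the keyword problem concrete, here is a minimal Python sketch of a naive keyword filter. The watch-list and the sample transcripts are hypothetical, invented purely for illustration; how the NSA actually screens calls is not publicly known, and this is not meant to describe it.

```python
# Hypothetical watch-list built from the example words in the text above.
SUSPICIOUS_KEYWORDS = {"bomb", "hijack", "airplane", "osama"}


def flag_transcript(text: str) -> set[str]:
    """Return the watch-list words found in a transcript (case-insensitive).

    Each whitespace-separated token has surrounding punctuation stripped
    before being compared against the watch-list.
    """
    words = {w.strip(".,!?;:'\"").lower() for w in text.split()}
    return SUSPICIOUS_KEYWORDS & words


# Hypothetical, entirely innocent snippets of conversation.
transcripts = [
    "That movie was a total bomb, don't bother seeing it.",
    "My airplane lands at 6, can you pick me up?",
    "Did you see the news about Osama tonight?",
    "See you at dinner on Friday.",
]

for t in transcripts:
    hits = flag_transcript(t)
    print("FLAGGED" if hits else "ok", sorted(hits), t)
```

Three of the four harmless sentences are flagged. A real system would be more elaborate than a bare word match, but any filter keyed to vocabulary inherits the same weakness: the words themselves carry little information about intent.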

When applied to a nation of nearly 300 million people, both programs are likely to be crippled by the problem of false positives.

False Positives

Suppose there are 100 terrorists residing in the United States. Suppose the NSA has software that correctly identifies non-terrorists 99.99 percent of the time. That means it misidentifies an innocent individual as a terrorist one out of 10,000 times.

Applied to a population of roughly 300 million, even that small error rate would flag about 30,000 innocent people as suspects in the search for those 100 true terrorists. For a credit card company, a list of 30,000 possibly stolen cards is manageable, as it requires only a phone call to verify whether the purchases in question are legitimate. But investigating suspected terrorists is an extremely invasive and labor-intensive process. Even if the FBI had the manpower to do so, eavesdropping on tens of thousands of innocent Americans would raise troubling constitutional issues.
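The base-rate arithmetic in the last two paragraphs can be checked with a short Python sketch. It uses the article's round figure of 300 million people and its hypothetical 100 terrorists and 99.99 percent accuracy on innocents. It also assumes, generously, that the software catches every one of the 100 terrorists; the article does not specify a detection rate, so `sensitivity` is an assumption exposed as a parameter.

```python
def expected_flags(
    population: int,
    terrorists: int,
    specificity: float,
    sensitivity: float = 1.0,  # assumed: every real terrorist is caught
) -> tuple[float, float, float]:
    """Expected false positives, true positives, and precision of a screen.

    specificity: fraction of innocent people correctly cleared.
    sensitivity: fraction of real terrorists correctly flagged.
    """
    innocents = population - terrorists
    false_positives = innocents * (1 - specificity)
    true_positives = terrorists * sensitivity
    # Precision: of everyone flagged, the fraction who are real terrorists.
    precision = true_positives / (true_positives + false_positives)
    return false_positives, true_positives, precision


fp, tp, precision = expected_flags(300_000_000, 100, 0.9999)
print(f"false positives: {fp:,.0f}")
print(f"true positives:  {tp:,.0f}")
print(f"share of flagged people who are terrorists: {precision:.2%}")
```

With these inputs the sketch reports about 30,000 false positives against 100 true positives, so only about one flagged person in 300 (roughly 0.33 percent) would actually be a terrorist. Improving accuracy on innocents helps, but even at 99.999 percent the list would still contain about 3,000 innocent people.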

An even bigger issue with mass surveillance by software is the way it would transform the principle of judicial oversight. Under current law, law enforcement officials must request a warrant from a judge for each suspect they wish to monitor. The judge examines the evidence against each suspect individually and grants a warrant only if he or she finds probable cause to believe the suspect is involved in criminal activity or, in national-security cases, is acting as an agent of a foreign power.

Automated surveillance, however, would involve a computer program monitoring tens of millions of individuals with no judicial oversight at all. Even more troubling, after the software had produced its list of suspects, the judge would be asked to approve human surveillance of everyone on that list, even though, by the arithmetic above, the great majority of them would be innocent.

Constitutional rights depend on bright lines, so that judges are not forced to make arbitrary judgment calls about when someone’s rights have been violated. But such bright lines would be extremely difficult to draw once the traditional “probable cause” standard has been abandoned.

Expanding automated surveillance would put a dangerous amount of power in the hands of law enforcement officials. Judges, who are rarely computer experts, would have to defer to investigators and the technicians who operate their computer systems. Given that law enforcement agencies have a long history of pushing for expanded power, the abandonment of judicial oversight would be an ominous development.

Ultimately, there are probably few shortcuts to fighting terrorism. Although technology has many uses in law enforcement, the bulk of the work is likely to continue to be decidedly low-tech: Start with a known suspect, get a warrant to tap his or her communications, and use the evidence gathered to identify additional suspects.

That may not be as glamorous as having a computer automatically generate lists of suspects, but it’s likely to generate more reliable information. More importantly, such old-fashioned investigative techniques don’t require us to abandon the constitutional protections that have safeguarded our rights for more than two centuries.

Timothy B. Lee is a policy analyst at the Show-Me Institute, a non-partisan public policy research organization in St. Louis, and a regular contributor to the Technology Liberation Front Web site.