Twenty-First Century Wiretapping: Risk of Abuse

Today I’m returning, probably for the last time, to the public policy questions surrounding today’s wiretapping technology. Thus far in the series (1, 2, 3, 4, 5, 6, 7, 8) I have described how technology enables wiretapping based on automated recognition of certain features of a message (rather than individualized suspicion of a person), I have laid out the argument in favor of allowing such content-triggered wiretaps given a suitable warrant, and I have addressed some arguments against allowing them. These counterarguments, I think, show that content-triggered wiretaps must be used carefully and with suitable oversight, but they do not justify forgoing such wiretaps entirely.

The best argument against content-triggered wiretaps is the risk of abuse. By “abuse” I mean the use of wiretaps, or information gleaned from wiretaps, illegally or for the wrong reasons. Any wiretapping regime is subject to some kind of abuse – even if we ban all wiretapping by the authorities, they could still wiretap illegally. So the risk of abuse is not a new problem in the high-tech world.

But it is a worse problem than it was before. The reason is that to carry out content-triggered wiretaps, we have to build an infrastructure that makes all communications available to devices managed by the authorities. This infrastructure enables new kinds of abuse, for example the use of content-based triggers to detect political dissent or, given enough storage space, the recording of every communication for later (mis)use.

Such serious abuses are not likely, but given the harm they could do, even a tiny chance that they could occur must be taken seriously. The infrastructure of content-triggered wiretaps is the infrastructure of a police state. We don’t live in a police state, but we should worry about building police state infrastructure. To make matters worse, I don’t see any technological way to limit such a system to justified uses. Our only real protections would be oversight and the threat of legal sanctions against abusers.

To sum up, the problem with content-triggered wiretaps is not that they are bad policy by themselves. The problem is that doing them requires some very dangerous infrastructure.

Given this, I think the burden should be on the advocates of content-triggered wiretaps to demonstrate that they are worth the risk. I won’t be convinced by hypotheticals, even vaguely plausible ones. I won’t be convinced, either, by vague hindsight claims that such wiretaps coulda-woulda-shoulda captured some specific bad guy. I’m willing to be convinced, but you’ll have to show me some evidence.

Twenty-First Century Wiretapping: False Positives

Lately I’ve been writing about the policy issues surrounding government wiretapping programs that algorithmically analyze large amounts of communication data to identify messages to be shown to human analysts. (Past posts in the series: 1; 2; 3; 4; 5; 6; 7.) One of the most frequent arguments against such programs is that there will be too many false positives – too many innocent conversations misidentified as suspicious.

Suppose we have an algorithm that looks at a set of intercepted messages and classifies each message as either suspicious or innocuous. Let’s assume that every message has a true state that is either criminal (i.e., actually part of a criminal or terrorist conspiracy) or innocent. The problem is that the true state is not known. A perfect, but unattainable, classifier would label a message as suspicious if and only if it was criminal. In practice a classifier will make false positive errors (mistakenly classifying an innocent message as suspicious) and false negative errors (mistakenly classifying a criminal message as innocuous).

To illustrate the false positive problem, let’s do an example. Suppose we intercept a million messages, of which ten are criminal. And suppose that the classifier correctly labels 99.9% of the innocent messages. This means that 1000 innocent messages (0.1% of one million) will be misclassified as suspicious. All told, there will be 1010 suspicious messages, of which only ten – about 1% – will actually be criminal. The vast majority of messages labeled as suspicious will actually be innocent. And if the classifier is less accurate on innocent messages, the imbalance will be even more extreme.
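For readers who want to check the arithmetic, here is a tiny Python sketch of the same calculation. The figures are the hypothetical ones from the example above (a million intercepts, ten criminal messages, 99.9% accuracy on innocent messages), not real data.

```python
# Hypothetical figures from the example above -- not real data.
total_messages = 1_000_000
criminal = 10
innocent = total_messages - criminal

specificity = 0.999                      # fraction of innocent messages labeled innocuous
false_positives = innocent * (1 - specificity)

# Optimistically assume every criminal message gets flagged.
flagged = false_positives + criminal
precision = criminal / flagged

print(f"messages flagged as suspicious: {flagged:,.0f}")
print(f"of those, actually criminal:    {criminal} ({precision:.1%})")
```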

This argument has some power, but I don’t think it’s fatal to the idea of algorithmically classifying intercepts. I say this for three reasons.

First, even if the majority of labeled-as-suspicious messages are innocent, this doesn’t necessarily mean that listening to those messages is unjustified. Letting the police listen to, say, ten innocent conversations is a good tradeoff if the eleventh conversation is a criminal one whose interception can stop a serious crime. (I’m assuming that the ten innocent conversations are chosen by some known, well-intentioned algorithmic process, rather than being chosen by potentially corrupt government agents.) This only goes so far, of course – if there are too many innocent conversations or the crime is not very serious, then this type of wiretapping will not be justified. My point is merely that it’s not enough to argue that most of the labeled-as-suspicious messages will be innocent.

Second, we can learn by experience what the false positive rate is. By monitoring the operation of the system, we can learn how many messages are labeled as suspicious and how many of those are actually innocent. If there is a warrant for the wiretapping (as I have argued there should be), the warrant can require this sort of monitoring, and can require the wiretapping to be stopped or narrowed if the false positive rate is too high.

Third, classification algorithms have (or can be made to have) an adjustable sensitivity setting. Think of it as a control knob that can be moved continuously between two extremes, where one extreme is labeled “avoid false positives” and the other is labeled “avoid false negatives”. Adjusting the knob trades off one kind of error for the other.

We can always make the false positive rate as low as we like, by turning the knob far enough toward “avoid false positives”. Doing this has a price, because turning the knob in that direction also increases the number of false negatives, that is, it causes some criminal messages to be missed. If we turn the knob all the way to the “avoid false positives” end, then there will be no false positives at all, but there might be many false negatives. Indeed, we might find that when the knob is turned to that end, all messages, whether criminal or not, are classified as innocuous.
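Here is a toy Python sketch of the knob. The “classifier” is just a set of simulated suspicion scores, so none of the numbers mean anything in themselves; the point is only to show how sliding a single threshold trades false positives for false negatives.

```python
import random

random.seed(0)

# Simulated suspicion scores: innocent messages score low on average,
# criminal messages score high, with plenty of overlap. (Made-up data.)
innocent_scores = [random.gauss(0.3, 0.15) for _ in range(100_000)]
criminal_scores = [random.gauss(0.7, 0.15) for _ in range(10)]

def error_counts(threshold):
    """The 'knob' is just the threshold above which a message is flagged."""
    false_positives = sum(s >= threshold for s in innocent_scores)
    false_negatives = sum(s < threshold for s in criminal_scores)
    return false_positives, false_negatives

for threshold in (0.5, 0.7, 0.9, 1.1):
    fp, fn = error_counts(threshold)
    print(f"threshold {threshold:.1f}: {fp:5d} false positives, "
          f"{fn}/10 criminal messages missed")
```

Moving the threshold upward drives the false-positive count toward zero, but the number of missed criminal messages rises with it – exactly the tradeoff described above.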

So the question is not whether we can reduce false positives – we know we can do that – but whether there is anywhere we can set the knob that gives us an acceptably low false positive rate yet still manages to flag some messages that are criminal.

Whether there is an acceptable setting depends on the details of the classification algorithm. If you forced me to guess, I’d say that for algorithms based on today’s voice recognition or speech transcription technology, there probably isn’t an acceptable setting – to catch any appreciable number of criminal conversations, we’d have to accept huge numbers of false positives. But I’m not certain of that result, and it could change as the algorithms get better.

The most important thing to say about this is that it’s an empirical question, which means that it’s possible to gather evidence to learn whether a particular algorithm offers an acceptable tradeoff. For example, if we had a candidate classification algorithm, we could run it on a large number of real-world messages and, without recording any of those messages, simply count how many messages the algorithm would have labeled as suspicious. If that number were huge, we would know we had a false positive problem. We could do this for different settings of the knob, to see where we had to set the knob to get an acceptable false positive rate. Then we could apply the algorithm with that knob setting to a predetermined set of known-to-be-criminal messages, to see how many it flagged.
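A sketch of what that counting experiment might look like. The suspicion_score argument stands in for whatever candidate classifier is under evaluation; it, and the message stream, are placeholders rather than real systems.

```python
def count_flags(messages, suspicion_score, thresholds):
    """Count how many messages would be flagged at each knob setting.

    Nothing is stored: each message is scored and then discarded, so the
    experiment yields only aggregate counts, not recordings.
    """
    counts = {t: 0 for t in thresholds}
    for message in messages:
        score = suspicion_score(message)     # classify...
        for t in thresholds:
            if score >= t:
                counts[t] += 1               # ...count, and move on
    return counts

# Hypothetical usage: run the candidate classifier over a day of real
# traffic at several knob settings, then inspect the counts.
#   counts = count_flags(live_intercepts, candidate_classifier,
#                        thresholds=[0.5, 0.7, 0.9])
# A huge count at every setting signals a false positive problem before
# any recording has been authorized.
```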

If governments are using algorithmic classifiers – and the U.S. government may be doing so – then they can do these types of experiments. Perhaps they have. It doesn’t seem too much to ask for them to report on their false positive rates.

Twenty-First Century Wiretapping: Reconciling with the Law

When the NSA’s wiretapping program first came to light, the White House said, mysteriously, that they didn’t get warrants for all of their wiretaps because doing so would have been impractical. Some people dismissed that as empty rhetoric. But for the rest of us, it was a useful hint about how the program worked, implying that the wiretapping was triggered by the characteristics of a call (or its contents) rather than following individuals who were specifically suspected of being terrorists.

As I wrote previously, content-based triggering is a relatively recent phenomenon, having become practical only with the arrival of the digital revolution. Our laws about search, seizure, and wiretapping mostly assume the pre-digital world, so they don’t do much to address the possibility of content-based triggering. The Fourth Amendment, for example, says that search warrants must “particularly describ[e] the place to be searched, and the persons or things to be seized.” Wiretapping statutes similarly assume wiretaps are aimed at identified individuals.

So when the NSA and the White House wanted to do searches with content-based triggering, there was no way to get a warrant that would allow them to do so. That left them with two choices: kill the program, or proceed without warrants. They chose the latter, and they now argue that warrants aren’t legally necessary. I don’t know whether their legal arguments hold water (legal experts are mostly skeptical) but I know it would be better if there were a statute that specifically addressed this situation.

The model, procedurally at least, would follow the Foreign Intelligence Surveillance Act (FISA). In FISA, Congress established criteria under which U.S. intelligence agencies could wiretap suspected spies and terrorists. FISA requires agencies to get warrants for such wiretaps, by applying to a special secret court, in a process designed to balance national security against personal privacy. There are also limited exceptions; for example, there is more leeway to wiretap in the first days of a war. Whether or not you like the balance point Congress chose in FISA, you’ll agree, I hope, that it’s good for the legislature to debate these tradeoffs, to establish a general policy, rather than leaving everything at the discretion of the executive branch.

If it took up this issue, Congress might decide to declare that content-based triggering is never acceptable. More likely, it would establish a set of rules and principles to govern wiretaps that use content-based triggering. Presumably, the new statute would establish a new kind of warrant, perhaps granted by the existing FISA court, and would say what justification needed to be submitted to the court, and what reporting needed to be done after a warrant was granted. Making these choices wisely would mitigate some of the difficulties with content-based triggering.

Just as important, it would create a constructive replacement for the arguments over the legality of the current NSA program. Today, those arguments are often shouting matches between those who say the program is far outside the law, and those who say that the law is outdated and is blocking necessary and reasonable intelligence-gathering. A debate in Congress, and among citizens, can help to break this rhetorical stalemate, and can re-establish the checks and balances that keep government’s power vital but limited.

Twenty-First Century Wiretapping: Content-Based Suspicion

Yesterday I argued that allowing police to record all communications that are flagged by some automated algorithm might be reasonable, if the algorithm is being used to recognize the voice of a person believed (for good reason) to be a criminal. My argument, in part, was that that kind of wiretapping would still be consistent with the principle of individualized suspicion, which says that we shouldn’t wiretap someone unless we have strong enough reason to suspect them, personally, of criminality.

Today, I want to argue that there are cases where even individualized suspicion isn’t necessary. I’ll do so by introducing yet another hypothetical.

Suppose we have reliable intelligence that al Qaeda operatives have been instructed to use a particular verbal handshake to identify each other. Operatives will prove they were members of al Qaeda by carrying out some predetermined dialog that is extremely unlikely to occur naturally. Like this, for instance:

First Speaker: The Pirates will win the World Series this year.
Second Speaker: Yes, and Da Vinci Code is the best movie ever made.

The police ask us for permission to run automated voice recognition algorithms on all phone conversations, and to record all conversations that contain this verbal handshake. Is it reasonable to give permission?

If the voice recognition is sufficiently accurate, this could be reasonable – even though the wiretapping is not based on advance suspicion of any particular individual. Suspicion is based not on the identity of the individuals speaking, but on the content of the communication. (You could try arguing that the content causes individualized suspicion, at the moment it is analyzed, but if you go that route the individualized suspicion principle doesn’t mean much anymore.)

Obviously we wouldn’t give the police carte blanche to use any kind of content-based suspicion whenever they wanted. What makes this hypothetical different is that the suspicion, though content-based, is narrowly aimed and is based on specific evidence. We have good reason to believe that we’ll be capturing some criminal conversations, and that we won’t be capturing many noncriminal ones. This, I think, is the general principle: intercepted communications may only be made known to a human based on narrowly defined triggers (whether individual-based or content-based), and those triggers must be justified based on specific evidence that they will be fruitful but not overbroad.
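To make the hypothetical concrete, here is a minimal sketch of such a narrowly defined trigger, assuming the calls have already been transcribed; the speech-recognition step, which is where the real difficulty and error lie, is glossed over.

```python
import re

# The two halves of the hypothetical verbal handshake from the example.
HANDSHAKE = (
    re.compile(r"pirates will win the world series", re.IGNORECASE),
    re.compile(r"da vinci code is the best movie", re.IGNORECASE),
)

def handshake_present(transcript: str) -> bool:
    """Flag a conversation only if both halves of the handshake appear."""
    return all(pattern.search(transcript) for pattern in HANDSHAKE)

# Only conversations matching this one specific, pre-justified trigger are
# passed to a human analyst; everything else is discarded unexamined.
```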

You might argue that if the individualized suspicion principle has been good enough for the past [insert large number] years, it should be good enough for the future too. But I think this argument misses an important consequence of changing technology.

Back before the digital revolution, there were only two choices: give the police narrow warrants to search or wiretap specific individuals or lines, or give the police broad discretion to decide whom to search or wiretap. Broad discretion was problematic because the police might search too many people, or might search people for the wrong reasons. Content-based triggering, where a person got to overhear the conversation only if its content satisfied specific trigger rules, was not possible, because the only way to tell whether the trigger was satisfied was to have a person listen to the conversation. And there was no way to unlisten to that conversation if the trigger wasn’t present. Technology raises the possibility that automated algorithms can implement triggering rules, so that content-based triggers become possible – in theory at least.

Given that content-based triggering was infeasible in the past, the fact that traditional rules don’t make provision for it does not, in itself, end the argument. This is the kind of situation that needs to be evaluated anew, with proper respect for traditional principles, but also with an open mind about how those principles might apply to our changed circumstances.

By now I’ve convinced you, I hope, that there is a plausible argument in favor of allowing government to wiretap based on content-based triggers. There are also plausible arguments against. The strongest ones, I think, are (1) that content-based triggers are inconsistent with the current legal framework, (2) that content-based triggers will necessarily make too many false-positive errors and thereby capture too many innocent conversations, and (3) that the infrastructure required to implement content-based triggers creates too great a risk of abuse. I’ll wrap up this series with three more posts, discussing each of these arguments in turn.

Twenty-First Century Wiretapping: Recognition

For the past several weeks I’ve been writing, on and off, about how technology enables new types of wiretapping, and how public policy should cope with those changes. Having laid the groundwork (1; 2; 3; 4; 5), we’re now ready to bite into the most interesting question. Suppose the government is running, on every communication, some algorithm that classifies messages as suspicious or not, and that every conversation labeled suspicious is played for a government agent. When, if ever, is government justified in using such a scheme?

Many readers will say the answer is obviously “never”. Today I want to argue that that is wrong – that there are situations where automated flagging of messages for human analysis can be justified.

A standard objection to this kind of algorithmic triggering is that authority to search or wiretap must be based on individualized suspicion, that is, that there must be sufficient cause to believe that a specific individual is involved in illegal activity, before that individual can be wiretapped. To the extent that that is an assertion about current U.S. law, it doesn’t answer my question – recall that I’m writing here about what the legal rules should be, not what they are. Any requirement of individualized suspicion must be justified on the merits. I understand the argument for it on the merits. All I’m saying is that that argument doesn’t win by default.

One reason it shouldn’t win by default is that individualized suspicion is sometimes consistent with algorithmic recognition. Suppose that we have strong cause to believe that Mr. A is planning to commit a terrorist attack or some other serious crime. This would justify tapping Mr. A’s phone. And suppose we know Mr. A is visiting Chicago but we don’t know exactly where in the city he is, and we expect him to make calls on random hotel phones, pay phones, and throwaway cell phones. Suppose further that the police have good audio recordings of Mr. A’s voice.

The police propose to run automated voice recognition software on all phone calls in the Chicago area. When the software flags a recording as containing Mr. A’s voice, that recording will be played for a police analyst, and if the analyst confirms the voice as Mr. A’s, the call will be recorded. The police ask us, as arbiters of the public good, for clearance to do this.
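In code, the flagging logic the police are proposing might look roughly like the sketch below. The voice_match_score function is a stand-in for a speaker-verification routine that scores how closely a call’s audio matches the recordings of Mr. A’s voice; the function and the threshold are assumptions for illustration, not a real system.

```python
MATCH_THRESHOLD = 0.9   # set high to keep false matches rare (assumed value)

def flag_matching_calls(calls, reference_recordings, voice_match_score):
    """Yield only the calls the algorithm attributes to Mr. A.

    Calls that do not match the voiceprint are never played for an analyst
    and never recorded; flagged calls go to a human for confirmation.
    """
    for call in calls:
        score = voice_match_score(call.audio, reference_recordings)
        if score >= MATCH_THRESHOLD:
            yield call
```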

If we knew that the voice recognition algorithm would be 100% accurate, then it would be hard to object to this. Using an automated algorithm would be more consistent with the principle of individualized suspicion than would be the traditional approach of tapping Mr. A’s home phone. His home phone, after all, might be used by an innocent family member or roommate, or by a plumber working in his house.

But of course voice recognition is not 100% accurate. It will miss some of Mr. A’s calls, and it will incorrectly flag some calls by others. How serious a problem is this? It depends on how many errors the algorithm makes. The traditional approach sometimes records innocent people – others might use Mr. A’s phone, or Mr. A might turn out to be innocent after all – and these errors make us cautious about wiretapping but don’t preclude wiretapping if our suspicion of Mr. A is strong enough. The same principle ought to hold for automated voice recognition. We should be willing to accept some modest number of errors, but if errors are more frequent we ought to require a very strong argument that recording Mr. A’s phone calls is of critical importance.

In practice, we would want to set out crisply defined criteria for making these determinations, but we don’t need to do that exercise here. It’s enough to observe that given sufficiently accurate voice recognition technology – which might exist some day – algorithmically triggered recording can be (a) justified, and (b) consistent with the principle of individualized suspicion.

But can algorithmic triggering be justified, even if not based on individualized suspicion? I’ll argue next time that it can.