
RIAA, MPAA Join Internet2 Consortium

RIAA and MPAA, trade associations that include the major U.S. record and movie companies, joined the Internet2 consortium on Friday, according to a joint press release. I've heard some alarm about this, suggesting that this will allow the AAs to control how the next generation Internet is built. But once we strip away the hype, there's not much to worry about in this announcement.

Despite its grand name, Internet2 is not a new network. Its main purpose has been to add some fast links to today's Internet, to connect bandwidth-hungry universities, e.g., so that researchers at one university can explore the results of climate simulations done at a peer university. The Internet2 links carry traffic of all sorts and they use the same protocols as the rest of the Internet.

A lesser function of Internet2 is to host discussions among researchers studying specific topics. It's good when people studying similar problems can talk to each other, as long as one group isn't put in charge of what the other groups do. And as I understand it, the Internet2 discussions are just that – discussions – and not a top-down management structure. So it doesn't look to me like Internet2, as a corporate body, could do much to divert the natural course of research, even if it wanted to.

Finally, Internet2 is not in a position to dictate what technology gets deployed in the future Internet. Internet2 may give birth to ideas that are then adopted by the industry; but those ideas will only be deployed if market pressures drive the industry to build them. If the AAs think that they can sit down with Internet2 and negotiate the future of the Internet, they're sadly mistaken. But I very much doubt that that's what they think.

So why are the AAs joining Internet2? My guess is that they joined for mostly the same reasons that other non-IT-industry corporate members did. Why did Johnson and Johnson join? Why did Ford join? Because their business strategies depend on the future of high-performance networks. The same is true of the record and movie companies. Their business models will one day center on online, digital distribution of content. It's best for them, and probably for everybody else too, if they face that future squarely, right away. I hope their presence in Internet2 will help them see what is coming, and figure out how to adapt to it.

Acoustic Snooping on Typed Information

Li Zhuang, Feng Zhou, and Doug Tygar have an interesting new paper showing that if you have an audio recording of somebody typing on an ordinary computer keyboard for fifteen minutes or so, you can figure out everything they typed. The idea is that different keys tend to make slightly different sounds, and although you don't know in advance which keys make which sounds, you can use machine learning to figure that out, assuming that the person is mostly typing English text. (Presumably it would work for other languages too.)

Asonov and Agrawal had a similar result previously, but they had to assume (unrealistically) that you started out with a recording of the person typing a known training text on the target keyboard. The new method eliminates that requirement, and so appears to be viable in practice.

The algorithm works in three basic stages. First, it isolates the sound of each individual keystroke. Second, it takes all of the recorded keystrokes and puts them into about fifty categories, where the keystrokes within each category sound very similar. Third, it uses fancy machine learning methods to recover the sequence of characters typed, under the assumption that the sequence has the statistical characteristics of English text.

The third stage is the hardest one. You start out with the keystrokes put into categories, so that the sequence of keystrokes has been reduced to a sequence of category-identifiers – something like this:

35, 12, 8, 14, 17, 35, 6, 44, ...

(This means that the first keystroke is in category 35, the second is in category 12, and so on. Remember that keystrokes in the same category sound alike.) At this point you assume that each key on the keyboard usually (but not always) generates a particular category, but you don't know which key generates which category. Sometimes two keys will tend to generate the same category, so that you can't tell them apart except by context. And some keystrokes generate a category that doesn't seem to match the character in the original text, because the key happened to sound different that time, or because the categorization algorithm isn't perfect, or because the typist made a mistake and typed a garbbge charaacter.

The only advantage you have is that English text has persistent regularities. For example, the two-letter sequence "th" is much more common than "rq", and the word "the" is much more common than "xprld". This turns out to be enough for modern machine learning methods to do the job, despite the difficulties I described in the previous paragraph. The recovered text gets about 95% of the characters right, and about 90% of the words. It's quite readable.
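
One way to picture the inference step is as hidden-Markov-model decoding: characters are hidden states, category-identifiers are observations, and a language model supplies the transition probabilities. Here is a toy sketch of that idea (this is my illustration, not the paper's actual algorithm; the three-character alphabet and all probabilities are made up):

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden character sequence for a category-ID sequence."""
    # V[t][s]: best log-probability of any path ending in state s at time t
    V = [{s: math.log(start_p[s] * emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for o in obs[1:]:
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[-2][p] + math.log(trans_p[p][s]))
            V[-1][s] = V[-2][prev] + math.log(trans_p[prev][s] * emit_p[s][o])
            back[-1][s] = prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Made-up toy model: three characters, three sound categories. Each key
# usually (80% of the time) lands in "its" category, but not always.
states = ["t", "h", "e"]
start_p = {"t": 0.6, "h": 0.2, "e": 0.2}
trans_p = {  # crude stand-in for English bigram statistics: t->h->e is likely
    "t": {"t": 0.1, "h": 0.8, "e": 0.1},
    "h": {"t": 0.1, "h": 0.1, "e": 0.8},
    "e": {"t": 0.6, "h": 0.2, "e": 0.2},
}
emit_p = {
    "t": {0: 0.8, 1: 0.1, 2: 0.1},
    "h": {0: 0.1, 1: 0.8, 2: 0.1},
    "e": {0: 0.1, 1: 0.1, 2: 0.8},
}

# Even with a miscategorized middle keystroke (category 0 instead of 1),
# the language model pulls the decoding back to "the".
print(viterbi([0, 0, 2], states, start_p, trans_p, emit_p))  # ['t', 'h', 'e']
```

The point of the toy example is the last line: the observation sequence is "wrong" in the middle, but the bigram statistics override the faulty category, which is exactly the kind of correction that lets the real system tolerate noisy categorization.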

[Exercise for geeky readers: Assume that there is a one-to-one mapping between characters and categories, and that each character in the (unknown) input text is translated infallibly into the corresponding category. Assume also that the input is typical English text. Given the output category-sequence, how would you recover the input text? About how long would the input have to be to make this feasible?]
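
For readers who want a head start on the exercise: under the one-to-one assumption the problem is just a substitution cipher, so one natural first step is to match category frequencies against English letter frequencies, then refine the guess with bigram and dictionary statistics. A minimal sketch of that first step (my illustration, not a full solution):

```python
from collections import Counter

# Approximate English letter frequency order, most to least common.
ENGLISH_FREQ_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def initial_guess(categories):
    """Pair categories with letters by frequency rank. This is only a
    starting point; a real solver would refine it with bigram and
    dictionary checks, which also determines how long the input must be."""
    ranked = [c for c, _ in Counter(categories).most_common()]
    return {c: ENGLISH_FREQ_ORDER[i] for i, c in enumerate(ranked)}

# The most frequent category gets mapped to 'e', the next to 't', and so on.
guess = initial_guess([5, 5, 5, 1, 1, 9])
print(guess)  # {5: 'e', 1: 't', 9: 'a'}
```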

If the user typed a password, that can be recovered too. Although passwords don't have the same statistical properties as ordinary text (unless they're chosen badly), this doesn't pose a problem as long as the password-typing is accompanied by enough English-typing. The algorithm doesn't always recover the exact password, but it can come up with a short list of possible passwords, and the real password is almost always on this list.

This is yet another reminder of how much computer security depends on controlling physical access to the computer. We've always known that anybody who can open up a computer and work on it with tools can control what it does. Results like this new one show that getting close to a machine with sensors (such as microphones, cameras, power monitors) may compromise the machine's secrecy.

There are even some preliminary results showing that computers make slightly different noises depending on what computations they are doing, and that it might be possible to recover encryption keys if you have an audio recording of the computer doing decryption operations.

I think I'll go shut my office door now.

Aussie Judge Tweaks Kazaa Design

A judge in Australia has found Kazaa and associated parties liable for indirect copyright infringement, and has tentatively imposed a partial remedy that requires Kazaa to institute keyword-based filtering.

The liability finding is based on a conclusion that Kazaa improperly "authorized" infringement. This is roughly equivalent to a finding of indirect (i.e. contributory or vicarious) infringement under U.S. law. I'm not an expert in Australian law, so on this point I'll refer you to Kim Weatherall's recap.

As a remedy, the Kazaa parties will have to pay 90% of the copyright owners' trial expenses, and will have to pay damages for infringement, in an amount to be determined by future proceedings. (According to Kim Weatherall, Australian law does not allow the copyright owners to reap automatic statutory damages as in the U.S. Instead, they must prove actual damages, although the damages are boosted somehow for infringements that are "flagrant".)

More interestingly, the judge has ordered Kazaa to change the design of their product, by incorporating keyword-based filtering. Kazaa allows users to search for files corresponding to certain artist names and song titles. The required change would disallow search terms containing certain forbidden patterns.

Designing such a filter is much harder than it sounds, because there are so many artist names and song names. These two namespaces are so crowded that a great many common names given to non-infringing recordings are likely to contain forbidden patterns.

The judge's order uses the example of the band Powderfinger. Presumably the modified version of Kazaa would ban searches with "Powderfinger" as part of the artist name. This is all well and good when the artist name is so distinctive. But what if the artist name is a character string that occurs frequently in names, such as "beck", "smiths", or "x"? (All are names of artists with copyrighted recordings.) Surely there will be false positives.

It's even worse for song names. You would have to ban simple words and phrases, like "Birthday", "Crazy", "Morning", "Sailing", and "Los Angeles", to name just a few. (All are titles of copyrighted recordings.)
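
To see how crowded the namespace makes things, here is a minimal sketch of a substring-style filter; the blocklist and the matching rule are my assumptions for illustration, built from the names mentioned above, not anything from the judge's order:

```python
# Hypothetical blocklist drawn from the artist and song names in the post.
BANNED = ["powderfinger", "beck", "x", "birthday", "crazy"]

def blocked(query: str) -> bool:
    """Naive filter: reject any search containing a banned pattern."""
    q = query.lower()
    return any(term in q for term in BANNED)

# The intended hit...
print(blocked("Powderfinger - My Happiness"))  # True
# ...and false positives on perfectly innocent searches:
print(blocked("relaxation exercises"))         # True (contains "x")
print(blocked("becker lecture notes"))         # True (contains "beck")
print(blocked("grandma's birthday video"))     # True (contains "birthday")
```

Every query containing the letter "x" is swallowed by the one-character artist name alone, which is the over-inclusiveness problem in miniature.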

The judge's order asks the parties to agree on the details of how a filter will work. If they can't agree on the details, the judge will decide. Given the enormous number of artist and song names, and the crowded namespace, there are a great many details to decide, balancing over- and under-inclusiveness. It's hard to see how the parties can agree on all of the details, or how the judge can impose a detailed design. The only hope is to appoint some kind of independent arbiter to make these decisions.

Ultimately, I think the tradeoff between over- and under-inclusiveness will prove too difficult – the filters will either fail to block many infringing files, or will block many non-infringing files, or both.

This is the same kind of filtering that Judge Patel ordered Napster to use, after she found Napster liable for indirect infringement. It didn't work for Napster. Users just changed the spelling of artist and song names, adopting standard misspellings (e.g., "Metallica" changed to "Metalica" or "MetalIGNOREica" or the Pig Latin "Itallicamay"), or encoding the titles somehow. Napster updated its filters to compensate, but was always one step behind. And Napster's job was easier, because the filtering was done on Napster's own computers. Kazaa will have to push updates to users' computers every time it changes its filters.
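
The evasion side is just as easy to sketch. Running the post's own example misspellings against the same kind of naive substring filter (again my illustration, not Napster's or Kazaa's actual code):

```python
BANNED = ["metallica"]

def blocked(query: str) -> bool:
    """Naive filter: reject any search containing a banned pattern."""
    return any(term in query.lower() for term in BANNED)

print(blocked("Metallica - One"))       # True: straight spelling is caught
print(blocked("Metalica - One"))        # False: one dropped letter evades it
print(blocked("MetalIGNOREica - One"))  # False
print(blocked("Itallicamay - One"))     # False
```

Each evasion costs the user nothing, while each counter-move costs the operator a filter update, which is why the operator stays one step behind.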

To the judge's credit, he acknowledges that filtering will be imprecise and might even fail miserably. So he orders only that Kazaa must use filtering, but not that the filtering must succeed in stopping infringement. As long as Kazaa makes its best effort to make the agreed-upon (or ordered) filtering scheme work, it will have satisfied the order, even if infringement goes on.

Kim Weatherall calls the judge's decision "brave", because it wades into technical design and imposes a remedy that requires an ongoing engagement between the parties, two things that courts normally try to avoid. I'm not optimistic about this remedy – it will impose costs on both sides and won't do much to stop infringement. But at least the judge didn't just order Kazaa to stop all infringement, an order with which no general-purpose communication technology could ever hope to comply.

In the end, the redesign may be moot, as the prospect of financial damages may kill Kazaa before the redesign must occur. Kazaa is probably dying anyway, as users switch to newer services. From now on, the purpose of Kazaa, in the words of the classic poster, may be to serve as a warning to others.

Back in the Saddle

Hi, all. I'm back from a lovely vacation, which included a stint camping in Sequoia/Kings Canyon National Park, beyond the reach of Internet technology. In transit, I walked right by Jack Valenti in the LA airport. He looked as healthy as ever, and more relaxed than in his MPAA days.

Blogging will resume tomorrow, once I've dug out sufficiently from the backlog. In the meantime, I recommend reading Kim Weatherall's summary of the Australian judge's decision in the Kazaa case.

Recommended Reading: The Success of Open Source

It's easy to construct arguments that open source software can't succeed. Why would people work for free to make something that they could get paid for? Who will do the dirty work? Who will do tech support? How can customers trust a "vendor" that is so diffuse and loosely organized?

And yet, open source has had some important successes. Apache dominates the market for web server software. Linux and its kin are serious players in the server operating system market. Linux is even a factor in the desktop OS market. How can this be reconciled with what we know about economics and sociology?

Many articles and books have been written about this puzzle. To my mind, Steven Weber's book "The Success of Open Source" is the best. Weber explores the open source puzzle systematically, breaking it down into interesting subquestions and exploring answers. One of the book's virtues is that it doesn't claim to have complete answers; but it does present and dissect partial answers and hints. This is a book that could merit a full book club discussion, if people are interested.


Recommended Reading: Crime-Facilitating Speech

Eugene Volokh has an interesting new paper about Crime-Facilitating Speech (abridged version): "speech [that] provides information that makes it easier to commit crimes, torts, or other harms". He argues convincingly that many free-speech cases pertain to crime-facilitating speech. Somebody wants to prevent speech because it may facilitate crime, but others argue that the speech has beneficial effects too. When should such speech be allowed?

The paper is a long and detailed discussion of these issues, with many examples. In the end, he asserts that crime-facilitating speech should be allowed except where (a) "the speech is said to a few people who the speaker knows are likely to use it to commit a crime or to escape punishment", (b) the speech "has virtually no noncriminal uses", or (c) "the speech facilitates extraordinarily serious harms, such as nuclear or biological attacks". But don't just read the end – if you have time it's well worth the effort to understand how he got there.

What struck me is how many of the examples relate to computer security or copyright enforcement. Many security researchers feel that the applied side of the field has become a legal minefield. Papers like this illustrate how that happened. The paper's recommendations, if followed, would go a long way toward making legitimate research and publication safer.

ICANN Challenged on .xxx Domain

The U.S. government has joined other governments and groups in asking ICANN to delay implementation of a new ".xxx" top-level domain, according to a BBC story.

Adding a .xxx domain would make little difference in web users' experiences. Those who want to find porn can easily find it already; and those who want to avoid it can easily avoid it. It might seem at first that the domain will create more "space" for porn sites. But there's already "space" on the web for any new site, of any type, that somebody wants to create. The issue here is not whether sites can exist, but what they can call themselves.

Adding .xxx won't make much difference in how sites are named, either. I wouldn't be happy to see a porn site at freedom-to-tinker.xxx; nor would the operator of that site be happy to see my site here at freedom-to-tinker.com. The duplication just causes confusion. Your serious profit-oriented porn purveyor will want to own both the .com and .xxx versions of his site's URL; and there's nothing to stop him from owning both.

Note also that the naming system does not provide an easy way for end users to get a catalog of all names that end in a particular suffix. In any case, anybody can build an index of sites that fall into a particular category, and such indices surely exist already for porn sites.

The main effect of adding .xxx would be to let sites signal that they have hard-core content. That's reportedly the reason adult theaters started labeling their content "XXX" in the first place – so customers who wanted such content could learn where to find it.

That kind of signaling is a lousy reason to create a new top-level domain. There are plenty of other ways to signal. For example, sites that wanted to signal their XXX nature could offer their home page at xxx.sitename.com in addition to www.sitename.com. But ICANN has chosen to create .xxx anyway.

Which brings us to the governments' objections. Perhaps they object to .xxx as legitimizing the existence of porn on the net. Or perhaps they object to the creation of a mechanism that will make it easier for people to find porn.

These objections aren't totally frivolous. There's no top-level domain for religious groups, or for science, or for civic associations. Why create one for porn? And surely the private sector can fill the need for porn-signaling technology. Why is ICANN doing this? (Governments haven't objected to ICANN's decisions before, even though those decisions often made no more sense than this decision does. But that doesn't mean ICANN is managing the namespace well.)

And so ICANN's seemingly arbitrary management of the naming system brings it into conflict with governments. This is a sticky situation for ICANN. ICANN is nominally in charge of Internet naming, but ICANN's legitimacy as a "government" for the net has always been shaky, and it has to worry about losing what legitimacy it has if the U.S. joins the other governments who want to replace ICANN with some kind of consortium of nations.

The U.S. government is asking ICANN to delay implementation of .xxx so it can study the issue. We all know what that means. Expect .xxx to fade away quietly as the study period never ends.

DMCA, and Disrupting the Darknet

Fred von Lohmann's paper argues that the DMCA has failed to keep infringing copies of copyrighted works from reaching the masses. Fred argues that the DMCA has not prevented "protected" files from being ripped, and that once those files are ripped they appear on the darknet where they are available to everyone. I think Fred is right that the DMCA and the DRM (anti-copying) technologies it supports have failed utterly to keep material off the darknet.

Over at the Picker MobBlog, several people have suggested an alternate rationale for the DMCA: that it might help raise the cost and difficulty of using the darknet. The argument is that even if the DMCA doesn't help keep content from reaching the darknet, it may help stop material on the darknet from reaching end users.

I don't think this rationale works. Certainly, copyright owners are using lawsuits and technical attacks in an attempt to disrupt the darknet. They have sued many end users and a few makers of technologies used for darknet filesharing. They have launched technical attacks including monitoring, spoofing, and perhaps even limited denial of service attacks. The disruption campaign is having a nonzero effect. But as far as I can tell, the DMCA plays no role in this campaign and does nothing to bolster it.

Why? Because nobody on the darknet is violating the DMCA. Files arrive on the darknet having already been stripped of any technical protection measures (TPMs, in the DMCA lingo). TPMs just aren't present on the darknet. And you can't circumvent a TPM that isn't there.

To be sure, many darknet users break the law, and some makers of darknet technologies apparently break the law too. But they don't break the DMCA; and indeed the legal attacks on the darknet have all been based on old-fashioned direct copyright infringement by end users, and contributory or vicarious infringement by technology makers. Even if there were no DMCA, the same legal and technical arms race would be going on, with the same results.

Though it has little if anything to do with the DMCA, the darknet technology arms race is an interesting topic in itself. In fact, I'm currently writing a paper about it, with my students Alex Halderman and Harlan Yu.


DMCA: An Avoidable Failure

In his new paper, Fred von Lohmann argues that the Digital Millennium Copyright Act of 1998, when evaluated on its own terms, is a failure. Its advocates said it would prevent widespread online copyright infringement; and it has not done so.

Fred is right on target in diagnosing the DMCA's failure to do what its advocates predicted. What Fred doesn't say, though, is that this failure should have been utterly predictable – it should have been obvious when the DMCA was grinding through Congress that things would end up like this.

Let's look at the three assumptions that underlie the darknet argument [quoting Fred]:

  1. Any widely distributed object will be available to some fraction of users in a form that permits copying.
  2. Users will copy objects if it is possible and interesting to do so.
  3. Users are connected by high-bandwidth channels.

When the DMCA passed in 1998, #1 was obviously true, and #3 was about to become true. #2 was the least certain; but if #2 turned out to be false then no DMCA-like law would be necessary anyway. So why didn't people see this failure coming in advance?

The answer is that many people did, but Congress ignored them. The failure scenario Fred describes was already conventional wisdom among independent computer security experts by 1998. Within the community, conversations about the DMCA were not about whether it would work – everybody knew it wouldn't – but about why Washington couldn't see what seemed obvious to us.

When the Darknet paper was published in 2002, people in the community cheered. Not because the paper had much to say to the security community – the paper's main argument had long been conventional wisdom – but because the paper made the argument in a clear and accessible way, and because, most of all, the authors worked for a big IT company.

For quite a while, employees of big IT companies had privately denigrated DRM and the DMCA, but had been unwilling to say so in public. Put a microphone in front of them and they would dodge questions, change the subject, or say what their employer's official policy was. But catch them in the hotel bar afterward and they would tell a different story. Everybody knew that dissenting from the corporate line was a bad career move; and nobody wanted to be the first to do it.

And so the Darknet paper caused quite a stir outside the security community, catalyzing a valuable conversation, to which Fred's paper is a valuable contribution. It's an interesting intellectual exercise to weigh the consequences of the DMCA in an alternate universe where it actually prevents online infringement; but if we restrict ourselves to the facts on the ground, Fred has a very strong argument.

The DMCA has failed to prevent online infringement; and that failure should have been predictable. To me, the most interesting question is how our policymakers can avoid making this kind of mistake again.


Measuring the DMCA Against the Darknet

Next week I'll be participating in a group discussion of Fred von Lohmann's new paper, "Measuring the DMCA Against the Darknet", over at the Picker MobBlog. Other participants will include Julie Cohen, Wendy Gordon, Doug Lichtman, Jessica Litman, Bill Patry, Bill Rosenblatt, Larry Solum, Jim Speta, Rebecca Tushnet, and Tim Wu.

I'm looking forward to a lively debate. I'll cross-post my entries here, with appropriate links back to the discussion over there.
