
Unwanted Calls and Spam on VoIP

Fred Cohen is predicting that VoIP will bring with it a flood of unsolicited commercial phone calls. (VoIP, or "Voice over Internet Protocol," systems deliver telephone-like service, making connections via the Internet rather than using the wires of the plain old telephone system.) Cohen argues that VoIP will drive down the cost of international calling to nearly zero, thereby making international telemarketing calls very cheap. He also argues that small overseas call centers will violate the U.S. Do Not Call List with impunity.

This comes on top of concerns about SPIT, or Spam over Internet Telephony. SPIT sends machine-generated voice calls to the phones or voicemail boxes of VoIP users; Cohen worries instead about VoIP-mediated calls from live people. In a previous article about SPIT, VoIP vendors argued, unconvincingly, that they could handle the SPIT problem.

The root cause of this problem is the same as for email spam. Whenever a communication technology (1) allows anybody to communicate with anybody else, and (2) does so at very low cost, unsolicited and unwanted communication will be a problem. We saw it with email spam, and now we'll see it with SPIT and VoIP telemarketing.

End-users can try to protect themselves from VoIP annoyances with some of the same methods used against email spam. Whitelists (lists of trusted people), blacklists (lists of suspected spammers), challenge-response, and ultimately even automatic classification and filtering of voice messages all seem likely to be tried at some point. But as with email spam, expect these methods not to solve the problem, only to reduce the annoyance level somewhat.
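To make the whitelist/blacklist idea concrete, here is a toy sketch in Python. The caller IDs, list contents, and the "challenge" action are all invented for illustration; a real VoIP client would hook such logic into its signaling layer.

```python
# A toy sketch of whitelist/blacklist call screening. The caller IDs,
# list contents, and the "challenge" action are invented for illustration.

def screen_call(caller_id, whitelist, blacklist):
    """Decide what to do with an incoming call based on simple lists."""
    if caller_id in whitelist:
        return "ring"       # trusted caller: ring through
    if caller_id in blacklist:
        return "reject"     # suspected spammer: drop the call
    return "challenge"      # unknown caller: e.g., ask them to press a key

whitelist = {"alice@example.net", "bob@example.net"}
blacklist = {"dialer@spam.example"}

print(screen_call("alice@example.net", whitelist, blacklist))     # ring
print(screen_call("dialer@spam.example", whitelist, blacklist))   # reject
print(screen_call("stranger@voip.example", whitelist, blacklist)) # challenge
```

As with email, the hard part isn't the mechanism but maintaining the lists: spammers can reconnect under new names, which is exactly the open-network problem discussed below.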

An even more interesting question is whether service providers can address the problem, perhaps by ejecting bad actors from their networks. This depends on how a particular network is structured. Some networks are closely controlled; these will have some chance of ejecting villains. Some networks rely on open protocols, so that nobody is in a position of control – the villains will just connect to the network as they please, and perhaps reconnect periodically under new names. Things get more challenging when different networks connect to each other, so that their legitimate clients can talk to each other. If a closed network connects to an open one, villains on the open network may be able to reach customers of the closed network, despite the best efforts of the closed network's administrator.

Can't we just use closed networks instead of open ones? If only it were so simple. Open networks have important advantages over closed ones; and many people will choose open networks because of these advantages, and in spite of the possibly heavier spam load on open networks. They may well be right to make that choice.

Because all of this calling will be done on the Internet, an open and tremendously flexible network, there are many creative attacks on these problems. For example, an open authentication infrastructure might provide a kind of CallerID service for VoIP, or even a certification of non-spammerness. Expect the technological battle to go on for years.


Pharming

Internet spoofing attacks have been getting more and more sophisticated. The latest evil trick is "Pharming," which relies on DNS poisoning (explanation below) to trick users about which site they are viewing. Today I'll explain what pharming is. I'll talk about fixes later in the week.

Spoofing attacks, in general, try to get a user to think he is viewing one site (say, Citibank's home banking site) when he is really viewing a bogus site created by a villain. The villain makes his site look just like Citibank's site, so that the user will trust the site and enter information, such as his Citibank account number and password, into it. The villain then exploits this information to do harm.

Today most spoofing attacks use "phishing." The villain sends the victim an email, which is forged to look like it came from the target site. (Forging email is very easy – the source and content of email messages are not verified at all.) The forged email may claim to be a customer service message asking the victim to do something on the legitimate site. The email typically contains a hyperlink purporting to go to the legitimate site but really going to the villain's fake site. If the victim clicks the hyperlink, he sees the fake site.

The best defense against phishing is to distrust email messages, especially ones that ask you to enter sensitive information into a website, and to distrust hyperlinks in email messages. Another defense is to have your browser tell you the name of the site you are really visiting. (The browser's Address line tries to do this, so in theory you could just look there, but various technical tricks may make this harder than you think.) Tools like SpoofStick display "You're on freedom-to-tinker.com" in big letters at the top of your browser window, so that you're not fooled about which site you're viewing. The key idea in these defenses is that your browser knows which domain (e.g. "citibank.com" or "freedom-to-tinker.com") the displayed page is coming from.
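The core check, whether the domain a link appears to name matches the domain it actually points to, can be sketched in a few lines of Python. This is only a rough heuristic of my own, not how SpoofStick works, and the example URLs are invented.

```python
from urllib.parse import urlparse

def looks_spoofed(link_text, href):
    """Rough heuristic: flag a link whose visible text names one domain
    while its href actually points somewhere else. Not foolproof; real
    phishing defenses need much more than this."""
    actual_host = urlparse(href).hostname or ""
    shown = link_text.lower().strip().rstrip("/")
    if shown.startswith("http://") or shown.startswith("https://"):
        shown = urlparse(shown).hostname or ""
    # Only flag when the visible text plausibly names a hostname at all.
    return "." in shown and not actual_host.endswith(shown)

print(looks_spoofed("http://www.citibank.com", "http://phish.example.net/login"))  # True
print(looks_spoofed("http://www.citibank.com", "http://www.citibank.com/login"))   # False
print(looks_spoofed("Click here", "http://phish.example.net/login"))               # False
```

Note that the last case is exactly why phishing works: when the link text names no domain at all, there is nothing to compare, and the user must look at the browser's own indication of where the link leads.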

"Pharming" tries to fool your computer about where the data is coming from. It does this by attacking DNS (the Domain Name System), the service that translates names like "freedom-to-tinker.com" for you.

The Internet uses two types of addresses to designate machines. IP addresses are numbers like 128.112.68.1. Every data packet that travels across the Internet is labeled with source and destination IP addresses, which are used to route the packet from the packet's source to its destination.

DNS addresses are text-strings like www.citibank.com. The Internet's routing infrastructure doesn't know anything about DNS addresses. Instead, a DNS address must be translated into an IP address before data can be routed to it. Your browser translated the DNS address "www.freedom-to-tinker.com" into the IP address "216.157.129.231" in the process of fetching this page. To do this, your browser probably consulted one or more servers out on the Internet, to get information about proper translations.
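You can watch this translation happen with Python's standard library: socket.gethostbyname performs the same name-to-address lookup the browser does. (The example resolves "localhost", which is answered locally, to avoid depending on outside DNS servers.)

```python
import socket

# A DNS name must be translated into an IP address before packets can
# be routed to it. gethostbyname performs that translation. "localhost"
# is resolved locally; a name like "www.freedom-to-tinker.com" would be
# looked up via DNS servers out on the Internet.
print(socket.gethostbyname("localhost"))  # 127.0.0.1
```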

"Pharming" attacks the translation process, to trick your computer somehow into accepting a false translation. If your computer accepts a false translation for "citibank.com," then when you communicate with "citibank.com" your packets will go to the villain's IP address, and not to the IP address of Citibank. I'll omit the details of how a villain might do this, as this post is already pretty long. But here's the scary part: if a pharming attack is successful, there is no information on your computer to indicate that anything is wrong. As far as your computer (and the software on it) is concerned, everything is working fine, and you really are talking to "citibank.com". Worse yet, the attack can redirect all of your Citibank-bound traffic – email, online banking, and so on – to the villain's computer.
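The effect of a poisoned translation (though not the attack mechanism, which the post omits) can be shown with a toy cache. The names and addresses below are invented, drawn from reserved documentation address ranges.

```python
# A toy model of the *effect* of a poisoned DNS translation. Names and
# addresses are invented, using reserved documentation address ranges.

dns_cache = {"citibank.com": "192.0.2.10"}   # the legitimate mapping

def resolve(name, cache):
    # The application trusts whatever answer the cache holds.
    return cache.get(name)

# After a successful pharming attack, the cached answer silently changes:
dns_cache["citibank.com"] = "203.0.113.66"   # attacker-controlled address

# Nothing on the victim's machine looks wrong; it simply connects here:
print(resolve("citibank.com", dns_cache))  # 203.0.113.66
```

The point of the sketch is the asymmetry: every check the victim's software can make consults the same poisoned answer, so from the victim's side everything appears consistent.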

What can be done about this problem? That's a topic for another day.


Harvard Business School Boots 119 Applicants for "Hacking" Into Admissions Site

Harvard Business School (HBS) has rejected 119 applicants who allegedly "hacked" into a third-party site to learn whether HBS had admitted them. An AP story, by Jay Lindsay, has the details.

HBS interacts with applicants via a third-party site called ApplyYourself. Harvard had planned to notify applicants whether they had been admitted, on March 30. Somebody discovered last week that some applicants' admit/reject letters were already available on the ApplyYourself website. There were no hyperlinks to the letters, but a student who was logged in to the site could access his/her letter by constructing a special URL. Instructions for doing this were posted in an online forum frequented by HBS applicants. (The instructions, which no longer work due to changes in the ApplyYourself site, are reproduced here.) Students who did this saw either a rejection letter or a blank page. (Presumably the blank page meant either that HBS would admit the student, or that the admissions decision hadn't been made yet.) 119 HBS applicants used the instructions.
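The flaw described here is what security people call an insecure direct object reference: a document reachable at a guessable URL, with no check that it belongs to the logged-in user. A hypothetical Python sketch (all IDs and text invented) shows the check the site was missing.

```python
# Hypothetical sketch of the server-side flaw described above: letters
# reachable at guessable IDs, with no check that the requested letter
# belongs to the logged-in applicant. All IDs and text are invented.

LETTERS = {
    1001: "We regret to inform you...",  # decision already recorded
    1002: "",                            # blank page: no decision yet
}

def fetch_letter_insecure(applicant_id):
    # What the site effectively did: serve any letter to anyone who
    # constructs the right URL.
    return LETTERS.get(applicant_id, "")

def fetch_letter_checked(session_user_id, applicant_id):
    # The missing authorization check: you may see only your own letter.
    if session_user_id != applicant_id:
        raise PermissionError("not your letter")
    return LETTERS.get(applicant_id, "")
```

With the check in place, constructing someone else's URL fails instead of leaking a decision; the applicants in question, of course, were fetching their own letters, which is part of what makes the ethics question murky.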

Harvard has now summarily rejected all of them, calling their action a breach of ethics. I'm not so sure that the students' action merits rejection from business school.

My first reaction on reading about this was surprise that HBS would make an admissions decision (as it apparently had done in many cases) and then wait for weeks before informing the applicant. Applicants rejected from HBS would surely benefit from learning that information as quickly as possible. Harvard had apparently gone to the trouble of telling ApplyYourself that some applicants were rejected, but they weren't going to tell the applicants themselves!? It's hard to see a legitimate reason for HBS to withhold this information from applicants who want it.

As far as I can tell, the only "harm" that resulted from the students' actions is that some of them learned the information about their own status that HBS was, for no apparent reason, withholding from them. And the information was on the web already, with no password required (for students who had already logged on to their own accounts on the site).

I might feel differently if I knew that the applicants were aware that they were breaking the rules. But I'm not sure that an applicant, on being told that his letter was already on the web and could be accessed by constructing a particular URL, would necessarily conclude that accessing it was against the rules. And it's hard to justify punishing somebody who caused no real harm and didn't know that he was breaking the rules.

As the AP article suggests, this is an easy opportunity for HBS (and MIT and CMU, who did the same thing) to grandstand about business ethics, at low cost (since most of the applicants in question would have been rejected anyway). Stanford, on the other hand, is reacting by asking the applicants who viewed their Stanford letters to come forward and explain themselves. Now that's a real ethics test.


Cal-Induce Bill Morphs Into Filtering Mandate

A bill in the California state senate (SB 96), previously dubbed the "Cal-Induce Act," has now morphed via amendment into a requirement that copyright and porn filters be included in many network software programs.

Here's the heart of the bill:

Any person or entity that [sells, advertises, or distributes] peer-to-peer file sharing software that enables its user to electronically disseminate commercial recordings or audiovisual works via the Internet or any other digital network, and who fails to incorporate available filtering technology into that software to prevent use of that software to commit an unlawful act with respect to a commercial recording or audiovisual work, or a violation of [state obscenity or computer intrusion statutes] is punishable ... by a fine not exceeding [$2500], imprisonment ... for a period not to exceed one year, or by both ...

This section shall not apply to the following:

(A) Computer operating system or Internet browser software.

(B) An electronic mail service or Internet service provider.

(C) Transmissions via a [home network] or [LAN]. [Note: The bill uses an odd definition of "LAN" that would exclude almost all of the real LANs I know. – EF]

As used in this section, "peer to peer file sharing software" means software ... the primary purpose of which ... is to enable the user to connect his or her computer to a network of other computers on which the users of these computers have made available recordings or audiovisual works for electronic dissemination to other users who are connected to the network. When a transaction is complete, the user has an identical copy of the file on his or her computer and may also then disseminate the file to other users connected to the network.

The main change from the previous version of the bill is the requirement to include filtering technologies; the previous version had required instead that the person "take reasonable care in preventing" bad uses of the software. This part of the bill is odd in several ways.

First, if the system in question uses a client-server architecture (as in the original Napster system), the bill applies only to the client-side software, since only the client software meets the bill's definition of P2P. Since the bill requires that a filter be incorporated into the P2P software, a provider could not protect itself by doing server-side filtering, even if that filtering were perfectly effective. This bill doesn't just mandate filtering, it mandates client-side filtering.

Second, the bill apparently requires anyone who advertises or distributes P2P software to incorporate filters into it. This seems a bit odd; normally advertisers and distributors don't control the design of the products they advertise. Typically, third-party advertisers and distributors aren't even allowed to inspect a software product's design.

Third, the "primary purpose" language is pretty hard to apply. A program's author may have one purpose in mind; a distributor may have another purpose in mind; and users may have a variety of purposes in using the software. Of course, the software itself can't properly be said to have a purpose, other than doing what it is programmed to do. Most P2P software is programmed to distribute whatever files its users ask it to distribute. Is purpose to be inferred from the intent of the designer, or from the design of the software itself, or from the actual use of the software by users? Each of these alternatives leads to problems of one sort or another.

Note also the clever construction of the P2P definition, which requires only that the primary purpose be to connect the user to a network where some other people are offering files to share. It does not seem to require that the primary purpose of the network be to share files, or that the primary purpose of the software be to share files, but only that the software connects the user to a network where some people are sharing files. Note also that the purpose language refers only to the transfer of audio or video files, not to the infringing transfer of such files; so even a system that did only authorized transfers would seem to be covered by the definition. Finally, note that the bill apparently requires the filters to apply to all uses of the software in question, not just uses that involve networking or file transfer.

Fourth, it's not clear what the bill says about situations where there is no workable filtering software, or where the only available filtering software is seriously flawed. Is there an obligation to install some filtering software, even if it doesn't work very well, and even if it makes the P2P software unusable in practice? The bill's language seems to assume that filtering software known to work well is available, which is not necessarily the case.

The new version of the bill also adds enumerated exceptions for operating system or web browser software, email services, ISPs, home networks, and LANs (though the bill's quirky definition of "LAN" would exclude most LANs I know of). As usual, it's not a good sign when you have to create explicit exceptions for commonly used products like these. The definition still seems likely to ensnare new legitimate communication technologies.

(Thanks to Morgan Woodson (creator of an amusing Induce Act Hearing mashup) for bringing this to my attention.)

Separating Search from File Transfer

Earlier this week, Grokster and StreamCast filed their main brief with the Supreme Court. The brief's arguments are mostly predictable (but well argued).

There's an interesting observation buried in the Factual Background (on pp. 2-3):

What software like respondents' adds to [a basic file transfer] capability is, at bottom, a mechanism for efficiently finding other computer users who have files a user is seeking....

Software to search for information on line ... is itself hardly new. Yahoo, Google, and others enable searching. Those "search engines," however, focus on the always-on "servers" on the World Wide Web.... The software at issue here extends the reach of searches beyond centralized Web servers to the computers of ordinary users who are on line....

It's often useful to think of a file sharing system as a search facility married to a file transfer facility. Some systems only try to innovate in one of the two areas; for example, BitTorrent was a major improvement in file transfer but didn't really have a search facility at all.

Indeed, one wonders why the search and file transfer capabilities aren't more often separated as a matter of engineering. Why doesn't someone build a distributed Web searching system that can cope with many unreliable servers? Such a system would let ordinary users find files shared from the machines of other ordinary users, assuming that the users ran little web servers. (Running a small, simple web server can be made easy enough for any user to do.)
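Such a "little web server" really is little. A minimal sketch using Python's standard library (the port number is an arbitrary choice) would share a local directory for a separate search service to index:

```python
# A minimal "little web server" of the sort described above: it shares
# the files in the current directory over plain HTTP, so that a separate
# search service could find and index them.
import http.server
import socketserver

PORT = 8000  # arbitrary choice; any free port would do
handler = http.server.SimpleHTTPRequestHandler  # serves the current directory

def serve():
    with socketserver.TCPServer(("", PORT), handler) as httpd:
        httpd.serve_forever()

# serve()  # uncomment to start sharing; then visit http://localhost:8000/
```

The engineering is the easy part; the interesting obstacles, as the post notes, are keeping a search index fresh when most of these servers are offline most of the time.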

On the Web, file transfer and search are separated, and this has been good for users. Files are transferred via a standard protocol, HTTP, but there is vigorous competition between search engines. The same thing could happen in the file sharing world. In the file sharing world, the search engines would presumably be decentralized. But then again, big Web search engines are decentralized in the sense that they consist of very large numbers of machines scattered around the world – they're physically decentralized but under centralized control.

Why haven't file sharing systems been built using separate products for search and file transfer? That's an interesting question to think about. I haven't figured out the answer yet.


Boosting

Congratulations to my Princeton colleague Rob Schapire on winning ACM's prestigious Kanellakis Award (shared with Columbia's Yoav Freund). The annual award is given for a contribution to theoretical computer science that has a significant practical impact. Schapire and Freund won this year for an idea called boosting, so named because it can take a mediocre machine learning algorithm and automatically "boost" it into a really good one. The basic idea is cool, and not too hard to understand at a basic level.

A common type of machine learning problem involves learning how to classify objects based on examples. A learning algorithm (i.e., a computer program) is shown a bunch of example objects, each of which falls into one of two categories. Each example object has a label saying which category it is in. Each object can be labeled with an "importance weight" that tells us the relative importance of categorizing that object correctly; objects with higher weights are more important. The machine learning algorithm's job is to figure out a rule that can be used to distinguish the two categories of objects, so that it can categorize objects that it hasn't seen before. The algorithm isn't told what to look for, but has to figure that out for itself.

Any fool can "solve" this problem by creating a rule that just guesses at random. That method will get a certain number of cases right. But can we do better than random? And if so, how much better?

Suppose we have a machine learning algorithm that does just a little better than random guessing for some class of problems. Schapire and Freund figured out a trick for "boosting" the performance of any such algorithm. To use their method, start by using the algorithm on the example data to deduce a rule. Call this Rule 1. Now look at each example object and see whether Rule 1 categorizes it correctly. If Rule 1 gets an object right, then lower that object's importance weight a little; if Rule 1 gets an object wrong, then raise that object's weight a little. Now run the learning algorithm again on the objects with the tweaked weights, to deduce another rule. Call this Rule 1a.

Intuitively, Rule 1a is just like Rule 1, except that Rule 1a pays extra attention to the examples that Rule 1 got wrong. We can think of Rule 1a as a kind of correction factor that is designed to overcome the mistakes of Rule 1. What Schapire and Freund proved is that if you combine Rule 1 and Rule 1a in a certain special way, the combined rule that results is guaranteed to be more accurate than Rule 1. This trick takes Rule 1, a mediocre rule, and makes it better.

The really cool part is that you can then take the improved rule and apply the same trick again, to get another rule that is even better. In fact, you can keep using the trick over and over to get rules that are better and better.

Stated this way, the idea doesn't seem too complicated. Of course, the devil is in the details. What makes this discovery prize-worthy is that Schapire and Freund worked out the details of exactly how to tweak the weights and exactly how to combine the partial rules – and they proved that the method does indeed yield a better rule. That's a very nice bit of computer science.
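For the curious, the loop described above can be written out as a toy version of the AdaBoost algorithm, using one-dimensional threshold rules ("stumps") as the mediocre learner. The data set and round count are invented for illustration; a real implementation would of course handle richer data.

```python
import math

# Toy AdaBoost-style boosting with one-dimensional threshold rules
# ("stumps") as the weak learner.

def train_stump(xs, ys, w):
    """Weak learner: the threshold/direction with lowest weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x >= thr else -sign for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n              # importance weights, uniform at first
    ensemble = []
    for _ in range(rounds):
        err, thr, sign = train_stump(xs, ys, w)
        err = max(err, 1e-12)      # avoid log(0) if a stump is perfect
        alpha = 0.5 * math.log((1 - err) / err)  # the rule's vote strength
        ensemble.append((alpha, thr, sign))
        # Raise weights on examples this rule got wrong, lower the rest.
        for i, (x, y) in enumerate(zip(xs, ys)):
            p = sign if x >= thr else -sign
            w[i] *= math.exp(-alpha * p * y)
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    vote = sum(a * (s if x >= t else -s) for a, t, s in ensemble)
    return 1 if vote >= 0 else -1

# Label is +1 exactly when x lies in [3, 7]. No single stump can capture
# an interval, but the boosted combination of stumps can.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [-1, -1, 1, 1, 1, 1, 1, -1, -1]
model = adaboost(xs, ys, rounds=5)
print([predict(model, x) for x in xs])  # recovers every training label
```

The weight update is the "tweak the weights a little" step from the post, and alpha is the "certain special way" the partial rules are combined: rules with lower weighted error get a louder vote.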

(Princeton won more of the major ACM awards than any other institution this year. Besides Rob Schapire's award, Jennifer Rexford won the Grace Murray Hopper award (for contributions by an under-35 computer scientist) for her work on Internet routing protocols, and incoming faculty member Boaz Barak won the Dissertation Award for some nice crypto results. Not that we're gloating or anything....)


Computer Science Professors' Brief in Grokster

Today, seventeen computer science professors (including me) are filing an amicus brief with the Supreme Court in the Grokster case. Here is the summary of our argument, quoted from the brief:

Amici write to call to the Court's attention several computer science issues raised by Petitioners [i.e., the movie and music companies] and amici who filed concurrently with Petitioners, and to correct certain of their technical assertions. First, the United States' description of the Internet's design is wrong. P2P networks are not new developments in network design, but rather the design on which the Internet itself is based. Second, a P2P network design, where the work is done by the end user's machine, is preferable to a design which forces work (such as filtering) to be done within the network, because a P2P design can be robust and efficient. Third, because of the difficulty in designing distributed networks, advances in P2P network design – including BitTorrent and Respondents' [i.e., Grokster's and Streamcast's] software – are crucial to developing the next generation of P2P networks, such as the NSF-funded IRIS Project. Fourth, Petitioners' assertion that filtering software will work fails to consider that users cannot be forced to install the filter, that filtering software is unproven, or that users will find other ways to defeat the filter. Finally, while Petitioners state that infringers' anonymity makes legal action difficult, the truth is that Petitioners can obtain IP addresses easily and have filed lawsuits against more than 8,400 alleged infringers. Because Petitioners seek a remedy that will hobble advances in technology, while they have other means to obtain relief for infringement, amici ask the Court to affirm the judgment below.

The seventeen computer science professors are Harold Abelson (MIT), Thomas Anderson (U. Washington), Andrew W. Appel (Princeton), Steven M. Bellovin (Columbia), Dan Boneh (Stanford), David Clark (MIT), David J. Farber (CMU), Joan Feigenbaum (Yale), Edward W. Felten (Princeton), Robert Harper (CMU), M. Frans Kaashoek (MIT), Brian Kernighan (Princeton), Jennifer Rexford (Princeton), John C. Reynolds (CMU), Aviel D. Rubin (Johns Hopkins), Eugene H. Spafford (Purdue), and David S. Touretzky (CMU).

Thanks to our counsel, Jim Tyre and Vicky Hall, for their work in turning a set of ideas and chunks of rough text into a coherent brief.


Forecast for Infotech Policy in the New Congress

Cameron Wilson, Director of the ACM Public Policy Office in Washington, looks at changes (made already or widely reported) in the new Congress and what they tell us about likely legislative action. (He co-writes the ACM U.S. Public Policy Blog, which is quite good.)

He mentions four hot areas. The first is regulation of peer-to-peer technologies. Once the Supreme Court's decision in Grokster comes down, expect Congress to spring into action, to protect whichever side claims to be endangered by the decision. A likely focal point for this is the new Intellectual Property subcommittee of the Senate Judiciary Committee. (The subcommittee will be chaired by Sen. Orrin Hatch, who has not been shy about regulating infotech in the name of copyright. He championed the Induce Act.) This issue will start out being about P2P but could easily expand to regulate a wider class of technologies.

The second area is telecom. Sen. Ted Stevens is the new chair of the Senate Commerce Committee, and he seems eager to work on a big revision of the Telecom Act of 1996. This will be a battle royal involving many interest groups, and telecom policy wonks will be fully absorbed. Regulation of non-telecom infotech products seems likely to creep into the bill, given the technological convergence of telecom with the Internet.

The third area is privacy. The Real ID bill, which standardizes state driver's licenses to create what is nearly a de facto national ID card, is controversial but seems likely to become law. The recent ChoicePoint privacy scandal may drive further privacy legislation. Congress is likely to do something about spyware as well.

The fourth area is security and reliability of systems. Many people on the Hill will want to weigh in on this issue, but it's not clear what action will be taken. There are also questions over which committees have jurisdiction. Many of us hope that PITAC's report on the sad state of cybersecurity research funding will trigger some action.

As someone famous said, it's hard to make predictions, especially about the future. There will surely be surprises. About the only thing we can be sure of is that infotech policy will get even more attention in this Congress than in the last one.

More on Ad-Blocking

I'm on the road today, so I don't have a long post for you. (Good news: I'm in Rome. Bad news: It's Rome, New York.)

Instead, let me point you to an interesting exchange about copyright and ad-blocking software on my course blog, in which "Archer" opens with a discussion of copyright and advertising revenue, and Harlan Yu responds by asking whether distributing Firefox AdBlock is a contributory infringement.

There's plenty of interesting writing on the course blog. Check it out!

UPDATE (Feb. 28): Another student, "Unsuspecting Innocent," has more on this topic.


Can P2P Nets Be Poisoned?

Christin, Weigend, and Chuang have an interesting new paper on corruption of files in P2P networks. Some files are corrupted accidentally (they call this "pollution"), and some might be corrupted deliberately ("poisoning") by copyright owners or their agents. The paper measures the availability of popular, infringing files on the eDonkey, Overnet, Gnutella, and FastTrack networks, and simulates the effect of different pollution strategies that might be used.

The paper first studied a few popular files for which corruption efforts were not occurring (or at least not succeeding). Polluted versions of these files were found, especially on FastTrack, but they weren't a barrier to user access, because non-corrupted files tend to have more replicas available than polluted files do, and the systems return files with more replicas first.

They move on to simulate the effect of various pollution strategies. They conclude that a sufficiently sophisticated pollution strategy, which injects different decoy versions of a file at different times, and injects many replicas of the same decoy at the same time, would significantly reduce user access to targeted files.
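The replica-count ranking described above, and the way a many-replica decoy flood defeats it, can be sketched in a few lines. The version names and counts are invented; this is a toy model of the effect, not the paper's simulation.

```python
from collections import Counter

# Toy model of the replica-count effect: among competing "versions" of
# the same title, return the version with the most replicas first.

def rank_versions(replicas):
    """replicas: one entry per replica found, keyed by content hash.
    Returns version hashes ordered by replica count, most first."""
    return [h for h, _ in Counter(replicas).most_common()]

results = ["clean"] * 8 + ["decoyA"] * 2 + ["decoyB"]
print(rank_versions(results)[0])   # the well-replicated clean version wins

# The sophisticated poisoning strategy: inject many replicas of the
# same decoy, so the decoy outranks the clean copy.
flooded = results + ["decoyA"] * 10
print(rank_versions(flooded)[0])   # now the decoy ranks first
```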

Some P2P programs use simple reputation systems to try to distinguish corrupted files from non-corrupted ones; the paper argues that these will be ineffective against its best pollution strategy. But the authors also note that better reputation systems could detect their sophisticated poisoning strategy.

They don't say anything more about the arms race between reputation technologies and pollution technologies. My guess is that in the long run reputation systems will win, and poisoning strategies will lose their viability. In the meantime, though, it looks like copyright owners have much to gain from poisoning.

[UPDATE (6:45 PM): I changed the second paragraph to eliminate an error that was caused by my misreading of the paper. Originally I said, incorrectly, that the study found little if any evidence of pollution for the files they studied. In fact, they chose those files because they were not subject to pollution. Thanks to Cypherpunk, Joe Hall, and Nicolas Christin for pointing out my error.]
