Posts tagged with: Security

Electronic Voting Researcher Arrested Over Anonymous Source

Updates: 8/28 Alex Halderman: Indian E-Voting Researcher Freed After Seven Days in Police Custody
8/26 Alex Halderman: Indian E-Voting Researcher Remains in Police Custody
8/24 Ed Felten: It’s Time for India to Face its E-Voting Problem
8/22 Rop Gonggrijp: Hari is in jail :-(

About four months ago, Ed Felten blogged about a research paper in which Hari Prasad, Rop Gonggrijp, and I detailed serious security flaws in India's electronic voting machines. Indian election authorities have repeatedly claimed that the machines are "tamperproof," but we demonstrated important vulnerabilities by studying a machine provided by an anonymous source.

The story took a disturbing turn a little over 24 hours ago, when my coauthor Hari Prasad was arrested by Indian authorities demanding to know the identity of that source.

At 5:30 Saturday morning, about ten police officers arrived at Hari's home in Hyderabad. They questioned him about where he got the machine we studied, and at around 8 a.m. they placed him under arrest and proceeded to drive him to Mumbai, a 14 hour journey.

The police did not state a specific charge at the time of the arrest, but it appears to be a politically motivated attempt to uncover our anonymous source. The arresting officers told Hari that they were under "pressure [from] the top," and that he would be left alone if he would reveal the source's identity.

Hari was allowed to use his cell phone for a time, and I spoke with him as he was being driven by the police to Mumbai:

The Backstory

India uses paperless electronic voting machines nationwide, and the Election Commission of India, the country's highest election authority, has often stated that the machines are "perfect" and "fully tamper-proof." Despite widespread reports of election irregularities and suspicions of electronic fraud, the Election Commission has never permitted security researchers to complete an independent evaluation nor allowed the public to learn crucial technical details of the machines' inner workings. Hari and others in India repeatedly offered to collaborate with the Election Commission to better understand the security of the machines, but they were not permitted to complete a serious review.

Then, in February of this year, an anonymous source approached Hari and offered a machine for him to study. This source requested anonymity, and we have honored this request. We have every reason to believe that the source had lawful access to the machine and made it available for scientific study as a matter of conscience, out of concern over potential security problems.

Later in February, Rop Gonggrijp and I joined Hari in Hyderabad and conducted a detailed security review of the machine. We discovered that, far from being tamperproof, it suffers from a number of weaknesses. There are many ways that dishonest election insiders or other criminals with physical access could tamper with the machines to change election results. We illustrated two ways that this could happen by constructing working demonstration attacks and detailed these findings in a research paper, Security Analysis of India's Electronic Voting Machines. The paper recently completed peer review and will appear at the ACM Computer and Communications Security conference in October.

Our work has produced a hot debate in India. Many commentators have called for the machines to be scrapped, and 16 political parties representing almost half of the Indian parliament have expressed serious concerns about the use of electronic voting.

Earlier this month at EVT/WOTE, the leading international workshop for electronic voting research, two representatives from the Election Commission of India joined in a panel discussion with Narasimha Rao, a prominent Indian electronic voting critic, and me. (I will blog more about the panel in coming days.) After listening to the two sides argue over the security of India's voting machines, 28 leading experts in attendance signed a letter to the Election Commission stating that "India’s [electronic voting machines] do not today provide security, verifiability, or transparency adequate for confidence in election results."

Nevertheless, the Election Commission continues to deny that there is a security problem. Just a few days ago, Chief Election Commissioner S.Y. Quraishi told reporters that the machines "are practically totally tamper proof."

Effects of the Arrest

This brings us to today's arrest. Hari is spending Saturday night in a jail cell, and he told me he expects to be interrogated by the authorities in the morning. Hari has retained a lawyer, who will be flying to Mumbai in the next few hours and who hopes to be able to obtain bail within days. Hari seemed composed when I spoke to him, but he expressed great concern for his wife and children, as well as for the effect his arrest might have on other researchers who might consider studying electronic voting in India.

If any good has come from this, it's that there has been an outpouring of support for Hari. He has received positive messages from people all over India.

Unfortunately, the entire issue distracts from the primary problem: India's electronic voting machines have fundamental security flaws, and do not provide the transparency necessary for voters to have confidence in elections. To fix these problems, the Election Commission will need help from India's technical community. Arresting and interrogating a key member of that community is enormously counterproductive.


Professor J. Alex Halderman is a computer scientist at the University of Michigan.

The Future of DRE Voting Machines

Last week at the EVT/WOTE workshop, Ari Feldman and I unveiled a new research project that we feel represents the future of DRE voting machines. DRE (direct-recording electronic) voting machines are ones where voters cast their ballots by pressing buttons or using a touch screen, and the primary record of the votes is stored in a computer memory. Numerous scientific studies have demonstrated that such machines can be reprogrammed to steal votes, so when we got our hands on a DRE called the Sequoia AVC Edge, we decided to do something different: we reprogrammed it to run Pac-Man.

As more states move away from using insecure DREs, there’s a risk that thousands of these machines will clog our landfills. Fortunately, our results show that they can be productively repurposed. We believe that in the not-so-distant future, recycled DREs will provide countless hours of entertainment in the basements of the nation’s nerds.

To see how we did it, visit our Pac-Man on the AVC Edge voting machine site.

Tagged:  

A New DMCA Exemption for Security Research

By now, most readers have probably heard about the six newly minted exemptions to the anti-circumvention measures of the Digital Millennium Copyright Act (DMCA), announced last week by the Librarian of Congress. For the uninitiated, Ars Technica has an excellent overview of the exemptions, which provide much-needed legal cover for a variety of activities including jailbreaking and unlocking cell phones, decrypting DVDs for non-commercial remixes, and several others.

Of particular interest to folks in the security community is the exemption granted for security research on video game digital rights management (DRM) systems, stemming from both realized and potential security holes in systems like Safedisc and SecuROM.

(More below the fold.)

Tagged:  

A Major Internet Milestone: DNSSEC and SSL

On July 15th, a small but significant internet event occurred. On that day, years of planning culminated in the deployment of a cryptographic signature on the root DNS zone. To simplify greatly, this means that internet users will soon be able to have a much higher degree of trust in the hierarchical Domain Name System by utilizing the powers of recursion and cryptography. When a user's computer is told that the IP address for "gmail.com" is 72.14.204.19, the user can be sure that this answer is true. This is important if you are someone such as a Chinese dissident who wants to reliably and securely reach gmail.com in order to communicate with your peers. The rollout of this throughout all domains, DNS resolvers, and client applications will take a little while, but the basic infrastructure is now in place.

This mitigates a certain class of vulnerabilities that web users used to face. Although it forecloses attacks at the domain name-to-IP address stage of requesting a web page, it does not necessarily foreclose attacks at other stages. For instance, an attacker that gets between you and the server you are trying to reach can simply claim that he is the server at 72.14.204.19. Our traditional way of protecting against this style of attack has been to rely on Certificate Authorities -- trusted third-parties who certify digital key-pairs only for the true owners of a given domain name. Thus, even if an attacker tries to execute one of these "man-in-the-middle" attacks, he won't possess the secret portion of the digital key-pair that is required to prove that his communications come from the true gmail.com. Your browser checks for a certified corresponding public key in the process of setting up a secure SSL/TLS connection to https://gmail.com.

Unfortunately, there are several technical, operational, and jurisdictional shortcomings of the Certificate Authority model. As I discussed in an earlier post, many of these problems are not present in the hierarchical and delegated model of DNS. However, DNS does not inherently provide the ability to store domain name-to-key-pair information. But could it? At one of the recent DNSSEC deployment ceremonies, Vint Cerf noted:

More has happened here today than meets the eye. An infrastructure has been created for a hierarchical security system, which can be purposed and re-purposed in a number of different ways. And so I would predict that although we started out putting this system together to assure that the domain name lookups return valid internet addresses, that in the long run this hierarchical structure of trust will be applied to a number of other functions that require strong authentication. And so you will have seen a new major milestone in the internet story.

I believe that storing SSL/TLS keys in DNSSEC-secured DNS records will be the first significant "other function" that will emerge. An alternative to Certificate Authorities for domain-to-key mapping is sorely needed. There are two major practical hurdles to getting there: 1) We must define a standard for placing keys in DNS and 2) We must secure the "last mile" from the service provder's DNS resolver to the end-user's computer.

The first hurdle involves the type of standard-setting that the internet community is quite familiar with. On a technical level, it means that we need to collectively decide what these DNS records look like. The second hurdle involves building more functionality into end users' software so that it can do cryptographic validation of DNS results rather than blindly trusting its upstream DNS resolver. There may be temporary ways to do this within web browser code, but ultimately it will probably have to be built into what is called the "stub resolver" -- a local service running on your computer that usually just asks for the results from the upstream resolver.

It is important to note that none of his makes Certificate Authorities obsolete. Although the DNS-based approach replaces the most basic type of SSL certificates, the Certificate Authorities will continue to be the only entities that can offer validation of real-world identity of site owners. The DNS-based approach and basic "domain validated" Certificate Authority certificates both verify only that whoever controls the domain name is the entity that your computer is communicating with, without saying who that is. In recent years, "Extended Validation" certificates (the ones that make your browser bar glow green) have begun to be offered by all major certificate authorities. These certificates require more rigorous validation of the identity of the owner, so that for example you know that the person who controls bankofamerica.com is really Bank of America Corporation.

At this year's Black Hat and Defcon, Dan Kaminsky demonstrated some new software he is releasing that could make deploying DNSSEC more easy in general, and that could also address the two main hurdles to placing keys in DNS. He readily admits that his particular implementation is not perfect, and has encouraged critiques and changes. [Update: His slides are available here.]

Hopefully, with the input of the many smart folks in the security, internet standards, and software development communities, we will see a production-quality DNSSEC-secured solution to domain-to-key authentication in the near future.

Tagged:  

The Stock-market Flash Crash: Attack, Bug, or Gamesmanship?

Andrew wrote last week about the stock market's May 6 "flash crash", and whether it might have been caused by a denial-of-service attack. He points to a detailed analysis by nanex.com that unpacks what happened and postulates a DoS attack as a likely cause. The nanex analysis is interesting and suggestive, but I see the situation as more complicated and even more interesting.

Before diving in, two important caveats: First, I don't have access to raw data about what happened in the market that day, so I will accept the facts as posited by nanex. If nanex's description is wrong or incomplete, my analysis won't be right. Second, I am not a lawyer and am not making any claims about what is lawful or unlawful. With that out of the way ...

Here's a short version of what happened, based on the nanex data:
(1) Some market participants sent a large number of quote requests to the New York Stock Exchange (NYSE) computers.
(2) The NYSE normally puts outgoing price quotes into a queue before they are sent out. Because of the high rate of requests, this queue backed up, so that some quotes took a (relatively) long time to be sent out.
(3) A quote lists a price and a time. The NYSE determined the price at the time the quote was put into the queue, and timestamped each quote at the time it left the queue. When the queues backed up, these quotes would be "stale", in the sense that they had an old, no-longer-accurate price --- but their timestamps made them look like up-to-date quotes.
(4) These anomalous quotes confused other market participants, who falsely concluded that a stock's price on the NYSE differed from its price on other exchanges. This misinformation destabilized the market.
(5) The faster a stock's price changed, the more out-of-kilter the NYSE quotes would be. So instability bred more instability, and the market dropped precipitously.

The first thing to notice here is that (assuming nanex has the facts right) there appears to have been a bug in the NYSE's system. If a quote goes out with price P and time T, recipients will assume that the price was P at time T. But the NYSE system apparently generated the price at one time (on entry to the queue) and the timestamp at another time (on exit from the queue). This is wrong: the timestamp should have been generated at the same time as the price.

But notice that this kind of bug won't cause much trouble under normal conditions, when the queue is short so that the timestamp discrepancy is small. The problem might not have be noticed in normal operation, and might not be caught in testing, unless the testing procedure takes pains to create a long queue and to check for the consistency of timestamps with prices. This looks like the kind of bug that developers dread, where the problem only manifests under unusual conditions, when the system is under a certain kind of strain. This kind of bug is an accident waiting to happen.

To see how the accident might develop and be exploited, let's consider the behavior of three imaginary people, Alice, Bob, and Claire.

Alice knows the NYSE has this timestamping bug. She knows that if the bug triggers and the NYSE starts issuing dodgy quotes, she can make a lot of money by exploiting the fact that she is the only market participant who has an accurate view of reality. Exploiting the others' ignorance of real market conditions---and making a ton of money---is just a matter of technique.

Alice acts to exploit her knowledge, deliberately triggering the NYSE bug by flooding the NYSE with quote requests. The nanex analysis implies that this is probably what happened on May 6. Alice's behavior is ethically questionable, if not illegal. But, the nanex analysis notwithstanding, deliberate triggering of the bug is not the only possibility.

Bob also knows about the bug, but he doesn't go as far as Alice. Bob programs his systems to exploit the error condition if it happens, but he does nothing to cause the condition. He just waits. If the error condition happens naturally, he will exploit it, but he'll take care not to cause it himself. This is ethically superior to a deliberate attack (and might be more defensible legally).

(Exercise for readers: Is it ethical for Bob to deliberately refrain from reporting the bug?)

Claire doesn't know that the NYSE has a bug, but she is a very careful programmer, so she writes code that watches other systems for anomalous behavior and ignores systems that seem to be misbehaving. When the flash crash occurs, Claire's code detects the dodgy NYSE quotes and ignores them. Claire makes a lot of money, because she is one of the few market participants who are not fooled by the bad quotes. Claire is ethically blameless --- her virtuous programming was rewarded. But Claire's trading behavior might look a lot like Alice's and Bob's, so an investigator might suspect Claire of unethical or illegal behavior.

Notice that even if there are no Alices or Bobs, but only virtuous Claires, the market might still have a flash crash and people might make a lot of money from it, even in the absence of a denial-of-service attack or indeed of any unethical behavior. The flood of quote requests that trigged the queue backup might have been caused by another bug somewhere, or by an unforeseen interaction between different systems. Only careful investigation will be able to untangle the causes and figure out who is to blame.

If the nanex analysis is at all correct, it has sobering implications. Financial markets are complex, and when we inject complex, buggy software into them, problems are likely to result. The May flash crash won't be the last time a financial market gyrates due to software problems.

Tagged:  

School's Laptop Spying Software Exploitable from Anywhere

This post is by Jay Novak, Jon Stribley, and J. Alex Halderman.

Absolute Manage is a remote administration program that allows sysadmins to supervise and maintain client computers over the Internet. It has been in the news since early February, when Lower Merion School District in Pennsylvania was alleged to be using it to spy on students at home via their laptop webcams. The story took a new twist last Thursday, when Threat Level reported that researchers at Leviathan Security Group had discovered serious vulnerabilities in the program. These problems let attackers carry out a number of exploits, including installing malware or running other arbitrary code on the students' laptops. The major limitation in the reported attacks is that the bad guy needs to be on the same local network as the victim, and the program's developers, Absolute Software, says it's a largely theoretical threat.

Unfortunately, the security problems are worse than has been reported so far, and are far from theoretical. In fact, any machine with a public IP address running Absolute Manage can be taken over by attackers anywhere on the Internet. Such an attacker can command the machine to run arbitrary code, steal data, or take photographs using the computer's camera.

We have been investigating Absolute Manage for several months, hoping to gain a better understanding of the security measures it employs to protect users. We are disclosing this information now because, following the Threat Level post, we believe it's only a matter of time until real attackers discover it. Users need to be aware of the vulnerabilities and take proper measures to protect themselves.

Broken Cryptography

The security issues revolve around the way Absolute Manage encrypts commands sent to the clients. The software has two parts: the Absolute Manage Agent, which runs on client machines, and the Absolute Manage Server, which tracks clients under its supervision and sends commands to perform remote operations like installing new software. The clients and server exchange messages using a TCP-based protocol. Clients continuously listen for messages containing instructions from the server, and they also periodically send "heartbeat" messages to the server to deliver status updates and pull down any queued commands.

The programs compress and encrypt each message before transmitting it. They use an encryption algorithm called Blowfish, which is a credible, if outmoded, choice. The problem is how the keys are managed. Strong encryption protocols like SSL negotiate a new secret key for each communication session, but Absolute Manage uses the same hard coded key every time, in every client. Blowfish can use variable-sized keys from 1-56 bytes, and the programmers decided to use secret textual phrases for the keys. We know what the keys are. We won't publish them here, but they were easy to figure out by inspecting the program files.

Using hard coded keys neutralizes the benefits of cryptography. Since the same, easy-to-discover key is used in every client, it's straightforward for an attacker to unwrap the encrypted messages, modify them, or encrypt messages of his own. This problem is very similar to the broken cryptography in Diebold voting machines that Ari, Alex, and Ed discovered years ago. Diebold also used a simple fixed key, which allowed an attacker with access to one machine to learn the key and attack all the other machines.

The broken cryptography in Absolute Manage enables a variety of attacks. For instance, an attacker can eavesdrop on a command sent from the server and rewrite it to issue a new command that installs and executes malicious code. Or, he can act as a man-in-the-middle between the client and server and insert evil commands in response to a client heartbeat message. An attacker could easily do these by sniffing wireless LAN traffic, as in school and office environments where Absolute Manage is often deployed. These seem to be the attacks referred to in the Threat Level post.

Exploitable from Anywhere

The limitation of these attacks is that the bad guy usually needs to be on the same physical network as the victim. However, we discovered that potentially more dangerous attacks are possible. In these attacks, a bad guy anywhere on the Internet can exploit any Absolute Manage client with a publicly reachable IP address.

Here's an example of a message from the server to the client, with the encryption and compression removed. The client tries to authenticate the command using a parameter called the SeedValue. This value is provided by the server when a client initially attempts to contact it after booting. After that, the client requires the SeedValue to be the same in subsequent commands. The client basically ignores all of the other parameters that look like they would be hard to guess, so the SeedValue is the only thing that makes it difficult for the attacker to generate his own command messages from whole cloth.

In our example message, the SeedValue is:

D969E2CD0CB67F4063F45CEAC7D145B12D76A969306AE0CE

It turns out that it is encrypted using a second hard-coded textual phrase. Decrypting it yields the following bytes:

00 00 00 00 e0 03 10 03 40 03 00 00 31 00 34 00 37 00 35 00 00 00 00 00 00

The length is misleading; the value is actually just a 16-bit unicode encoding of a 7-digit number, 1401475. This is the server's "serial number," which was provided by Absolute Software along with the product activation key when we purchased our license.

Thus, an attacker who wants to send arbitrary commands to Absolute Manage clients just needs to figure out the server's serial number. One way he can do that is to guess it. The attacker could try contacting the client with different values until one of them turns out to be correct. If all serial numbers are 7 digits (like ours) or less, and there is no pattern to there assignment, then an attacker can guess among 10 million possibilities. If there is a pattern (the likely case) then the attacker's job may be much easier. Our tests show that we can make more than 330 guesses/second over a fast network link, so even assuming no pattern an attacker could expect to succeed after about four hours of guessing. Each server uses the same serial number for all its clients, so after the attacker guesses it for one client, he can compromise all the server's other clients without any additional guesswork.

Brute forcing the server's serial number is one method attackers can use, but there is a much more efficient attack for targeting a large set of clients: the server will tell the correct SeedValue to any client that asks. If the attacker knows the IP address of the server a client is trying to contact, he can just impersonate a freshly-booted client and ask the server to send him the correct SeedValue. The server will respond with all the information the attacker needs to impersonate the server.

A bad guy could extend this method to target all Absolute Manage clients in one attack. He could scan the entire Internet address space to discover all hosts running Absolute Manage Server and build a list of active SeedValues. (Servers generally run on public IP addresses so that they can receive status updates from clients that are away from the local network.) Such a scan would take only a few days. The attacker could then do a second Internet-wide scan to discover Absolute Manage Clients. For each of them, he would need only a few seconds to try all the active SeedValues from his list and determine the correct one. This attack could be exploited to quickly install and run malicious code on all computers running the Absolute Manage client on publicly accessible IP addresses.

Defenses and Lessons

In the short term, users can protect themselves by uninstalling the Absolute Manage client. This might be difficult on machines with privileges locked down, so system administrators will need to help.

(Attempts to work around the problem, such as firewalling the server so it can't be found by an Internet scan, may backfire. If the server is unreachable from outside the firewall, clients that are rebooted away from the local network will be unable to obtain a SeedValue. In this situation, the clients insecurely default to accepting arbitrary commands without even the protection of a SeedValue.)

In the long term, the solution is for Absolute Manage to adopt serious cryptographic authentication. Absolute Software says they will do this in the next version later this summer--let's hope they get it right next time.

Remote administration products like Absolute Manage carry large risks because they intentionally create a mechanism for a remote third party to take control of the machine. This can be powerful in the right hands but devastating if exploited by attackers. There will always be a risk of abuse by authorized parties, as alleged in the students' lawsuit against Lower Merion School District, but correctly designed technology should at least prevent unauthorized third-party attacks by making sure only authorized parties can issue commands. This requires getting authentication right--exactly what Absolute Manage failed to do.

Because of these dangers, remote administration software should be designed defensively, minimizing the risk even if the authentication fails. For example, it could only allow installation of signed binaries, or it could give users prominent notification before actions are taken so that attacks can be more easily detected.

The blatant vulnerabilities in Absolute Manage suggest that this kind of remote administration software requires greater security scrutiny. We will further discuss the problems and the lessons they carry in a forthcoming paper.

Tagged:  

India's Electronic Voting Machines Have Security Problems

A team led by Hari Prasad, Alex Halderman, and Rop Gonggrijp released today a technical paper detailing serious security problems with the electronic voting machines (EVMs) used in India.

The independent Electoral Commission of India, which is generally well respected, has dealt poorly with previous questions about EVM security. The chair of the Electoral Commission has called the machines "infallible" and "perfect" and has rejected any suggestion that security improvements are even possible. I hope the new study will cause the EC to take a more realistic approach to EVM security.

The researchers got their hands on a real Indian EVM which they were able to examine and analyze. They were unable to extract the software running in the machine (because that would have required rendering the machine unusable for elections, which they had agreed not to do) so their analysis focused on the hardware. They were able to identify several attacks that manipulated the hardware, either by replacing components or by clamping something on to a chip on the motherboard to modify votes. They implemented demonstration attacks, actually building proof-of-concept substitute hardware and vote-manipulation devices.

Perhaps the most interesting aspect of India's EVMs is how simple they are. Simplicity is a virtue in security as in engineering generally, and researchers (including me) who have studied US voting machines have advocated simplifying their design. India's EVMs show that while simplicity is good, it's not enough. Unless there is some way to audit or verify the votes, even a simple system is subject to manipulation.

If you're interested in the details, please read the team's paper.

The ball is now in the Election Commission's court. Let's hope that they take steps to address the EVM problems, to give the citizens of the world's largest democracy the transparent and accurate elections they deserve.

Tagged:  

Google Publishes Data on Government Data and Takedown Requests

Citizens have long wondered how often their governments ask online service providers for data about users, and how often governments ask providers to take down content. Today Google took a significant step on this issue, unveiling a site reporting numbers on a country-by-country basis.

It's important to understand what is and isn't included in the data on the Google site. First, according to Google, the data excludes child porn, which Google tries to block proactively, worldwide.

Second, the site reports requests made by government, not by private individuals. (Court orders arising from private lawsuits are included, because the court issuing the order is an arm of government.) Because private requests are excluded, the number of removal requests is lower than you might expect -- presumably removal requests from governments are much less common than those from private parties such as copyright owners.

Third, Google is reporting the number of requests received, and not the number of users affected. A single request might affect many users; or several requests might focus on a single user. So we can't use this data to estimate the number of citizens affected in any particular country.

Another caveat is that Google reports the country whose government submitted the request to Google, but this may not always be the government that originated the request. Under Mutual Legal Assistance Treaties, signatory countries agree to pass on law enforcement data requests for other signatories under some circumstances. This might account for some of the United States data requests, for example, if other countries asked the U.S. government to make data requests to Google. We would expect there to be some such proxy requests, but we can't tell from the reported data how many there were. (It's not clear whether Google would always be able to distinguish these proxy requests from direct requests.)

With these caveats in mind, let's look at the numbers. Notably, Brazil tops both the data-requests list and the takedown-requests list. The likely cause is the popularity of Orkut, Google's social network product, in Brazil. India, where Orkut is also somewhat popular, appears relatively high on the list as well. Social networks often breed disputes about impersonation and defamation, which could lead a government to order release of information about who is using a particular account.

The U.S. ranks second on the data-requests list but is lower on the takedown-requests list. This is consistent with the current U.S. trend toward broader data gathering by law enforcement, along with the relatively strong protection of free speech in the U.S.

Finally, China is a big question mark. According to Google, the Chinese government claims that the relevant data is a state secret, so Google cannot release it. The Chinese government stands conspicuously alone in this respect, choosing to deny its citizens even this basic information about their government's activities.

There's a lot more information I'd like to see about government requests. How many citizens are affected? How many requests does Google comply with? What kinds of data do governments seek about Google users? And so on.

Despite its limitations, Google's site is a valuable step toward transparency about governments' attempts to observe and control their citizens' online activities. I hope other companies will follow suit, and that Google will keep pushing on this issue.

Tagged:  

Pseudonyms: The Natural State of Online Identity

I've been writing recently about the problems that arise when you try to use cryptography to verify who is at the other end of a network connection. The cryptographic math works, but that doesn't mean you get the identity part right.

You might think, from this discussion, that crypto by itself does nothing -- that cryptographic security can only be bootstrapped from some kind of real-world identity verification. That's the way it works for website certificates, where a certificate authority has to check your bona fides before it will issue you a certificate.

But this intuition turns out to be wrong. There is one thing that crypto can do perfectly, without any real-world support: providing pseudonyms. Indeed, crypto is so good at supporting pseudonyms that we can practically say that pseudonyms are the natural state of identity online.

To explain why this is true, I need to offer a gentle introduction to a basic crypto operation: digital signatures. Suppose John Doe ("JD") wants to use digital signatures. First, JD needs to create a private cryptographic key, which he does by generating some random numbers and combining them according to a special geeky recipe. The result is a unique private key that only JD knows. Next, JD uses a certain procedure to determine the public key that corresponds to his private key. He announces the public key to everyone. The math guarantees that (1) JD's public key is unique and corresponds to JD's private key, and (2) a person who knows JD's public key can't figure out JD's private key.

Now JD can make digital signatures. If JD wants to "sign" a certain message M, he combines M with JD's private key in a special way, and the result is JD's "signature on M". Now anybody can verify the signature, using JD's public key. Only JD can make the signature, because only JD knows JD's private key; but anybody can verify the signature.

At no point in this process does JD tell anybody who he is -- I called him "John Doe" for a reason. Indeed, JD's public key is a perfect pseudonym: it conveys nothing about JD's actual identity, yet it has a distinct "owner" whose presence can be verified. ("You're really the person who created this public key? Then you should be able to make a signature on the message 'squeamish ossifrage' for me....")

Using this method, anybody can make up a fresh pseudonym whenever they want. If you can generate random numbers and do some math (or have your computer do those things for you), then you can make a fresh pseudonym. You can make as many as you want, without needing to coordinate with anybody. This is all easy to do.

These methods, pseudonyms and signatures, are used even in cases where we want to verify somebody's real-world identity. When you connect to (say) https://mail.google.com, Google's web server gives you its public key -- a pseudonym -- along with a digital certificate that attests that that public key -- that pseudonym -- belongs to Google Inc. Binding public keys -- pseudonyms -- to real-world identities is tedious and messy, but of course this is often necessary in practice.

Online, identities are hard to manage. Pseudonyms are easy.

Side-Channel Leaks in Web Applications

Popular online applications may leak your private data to a network eavesdropper, even if you're using secure web connections, according to a new paper by Shuo Chen, Rui Wang, XiaoFeng Wang, and Kehuan Zhang. (Chen is at Microsoft Research; the others are at Indiana.) It's a sobering result -- yet another illustration of how much information can be leaked by ordinary web technologies. It's also really clever.

Here's the background: Secure web connections encrypt traffic so that only your browser and the web server you're visiting can see the contents of your communication. Although a network eavesdropper can't understand the requests your browser sends, nor the replies from the server, it has long been known that an eavesdropper can see the size of the request and reply messages, and that these sizes sometimes leak information about which page you're viewing, if the request size (i.e., the size of the URL) or the reply size (i.e., the size of the HTML page you're viewing) is distinctive.

The new paper shows that this inference-from-size problem gets much, much worse when pages are using the now-standard AJAX programming methods, in which a web "page" is really a computer program that makes frequent requests to the server for information. With more requests to the server, there are many more opportunities for an eavesdropper to make inferences about what you're doing -- to the point that common applications leak a great deal of private information.

Consider a search engine that autocompletes search queries: when you start to type a query, the search engine gives you a list of suggested queries that start with whatever characters you have typed so far. When you type the first letter of your search query, the search engine page will send that character to the server, and the server will send back a list of suggested completions. Unfortunately, the size of that suggested completion list will depend on which character you typed, so an eavesdropper can use the size of the encrypted response to deduce which letter you typed. When you type the second letter of your query, another request will go to the server, and another encrypted reply will come back, which will again have a distinctive size, allowing the eavesdropper (who already knows the first character you typed) to deduce the second character; and so on. In the end the eavesdropper will know exactly which search query you typed. This attack worked against the Google, Yahoo, and Microsoft Bing search engines.

Many web apps that handle sensitive information seem to be susceptible to similar attacks. The researchers studied a major online tax preparation site (which they don't name) and found that it leaks a fairly accurate estimate of your Adjusted Gross Income (AGI). This happens because the exact set of questions you have to answer, and the exact data tables used in tax preparation, will vary based on your AGI. To give one example, there is a particular interaction relating to a possible student loan interest calculation, that only happens if your AGI is between $115,000 and $145,000 -- so that the presence or absence of the distinctively-sized message exchange relating to that calculation tells an eavesdropper whether your AGI is between $115,000 and $145,000. By assembling a set of clues like this, an eavesdropper can get a good fix on your AGI, plus information about your family status, and so on.

For similar reasons, a major online health site leaks information about which medications you are taking, and a major investment site leaks information about your investments.

The paper goes on to consider possible mitigations. The most obvious mitigation is to add padding to messages so that their sizes are not so distinctive -- for example, every message might be padded to make its size a multiple of 256 bytes. This turns out to be less effective than you might expect -- significant information can still leak even if messages are generously padded -- and the padded messages are slower and more expensive to transmit.

We don't know which sites the researchers studied, but it seems like a safe bet that most, if not all, of the sites in these product categories have similar problems. It's important to keep these attacks in perspective -- bear in mind that they can only be carried out by someone who can eavesdrop on the network between you and the site you're visiting.

It's becoming increasingly clear that securing web-based applications is very difficult, and that the basic tools for developing web apps don't do much to help. The industry, and researchers, will be struggling with web app security issues for years to come.

Tagged:  
Syndicate content