
Cloud(s), Hype, and Freedom

Richard Stallman's recent description of 'the cloud' as 'hype' and a 'trap' seems to have stirred up a lot of commentary, but not a lot of clear discussion of the problems Stallman raised. This isn't surprising- the term 'the cloud' has always been vague. (It was hard to resist saying 'cloudy.' ;) When people say 'the cloud' they are really lumping at least four 'cloud types' together.

traditional applications, hosted elsewhere

Probably the most common type of 'cloud' is a service that takes a traditional software functionality and moves it to remotely hosted, (typically) web-delivered servers. Gmail and salesforce.com are like this- fairly traditional email and CRM applications, 'just' moved to the web.

If Stallman's 'hype' claim is valid anywhere, it is here. Administration and maintenance costs are definitely lower when an expert like Google funds and runs the server, and reliability may improve as well. But the core functionality of these apps, and the ability to access data over a network, have been present since the dawn of networked computing. On average, this is undoubtedly a significant change in quality, but only rarely a change in type- making the buzz much harder to justify.

Stallman's 'trap' charge is more complex. Computer users have long compromised on personal control by storing data remotely but accessing it via standardized protocols. This introduced risks- you had to trust the data host and couldn't tinker with the server- but kept some controls- you could switch clients, and typically you could export the data. Some web apps still strike that balance- for example, most gmail features are accessible via good old POP and IMAP. But others don't.

Getting your data out of a service like salesforce can be a 'hidden cost' of an apparently free service, and even with a relatively standards-based service like gmail you have no freedom to make changes to the server. These risks are what Stallman means when he talks about a 'trap', and regardless of your conclusion about them, understanding them is important.

services involving data that can't (yet) be managed locally

Google Maps and Google Search are the canonical examples of this type of cloud service- heaps of data so large that you would need a large data center to host your own copy and a very, very fat pipe to keep it up-to-date.

Hype-wise, these are a mixed bag. These services definitely bring radical new functionality that traditionally can't exist- I can't store all of google maps on my phone. That hype is justified. At the same time, our personal ability to store and process data is still growing quickly, so the claims that this type of cloud service will always 'require' remote servers may be overblown.

'Trap'-wise? Dependence on these services reminds me of 'dependence' on a library before the internet- you can work to make sure your library respects your privacy, prefer public libraries to private ones, or establish a personal library if your reading interests are narrow, but in the end eschewing large libraries is likely to be a case of cutting off your nose to spite your face. We're in the same state with this type of cloud service. You can avoid them, but those concerned with freedom might be better off understanding and fixing them than condemning them altogether.

services that make creation of new data technically or economically feasible

Facebook and wikipedia are the canonical examples here. Unlike the first two types of cloud, where data was available but inconvenient before it ended up in the cloud, this class of cloud applications creates information that wasn't previously feasible to collect at all.

There may well not be enough hype around this type of cloud. Replicating web scale collaborative facilities like these will be very difficult to do in a p2p fashion, and the impact of the creation of new information (even when it is as mundane as facebook's data often is) is hard to overstate.

Like the previous type of cloud, it is hard to call these a trap per se- they do make it hard to leave, but they do so by providing new functionality that is very hard to get with any traditional software model.

services offering computing and storage, rather than data

The most recent type of cloud service is remotely provisioned computing and storage, like Amazon's EC2/S3 and Google's App Engine. This is perhaps the most purely generative type of cloud, allowing individuals to create new services and scale them out to service millions of people without having to invest in their own physical infrastructure. It is hard to see any way in which this can reasonably be called 'hype,' given the reach it allows individuals and small or transient groups to have which might otherwise cost them many thousands of dollars.

From a freedom perspective, these can be both the best and worst of the cloud types. On the plus side, these services can be incredibly transparent- developers who use them directly have access to their own source code, and end users may not know they are using them at all. On the down side, especially for proprietary platforms like App Engine, these can have very deep lock-in- it is complicated, expensive, and risky to switch deployment platforms after achieving success. And they replace traditional, very open platforms- a tradeoff that isn't always appreciated.

takeaways

'The cloud' isn't going away, but hopefully we can clarify our thinking about it by talking about the different types of clouds. Hopefully this post is a useful step in that direction.

[This post is an extension of some ideas I've been playing around with on my own blog and at the autonomo.us group blog; readers curious about these issues may want to read further in those places. I also recommend reading this piece, which set me on the (very long) road to this particular post.]

The Decline of Localist Broadcasting Policies

Public policy, in the U.S. at least, has favored localism in broadcasting: programming on TV and radio stations is supposed to be aimed, at least in part, at the local community. Two recent events call this policy into question.

The first event is the debut of the Pandora application on the iPhone. Pandora is a personalized "music radio" service delivered over the Internet. You tell it which artists and songs you like, and it plays you the requested songs, plus other songs it thinks are similar. You can rate the songs it plays, thereby giving it more information about what you like. It's not a jukebox – you can't find out in advance what it's going to play, and there are limits on how often it can play songs from the same artist or album – but it's more personalized than broadcast radio. (Last.fm offers a similar service, also available now on the iPhone.)

Now you can get Pandora on your iPhone, so you can listen to Pandora on a battery-powered portable device that fits in your pocket – like a twenty-first century version of the old transistor radios, only this one plays a station designed especially for you. Why listen to music on broadcast radio when you can listen to this? Or to put it another way: why listen to music targeted at people who live near you, when you can listen to music targeted at people with tastes like yours?

The second event I'll point to is a statement from a group of Christian broadcasters, opposing a proposed FCC rule that would require radio stations to have local advisory boards that tell them how to tailor programming to the local community. [hat tip: Ars Technica] The Christian stations say, essentially, that their community is defined by a common interest rather than by geography.

Many people are like the Pandora or Christian radio listeners, in wanting to hear content aimed at their interests rather than just their location. Public policy ought to recognize this and give broadcasters more latitude to find their own communities rather than defining communities only by geography.

Now I'm not saying that there shouldn't be local programming, or that people shouldn't care what is happening in their neighborhoods. Most people care a lot about local issues and want some local programming. The local community is one of their communities of interest, but it's not the only one. Let some stations serve local communities while others serve non-local communities. As long as there is demand for local programming – as there surely will be – the market will provide it, and new technologies will help people get it.

Indeed, one of the benefits of new technologies is that they let people stay in touch with far-away localities. When we were living in Palo Alto during my sabbatical, we wanted to stay in touch with events in the town of Princeton because we were planning to move back after a year. Thanks to the Web, we could stay in touch with both Palo Alto and Princeton. The one exception was that we couldn't get New Jersey TV stations. We had satellite TV, so the nearby New York and Philadelphia stations were literally being transmitted to our Palo Alto house; but the satellite TV company said the FCC wouldn't let us have those stations because localist policy wanted us to watch San Francisco stations instead. Localist policy, perversely, pushed us away from local programming and kept us out of touch.

New technologies undermine the rationale for localist policies. It's easier to get far-away content now – indeed the whole notion that content is bound to a place is fading away. With access to more content sources, there are more possible venues for local programming, making it less likely that local programming will be unavailable because of the whims or blind spots of a few station owners. It's getting easier and cheaper to gather and distribute information, so more people have the means to produce local programming. In short, we're looking at a future with more non-local programming and more local programming.

New bill advances open data, but could be better for reuse

Senators Obama, Coburn, McCain, and Carper have introduced the Strengthening Transparency and Accountability in Federal Spending Act of 2008 (S. 3077), which would modify their 2006 transparency act. That first bill created USASpending.gov, a searchable web site of government outlays. USASpending.gov—which was based on software developed by OMB Watch and the Sunlight Foundation—allows end users to search across a variety of criteria. It has begun offering an API, an interface that lets developers query the data and display the results on their own sites. This allows a kind of reuse, but differs significantly from the approach suggested in our recent "Invisible Hand" paper. We urge that all the data be published in open formats. An API delivers search results, but that makes the search interface itself very important: having to work through an interface sometimes limits developers from making innovative, unforeseen uses of the data.

The new bill would expand the scope of information available via USASpending.gov, adding information about federal contracts, leases, and audit disputes, among other areas. But it would also elevate the API itself to a matter of statutory mandate. I'm all in favor of mandates that make data available and reusable, but the wording here is already a prime example of why technical standards are often better left to expert regulatory bodies than etched in statute:

" (E) programmatically search and access all data in a serialized machine readable format (such as XML) via a web-services application programming interface"

A technical expert body would (I hope) recognize that there is added value in allowing the data itself to be published so that all of it can be accessed at once. This is significantly different from the site's current attitude; addressing the list of top contractors by dollar volume, the site's FAQ says it "does not allow the results of these tables to be downloaded in delimited or XML format because they are not standard search results." I would argue that standardizers of search results, whoever they may be, should not be able to disallow any data from being downloaded. There doesn't necessarily need to be a downloadable table of top contractors, but it should be possible for citizens to download all the data so that they can compose such a table themselves if they so desire. The API approach, if it substitutes for making all the data available for download, takes us away from the most vibrant possible ecosystem of data reuse, since whenever government web sites design an interface (whether it's a regular web interface for end users, or a code-level interface for web developers), they import assumptions about how the data will be used.
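To make the point concrete, here is a minimal sketch of how a citizen could compose a "top contractors" table from a bulk download. The column names (contractor, agency, amount_usd) and every figure in the sample are invented for illustration; they are not USASpending.gov's actual export format, which may look quite different.

```python
import csv
import io
from collections import defaultdict

# Hypothetical bulk export. Column names and amounts are made up for
# illustration and do not reflect any real government data format.
BULK_CSV = """contractor,agency,amount_usd
Acme Corp,DOD,1200000
Globex,DOE,450000
Acme Corp,NASA,300000
Initech,DOD,800000
Globex,DOD,700000
"""


def top_contractors(csv_text, n=3):
    """Sum award amounts per contractor and return the n largest totals."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["contractor"]] += int(row["amount_usd"])
    # Largest total first; ties are broken by name so the order is stable.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    for name, total in top_contractors(BULK_CSV):
        print(f"{name}: ${total:,}")
```

Nothing here depends on a search interface: once the raw records are in hand, any grouping or ranking the site's designers did not anticipate is a few lines of code away.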

All that said, it's easy to make the data available for download, and doing so is a straightforward additional requirement that could be added to the bill. And in any case we owe a debt of gratitude to Senators Coburn, Obama, McCain and Carper for their pioneering, successful efforts in this area.

==

Update, June 12: Amended the list of cosponsors to include Sens. Carper and (notably) McCain. With both major presidential candidates as cosponsors, the bill seems to reflect a political consensus. The original bill back in 2006 had 48 cosponsors and passed unanimously.

Government Data and the Invisible Hand

David Robinson, Harlan Yu, Bill Zeller, and I have a new paper about how to use infotech to make government more transparent. We make specific suggestions, some of them counter-intuitive, about how to make this happen. The final version of our paper will appear in the Fall issue of the Yale Journal of Law and Technology. The best way to summarize it is to quote the introduction:

If the next Presidential administration really wants to embrace the potential of Internet-enabled government transparency, it should follow a counter-intuitive but ultimately compelling strategy: reduce the federal role in presenting important government information to citizens. Today, government bodies consider their own websites to be a higher priority than technical infrastructures that open up their data for others to use. We argue that this understanding is a mistake. It would be preferable for government to understand providing reusable data, rather than providing websites, as the core of its online publishing responsibility.

In the current Presidential cycle, all three candidates have indicated that they think the federal government could make better use of the Internet. Barack Obama's platform explicitly endorses "making government data available online in universally accessible formats." Hillary Clinton, meanwhile, remarked that she wants to see much more government information online. John McCain, although expressing excitement about the Internet, has allowed that he would like to delegate the issue, possibly to a vice-president.

But the situation to which these candidates are responding – the wide gap between the exciting uses of Internet technology by private parties, on the one hand, and the government's lagging technical infrastructure on the other – is not new. The federal government has shown itself consistently unable to keep pace with the fast-evolving power of the Internet.

In order for public data to benefit from the same innovation and dynamism that characterize private parties' use of the Internet, the federal government must reimagine its role as an information provider. Rather than struggling, as it currently does, to design sites that meet each end-user need, it should focus on creating a simple, reliable and publicly accessible infrastructure that "exposes" the underlying data. Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens and can constantly create and reshape the tools individuals use to find and leverage public data. The best way to ensure that the government allows private parties to compete on equal terms in the provision of government data is to require that federal websites themselves use the same open systems for accessing the underlying data as they make available to the public at large.

Our approach follows the engineering principle of separating data from interaction, which is commonly used in constructing websites. Government must provide data, but we argue that websites that provide interactive access for the public can best be built by private parties. This approach is especially important given recent advances in interaction, which go far beyond merely offering data for viewing, to offer services such as advanced search, automated content analysis, cross-indexing with other data sources, and data visualization tools. These tools are promising but it is far from obvious how best to combine them to maximize the public value of government data. Given this uncertainty, the best policy is not to hope government will choose the one best way, but to rely on private parties with their vibrant marketplace of engineering ideas to discover what works.

To read more, see our preprint on SSRN.

Privacy: Beating the Commitment Problem

I wrote yesterday about a market failure relating to privacy, in which a startup company can't convincingly commit to honoring its customers' privacy later, after the company is successful. If companies can't commit to honoring privacy, then customers won't be willing to pay for privacy promises – and the market will undersupply privacy.

Today I want to consider how to attack this problem. What can be done to enable stronger privacy commitments?

I was skeptical of legal commitments because, even though a company might make a contractual promise to honor some privacy rules, customers won't have the time or training to verify that the promise is enforceable and free of loopholes.

One way to attack this problem is to use standardized contracts. A trusted public organization might design a privacy contract that companies could sign. Then if a customer knew that a company had signed the standard contract, and if the customer trusted the organization that wrote the contract, the customer could be confident that the contract was strong.

But even if the contract is legally bulletproof, the company might still violate it. This risk is especially acute with a cash-strapped startup, and even more so if the startup might be located offshore. Many startups will have shallow pockets and little presence in the user's locality, so they won't be deterred much by potential breach-of-contract lawsuits. If the startup succeeds, it will eventually have enough at stake that it will have to keep the promises that its early self made. But if it fails or is on the ropes, it will be strongly tempted to try cheating.

How can we keep a startup from cheating? One approach is to raise the stakes by asking the startup to escrow money against the possibility of a violation – this requirement could be built into the contract.

Another approach is to have the actual data held by a third party with deeper pockets – the startup would provide the code that implements its service, but the code would run on equipment managed by the third party. Outsourcing of technical infrastructure is increasingly common already, so the only difference from existing practice would be to build a stronger wall between the data stored on the server and the company providing the code that implements the service.

From a technical standpoint, this wall might be very difficult to build, depending on what exactly the service is supposed to do. For some services the wall might turn out to be impossible to build – there are some gnarly technical issues here.

There's no easy way out of the privacy commitment problem. But we can probably do more to attack it than we do today. Many people seem to have given up on privacy online, which is a real shame.

Privacy and the Commitment Problem

One of the challenges in understanding privacy is how to square what people say about privacy with what they actually do. People say they care deeply about privacy and resent unexpected commercial use of information about them; but they happily give that same information to companies likely to use and sell it. If people value their privacy so highly, why do they sell it for next to nothing?

To put it another way, people say they want more privacy than the market is producing. Why is this? One explanation is that actions speak louder than words, people don't really want privacy very much (despite what they say), and the market is producing an efficient level of privacy. But there's another possibility: perhaps a market failure is causing underproduction of privacy.

Why might this be? A recent Slate essay by Reihan Salam gives a clue. Salam talks about the quandary faced by companies like the financial-management site Wesabe. A new company building up its business wants to reassure customers that their information will be treated with the utmost care. But later, when the company is big, it will want to monetize the same customer information. Salam argues that these forces are in tension and few if any companies will be able to stick with their early promises to not be evil.

What customers want, of course, is not good intentions but a solid commitment from a company that it will stay privacy-friendly as it grows. The problem is that there's no good way for a company to make such a commitment. In principle, a company could make an ironclad legal commitment, written into a contract with customers. But in practice customers will have a hard time deciphering such a contract and figuring out how much it actually protects them. Is the contract enforceable? Are there loopholes? The average customer won't have a clue. He'll do what he usually does with a long website contract: glance briefly at it, then shrug and click "Accept".

An alternative to contracts is signaling. A company will say, repeatedly, that its intentions are pure. It will appoint the right people to its advisory board and send its executives to say the right things at the right conferences. It will take conspicuous, almost extravagant steps to be privacy-friendly. This is all fine as far as it goes, but these signals are a poor substitute for a real commitment. They aren't too difficult to fake. And even if the signals are backed by the best of intentions, everything could change in an instant if the company is acquired – a new management team might not share the original team's commitment to privacy. Indeed, if management's passion for privacy is holding down revenue, such an acquisition will be especially likely.

There's an obvious market failure here. If we postulate that at least some customers want to use web services that come with strong privacy commitments (and are willing to pay the appropriate premium for them), it's hard to see how the market can provide what they want. Companies can signal a commitment to privacy, but those signals will be unreliable so customers won't be willing to pay much for them – which will leave the companies with little incentive to actually protect privacy. The market will underproduce privacy.

How big a problem is this? It depends on how many customers would be willing to pay a premium for privacy – a premium big enough to replace the revenue from monetizing customer information. How many customers would be willing to pay this much? I don't know. But I do know that people might care a lot about privacy, even if they're not paying for privacy today.

Scoble/Facebook Incident: It's Not About Data Ownership

Last week Facebook canceled, and then reinstated, Robert Scoble's account because he was using an automated script to export information about his Facebook friends to another service. The incident triggered a vigorous debate about who was in the right. Should Scoble be allowed to export this data from Facebook in the way he did? Should Facebook be allowed to control how the data is presented and used? What about the interests of Scoble's friends?

An interesting meme kept popping up in this debate: the idea that somebody owns the data. Kara Swisher says the data belong to Scoble:

Thus, [Facebook] has zero interest in allowing people to escape easily if they want to, even though THE INFORMATION ON FACEBOOK IS THEIRS AND NOT FACEBOOK'S.

Sorry for the caps, but I wanted to be as clear as I could: All that information on Facebook is Robert Scoble's. So, he should–even if he agreed to give away his rights to move it to use the service in the first place (he had no other choice if he wanted to join)–be allowed to move it wherever he wants.

Nick Carr disagrees, saying the data belong to Scoble's friends:

Now, if you happen to be one of those "friends," would you think of your name, email address, and birthday as being "Scoble's data" or as being "my data." If you're smart, you'll think of it as being "my data," and you'll be very nervous about the ability of someone to easily suck it out of Facebook's database and move it into another database without your knowledge or permission. After all, if someone has your name, email address, and birthday, they pretty much have your identity - not just your online identity, but your real-world identity.

Scott Karp asks whether "Facebook actually own your data because you agreed to that ownership in the Terms of Service." And Louis Gray titles his post "The Data Ownership Wars Are Heating Up".

Where did we get this idea that facts about the world must be owned by somebody? Stop and consider that question for a minute, and you'll see that ownership is a lousy way to think about this issue. In fact, much of the confusion we see stems from the unexamined assumption that the facts in question are owned.

It's worth noting, too, that even today's expansive intellectual property regimes don't apply to the data at issue here. Facts aren't copyrightable; there's no trade secret here; and this information is outside the subject matter of patents and trademarks.

Once we give up the idea that the fact of Robert Scoble's friendship with (say) Lee Aase, or the fact that that friendship has been memorialized on Facebook, has to be somebody's exclusive property, we can see things more clearly. Scoble and Aase both have an interest in the facts of their Facebook-friendship and their real friendship (if any). Facebook has an interest in how its computer systems are used, but Scoble and Aase also have an interest in being able to access Facebook's systems. Even you and I have an interest here, though probably not so strong as the others, in knowing whether Scoble and Aase are Facebook-friends.

How can all of these interests best be balanced in principle? What rights do Scoble, Aase, and Facebook have under existing law? What should public policy say about data access? All of these are difficult questions whose answers we should debate. Declaring these facts to be property doesn't resolve the debate – all it does is rule out solutions that might turn out to be the best.
