Web specialist and self-proclaimed IBMer James Mathewson (co-author of Audience, Relevance, and Search: Targeting Web Audiences with Relevant Content) takes apart Tim Berners-Lee’s argument about whether information on Facebook is free, and whether that content is free enough.
When the inventor of the web and the director of the World Wide Web Consortium (W3C) comes out against a popular practice, web mavens take notice. Such was the case when Sir Tim Berners-Lee published a Scientific American article, calling out Facebook, LinkedIn, and other social networking sites for violating the central tenets of the open web.

A key facet of Berners-Lee’s argument is that the web must be universal. According to Berners-Lee:

Sites like Facebook fly in the face of universality because they “are walling off information posted by their users from the rest of the Web.” One example of Facebook’s lack of universality is linking: I can link to my main profile page, but I can’t link to a specific post within my profile. Facebook sets itself apart from the web when it violates universality.

Defenders of Facebook and other social networking sites note that there is a good reason to wall off the personal information of their users: privacy. Because I control who sees the information I post, I am free to post things about myself that I don’t want everyone to know, especially identity thieves. Privacy allows me the freedom to tell jokes, reveal hidden truths about myself, and share photos with my friends and family. If I were not assured that the information I post is only available to friends, I would not post it. For example, prior to Facebook, I did not feel free to post pictures of my son on Flickr or a similar open file-sharing site, out of fear of predators.

The heart of Berners-Lee’s argument seems to be a utopian concept of a web in which all the information on Earth is freely available to everybody. If that is his vision of the web, I want no part of it. The recent news about WikiLeaks highlights the simple truth that not all information should be free. I’m not here to say no information on WikiLeaks should be free and open to the public. But I don’t want state secrets and other classified documents of my country getting into the hands of the wrong people, thereby putting a lot of innocent people in danger. The pervasiveness of Internet criminals makes the utopian vision of “Information Wants to be Free” dangerous and untenable.

Perhaps I am more sympathetic to the claim that not all information wants to be free because I am an IBMer. As an IBMer, I am fundamentally careful about private information about the corporation, its clients, partners, employees, and other stakeholders. I am trained annually to ensure that my public practices respect the privacy and security of IBM’s information and intellectual property. A part of this duty is to ensure that all information about IBM that I make public through the web reflects well on the IBM brand and its stewards. It is not much of a stretch for me to say I have a similar duty to myself and my family. Being an IBMer has heightened my awareness of the importance of information security and privacy.

So the concept of a walled garden of content that is only accessible to individuals I designate seems very natural to me. If this is Berners-Lee’s problem, I must confess that I don’t understand it. The information on the public web is still easily accessible through any device by any person. Accessibility to that information is not affected by islands of content that are not free and open to all individuals. I would argue that the global information map is richer with those islands. Much of the content on those islands wouldn’t be accessible to anyone if it were not private and secure; indeed, much of it would not even exist online.

Is It About Technology?

In fairness to Berners-Lee, perhaps there is a technological reason behind his steadfast position. Another component Berners-Lee considers foundational for the web is decentralization. If you publish in HTML with a URL, and serve it up on the Internet over the HTTP protocol, your page can be part of the whole web without any approval from any governing body. If you don’t use these three basic standards, or some upgraded version approved by the W3C, your data is not accessible to the whole web. Berners-Lee writes:

This is important because the value of web content is directly proportional to how interconnected it is. If there are inaccessible islands of content, it limits the value of the whole. It seems Berners-Lee is concerned that more and more of the web will be outside of these simple open standards, lessening the value of the content accessed through them.

An example he gives is iTunes. iTunes does not use the standard HTTP protocol, opting for URLs that begin with iTunes://. So, if you have a podcast on iTunes, it is somewhat cumbersome to link to and share with others. The best you can do is share the web page through which you access the podcast. This is similar to sharing your Facebook page without being able to share individual posts within it. On Facebook, bits and pieces of content don’t have URLs at all, so they can’t be shared on the public web. This lessens the value of the content on sites like Facebook.
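
To make the linking point concrete, here is a minimal Python sketch. It is not taken from Berners-Lee’s article; it simply checks whether a link uses one of the web’s standard schemes. The function name `is_web_linkable`, the choice to count only `http` and `https` as "open web" schemes, and the sample URLs are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Schemes that any browser or crawler can follow without extra software.
# Treating only these two as "open web" is an assumption made for this sketch.
WEB_SCHEMES = {"http", "https"}


def is_web_linkable(url: str) -> bool:
    """Return True if the URL uses a standard web scheme and names a host."""
    parts = urlsplit(url)
    return parts.scheme.lower() in WEB_SCHEMES and bool(parts.netloc)


# Hypothetical sample links, for illustration only.
samples = [
    "http://example.com/podcasts/episode-12",
    "itunes://example.com/podcasts/episode-12",
]
for link in samples:
    print(link, "->", is_web_linkable(link))
```

A link that fails a check like this can still work for people who have the right software installed, but it cannot be followed by an ordinary browser or crawler, which is the kind of friction described above.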

But how much does it devalue the whole web to have sites like Facebook and iTunes on it? Apple uses a non-standard protocol for a good reason: It needs to protect the intellectual property of artists who sell their media through iTunes. Facebook’s reason is equally good: security and privacy of its users’ data. Much of the content on iTunes and Facebook would never have been posted there if it had to be freely and openly accessible from the web. Its existence doesn’t lessen the value of other free and open content on the web.

Is It About Open Standards?

The third criterion that Berners-Lee cares about is open standards. He has a vision of the web in which all web sites only use open standard technology. “The basic Web technologies that individuals and companies need to develop powerful services must be available for free, with no royalties.” Facebook’s Open Graph violates this because it is a proprietary technology of Facebook. He writes:

If Facebook becomes the de facto standard for social networking, a large component of web activity would violate the criterion of decentralization. In essence, when you post on Facebook, you turn over control of that information to a governing body, which manages the organizing technology for the data. Facebook will say that its Open Graph technology is a standard that enables the content within its pages to be shared and used by external applications on the greater web. But this standard is not controlled by an independent body such as the W3C. It is controlled by Facebook and subject to the whims of Facebook’s governance.

An example is Adobe Flash, which is a pervasive and open standard on the web, but owned and licensed by Adobe. As long as Flash has a viable open alternative in HTML5, Adobe will keep it open, and perhaps even create an open-source version. Again, it is in Adobe’s best interests to keep the standard open if it wants Flash to be a long-term solution on the web.

The real worry with Open Graph, then, is that there is no viable open-source alternative to keep the folks at Facebook honest. How big a worry is this? The standard is less than a year old. There is a council of venerable web stewards that governs it. Unlike Flash, though, Open Graph is limited to finding, using, and sharing content from within the Facebook domain on other sites.

Flash is not entirely open in Berners-Lee’s sense. The standard is not governed by the W3C or a similar body, so it is not decentralized. And it technically violates the rule of universality because you need browser plug-ins to run Flash. Flash is especially noncompliant in one respect: Flash modules are inaccessible to people who struggle to see well (unless you code your alt attributes appropriately).

One litmus test for the kind of openness that Berners-Lee promotes is whether crawlers from Google and other search engines can find the data. This was mentioned by Richard MacManus in a ReadWriteWeb article, which said:

Again, we find the same problems with Flash, which is not crawlable in this sense. I spend a lot of my time at IBM teaching teams to avoid using this technology if at all possible because it limits the availability of content for Google, which seriously limits its value. Here is the most poignant case of how the value of content is directly proportional to how interconnected it is. And it can’t be interconnected if it is not published in an open standard.
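
A rough way to see the crawlability problem is to look at a page the way a very simple crawler might. The Python sketch below is a deliberate simplification, since real search engine crawlers are far more sophisticated; it only collects the text and links present in the HTML itself. The class `CrawlerView`, the helper `crawl`, and both sample pages are hypothetical names and data invented for this illustration.

```python
from html.parser import HTMLParser


class CrawlerView(HTMLParser):
    """Collect the text and link targets a very simple crawler could index."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every ordinary hyperlink.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Record any non-empty text found in the markup.
        if data.strip():
            self.text.append(data.strip())


def crawl(html: str) -> CrawlerView:
    """Parse one page and return what was found in its markup."""
    view = CrawlerView()
    view.feed(html)
    view.close()
    return view


# Hypothetical pages: the same announcement as plain HTML and as an embedded Flash movie.
html_page = '<p>New product launch</p><a href="/details.html">Details</a>'
flash_page = '<object data="launch.swf" type="application/x-shockwave-flash"></object>'

print(crawl(html_page).text, crawl(html_page).links)
print(crawl(flash_page).text, crawl(flash_page).links)
```

In this sketch the HTML page yields both text and a link to follow, while the Flash page yields neither, because its content lives inside a file the parser never opens.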

Open Graph could become a technology that sites adopt to protect and secure private data for their users, similar to what Facebook does. If it ever did become this kind of a standard, it would seem to be on equal footing with Flash. And it would likely spawn an open-source alternative that could be used by any site. Perhaps the W3C can begin working on an open-source technology that acts like Open Graph for components of web sites not within the Facebook domain. In that case, anybody could develop a social networking site with the data security and privacy of Facebook using GnuSocial or Diaspora, both open-source social networking site generators. That possibility would seem to mitigate many of Berners-Lee’s concerns.


If all Berners-Lee is saying is that the value of Facebook content is limited by its closed nature, I wholeheartedly agree. If I am free to publish content without fear of pirates, predators, and other Internet criminals, I would vastly prefer to publish it on a blog, wiki, or some other open site. But if I need to restrict access to the data to only those I trust, Facebook is a good alternative. I’m willing to sacrifice some content value for the sake of privacy and security.

But it sure sounds like he’s saying a whole lot more. I think he’s saying that Facebook presents a special problem for his vision of the web. Here I disagree. Much of the data in Facebook would not be available at all if it weren’t protected by Facebook’s privacy controls, which prevent open access. Furthermore, Open Graph bears remarkable similarity in its lack of openness to a prevalent technology on the web: Flash. If Flash did not limit the web’s function as the greatest content repository in history, Open Graph is not likely to, either. In any event, the W3C could mitigate the risks to his vision by sponsoring a portable open-source Open Graph alternative.

James Mathewson is global search strategy lead for IBM and co-author of Audience, Relevance, and Search: Targeting Web Audiences with Relevant Content. The opinions expressed are his own and not those of the IBM Corporation.

