Freebase is obviously not an RDBMS, but then what is it?
May 5th, 2008
Freebase is a company with a vision of an open, shared database which contains the world’s knowledge; kinda like Wikipedia, but with structured data instead of free-flowing text.
In my previous post I wrote about the vast bodies of data available to us and talked about the need for an online application/platform which would allow people to create interesting/relevant visualizations from this data and share them with others. I mentioned Swivel, which allows people to explore data. There are a few others, like the fascinating Gapminder, Data360, Many Eyes and a few other related projects from IBM, but in this post I’m going to talk only about Swivel.
Swivel is great but I think there are some shortcomings in their vision and approach. To me, they are perpetuating the very same problem which they set out to solve. Earlier it was a mess of data, now it’s a mess of graphs. In essence, their high-level obsession with data instead of knowledge could be their undoing and lead them away from the path to being great.
There are other issues too. They have been up for around a year and their site says that they have just 7201 users so far. Also, as this graph shows, a mere 0.7 percent of Swivel users have created almost half of their close to 5 million graphs:
So, is there something wrong with Swivel? I think there is and here is a breakdown of what I think is holding them back:
Five Million Graphs in One Place
A single graph “makes a point”. A group of graphs “tells a story”. And all graphs on a specific subject “embody the knowledge on that subject”. A site like Swivel, which makes 5 million points, is overwhelming.
Data != Graphs
Data is just not the same thing as a graph. You can’t just take any tabular data and make a graph out of it. One needs to understand the data, analyze it, extract a relevant subset and only then create one or more graphs. Swivel gives the impression that data is the same thing as graphs.
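To make that concrete, here is a minimal Python sketch of the "extract a relevant subset" step. The table, its field names and the extract_series helper are all made up for illustration; this is not how Swivel works, just one way the step between raw table and graph might look.

```python
# Hypothetical tabular data: one row per (year, region) with a count.
# The field names and numbers are invented for this example.
rows = [
    {"year": 2006, "region": "A", "incidents": 9},
    {"year": 2005, "region": "A", "incidents": 12},
    {"year": 2005, "region": "B", "incidents": 30},
    {"year": 2007, "region": "A", "incidents": 15},
    {"year": 2006, "region": "B", "incidents": 28},
]


def extract_series(table, region):
    """Keep only the rows for one region and return (year, incidents)
    pairs sorted by year, i.e. something a single line graph could plot."""
    subset = [row for row in table if row["region"] == region]
    return sorted((row["year"], row["incidents"]) for row in subset)


series_a = extract_series(rows, "A")
print(series_a)
```

The point is that someone still has to decide which region, which column and which ordering make sense before any graph exists; the table alone does not make that choice.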
Single Source
Graphs and data are more valuable as a collective body of knowledge for a specific subject. Given the detailed nature of the content that we are talking about here, each subject is distinct. Swivel is trying to be a single source for all kinds of data and graphs. Every time I go to Swivel I am overwhelmed by all the different, disconnected and disparate stats thrown at me. It’s a bit weird to see a graph of hedge funds right next to shark attacks next to Discharges with MRSA. It may indeed be possible to create such a single source entity but there seems to be something wrong with the way Swivel is going about it. An argument could also be made for this approach by comparing it to Wikipedia. There are a couple of problems with this analogy:
User Generated Content and Separation of Spaces
If I am an expert in a particular field, or even if I just love and am interested in a particular subject… would I rather create the body of graphs on a site like Swivel, which is an ocean of graphs, or would I rather do it on a site which gives me my own separate space, much like a blog, where I have complete freedom to create my own categories, evangelize it and own it? Someone must care a lot about a subject to put in the effort to collect, analyze, extract and create visualizations on that subject. Also, since each subject is distinct enough, a distinct separation of spaces is warranted.
YouTube for Data
Swivel is trying to create a “YouTube for data”. That analogy breaks down pretty quickly. Graphs and data are just not entertainment and cannot be compared as such. I can’t imagine folks at Apple saying, “This is cool, let’s let people watch graphs and data on their iPhones just like we let them watch YouTube videos”. As long as this is just offered as a simpler explanation of what Swivel is, it’s fine, but if this is how the founders view the value of Swivel, something is wrong there.
Creating and Standing behind content
Swivel’s management attempts to “stand behind their data”. That could be good or bad depending on your take. Some might say that it’s a good thing for them to stand behind it, but since the data could come from a variety of sources and from any user, it is just not possible to vouch for its validity… unless it’s from an “official source”, a concept which is indeed built into Swivel. Taking the responsibility for creating and standing behind data exposes Swivel to all kinds of valid criticism like this excellent post at flowing data. Yahoo creates the applications and the content. Google stays away from creating content and focuses on creating applications. I wonder if the latter approach is better for the problem at hand here.
Spirit of Openness
They do not have open APIs. An application which survives on public data and user generated content ought to fully embrace the spirit of openness.
Show me the Money
Where is the revenue? Is having a private graphs edition enough? Shouldn’t they be putting up ads to generate revenue so that this can be sustained? To be successful, Swivel needs to go further than being another Web 2.0 innovation and back it with a much more sound business model.
Leave Analysis Behind
Swivel also attempts to be an analysis tool by allowing you to compare different data. There are many problems with that, not the least of which is that data analysis already has many heavyweight champions with great products. To be fair, they do not claim to be an analysis tool, but it’s the highlighting of features like this which makes people compare them to full-blown analysis tools, which in turn dilutes their offering. Steering clear of analysis features or burying them as special case tools is probably best.
Data, Visualization or Both?
There are two things at play here: data and its visualization. Going from data to its visualization is the journey of going from data to the knowledge contained within it. Swivel management mostly talks about being all about data, but visualizations seem to be a big part of their offering. These two aspects are in conflict and the bruises are in plain sight on their website. If they focus purely on data, they can be hugely successful as a central repository of open data from all kinds of sources which anyone can browse, download and read/write to using APIs or the web; much like Freebase but statistical/numerical in its data content. If they want to do both, it takes them into the realm of creating knowledge from data. Now that is a different problem to solve and to be successful at it, they will need to reevaluate their approach and really think more about the problem.
A summation of this post is easy. Swivel does have a lot of the pieces to become a significant online property; but as with most companies a sharper focus and an execution to match that focus with complete clarity is what will make the difference.
Governments, educational institutes, numerous independent organizations and corporations have spent millions of dollars collecting all sorts of data. There is data on almost any topic that one might fancy. A large subset of this data is also public, meaning that anyone can access it freely. If you think about all the knowledge contained in that wondrous body of data, it is pretty fascinating.
This data is, however, presented as overwhelming arrays of tables with endless numbers, which makes it unusable for most people. For one thing, it hurts my head. For another, data in tabular form is just not interesting enough to most people unless you’re like my father-in-law, who is a professional statistician. It is left entirely up to a human who cares enough about the subject to understand these tables and extract knowledge. And once they do have that knowledge, it is for their consumption only! There is no way for them to share that knowledge with others in an easy and interesting fashion.
Wouldn’t it be cool if there was a way to easily present data as a series of contextualized visualizations and graphs which tell a story and embody consumable knowledge on specific subjects? For example, I have always been curious about and fascinated with the economic aspects of British Imperialism. The other day I stumbled upon this most wonderful resource kept by the University of Chicago. The numbers over there and the insight which they provide are truly astonishing. However, I had no way to share my newfound knowledge in a more interactive, usable and interesting fashion.
Now if only there was an online tool to help me take relevant data out of that source, make some interactive graphs/visualizations and create a body of knowledge on this subject which is easily understandable by regular folks!
The closest thing I’ve found to this is Swivel. I really like Swivel but I think there are some issues with their approach which I shall cover in my next post.