[infosthetics@strataconf 2011 by guest blogger Collin Sullivan]
Day 2 at the 2011 Strata Conference focused on the authorship side of storytelling with data, but the speakers I saw also discussed the other side of the story: the audience.

Dustin Kirk ("Designing for Infinity"), Kim Rees ("Small is the New Big: Lessons in Visual Economy") and Jock Mackinlay ("Telling Great Data Stories Online") all emphasized what should be obvious: after all of the data collection, number crunching and graphic design, what matters most is what readers and users take away from the visual presentation on the page. The theme was alive and well in the "Visualizing Shared, Distributed Data" panel, too, with Roman Stanek of GoodData, Pete Warden of OpenHeatMap and Alon Halevy of Google.

Jock Mackinlay put the question best: "How do we empower people and groups to share data?" Although we are now inundated with more data and information than we could reasonably process, Mackinlay nonetheless operates from the standpoint that "data is best served raw." Presenting only a predefined subset of the data introduces bias. This used to be more of a problem when storytelling was strictly a one-way street, a monologue: information was printed on a page, and that was all that was available to the reader.

That is no longer the case. Data visualization is becoming increasingly interactive, and every speaker stressed that accepting user input is the most effective way to give people access to as much data as is available without overwhelming them. The monologue has become a dialogue, and data is becoming, in a sense, democratized. In her presentation, Kim Rees cited the Stanford Dissertation Browser as a great example of a graphical interface that presents lots of data but lets the user view only the relationships she is interested in.

Dustin Kirk's presentation focused heavily on letting users define what they want to access. He gave examples of websites like Yelp.com and Amazon Diamond Search that use different forms of filtering, allowing the user to pare down the visible results by setting new parameters and data attributes. Rees showed us Hipmunk.com, a travel website that offers exactly that kind of filtering in a visually appealing, well-structured design.


While filters fit Mackinlay's approach of including all the data, they still show only what is necessary, which is in keeping with Rees' emphasis on visual economy. All of the data ought to be available to us, but we need not see it all at once.
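To make the "all the data, but not all at once" idea concrete, here is a minimal sketch of attribute filtering in Python. It is not how Yelp, Amazon or Hipmunk actually implement their filters; the `Flight` fields, the sample rows and the `filter_view` and `facet_counts` helpers are all hypothetical, chosen only for illustration. The point is that the full dataset is never altered: each filter simply computes a new view over it, and facet counts tell the user how much data sits behind each choice.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Flight:
    # Hypothetical record type; real travel sites expose many more attributes.
    airline: str
    stops: int
    price: float


# Hypothetical sample data, made up purely for illustration.
FLIGHTS = [
    Flight("A", 0, 320.0),
    Flight("B", 1, 210.0),
    Flight("C", 2, 150.0),
    Flight("A", 1, 280.0),
]


def filter_view(rows, **predicates: Callable[[Any], bool]):
    """Return the rows for which every attribute predicate holds.

    The input list is left untouched; with no predicates, every row is returned.
    """
    return [
        row for row in rows
        if all(pred(getattr(row, attr)) for attr, pred in predicates.items())
    ]


def facet_counts(rows, attr):
    """Count how many rows fall under each value of one attribute."""
    return Counter(getattr(row, attr) for row in rows)


if __name__ == "__main__":
    # The user narrows the view: at most one stop, under 300.
    view = filter_view(FLIGHTS, stops=lambda s: s <= 1, price=lambda p: p < 300)
    print(len(view), "of", len(FLIGHTS), "flights shown")  # prints "2 of 4 flights shown"
    print(facet_counts(FLIGHTS, "stops"))
```

Keeping the filters as separate predicates over an unchanged list is one simple way to let a user add or drop criteria freely without ever losing access to the underlying data.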

In fact, there is so much emphasis on interaction that one of the panelists is working to put himself out of a job. Pete Warden of OpenHeatMap, on the Shared Data panel, explained how crowdsourcing spreads the labor burden from one person to many, and that, combined with automation, it makes large projects much more manageable. Ideally, Warden said, he would no longer interact with the OpenHeatMap software at all. Instead, people would interact with the program and with each other, uploading data, merging sets and producing maps and graphs on their own, rendering his role obsolete. He described his current role as less that of a creator or artist and more that of a conductor, with the crowd as his orchestra. Eventually, he hopes, they will conduct themselves.

On that same panel, Alon Halevy explained what Google Fusion Tables is doing to contribute to the democratization of data. People can upload datasets and spreadsheets and share them or collaborate on them with others, and the data sets are free and publicly accessible. People can comment not only on a document as a whole but on a specific row or cell. And much like Tableau and OpenHeatMap, it makes interactive visualization of large data sets easy for people who are not graphic designers or programmers. For example, I did a quick search for "World Bank" in the Fusion Tables database, found a table ranking the world's countries by GDP, and within about a minute I was able to create a visualization from it.

Other people could find that dataset, merge it with other spreadsheets and gain insight into something no one had noticed before. These applications are allowing more people both to author new visualizations and to find and improve existing ones. The playing field is flattening as data and conclusions become more widely accessible. The technology is democratizing the conversation.

This post was written by Collin Sullivan. He is a research analyst for The Sentinel Project for Genocide Prevention, where data collection, analysis and visualization are being used to design an Early Warning System (EWS) to detect and prevent genocide. Collin lives in San Francisco. You can reach him at collin [at] thesentinelproject [dot] org and follow him on Twitter at @inciteinsight.


Great post! However, I think you should mention the talk by Simon Rogers of the Guardian as well, since his keynote and his other session were both about telling stories with data. In fact, the Guardian also publishes data to leverage the crowd in finding stories in it.

Tue 08 Feb 2011 at 9:35 PM

I do agree: the availability of data, and of computational tools to work with data, are amazing developments for the empowerment of citizens. However, systematic surveys of both adults in general and particular professions (e.g. medical doctors, politicians) show a persistent lack of understanding of statistical information and poor decision making in the contexts of health, environment and policy-making (e.g. both patients and doctors regularly make bad decisions about cancer screening tests). I am curious about what the revolution in data access and data tools might do to improve this deeply rooted problem; it is not obvious to me that greater insight and understanding are going to be the outcome. We may end up with a lot more participation through computation that is busy-ness on the surface without insight. There is an old phrase: 'the purpose of computing is insight, not numbers'; visualisations convert numbers into graphical forms, but insight is not inherent to those forms.

Tue 08 Feb 2011 at 11:01 PM
