
computational_journalism.jpg
welcome to the longest post on infosthetics: the best written, the most informative & the one with the most images. this is Brad Stenger's guest blog report on the Computational Journalism Symposium.


"The Journalism 3G Symposium on Computation + Journalism took place February 22-23 at Georgia Tech in Atlanta. Going in, we the organizers called it the most technically substantial conversation ever to take place between a group of journalists and computing professionals this large (200+ attendees). Not knowing exactly what would happen, we crammed the panels full of programmers and journalists, as well as designers, entrepreneurs, managers, and all sorts of other experts and thinkers. By all accounts, attendees felt the time they gave the meeting was well spent.

One thread running through the program, explicitly and implicitly, was Information Visualization. That was intentional. The data literacy, communication skill, and intellectual rigor required to practice coherent, understandable info viz make it a template for how computation can impact journalism, even though, as currently practiced, that impact occurs mostly at the margins of the news.

The Opening Keynotes from Krishna Bharat, creator of Google News, and Michael Skoler, creator of American Public Media's Public Insight Network, introduced the two communities to each other. The implicit Info Viz program within the program started with the first panel discussion, one called Ubiquitous Journalism. The very first speaker, Mark Hansen, ranks among the most data literate people on Earth. A former Bell Labs statistician, he holds joint appointments in the Statistics, Electrical Engineering, and Design departments at UCLA."


"Hansen began his 15-minute presentation with an overview of the "orgy of computation" he and Ben Rubin created: an art piece titled Moveable Type, which reflects the enormity of the New York Times journalism operation and history using sound, words, numbers, and 640 small Linux boxes, all while dominating the lobby space of the just-completed Times building near the west end of 42nd Street. He followed by describing his UCLA class in Database Aesthetics and a related graduate student seminar called Site Specifics, which dig into the richness and variety of data types being collected and analyzed in the world at large. The classes considered why data is created, as well as the motivations and incentives for measuring, analyzing, and archiving it, whether through computational, graphical, or informal methods. Subjects included the One Wilshire data hotel; the UCLA Medical Center's new patient data record; CalTrans' 1,300 automobile traffic monitoring stations and control room coordination; the National Park Service outposts in the Santa Monica mountains outside LA, where asset (a/k/a wildlife) counts (and associated non-counts) attempted to gauge the stressed ecosystem; and, lastly, how data factors into Frank Gehry's architecture.

hansen_tukey.jpg

But all this was prelude to a pivotal question Hansen put to the assembled computationalists and journalists: "How many of you know who John Tukey is?" When fewer than a half dozen hands hit the air, Hansen noted, "Excellent." He had fresh ears for the story of how Tukey defined the field of Exploratory Data Analysis and showed that exploring data was a valuable exercise. To Hansen, the ability to tell stories with data comes from understanding the character of data, which in turn comes from experiencing the communities and cultures of the people at the heart of the data's source.

The Tukey bit set up Hansen's explanation of his 10-year NSF-funded Center for Embedded Network Sensing (CENS) project, where dozens of automated and robotic sensors have been applied to understanding changing environmental conditions in a patch of San Jacinto Mountains wilderness. Now at the four-year mark, the program has reached the point where collection systems (mostly) work and data is there to collect. The CENS program is ambitious and anticipates growing into something capable of assessing the relative environmental health of even urban habitats, with the help of citizen-collected and/or citizen-analyzed data. "I think that citizen involvement will take on the character of storytelling," said Hansen. "In fact we often joke at CENS that after our 10-year funding has gone that perhaps we can perhaps transition the whole Center into a School of Journalism. And that was before this meeting."

The citizen science aspects of CENS led Hansen to create Sensorbase, a data store and social network for this work, and to have his students develop things like personal environmental impact assessments as Facebook apps. Those Facebook apps are built on credible scientific models, and the additional data they gather creates a nice feedback loop that further improves the accuracy of the models.

hansen_facebook.jpg

During the Q&A, the moderator asked Hansen about the practical impacts of ubiquitous cell phones and their potential as easy-access sensors. It gave Hansen a chance to mention work he's currently doing with his students on frameworks for case-making "campaigns," in which citizens use their phones to document the broken state of their neighborhood's infrastructure or its noise levels. Coupled with easily accessible analysis tools like Swivel and Many Eyes, the campaign framework at least affords citizens (including journalists) the opportunity to bring authoritative, data-dependent material into the public discourse.

At the reception after Friday's session, I talked with John Stasko, Chair of the most recent Info Vis conference in Sacramento, who as a Georgia Tech professor was an early supporter of Journalism + Computation. I brought up the Tukey crowd count and asked if he thought such limited data literacy would make things difficult for him and his fellow presenters the next day, when they would discuss Sensemaking and Information Visualization. He told me about a reporter who had written a story on data mining supposedly undertaken by the Department of Homeland Security (DHS). Fortunately, the public greeted the story with a "Well, duh?"-type non-response instead of any sort of public outcry. I mention it because of the chilling effect the story has since had on DHS and government acknowledgement of analysis activities. Despite the forward steps toward greater data literacy, public participation, and transparency prompted by Mark Hansen and his students, backsliding can and does occur.

The Stasko panel also featured Jeff Heer, currently finishing his Ph.D. at UC-Berkeley and being recruited by every top Computer Science department, and Xaquin Gonzalez Veira, an art director for Newsweek.com who handles its interactive data graphics. We the organizers had asked Dr. Stasko to make some mention of Sensemaking, a subject that is gaining traction where Information Visualization meets Human-Computer Interaction, and that provides useful shared context for understanding how people assess data and situations to make decisions. Encouragingly, the term came up several more times later in the day, as we had hoped, when both journalists and computationalists described either the processes or the goals of their work.

Stasko's larger message was about Visualization and how it can help people to think. After a quick example of how graphs speed understanding of data otherwise presented in tables, he connected it to similar work undertaken in journalism. A scan of a faded newspaper info graph that showed noisy flight paths into and out of Atlanta's immense Hartsfield-Jackson airport easily explained why so many pilots chose to live in Peachtree City.

stasko_airport.jpg

The audience got a look at (but not a demo of) the Jigsaw project, which Stasko has developed to identify the people, places, and noteworthy things that live in text and to connect them throughout vast quantities of text and across multiple display screens.

stasko_jigsaw.jpg

Cliff Lampe, a Michigan State Communications professor, asked Stasko about the relationship between visualization and statistics on one hand, and anecdote and narrative on the other, when it comes to persuasion and the impact of data-dependent communication. As Stasko had said moments before, the research hasn't yet been done to separate the contribution of the viz from that of the storytelling when the two combine to generate persuasive power. But the question was conveniently taken up by the next speaker, Jeff Heer, who started his talk by mentioning two prime examples of data viz persuaders, Hans Rosling and Al Gore. Heer attributes their impact to the simple "horserace" dynamics of points in motion, which makes sense given how much attention live sports broadcasts receive, where announcers punctuate the ongoing progress of a data point (usually some sort of ball).

Heer was almost Rosling-like, holding the crowd in the palm of his hand as he told data stories about his work on Vizster with social network data. He made the point, highly relevant to journalists, that merely by building the proper tool for the proper data set, people would reveal themselves socially in stories that depicted group histories. The testbed he developed with Fernanda Viegas and Martin Wattenberg, sense.us, supports collaborative visualization of 150 years of US Census data. Heer pointed to a handful of examples worth knowing, especially with employment data, where the data could motivate users to ask questions, collect their impressions, and really engage. Heer, Viegas, and Wattenberg found that newcomers (they were called 'voyagers') were the primary source of active engagement with the data and of serendipitous discovery that could generate new insights. More experienced users (they were called 'voyeurs') were more likely to trawl through comments and investigate others' insights, which could launch them back into voyager mode.

heer_senseus.jpg

Like many of the speakers, Heer presented a list of opportunities to connect with the other camp, and it bears mentioning. As a toolbuilder and researcher himself, these items seemed to rank high on his list of priorities. Beyond the expected opportunities in data management, visualization, and collaboration, Heer pointed to opportunities to examine questions of persuasion related to storytelling and structure that could increase impact, reduce opinion biases, improve discourse, and maybe better people's ability to express themselves and their ideas. One last Heer proposal was especially interesting: social visualization could conceivably increase the pool of good questions out in the world, giving journalists the opportunity to become more answerer than reporter, probably an inevitable dynamic that stands to strain the norms of traditional journalism practice.

Last to speak on the panel was Xaquin GV from Newsweek.com, who started his presentation with a couple of nice bits of info viz that showed the editorial priorities of his employer and his current situation. Out of the 34-person editorial staff, nine would be considered either artists, creatives, or multimedia, that is, non-writers and non-editors. Three of the nine would roughly fit the description of a non-traditional art department. Of these three, one, namely Xaquin, does interactive information graphics for the website. His next info graphic contrasted his situation with the New York Times graphics department, staffed with "cartographers, visual thinkers, and data analysis experts. They have 3-D experts, visual thinkers. They have designers," said Xaquin, whereas Newsweek.com has only him, by himself. It also contrasted with the staff of seven he had in his prior job as graphics director at El Mundo, a daily newspaper based in Spain.

xaquin_nytimes.jpg

Xaquin showed his favorite works from the year he's been at Newsweek.com. A piece on Black Tides clearly showed the preponderance of oil spills off the coast of his native Northern Spain, and their contrast in frequency and severity with spills off both US coasts. A second plotted economic indicators over the Alan Greenspan era, overlaid with the timeframes and details of his major decisions. Others included a timeline of space exploration missions by nation, and state and metro maps showing a correlation between payday lenders and Conservative Christians (finished just last Thursday, Feb 21). Xaquin's favorite work was his examination of global skyscraper projects undertaken since September 11, 2001. Timelines show that twelve of the biggest buildings around the globe have been started and finished in that time, while an image of the hole at Ground Zero in lower Manhattan dominates the upper left corner of the image. He uses this to show editors what he can do with interfaces and visualization that plain text cannot, saying "It's way more effective than a 1000-word article."

xaquin_wtc.jpg

A woman asked about editorial decision-making in Xaquin's online journalism, which contrasts with Heer's open-ended tools, noting that each approach places limits on what one can learn from it. Xaquin acknowledged that his approach depended on his point of view, recalling the different levels of pollution in recent American oil spills compared to those off Europe. It could also be a direct effect of the way Newsweek has put him in a spot where he is not collaborating heavily.

Michael Skoler, one of the Opening Keynote speakers, weighed in with his experience that the burden of cleaning data had outweighed the benefits that come with good visualization tools. Researchers Stasko and Heer agreed that substantial pain usually accompanies the data munging that precedes visualization, and that both machine and human solutions are actively being worked on. Not mentioned, though I talked to Mark Hansen about it, is the payoff in insight and understanding that comes from the effort it takes to closely examine data for correctness. Just as it's wrong to expect an article to write itself, a dataset shouldn't necessarily be easy to prepare for visualization. And in much the same way that experience tells writers and editors which stories to invest their energy in, experience will, in time, serve news people who work visually with data in computational journalism.

Sanjay Sood, a lead developer for AllVoices.com, is going through that right now as he develops methods for automating the geocoding and iconizing of a wide range of stories from around the globe, so that they can be presented in an ultra-dense yet understandable fashion on a rectangular map of the Earth.

allvoices.jpg


And then sometimes the payoff is obvious, as with Everyblock, whose hyperlocal news stands in stark contrast to the hyperglobal AllVoices. Interaction designer Wilson Miner showed one of the first decidedly journalistic uses of Tufte's sparklines. They show aggregate crime statistics for Chicago neighborhoods and can be expected to appear for any other sort of local statistics worth aggregating. (Everyblock is also in beta in New York City and San Francisco.) Miner also uses the word-size infographics as an interaction mechanism: they invite mouse clicks, which in turn lead the reader to further explore the data Everyblock has collected.

everyblock.jpg

It was a hopeful sign in a situation that seems to have frustrated the first generation of journalists transitioning to what's next in the field. The info viz skillset, with its data literacy, programming know-how, design sense, and communication effectiveness, should figure prominently in the computation and journalism dynamic that has now officially begun."