
Yesterday, the New York Times blog announced the initial release of the New York Times Article Search API [nytimes.com]. This makes it possible to explore the first occurrence of the term "data visualization", or to identify articles that appeared on the front page and mentioned "blog". The New York Times now has several APIs available, including (searchable) articles, best sellers (books), campaign finance, community (comments), congress (vote data), movie reviews and dictionaries (will there be an infographic / "interactive media" category, or is this already part of "articles"?).

The Article Search API is meant to make it easy to find, discover, explore, have fun with and build new "things" based on the rich content available from the New York Times. The database contains over 2.8 million articles from 1981 until today (updated hourly). Each article comprises about 35 searchable metadata fields, from title and byline to thumbnail image and geographic region. A few applications already exist, including some I did not know before, such as Who's on First. Surely more will follow?
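To give a feel for how searches like the ones above could be expressed, here is a minimal Python sketch that only builds a request URL for the Article Search API. It assumes the endpoint path and parameter names (`query`, `fields`, `offset`, `api-key`) as described in the launch documentation, and uses a placeholder API key — check the NYT developer documentation for the authoritative details.

```python
from urllib.parse import urlencode

# Base endpoint as documented at launch (assumption; verify against the
# current NYT developer documentation before using).
BASE = "http://api.nytimes.com/svc/search/v1/article"

def build_search_url(query, api_key, fields=None, offset=0):
    """Construct an Article Search request URL. No network call is made;
    this only shows how a query over the metadata fields is encoded."""
    params = {"query": query, "api-key": api_key, "offset": offset}
    if fields:
        # Restrict the response to a subset of the ~35 metadata fields.
        params["fields"] = ",".join(fields)
    return BASE + "?" + urlencode(params)

url = build_search_url("data visualization", "YOUR-API-KEY",
                       fields=["title", "byline", "date"])
print(url)
```

Fetching the resulting URL (with a real key) would return JSON results that can be paged through with the `offset` parameter.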

See also the New York Times Visualization Lab.

More information at the forthcoming Times Open event. Any attendee wants to guest blog the experience?


This is cool, but they seem to want to have their cake and eat it too.

Their documentation repeatedly states that this is for non-commercial use, and their terms of use say something similar, but more ambiguously (suggesting commercial use might be OK, but only if it doesn't compete with them).

They want other people to work for free (or on grant money, I guess) to help them figure out how to keep extracting value from their archive. I'm sure they'll get some takers, but I think they'd end up getting more value in total if they let developers monetize things.

Mon 09 Feb 2009 at 8:31 AM

Actually, in my

Wed 11 Feb 2009 at 8:52 AM

"Who's on first" link is broken. I think you mean this guy:
That project's been there for years now, definitely predates the API. It's also more than a visualization; it dumps the metadata for each day's news into an RDF file, which makes it incredibly easy to query based on relationships using e.g. SPARQL. The NYT actually provides so much good metadata that for many purposes, this RDF dump could be used instead of the API, avoiding throttling/fee issues.
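The relationship queries the commenter describes can be illustrated without any RDF tooling. The sketch below uses a toy in-memory list of (subject, predicate, object) triples with made-up subjects and predicates — it is not the actual NYT metadata schema — and intersects two triple patterns, which is the kind of join a two-pattern SPARQL `WHERE` clause expresses declaratively.

```python
# Toy triple store standing in for a day's RDF metadata dump.
# Subjects and predicates here are hypothetical, for illustration only.
triples = [
    ("article:1", "mentions", "topic:blog"),
    ("article:1", "onFrontPage", "true"),
    ("article:2", "mentions", "topic:blog"),
    ("article:2", "onFrontPage", "false"),
]

def match(triples, pred, obj):
    """Return the set of subjects matching one (predicate, object)
    pattern -- the building block of a graph-pattern query."""
    return {s for s, p, o in triples if p == pred and o == obj}

# Front-page articles that mention "blog": intersect two patterns,
# analogous to two joined triple patterns in SPARQL.
front_page = match(triples, "onFrontPage", "true")
mentions_blog = match(triples, "mentions", "topic:blog")
print(front_page & mentions_blog)  # → {'article:1'}
```

Against the real dump one would load the RDF file into a proper store and run the equivalent SPARQL query, but the joining logic is the same.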

Fri 20 Feb 2009 at 3:34 AM

Thank you, Sam, for pointing this out. I have edited the post.

And indeed, that project predates the API, but seemed relevant for this story anyway.

Fri 20 Feb 2009 at 11:38 AM

TY for the great post! I would not have gotten this myself!

Mon 15 Nov 2010 at 7:24 PM