November 10, 2008

NPR's Open Content Strategy

It has been several weeks since my last post on the goals and challenges of launching NPR's API. I still intend to fill out the story in the coming weeks/months.

I will start up again by talking about my recent presentation at Mashery's API Conference last week. The conference itself was primarily focused on the business of APIs. In my presentation, I mainly discussed NPR's goals for opening up an API along with some of the challenges we faced leading up to the launch.

As NPR reviewed the landscape of content syndication, we found that there were quite a few APIs already in the marketplace. Most of them, however, belonged to content aggregators (e.g., Google, Yahoo!), user-generated content sites (e.g., Flickr, Wikipedia), and some e-commerce sites (e.g., eBay, Amazon). There were surprisingly few comprehensive APIs from major media organizations. Some organizations, like DayLife, CBS and BBC, offered APIs, but these were limited in a variety of ways.

Mostly, these major media organizations were syndicating their content through RSS or extended RSS, such as Podcasts or MediaRSS. This approach has been surprisingly effective - what I call "Really Successful Syndication". It is successful because RSS is simple, widely adopted in the marketplace, and succeeds in driving traffic back to the site. The major problems with RSS are the same things that make it really successful. That is, in the current marketplace, RSS now stands for "Really Stingy Syndication" because it does not contain very much real content. Instead, it provides enough content to drive traffic back to the source, embracing the "lock-down" model of content.

The marketplace is changing dramatically, though, and people have destinations to which they are attached. They go to Facebook, MySpace, etc. and expect to find content there. Content providers will have to put their content on these sites through widgets and other means of distribution. If the users of Facebook, for example, find the content they want on Facebook, then they are less likely to leave Facebook to get more content (unless the user has a keen interest in a specific content provider). As a result, the richer the content is on Facebook, the more likely the user is to identify your brand as a trusted news source. So, RSS is OK only if no other providers offer richer content. But it is only a matter of time before the richer content is there...

Because of these changes in the marketplace, NPR decided to release a comprehensive API covering all of the content that we have rights to redistribute. If our content is truly open, it will enable users to mash it up, keep it relevant to them, and share it with new audiences wherever those audiences are. Although NPR.org is still critical to our strategy, we can no longer rely exclusively on the site as a way to reach people.

There were two other major factors in our decision. First, it is critically important for NPR to provide content and services to our Member stations. The API will enable stations to get NPR content on their sites. We also plan to offer local station content through the API, which will provide a local/national view of content to the users. The second major influence in our decision was NPR's Mission to "create a more informed public". By offering both local and national content in our API, enabling users to mash it up and use it in ways that we have not thought of or don't have the resources to execute, we hope to reach and inform new audiences.

Once we decided to release an API, there were several questions that we needed to answer. First and foremost, we needed to establish what our target audiences for the API would be. They are as follows:

  • End-users and other web developers (These users can post content to blogs as well as create innovative ways of using NPR content)
  • NPR's Digital Media team (NPR Product and Project Managers can improve their products using the API without a lot of effort from NPR Developers)
  • NPR Member Stations
  • Content aggregators and NPR's business partners

Serving each of these audiences through the API enables us to seamlessly integrate with them in such a way that it requires very little involvement from NPR's development staff.

In the slides (attached below) from the conference, I have provided some examples of how these audiences are using the API.


We will be discussing more of our challenges in later posts.
-- Daniel Jacobson


 
November 3, 2008

NPR Roadshow

While we have been pretty busy building tools for our Election Night reporting, we continue working on the API. The feedback so far has been fantastic. Along with encouragement and congratulations, we have received lots of great suggestions. We have been very excited by the adoption of this technology and the general embracing of this "Brand and Release" strategy. We hope to have some significant and exciting new features in place by early next year.

But what if you want to hear more...?

Well, if you missed our presentation at OSCON 08, there will be other opportunities to hear us discuss firsthand what we have done and where we are going with the API.

Here are several of the upcoming events we plan to be at:

Today (11/03) at 5:15pm PST Daniel Jacobson will be discussing our efforts on the API at The Business of APIs Conference. If you are attending please stop by.

For those in the Public Broadcasting family, we will be at IMA Public Media 09 in Atlanta, Feb 19-21. This is definitely a must-attend for those in public broadcasting who see their future world meshing traditional and new media experiences.

We are also very excited to be a finalist for the We Media Game Changer award. Out of 150 nominees, we are one of 35 finalists. Additionally, we could be chosen as keynote speaker based on community votes.

And finally, we recently got word from the folks at O'Reilly that we have been invited to present at the Web 2.0 Expo, Mar 31st-Apr 3rd.

Hope to see you soon.

-- Zach Brand


 
September 18, 2008

API Decisions : Why Did We Create It?

As promised, I wanted to give some history about how we ended up creating the NPR API. The first major decision that we were faced with was whether or not we should open up our API. The decision was not whether or not to build it, as we'd already done that. Back in November, 2007, we built the foundation of the API to launch with NPR Music. This is basically an XML file repository (essentially in an extended NPRML format) that contains all data needed to build pages on NPR.org. In addition to the XML repository, it includes a PHP framework used to render the XML files to the appropriate presentation layer (these layers include NPR.org as well as RSS feeds, podcast feeds, mobile sites and other outputs that we serve). Here is a diagram of the architecture which includes all of the caching layers as well, some of which were incorporated with the actual release of the public API:

[Diagram: NPR API architecture, including the caching layers]

There are several reasons for this architectural approach:

1. PERFORMANCE : Requests will first go through the Memcache and file cache layers, which will always be the most efficient. If the requested document is not in Memcache, we have PHP render the output using the XML files. If the XML file cannot be obtained, PHP will access the database for the data. If PHP hits the database, however, a version of the request will be stored back in Memcache to speed up the delivery of the next request. This ultimately takes strain off of the database, which is the most expensive operation in serving documents.

2. ABSTRACTION : Creating a separate layer between the various presentations and the actual database allows the presentation layers to be agnostic with respect to the data repository. Currently, our database is Oracle, but if we want to move to MySQL, then the presentation layers don't really care because they are served primarily off of the XML repository (although the final fail-over to the database would require changes).

3. SIMPLIFICATION : The database itself is a complicated relational system. The schema is largely normalized for scalability and efficiency in our write operations. Building pages, as a result, requires expensive table joins across very tall tables. These queries, although tuned, add up when you consider how many queries there are throughout a story page, for example. Executing these queries once and storing the data in a flatter file system enables the pages to be built more efficiently (both because of the flatter model as well as not having to access the database).

4. SCALABILITY : Because of the rendering framework, we are able to easily add new transformation and presentation layers without having to write a lot of extra code or customized database queries. The rendering engine knows how to handle the XML files in a cohesive way because they are relatively flat, so the transformation layers really aren't that different from each other. The framework also allows for reuse of code in the presentation layers because most of the presentations are dealing with the same content and are displaying that content in similar ways. New presentations for NPR.org are the hardest because of all of the design nuances, but adding Atom and MediaRSS is pretty quick and painless. The difficult part is figuring out how to map our fields to those structures, not the coding itself.
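To make the lookup order in the PERFORMANCE point concrete, here is a minimal sketch in Python (our production framework is PHP, so this is an illustration rather than our actual code). The function name, the dictionary standing in for Memcache, and the three callables are all hypothetical. The sketch writes back to the cache only on the database path, since that is the one case described above.

```python
from typing import Callable, Dict, Optional


def fetch_document(
    key: str,
    cache: Dict[str, str],                     # stand-in for Memcache (assumption)
    load_xml: Callable[[str], Optional[str]],  # returns None if no XML file exists
    query_database: Callable[[str], str],      # most expensive source
    render: Callable[[str], str],              # turns source data into output
) -> str:
    """Return rendered output for `key`, trying the cheapest source first."""
    cached = cache.get(key)
    if cached is not None:
        return cached                          # 1. served straight from the cache

    source = load_xml(key)                     # 2. fall back to the XML repository
    from_database = source is None
    if from_database:
        source = query_database(key)           # 3. last resort: the database

    output = render(source)
    if from_database:
        # Store the result so the next request for this key skips the database.
        cache[key] = output
    return output
```

A second request for the same key is then answered from the cache without touching the XML repository or the database.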

So, the system was largely in place almost a year ago, alleviating many of the technical hurdles in building an API. We knew that if we wanted to open the API up to the world we would still have some technical challenges left, including filtering engines, the registration engine, the query generator, etc. Before getting to those tasks, however, we needed to determine if the public API fits with the overall NPR strategy.

-- Daniel Jacobson


 
September 12, 2008

API Decisions : Introduction

Over the coming weeks, my colleagues and I will blog about the various decisions that we made while developing the API. The posts will discuss the following topics:

* Output formats
* OpenID
* Query generator
* Caching layer and performance
* Number of requests per user per day
* Audio stream vs. Download
* Amount and type of content offered
* Terms of use
* Rights
* Metrics
* Station content
* The archive and the deep NPR archive

I am sure that during the course of this series other topics will be added, but these capture some of the more prominent issues that were discussed. As you can see, these topics involve technical issues as well as legal and business ones.

Before we can get to any of the above topics, though, we have to address the single most important decision that we made: Should we open up the API? That will be the first post in this series.

The purpose of this series is to continue to be as transparent as we can be and to be an active, engaging part of the technical community. We hope that some of these decisions that we dealt with will help others successfully pursue creating APIs as well. We also hope that this blog will act as a forum to continue the discussion and will help us continue to better deliver useful tools and services.

I am looking forward to the discussion!
--Daniel Jacobson


 
August 28, 2008

JSON and the Argot-nauts

This is the first of a series of posts that will discuss decisions we made in the design, architecture, and implementation of the API. We hope that our experiences will be useful to you when working with APIs and similar software projects. We also want to hear from you--what you like, what you think should be changed--so we can make course corrections as the API evolves. So put your software geek hats on and let's talk code.

My favorite way to consume the API is using JSON. With just a few lines of code, I get a data object that I can use with JavaScript--no messy parsing of XML or the DOM necessary. The structure of this JSON data object strongly resembles the structure of the NPRML XML output document. In fact, to create the JSON output, we first generate the NPRML document, and then do some transformations to create the JSON output.

However, XML does not map to JSON seamlessly. The XML in NPRML has element nodes that contain either other element nodes or textual content. The element nodes may also have attributes. It is common practice to map element nodes to objects in JSON, with each sub-element becoming a nested object. However, we had to decide on how to treat textual content and attributes.

It makes sense to make the textual content a property of the object that contains it, but we need a name for that property. We looked at other APIs for a standard naming convention, but there doesn't appear to be one at this time. For example, the Google Data APIs put textual content in a property named $t. The Flickr API uses a property named _content. In the NPR API, we use a property named $text.

Some APIs take a different approach, treating text nodes as string properties of the object, which means the name of the property is the element node name. Yahoo! Shopping Web Services take this approach. This makes the JSON more readable and simpler, but it doesn't work if nodes with textual content also have attributes.

We map element attributes to object properties. This approach is used by many APIs, although some (such as Yahoo! Shopping) create a specially named nested object to hold all of the attribute values. With our approach, this NPRML fragment:

<show>
    <program id="2" code="ATC">All Things Considered</program>
    <showDate>Fri, 22 Aug 2008 16:00:00 -0400</showDate>
    <segNum>12</segNum>
</show>

gets mapped to this JSON:

"show": [{
    "program": {
        "id": "2",
        "code": "ATC",
        "$text": "All Things Considered"
    },
    "showDate": {
        "$text": "Fri, 22 Aug 2008 16:00:00 -0400"
    },
    "segNum": {
        "$text": "12"
    }
}]

Note that the show property contains an array. It is possible that a story was used in multiple shows. We use arrays for properties that could have more than one value. This is done even when a given story has only one value for the property.
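For readers who want to experiment with these mapping rules, here is a rough sketch in Python using the standard library's ElementTree parser. It is not the transformation we run in production; the function names and the MULTI_VALUED set are made up for illustration, and "show" is the only element this post confirms as multi-valued.

```python
import xml.etree.ElementTree as ET

# Assumption: element names that may repeat and are therefore emitted as arrays.
# Only "show" is confirmed in this post.
MULTI_VALUED = {"show"}

FRAGMENT = """<show>
    <program id="2" code="ATC">All Things Considered</program>
    <showDate>Fri, 22 Aug 2008 16:00:00 -0400</showDate>
    <segNum>12</segNum>
</show>"""


def element_to_object(element):
    """Map one XML element to a dict: attributes and child elements become
    properties, and textual content goes in a property named "$text"."""
    obj = dict(element.attrib)                 # attributes -> properties
    for child in element:
        value = element_to_object(child)       # sub-elements -> nested objects
        if child.tag in MULTI_VALUED:
            obj.setdefault(child.tag, []).append(value)
        else:
            obj[child.tag] = value
    text = (element.text or "").strip()
    if text:
        obj["$text"] = text                    # textual content -> "$text"
    return obj


def fragment_to_json(xml_string):
    """Parse an XML fragment and wrap the root under its own tag name."""
    root = ET.fromstring(xml_string)
    value = element_to_object(root)
    # Multi-valued elements are wrapped in an array even when there is one value.
    return {root.tag: [value] if root.tag in MULTI_VALUED else value}
```

Calling fragment_to_json(FRAGMENT) and passing the result to json.dumps gives the same structure as the JSON shown above. Because attributes and text live side by side in one object, an attribute that happened to be named "$text" would collide with the textual content, which is one reason to pick a property name that is unlikely to appear as an attribute.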

We are interested in hearing what you think is the best approach to JSON. Have you seen other approaches that work better? Is JSON important to you? Let us know in the comments.

--Harold Neal


 
August 20, 2008

OSCON Presentation on the NPR API

Shortly after the launch of the API, Harold Neal and I presented it at O'Reilly's Open Source Convention (OSCON) on July 24th. Here is a copy of that presentation (requires Adobe Acrobat). This version of the presentation has been slightly modified to reflect more current data (particularly around usage of the API) as well as some other changes that will help the presentation live as a standalone document. I have also added screen shots of the Query Generator to represent the live demo of the API that we did during the presentation.

Sharing this presentation in this forum is the first step to making our process, architecture and decisions around the API more transparent and open to our users. There will be other documents and blog posts to follow with more information. Let us know if you have specific questions about our process so we can try to address them in these future posts.



Click here to view the presentation (requires Adobe Acrobat)


Continue reading "OSCON Presentation on the NPR API" »


 
August 11, 2008

Suggestions for the Next Version of NPR's API?

It has been almost a month since we launched our API and we are now preparing requirements for our second release. What would you most like to see in the next version? Are there specific fields or standard formats that you would like us to output? Are there topics or other ways of slicing the data that you would like represented?
-- Daniel Jacobson


 
July 21, 2008

Proposing Questions for an API FAQ

First, thanks to everybody for their API-related comments here and numerous other places. We are a bit overdue, but are working on putting up an FAQ for the API. As we have started to compile a list of questions, a common answer is emerging: We didn't want to hold the API back until everything possible was perfect. We do think the API today is very extensive and fills a void, but we also think that it will evolve as time allows, and as we respond to requests and new opportunities. As with everything else, we like to treat all our online efforts as an ongoing work-in-progress, with opportunities to get even better. But for the moment, we're very excited to see what ideas folks implement with it.

I've started a list of questions below. Please chime in with comments on what other questions you'd like to see included in the API FAQ.

Continue reading "Proposing Questions for an API FAQ" »


 
July 17, 2008

API Rights and NPRML

There have been quite a few comments and posts around the Web about our API and I would like to clarify a few points about the offering. I also plan to engage in some of the discussions in other forums but I wanted to address them first in our own blog. To see some of the more prominent discussions, you can see the articles on TechCrunch and on Mashable.com.

A common discussion point on the API so far has been our exclusions. Below are the reasons for the exclusions referenced in both of the above blogs as well as some other details that I want to explain:

  • NPR programs and series, including Fresh Air, This I Believe and StoryCorps, are excluded due to rights restrictions. We obviously would like to include these in the API and are looking into making it happen. That said, we did not want to hold up the launch of the API as we researched the rights.
  • NPR programs, including Radiolab, Car Talk and The Diane Rehm Show, are distributed by NPR but their web content is not. As a result, these programs are currently not available on NPR.org or through the API.
  • Other radio programs, including Marketplace, This American Life and A Prairie Home Companion, are not NPR programs -- they are produced and distributed by other public radio entities like American Public Media or Public Radio International. NPR does not have the access or the rights to distribute the content from those programs.
  • Currently, we are not providing any of our video content in the API, although it is in our future plans. Our goal was to launch with our primary asset, which is audio, well defined. There are still a few details that we need to work out before extending the API to offer our video content, but we hope to open that up soon.
  • Our online database goes back to 1995, including over 250,000 stories spanning 13 years. We are actively working to get more of the archival content, dating back to 1970, into the system and available through the API.
  • NPRML is the XML structure that is native to our entire system and it is the structure that drives all content for NPR.org, the API and beyond. We decided to open it up just to be transparent with as much content as possible. This structure is not meant to be a new proposed standard or to replace our goals to expand our output formats. We do intend to include other more comprehensive formats like NewsML and others in the future.

Although we believe that our API is an extensive offering, it will only continue to grow with time. We really appreciate the feedback we have been getting and look forward to getting more in the future. Knowing that there is a desire for video, for example, will help us prioritize accordingly to better serve the API community. Please check back to this blog for more information about our API and our future plans.

-- Daniel Jacobson


 
July 16, 2008

NPR API is Live on NPR.org

As referenced in yesterday's post, we launched our new API today. To find the API, you can either go directly to http://www.npr.org/api/ or you can follow the new link called "Tools / API" on the NPR.org left nav under the Services section.

In order to use the API, you will need to register using our new registration engine that Zach mentioned in a previous post. Once registered, you will need to generate an apiKey by clicking the Generate Key button on the API tab of your account profile. The apiKey is used to authenticate all requests to the API. After you get your apiKey, you can read our documentation or just go straight to the Query Generator, which is a comprehensive tool that allows you to easily create your API requests and see what your results would look like.
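As a rough illustration of attaching the key to a request, here is a small Python sketch. Everything except the apiKey name is an assumption made for the example: the endpoint is a placeholder, the id and output parameters are hypothetical, and passing the key as a query parameter is assumed. The Query Generator and the documentation remain the authoritative source for real request URLs.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): this post does not give the query URL.
API_ENDPOINT = "http://api.example.org/query"


def build_request_url(api_key, **params):
    """Return a request URL with the apiKey appended to the query parameters."""
    query = dict(params)          # hypothetical parameters such as id or output
    query["apiKey"] = api_key     # assumed to travel as a query parameter
    return API_ENDPOINT + "?" + urlencode(query)
```

For example, build_request_url("MY_KEY", id="1001", output="JSON") produces a URL whose query string carries both the hypothetical parameters and the key.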

There were quite a few questions that we addressed when developing the API, but one thing that was not really in question was the need to open as much of our content as possible. As a result, almost everything that you can find on NPR.org that we have the rights to redistribute is available through the API. This includes audio, images, full text, etc. That said, there are elements, series and programs that we could not offer due to rights restrictions.

We also discussed in depth which output formats we would support. For launch, we are supporting RSS, MediaRSS, Atom, JSON, JavaScript Widgets, HTML Widgets and our custom tagging structure called NPRML. We would like feedback on what other formats we should support, although as of now we are planning to extend it to include NewsML. Which of the existing formats are you most likely to use from our API?

There were a ton of contributors to this new API, with the primary technical architect being Harold Neal. Other major contributors include Joanne Garlow, Jason Grosman, Tony Yan, Ivan Lazarte, Stephanie Oura, Ben Hands, Shain Miley, Lindsay Mangum, Sugirtha Solai, Todd Welstein, Vida Logan and others.

Finally, we would really like to get as much feedback as possible from the community on the API, particularly on what you think you will use and what is missing from the offering. We will continue to post here with more thoughts and questions.

-- Daniel Jacobson


 
July 15, 2008

Coming Soon: Our New API

In the next couple of days, NPR.org will be launching our new API, which will be an open and extensive way for our users to share and mash-up our content. Once live, we will be adding a new link on the NPR.org left nav in the Services section called "Tools / API". We are very excited about this new tool and are looking forward to the inventive ways that you will use our content! After all, there are only a few of us but millions of you...

As part of the launch, we will also be showcasing several widgets and applications that were built using the API. All of these will be found on our upcoming widgets page, which will launch with the API. Among them is a widget that maps NPR stories based on Geoff Gaudreault's Reverbiage site, and an iPhone site built by our friends at Axiom Stack.


I will post again on the day of the launch to let you all know when it is live. We will also continue to post to this blog to solicit feedback on the API.

-- Daniel Jacobson


 


   
   
   


 

About Us

Ever wanted to peer under the hood and learn about the inner workings of the NPR website? Have we got a blog for you, then. Here at Inside NPR.org, the NPR Digital Media team will keep you up-to-date on digital products and services we're developing, including social networking tools and our media player. For more info, please see our FAQ and our discussion rules.

 
 
