Inside NPR.org

Wednesday, September 2, 2009

By Jon Foreman

Please welcome the latest NPR API-powered app: it's the NPR gadget for iGoogle. You can check out the gadget for yourself by adding it to your iGoogle page.

Produced in collaboration with Google, the gadget offers maximum convenience to iGoogle users since content can be consumed entirely within iGoogle. It is possible to scan headlines, listen to audio, read stories, share stories, set up custom feeds, display the headlines of a favorite topic and even play the 'Wait Wait...Don't Tell Me!' news quiz all within the confines of a customized iGoogle page.

Most of the items displayed in the gadget are delivered via the NPR API: headlines, story text, story audio and related story links. Items that don't make use of the API are the Wait Wait...Don't Tell Me! quiz, which is driven by a custom XML document; the sponsorship banner, which is powered by JavaScript; and the hourly news and program stream, which are direct links to an MP3 file and a stream, respectively. Stories in the gadget can also be shared with friends -- this is powered by iGoogle's latest social features.

Here are some screen shots of the gadget:

Home View

gadget home view

Continue reading "Introducing The NPR Gadget For iGoogle" >

categories: 3rd Party Tools, API

1:51 - September 2, 2009

 
Monday, August 24, 2009

By Daniel Jacobson

Last week, at the request of Rob Bole, I gave a presentation on NPR's API to the staff of CPB. This presentation was the first in a series that Rob is hosting to expose the staff of the CPB to the rapidly changing technology advancements in the digital media space.

This presentation is largely similar to the presentation I prepared for OSCON, with a few differences. The OSCON presentation focuses more on the technology of the API and digs a little more into the usage from a technical perspective. In the CPB presentation, on the other hand, I spent more time explaining what APIs are, why they are useful, and the particular reasons why NPR built the ones that we have. Both presentations share a lot of the more interesting uses of the API across the four major target audiences: NPR, stations, partners and the public.

So, here is the full CPB presentation.




Click here to view the presentation (requires Adobe Acrobat)

categories: API

10:30 - August 24, 2009

 
Thursday, August 13, 2009

By Harold Neal

One of the great aspects of NPR.org is our deep archive of free content. For example, you can browse our News topic archives back through 2004. (We have stories back through the mid-1990s, but the older stories are not classified into topics.) In the new site, we made a subtle but important change in how you can navigate these archives.

Previously, our archives used what might be called "search results" style navigation:




The first page of the archive displayed the next most recent 15 stories (after those appearing on the topic home page), and there was a set of numbers at the top and bottom of the page from 1 to 10 that allowed you to navigate further back. The number 2 took you to the next page of older results (results 31-45), the number 3 took you to the third page (results 46-60) and so on. For deep archives with hundreds of stories, there were special arrows that would let you navigate to pages 11-20, 21-30, etc. of the archive. If you've used any search engine, you will be familiar with this style of results. However, for an archive like ours, the date the content appeared is very important. Suppose you were interested in Politics stories about the 2006 midterm elections. The search results style navigation made it difficult to find the stories you were interested in. You had to either try to guess which page of results corresponded to 2006, or you had to move backwards one page at a time until you found the stories you wanted.

On the new NPR.org site, we emphasize organization of the archives by date. All of the archive pages have a handy calendar in the right column that lets you jump to the time period of your choosing. From there you can still page around using the "New Stories" and "Older Stories" links if you don't know the exact date of the story you are looking for.




A second advantage of date-based archives is that when you bookmark an archive page, it will have the same set of stories when you come back to it later. With the old search results style navigation, new stories were continually being added to the top of the archives, pushing everything before them deeper into the results pages. So with search results style navigation, when you came back to the bookmark a few days later, it contained a completely different set of stories.
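The bookmark behavior described above can be illustrated with a small Python sketch. This is not NPR's code: the story data and both helper functions are assumptions made purely for illustration, and only the page size of 15 comes from the description of the old archive.

```python
from datetime import date

PAGE_SIZE = 15  # page size of the old archive; everything else is illustrative


def offset_page(stories, page):
    """'Search results' style: page N is a fixed offset into the newest-first list."""
    newest_first = sorted(stories, key=lambda s: s[0], reverse=True)
    start = (page - 1) * PAGE_SIZE
    return newest_first[start:start + PAGE_SIZE]


def date_page(stories, year, month):
    """Date-based style: a page is defined by the period it covers."""
    return [s for s in stories if s[0].year == year and s[0].month == month]


# Hypothetical archive: one story per day in October and November 2006.
archive = [(date(2006, 10, d), f"oct-{d}") for d in range(1, 31)]
archive += [(date(2006, 11, d), f"nov-{d}") for d in range(1, 31)]

# Two "bookmarks": page 2 of the offset archive, and the October 2006 page.
bookmark_offset = offset_page(archive, 2)
bookmark_date = date_page(archive, 2006, 10)

# Twenty new stories are published later.
updated = archive + [(date(2006, 12, d), f"dec-{d}") for d in range(1, 21)]

offset_changed = offset_page(updated, 2) != bookmark_offset
date_changed = date_page(updated, 2006, 10) != bookmark_date
print(offset_changed, date_changed)
```

In this toy data the offset-based bookmark shows different stories after the new ones arrive, while the date-based bookmark does not.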

Technical Design

Our old archive pages pulled results using complex queries against our main Oracle database. Each new type of archive required its own specialized query, which made the code harder to maintain. The new archive pages run directly off of the NPR API. The API is well-suited to finding lists of stories by dates and by topics and other criteria. The API uses a MySQL database with a schema that is optimized to do these sorts of queries, and the query remains the same whether you are looking at topic archives, column archives, reporter archives, or any other type of archive on our site. By using the API, we get the advantage of the optimized schema plus the data caching built into the API. In pre-release testing, we found that the new archive pages are about 80 percent faster than the old archive pages. Finally, since the API has the logic built into it, we don't have to maintain distinct code to manage the way the results are returned.

categories: API

10:28 - August 13, 2009

 
Tuesday, August 11, 2009

By Daniel Jacobson and Adam Martin

It has been more than a year since NPR released our API and formally announced it at OSCON 2008. A lot has happened since that initial release. Among other things, NPR extended the original release with enhancements like our Station Finder API, Mix Your Own Podcast and Full Transcript API. We also added great new content, including Fresh Air and StoryCorps, and extended our MP3 repository by adding more than 150,000 new MP3 files to the API.

In addition to these enhancements, the API has made a lot of progress in reaching new audiences. A wide range of NPR stations are making more extensive use of the API, including WBUR's new site, Minnesota Public Radio's new site and North Country Public Radio, among others. Meanwhile, other users in the general public have created fantastic mashups including NPR Addict (for the iPhone), NPR Backstory, code wrappers in Ruby and Perl, as well as many other mashups and widgets.

Equally important to NPR is how we have taken advantage of the API. Not only are we using the API extensively with partnerships, it is the foundation of the new NPR web site. Moreover, we extended our content management tools to enable content producers to add API feeds to any story or aggregation page on the site without any developer intervention.

At OSCON 2009, two weeks ago, Adam presented these details as well as usage statistics, the future of the API and more. Here is a copy of the slides from that presentation:




Click here to view the presentation (requires Adobe Acrobat)

categories: API

12:36 - August 11, 2009

 
Tuesday, August 4, 2009

By Andy Carvin

Ever since the NPR API came out a year ago, we've toyed around with the idea of inviting local coders to NPR headquarters over pizza and beer to see what kinds of apps and mashups we could come up with together. It seemed like a fun idea, but we realized there was an opportunity for something even more powerful. What if we brought together all sorts of people interested in collaborating with public radio and public TV, to see what we could come up with, including digital tools, citizen journalism and other types of community-centered initiatives?

As we talked with our colleagues across the public media system and beyond, it became clear we needed to host a really big camp - a national PublicMediaCamp, that is.

PublicMediaCamp logo

On the weekend of October 17th at American University's campus in Washington DC, NPR, PBS and the AU Center for Social Media will co-host a two-day event that we hope will serve as the kickoff for similar community collaboration events around the country. PublicMediaCamp is going to be organized as an unconference - an event without a rigid, top-down programmatic structure, with the sessions organized by the participants themselves. We're modeling it on other unconferences like Barcamp and Podcamp, which have successfully spawned similar volunteer-driven events around the world, as well as public media unconferences that have been hosted by Minnesota Public Radio and KUSP in Santa Cruz, CA.

All of these unconferences have one thing in common - giving all participants a chance to play a leadership role in the event's success, using tools like wikis and Twitter to plan the event. (Our Twitter hashtag is going to be #PubCamp, to keep it nice and brief.) And that's why we're modeling this event on unconferences. Public broadcasters are well-established pillars within their communities that have inspired a special bond with the public surrounding them. We've been very successful at organizing financial capital campaigns - particularly in the form of pledge drives - but there's still a lot more we can do when it comes to organizing social capital campaigns, in which local volunteers team up with public broadcasters because they've got specific skill sets that can strengthen stations and the community at large. And the only way we can explore the possibilities is to talk to each other, brainstorm and build things together.

Continue reading "PublicMediaCamp: Strengthening Public Broadcasting Through Community Collaboration" >

3:49 - August 4, 2009

 
Wednesday, July 29, 2009

By Daniel Jacobson, aka @daniel_jacobson on Twitter

As mentioned, our site redesign resulted in more changes than just the visual ones on the site. Another part of this launch that we are very excited about is a set of significant updates to the API, as follows:


Major Additions


Transcript API
We are excited to introduce our new Transcript API. This API offers all of our transcripts dating back to May, 2005. As of today, we are opening up over 80,000 transcripts through this API, and this number will grow with every new radio story we produce. This API contains the same transcripts that are now offered on the new NPR.org.

DISCLAIMER: The transcripts that we do have are only for our main programs, including All Things Considered, Fresh Air, Morning Edition, Talk of the Nation, Tell Me More, Weekend Edition Saturday and Weekend Edition Sunday. The transcripts take a while to produce, so they are typically not available until several hours after the program is over. Finally, while we believe the transcripts are largely accurate, there are some cases where they may not align perfectly with the audio or may contain grammatical or spelling errors.

Added Even More MP3 Files
Today, our MP3 repository goes back to 2005, all of which is available through the Story API (to the extent that we had rights to distribute the MP3s). As of later this week, we will be back-filling our repository with MP3 files dating back to 2001 for Morning Edition and All Things Considered. The rest of the programs will be extended to go back to 2003. With this offering, we are providing about 200,000 unique MP3 files through the API totalling more than 15,000 hours of MP3 audio. And these totals will grow as we add MP3 files for new stories and as we continue to back-fill our MP3 repository over time.

Improved Query-Ability : Query Filtering
With this release, we have added the ability to apply a new parameter to your Story API queries, called "requiredAssets". This parameter can receive "text", "images", "audio" or any combination of them. This list will also expand over time. By using requiredAssets, your query will tell the API to only return those stories that have the specified asset. So, for example, if your query has requiredAssets=audio,images, only stories that have at least one audio file AND one image will be returned. RequiredAssets is not currently an option in the Query Generator but will be added soon. In the meantime, this parameter will have to be manually added to your query.
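Since requiredAssets has to be added by hand for now, here is a minimal Python sketch of building such a query string. Only the requiredAssets parameter and its three values come from this post; the endpoint URL, the apiKey parameter name, the topic ID 1001 and the helper function are assumptions for illustration, so check them against the API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint; only requiredAssets and its values come from the post.
API_ENDPOINT = "http://api.npr.org/query"
ALLOWED_ASSETS = {"text", "images", "audio"}


def build_story_query(topic_id, required_assets, api_key):
    """Return a Story API URL that only matches stories having all listed assets."""
    unknown = set(required_assets) - ALLOWED_ASSETS
    if unknown:
        raise ValueError(f"unsupported asset types: {sorted(unknown)}")
    params = {
        "id": topic_id,                                  # hypothetical topic ID
        "requiredAssets": ",".join(required_assets),
        "apiKey": api_key,                               # assumed parameter name
    }
    # Keep the comma literal so the value reads "audio,images" as in the post.
    return f"{API_ENDPOINT}?{urlencode(params, safe=',')}"


url = build_story_query(1001, ["audio", "images"], "YOUR_API_KEY")
print(url)
```

The helper rejects asset names other than the three listed above, which would need loosening as the list of supported assets expands.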


Other Enhancements or Changes


Topic Changes
For the new site, we modified our topic structure to better reflect the kinds of stories that we produce and the way our users will navigate the site. Some of our old topics were retired, others were renamed, while a few were split and moved. For all of these cases, as strong believers in maintaining backwards compatibility, we set up server-side redirects to point all killed topics to new topics that closely (or exactly) relate to them. No old topic ID should fail, neither in returning results nor in having those results be sensible for that topic. The Topic ID list (XML) in our API reflects all of our current topics. Retired topics will continue to be valid and available, although new stories will not be added to them.

You can see all of the changes to our topics here (PDF).

Thumbnail Images
Prior to this release, thumbnail images were only displayed in their own XML element, called <thumbnail>. That element still exists (for backward compatibility). Because of the new format for the website, we now offer <medium> and <large> thumbnails as sub-elements, the former being 75 pixels square and the latter 90 pixels square. In addition to the changes to <thumbnail>, the thumbnails have also been included in the standard <image> output elements.
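As a rough illustration of reading the two thumbnail sizes, here is a short Python sketch. The NPRML fragment is made up: the exact nesting, the URLs and the helper function are assumptions, not the actual API output.

```python
import xml.etree.ElementTree as ET

# Hypothetical NPRML fragment; the nesting and URLs are assumptions.
SAMPLE = """
<story id="1">
  <thumbnail>
    <medium>http://example.org/thumb-75.jpg</medium>
    <large>http://example.org/thumb-90.jpg</large>
  </thumbnail>
</story>
"""


def thumbnail_url(story, size="medium"):
    """Return the thumbnail URL of the requested size, or None if it is absent."""
    node = story.find(f"thumbnail/{size}")
    if node is None or not node.text:
        return None
    return node.text.strip()


story = ET.fromstring(SAMPLE)
medium = thumbnail_url(story, "medium")
large = thumbnail_url(story, "large")
print(medium)
print(large)
```

Returning None for a missing size keeps the caller working for stories that carry no thumbnail in that size.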

Story List Element
For NPRML queries, we provide a <list> parent element as a container for the stories that get returned. This container, in addition to the title and teaser information that it has always provided, now offers links back to the original API call as well as to an HTML page (if a related one exists). These kinds of outputs help the API be more RESTful.

Version Upgrade
All of these changes culminate in a version upgrade. The new version is 0.93.

In addition to the new features mentioned above, I also think it is interesting to point out that the new NPR site is heavily dependent on the API. The dependencies are both in infrastructure and in enabling more extensive ways for efficient and expedient content creation. These topics will be covered in other blog posts in this series. In our next post, however, we will discuss how we got started on the technical implementation: new tools, new processes.

We are very excited about these additions to the API and would like your feedback. Do these changes inspire new mashup ideas for you? What else would you like to see offered in our APIs?

(By the way, for more conversations about NPR's API and other technical advancements, follow us on Twitter at @NPRTechTeam.)

categories: API

10:05 - July 29, 2009

 
Wednesday, July 8, 2009

Media Mashup - OSCON 2009

Later this month, the annual Open Source Convention (aka OSCON) will be hitting San Jose, CA. For this conference, NPR has teamed up with The New York Times to present Media Mashup, an event focused on exploring creative uses of content from the NPR API and Times APIs.

The day will start at 10:45am with a presentation by Derek Gottfrid on the Open APIs of The New York Times. My presentation on NPR, Open Content and APIs will follow Derek's in the same room, starting at 11:35am. Right after my session, we will go across the hall to hold our Media Mashup, starting at 12:30pm.

If you are interested in coming to the Media Mashup event, please let us know in advance by registering and giving us your thoughts on what you would like to discuss, review, code, etc.

For this event, we have provided several examples of mashup ideas in GitHub. They can be found on our Media Mashup splash page. Please feel free to explore these ideas, make them better, create branches, etc. And come to the session with ideas of your own as well. These mashups are really meant to just get the conversation/development started.

To keep on top of this event, ask questions about it, etc., please follow @NPRTechTeam and @TimesOpen on Twitter.

I should also point out that we will be providing food and beverages for this event.
--Daniel Jacobson

categories: API

4:05 - July 8, 2009

 
Thursday, June 25, 2009

In the next month or so, we will be making some significant changes to NPR.org. Some of these changes are visual, while others are architectural. As a result, there will likely be an impact on the API. That said, we did put in a lot of effort to make the system as backward compatible as possible to ensure that API users would be as minimally affected as possible.

I will post again to this blog soon with more details on the changes. In the meantime, here are some high-level descriptions of what to expect:

1. There will be changes to our topic structure resulting in some topics being eliminated, others being added, and others being renamed. For any changes to existing topics, there will be redirects to corresponding topics that will be maintained for a reasonable period of time to ensure backward compatibility. Our goal is to ensure that any applications that are dependent on specific topics existing will continue to work.

2. There will be some nodes and parameters added to the API output for NPRML. These will largely be to support the new features on NPR.org. They should not break any applications dependent on NPRML unless those applications require that these additional elements do not exist.

3. There will be new products and extensions added to the API. These will not adversely affect any current API calls.
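The second point above suggests a simple defensive habit for NPRML consumers: read the elements you need by name and ignore everything else. A minimal Python sketch, using two made-up documents (the newNode element and the reader function are hypothetical):

```python
import xml.etree.ElementTree as ET

# Made-up "before" and "after" documents; newNode stands in for any added element.
OLD = "<story id='1'><title>A</title><teaser>B</teaser></story>"
NEW = "<story id='1'><title>A</title><teaser>B</teaser><newNode>C</newNode></story>"


def read_story(xml_text):
    """Pick out only the fields this application uses; unknown elements are ignored."""
    story = ET.fromstring(xml_text)
    return {
        "id": story.get("id"),
        "title": story.findtext("title"),
        "teaser": story.findtext("teaser"),
    }


before = read_story(OLD)
after = read_story(NEW)
print(before == after)
```

A reader written this way returns the same result for both documents, so added nodes do not affect it.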

As I mentioned earlier, I will publish to this blog again with more details on the changes as we draw closer. In the meantime, please provide any feedback or concerns about this in the comments section for this post.
--Daniel Jacobson

categories: API

2:02 - June 25, 2009

 
Monday, June 8, 2009

One of the things that I am most commonly asked about regarding the NPR API is rights management. Because we are distributing content to unknown destinations, it is critical to make sure the API itself can control what gets offered and to whom. To handle these kinds of issues, we built a robust permissions and rights management system into the API. But that is not enough. Rights management starts with contracts and ensuring that the content is tagged appropriately. Without these steps, the rights management system cannot accurately withhold the content that is not allowed to be distributed. So, here is a breakdown of the steps we went through and the systems we built to handle rights in our API.

Contracts
Before launching the API, we spent a lot of time with our legal team reviewing existing contracts and our rights tagging system. Based on this review, we determined that only a few changes needed to be made to the rights tagging system, but that there were quite a few restrictions on what could be offered through the API. One interesting example is Fresh Air. Fresh Air is a program produced by WHYY and distributed on the radio by NPR. NPR is also responsible for displaying the content on NPR.org and is allowed to distribute Fresh Air content through limited outlets, like RSS, based on the terms of the contract. At the time of launch, however, NPR was not permitted to offer Fresh Air content through the API using the richer output formats. By the December 2008 upgrade to the API, the contract had been renegotiated to include distribution through the API.

This highlights two points. First, at launch, we needed to incorporate a rights management system in the API that could identify specific types of content and then restrict that content from being distributed for certain types of users. The second key point is that NPR has been shifting our contract strategy to enable more content that we pick up to be distributable anywhere NPR content appears, including through the API.

Rights Tagging System
Our system for tagging assets not produced by NPR is critical for the success of rights management. That said, a sizable portion of this system involves manual effort. After all, it is the editorial process that chooses stories from external sources (e.g. AP, Reuters, etc.), images, videos and other assets. Upon selection of these assets, editorial staff then enter them into our content management system that contains appropriate fields for tagging the owner of the content.

Of course, we do have scripts that pull in some materials, like the AP Business feeds on our site. Those stories and assets that get pulled in through automated systems also get tagged by the scripts.

Finally, we also have scripts to remove content from our system based on contractual obligations. For example, if we have the rights to present an image for only 30 days, these scripts will purge the system of that image and its metadata at the appropriate time.
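To make the purge idea concrete, here is a minimal Python sketch of an expiry check. It is not our actual script: the Asset fields, the sample data and the rule that an asset expires once its rights window has fully elapsed are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Asset:
    asset_id: int
    owner: str                  # rights tag entered by editorial staff or a script
    acquired: date
    rights_days: Optional[int]  # None means the rights do not expire


def purge_expired(assets, today):
    """Return only the assets whose rights window has not yet elapsed."""
    kept = []
    for asset in assets:
        if asset.rights_days is not None:
            expires = asset.acquired + timedelta(days=asset.rights_days)
            if today >= expires:
                continue  # drop the asset together with its metadata
        kept.append(asset)
    return kept


assets = [
    Asset(1, "AP", date(2009, 5, 1), 30),     # 30-day image, already expired
    Asset(2, "NPR", date(2009, 5, 1), None),  # no expiry
    Asset(3, "AP", date(2009, 6, 1), 30),     # still inside its window
]
remaining = purge_expired(assets, today=date(2009, 6, 8))
print([a.asset_id for a in remaining])
```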

Rights Management System
After we determined what we were allowed to do based on the contracts, and after appropriately tagging the content itself, we were able to create a pretty flexible and powerful system for managing the distribution of the content through the API. This system has four aspects to it: query-level filtering, story-level filtering, asset-level filtering and user permissions.

Query-level filtering enables the system to remove any story or list (i.e., topic, program, series, etc.) from a request based on the user's permissions. It does this in two ways. First, the system will analyze the API query for any IDs that the user does not have permission to access. If, for example, the user does not have the rights to view content from This I Believe and the user has included id=4538138 in their API query, the query-level filtering will remove the ID from the query and will proceed to execute the query without it.

Once a valid query passes through the system and the system figures out which stories to return, the story-level filter gets applied. This filter determines which individual stories need to be removed before returning the feed to the user. This is done by applying the list of IDs in the filter, for the user's access level, as exclusions in the query to the API. The list of IDs in the filter includes list IDs (e.g., topics, programs, series, etc.), so the same rule applies to any stories that belong to any of these lists. For example, we have already established that my API key does not give me permission to see stories that belong to This I Believe. If I request the top 10 stories that belong to the Opinion topic, and if the third story is a This I Believe story, then the system will eliminate the third story and will add the eleventh to the results to accommodate my request for 10 stories.

Asset-level filtering is less stringent than story-level filtering in that it does not remove the story completely (as in the example above). Rather, it will display the story, but will only return those assets that the user has the rights to see. For example, if I request the top 10 stories from the People & Places topic, that result set may include stories from Fresh Air and This I Believe. In this case, let's say story number three is still a This I Believe story and story number seven is a Fresh Air story. We have already established that my API key does not allow me to see This I Believe, so the story-level filter will remove the third story and will include the eleventh in my results. Meanwhile, my API key allows me to see Fresh Air stories, just not all of their assets (this restriction no longer applies, but when we first launched the API, Fresh Air was only available through RSS). As a result, the seventh story will get through the story-level filter, but the asset-level filter will remove all assets other than the RSS information. We have other asset-level filters for audio, images, video, full text, etc.

The final element of this system, which has been mentioned throughout, is permissions. Our permission levels include Public, Partner, Station, NPR.org and Master, with increasing levels of access in that order. For each level, there is a distinct list of IDs associated with each filter type (although the query and story filter lists are always the same). As a result, the same story in our system can theoretically be removed for the Public user, only have RSS content for Partner users, have everything but images for Stations, and be fully available to the NPR.org users. Meanwhile, a different story can theoretically have a completely different permission scheme, giving NPR.org users no access to it while Public users can see it all.
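The three filters described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the API's code: the Story class, the filter tables, and the Fresh Air and Opinion IDs are assumptions (only the This I Believe ID is the one quoted in this post), and the tables shown stand for a single permission level.

```python
from dataclasses import dataclass, field

THIS_I_BELIEVE = 4538138  # ID quoted in the post
FRESH_AIR = 13            # assumed ID
OPINION = 1057            # assumed ID

# Filter tables for one hypothetical permission level.
BLOCKED_LISTS = {THIS_I_BELIEVE}    # shared by the query- and story-level filters
ASSET_RULES = {FRESH_AIR: {"rss"}}  # asset-level: assets allowed per list


@dataclass
class Story:
    story_id: int
    list_ids: set  # topics, programs and series the story belongs to
    assets: dict = field(default_factory=dict)


def filter_query_ids(requested_ids):
    """Query-level filter: silently drop IDs the user may not request."""
    return [i for i in requested_ids if i not in BLOCKED_LISTS]


def allowed_assets(story):
    """Asset-level filter: keep only the assets permitted for the story's lists."""
    allowed = set(story.assets)
    for list_id in story.list_ids:
        if list_id in ASSET_RULES:
            allowed &= ASSET_RULES[list_id]
    return {name: value for name, value in story.assets.items() if name in allowed}


def run_query(stories, requested_ids, num_results):
    """Apply all three filters; 'stories' is assumed to be sorted newest first."""
    ids = set(filter_query_ids(requested_ids))
    results = []
    for story in stories:
        if not story.list_ids & ids:
            continue
        if story.list_ids & BLOCKED_LISTS:  # story-level filter
            continue                        # the next story backfills the slot
        results.append(Story(story.story_id, story.list_ids, allowed_assets(story)))
        if len(results) == num_results:
            break
    return results


stories = []
for n in range(1, 13):
    lists = {OPINION}
    if n == 3:
        lists.add(THIS_I_BELIEVE)
    if n == 7:
        lists.add(FRESH_AIR)
    stories.append(Story(n, lists, {"rss": "r", "text": "t", "audio": "a"}))

results = run_query(stories, [OPINION, THIS_I_BELIEVE], 10)
print([s.story_id for s in results])
print(sorted(results[5].assets))
```

With this sample data the third story is dropped and the eleventh fills its place, while the seventh story comes back with only its RSS asset, mirroring the examples above.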

To see how this filtering layer sits on top of our system, here is an architectural diagram:



Ongoing Challenges
Although this system handles our cases for the most part, rights filtering is and will always be a challenge. There are certainly cases that could sneak through the system. These cases could be a result of the editorial process, the tagging tools or the code in the API. We also encounter new scenarios that sometimes require us to quickly modify the API to handle them. Despite these challenges, we have been pretty happy with this system so far.

--Daniel Jacobson

categories: API

9:55 - June 8, 2009

 
Monday, April 6, 2009

Today, we added two updates to the API, as follows:

XML Field Remap
This new functionality allows you to modify our NPRML elements to whatever you want, so your API requests can fit your existing applications without you having to change your code. The remap function allows any node or any attribute to be renamed and it can apply to any number of elements in the document. And again, this only applies to the NPRML output. To see how it works, go to the API Input Reference. In the meantime, here are some examples of how to modify the API query string to implement the remap:

- To change the list element, use "remap=list:newList", which will rename the list node to "newList".

- To change a sub-element of list, use "remap=list.title:newListTitle", which will rename the title node under list to "newListTitle".

- To change the story element, use "remap=list.story:newStory", which will rename the story node to "newStory".

- To change a sub-element of story, use "remap=list.story.title:newStoryTitle", which will rename the title node under story to "newStoryTitle".

- To change an attribute for any element in the NPRML output (even if the node itself was changed), use "remap=story~id:newStoryId", which will rename the id attribute for the story node to "newStoryId".

- To apply many of these changes in a single query, use the comma to separate the remap commands, as follows: "remap=list:newList,story:newStory,story.teaser:newTeaser,story~id:newStoryId,list.story.text.paragraph:textParagraph".
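The remap syntax in the list above lends itself to a small helper. This Python sketch only assembles the parameter text from the documented pieces (dot for sub-elements, tilde for attributes, colon between old and new names, comma between commands); the function name and the rejection of names containing a colon or comma are assumptions, not API behavior.

```python
def build_remap(renames):
    """Join {path: new_name} pairs into a remap parameter for the query string."""
    parts = []
    for path, new_name in renames.items():
        for value in (path, new_name):
            # Colons and commas are separators in the remap syntax, so a name
            # containing one would be ambiguous (assumed restriction).
            if ":" in value or "," in value:
                raise ValueError(f"reserved character in {value!r}")
        parts.append(f"{path}:{new_name}")
    return "remap=" + ",".join(parts)


remap = build_remap({
    "list": "newList",
    "list.story": "newStory",
    "story~id": "newStoryId",
})
print(remap)
```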

Most Emailed Feed
We also opened up the Most Emailed list through the API. Previously, it was only available as an RSS feed, but now, it can be accessed through the API, including access to full text, audio, images, and other assets that NPR has the rights to redistribute. There are a few limitations in the feed, however, that are not present in any of our other existing options in the API. For example, this feed cannot be mashed-up with any other feeds from the API, it cannot be sorted, and the queries cannot be restricted by date or search term. As a result of these limitations, the Most Emailed feed is also not present in the Query Generator.

To access the Most Emailed feed, add "id=100" to the API query string.
--Daniel Jacobson

categories: API

10:28 - April 6, 2009

 
Thursday, March 19, 2009

NPR's programs "Day to Day" and "News & Notes" will be broadcasting their final shows on Friday, March 20, 2009. Although the programs will no longer be producing new shows, the entire archive for both of these shows will still be available on NPR.org and through the API. Eventually, we will likely remove these programs from the API Query Generator, although their IDs will still be valid in the API and can be found in the API Mapping Index.

API queries that use these program IDs will not return any new content after this Friday. The IDs, however, will remain valid, so your applications should continue to work as expected.

Finally, to access the full archives of these programs in the API, you can use the functions available in the Control tab of the Query Generator. These functions include searches based on search terms and date ranges and allow for the ability to paginate through the results.

Please let us know if anything unexpected happens as a result of this change.
--Daniel Jacobson

categories: API

3:26 - March 19, 2009

 
Wednesday, March 11, 2009

As mentioned in Zach's previous post, I will be part of a panel at SXSW. The panel discussion will be on APIs, is called "Get Me Rewrite! Developing APIs and the Changing Face of News", and is on Sunday at 3:30pm. For more information on the panel, go to the SXSW page for this panel.

The panel moderator is Jacob Harris, from The New York Times. Joining the discussion will be Brad Stenger from Wired, and John Donovan from Daylife.

We will have substantial time set aside for Q&A, although prior to the Q&A we will be addressing many of the challenges in producing and maintaining APIs. That said, there are myriad things we can focus on when discussing APIs...

So, please let us know what is most on your mind. What kinds of questions do you want this panel to answer? Are you interested in technical background, business goals, legal issues, getting corporate buy-in, the marketplace for APIs, etc.? We will be using this feedback to refine our topics accordingly as we finish preparing for the session.

--Daniel Jacobson

categories: API

9:46 - March 11, 2009

 
Wednesday, January 7, 2009

We've had a positive reception to the Mix Your Own Podcast tool launched December 18. Here are a few tips to help you get more out of this new feature.

Every Story is an Episode

Our traditional podcasts, launched in August 2005, often combine multiple stories in a single podcast episode. For example, the Economy podcast has episodes that typically contain 4 stories, delivered on Tuesday and Friday. With Mix Your Own Podcast, each story appears as its own episode. Here is a Mix Your Own version of the Economy podcast. This allows you to download the stories as soon as the audio is available on NPR.org, and it gives you more control over what you want to listen to.

However, if you set up a podcast on a popular topic, you may get several episodes per day, so you may want to adjust your podcast software to keep more episodes available. In iTunes, this is done by selecting the Podcast Tab and then clicking the Settings button on the lower left. You may also want to set your software to download episodes more frequently so that you get timely news as soon as it is available. Here are some suggested settings.

 

Refined Search

Mix Your Own Podcast finds stories relevant to your interests in one of two ways. First, NPR categorizes stories in many different ways: the program on which the story was aired/published, topics associated with the story, the reporters of the story, musical artists featured in the story, and so on. You can use any of these pre-existing categories to build your podcast. In the Mix Your Own Podcast tool, pre-existing categories will appear as you type in the keyword field. You can select these categories by clicking on them.

Mix Your Own Podcast drop down

 

Second, your podcast can be based on free text searches of the content of stories. Originally, this search was done on any text content found on the web page for the story as well as the audio transcripts for the stories (if available). While comprehensive, this can find stories that are only tangentially related to your keywords. For example, if you entered "Cat" as your keyword, your podcast could include stories where a reporter used the phrase "Let the cat out of the bag." So, we have changed the way text search is used in Mix Your Own Podcast; now, we will only search the title and the summary of the story. This should provide more relevant stories for your podcast. This change took place automatically, so you don't have to make any changes to your podcast to take advantage of it. However, if you liked the full text search, see the next tip.

Mix Tool for Power Users

You can still use the full text version of search to build your podcast via the API Query Generator. Mix Your Own Podcast is built on top of the NPR API. Using the Query Generator, you can fine tune the criteria used to pick stories for your podcast. To use the Query Generator, you will need to sign up for a free API Key. Then, in the Query Generator, go to the "Fields" tab and select "Podcast" as your "Output Format". You can then use the other tabs to customize your podcast to your heart's content.


For example, if you preferred the full text search option for building your podcast, go to the "Control" tab, type in your search terms, and select "Full Content of Story" as the "Search Type".

Another example of what you can do with the Query Generator is controlling how your selection criteria are combined. In the Mix Your Own Podcast tool, we return stories that match any of your specified criteria. If you enter several categories, the podcast will contain stories that match at least one of the criteria. In technical terms, we call this a "Boolean Or" API query. Perhaps, though, you want to combine your criteria to get a more focused podcast that contains only the stories that match all of the category selections you have made. For example, if I wanted a podcast that contained only stories that were about both Technology and Politics, I would go to the Query Generator "Topics" tab, check both the "Technology" and "Politics" options, and then go to the "Control" tab and select "And" for the "Boolean for IDs" option.


The end result is my Technology and Politics custom podcast.
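For readers who think in code, the difference between the "Or" and "And" options can be sketched with Python sets. This only illustrates the matching logic; it does not call the API, and the select function and sample stories are made up for the example.

```python
def select(stories, wanted, mode="or"):
    """Return titles of stories matching any ("or") or all ("and") wanted topics."""
    wanted = set(wanted)
    if mode == "and":
        # Every wanted topic must be present on the story.
        return [s["title"] for s in stories if wanted <= s["topics"]]
    # At least one wanted topic must be present on the story.
    return [s["title"] for s in stories if wanted & s["topics"]]


# Hypothetical stories, each tagged with a set of topics.
stories = [
    {"title": "Campaigns Turn to Social Networks", "topics": {"Technology", "Politics"}},
    {"title": "New Phone Reviewed", "topics": {"Technology"}},
    {"title": "Senate Vote Delayed", "topics": {"Politics"}},
]

any_match = select(stories, ["Technology", "Politics"], mode="or")   # all three stories
all_match = select(stories, ["Technology", "Politics"], mode="and")  # only the first story
```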

We would like to hear how you are using the Mix Your Own Podcast tool. If you have created an interesting custom podcast, please post the URL in the comments section of this post.

--Harold Neal

categories: API

7:22 - January 7, 2009

 
Thursday, December 18, 2008

Today we have some exciting new API enhancements to share with you, including Mix Your Own Podcast, a new extension that offers users an infinite number of ways to customize NPR podcasts. Here are more details about Mix Your Own Podcast as well as some of the other features and content that we launched:

Mix Your Own Podcast
Prior to this release, the API offered only streaming formats of our audio content, including Windows Media, Real Audio, and progressive download MP3. These formats were supported by a Terms of Use that required API users to stream the audio from our servers, preventing them from downloading the audio. With today's launch, however, the API now allows users to slice through the NPR.org archive to create custom podcast feeds based on virtually any aggregation (or combination of aggregations) in the API. To learn more about this, go to the NPR Podcast Directory.
Due to various current constraints, the only real exception here is that users will not be allowed to create full-show podcasts of Morning Edition, All Things Considered, Weekend Edition Saturday or Weekend Edition Sunday. However, all stories from these and other programs will be available to create any other podcast mashup in the system.

Station Finder API. With this release, we are also offering access to our Station Finder API. This API will allow users to pass in zip codes, city/state, station call letters or latitude/longitude information, and we will return a list of stations that can be heard in that location. The station results also include key information about the stations, including links to their home page, schedule page, audio streams, RSS feeds, podcasts, station logo and more. Because the system also has station stories from some of these stations (and more of this content will become available in the coming months), you will be able to, for example, search for a zip code, identify the stations in that zip code, then find all of the stories from all of the stations returned. Over the coming months, more station content will be made available through the API.

New Content: Fresh Air and StoryCorps. With this release, we are also making available the full archive of Fresh Air and StoryCorps. For Fresh Air, we will be exposing over 10,000 stories (and counting) dating back to 1993. The StoryCorps offering will include about 200 stories (and counting) dating back to 2005.

Query By Asset Type
Now you can query the API to get stories that contain a particular type of asset. For example, you can filter your query to only get stories that contain images (useful if you are building a slideshow application, for example), or stories with audio, or stories with long-form text. To use this new feature, append &requiredAssets=image to your query string and you will get only stories with images. The other allowed values for this parameter are audio and text. You can combine these filters with a comma-delimited string (&requiredAssets=image,text,audio). This new feature will be added to the documentation and the Query Generator in the next week or so. This feature does not work yet with API queries based on free-text search.
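Here is a small sketch of how a client might assemble such a query string in Python. Only the requiredAssets parameter and its three values come from this post, and apiKey is the key name used by the API. The base URL, the id parameter and the sample values are assumptions for illustration, so check the API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only; see the API documentation for the real one.
BASE_URL = "http://api.npr.org/query"

ALLOWED_ASSETS = {"image", "audio", "text"}


def build_query(api_key, topic_id, required_assets=()):
    """Build a query URL, optionally restricted to stories with the given asset types."""
    params = {"id": topic_id, "apiKey": api_key}  # "id" is an assumed parameter name
    if required_assets:
        unknown = set(required_assets) - ALLOWED_ASSETS
        if unknown:
            raise ValueError(f"unsupported asset types: {sorted(unknown)}")
        # Several asset types are combined into one comma-delimited value.
        params["requiredAssets"] = ",".join(required_assets)
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return BASE_URL + "?" + urlencode(params, safe=",")


# Hypothetical key and topic id.
url = build_query("YOUR_KEY", 1019, required_assets=["image", "audio"])
```

Validating the asset types on the client side is a design choice of this sketch, not something the API requires.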

We are excited about this new release and view it as the next step in our continued effort to open up our content to the world.
--Daniel Jacobson

categories: API

9:40 - December 18, 2008

 
Monday, December 8, 2008

As mentioned in my previous post about metrics, we have identified quite a few different usages of the API. These implementations range from incorporating NPR stories on member stations' web sites to widgets created by developers in the public. Below are some of the more interesting or comprehensive uses that we have found.

NPR Member Station Implementations

Minnesota Public Radio Program Archives

North Country Public Radio

Oregon Public Broadcasting

KGOU

SouthEast Public Radio

WAMC

Hearing Voices Widget

KJZZ - NPR Simile Timeline


Public User Websites, Widgets, and Applications

Reverbiage Widget

Axiom Stack iPhone Site

KDE Desktop NPR Audio Player

NPR Backstory Twitter Mashup

RubyNPR - A code wrapper in Ruby

All Tweets Considered

NPR Song of the Day Widget for Mac OSX Dashboard

NPR Audio Search Box FireFox Plug-In

If you have created something using the API and it is not included in this list, please let us know about it by adding it in the comments of this post.
--Daniel Jacobson

categories: API

8:59 - December 8, 2008

 
Monday, November 24, 2008

When we launched the API back in July, we had some ideas as to how to gauge success from a metrics perspective. Some of those success measures were around adoption by member stations, others were based on the total number of registrants, and others were based on the number of requests. That said, having one of the first comprehensive content APIs, it was hard to determine what the actual numbers meant. In our first few weeks, we had over 300 registrants. Was that good? We think so, but it is hard to know. We know that many of those registrants were member stations, many were developers in the public, and some percentage were people who registered simply to take a look at what they had just read about in an article somewhere. After one month, we exceeded 1,000,000 requests to the API itself. We were pretty confident that number was a good one, but again, we had no real basis of comparison.

Despite the challenges in figuring out what our numbers mean, we do believe that our usage and registration numbers (published most recently two weeks ago in my last post) are a strong indication of success for the API.

Another challenge is how to actually get our metrics. While our goal is to encourage the re-use of our content, we obviously want some way to measure success. There are several key ways that we have baked into the system to allow us to see how the API is being used. Keep in mind that there is no 100% way to know how many eyes are seeing the content, only how people are implementing it, and in some cases, on which websites, blogs or applications people are seeing the content that came from the API. The primary methods are as follows:

* Since all audio must be served from NPR servers (based on our Terms of Use), we are able to tag the audio accordingly, indicating that the request originated from the API.

* All requests to the API require an access key. This helps us identify trends in usage of the API at the key level, in addition to at much higher levels.

* For each request in the system, we will be outputting a log to our servers that includes the request, the API key used in the request, and the stories/assets that were returned. Over time, we will be able to see trends of use, most popular requests, most commonly distributed stories, etc.

* For any rich-content request to the API (i.e., text elements that contain HTML), we have included a 1x1 pixel image that is served from NPR servers (which is an industry standard approach for capturing metrics online) and passes information back to our logs. This will help us identify some of the places where NPR content is appearing when it has been cached by the website, blog or application.
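As a rough illustration of the kind of trend analysis the request log makes possible, here is a short Python sketch. The log record layout, key names and story IDs are invented for the example; the post only says that each entry includes the request, the API key and the stories/assets returned.

```python
from collections import Counter

# Hypothetical log entries; the real log format is not described in this post.
log = [
    {"api_key": "key-A", "request": "id=1019", "stories": [101, 102]},
    {"api_key": "key-B", "request": "id=1007", "stories": [102]},
    {"api_key": "key-A", "request": "id=1019", "stories": [102, 103]},
]

# Usage at the key level: how many requests each key made.
requests_per_key = Counter(entry["api_key"] for entry in log)

# Most commonly distributed stories across all requests.
story_counts = Counter(sid for entry in log for sid in entry["stories"])
top_story, top_count = story_counts.most_common(1)[0]

# Most popular request strings.
popular_requests = Counter(entry["request"] for entry in log)
```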

Like I said, this is not the complete picture, but these approaches result in metrics that do give us a good indication as to how the API is getting used and by whom. With that in mind, these numbers only have weight if they translate into real-world consumption of the content. In my next post I will highlight some of the more interesting implementations and usages that we have heard about in the marketplace.
-- Daniel Jacobson

categories: API

12:14 - November 24, 2008

 
Monday, November 10, 2008

It has been several weeks since my last post on the goals and challenges of launching NPR's API. I still intend to fill out the story in the coming weeks/months.

I will start up again by talking about my recent presentation at Mashery's API Conference last week. The conference itself was primarily focused on the business of APIs. In my presentation, I mainly discussed NPR's goals for opening up an API along with some of the challenges we faced leading up to the launch.

As NPR reviewed the landscape of content syndication, we found that there were quite a few APIs already in the marketplace. Most of them, however, belong to content aggregators (e.g., Google, Yahoo!, etc.), user-generated content sites (e.g., Flickr, Wikipedia, etc.), and some e-commerce sites (e.g., eBay, Amazon, etc.). There were surprisingly few comprehensive APIs from major media organizations. Some organizations, like DayLife, CBS and BBC, offered APIs, but these were limited in a variety of ways.

Mostly, these major media organizations were syndicating their content through RSS or extended RSS, such as Podcasts or MediaRSS. This approach has been surprisingly effective - what I call "Really Successful Syndication". It is successful because RSS is simple, widely adopted in the marketplace, and succeeds in driving traffic back to the site. The major problems with RSS are the same things that make it really successful. That is, in the current marketplace, RSS now stands for "Really Stingy Syndication" because it does not contain very much real content. Instead, it provides enough content to drive traffic back to the source, embracing the "lock-down" model of content.

The marketplace is changing dramatically, though, and people have destinations to which they are attached. They go to Facebook, MySpace, etc. and expect to find content there. Content providers will have to put their content on these sites through widgets and other means of distribution. If the users of Facebook, for example, find the content they want on Facebook, then they are less likely to leave Facebook to get more content (unless the user has a keen interest in a specific content provider). As a result, the richer the content is on Facebook, the more likely the user identifies your brand as a trusted news source. So, RSS is ok only if no other providers offer richer content. But it is only a matter of time before the richer content is there...

Because of these changes in the marketplace, NPR decided to release a comprehensive API of all of our content that we have rights to redistribute. If our content is truly open, it will enable users to mash it up, keep it relevant to them, and share it with new audiences in places where those people are. Although NPR.org is still critical to our strategy, we can no longer rely exclusively on the site as a way to reach people.

There were two other major factors in our decision. First, it is critically important for NPR to provide content and services to our Member stations. The API will enable stations to get NPR content on their sites. We also plan to offer local station content through the API, which will provide a local/national view of content to the users. The second major influence in our decision was NPR's Mission to "create a more informed public". By offering both local and national content in our API, enabling users to mash it up and use it in ways that we have not thought of or don't have the resources to execute, we hope to reach and inform new audiences.

Once we decided to release an API, there were several questions that we needed to answer. First and foremost, we needed to establish what our target audiences for the API would be. They are as follows:

  • End-users and other web developers (These users can post content to blogs as well as create innovative ways of using NPR content)
  • NPR's Digital Media team (NPR Product and Project Managers can improve their products using the API without a lot of effort from NPR Developers)
  • NPR Member Stations
  • Content aggregators and NPR's business partners

Serving each of these audiences through the API enables us to seamlessly integrate with them in such a way that it requires very little involvement from NPR's development staff.

In the slides (attached below) from the conference, I have provided some examples of how these audiences are using the API.


We will be discussing more of our challenges in later posts.
-- Daniel Jacobson

categories: API

12:30 - November 10, 2008

 
Monday, November 3, 2008

While we have been pretty busy building tools for our Election Night reporting, we continue working on the API. The feedback so far has been fantastic. Along with encouragement and congratulations, we have received lots of great suggestions. We have been very excited by the adoption of this technology and the general embracing of this "Brand and Release" strategy. We hope to have some significant and exciting new features in place by early next year.

But what if you want to hear more...?

Well, if you missed our presentation at OSCON 08, there will be other opportunities to hear us discuss firsthand what we have done and where we are going with the API.

Here are several of the upcoming events we plan to be at:

Today (11/03) at 5:15pm PST Daniel Jacobson will be discussing our efforts on the API at The Business of APIs Conference. If you are attending please stop by.

For those in the Public Broadcasting family, we will be at IMA Public Media 09 in Atlanta Feb 19-21. This is definitely a must attend for those in public broadcasting who see their future world meshing traditional and new media experiences.

We are also very excited to be a finalist for the We Media Game Changer award. Out of 150 nominees, we are one of 35 finalists. Additionally, we could be chosen as keynote speaker based on community votes.

And finally, we recently got word from the folks at O'Reilly that we have been invited to present at the Web 2.0 Expo, March 31-April 3.

Hope to see you soon.

-- Zach Brand

categories: API

11:26 - November 3, 2008

 
Thursday, September 18, 2008

As promised, I wanted to give some history about how we ended up creating the NPR API. The first major decision that we were faced with was whether or not we should open up our API. The decision was not whether or not to build it, as we'd already done that. Back in November, 2007, we built the foundation of the API to launch with NPR Music. This is basically an XML file repository (essentially in an extended NPRML format) that contains all data needed to build pages on NPR.org. In addition to the XML repository, it includes a PHP framework used to render the XML files to the appropriate presentation layer (these layers include NPR.org as well as RSS feeds, podcast feeds, mobile sites and other outputs that we serve). Here is a diagram of the architecture which includes all of the caching layers as well, some of which were incorporated with the actual release of the public API:

Click image to enlarge

There are several reasons for this architectural approach:

1. PERFORMANCE : Requests will first go through the Memcache and file cache layers, which will always be the most efficient. If the requested document is not in Memcache, we have PHP render the output using the XML files. If the XML file cannot be obtained, PHP will access the database for the data. If PHP hits the database, however, a version of the request will be stored back in Memcache to speed up the delivery of the next request. This ultimately takes strain off of the database, which is the most expensive operation in serving documents.

2. ABSTRACTION : Creating a separate layer between the various presentations and the actual database allows the presentation layers to be agnostic with respect to the data repository. Currently, our database is Oracle, but if we want to move to MySQL, then the presentation layers don't really care because they are served primarily off of the XML repository (although the final fail-over to the database would require changes).

3. SIMPLIFICATION : The database itself is a complicated relational system. The schema is largely normalized for scalability and efficiency in our write operations. Building pages, as a result, requires expensive table joins across very tall tables. These queries, although tuned, add up when you consider how many queries there are throughout a story page, for example. Executing these queries once and storing the data in a flatter file system enables the pages to be built more efficiently (both because of the flatter model as well as not having to access the database).

4. SCALABILITY : Because of the rendering framework, we are able to easily add new transformation and presentation layers without having to write a lot of extra code or customized database queries. The rendering engine knows how to handle the XML files in a cohesive way because they are relatively flat, so the transformation layers really aren't that different from each other. The framework also allows for reuse of code in the presentation layers because most of the presentations are dealing with the same content and are displaying that content in similar ways. New presentations for NPR.org are the hardest because of all of the design nuances, but adding Atom and MediaRSS is pretty quick and painless. The difficult part is figuring out how to map our fields to those structures, not the coding itself.
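The lookup order described under PERFORMANCE can be sketched in a few lines. This is a simplified illustration in Python rather than our PHP framework: plain dictionaries stand in for Memcache, the XML repository and the database, the file cache layer is left out, and the fetch_document and render names are made up. The post only says the result is written back to Memcache after a database hit; writing it back after an XML render as well is an assumption of this sketch.

```python
def render(data):
    # Stand-in for the rendering framework: wrap the raw data in markup.
    return f"<html>{data}</html>"


def fetch_document(doc_id, memcache, xml_repo, database):
    """Return (document, source), trying the cheapest layer first."""
    cached = memcache.get(doc_id)
    if cached is not None:
        return cached, "memcache"

    xml = xml_repo.get(doc_id)
    if xml is not None:
        document, source = render(xml), "xml"
    else:
        # Most expensive path: fall back to the database.
        document, source = render(database[doc_id]), "database"

    # Store the rendered result so the next request is served from the cache.
    memcache[doc_id] = document
    return document, source


memcache = {}
xml_repo = {"story-1": "<story>one</story>"}
database = {"story-1": "row one", "story-2": "row two"}

first = fetch_document("story-2", memcache, xml_repo, database)   # falls through to the database
second = fetch_document("story-2", memcache, xml_repo, database)  # now served from the cache
third = fetch_document("story-1", memcache, xml_repo, database)   # rendered from the XML file
```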

So, the system was largely in place almost a year ago, alleviating many of the technical hurdles in building an API. We knew that if we wanted to open the API up to the world we would still have some technical challenges left, including filtering engines, the registration engine, the query generator, etc. Before getting to those tasks, however, we needed to determine if the public API fits with the overall NPR strategy.

-- Daniel Jacobson

categories: API

9:37 - September 18, 2008

 
Friday, September 12, 2008

Over the coming weeks, my colleagues and I will blog about the various decisions that we made while developing the API. The posts will discuss the following topics:

* Output formats
* OpenID
* Query generator
* Caching layer and performance
* Number of requests per user per day
* Audio stream vs. Download
* Amount and type of content offered
* Terms of use
* Rights
* Metrics
* Station content
* The archive and the deep NPR archive

I am sure that during the course of this series other topics will be added, but these capture some of the more prominent issues that were discussed. As you can see, these topics involve technical issues as well as legal and business ones.

Before we can get to any of the above topics, though, we have to address the single most important decision that we made: Should we open up the API? That will be the first post in this series.

The purpose of this series is to continue to be as transparent as we can be and to be an active, engaging part of the technical community. We hope that some of these decisions that we dealt with will help others successfully pursue creating APIs as well. We also hope that this blog will act as a forum to continue the discussion and will help us continue to better deliver useful tools and services.

I am looking forward to the discussion!
--Daniel Jacobson

categories: API

3:36 - September 12, 2008

 
Thursday, August 28, 2008

This is the first of a series of posts that will discuss decisions we made in the design, architecture, and implementation of the API. We hope that our experiences will be useful to you when working with APIs and similar software projects. We also want to hear from you--what you like, what you think should be changed--so we can make course corrections as the API evolves. So put your software geek hats on and let's talk code.

My favorite way to consume the API is using JSON. With just a few lines of code, I get a data object that I can use with JavaScript--no messy parsing of XML or the DOM necessary. The structure of this JSON data object strongly resembles the structure of the NPRML XML output document. In fact, to create the JSON output, we first generate the NPRML document, and then do some transformations to create the JSON output.

However, XML does not map to JSON seamlessly. The XML in NPRML has element nodes that contain either other element nodes or textual content. The element nodes may also have attributes. It is common practice to map element nodes to objects in JSON, with each sub-element becoming a nested object. However, we had to decide on how to treat textual content and attributes.

It makes sense to make the textual content a property of the object that contains it, but we need a name for that property. We looked at other APIs for a standard naming convention, but there doesn't appear to be one at this time. For example, the Google Data APIs put textual content in a property named $t. The Flickr API uses a property named _content. In the NPR API, we use a property named $text.

Some APIs take a different approach, treating text nodes as string properties of the object, which means the name of the property is the element node name. Yahoo! Shopping Web Services take this approach. This makes the JSON more readable and simpler, but it doesn't work if nodes with textual content also have attributes.

We map element attributes to object properties. This approach is used by many APIs, although some (such as Yahoo! Shopping) create a specially named nested object to hold all of the attribute values. With our approach, this NPRML fragment:

<show>
    <program id="2" code="ATC">All Things Considered</program>
    <showDate>Fri, 22 Aug 2008 16:00:00 -0400</showDate>
    <segNum>12</segNum>
</show>

gets mapped to this JSON:

"show": [{
    "program": {
        "id": "2",
        "code": "ATC",
        "$text": "All Things Considered"
    },
    "showDate": {
        "$text": "Fri, 22 Aug 2008 16:00:00 -0400"
    },
    "segNum": {
        "$text": "12"
    }
}]

Note that the show property contains an array. It is possible that a story was used in multiple shows. We use arrays for properties that could have more than one value. This is done even when a given story has only one value for the property.
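The mapping rules above can be sketched in a short Python function. This is not the transformation code the API runs; it is a minimal illustration using the standard library. The element_to_object name, the ARRAY_FIELDS set and the wrapping story element are assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: names of elements that may repeat and are therefore always arrays.
ARRAY_FIELDS = {"show"}


def element_to_object(elem):
    """Map an XML element to a dict: attributes and sub-elements become properties."""
    obj = dict(elem.attrib)          # attributes map directly to properties
    text = (elem.text or "").strip()
    if text:
        obj["$text"] = text          # textual content goes in the $text property
    for child in elem:
        value = element_to_object(child)
        if child.tag in ARRAY_FIELDS:
            # Always an array, even when there is only one value.
            obj.setdefault(child.tag, []).append(value)
        else:
            obj[child.tag] = value
    return obj


nprml = """<story>
<show>
    <program id="2" code="ATC">All Things Considered</program>
    <showDate>Fri, 22 Aug 2008 16:00:00 -0400</showDate>
    <segNum>12</segNum>
</show>
</story>"""

result = element_to_object(ET.fromstring(nprml))
print(json.dumps(result, indent=4))
```

Listing the array-valued names up front keeps the output shape stable: a consumer can always index into show without checking whether it is an object or an array.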

We are interested in hearing what you think is the best approach to JSON. Have you seen other approaches that work better? Is JSON important to you? Let us know in the comments.

--Harold Neal

categories: API

8:00 - August 28, 2008

 
Wednesday, August 20, 2008

Shortly after the launch of the API, Harold Neal and I presented it at O'Reilly's Open Source Convention (OSCON) on July 24th. Here is a copy of that presentation (requires Adobe Acrobat). This version of the presentation has been slightly modified to reflect more current data (particularly around usage of the API) as well as some other changes that will help the presentation live as a standalone document. I have also added screen shots of the Query Generator to represent the live demo of the API that we did during the presentation.

Sharing this presentation in this forum is the first step to making our process, architecture and decisions around the API more transparent and open to our users. There will be other documents and blog posts to follow with more information. Let us know if you have specific questions about our process so we can try to address them in these future posts.



Click here to view the presentation (requires Adobe Acrobat)


Continue reading "OSCON Presentation on the NPR API" >

categories: API

3:00 - August 20, 2008

 
Monday, August 11, 2008

It has been almost a month since we launched our API and we are now preparing requirements for our second release. What would you most like to see in the next version? Are there specific fields or standard formats that you would like us to output? Are there topics or other ways of slicing the data that you would like represented?
- Daniel Jacobson

categories: API

10:50 - August 11, 2008

 
Monday, July 21, 2008

First, thanks to everybody for their API-related comments here and numerous other places. We are a bit overdue, but are working on putting up an FAQ for the API. As we have started to compile a list of questions, a common answer is emerging: We didn't want to hold the API back until everything possible was perfect. We do think the API today is very extensive and fills a void, but we also think that it will evolve as time allows, and as we respond to requests and new opportunities. As with everything else, we like to treat all our online efforts as an ongoing work-in-progress, with opportunities to get even better. But for the moment, we're very excited to see what ideas folks implement with it.

I've started a list of questions below. Please chime in with comments on what other questions you'd like to see included in the API FAQ.

Continue reading "Proposing Questions for an API FAQ" >

categories: API, Administrative Stuff

11:05 - July 21, 2008

 
Thursday, July 17, 2008

There have been quite a few comments and posts around the Web about our API and I would like to clarify a few points about the offering. I also plan to engage in some of the discussions in other forums but I wanted to address them first in our own blog. To see some of the more prominent discussions, you can see the articles on TechCrunch and on Mashable.com.

A common discussion point on the API so far has been our exclusions. Below are the reasons for the exclusions referenced in both of the above blogs as well as some other details that I want to explain:

  • NPR programs and series, including Fresh Air, This I Believe and StoryCorps, are excluded due to rights restrictions. We obviously would like to include these in the API and are looking into making it happen. That said, we did not want to hold up the launch of the API while we researched the rights.
  • NPR programs, including RadioLabs, Car Talk and The Diane Rehm Show are distributed by NPR but their web content is not. As a result, these programs are currently not available on NPR.org or through the API.
  • Other radio programs, including Marketplace, This American Life and A Prairie Home Companion, are not NPR programs -- they are produced and distributed by other public radio entities like American Public Media or Public Radio International. NPR does not have the access or the rights to distribute the content from those programs.
  • Currently, we are not providing any of our video content in the API, although it is in our future plans. Our goal was to launch with our primary asset, audio, well defined. There are still a few details that we need to work out before extending the API to offer our video content, but we hope to open that up soon.
  • Our online database goes back to 1995, including over 250,000 stories spanning 13 years. We are actively working to get more of the archival content, dating back to 1970, into the system and available through the API.
  • NPRML is the XML structure that is native to our entire system and it is the structure that drives all content for NPR.org, the API and beyond. We decided to open it up just to be transparent with as much content as possible. This structure is not meant to be a new proposed standard or to replace our goals to expand our output formats. We do intend to include other more comprehensive formats like NewsML and others in the future.

Although we believe that our API is an extensive offering, it will only continue to grow with time. We really appreciate the feedback we have been getting and will look forward to getting more in the future. Knowing that there is a desire for video, for example, will help us prioritize accordingly to better serve the API community. Please check back to this blog for more information about our API and our future plans.

-- Daniel Jacobson

categories: API

12:11 - July 17, 2008

 
Wednesday, July 16, 2008

As referenced in yesterday's post, we launched our new API today. To find the API, you can either go directly to http://www.npr.org/api/ or you can follow the new link called "Tools / API" on the NPR.org left nav under the Services section.

In order to use the API, you will need to register using our new registration engine that Zach mentioned in a previous post. Once registered, you will need to generate an apiKey by clicking the Generate Key button on the API tab of your account profile. The apiKey is used to authenticate all requests to the API. After you get your apiKey, you can read our documentation or just go straight to the Query Generator, which is a comprehensive tool that allows you to easily create your API requests and see what your results would look like.

There were quite a few questions that we addressed when developing the API, but one thing that was not really in question was the need to open as much of our content as possible. As a result, almost everything that you can find on NPR.org that we have the rights to redistribute is available through the API. This includes audio, images, full text, etc. That said, there are elements, series and programs that we could not offer due to rights restrictions.

We also discussed in depth which output formats we would support. For launch, we are supporting RSS, MediaRSS, Atom, JSON, JavaScript Widgets, HTML Widgets and our custom tagging structure called NPRML. We would like feedback on what other formats we should support, although as of now we are planning to extend it to include NewsML. Which of the existing formats are you most likely to use from our API?

There were a ton of contributors to this new API, with the primary technical architect being Harold Neal. Other major contributors include Joanne Garlow, Jason Grosman, Tony Yan, Ivan Lazarte, Stephanie Oura, Ben Hands, Shain Miley, Lindsay Mangum, Sugirtha Solai, Todd Welstein, Vida Logan and others.

Finally, we would really like to get as much feedback as possible from the community on the API, particularly on what you think you will use and what is missing from the offering. We will continue to post here with more thoughts and questions.

-- Daniel Jacobson

categories: API

9:47 - July 16, 2008

 
Tuesday, July 15, 2008

In the next couple of days, NPR.org will be launching our new API, which will be an open and extensive way for our users to share and mash-up our content. Once live, we will be adding a new link on the NPR.org left nav in the Services section called "Tools / API". We are very excited about this new tool and are looking forward to the inventive ways that you will use our content! After all, there are only a few of us but millions of you...

As part of the launch, we will also be showcasing several widgets and applications that were built using the API. All of these will be found on our upcoming widgets page, which will launch with the API. Among them is a widget that maps NPR stories based on Geoff Gaudreault's Reverbiage site, and an iPhone site built by our friends at Axiom Stack.


I will post again on the day of the launch to let you all know when it is live. We will also continue to post to this blog to solicit feedback on the API.

-- Daniel Jacobson

categories: API

5:16 - July 15, 2008

 

About Inside NPR.org

Ever wanted to peer under the hood and learn about the inner workings of the NPR website? Have we got a blog for you, then. Here at Inside NPR.org, the NPR Digital Media team will keep you up-to-date on digital products and services we're developing, including social networking tools and our media player. For more info, please see our FAQ and our discussion rules.

Contact us

Got a question or comment you want to send to us privately? Use our contact form.