Inside NPR.org


Thursday, June 25, 2009

In the next month or so, we will be making some significant changes to NPR.org. Some of these changes are visual, while others are architectural. As a result, there will likely be an impact on the API. That said, we put a lot of effort into making the system as backward compatible as possible so that API users are affected as little as possible.

I will post again to this blog soon with more details on the changes. In the meantime, here are some high-level descriptions of what to expect:

1. There will be changes to our topic structure resulting in some topics being eliminated, others being added, and others being renamed. For any changes to existing topics, there will be redirects to corresponding topics that will be maintained for a reasonable period of time to ensure backward compatibility. Our goal is to ensure that any applications that are dependent on specific topics existing will continue to work.

2. There will be some nodes and parameters added to the API output for NPRML. These will largely be to support the new features on NPR.org. They should not break any applications dependent on NPRML unless those applications require that these additional elements do not exist.

3. There will be new products and extensions added to the API. These will not adversely affect any current API calls.
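
For the second point above, the practical advice for API users is to read only the elements you need and ignore the rest. Here is a minimal sketch in Python of that kind of tolerant parsing. The sample document and the `newFeature` element are hypothetical and heavily simplified; the real NPRML output may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified NPRML-like sample. <newFeature> stands in for
# any element that might be added to the output later.
SAMPLE = """<nprml>
  <list>
    <story id="1">
      <title>Example story</title>
      <newFeature>added later</newFeature>
    </story>
  </list>
</nprml>"""


def read_titles(xml_text):
    """Return (story id, title) pairs, ignoring elements we do not recognize."""
    root = ET.fromstring(xml_text)
    results = []
    for story in root.iter("story"):
        # Look up only the child we care about; unknown siblings are skipped.
        title = story.findtext("title", default="")
        results.append((story.get("id"), title))
    return results
```

A parser written this way keeps working when new nodes appear, whereas one that validates against a fixed list of allowed elements would not.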

As I mentioned earlier, I will publish to this blog again with more details on the changes as we draw closer. In the meantime, please provide any feedback or concerns about this in the comments section for this post.
--Daniel Jacobson

categories: API

2:02 - June 25, 2009

 
Wednesday, June 17, 2009

If you have used our search recently, you may have noticed that we just launched a 'new' search in beta. You can either follow the link from the search page or you can try it by clicking here.

So what's the big deal, you ask? Visually, we made changes to give the page a cleaner look while positioning the results more prominently. Behind the scenes, the technology is completely different. The new search is powered by the Google Search Appliance. While our previous search tool had similar potential to yield accurate results, it required a high degree of technical expertise to tune.

One of the core philosophies in NPR Digital Media's technology team is that we want to be a partner, not a bottleneck, to innovation. As part of this we embrace the idea we call 'Self Service' -- the antithesis of maintaining a technology fiefdom. The idea is that the more empowering tools we can provide to our colleagues, the more we can accomplish as a team. Prior to the invention of the lighter and matches, starting a fire was a cumbersome affair, typically involving a tinderbox, flint and a piece of steel. Today we usually take making fire for granted -- not because we have all become experts in the discipline, but rather because we have self-service tools that work really well. So that is the root of our approach: implement smart, maintainable tools that are easy for people to use.

It is this same spirit of self service that led to the selection of the search appliance. In this case there was no need to build it ourselves, as other companies had invested quite a lot in making solid search tools. While we think Google is really smart with its search algorithms, even more appealing was the ease of tuning via the appliance's interface. A colleague of ours, Javaun Moradi, is in charge of search (among many other things). Without making any changes to our code or critical configurations, he can use the GSA interface to help ensure we are indexing and surfacing the desired results. Using the variety of information and metadata available about our content, he can define rules that bias towards more relevant pages and exclude redundant or unnecessary items from the results. Even within the first 24 hours of the tool being up in beta, he informs me that he has already made several tuning changes to help surface information about our shows while also surfacing breaking news.

Another aspect we especially like is the ability of the appliance to render its results in XML. While a suggested implementation is to use XSLT directly on the box to yield result pages, we appreciate the flexibility to make service calls to the appliance and then work with the very clean, portable XML results. Currently we are rendering out the XML results as search result pages using PHP -- which mirrors the architecture we use for the rest of the site. We expect in the future we will be able to use this to better integrate other content and features with search, and search with other features.
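
To give a feel for what working with the XML results looks like, here is a minimal sketch. Our site does this in PHP; the sketch uses Python only for illustration. The element names (`GSP`, `RES`, `R`, `U`, `T`, `S`) follow the appliance's XML output as we understand it, and the sample document is made up, so treat both as assumptions rather than a reference.

```python
import xml.etree.ElementTree as ET

# Made-up sample modeled on the appliance's XML results (assumed layout):
# each <R> is one result with a URL <U>, title <T> and snippet <S>.
SAMPLE_RESULTS = """<GSP>
  <Q>tiny desk</Q>
  <RES>
    <R N="1"><U>http://example.org/a</U><T>First result</T><S>Snippet one</S></R>
    <R N="2"><U>http://example.org/b</U><T>Second result</T><S>Snippet two</S></R>
  </RES>
</GSP>"""


def parse_results(xml_text):
    """Turn the XML results into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {
            "rank": int(r.get("N")),
            "url": r.findtext("U", default=""),
            "title": r.findtext("T", default=""),
            "snippet": r.findtext("S", default=""),
        }
        for r in root.iter("R")
    ]
```

Once the results are in a plain structure like this, the page templates can present them however they like, independent of the appliance.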

This leads us to why we are launching this new search in 'beta'. While we have done some tuning and configuration, we are still working to get it right. By putting this tool out in a preliminary beta, we can watch the queries it is getting and tune it to make it better. On the web, seeing how you all actually use our website is the most authoritative way to judge what is working and what isn't. Every day is an opportunity to improve upon what we did yesterday. One example of this is that we see many searches for shows that people mistakenly believe NPR produces, such as "This American Life" or "A Prairie Home Companion". By seeing that users are making these searches, we can make sure appropriate results are showing up. So whether you are searching for a story you heard this morning, or want to find those performances at Bob Boilen's tiny desk -- hopefully we get you what you want. We currently anticipate moving it out of 'beta' and making it our primary search tool later this summer, as we announce some other changes to our digital media tools and products.

Please share any observations or feedback below, or via this comment tool we set up specifically for the new search. We anticipate numerous improvements in the months ahead.

Happy Searching.

-- Zach Brand

categories: Technology

12:20 - June 17, 2009

 
Monday, June 8, 2009

One of the things that I am most commonly asked about regarding the NPR API is rights management. Because we are distributing content to unknown destinations, it is critical to make sure the API itself can control what gets offered and to whom. To handle these kinds of issues, we built a robust permissions and rights management system into the API. But that is not enough. Rights management starts with contracts and ensuring that the content is tagged appropriately. Without these steps, the rights management system cannot accurately withhold the content that is not allowed to be distributed. So, here is a breakdown of the steps we went through and the systems we built to handle rights in our API.

Contracts
Before launching the API, we spent a lot of time with our legal team reviewing existing contracts and our rights tagging system. Based on this review, we determined that a few changes needed to be made to the rights tagging system, but that there were quite a few restrictions on what could be offered through the API. One interesting example is Fresh Air. Fresh Air is a program produced by WHYY and distributed on the radio by NPR. NPR is also responsible for displaying the content on NPR.org and is allowed to distribute Fresh Air content through limited outlets, like RSS, based on the terms of the contract. At the time of launch, however, NPR was not permitted to offer Fresh Air content through the API using the richer output formats. By the December 2008 upgrade to the API, the contract had been renegotiated to include distribution through the API.

This highlights two points. First, at launch, we needed to incorporate a rights management system in the API that could identify specific types of content and then restrict that content from being distributed for certain types of users. The second key point is that NPR has been shifting our contract strategy to enable more content that we pick up to be distributable anywhere NPR content appears, including through the API.

Rights Tagging System
Our system for tagging assets not produced by NPR is critical for the success of rights management. That said, a sizable portion of this system involves manual effort. After all, it is the editorial process that chooses stories from external sources (e.g. AP, Reuters, etc.), images, videos and other assets. Upon selection of these assets, editorial staff then enter them into our content management system that contains appropriate fields for tagging the owner of the content.

Of course, we do have scripts that pull in some materials, like the AP Business feeds on our site. Those stories and assets that get pulled in through automated systems also get tagged by the scripts.

Finally, we also have scripts to remove content from our system based on contractual obligations. For example, if we have the rights to present an image for only 30 days, these scripts will purge the system of that image and its metadata at the appropriate time.
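
As a rough illustration of the purge step, here is a minimal sketch. The `Asset` fields and the `purge_expired` function are hypothetical and are not taken from our content management system; the only detail from the example above is the 30-day window.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Asset:
    asset_id: int
    owner: str                  # rights holder tagged on the asset
    acquired: date              # day we started presenting the asset
    rights_days: Optional[int]  # None means the rights do not expire


def purge_expired(assets: List[Asset], today: date) -> List[Asset]:
    """Return only the assets we are still allowed to present on `today`."""
    kept = []
    for asset in assets:
        if asset.rights_days is not None:
            expires = asset.acquired + timedelta(days=asset.rights_days)
            if today >= expires:
                continue  # rights have lapsed: drop the asset and its metadata
        kept.append(asset)
    return kept
```

Running a job like this on a schedule keeps expired material from lingering in the system where the API could otherwise pick it up.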

Rights Management System
Once we had determined what we were allowed to do based on the contracts, and had appropriately tagged the content itself, we were able to create a flexible and powerful system for managing the distribution of the content through the API. This system has four aspects: query-level filtering, story-level filtering, asset-level filtering and user permissions.

Query-level filtering enables the system to remove any story or list (i.e. topic, program, series, etc.) from the system due to the permissions. It does this in two ways. First, the system will analyze the API query for any IDs that the user does not have permissions to access. If, for example, the user does not have the rights to view content from This I Believe and the user has included id=4538138 in their API query, the query-level filtering will remove the ID from the query and will proceed to execute the query without it.
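
A minimal sketch of that query-level step might look like the following. The function name and the second ID are hypothetical; only the This I Believe ID comes from the example above.

```python
THIS_I_BELIEVE_ID = 4538138  # ID used in the example above


def filter_query_ids(requested_ids, restricted_ids):
    """Drop any requested ID the user may not access, preserving order."""
    return [i for i in requested_ids if i not in restricted_ids]


# The user asks for This I Believe plus a second, hypothetical list ID.
requested = [THIS_I_BELIEVE_ID, 1001]
restricted_for_user = {THIS_I_BELIEVE_ID}
remaining = filter_query_ids(requested, restricted_for_user)
```

Here `remaining` holds only the hypothetical ID 1001, and the query proceeds with that ID alone.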

Once a valid query passes through the system and figures out what stories to return, the story-level filter gets applied. This filter determines which individual stories need to be removed before returning the feed back to the user. This is done by applying the list of IDs in the filter, for the user's access level, as exclusions in the query to the API. The list of IDs in the filter includes list IDs (e.g. topics, programs, series, etc.), so the same rule applies to any stories that belong to any of these lists. For example, we have already established that my API key does not give me permissions to see stories that belong to This I Believe. If I request the top 10 stories that belong to the Opinion topic, and if the third story is a This I Believe story, then the system will eliminate the third story and will add the eleventh to the results to accommodate my request for 10 stories.
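
The backfill behavior in that example can be sketched as follows. This is an in-memory illustration only: the real system applies the exclusions inside the query itself, and the data layout and the Opinion topic ID used here are hypothetical.

```python
from itertools import islice

THIS_I_BELIEVE_ID = 4538138  # ID from the example above
OPINION_TOPIC_ID = 1057      # hypothetical ID, for illustration only


def top_stories(ranked_stories, excluded_list_ids, limit):
    """Return the first `limit` stories that belong to no excluded list."""
    allowed = (
        story for story in ranked_stories
        if not (story["lists"] & excluded_list_ids)
    )
    # Because excluded stories are skipped before the limit is applied,
    # later stories move up to fill the gap.
    return list(islice(allowed, limit))


# Twelve ranked Opinion stories; the third also belongs to This I Believe.
stories = [{"id": n, "lists": {OPINION_TOPIC_ID}} for n in range(1, 13)]
stories[2]["lists"].add(THIS_I_BELIEVE_ID)

top_ten = top_stories(stories, {THIS_I_BELIEVE_ID}, limit=10)
```

In this sketch `top_ten` contains stories 1, 2 and 4 through 11: the third story is gone and the eleventh has been pulled in, so the caller still receives 10 stories.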

Asset-level filtering is less stringent than story-level filtering in that it does not remove the story completely (as in the example above). Rather, it will display the story, but will only return those assets that the user has the rights to see. For example, if I request the top 10 stories from the People & Places topic, that result set may include stories from Fresh Air and This I Believe. In this case, let's say story number three is still a This I Believe story and story number seven is a Fresh Air story. We have already established that my API key does not allow me to see This I Believe, so the story-level filter will remove the third story and will include the eleventh in my results. Meanwhile, my API key allows me to see Fresh Air stories, just not all of their assets (any such restriction is no longer the case, but when we first launched the API, Fresh Air was only available through RSS). As a result, the seventh story will get through the story-level filter, but the asset-level filter will remove all assets other than the RSS information. We have other asset-level filters for audio, images, video, full text, etc.
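
Here is a minimal sketch of the asset-level step. The story layout and the asset type names are hypothetical; the point is only that the story survives while some of its assets are stripped.

```python
def filter_assets(story, allowed_asset_types):
    """Return a copy of the story keeping only the permitted asset types."""
    filtered = dict(story)
    filtered["assets"] = {
        kind: value
        for kind, value in story["assets"].items()
        if kind in allowed_asset_types
    }
    return filtered


# Hypothetical Fresh Air story with several kinds of assets attached.
fresh_air_story = {
    "id": 7,
    "assets": {
        "rss": "short description and link",
        "text": "full text of the story",
        "audio": "audio reference",
        "image": "image reference",
    },
}

# At launch, a key like mine would only have been allowed the RSS information.
rss_only = filter_assets(fresh_air_story, {"rss"})
```

The original story is left untouched, so the same record can be filtered differently for a user with broader rights.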

The final element of this system, which has been mentioned throughout, is permissions. Our permission levels include Public, Partner, Station, NPR.org and Master, with increasing levels of access in that order. For each level, there is a distinct list of IDs associated with each filter type (although the query and story filter lists are always the same). As a result, the same story in our system can theoretically be removed for the Public user, only have RSS content for Partner users, have everything but images for Stations, and be fully available to the NPR.org users. Meanwhile, a different story can theoretically have a completely different permission scheme, giving NPR.org users no access to it while Public users can see it all.
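
To make the per-level lists concrete, here is a minimal sketch that encodes the first scenario above. The level names come from this post; the data layout, the asset type names and the list ID 9001 are hypothetical.

```python
from enum import Enum


class Level(Enum):
    PUBLIC = "Public"
    PARTNER = "Partner"
    STATION = "Station"
    NPR_ORG = "NPR.org"
    MASTER = "Master"


ALL_ASSETS = frozenset({"rss", "text", "audio", "image", "video"})
EXAMPLE_LIST_ID = 9001  # hypothetical list ID for the story in question

# Each level carries its own lists: IDs excluded outright (query and story
# filters share one list) and per-ID limits on which assets are returned.
RULES = {
    Level.PUBLIC: {"excluded": {EXAMPLE_LIST_ID}, "asset_limits": {}},
    Level.PARTNER: {"excluded": set(),
                    "asset_limits": {EXAMPLE_LIST_ID: frozenset({"rss"})}},
    Level.STATION: {"excluded": set(),
                    "asset_limits": {EXAMPLE_LIST_ID: ALL_ASSETS - {"image"}}},
    Level.NPR_ORG: {"excluded": set(), "asset_limits": {}},
    Level.MASTER: {"excluded": set(), "asset_limits": {}},
}


def visible_assets(level, list_id):
    """Return the asset types a level may see, or None if the story is removed."""
    rule = RULES[level]
    if list_id in rule["excluded"]:
        return None
    return rule["asset_limits"].get(list_id, ALL_ASSETS)
```

Keeping an independent set of lists per level, rather than deriving each level from the one below it, is what allows the second scenario, where a higher level is denied a story that a lower level can see.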

To see how this filtering layer sits on top of our system, here is an architectural diagram:




Ongoing Challenges
Although this system handles our cases for the most part, rights filtering is and will always be a challenge. There are certainly cases that could sneak through the system. These cases could be a result of the editorial process, the tagging tools or the code in the API. We also encounter new scenarios that sometimes require us to quickly modify the API to handle them. Despite these challenges, we have been pretty happy with this system so far.

--Daniel Jacobson

categories: API

9:55 - June 8, 2009

 

About Inside NPR.org

Ever wanted to peer under the hood and learn about the inner workings of the NPR website? Have we got a blog for you, then. Here at Inside NPR.org, the NPR Digital Media team will keep you up-to-date on digital products and services we're developing, including social networking tools and our media player. For more info, please see our FAQ and our discussion rules.
