Inside NPR.org

API

NPR Story API, Now with Google Goodness

Last week, we launched the integration of our Google Search Appliance into the NPR Story API. This new integration provides several new features and should make searching a more consistent experience between the API and the website.

First, a bit of history. When the Story API was first built, NPR.org used a different search engine on its website. With the launch of the redesigned NPR.org in the summer of 2009, the website began using the Google Search Appliance (GSA) for site search. We left the old search engine in the API, however, because of time constraints and a desire to fine-tune the GSA before attempting another integration.

Although the Query Generator hides the details from you, the machinery that powers the API can differ radically depending on what query you make. If you make a Story API Query that consists solely of selections you have made on the Topics, Programs, Bios, Music Artists, Columns, Series, and Stations tabs (in other words, using predefined criteria), you will note that your query string contains a series of numeric IDs. When you search using these predefined criteria, we use a highly optimized database running on MySQL to determine which stories meet your criteria. However, if you want to search on free-form text by using the Search Term box on the Control tab, we turn to a search engine to calculate your results, even if your query also contains one or more selections from the other tabs of the Query Generator.
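To make that routing rule concrete, here is a minimal Python sketch, assuming the decision depends only on whether a non-empty searchTerm parameter is present. The function name choose_backend and the labels it returns are hypothetical and are not part of the API itself.

```python
from urllib.parse import urlsplit, parse_qs

def choose_backend(query_url):
    """Guess which backend would answer a Story API query URL.

    Assumption: any query with a non-empty searchTerm goes to the search
    engine, and everything else is answered from the database.
    """
    # parse_qs drops parameters with blank values by default,
    # so "searchTerm=" counts as "no search term".
    params = parse_qs(urlsplit(query_url).query)
    if params.get("searchTerm"):
        return "search"
    return "database"

print(choose_backend("http://api.npr.org/query?id=5201175&output=RSS"))
print(choose_backend("http://api.npr.org/query?id=5201175&searchTerm=house&output=RSS"))
```

The second URL still carries an id selection, but the presence of searchTerm is what sends it to the search engine.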

New Features

The integration of the GSA into the API allowed us to add several new features to the Story API search.

Search with Required Assets

First, you can now use the requiredAssets parameter when querying the Story API via search. The requiredAssets parameter allows you to limit your results to stories that contain text, images, or audio (or any combination of these). This is particularly useful if you are using the API to generate a podcast, because you can now make sure all your stories contain audio. To use the requiredAssets parameter, go to the Control tab of the Query Generator and select any combination of the check boxes labeled "Show only stories with:". As an example, here is a search for stories that contain the word phone in their main text and that also contain images:

http://api.npr.org/query?requiredAssets=image&searchTerm=phone&output=MediaRSS&searchType=mainText&apiKey={YOUR_KEY_HERE}

(You must register for an API key and substitute it for {YOUR_KEY_HERE} to use these examples.)
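If you build these URLs in code rather than with the Query Generator, something like the following Python sketch may help. The parameter names come from the example above; the helper build_search_url is hypothetical, and joining several assets with commas is an assumption that you should check against the Query Generator's output.

```python
from urllib.parse import urlencode

API_BASE = "http://api.npr.org/query"

def build_search_url(search_term, api_key, required_assets=(),
                     search_type=None, output="MediaRSS"):
    """Build a Story API search URL (hypothetical helper)."""
    params = {"searchTerm": search_term, "output": output}
    if required_assets:
        # Assumption: multiple assets are passed as a comma-separated list.
        params["requiredAssets"] = ",".join(required_assets)
    if search_type:
        params["searchType"] = search_type
    params["apiKey"] = api_key
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return API_BASE + "?" + urlencode(params, safe=",")

# Substitute your own key for the placeholder.
print(build_search_url("phone", "YOUR_KEY_HERE",
                       required_assets=["image"], search_type="mainText"))
```

Passing required_assets=["audio"] would give you the podcast-friendly variant described above.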

Note that this example uses the searchType=mainText parameter, which corresponds to the "Summary of Story" option in the Query Generator. When this option is used, we only return stories that contain the search terms in the title or teaser of the story. The default search option is to search the full text of the story. When you use the mainText option, stemming is not performed on your search terms, so you need to be precise.

Search by Reporter

Second, you can now search by keywords and limit your results to stories by a particular reporter. To do this, select a reporter on the Bios tab of the Query Generator and then enter your search term on the Control tab. For example, here is a Story API Query for stories containing the word "house" that were written by Michel Martin:

http://api.npr.org/query?id=5201175&searchTerm=house&output=RSS&apiKey={YOUR_KEY_HERE}

Sort by Relevance

Third, we have added a new sorting option for Story API Queries that use search: sort by relevance. This is still an experimental feature, so we haven't added it to the Query Generator tool yet, but you can access it by adding &sort=relevance to your query. For example, this search for "glee":

http://api.npr.org/query?searchTerm=Glee&output=MediaRSS&searchType=fullContent&apiKey={YOUR_KEY_HERE}&sort=relevance

returns more relevant results than the same search without the sort parameter. (By default, we sort by date.)
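Because the Query Generator does not emit this parameter yet, you may want to add it to URLs you have already generated. Here is a small Python sketch of one way to do that; the helper name with_relevance_sort is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_relevance_sort(url):
    """Return the same query URL with sort=relevance set exactly once."""
    parts = urlsplit(url)
    # Drop any existing sort parameter so we never send two of them.
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "sort"]
    params.append(("sort", "relevance"))
    return urlunsplit(parts._replace(query=urlencode(params)))

generated = ("http://api.npr.org/query?searchTerm=Glee&output=MediaRSS"
             "&searchType=fullContent&apiKey=YOUR_KEY_HERE")
print(with_relevance_sort(generated))
```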

Consistency

Finally, based on our testing, we believe that the GSA-backed search will provide a more robust and consistent set of results. And as Google improves its technology, we automatically get the improvements in the API.

Note that the Story API only returns story pages, so you will see differences between API results and the results of a search on NPR.org. The website's results can contain topic pages, bio pages, and some stories that we cannot include in the API, such as Associated Press stories.

Technical Details

The integration of the Google Search Appliance into the Story API took less than two days. The software for the API is primarily broken into three parts. First, there is an instance of a model, which generates a list of story IDs that match the criteria of an API query. Second, a view component retrieves the content of the stories and assembles them into a generic version of the response to be returned. Finally, there is a transform engine that applies various changes to the generic document. For example, if you have asked that only the title be returned, a transform will hide all the other elements in the generic document.

Overview of API Design

The API makes use of multiple models (yellow) to determine results and can display results in a multitude of formats by using multiple views (blue). Harold Neal/NPR
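The three-part design can be illustrated with a toy Python sketch. It is not our actual code: the in-memory STORIES store, the function names, and the "fields" query key are all hypothetical stand-ins chosen to show how a model, a view, and a transform hand work to one another.

```python
# Hypothetical in-memory story store standing in for the real content database.
STORIES = {
    101: {"title": "Phone privacy", "teaser": "Who can see your calls?"},
    102: {"title": "House prices", "teaser": "A look at the market."},
}

def search_model(query):
    """Model: return the IDs of stories whose title contains the search term."""
    term = query["searchTerm"].lower()
    return [sid for sid, story in sorted(STORIES.items())
            if term in story["title"].lower()]

def view(story_ids):
    """View: fetch each story and assemble the generic response document."""
    return [dict(STORIES[sid], id=sid) for sid in story_ids]

def fields_transform(document, fields):
    """Transform: hide every element except the requested fields."""
    if not fields:
        return document
    return [{k: v for k, v in story.items() if k in fields}
            for story in document]

def handle_query(query, model=search_model):
    """Run the pipeline; a different model can be swapped in via `model`."""
    story_ids = model(query)
    document = view(story_ids)
    return fields_transform(document, query.get("fields"))

print(handle_query({"searchTerm": "phone", "fields": ["title"]}))
```

Because the view and transform only ever see a list of story IDs, a new way of finding stories only needs a new model function.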

For the GSA integration, all we had to do was create a new model class that uses the input parameters specified by a user to build a query URL that the GSA will understand. For some search criteria, we only had to translate the names of our query parameters into those used by the GSA. For other search criteria, we added custom metadata to the GSA to allow it to filter the results for our needs. The GSA runs the query and returns results in XML. From that XML, the model pulls out the story IDs, which then get passed to the view. Because each component has a single, isolated responsibility, the amount of work to do a new integration is very small. In addition, the different querying techniques used by different models can coexist. In fact, we have left our old search implementation in place for now so that we can compare results with the new implementation.
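As a rough illustration of what such a model does, here is a Python sketch under several stated assumptions. The host gsa.example.org, the metadata names has_image and storyId, and the exact GSA parameter names and XML element names used here are illustrative guesses and are not taken from our code; check them against the GSA documentation before relying on them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

GSA_SEARCH_URL = "http://gsa.example.org/search"  # hypothetical host

def build_gsa_url(api_params):
    """Translate Story API parameters into a GSA-style query URL."""
    gsa = {"q": api_params["searchTerm"], "output": "xml_no_dtd"}
    assets = api_params.get("requiredAssets")
    if assets:
        # Assumption: each asset type is indexed as custom metadata
        # (has_image, has_audio, ...) and "." joins required fields.
        gsa["requiredfields"] = ".".join(
            "has_%s:1" % asset for asset in assets.split(","))
    return GSA_SEARCH_URL + "?" + urlencode(gsa)

def extract_story_ids(xml_text):
    """Pull story IDs out of a results document.

    Assumption: each <R> result carries an <MT N="storyId" V="..."/> element.
    """
    ids = []
    for result in ET.fromstring(xml_text).iter("R"):
        for meta in result.findall("MT"):
            if meta.get("N") == "storyId":
                ids.append(int(meta.get("V")))
    return ids

# Made-up response used only to exercise the parser.
SAMPLE = """<GSP><RES>
  <R N="1"><U>http://example.org/a</U><MT N="storyId" V="101"/></R>
  <R N="2"><U>http://example.org/b</U><MT N="storyId" V="205"/></R>
</RES></GSP>"""

print(build_gsa_url({"searchTerm": "phone", "requiredAssets": "image"}))
print(extract_story_ids(SAMPLE))
```

The list returned by extract_story_ids is the only thing the rest of the API needs from the search engine, which is why a new model can be swapped in without touching the views.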
