Our National Archives: Updating NPR's Archive Architecture

One of the great aspects of NPR.org is our deep archive of free content. For example, you can browse our News topic archives back through 2004. (We have stories back through the mid-1990s, but the older stories are not classified into topics.) On the new site, we made a subtle but important change in how you can navigate these archives.

Previously, our archives used what might be called "search results" style navigation:


The first page of the archive displayed the next most recent 15 stories (after those appearing on the topic home page), and there was a set of numbers at the top and bottom of the page from 1 to 10 that allowed you to navigate further back. The number 2 took you to the next page of older results (results 31-45), the number 3 took you to the third page (results 46-60), and so on. For deep archives with hundreds of stories, there were special arrows that let you navigate to pages 11-20, 21-30, etc. of the archive.

If you've used any search engine, you will be familiar with this style of results. However, for an archive like ours, the date the content appeared is very important. Suppose you were interested in Politics stories about the 2006 midterm elections. The search results style navigation made it difficult to find the stories you were interested in. You had to either guess which page of results corresponded to 2006, or move backwards one page at a time until you found the stories you wanted.
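The page arithmetic behind this style of navigation can be sketched in a few lines. This is only an illustration, not the code the old site ran: the function name is made up, and it assumes the topic home page itself shows 15 stories, which is what makes archive page 3 land on results 46-60.

```python
PAGE_SIZE = 15          # stories per archive page, as described above
HOME_PAGE_STORIES = 15  # assumption: stories already shown on the topic home page


def result_range(page: int) -> tuple[int, int]:
    """Return the 1-based (first, last) result positions shown on an archive page."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    first = HOME_PAGE_STORIES + (page - 1) * PAGE_SIZE + 1
    return first, first + PAGE_SIZE - 1


for page in (1, 2, 3):
    first, last = result_range(page)
    print(f"page {page}: results {first}-{last}")
```

Nothing in this calculation knows about dates, which is why finding stories from a particular month meant guessing a page number.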

On the new site, we emphasize organization of the archives by date. All of the archive pages have a handy calendar in the right column that lets you jump to the time period of your choosing. From there you can still page around using the "New Stories" and "Older Stories" links if you don't know the exact date of the story you are looking for.

A second advantage of date-based archives is that when you bookmark an archive page, it will have the same set of stories when you come back to it later. With the old search results style navigation, new stories were continually being added to the top of the archives, pushing everything before them deeper into the results pages. As a result, when you came back to a bookmark a few days later, it contained a completely different set of stories.
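The bookmark problem is easy to reproduce with a toy model. The sketch below is a simplified illustration under stated assumptions, not NPR's implementation: stories are plain `(date, title)` tuples, the page size is 3 to keep the output short, and both function names are hypothetical.

```python
from datetime import date, timedelta

PAGE_SIZE = 3  # deliberately small so the example is easy to follow


def page_by_offset(stories, page):
    """Search-results style: slice the newest-first list by position."""
    newest_first = sorted(stories, key=lambda s: s[0], reverse=True)
    start = (page - 1) * PAGE_SIZE
    return newest_first[start:start + PAGE_SIZE]


def page_by_date(stories, on_or_before):
    """Date-based style: the newest PAGE_SIZE stories on or before an anchor date."""
    eligible = [s for s in stories if s[0] <= on_or_before]
    return sorted(eligible, key=lambda s: s[0], reverse=True)[:PAGE_SIZE]


# Six made-up stories, one per day, November 1-6, 2006.
archive = [(date(2006, 11, 1) + timedelta(days=i), f"story {i + 1}") for i in range(6)]

bookmark_offset_before = page_by_offset(archive, page=2)
bookmark_date_before = page_by_date(archive, date(2006, 11, 3))

# Two new stories are published after the bookmarks were saved.
archive += [(date(2006, 11, 7), "story 7"), (date(2006, 11, 8), "story 8")]

bookmark_offset_after = page_by_offset(archive, page=2)
bookmark_date_after = page_by_date(archive, date(2006, 11, 3))

print(bookmark_offset_before == bookmark_offset_after)  # False: the page drifted
print(bookmark_date_before == bookmark_date_after)      # True: the page is stable
```

The offset-based bookmark changes because its contents depend on how many stories sit above it, while the date-based bookmark depends only on the anchor date stored in the URL.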

Technical Design

Our old archive pages pulled results using complex queries against our main Oracle database. Each new type of archive required its own specialized query, which made the code harder to maintain. The new archive pages run directly off the NPR API. The API is well-suited to finding lists of stories by date, by topic, and by other criteria. The API uses a MySQL database with a schema that is optimized for these sorts of queries, and the query remains the same whether you are looking at topic archives, column archives, reporter archives, or any other type of archive on our site. By using the API, we get the advantage of the optimized schema plus the data caching built into the API. In pre-release testing, we found that the new archive pages are about 80 percent faster than the old archive pages. Finally, since the API has the logic built into it, we don't have to maintain distinct code to manage the way the results are returned.
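To illustrate the idea of one query serving every archive type, here is a minimal sketch. It is not the API's actual schema or query: the `story_group` table, its columns, the `fetch_archive` helper, the IDs, and the story titles are all invented for the example, and SQLite (from Python's standard library) stands in for MySQL so the code is self-contained.

```python
import sqlite3

# Hypothetical schema: one row per (story, grouping) pair, so topics, columns,
# and reporters are just different values of group_type.
SCHEMA = """
CREATE TABLE story_group (
    story_id   INTEGER NOT NULL,
    title      TEXT    NOT NULL,
    pub_date   TEXT    NOT NULL,  -- ISO 8601 date, so string order is date order
    group_type TEXT    NOT NULL,  -- 'topic', 'column', 'reporter', ...
    group_id   INTEGER NOT NULL
);
CREATE INDEX idx_group_date ON story_group (group_type, group_id, pub_date);
"""

# The same parameterized query is used for every kind of archive.
ARCHIVE_QUERY = """
SELECT story_id, title, pub_date
FROM story_group
WHERE group_type = ? AND group_id = ? AND pub_date <= ?
ORDER BY pub_date DESC, story_id DESC
LIMIT ?
"""


def fetch_archive(conn, group_type, group_id, on_or_before, limit=15):
    """Return up to `limit` stories in a grouping, newest first, up to a date."""
    params = (group_type, group_id, on_or_before, limit)
    return conn.execute(ARCHIVE_QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO story_group VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Midterm preview", "2006-11-06", "topic", 100),
        (2, "Election results", "2006-11-08", "topic", 100),
        (3, "Post-election analysis", "2006-11-09", "topic", 100),
        (2, "Election results", "2006-11-08", "reporter", 7),
    ],
)

print(fetch_archive(conn, "topic", 100, "2006-11-08"))
print(fetch_archive(conn, "reporter", 7, "2006-11-30"))
```

Only the parameters change between a topic archive and a reporter archive, so there is a single code path to maintain, and the composite index in this sketch lets the database answer date-anchored lookups without scanning unrelated stories.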