A lot of work has been going on recently under the hood of NPR's API. A quirk of the versioning system we use for the API is that only the native format, NPRML, has our version attached. One reason for that is that many formats, such as RSS, have a "version" already specified for them, while NPRML was originally derived from an internal XML-based format NPR uses. We've made some fairly major changes recently. Even though the bulk of them are not very visible to the outside world, they seem to warrant at least a small version bump, and so we rolled out NPRML 0.94 on Wednesday the 16th of February, 2011.
We've been calling this the "API Refactor," though it goes further than what the term refactor usually covers. It wasn't prompted by any major problem with the running system, but by the knowledge that major changes will be needed as we expand usage of the API.
When an API request comes in, one of the request parameters is the output type, which can be NPRML, RSS, JSON, or one of several others. Prior to version 0.94, choosing a different output type resulted in a lot of work specific to that output type being done, and changes or upgrades could easily be (and sometimes were) accidentally applied only to one or some output types. In general, these output types are supposed to be different display formats for the same data, so this situation was suboptimal. It's also true that there are times when one output type really does need to have some specific difference from the other output types — other than, of course, the format itself — and so we still needed to be able to apply such differences where they were wanted.
Later this year, we expect to have a lot more API Ingest activity, with many public radio stations adding their own content to our system. Since various legal requirements exist regarding the rights to some of this content, we needed to have a more flexible rights management system, so that stations can control which resources are available through the API and to whom. Previously, we'd only needed a simpler system in which we could just add business rules directly if needed, so thinking about the requirements for this wider availability helped us flesh out the design for the new systems we're building.
Lastly, our original design was to gather all the information about every resource associated with a request, pare away the parts that weren't relevant or were restricted by rights considerations, and then add metadata derived from the document itself (some of our metrics information, for example). We did this by building an XML document containing all possible results and then using XPath, an XML query language, to find, alter, and sometimes remove information. We called this the "Transform" layer. It worked quite well for most requests, but it had drawbacks. When we wanted to add a new tag, we had to ensure it didn't collide with XPath queries we were already using. When we wanted to remove some information from the initial superdocument, it was very difficult to be certain that no XPath query in one or more Transforms depended on it. We had mitigated some of this with extensive tests, run throughout each release, that checked we were still getting the expected data (and not getting extraneous data) after each change. However, there was always the risk that a change would be made that our existing tests didn't cover.
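To make the pattern concrete, here is a minimal sketch in Python of a superset document being pared down by XPath-driven transforms. It is illustrative only: the element names, the `rights` attribute, and both transform functions are invented for this example and are not the code or markup that ran in the API.

```python
import xml.etree.ElementTree as ET

# Hypothetical superset document. Element and attribute names are invented
# for illustration and are not the real internal format.
SUPERDOC = """
<list>
  <story id="1">
    <title>Public story</title>
    <audio rights="restricted">internal.mp3</audio>
    <text>Body text</text>
  </story>
  <story id="2">
    <title>Another story</title>
    <audio rights="open">open.mp3</audio>
    <text>More text</text>
  </story>
</list>
"""


def remove_restricted(root):
    """Transform: drop every element marked rights="restricted"."""
    # Snapshot the tree first so removing children does not disturb iteration.
    for parent in list(root.iter()):
        # XPath-style query: direct children carrying the restricted flag.
        for child in parent.findall("*[@rights='restricted']"):
            parent.remove(child)
    return root


def add_story_count(root):
    """Transform: add metadata derived from the document itself."""
    root.set("count", str(len(root.findall("story"))))
    return root


root = ET.fromstring(SUPERDOC)
# Each transform queries the whole document and edits it in place.
for transform in (remove_restricted, add_story_count):
    root = transform(root)

print(ET.tostring(root, encoding="unicode"))
```

The fragility described above shows up even at this scale: if a new element were later given a `rights` attribute for some other purpose, `remove_restricted` would silently start deleting it, and nothing in the query itself would warn anyone.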
So, we wanted the API refactor to make it:
- easy to make changes that apply to the results without regard to the output type;
- easy to also make changes that apply only to a specific output type;
- possible for a given API Ingest user to have their own set of rights-handling rules, so that they can use the API to distribute their content without undue concern;
- easier to make changes and additions to formats (especially our native format, NPRML) without worry that we're exposing too much or not enough data for a given request.
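The third goal, per-user rights handling, can be pictured as a table of rules keyed by content owner. The sketch below is one possible shape for that idea, not a description of the actual rules system: the `Story` fields, the owner names, the API keys, and every rule function are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Story:
    owner: str                       # organization that ingested the story
    title: str
    audio_url: Optional[str] = None


# A rights rule receives a story and the caller's API key. It returns the
# story as that caller may see it, or None to withhold the story entirely.
RightsRule = Callable[[Story, str], Optional[Story]]


def allow_all(story: Story, api_key: str) -> Optional[Story]:
    return story


def partners_only(partners: set) -> RightsRule:
    """Build a rule that only shows stories to a fixed set of API keys."""
    def rule(story: Story, api_key: str) -> Optional[Story]:
        return story if api_key in partners else None
    return rule


def strip_audio(story: Story, api_key: str) -> Optional[Story]:
    """Keep the story but never expose its audio."""
    return replace(story, audio_url=None)


# Hypothetical configuration: each owner supplies its own list of rules.
RULES_BY_OWNER: Dict[str, List[RightsRule]] = {
    "station-a": [partners_only({"key-123"})],
    "station-b": [strip_audio],
}


def apply_rights(story: Story, api_key: str) -> Optional[Story]:
    """Run the owner's rules in order; owners with no entry allow everything."""
    current: Optional[Story] = story
    for rule in RULES_BY_OWNER.get(story.owner, [allow_all]):
        current = rule(current, api_key)
        if current is None:
            return None
    return current
```

For example, `apply_rights(Story("station-a", "Local news"), "some-other-key")` returns `None`, while the same story requested with `"key-123"` comes back unchanged. Keeping each owner's rules in data rather than in hard-coded business logic is what lets one owner's restrictions change without touching anyone else's.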
How things were
As mentioned above, our system used to build a document that contained all the information we might possibly return for a request. This document, which was a superset of NPRML, might then be run through the transformation layer to produce NPRML output, or it might be used to construct some very different output, like RSS or HTML, which would then be run through the transforms. The work of building the XML was done in classes we called Views (though, for those familiar with the MVC paradigm, they subsumed both the View and most of the Controller aspect). Views did much of the work of building the XML and held much of the implementation of various output rules. When adding a new behavior, it was often unclear whether it should go in the appropriate View(s), in the transformation layer, or in both. It was common, for example, to add flags to the data in a View, use them to do other work in a Transform, and then remove the flags in a different Transform that ran later.
In general, an API call went like this, neglecting our caching:
- Receive request
- Find and sort stories, applying some business rules
- Build superset document in XML of all story data found, applying some business rules
- Slice and dice this superset to eliminate or add data per business rules
How things are
Now, our API call looks more like this:
- Receive request
- Apply business rules to request via rules system
- Find and sort stories
- Build story models
- Apply business rules to model via rules system
- Build outgoing XML
Figure: A high-level overview of the Story API
In some cases, the final output isn't XML, and in that case there's a conversion step after the above.
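The flow above can be sketched end to end in a few dozen lines of Python. This is a simplified illustration under stated assumptions, not the real implementation: the `Request` and `StoryModel` fields, the in-memory `STORIES` list, and all three rules are invented, and the XML it emits is a stand-in rather than real NPRML.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Request:
    output: str = "NPRML"
    num_results: int = 10


@dataclass(frozen=True)
class StoryModel:
    id: int
    title: str
    teaser: str


# Stand-in for the story database (hypothetical data).
STORIES = [
    StoryModel(1, " First story ", "Teaser one"),
    StoryModel(2, "Second story", "Teaser two"),
    StoryModel(3, "Third story  ", "Teaser three"),
]


def cap_results(request):
    """Request rule: limit how many stories one call may ask for."""
    return replace(request, num_results=min(request.num_results, 2))


def tidy_title(story):
    """Model rule that applies regardless of output type."""
    return replace(story, title=story.title.strip())


def drop_teaser(story):
    """Model rule wanted for one output type only (invented example)."""
    return replace(story, teaser="")


REQUEST_RULES = [cap_results]
# Each entry pairs a rule with the output types it applies to; None means all.
MODEL_RULES = [(None, tidy_title), ({"JSON"}, drop_teaser)]


def find_stories(request):
    newest_first = sorted(STORIES, key=lambda s: s.id, reverse=True)
    return newest_first[: request.num_results]


def build_xml(stories):
    root = ET.Element("list")
    for story in stories:
        el = ET.SubElement(root, "story", id=str(story.id))
        ET.SubElement(el, "title").text = story.title
        ET.SubElement(el, "teaser").text = story.teaser
    return root


def xml_to_json(root):
    """Conversion step for output types that are not XML."""
    return json.dumps([
        {"id": el.get("id"), "title": el.findtext("title"),
         "teaser": el.findtext("teaser")}
        for el in root.findall("story")
    ])


def handle(request):
    for rule in REQUEST_RULES:                 # rules applied to the request
        request = rule(request)
    stories = find_stories(request)            # find, sort, build models
    for outputs, rule in MODEL_RULES:          # rules applied to the models
        if outputs is None or request.output in outputs:
            stories = [rule(s) for s in stories]
    root = build_xml(stories)                  # build outgoing XML
    if request.output == "JSON":
        return xml_to_json(root)
    return ET.tostring(root, encoding="unicode")
```

Because the rules run on plain model objects before any XML exists, a rule meant for every output type is written once, and a rule meant for a single output type is just an entry with a narrower set of outputs. Nothing has to search a finished document to undo earlier work.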
While the main motivation for this work was to more easily support upcoming changes, we had two speed-related expectations. First, we believed that speed improvements, like our other improvements, could now be made more easily. Second, we thought there might be some improvement in responsiveness even without specific attention to it, simply because we no longer traverse a gigantic XML document repeatedly to implement business rules. This turned out to be true: since the API refactor went live on api.npr.org, our average response time has dropped from above 0.45 seconds to about 0.35 seconds, a reduction of more than 20%.
How things will be
The work we did over the winter on this refactor will continue to pay off, but we have about as much left to do as we've done. The only part of the API we changed was the Story API, which serves up, as the name implies, stories. The other API types, such as the lists of topics, artists, and so on, and the Player API that supports the NPR multimedia player, are yet to be converted to the new API system.
In future months, we plan to finish converting the rest of our API types, and we expect to continue reaping the benefits of this project for years to come.