Peeling back layers of content
At our annual rax.io internal technical conference in San Antonio this week, I had a blast hacking on a reporting tool for our new content engine behind developer.rackspace.com and support.rackspace.com.
Presentation layer: the web pages
To take a look at each layer, start with the obvious one: the one you read! Every page on developer.rackspace.com/docs and support.rackspace.com/how-to presents lovely documentation.
Each page has a particular layout that makes it useful. For example, one page walks through server image restoration with the dashboard, and another shows how to create an image of a server with the API.
Authoring layer: the source files
Peeling off that presentation layer and looking at the source files, you find RST and Markdown files, which are stored and edited in GitHub. We can edit with authors around the world on GitHub, and it's truly amazing.
Delicious layer: the content API
The next layer is the one I wanted to hack on this week, because the content service enables a content API. Typically, we use the API to post the content for display in the upper layers. Now that we have migrated to this new system, we have a way to report on the content for quality and completeness. I started with some use cases, wrote them down in the README, and got going.
One of the best parts of this learning curve was realizing how helpful IPython is for this type of development: type in a few ideas, import pythonfilename, edit some more, reload the .py file with reload(pythonfilename), call the function directly in IPython with tab completion, and keep going.
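To make that edit-and-reload loop concrete, here is a minimal sketch. It assumes Python 3, where reload lives in importlib (in Python 2 it was a builtin), and it uses a throwaway module named scratch_report, a hypothetical stand-in for whatever file you are editing.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module to a temp directory. "scratch_report" is a
# hypothetical name standing in for the file you are hacking on.
workdir = Path(tempfile.mkdtemp())
module_path = workdir / "scratch_report.py"
module_path.write_text("def greeting():\n    return 'first draft'\n")

# Make the directory importable, then import the module once.
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()
scratch_report = importlib.import_module("scratch_report")
first = scratch_report.greeting()

# "Edit some more": change the file on disk, then reload it in place
# instead of restarting the interpreter.
module_path.write_text("def greeting():\n    return 'second draft'\n")
importlib.reload(scratch_report)
second = scratch_report.greeting()

print(first, "->", second)
```

In an IPython session you would skip the temp-file setup and simply rerun the reload line after each save.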
I focused squarely on my Python knowledge and set about making some API calls with the requests library. The first order of business was to get a list of content IDs. My first thought was to use the GitHub API and search for repos with "docs-" in the name. Then I looked for a Python library to do that and to scope it only to the rackerlabs organization. Two of my teammates helped me find suitable Python libraries, and I went with one, but I couldn't figure out authentication in time to demo in the afternoon. So, we created a list by hand, and the code iterated through that list to create URL-encoded content IDs that the content API can understand.
Here's a GitHub repo URL:
Here's a URL-encoded content ID:
Thanks, Python requests.utils.quote!
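As a rough sketch of that encoding step: requests.utils.quote is the standard library's urllib quote function re-exported by requests, so the example below calls urllib.parse.quote directly to stay dependency-free. The repo names in the list are hypothetical placeholders, and passing safe="" assumes the content API wants slashes percent-encoded too.

```python
from urllib.parse import quote

# Hand-built list of repo URLs. These repo names are hypothetical
# placeholders, not real rackerlabs repositories.
repo_urls = [
    "https://github.com/rackerlabs/docs-example-one",
    "https://github.com/rackerlabs/docs-example-two",
]


def to_content_id(repo_url):
    """Percent-encode a repo URL so it can travel inside another URL.

    safe="" also encodes "/" (quote leaves it alone by default); that
    choice is an assumption about what the content API expects.
    """
    return quote(repo_url, safe="")


content_ids = [to_content_id(url) for url in repo_urls]

for content_id in content_ids:
    print(content_id)
```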
Now to take that list of content IDs and look at loads of metadata. I fed the content IDs into another function, which uses Python requests to get the JSON from the content service and then looks only at the titles.
As an example, look at what you get back when you do a GET for a content ID by clicking this URL.
Lots of JSON! I can get some metadata, such as the title, and with a list of content IDs, I could get a list of titles. Or a list of authors, or a list of even more metadata. Exciting!
...
RackConnect API 3.0
...
Rackspace Private Cloud Powered By Red Hat
...
None
Rackspace How-To Articles
And look at the next to last line: the script already found a content ID with no title, though in this case it all checks out fine.
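The title-listing step described above could look something like the sketch below. Several things here are assumptions rather than the actual script: the base URL is a placeholder, the /content/ path and the "envelope" and "title" keys are guesses at the response shape, and it uses the standard library's urllib instead of requests so it runs without extra installs. The fetch function is passed in as a parameter so the logic can be tried without a network connection.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Placeholder base URL, not the real content service endpoint.
CONTENT_SERVICE = "https://content.example.com"


def fetch_json(url):
    """GET a URL and decode the response body as JSON."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def get_title(content_id, fetch=fetch_json, base_url=CONTENT_SERVICE):
    """Return the title for one content ID, or None if there isn't one.

    The "/content/<id>" path and the "envelope" -> "title" keys are
    assumptions about the API, made for illustration.
    """
    url = "{}/content/{}".format(base_url, quote(content_id, safe=""))
    document = fetch(url)
    return document.get("envelope", {}).get("title")


def list_titles(content_ids, fetch=fetch_json):
    """Collect a title (or None) for every content ID in the list."""
    return [get_title(content_id, fetch=fetch) for content_id in content_ids]
```

A missing title comes back as None, which is how a report can flag content that needs a closer look.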
I can also search through our existing content using the /search?q=:term operation. After the hackathon I added the ability to do a search query based on a term. How about searching for RackConnect using the content API and getting back some JSON as well as a count of 210 results?
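Here is a small sketch of building that search request and counting what comes back. The base URL is a placeholder, and the "total" and "results" keys are assumptions about the shape of the search response; only the /search?q= pattern comes from the operation named above.

```python
from urllib.parse import urlencode


def build_search_url(term, base_url="https://content.example.com"):
    """Build a /search?q=<term> URL, encoding the term safely.

    The default base_url is a placeholder, not the real endpoint.
    """
    return "{}/search?{}".format(base_url, urlencode({"q": term}))


def count_results(search_response):
    """Return the number of hits in a decoded search response.

    Assumes a "total" field holds the overall count and falls back to
    counting the "results" list; both key names are guesses.
    """
    if "total" in search_response:
        return search_response["total"]
    return len(search_response.get("results", []))


print(build_search_url("RackConnect"))
```

Fetching the URL and decoding the JSON works the same way as for a single content ID; count_results then takes the decoded dictionary.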
The endpoint for our content API is open for read actions, so if you're interested you can take a look at the work so far at https://github.com/deconst/cli-deconst/ and join in. The content API is documented in the content-service repo. Next we'll wrap it up in a CLI for easier reporting. Feel free to join in the delicious content API layer fun.