Using Phoenix & Elixir to turn HackerNews frontpage into a JSON API


I love turning web pages into structured APIs, and lately I’ve been learning both Elixir & Phoenix. It was only natural to mix the two and put my freshly acquired skills to use by turning one of my favorite websites (i.e. Hacker News) into an API.

> I’m aware of the official Hacker News API, which is way better, but I just wanted to play around and satisfy my itch with Elixir & Phoenix, so I went ahead and spent a couple of hours on this small side project.

Here’s the final live result of how the API behaves:

The complete source code is available in the GitHub repo here:

I started out by creating a new Phoenix project:

Then I connected a simple route to a controller action and had it output “Hello World” in JSON format for now.

I decided to use “/api/top-stories” as my route that will eventually render the HN frontpage stories.
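A route like this could be declared in the project's router roughly as follows. This is a sketch rather than the exact code from the repo: it assumes the app module is named `Hackernews` (inferred from the `Hackernews.Frontpage` module used later) and follows the router layout that the Phoenix generator of that era produced, so it belongs inside the generated project and does not run on its own.

```elixir
# web/router.ex (inside the generated Phoenix project)
defmodule Hackernews.Router do
  use Hackernews.Web, :router

  # JSON-only pipeline, as generated by Phoenix.
  pipeline :api do
    plug :accepts, ["json"]
  end

  scope "/api", Hackernews do
    pipe_through :api

    # GET /api/top-stories is handled by StoriesController.top_stories/2
    get "/top-stories", StoriesController, :top_stories
  end
end
```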

A StoriesController.top_stories/2 action was needed to respond to requests on this endpoint:

Visiting http://localhost:4000/api/top-stories in my development environment now rendered the “Hello World” I was expecting. The relevant git commit is here.
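A minimal version of such an action might look like the sketch below. The module name `Hackernews.StoriesController` and the shape of the JSON body are assumptions; `json/2` is the Phoenix controller helper that encodes a term and sends it as the response. Like any controller, this lives inside the Phoenix project.

```elixir
# web/controllers/stories_controller.ex (inside the Phoenix project)
defmodule Hackernews.StoriesController do
  use Hackernews.Web, :controller

  # Placeholder action: always answers with a fixed JSON body.
  def top_stories(conn, _params) do
    json conn, %{message: "Hello World"}
  end
end
```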

Time to make things interesting: let’s fetch the HN front page, parse the response and turn it into a structured map that Phoenix can render as JSON. HTTPoison and Floki are two excellent Elixir libraries that we can use for the task. If you are interested, you can refer to this git commit to see how and where to add Hex packages to your Phoenix project.

I started from the controller first, because I like to hash out the API of my code before I’ve even written any code. It’s clear to me that my controller needs to handle two simple cases of success and error for now. There may be more edge cases to handle, but we’ll keep things simple for now.
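The two cases could be handled with a single `case` on the result of the fetch, as in this sketch. The 502 status code and the shape of the error body are my own illustrative choices, not something taken from the repo, and the module name is again assumed.

```elixir
# web/controllers/stories_controller.ex (inside the Phoenix project)
defmodule Hackernews.StoriesController do
  use Hackernews.Web, :controller

  def top_stories(conn, _params) do
    case Hackernews.Frontpage.fetch do
      # Success: render the parsed stories as JSON.
      {:ok, stories} ->
        json conn, stories

      # Failure: report the reason with an error status (502 is an assumption).
      {:error, reason} ->
        conn
        |> put_status(502)
        |> json(%{error: reason})
    end
  end
end
```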

Having modified our controller with the above code, I needed a Hackernews.Frontpage.fetch/0 function that returns the desired tuple of either {:ok, response} or {:error, reason}. Placing the module in a location such as lib/ would have forced me to restart the app every time I made a change, so I decided to add it to my web/models folder.

HTTPoison offers an HTTPoison.Base module that can be used as a mixin within your own modules to make it easier to create HTTP clients, so I decided to use that.

Invoking Hackernews.Frontpage.get("/") fires off the request to HN’s front page, and because we are using HTTPoison.Base in our module, we can override the Hackernews.Frontpage.process_url/1 and Hackernews.Frontpage.process_response_body/1 functions. process_url/1 turns the path into a full URL before the request is sent, and process_response_body/1 is invoked when the HTTP request finishes, with the HTML body of the response as its argument.
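Put together, the skeleton of such a module might look like the sketch below. It only shows the HTTPoison.Base wiring: the body is passed through unchanged here, and the base URL, the status handling and the error messages are assumptions of mine. Newer HTTPoison releases prefer the name `process_request_url/1` for the URL callback.

```elixir
# web/models/frontpage.ex (inside the Phoenix project, needs the :httpoison dependency)
defmodule Hackernews.Frontpage do
  use HTTPoison.Base

  # Returns {:ok, body} or {:error, reason}, the contract the controller expects.
  def fetch do
    case get("/") do
      {:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
        {:ok, body}

      {:ok, %HTTPoison.Response{status_code: code}} ->
        {:error, "unexpected status #{code}"}

      {:error, %HTTPoison.Error{reason: reason}} ->
        # inspect/1 keeps the reason encodable as JSON whatever its shape.
        {:error, inspect(reason)}
    end
  end

  # Called before the request: prefix the path with the (assumed) HN host.
  def process_url(path), do: "https://news.ycombinator.com" <> path

  # Called after the request: the HTML parsing would go here.
  def process_response_body(body), do: body
end
```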

From here onwards it’s up to us to parse the HTML and turn it into a structured Map that our controller can easily encode into valid JSON. After looking at the HTML structure, it became clear that 3 table rows (<tr>) were being used to represent a single story on the HN front page, and because there are 30 stories on the homepage, I just took the first 90 table rows (Enum.take(90)) and then grouped them in groups of 3 (Enum.chunk(3)).

This gives us a List of 30 elements, where each element contains the necessary markup to extract out everything there is about an HN story.
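The take-and-group step can be tried on its own with stand-in data. The sketch below uses plain strings in place of parsed `<tr>` nodes, and `Enum.chunk_every/2`, which replaces the now-deprecated `Enum.chunk/2` in current Elixir versions; the module name `ChunkSketch` is hypothetical.

```elixir
defmodule ChunkSketch do
  # Keep the first 90 rows and group them three at a time, one group per story.
  def group_rows(rows) do
    rows
    |> Enum.take(90)
    |> Enum.chunk_every(3)
  end
end

# Stand-in data: 95 fake rows instead of parsed <tr> nodes.
rows = for n <- 1..95, do: "row #{n}"
stories = ChunkSketch.group_rows(rows)

IO.inspect(length(stories))  # 30
IO.inspect(hd(stories))      # ["row 1", "row 2", "row 3"]
```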

Each of these elements is then passed to the Hackernews.Frontpage.extract_story/1 function, which takes care of turning the markup into a structured Map. This was really the most time-consuming part of the whole project, as I had to go through each attribute of a story and figure out how to extract it.
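To give an idea of what the extraction involves, here is a rough sketch using Floki's `find/2`, `text/1` and `attribute/2`. The CSS selectors and the set of keys are assumptions about HN's markup made for illustration, not the ones used in the repo, and the module name `Hackernews.StorySketch` is hypothetical. It needs the :floki dependency and expects one chunk of three already-parsed rows.

```elixir
defmodule Hackernews.StorySketch do
  # `rows` is one group of three parsed <tr> nodes for a single story.
  def extract_story(rows) do
    # Assumed selector: the first link inside the title cell is the story link.
    link = rows |> Floki.find("td.title a") |> Enum.take(1)

    %{
      title: Floki.text(link),
      url: link |> Floki.attribute("href") |> List.first(),
      # Assumed selector: the score lives in a span with class "score".
      score: rows |> Floki.find("span.score") |> Floki.text()
    }
  end
end
```

Every value comes back as a string (or nil for a missing href), which is why the counts end up as strings in the rendered JSON.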

Once extracted, the returned Map is simply put back into a List which is then passed back into the controller and rendered as JSON. You can see the result here:

I already have ideas on how to further refactor my extract_story/1 function, and probably also render attributes such as the comment count and score as JSON Number values instead of strings, but I decided to stop here as this was only supposed to be a fun side project. And I’m quite happy with how it turned out.

Given that I’m still learning Elixir/Phoenix and invested only about 2 hours in this project from start to finish, I was quite happy with my productivity during the process. The fact that I did not have to trade performance for that productivity was the icing on the cake.

I’ve also spent a fair amount of time learning more about OTP, Ecto, the model layer, Phoenix Channels and so on, and I’m already planning to adopt and push for Elixir & Phoenix on our client projects here at Metaware.
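The string-to-number conversion could be done with `Integer.parse/1`, which reads a leading integer and ignores the rest. The sketch below assumes the extracted text looks like "123 points" or "45 comments" and falls back to 0 when there is no leading number; the module and function names are hypothetical.

```elixir
defmodule CountSketch do
  # Turn text such as "123 points" into the integer 123.
  # Falls back to 0 when the text does not start with a number.
  def to_count(text) when is_binary(text) do
    case Integer.parse(String.trim(text)) do
      {number, _rest} -> number
      :error -> 0
    end
  end
end

IO.inspect(CountSketch.to_count("123 points"))   # 123
IO.inspect(CountSketch.to_count("45 comments"))  # 45
IO.inspect(CountSketch.to_count("discuss"))      # 0
```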

Please feel free to write to me at jasdeep [at] metawarelabs [dot] com in case you have any questions.

Special thanks to Gagandeep, Dann and Manpreet for their feedback on this.