May 20, 2014

Algorithmic Tagging of HackerNews (or any other site)

Part of making algorithms more discoverable is creating metadata tags to classify them. Sites often let users pick their own tags, but what if the content has already been generated? This is the problem we faced when trying to tag all the algorithms in our API. Each algorithm had a description page, and we believed we could generate tags for each one using some simple machine learning algorithms already in our API.

By generating tag data, it becomes easy to classify documents, make recommendations, optimize SEO, etc. Below we show how we approached this task, using HackerNews as an example data source.

Full demo site

So how did we do this? Our secret sauce is that Algorithmia is designed to make it extremely easy to combine algorithms into a pipeline that processes pages and generates tags for almost any site.

The basics:

  • Given a site, pull the data and iterate over every link
  • Extract the text from each linked page
  • Run the text through a topic analysis algorithm (such as Latent Dirichlet Allocation)
  • Return tagged data
  • Render tags next to links
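
The steps above can be sketched in a few lines of standalone Python. This is not the Algorithmia pipeline itself, only a minimal illustration under some assumptions: the names `parse`, `fetch`, `top_terms`, and `tag_site` are hypothetical, the stopword list is a tiny placeholder, and a plain term-frequency count stands in for a real topic model such as Latent Dirichlet Allocation. The final rendering step is left to the front end.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Tiny placeholder stopword list (assumption); a real one would be much larger.
STOPWORDS = {"the", "and", "for", "that", "with", "this", "are", "was", "you", "from"}


class _PageParser(HTMLParser):
    """Collects link targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text.append(data)


def parse(html):
    """Return (links, visible_text) for an HTML string."""
    parser = _PageParser()
    parser.feed(html)
    return parser.links, " ".join(parser.text)


def fetch(url):
    """Download a page and decode it as UTF-8, replacing undecodable bytes."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def top_terms(text, n=3):
    """Stand-in for topic analysis: the n most frequent non-stopword terms."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


def tag_site(index_url, fetch=fetch, tagger=top_terms):
    """Pull the index page, visit every link, and map each URL to its tags."""
    links, _ = parse(fetch(index_url))
    tags = {}
    for href in links:
        url = urljoin(index_url, href)  # resolve relative links
        if not url.startswith(("http://", "https://")):
            continue  # skip mailto:, javascript:, and similar schemes
        try:
            _, text = parse(fetch(url))
        except OSError:
            continue  # unreachable pages are simply left untagged
        tags[url] = tagger(text)
    return tags
```

Because `fetch` and `tagger` are passed in as parameters, each stage can be swapped independently: a stub `fetch` makes the pipeline testable offline, and a real topic model can replace `top_terms` without touching the crawling code.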

This is really just a pipeline of algorithms (plus some clever front-end JS :P). Normally this would require some serious code stitching and algorithm development, but most of the components already existed in the Algorithmia API, so the solution was extremely clean. Check it out:


This also works for arbitrary webpages. Enter a URL below to automatically generate tags for it:

Once we had these tags, it became easy to classify algorithms, make recommendations, and relate one algorithm to another. We’ll leave that for the next post…

Get topic tags for your site dynamically: contact us!
