January 10, 2018

Extending Alexa's AI with Algorithmia Microservices

Amazon reports that there are now "tens of millions" of Alexa-enabled devices in use, from the compact Echo Dot to the revamped Alexa-enabled Fire Stick and Kindle. Voice-enabled devices are hotter than ever, but would be nearly useless without the wide variety of external services they rely on. Whether you're asking Alexa to turn on the lights or tell you the weather, there's a microservice in the loop, responding intelligently to your requests.

As a developer, how do you bring your own algorithm or service into Alexa? If your code is relatively simple Node.js, Python, Java, or C#, you can use AWS Lambda for your base logic. If you're using other languages, complex frameworks, or large GPU-dependent machine learning models, you may want to consider Algorithmia. Even if your core functionality is simple, Algorithmia's library of 4500+ ready-to-run algorithms can superpower your Alexa app, quickly adding advanced NLP, web scraping, image processing, and other turnkey machine-learning tools.
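To show roughly what "hooking up" Lambda to Algorithmia involves, here is a minimal JavaScript (Node.js) sketch that builds a request to Algorithmia's REST API and unwraps the reply. It assumes the endpoint form https://api.algorithmia.com/v1/algo/<owner>/<algorithm>[/<version>], an "Authorization: Simple <API key>" header, and a JSON reply that carries either a result or an error field. The helper names buildAlgoRequest, extractResult, and callAlgorithm are hypothetical. Confirm these details against Algorithmia's API documentation (or use their official client library) before relying on them.

```javascript
// Sketch: call an Algorithmia algorithm over REST from Node.js.
// ASSUMPTIONS: endpoint shape, "Simple" auth scheme, and a {result}/{error}
// reply format; all helper names here are hypothetical.
const ALGORITHMIA_BASE = 'https://api.algorithmia.com/v1/algo/';

// Build the URL and fetch() options for a single algorithm call.
function buildAlgoRequest(algoPath, input, apiKey) {
  return {
    url: ALGORITHMIA_BASE + algoPath,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: 'Simple ' + apiKey,
      },
      body: JSON.stringify(input),
    },
  };
}

// Unwrap the reply: return the result, or throw if the API reported an error.
function extractResult(reply) {
  if (reply.error) {
    throw new Error('Algorithmia error: ' + reply.error.message);
  }
  return reply.result;
}

// Glue the two together. Uses the global fetch() built into Node.js 18+;
// older runtimes would need an HTTP client such as the built-in https module.
async function callAlgorithm(algoPath, input, apiKey) {
  const { url, options } = buildAlgoRequest(algoPath, input, apiKey);
  const response = await fetch(url, options);
  return extractResult(await response.json());
}
```

For example, callAlgorithm('nlp/SentimentAnalysis', { document: text }, apiKey) would send the user's text to a sentiment model. That algorithm path, the omitted version, and the { document } input shape are illustrative assumptions, so check the algorithm's own page for its actual input format. Keeping the request-building and reply-parsing steps as pure functions makes them easy to test without network access.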

To help you along, we've created a simple step-by-step tutorial which shows you how to create an AWS Lambda function, hook it up to Algorithmia, and then trigger it from an Alexa Skill. If you work through the tutorial, you'll notice a few things:

1. You're never working with raw audio: Alexa translates voice to text before your Skill ever sees it, so your functions will accept text and respond with text and/or audio (and images/video if you're working with a device that has a screen).

2. The Alexa Skills Kit lets you map "Utterances" (things the user says) to "Intents" (the ways you'll respond). Essentially, it acts as a simple pattern-matcher, looking for key phrases and assigning a specific response or action to each match. There isn't any advanced Natural Language Processing baked in... at least, not in a way that is accessible to developers. If you want to do any complex analysis of the text, you'll need to use something outside the Alexa ecosystem, such as the Summarizer, Sentiment Analysis, or Parsey McParseface algorithms on Algorithmia. The easiest way to do this is to define a single Utterance that captures the user's entire speech as text and passes it to a single Intent, as in the tutorial.

3. Alexa translates voice to text very literally and doesn't attempt to interpret special characters. If the user says "Amazon dot com", that's the exact text your function will receive... not "Amazon.com", which is probably what they meant. If you care about special characters or similar spoken conventions, you'll need to replace them yourself in your Lambda function, typically with a regex such as text.replace(/ dot /gi,'.');

4. Both Lambda and Alexa Skills have time limits, but the user's expectations are the real limit: people expect fast responses when speaking. If your process will take some time, it can help to warn the user before starting it. Alexa also doesn't let you send unsolicited responses; each response must be triggered by a user request. This restricts how you can interact with the user, but there are workarounds that use pauses in responses and session state.
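Points 2 through 4 can be combined into one small Lambda handler. The JavaScript sketch below reads the raw text from a single catch-all slot, normalizes spoken punctuation with regexes like the one in point 3, passes the text to an analysis function, and replies with SSML that includes a short pause. It also keeps the session open, with state saved for the next turn. It assumes the standard Alexa custom-skill JSON request and response format. The intent name CatchAllIntent, the slot name Text, the session attribute lastText, and the analyze callback (a stand-in for a call to an Algorithmia algorithm) are all hypothetical.

```javascript
// Sketch of an Alexa Skill handler for AWS Lambda (Node.js).
// HYPOTHETICAL names: CatchAllIntent, slot "Text", session attribute "lastText",
// and the analyze() callback standing in for an Algorithmia call.

// Spoken forms of characters that Alexa will not convert for you (illustrative list).
const SPOKEN_SYMBOLS = [
  [/ dot /gi, '.'],
  [/ slash /gi, '/'],
  [/ underscore /gi, '_'],
];

// Apply each spoken-word replacement in order.
function normalizeSpokenText(text) {
  return SPOKEN_SYMBOLS.reduce((out, [pattern, symbol]) => out.replace(pattern, symbol), text);
}

// Escape characters that would otherwise break the SSML markup.
function escapeSsml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Build a custom-skill response: SSML speech, session state, and whether to end the session.
function buildResponse(ssmlBody, sessionAttributes, shouldEndSession) {
  return {
    version: '1.0',
    sessionAttributes,
    response: {
      outputSpeech: { type: 'SSML', ssml: '<speak>' + ssmlBody + '</speak>' },
      shouldEndSession,
    },
  };
}

// Handler factory so the analysis step can be swapped out (or stubbed in tests).
function makeHandler(analyze) {
  return async function handler(event) {
    const request = event.request;
    if (request.type !== 'IntentRequest' || request.intent.name !== 'CatchAllIntent') {
      return buildResponse('Tell me something to analyze.', {}, false);
    }
    const slot = request.intent.slots && request.intent.slots.Text;
    const text = normalizeSpokenText((slot && slot.value) || '');
    const answer = await analyze(text);
    // Speak a short pause before the answer; keep the text for a possible follow-up turn.
    return buildResponse(
      'Let me think. <break time="1s"/> ' + escapeSsml(String(answer)),
      { lastText: text },
      false
    );
  };
}

// In Lambda you would export a handler wired to a real analyzer, e.g.
// (callSomeAlgorithm is hypothetical):
// exports.handler = makeHandler(text => callSomeAlgorithm(text));
```

Injecting the analyzer keeps the Alexa-specific plumbing separate from whatever service does the real work. Note that replacing common words such as "dot" can misfire on ordinary speech ("polka dot"), so the replacement list should be tuned to what your Skill actually expects to hear.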

All done with the tutorial? By now you're hopefully thinking of Alexa Skills as a sort of text-based chatbot system which happens to listen and respond with speech on the user's end, and which can use external services to enhance its powers. If you're ready to explore further, you may want to take a look at:

Have fun, and let us know if there's any way we can help you along your journey!

Jon Peck
