December 21, 2017

Advanced Image Manipulation and Data Extraction

A good image editor has a wide variety of features, from simple resizing to advanced photo manipulation. A good software platform needs similar tools, and when it runs in a scalable serverless environment, it can include a variety of powerful image-transformation and data-extraction algorithms fueled by machine learning.

We've been building up a library of image-related algorithms for some time, created both by our in-house staff and our amazing community of 60,000 developers. If you're interested in building algorithms and making them available to the community (as open-source or for royalty payments), it's easy to publish an algorithm on Algorithmia!

Meanwhile, check out these great tools which you can use from any programming language, allowing you to code up complex image-editing and image-analysis workflows with just a few lines of code...
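As a rough illustration of what "a few lines of code" can look like, the sketch below builds (but does not send) an HTTP request for one algorithm call, using only the Python standard library. The endpoint layout, the `Authorization: Simple ...` header, the `some_owner/...` path, and the input payload are all assumptions made for illustration; each algorithm's own page is the place to confirm its exact path and input format.

```python
import json
import urllib.request

API_BASE = "https://api.algorithmia.com/v1/algo"  # assumed endpoint layout


def build_request(algo_path, payload, api_key):
    """Build a POST request for one algorithm call (nothing is sent here)."""
    return urllib.request.Request(
        url=f"{API_BASE}/{algo_path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Simple {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


# Hypothetical owner, key, and input; replace with real values.
request = build_request(
    "some_owner/ColorfulImageColorization",
    {"image": "https://example.com/old-photo.jpg"},
    "YOUR_API_KEY",
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` and decoding the JSON body would complete the call; because this is plain HTTP, the same pattern carries over to any other language.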

Image Acquisition and Enhancement

Downloading images is simple... unless the host uses non-referral links, redirectors, or wrappers which prevent direct downloads. SmartImageDownloader detects and circumvents these problems on a wide range of sites, giving you direct access to the images you want.

If you have a black-and-white photo, ColorfulImageColorization can automatically colorize it for you (demo here).

Apply artistic styles to your images with DeepFilter, making your home photos look like a Van Gogh or a stone carving.

Ever wished you could zoom-and-enhance your photos like in the old cop and sci-fi shows? EnhanceResolution does just this, upscaling your photos without pixelation artifacts.

PhotoQualityEnhancement makes your phone photos look like they were taken with a DSLR.

Resizing, both simple and content-aware

When you just want a simple way to resize your images without altering the content, use ResizeImage.

But if you want to automatically crop or resize the image while retaining salient features (e.g., not cutting off the top of your subject's head), check out SmartThumbnail and ContentAwareResize. They use deep learning to ensure that the important parts of your image stay visible.

Lastly, A2RL_online will re-crop your image in a way that improves the aesthetics (a.k.a. "make my image pretty").

Detecting nudity, saliency, and memorability

Don't want your users uploading NSFW content onto your site? Automatically identify nude photos with NudityDetectioni2v.

Want to figure out which parts of an image are the most important? SalNet identifies the most salient areas of a photo, such as people's faces and relevant objects.

Need to know whether your image will be memorable? Try LargescaleImageMemorability. And while you're at it, use SocialMediaImageRecommender to determine which of your images works best for your article / post.

Detecting age, gender, and emotion

AgeClassification and GenderClassification do just what their names imply, attempting to figure out the age range and the gender of the people in your photo.

If you're more interested in knowing how they feel, EmotionRecognitionCNNMBP tells you which emotions are present on their faces. It can help you figure out how users feel about your app, or how a crowd is reacting to your presentation.

Detecting and classifying objects (general)

You have a photo, and you want to know what's in it... simple, right? For computers, this is still a difficult task, but a range of algorithms are available to help.

InceptionNet, CaffeNet, and ImageClassifier attempt to classify the image as a whole.

ObjectDetectionCOCO and IllustrationTagger will identify many objects in the image, returning bounding boxes and confidence levels for each.
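Once you have bounding boxes and confidence levels, a common next step is to discard the low-confidence detections. Here is a minimal sketch of that filtering step; the field names (`label`, `confidence`, `box`) and the sample values are assumptions for illustration, not the documented output of any of these algorithms.

```python
def confident_detections(detections, min_confidence=0.5):
    """Keep detections at or above the threshold, most confident first."""
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    return sorted(kept, key=lambda d: d["confidence"], reverse=True)


# Made-up sample in an assumed response shape; not real algorithm output.
sample = [
    {"label": "dog", "confidence": 0.92, "box": [34, 50, 210, 300]},
    {"label": "frisbee", "confidence": 0.31, "box": [220, 40, 260, 80]},
    {"label": "person", "confidence": 0.77, "box": [300, 10, 420, 310]},
]

for detection in confident_detections(sample, min_confidence=0.5):
    print(detection["label"], detection["confidence"], detection["box"])
```

The right threshold depends on the use case: a content filter may prefer a low cutoff so it misses less, while an auto-tagger may want a high one to avoid wrong labels.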

Detecting and classifying objects (domain-specific)

If you know what specific kinds of objects you're looking for, you can often get more accurate / faster results.

Use DeepFashion for clothing (scarves, pants, handbags, etc.).

Looking at physical locations? Try out RealEstateClassifier and Places365Classifier.

If all you care about is cars -- but you care a lot -- CarMakeandModelRecognition tells you the specific make, model, and year of vehicles.

Finding, identifying, and censoring faces

FaceDetection will locate faces in your image. But if you want to know who those faces belong to, you'll need to train FaceRecognition (see our tutorial). Or try out DeepFaceRecognition, which recognizes many celebrities.

Need to censor out those faces for privacy reasons? Use CensorFace (blogpost) and PixelateFace.

Image metadata, straightening, and similarity

Want to know the color breakdown of your image? PNGImageHistogram will calculate it for you.

Want to level that tilted photo? Find the horizon line with deephorizon, then straighten it out.
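To make the straightening step concrete: if a horizon detector gives you two points on the horizon line, the tilt is just the angle of that line. The sketch below assumes the horizon arrives as two (x, y) pixel endpoints, which is a hypothetical format chosen for illustration and not necessarily what deephorizon returns.

```python
import math


def tilt_degrees(left_point, right_point):
    """Angle of the horizon line relative to horizontal, in degrees.

    Points are (x, y) pixel coordinates with y increasing downward,
    so a positive result means the right end of the horizon sits lower.
    """
    (x1, y1), (x2, y2) = left_point, right_point
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


# Hypothetical horizon endpoints across a 1000-pixel-wide photo.
angle = tilt_degrees((0, 400), (1000, 452))
print(f"horizon tilt: {angle:.2f} degrees")
```

Rotating the image by the opposite of this angle levels the horizon; which sign counts as "opposite" depends on the rotation convention of the image library you use.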

Want to know how visually similar two images are, so you can check for duplicates or detect plagiarism? Use ImageSimilarity and VisualImageDiff.

Reading text

Finding and reading words in an image is another task that most humans find easy, but computers struggle with. You can read about several approaches to the problem, or just jump right in and try out these APIs:

TextDetectionCTPN locates text but does not read it. It's a good first step in any text-processing pipeline.

If you've already cropped your text areas, tesseract can usually read them. If not, NaturalTextNet deals well with noisy / natural images such as billboards.

...and SmartTextExtraction combines all these algorithms into a single API call, locating and reading all the text in an image!


Ready to build your own object detector or classifier? Hit the "source" tab on these algorithms and use them as a model:

InceptionNetDemo demonstrates how to build an image classifier using Tensorflow.

openimagesDemo also uses Tensorflow, but for object detection. Copy the source code to a new algorithm and have fun building!


At their core, videos are just a stream of images flying by at a few dozen frames per second. You can grab videos from hard-to-strip sites using SmartVideoDownloader, inspect their metadata via VideoInfo, find scene breaks with SceneDetection (see blogpost), or simply chop up a video into individual images with SplitVideoIntoFrames.

These are just some basic tools, though. Algorithmia's APIs are naturally composable: the output from one can be piped directly into another. This, plus a scalable serverless infrastructure, allows us to do some advanced things with videos.
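Composability of this kind can be sketched in a few lines. In the example below, `colorize` and `stylize` are local stand-ins with made-up behavior (they only rename a file path) so that the chaining itself is easy to see; in practice each step would be a remote algorithm call, and the `data://` path is illustrative.

```python
from functools import reduce


def compose(*steps):
    """Chain steps left to right: each step receives the previous output."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


# Local stand-ins for remote algorithm calls (hypothetical behavior).
def colorize(image_uri):
    return image_uri.replace(".jpg", "-color.jpg")


def stylize(image_uri):
    return image_uri.replace(".jpg", "-styled.jpg")


pipeline = compose(colorize, stylize)
print(pipeline("data://photos/family.jpg"))
```

Because each step only needs to accept the previous step's output, adding or reordering stages is a one-line change.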

For example, we can apply an image algorithm to each individual frame of a video, then reassemble the result into a new video file. It's difficult to describe how powerful this is, so just try out the demo to see colorization, stylization, and other transformations being applied to entire videos. Then read the blogpost and try out VideoTransform!

Similarly, VideoMetadataExtraction (see blogpost) lets you apply data-extraction algorithms to each frame of a video, then assembles the results into a single JSON result (which you can pipe through VideoTagSequencer for a more searchable format). Check out the demo to find cars, clothing, faces, emotion, and more in video sequences. Or try this tutorial to detect and remove nudity from whole videos.
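The per-frame pattern itself is simple to sketch: run an extractor on every frame, tag each result with its position in the video, and collect everything into one JSON document. The output layout, the `count_bright_pixels` extractor, and the tiny list-of-pixels "frames" below are all assumptions for illustration; the actual format produced by VideoMetadataExtraction is not described here.

```python
import json


def extract_per_frame(frames, extractor, fps):
    """Run extractor on each frame and collect timestamped results."""
    return {
        "fps": fps,
        "frames": [
            {"index": i, "time_s": round(i / fps, 3), "data": extractor(frame)}
            for i, frame in enumerate(frames)
        ],
    }


# Stand-in extractor and frames (hypothetical); a real one would call an algorithm.
def count_bright_pixels(frame):
    return {"bright_pixels": sum(1 for pixel in frame if pixel > 200)}


frames = [[12, 240, 255], [0, 0, 10], [201, 199, 230]]
report = extract_per_frame(frames, count_bright_pixels, fps=25)
print(json.dumps(report, indent=2))
```

Keeping a timestamp next to each result is what makes the combined output searchable later, for example to jump to the moment a particular object first appears.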

Jon Peck
