October 27, 2016

Using R to Build a Sentiment Analysis Forecasting Pipeline

Time series forecasting algorithms are a common method for predicting future values from historical, sequential data, such as snowfall per hour (anyone ready for snowboarding season?), customer sign-ups per day, or quarterly sales. In this R recipe, we'll show how to easily link algorithms together to create a data analysis pipeline for sentiment time series forecasting.

In a previous post, we introduced the Sentiment Time Series algorithm, which grabs the sentiment of unstructured text and creates a time series object. The output is a sentiment time series plot and JSON file with the positive, neutral, and negative sentiment frequency counts and timestamps.

Now we'll show you how to integrate this into your R project and build a pipeline for forecasting the sentiment of a time series using the Forecast algorithm. Forecasting sentiment time series data is useful when there is a seasonal component, in use cases such as scheduling call center employees for a retail business, understanding market sentiment for stock market prediction, or adjusting your social media marketing campaigns based on sentiment forecasts.

Let’s get started!

Prerequisites:

You'll need a dataset of sentiment frequencies to use with the Forecast algorithm. If you don't have a dataset handy, try using our Twitter search algorithm to pull data and create a CSV, which you can then pass into the Sentiment Time Series algorithm. Or, check out our handy blog post on machine learning datasets. Note that this analysis won't perform well if your data doesn't contain seasonality or a linear trend.

Step 1: Install the Algorithmia Client

Let's start by installing the Algorithmia package and stats library from CRAN, and loading them in your R environment:
[code r]
install.packages("Algorithmia")
install.packages("stats")

library(algorithmia)
library(stats)

[/code]

Now grab your Algorithmia API key, found on your profile page under the Credentials tab.

Credentials screenshot

Then create a client object by plugging in your API key:

[code r]

client <- getAlgorithmiaClient("your_api_key")

[/code]

Step 2: Analyze the Time Series Sentiment

Before we run the Forecast algorithm, we'll need to get sentiment score frequencies. We do this by running the Sentiment Time Series algorithm. If your time series data set contains observations that aren't equally spaced out or in sequential order, don't worry. The algorithm will take care of that for you. Learn more about using the Sentiment Time Series algorithm.

Now, let's run the algorithm. Remember to define where your files will be written via the output_file and output_plot paths. The example shown uses Algorithmia's hosted data source, which lets you store files and data models; we also support Dropbox and S3 data connections.

[code r]

# Input for the Sentiment Time Series algorithm.
# Replace the placeholder values (data_start_date, etc.) with your own.
sent_freq <- function(){
  sent_input <- list(input_file="data://username/data_collection_name/time_comments.csv",
                     output_plot="data://username/data_collection_name/sent_timeseries_plot.png",
                     output_file="data://username/data_collection_name/sent_freq_file.json",
                     start=data_start_date,
                     end=data_end_date,
                     freq=observations_per_season,
                     dt_format=date_format,
                     tm_zone=timezone)

  # Call the Sentiment Time Series algorithm
  sent_algo <- client$algo("nlp/SentimentTimeSeries/0.1.0")
  # Pipe in sent_input to write the files to the paths in output_plot and output_file
  sent_algo$pipe(sent_input)$result
}
sent_freq()

[/code]

Most of the arguments passed into this algorithm are used to create a time series object in R. Check out the stats package documentation to learn more about the ts object.
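As a quick, self-contained sketch of what those arguments map onto, here's how a ts object encodes start, end, and frequency in base R (the sales figures below are made-up illustration data):

```r
library(stats)

# Eight quarters of made-up sales figures
sales <- c(12, 15, 14, 18, 13, 16, 15, 20)

# frequency = 4 marks the data as quarterly, starting in Q1 2015
sales_ts <- ts(sales, start = c(2015, 1), frequency = 4)

start(sales_ts)      # 2015 1
end(sales_ts)        # 2016 4
frequency(sales_ts)  # 4
```

The freq argument in sent_input plays the same role as frequency here: it tells R how many observations make up one season.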

Step 3: Forecast the Sentiment Score Frequencies

Next, let's get the JSON file from the previous step.
[code r]
# Extract your data from the JSON file, saving it to forecast_input, an R list
forecast_input <- client$file("data://.my/testing/sent_freq_file.json")$getJson()

[/code]

We then want to create a function that maps each original timestamp to its newly generated forecast frequency:
[code r]
restructure_df <- function(sent_tm, results){
  # Map results of the forecast to the original timestamps
  structure(do.call(rbind.data.frame, Map('c', results, tm=sent_tm)),
            names=c('forecast_freq','timestamp'))
}
[/code]
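To see the shape restructure_df produces, here's a toy run with made-up forecast values and Unix timestamps (illustration data only, not output from the algorithm):

```r
restructure_df <- function(sent_tm, results){
  # Map results of the forecast to the original timestamps
  structure(do.call(rbind.data.frame, Map('c', results, tm=sent_tm)),
            names=c('forecast_freq','timestamp'))
}

# Made-up forecast frequencies and their Unix timestamps
results <- list(3.2, 4.1, 2.8)
tm <- c(1477526400, 1477612800, 1477699200)

# One row per observation, with columns forecast_freq and timestamp
df <- restructure_df(tm, results)
```

Each element of results is paired with its timestamp via Map, and rbind.data.frame stacks the pairs into a two-column data frame.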

Now, let's call the Forecast algorithm and pass in the sentiment frequencies from our JSON file. To get just the forecast frequency results without any metadata, use $result at the end of algo$pipe(input).

[code r]

plot_sent_ts <- function(){
  # Call the Forecast algorithm and retrieve results for pos, neg, neu sentiment
  algo <- client$algo("TimeSeries/Forecast/0.2.0")
  pos_results <- algo$pipe(forecast_input$pos$freq)$result
  neg_results <- algo$pipe(forecast_input$neg$freq)$result
  neu_results <- algo$pipe(forecast_input$neu$freq)$result

  # Map each sentiment result to its corresponding timestamp
  pos_df <- restructure_df(forecast_input, pos_results)
  neg_df <- restructure_df(forecast_input, neg_results)
  neu_df <- restructure_df(forecast_input, neu_results)

  # Create time series objects
  neu_ts <- ts(neu_df$forecast_freq, start=start(pos_df$timestamp), end=end(pos_df$timestamp))
  pos_ts <- ts(pos_df$forecast_freq, start=start(pos_df$timestamp), end=end(pos_df$timestamp))
  neg_ts <- ts(neg_df$forecast_freq, start=start(pos_df$timestamp), end=end(pos_df$timestamp))

  # Option to plot all three sentiment forecasts on one plot:
  # plot_ts <- ts.plot(pos_ts, neg_ts, neu_ts, gpars = list(col = c("green", "red", "blue")))
  # Plot just the neutral sentiment
  plot_ts <- ts.plot(neu_ts)

  # Return the plot you want to save
  return(plot_ts)
}

[/code]

We're using the first and last timestamps of the observations as the start and end points of each time series object.
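As a toy illustration with made-up values, giving each series the same start and end keeps their time indices aligned, which is what lets ts.plot overlay them on one axis:

```r
library(stats)

# Made-up positive and negative frequency values over the same four periods
pos_ts <- ts(c(5, 7, 6, 9), start = 1, end = 4)
neg_ts <- ts(c(2, 3, 4, 2), start = 1, end = 4)

# Because the series share start and end, their time indices line up exactly
identical(time(pos_ts), time(neg_ts))  # TRUE

# Overlay both on one plot, as in the commented-out option above
ts.plot(pos_ts, neg_ts, gpars = list(col = c("green", "red")))
```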

Step 4: Save the Time Series Plot

We'll save the plot as a PNG in the same directory as our R script.

[code r]

plot_forecast <- function(){
  # Open a PNG graphics device with the filename of your choice
  png(filename="neutral_forecast.png")
  # Create the plot
  plot_sent_ts()
  # Turn off the graphical device to write the file
  dev.off()
}

plot_forecast()

[/code]

If everything went as planned, you should get a plot of the forecast sentiment time series that looks something like this:

neutral sentiment forecast plot

While the end result here is a simple plot, you could pipe the accompanying JSON file into Tableau or Plotly for richer data visualizations.
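For example, here's a minimal sketch of exporting the forecast to a CSV that Tableau or Plotly can import (the data frame below is made-up illustration data mirroring the shape restructure_df produces):

```r
# Made-up forecast data frame in the restructure_df output shape
df <- data.frame(forecast_freq = c(3.2, 4.1, 2.8),
                 timestamp = c(1477526400, 1477612800, 1477699200))

# Write to CSV; both Tableau and Plotly accept CSV imports
write.csv(df, "neutral_forecast.csv", row.names = FALSE)
```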

Conclusion

In this post, we showed you how to easily link algorithms together to create a data analysis pipeline in R. The algorithms used in this recipe were Sentiment Time Series and Forecast.

Get the complete Sentiment Analysis Forecasting Pipeline on GitHub, and then run it from your console or IDE with:

[code shell]

Rscript name_of_script.R

[/code]

Here's the complete code snippet for trying this recipe out for yourself:
[code r]
install.packages("Algorithmia")
install.packages("stats")

library(algorithmia)
library(stats)

client <- getAlgorithmiaClient("your_api_key")

# Input for the Sentiment Time Series algorithm.
# Replace the placeholder values (data_start_date, etc.) with your own.
sent_freq <- function(){
  sent_input <- list(input_file="data://username/data_collection_name/time_comments.csv",
                     output_plot="data://username/data_collection_name/sent_timeseries_plot.png",
                     output_file="data://username/data_collection_name/sent_freq_file.json",
                     start=data_start_date,
                     end=data_end_date,
                     freq=observations_per_season,
                     dt_format=date_format,
                     tm_zone=timezone)

  # Call the Sentiment Time Series algorithm
  sent_algo <- client$algo("nlp/SentimentTimeSeries/0.1.0")
  # Pipe in sent_input to write the files to the paths in output_plot and output_file
  sent_algo$pipe(sent_input)$result
}
sent_freq()

restructure_df <- function(sent_tm, results){
  # Map results of the forecast to the original timestamps
  structure(do.call(rbind.data.frame, Map('c', results, tm=sent_tm)),
            names=c('forecast_freq','timestamp'))
}

plot_sent_ts <- function(){
  # Extract your data from the JSON file, saving it to forecast_input, an R list
  forecast_input <- client$file("data://.my/testing/sent_freq_file.json")$getJson()

  # Call the Forecast algorithm
  algo <- client$algo("TimeSeries/Forecast/0.2.0")

  # Pipe each sentiment frequency input into algo$pipe and retrieve the results
  pos_results <- algo$pipe(forecast_input$pos$freq)$result
  neg_results <- algo$pipe(forecast_input$neg$freq)$result
  neu_results <- algo$pipe(forecast_input$neu$freq)$result

  # Map each sentiment result to its corresponding timestamp
  pos_df <- restructure_df(forecast_input, pos_results)
  neg_df <- restructure_df(forecast_input, neg_results)
  neu_df <- restructure_df(forecast_input, neu_results)

  # Create time series objects
  neu_ts <- ts(neu_df$forecast_freq, start=start(pos_df$timestamp), end=end(pos_df$timestamp))
  pos_ts <- ts(pos_df$forecast_freq, start=start(pos_df$timestamp), end=end(pos_df$timestamp))
  neg_ts <- ts(neg_df$forecast_freq, start=start(pos_df$timestamp), end=end(pos_df$timestamp))

  # Plot all three sentiment forecasts on one plot, or plot one sentiment at a time
  plot_ts <- ts.plot(pos_ts, neg_ts, neu_ts, gpars = list(col = c("green", "red", "blue")))

  # Return the plot you want to save
  return(plot_ts)
}

plot_forecast <- function(){
  # Open a PNG graphics device with the filename of your choice
  png(filename="sentiment_forecast.png")
  # Create the plot
  plot_sent_ts()
  # Turn off the graphical device to write the file
  dev.off()
}
plot_forecast()
[/code]

 
