April 26, 2018

Results of The First Trustless Machine Learning Contract

Two of the most interesting developments of our lifetime are happening right now: the rise of machine learning and the blockchain revolution.

Machine Learning (ML) systems have been able to surpass humans in many problem domains. These systems are now better at lip reading, speech recognition, location tagging, playing Go, image classification, and more.

With the invention of the blockchain and bitcoin, we've seen a wave of new cryptocurrencies and distributed applications built on these new blockchains.

The DanKu protocol sits at the intersection of blockchain and machine learning. It facilitates the exchange of ML models on the Ethereum blockchain. We even published a whitepaper about it here. You can read more about the DanKu protocol in our previous blog post.

To demonstrate the protocol, we organized a 52-day DanKu competition. The contract had a 6-week submission period, a 3-day test reveal period, and a 7-day evaluation period.
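As a quick check of the arithmetic, the three phases add up to the 52-day total. A minimal sketch in Python (the variable names are ours, chosen for illustration, and are not taken from the contract):

```python
from datetime import timedelta

# Phase lengths as stated in this post.
submission_period = timedelta(weeks=6)
test_reveal_period = timedelta(days=3)
evaluation_period = timedelta(days=7)

total = submission_period + test_reveal_period + evaluation_period
print(total.days)  # 52
```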

The dataset was a collection of 500 samples from the county-level results of the 2016 U.S. presidential election. Each data point consisted of a latitude, a longitude, and a binary voter preference. The contract randomly selected 400 training and 100 testing data points for the competition. The minimum accuracy required to win was 50% (a low threshold was picked to ensure a winner).
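To make the 400/100 partition concrete, here is a rough off-chain sketch in Python. It is not the contract's code: how the contract obtains its randomness is not described in this post, and `split_dataset`, the seed, and the placeholder points below are all assumptions for illustration.

```python
import random

def split_dataset(points, n_train=400, seed=None):
    """Randomly partition points into a training list and a testing list."""
    rng = random.Random(seed)
    shuffled = list(points)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder (latitude, longitude, label) tuples with made-up values,
# NOT the real election dataset.
points = [(float(i), float(-i), i % 2) for i in range(500)]

train, test = split_dataset(points, seed=0)
print(len(train), len(test))  # 400 100
```

Every point lands in exactly one of the two lists, so a model never sees a testing point during training.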

There were 40 submissions in total from 7 unique participants (excluding us, the organizer). The model with the highest score was submission #21, with an accuracy of 87%. However, this model was too complex to be run on the testing dataset without exceeding the current gas limit of 8 million.

The best-scoring submission that stayed within the gas limit had an accuracy of 83%. Here's the prediction map of this winning submission:

As the map shows, the model predicts that most of the West Coast and the Northeast vote Democrat, and that the rest of the country votes Republican. This prediction is mostly consistent with the 2016 presidential election results.

Disclosure: After deploying the contract, we noticed a bug in the code. According to the protocol, when there are two winning submissions with the same accuracy score, the contract is supposed to pick the earlier submission as the winner. This additional logic of checking the order was missing in the contract code. Since smart contracts cannot be changed after deployment, there wasn't much we could do as the organizer to remedy the issue.

A participant who was affected by this bug disclosed it to us before the competition ended. We rewarded them with a 5 ETH bug bounty.
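The intended selection rule (stay within the gas limit, meet the minimum accuracy, take the highest accuracy, and break ties in favor of the earlier submission) could be sketched roughly as follows. This is a Python illustration, not the contract's Solidity code. The `Submission` type, the `gas_used` field, the inclusive comparisons, and the example values are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

GAS_LIMIT = 8_000_000   # gas limit mentioned in this post
MIN_ACCURACY = 0.50     # minimum accuracy mentioned in this post

@dataclass(frozen=True)
class Submission:
    index: int        # submission order; a lower index means an earlier submission
    accuracy: float   # fraction of testing points predicted correctly
    gas_used: int     # hypothetical gas cost of evaluating the model

def pick_winner(submissions: Sequence[Submission]) -> Optional[Submission]:
    """Return the winning submission, or None if nothing qualifies."""
    eligible = [
        s for s in submissions
        if s.gas_used <= GAS_LIMIT and s.accuracy >= MIN_ACCURACY
    ]
    if not eligible:
        return None
    # Highest accuracy first; on equal accuracy, the lower (earlier) index wins.
    return min(eligible, key=lambda s: (-s.accuracy, s.index))

# Made-up example values, not the real competition data.
subs = [
    Submission(index=3, accuracy=0.80, gas_used=5_000_000),
    Submission(index=1, accuracy=0.80, gas_used=6_000_000),
    Submission(index=2, accuracy=0.90, gas_used=9_000_000),  # over the gas limit
]
winner = pick_winner(subs)
print(winner.index)  # 1
```

Sorting on the pair `(-accuracy, index)` folds the tie-break into a single comparison, which is the ordering check the deployed contract lacked.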

We also got a chance to ask the participant who reported the bug a few questions:

  1. Mind giving some information about yourself? Are you an individual or a team of people? Where are you from? What's your favorite cryptocurrency?
    We're a small research company in Northern Europe specializing in high-tech areas such as Blockchain and AI. We like to not focus too much on the cryptocurrency aspect of the industry - but Ethereum is without a doubt the best blockchain platform out there.
  2. Where did you first hear about the competition?
    I think it was on Reddit - probably /r/ethereum.
  3. Do you (guys) have a background in Ethereum and Machine Learning?
    Yes, both.
  4. Do you think you need to be an expert in both Ethereum and Machine Learning to be able to participate in a DanKu competition?
    If the right frameworks/libraries are created, this type of competition should not require any knowledge of Ethereum. We suggest that you create a Python package that can auto-publish a given network to the blockchain.
  5. Did you train a model using the sample code provided, or did you write your training code from scratch?
    We wrote our own training code from scratch. [We] simply prefer PyTorch over TensorFlow - much better for quick prototyping and given how small the data set [was], there's really no need for the better performance characteristics that TF has.
  6. What major challenges did you face during the competition?
    The current version of the smart contract requires you to upload the entire network to the blockchain. This puts a limit on the size of the neural network (I think our largest network had ~30 neurons) - the biggest challenge was to keep improving the performance of the network without increasing its size.
  7. How many collective hours have you or your team spent participating?
    Probably 5 days of two people's full-time attention.
  8. What are your thoughts on the DanKu protocol? How would you improve the experience? Do you see a bright future for this sort of competition-based model creation?
    Very cool protocol. Definitely makes a lot of sense for distributed problem-solving. We've already filed several bug reports and feature suggestions around scalability and security of the protocol. For example, there's no need to store the entire model itself on Ethereum - simply keeping hashes of it on-chain would be much more scalable while still maintaining the same security properties.
  9. How would you make this more accessible to the broader data science community who aren't familiar with smart contracts and blockchains?
    Keep organizing competitions like this one and the word will spread - we must have told at least 5-10 (non-blockchain) people about the concept during the competition.
  10. Got any other feedback you want to give?
    Consider introducing a formal bug-bounty program :)
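
The suggestion in answer 8, keeping only a hash of the model on-chain, amounts to a commit-and-reveal check. Below is a rough Python sketch of that idea. It is not part of the DanKu protocol as deployed: the function names and toy parameters are made up, and SHA-256 from the standard library stands in for the Keccak-256 hash that Ethereum contracts typically use. The sketch only covers the commitment step and does not show how a revealed model would be scored.

```python
import hashlib
import json

def model_commitment(weights, biases):
    """Return a hex digest that commits to a model's parameters."""
    # Canonical serialization so the same parameters always hash the same way.
    payload = json.dumps(
        {"weights": weights, "biases": biases},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def verify_reveal(commitment, weights, biases):
    """Check that revealed parameters match a previously stored commitment."""
    return model_commitment(weights, biases) == commitment

# Made-up toy parameters, for illustration only.
weights = [[0.5, -1.25], [2.0, 0.75]]
biases = [0.1, -0.3]

commitment = model_commitment(weights, biases)           # this digest would go on-chain
print(verify_reveal(commitment, weights, biases))        # True
print(verify_reveal(commitment, weights, [0.1, -0.31]))  # False
```

Only the fixed-size digest would need to be stored during the submission period, regardless of how many neurons the network has.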

In summary, as the organizer, we only had to initialize the contract and reveal the training and testing datasets. The contract managed the rest of the competition without requiring any third-party involvement. This is the first time that machine learning models were exchanged in a trustless manner!

Got any feedback? Send us a message at @algorithmia!
