February 26, 2018

Trustless Machine Learning Contracts: Evaluating and Exchanging Machine Learning Models on the Ethereum Blockchain

Machine Learning algorithms are being developed and improved at an incredible rate, but are not necessarily getting more accessible to the broader community. That's why today Algorithmia is announcing DanKu, a new blockchain-based protocol for evaluating and purchasing ML models on a public blockchain such as Ethereum. DanKu enables anyone to get access to high quality, objectively measured machine learning models. At Algorithmia, we believe that widespread access to algorithms and deployment solutions is going to be a fundamental building block of a balanced future for AI, and DanKu is a step towards that vision.

The DanKu protocol uses blockchain technology via smart contracts. A contract lets anyone post a dataset, an evaluation function, and a monetary reward for whoever provides the best trained machine learning model for the data. Participants train deep neural networks to model the data, and submit their trained networks to the blockchain. The blockchain executes these neural network models to evaluate the submissions, and ensures that payment goes to the best model.

The contract allows for the creation of a decentralized and trustless marketplace for exchanging ML models. This gives ML practitioners an opportunity to monetize their skills directly. It also allows any participant or organization to solicit machine learning models from all over the world. This will incentivize the creation of better machine learning models, and make AI more accessible to companies and software agents. Anyone with a dataset, including software agents, can create DanKu contracts.

We're also launching the first DanKu competition for a machine learning problem. For more info, see the final section of this post, which covers the competition.

Some background

As of 2018, Artificial Intelligence and Blockchain continue to dominate tech news everywhere. Back in 2017, we asked ourselves if we could bring these two things together and solve a problem in Machine Learning. As with most ideas, we found that we weren't the first group of people to play around with the idea of using blockchain and ML together.

We immediately noticed a diversity of ideas, where solutions were proposed for all kinds of problems. One good example is OpenMined, which allows you to train ML models on data that you never have access to.

Since putting Blockchain and AI in the same sentence sounded very click-baity, we decided early on to show and not only tell. This led us to focus our efforts on a narrow problem definition.

And the idea for DanKu was born: Trustless machine learning contracts for exchanging machine learning models on the Ethereum Blockchain.

How does it actually work?

You can describe the DanKu protocol at a high level in four steps:

1. Bob creates a new DanKu contract. He submits a dataset, evaluation criteria, and a reward amount to the contract.

2. ML practitioner Alice downloads the dataset submitted by Bob, and works independently to train an ML model. After successfully training a model, Alice submits her solution to the contract (that is, to the blockchain). Other participants like Alice can also submit their solutions.

3. Bob reveals the testing dataset after the submission period ends. The testing dataset will be used for evaluating the submissions.

4. At some point in the future, the blockchain will evaluate the submitted models, and pay out the reward to the winning submission. If no submission fulfills the criteria, the reward is refunded to Bob.

And voila! Bob and the other participants just exchanged ML models in a completely trustless manner. The contract also ran a fully functional ML model on the blockchain. Isn't that neat?

Note: Some additional steps are necessary to ensure the trust and fairness of the competition. Refer to the white paper for more details.
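
The four steps above can be sketched as a small off-chain simulation in Python. This is a toy model under stated assumptions, not the actual contract: the class name ToyDanKuContract, its methods, the tie-breaking rule, and the idea of representing a model as a plain function are all hypothetical, and the sketch leaves out the additional fairness steps covered in the white paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Sample = Tuple[int, int, int]        # (feature_1, feature_2, label); illustrative shape
Model = Callable[[int, int], int]    # hypothetical: a model maps two features to a label


@dataclass
class ToyDanKuContract:
    """Off-chain toy model of the four protocol steps; every name here is hypothetical."""
    organizer: str
    training_data: List[Sample]      # step 1: dataset posted by the organizer
    min_accuracy: float              # step 1: evaluation criteria, e.g. 0.5
    reward: float                    # step 1: reward amount
    submissions: List[Tuple[str, Model]] = field(default_factory=list)
    testing_data: Optional[List[Sample]] = None

    def submit(self, submitter: str, model: Model) -> None:
        # Step 2: models are only accepted before the testing data is revealed.
        if self.testing_data is not None:
            raise RuntimeError("submission period has ended")
        self.submissions.append((submitter, model))

    def reveal_testing_data(self, testing_data: List[Sample]) -> None:
        # Step 3: the organizer reveals the held-out testing data.
        self.testing_data = testing_data

    def _accuracy(self, model: Model) -> float:
        correct = sum(1 for a, b, label in self.testing_data if model(a, b) == label)
        return correct / len(self.testing_data)

    def finalize(self) -> Tuple[str, float]:
        # Step 4: pay the best qualifying model, or refund the organizer.
        if self.testing_data is None:
            raise RuntimeError("testing data has not been revealed")
        winner, best = None, 0.0
        for submitter, model in self.submissions:
            accuracy = self._accuracy(model)
            # Assumed tie-break: the earliest submission keeps the lead.
            if accuracy >= self.min_accuracy and (winner is None or accuracy > best):
                winner, best = submitter, accuracy
        payee = winner if winner is not None else self.organizer
        return payee, self.reward


contract = ToyDanKuContract("bob", [(0, 0, 0), (1, 1, 1)], min_accuracy=0.5, reward=5.0)
contract.submit("alice", lambda a, b: 1 if a + b > 1 else 0)
contract.submit("carol", lambda a, b: 0)
contract.reveal_testing_data([(0, 1, 0), (2, 2, 1)])
payee, amount = contract.finalize()
```

In this run both submissions meet the criteria, and Alice's model scores higher on the revealed testing data, so she receives the reward. With no qualifying submission, finalize returns the organizer as the payee, which models the refund.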

Running the first neural network on the blockchain

For demo purposes, we decided to write a DanKu contract that implements a neural network. Solidity, the programming language for Ethereum contracts, was not designed with Machine Learning in mind. It did not have a math library, or even floating point numbers. Ethereum also had issues running code that was too computationally expensive. It was a software engineering nightmare.
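
A common workaround for missing floating point numbers is fixed-point arithmetic, where each real number is stored as an integer multiplied by a constant scaling factor. The Python sketch below illustrates the idea with integer-only operations. The scaling factor and the helper names are assumptions for illustration and are not taken from the DanKu contract.

```python
SCALE = 10_000  # hypothetical scaling factor: four decimal digits of precision


def to_fixed(value: float) -> int:
    """Convert a real number to its scaled-integer representation (done off-chain)."""
    return round(value * SCALE)


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values.

    The raw product carries the scaling factor twice, so divide it out once.
    """
    return (a * b) // SCALE


def fixed_div(a: int, b: int) -> int:
    """Divide two fixed-point values, scaling up first to keep precision."""
    return (a * SCALE) // b


def to_float(a: int) -> float:
    """Convert back to a float, for inspection only."""
    return a / SCALE


product = fixed_mul(to_fixed(1.5), to_fixed(0.25))   # represents 0.375
quotient = fixed_div(to_fixed(1.5), to_fixed(0.25))  # represents 6.0
```

One caveat when porting this kind of code: Python's // rounds toward negative infinity, while Solidity's integer division rounds toward zero, so results for negative operands can differ by one unit in the last place.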

We started with something simple: a linear model. After gaining a bit of confidence, we tackled the problem of implementing a neural network from scratch, beginning with a network that had no hidden layers. After getting that working, we focused on allowing submitters to define any simple network architecture, which would work as long as it wasn't too computationally expensive.
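
To give a feel for what such a network looks like without floating point numbers, here is a minimal Python sketch of a forward pass that uses integer arithmetic only. The layer layout, the ReLU activation, the argmax output, and the scaling factor are all assumptions made for illustration; the actual contract may differ on every one of these points.

```python
from typing import List, Tuple

SCALE = 10_000  # hypothetical fixed-point scaling factor

Layer = Tuple[List[List[int]], List[int]]  # (weights, biases), both scaled integers


def relu(value: int) -> int:
    return value if value > 0 else 0


def dense(inputs: List[int], weights: List[List[int]], biases: List[int]) -> List[int]:
    """One fully connected layer; weights[j][i] connects input i to neuron j."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs))
        # Each product carries SCALE twice, so divide once before adding the bias.
        outputs.append(total // SCALE + bias)
    return outputs


def forward(inputs: List[int], layers: List[Layer]) -> int:
    """Run every layer, apply ReLU on hidden layers, return the argmax class."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:
            activations = [relu(a) for a in activations]
    return max(range(len(activations)), key=activations.__getitem__)


hidden: Layer = ([[10_000, 0], [0, 10_000]], [0, 0])             # identity weights
output: Layer = ([[10_000, -10_000], [-10_000, 10_000]], [0, 0])
prediction = forward([10_000, 20_000], [hidden, output])          # inputs 1.0 and 2.0
```

Passing a single layer to forward gives the no-hidden-layer case, and passing more layers gives a deeper network, at the cost of more multiplications per evaluated data point.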

We developed everything locally, until we were ready to test it on a live blockchain. We started off by testing it on the Ethereum Ropsten test net. Everything worked without much of an issue. Afterwards we wanted to try it on the Ethereum blockchain to make sure it worked properly on the real thing.

And with a moment’s notice, 22 thousand machines ran the first neural network on the Ethereum blockchain. What looked like machine code to these everyday miners was actually a fully functioning neural network. Feb 15th was a good day, like a Friday.

Some disruptive features

It was exciting to see the first DanKu contract in the wild. Like with most new tech, DanKu also disrupted a few things:

Since the DanKu protocol does not require trust, it removes the need for a middleman for exchanging ML models. This game-changing feature will further democratize access to Machine Learning. Hopefully this will create an uptick in open-source ML models available to everyone.

The protocol also makes it possible to collectively raise money for things like cancer research. Universities and research groups can create contracts for open problems such as protein folding. Any person could directly donate money to these contracts. This would attract even more ML and bioinformatics practitioners and hopefully result in solving some of these problems. The contract will ensure that the money goes directly to the person or group who solves the problem. Donating money for medical research will never be the same again.

It's also likely that DanKu contracts will create opportunities for GPU miner arbitrage. GPU mining farms/pools will likely switch to ML training if it's more profitable. These pools will probably be managed by Data Scientists, who will try to solve these ML problems. The contract reward could later be divided among the data scientist and GPU suppliers.

Another interesting application would be in finance. If submitted models produce a tangible financial return, it would be a lot easier to create DanKu contracts and finance them. This would also create a well-defined price for DanKu contracts, since it's a lot easier to assess the value of these types of predictive models.

DanKu contracts also create the opportunity for AI systems to self-improve. An AI system could contract out work to improve itself in an automated and seamless manner when it encounters new data. The use of cryptocurrencies also makes this an attractive method of self-improvement, since crypto payments and transactions are accessible by anyone (or anything). It's just another API endpoint.

So, what's next?

It's hard to tell what will be next on the roadmap, but there's still a lot of room for improvement. Improvements in Ethereum, further optimizations in the contract, and improved protocol design can all make DanKu contracts better. With these improvements, a more diverse set of ML models could be supported.

The first public DanKu contract competition

Since we've just announced the protocol, we thought that it would also be fitting to create the first public DanKu contract.

For this competition, we've decided to use the 2016 U.S. Presidential county election data as our dataset. Every county is represented by one data point with three values: latitude, longitude, and elected candidate.

For example, a data point can look something like this: [047606200, 122332100, 0].
The first two values refer to the latitude and longitude of Seattle.
The third value, 0, refers to the Democratic candidate, whereas 1 refers to the Republican candidate in this data format.
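
Reading the example, the coordinates appear to be stored as integers scaled by one million, so that 047606200 corresponds to 47.6062 degrees. The Python sketch below decodes a data point under that assumption. The scaling factor is inferred from the example rather than stated here, the function and constant names are hypothetical, and the longitude is kept unsigned exactly as the example shows it.

```python
from typing import List, Tuple

COORD_SCALE = 1_000_000  # assumed: inferred from the example data point
PARTY_BY_LABEL = {0: "Democratic", 1: "Republican"}


def decode_point(point: List[int]) -> Tuple[float, float, str]:
    """Turn [latitude, longitude, label] integers into degrees and a party name."""
    latitude_raw, longitude_raw, label = point
    return (
        latitude_raw / COORD_SCALE,
        longitude_raw / COORD_SCALE,
        PARTY_BY_LABEL[label],
    )


# The leading zero is dropped because Python integer literals cannot start with 0.
seattle = decode_point([47606200, 122332100, 0])
```

Storing coordinates as scaled integers fits a platform without floating point numbers, since the raw values can be fed to the network as they are.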

500 random data points are selected as the dataset for this competition. 80% of this dataset (400 data points) will be used for training. The remaining 20% (100 data points) will be used for evaluating the submissions. Since the training and testing datasets are randomly selected by the contract, neither the organizer nor the participants can choose a split that favors them, which keeps the competition fair.
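
The 80/20 partition can be illustrated with a short Python sketch. A seeded Python random generator stands in for the contract's own source of randomness, which is not described in this post, and the function name and parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    points: Sequence[int], train_percent: int = 80, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly partition points into training and testing subsets."""
    indices = list(range(len(points)))
    # Stand-in randomness; an on-chain contract needs a different source.
    random.Random(seed).shuffle(indices)
    train_count = len(points) * train_percent // 100
    train = [points[i] for i in indices[:train_count]]
    test = [points[i] for i in indices[train_count:]]
    return train, test


points = list(range(500))  # placeholder identifiers for 500 data points
train, test = split_dataset(points)
```

Every data point lands in exactly one of the two subsets, so the testing data never overlaps with what participants train on.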

Participants in the contract are required to train a simple feed-forward neural network, for which they can define the structure in terms of layers, neurons and biases. After training, they are required to submit their network definition, weights and biases to the contract.

The DanKu contract was initialized on block 5121944 (Feb 19th, 2018). The evaluation criteria will look for models that have at least a 50% accuracy rate. The reward for the winning submission is 5 Ether (ETH).
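
The accuracy threshold can be checked without any floating point arithmetic by comparing scaled integers, as in the Python sketch below. The function name and its parameters are hypothetical, and the real contract may compute accuracy differently.

```python
from typing import Sequence


def meets_criteria(
    predictions: Sequence[int], labels: Sequence[int], min_accuracy_percent: int = 50
) -> bool:
    """Return True if accuracy is at least min_accuracy_percent, using integers only."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(1 for predicted, actual in zip(predictions, labels) if predicted == actual)
    # Equivalent to correct / total >= percent / 100, rearranged to avoid division.
    return correct * 100 >= min_accuracy_percent * len(labels)


passes = meets_criteria([0, 1, 1, 0], [0, 1, 0, 1])  # 2 of 4 correct
fails = meets_criteria([0, 0, 0, 0], [0, 1, 1, 1])   # 1 of 4 correct
```

With 100 testing data points, this means a model needs at least 50 correct predictions to qualify.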

You can find the guide here to help you participate in the competition. After the competition ends, we'll deploy the winning model to the Algorithmia marketplace!

Comments, questions or criticisms are welcome at @algorithmia.
