How collective learning works
A Colearn experiment begins when a group of entities, referred to as learners, agree on a model architecture and begin learning. Together they train a single global model. The goal is a model that performs better than any individual learner could produce by training only on its own private data set.
How Training Works
Training occurs in rounds; during each round the learners attempt to improve the performance of the shared global model. To do so, an update to the global model (for example, a new set of weights for a neural network) is proposed in each round. The learners then validate the update and decide whether the new model is better than the current global model. If enough learners approve the update, the global model is updated. After an update is approved or rejected, a new round begins.
The detailed steps of a round updating a global model M are as follows:
- One of the learners is selected and proposes a new updated model M'
- The rest of the learners validate M'
    - If M' performs better than M on a learner's private data set, that learner votes to approve
    - If not, the learner votes to reject
- The total votes are tallied
- If more than some threshold (typically 50%) of learners approve then M' becomes the new global model. If not, M continues to be the global model
- A new round begins.
By using a decentralized ledger (a blockchain) this learning process can be run in a completely decentralized, secure and auditable way. Further security can be provided by using differential privacy to avoid exposing your private data set when generating an update.
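One common way to apply differential privacy to an update is to bound its norm and add Gaussian noise before sharing it. The sketch below illustrates only that idea and is not Colearn's implementation: the function `privatize_update` and its parameters `max_norm` and `noise_multiplier` are hypothetical, and it clips the whole update rather than per-example gradients. A real deployment also needs privacy accounting to track the budget that has been spent.

```python
import math
import random
from typing import List, Optional


def privatize_update(update: List[float], max_norm: float,
                     noise_multiplier: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Clip an update to max_norm (L2 norm) and add Gaussian noise to each entry."""
    rng = rng or random.Random()
    norm = math.sqrt(sum(u * u for u in update))
    # Scale down only when the update is longer than max_norm.
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    clipped = [u * scale for u in update]
    # The noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * max_norm
    return [c + rng.gauss(0.0, sigma) for c in clipped]


# With no noise the update is only rescaled to the clipping bound.
clipped_only = privatize_update([3.0, 4.0], max_norm=1.0, noise_multiplier=0.0)
# With noise, the shared values no longer reveal the exact update.
noisy = privatize_update([3.0, 4.0], max_norm=1.0, noise_multiplier=1.1,
                         rng=random.Random(0))
```

A larger `noise_multiplier` hides more about the private data but also makes each proposed update less useful, so other learners are more likely to reject it.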
Learning algorithms that work for collective learning
Collective learning is not just for neural networks; any learning algorithm that can be trained on subsets of the data and that can use the results of previous training rounds as the basis for subsequent rounds can be used. Neural networks satisfy both constraints: training can be done on mini-batches of data, and each training step uses the weights from the previous step as its starting point. More generally, any model that is trained using mini-batch stochastic gradient descent is suitable. Other algorithms can be made to work with collective learning as well. For example, a random forest can be trained iteratively by having each learner add new trees (see the example in mli_random_forest_iris.py). For more discussion, see here.
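To illustrate the two constraints, here is a minimal sketch of a one-parameter linear model trained by mini-batch gradient descent. The helper `train_on_batch` and the toy data are hypothetical; the point is only that each call trains on a subset of the data and starts from the weight returned by the previous call.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs


def train_on_batch(w: float, batch: Batch, lr: float = 0.05) -> float:
    """One gradient step on mean squared error for the model y = w * x."""
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


# Toy data generated from y = 3x, split into two mini-batches.
data: Batch = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
batches = [data[:2], data[2:]]

w = 0.0
for _ in range(50):
    for batch in batches:
        # Each step starts from the weight produced by the previous step.
        w = train_on_batch(w, batch)
```

In a collective setting each mini-batch would come from a different learner's private data, and the weight passed between steps would be the current global model.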
The driver
The driver implements the voting protocol: it selects a learner to train, sends the update out for voting, tallies the votes, and accepts or rejects the update. Here we have a very minimal driver that doesn't use networking or a blockchain. Eventually the driver will be a smart contract. This is the code that implements one round of voting:
from typing import Sequence

# Module path assumed from the Colearn repository layout.
from colearn.ml_interface import MachineLearningInterface


def run_one_round(round_index: int, learners: Sequence[MachineLearningInterface],
                  vote_threshold=0.5):
    # Proposers are chosen round-robin.
    proposer = round_index % len(learners)
    new_weights = learners[proposer].mli_propose_weights()

    # Every learner evaluates the proposal on its own private data.
    prop_weights_list = [ln.mli_test_weights(new_weights) for ln in learners]
    approves = sum(1 if v.vote else 0 for v in prop_weights_list)

    vote = False
    if approves >= len(learners) * vote_threshold:
        vote = True
        for j, learner in enumerate(learners):
            learner.mli_accept_weights(prop_weights_list[j])

    return prop_weights_list, vote
The driver has a list of learners, and each round it selects one learner to be the proposer. The proposer does some training and proposes an updated set of weights. The driver then sends the proposed weights to each of the learners, and they each vote on whether this is an improvement. If the fraction of learners that approve is at least the vote threshold, the proposed weights are accepted; otherwise they are rejected.
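To see how the threshold and the proposer rotation behave, the following sketch runs the same tallying rule against stub learners whose votes are fixed in advance. `StubLearner`, `StubWeights`, `StubProposal` and `run_round` are hypothetical stand-ins, not part of Colearn; they are duck-typed so the block runs without the library installed, and for simplicity the accepted weights are passed to the learners directly.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class StubWeights:
    value: int


@dataclass
class StubProposal:
    weights: StubWeights
    vote: bool


class StubLearner:
    """A learner that always casts the same vote (for illustration only)."""

    def __init__(self, always_vote: bool):
        self.always_vote = always_vote
        self.current = StubWeights(0)

    def mli_propose_weights(self) -> StubWeights:
        return StubWeights(self.current.value + 1)

    def mli_test_weights(self, weights: StubWeights) -> StubProposal:
        return StubProposal(weights=weights, vote=self.always_vote)

    def mli_accept_weights(self, weights: StubWeights) -> None:
        self.current = weights


def run_round(round_index: int, learners: Sequence[StubLearner],
              vote_threshold: float = 0.5) -> Tuple[int, bool]:
    """Run one voting round; returns (proposer index, accepted)."""
    proposer = round_index % len(learners)  # proposers rotate round-robin
    new_weights = learners[proposer].mli_propose_weights()
    proposals = [ln.mli_test_weights(new_weights) for ln in learners]
    approves = sum(1 for p in proposals if p.vote)
    accepted = approves >= len(learners) * vote_threshold
    if accepted:
        for learner in learners:
            learner.mli_accept_weights(new_weights)
    return proposer, accepted


# Two of three learners approve: 2 >= 3 * 0.5, so every update is accepted.
majority: List[StubLearner] = [StubLearner(True), StubLearner(True), StubLearner(False)]
# One of three learners approves: 1 < 3 * 0.5, so every update is rejected.
minority: List[StubLearner] = [StubLearner(True), StubLearner(False), StubLearner(False)]

results_majority = [run_round(i, majority) for i in range(4)]
results_minority = [run_round(i, minority) for i in range(4)]
```

In the first group the shared weights advance once per round, while in the second group they never change because no proposal reaches the threshold.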
The Machine Learning Interface
# ------------------------------------------------------------------------------
#
# Copyright 2021 Fetch.AI Limited
#
# Licensed under the Creative Commons Attribution-NonCommercial International
# License, Version 4.0 (the "License"); you may not use this file except in
# compliance with the License. You may obtain a copy of the License at
#
# http://creativecommons.org/licenses/by-nc/4.0/legalcode
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ------------------------------------------------------------------------------
import abc
from enum import Enum
from typing import Any, Optional
import onnx
import onnxmltools
import sklearn
import tensorflow as tf
import torch
from pydantic import BaseModel
from tensorflow import keras
model_classes_keras = (tf.keras.Model, keras.Model, tf.estimator.Estimator)
model_classes_scipy = (torch.nn.Module)
model_classes_sklearn = (sklearn.base.ClassifierMixin)
def convert_model_to_onnx(model: Any):
    """
    Helper function to convert an ML model to onnx format
    """
    if isinstance(model, model_classes_keras):
        return onnxmltools.convert_keras(model)
    if isinstance(model, model_classes_sklearn):
        return onnxmltools.convert_sklearn(model)
    if 'xgboost' in model.__repr__():
        return onnxmltools.convert_sklearn(model)
    if isinstance(model, model_classes_scipy):
        raise Exception("Pytorch models not yet supported to onnx")
    else:
        # f-string so that the offending model appears in the message
        raise Exception(f"Attempt to convert unsupported model to onnx: {model}")
class DiffPrivBudget(BaseModel):
    target_epsilon: float
    target_delta: float
    consumed_epsilon: float
    consumed_delta: float


class ErrorCodes(Enum):
    DP_BUDGET_EXCEEDED = 1


class TrainingSummary(BaseModel):
    dp_budget: Optional[DiffPrivBudget]
    error_code: Optional[ErrorCodes]


class Weights(BaseModel):
    weights: Any
    training_summary: Optional[TrainingSummary]


class DiffPrivConfig(BaseModel):
    target_epsilon: float
    target_delta: float
    max_grad_norm: float
    noise_multiplier: float


class ProposedWeights(BaseModel):
    weights: Weights
    vote_score: float
    test_score: float
    vote: Optional[bool]


class ModelFormat(Enum):
    PICKLE_WEIGHTS_ONLY = 1
    ONNX = 2


class ColearnModel(BaseModel):
    model_format: ModelFormat
    model_file: Optional[str]
    model: Optional[Any]
def deser_model(model: Any) -> onnx.ModelProto:
    """
    Helper function to recover an onnx model from its serialized form
    """
    return onnx.load_model_from_string(model)
class MachineLearningInterface(abc.ABC):
    @abc.abstractmethod
    def mli_propose_weights(self) -> Weights:
        """
        Trains the model. Returns new weights. Does not change the current weights of the model.
        """
        pass

    @abc.abstractmethod
    def mli_test_weights(self, weights: Weights) -> ProposedWeights:
        """
        Tests the proposed weights and fills in the rest of the fields
        """

    @abc.abstractmethod
    def mli_accept_weights(self, weights: Weights):
        """
        Updates the model with the proposed set of weights
        :param weights: The new weights
        """
        pass

    @abc.abstractmethod
    def mli_get_current_weights(self) -> Weights:
        """
        Returns the current weights of the model
        """
        pass

    @abc.abstractmethod
    def mli_get_current_model(self) -> ColearnModel:
        """
        Returns the current model
        """
        pass
There are five methods that need to be implemented:
- mli_propose_weights - the model does some training and then returns a new set of weights that are proposed to the other learners. This method shouldn't change the current weights of the model; that only happens when mli_accept_weights is called.
- mli_test_weights - the model takes some new weights and returns a vote on whether the new weights are an improvement. As with mli_propose_weights, this shouldn't change the current weights of the model; that only happens when mli_accept_weights is called.
- mli_accept_weights - the model accepts some weights that have been voted on and approved by the set of learners. The old weights of the model are discarded and replaced by the new weights.
- mli_get_current_weights - returns the current weights of the model.
- mli_get_current_model - returns the current model as a ColearnModel.
For more details about directly implementing the machine learning interface see the tutorial here
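As an illustration of that contract, here is a minimal sketch of a learner whose "model" is a single number estimating the mean of its private data. The class `MeanLearner` and the `SimpleWeights` and `SimpleProposal` containers are hypothetical stand-ins so that the block runs on its own; a real learner would subclass `MachineLearningInterface` and return `Weights` and `ProposedWeights`. Using the loss for both `vote_score` and `test_score` is also an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleWeights:
    weights: float


@dataclass
class SimpleProposal:
    weights: SimpleWeights
    vote_score: float
    test_score: float
    vote: bool


class MeanLearner:
    """Toy learner: the model is one float, the loss is mean squared error."""

    def __init__(self, data: List[float], start: float = 0.0, lr: float = 0.5):
        self.data = data
        self.current = start
        self.lr = lr

    def _loss(self, w: float) -> float:
        return sum((x - w) ** 2 for x in self.data) / len(self.data)

    def mli_propose_weights(self) -> SimpleWeights:
        # Compute a candidate value; self.current is left untouched.
        mean = sum(self.data) / len(self.data)
        return SimpleWeights(self.current + self.lr * (mean - self.current))

    def mli_test_weights(self, weights: SimpleWeights) -> SimpleProposal:
        # Score the proposal on private data without adopting it.
        new_loss = self._loss(weights.weights)
        return SimpleProposal(weights=weights, vote_score=new_loss,
                              test_score=new_loss,
                              vote=new_loss < self._loss(self.current))

    def mli_accept_weights(self, weights: SimpleWeights) -> None:
        # Only here are the current weights replaced.
        self.current = weights.weights

    def mli_get_current_weights(self) -> SimpleWeights:
        return SimpleWeights(self.current)


alice = MeanLearner([1.0, 2.0, 3.0])
bob = MeanLearner([2.0, 3.0, 4.0])

proposal = alice.mli_propose_weights()    # alice.current is still 0.0 here
verdict = bob.mli_test_weights(proposal)  # bob.current is still 0.0 here
if verdict.vote:
    for learner in (alice, bob):
        learner.mli_accept_weights(proposal)
```

Keeping training and testing free of side effects means a rejected proposal leaves every learner exactly where it was, so the next round starts from the same global model.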