Abstract. This paper tackles the problem of data siloing, in which organisations are unable to share data with one another because of privacy concerns. Machine learning models, which could benefit greatly from larger data sets pooled across organisations, suffer in this era of data isolation. To address this problem, a blockchain-based implementation is proposed that allows machine learning models to be trained in a privacy-compliant way. Instead of using blockchain as a conventional database, the proposed solution uses it to manage joint ownership and joint control over a computer system known as the Training Machine. The Training Machine, set up jointly by consortium members, serves as a secure, independent container that accepts data sets and an untrained model as inputs from different entities, trains the model internally, and outputs the trained model without revealing any data to other entities. The input data is then deleted automatically. Blockchain ensures that this machine is not under the control of any single entity but is instead controlled transparently by all data-sharing parties. By placing sensitive information in an isolated system and establishing blockchain-based access control, the solution ensures that data is not accessible to any party other than its owner. The paper also presents use cases for this technology, along with a risk analysis and a proof of concept.
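The workflow described above (members submit data and an untrained model, training runs only under joint control, only the trained model leaves the machine, and the data is then deleted) can be illustrated with a minimal Python sketch. It rests on several assumptions: a hash-chained, append-only log stands in for the consortium blockchain, training is allowed only after every member has approved it, and the "untrained model" is represented as a fitting function (here, least-squares fitting of a line). The names `Ledger`, `TrainingMachine`, `approve`, `submit_data` and `fit_line` are hypothetical and are not taken from the paper's proof of concept. Python's leading-underscore convention only signals the isolation that a real Training Machine would have to enforce.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Append-only, hash-chained log: a hypothetical stand-in for the consortium blockchain."""
    blocks: list = field(default_factory=list)

    @staticmethod
    def _hash(prev_hash, record):
        payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def append(self, record):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        self.blocks.append({"prev": prev_hash, "record": record,
                            "hash": self._hash(prev_hash, record)})

    def verify(self):
        # Recompute every link; any edited record breaks the chain.
        prev_hash = "0" * 64
        for block in self.blocks:
            if block["prev"] != prev_hash or self._hash(prev_hash, block["record"]) != block["hash"]:
                return False
            prev_hash = block["hash"]
        return True


class TrainingMachine:
    """Trainer that acts only on actions approved by every consortium member (assumed rule)."""

    def __init__(self, members):
        self.members = frozenset(members)
        self.ledger = Ledger()
        self._datasets = {}  # held inside the machine; never returned to any caller

    def _check_member(self, member):
        if member not in self.members:
            raise PermissionError(f"{member} is not a consortium member")

    def submit_data(self, member, rows):
        self._check_member(member)
        self._datasets[member] = list(rows)
        # Only metadata (row count) goes on the ledger, never the data itself.
        self.ledger.append({"member": member, "event": "data_submitted", "rows": len(rows)})

    def approve(self, member, action):
        self._check_member(member)
        self.ledger.append({"member": member, "approves": action})

    def _authorised(self, action):
        approvers = {b["record"]["member"] for b in self.ledger.blocks
                     if b["record"].get("approves") == action}
        return self.ledger.verify() and approvers == self.members

    def train(self, fit):
        """Apply the submitted model `fit` to the pooled data, return only its parameters."""
        if not self._authorised("train"):
            raise PermissionError("training requires approval from every member")
        try:
            xs = [x for rows in self._datasets.values() for x, _ in rows]
            ys = [y for rows in self._datasets.values() for _, y in rows]
            params = fit(xs, ys)
        finally:
            # Delete all input data, even if training fails, and log the deletion.
            n = sum(len(rows) for rows in self._datasets.values())
            self._datasets.clear()
            self.ledger.append({"member": "machine", "event": "data_deleted", "rows": n})
        return params


def fit_line(xs, ys):
    """Example 'untrained model': ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # zero if all x are equal
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


if __name__ == "__main__":
    machine = TrainingMachine(["org_a", "org_b"])
    machine.submit_data("org_a", [(1.0, 3.0), (2.0, 5.0)])
    machine.submit_data("org_b", [(3.0, 7.0), (4.0, 9.0)])
    for member in ("org_a", "org_b"):
        machine.approve(member, "train")
    print(machine.train(fit_line))  # {'slope': 2.0, 'intercept': 1.0}
```

Each member sees only the returned parameters, and every submission, approval and deletion is recorded on the chained log, so any member can call `ledger.verify()` to detect tampering. A deployed system would replace this single-process log with consensus among the consortium's nodes.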

Keywords: Private data sharing, Shared model training, Blockchain access control, Consortium data exchange, Deep learning training.

Paper presented at the IEEE Cybernetics Conference, University of Salamanca, Spain. Published as part of the conference proceedings in Springer’s Advances in Intelligent Systems and Computing series (AISC, volume 1010): https://doi.org/10.1007/978-3-030-23813-1_8