Security and Protection for Machine Learning
Machine learning. We believe it has the potential to be a game changer that will transform the course of our future. But what actually is machine learning? The term refers to training an artificial intelligence with typically massive sets of data. The resulting trained model can then make predictions about new, unseen data. To illustrate this, consider how medical data could be processed automatically by an artificial intelligence.
Think of a doctor screening a patient for tuberculosis (TB). The doctor looks at the X-ray image to tell whether the patient is infected or not. This takes years of training and lots of experience. A trained AI could do the same job far more efficiently and make the technique available even to general practitioners. Imagine that a maker of TB screening systems produces a system that reads the X-ray scans and makes the diagnosis by itself. All it needs is a good set of X-ray images with the correct diagnoses attached. The system is trained on these scans to produce a model that can predict the diagnoses for future scans. What is immediately clear is that the data the machine is fed has to be sound: all incorrect or flawed data has to be screened out, and the meaningful data has to be identified and correlated. The model that slowly grows from this process becomes the intellectual property of the manufacturer. And as soon as intellectual property comes into the picture, problems follow: there will be counterfeiters trying to build similar systems by abusing the property of the original maker. There might even be outright saboteurs who want to manipulate what the system does in practice.
By this point at the latest – much sooner, if you ask us – the device maker should start thinking about ways to protect that IP. The IP comes in multiple forms: the data originally used to train the model, the training setup itself, and the eventual trained model. It does not matter whether the model in question is a virtual system operating in the cloud or an actual device sitting on a desk in a doctor’s surgery somewhere. At stake in both cases is a data model that can be accessed via the physical device or that needs to be securely stored in the cloud. And for both threats, CodeMeter has a perfect solution: encrypting the model to protect it against unauthorized use, copying, or espionage.
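The underlying principle – a model file that is useless without the right key – can be sketched in a few lines of Python. This is a minimal illustration only, not CodeMeter's actual API (which manages keys in secure license containers or hardware); it assumes the widely used third-party `cryptography` package and a placeholder byte string standing in for serialized model weights:

```python
# Sketch: a trained model encrypted at rest is unusable without the key.
# Illustrative only -- NOT CodeMeter's API. Uses the third-party
# `cryptography` package for symmetric authenticated encryption.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in a real product: kept in secure storage, never shipped in plain form
cipher = Fernet(key)

model_bytes = b"...serialized model weights..."   # placeholder payload
encrypted_model = cipher.encrypt(model_bytes)     # this is what ships with the device

# A copied or stolen file reveals nothing; only a key holder can restore the model:
restored = cipher.decrypt(encrypted_model)
assert restored == model_bytes
```

The design point is the same one the article makes: the valuable artifact is the key management, not the file format, so an attacker who copies the device's storage still cannot run or inspect the model.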
Our example is a simple case of AI and machine learning doing something that a person could also do, albeit with lots of medical training and experience. Another current example – Covid-19 – shows us the potential of machine learning far beyond the abilities of the human brain. Human analysts typically take only one or a few individual factors, like vaccination rates, as input and analyze the prevalence of severe cases depending on those factors. An AI can go beyond this microcosm and include many factors at once in the machine learning process. It could consider data like the patient’s age, preexisting conditions, environmental factors, social contacts such as family members or the number and age of children, or hygiene habits. Machine learning could take all of this and arrive at reliable predictions that would not be influenced by the short-term thinking of non-specialist politicians.
This example again shows the importance of data in all of this – specifically of very private personal data. Two threats need to be considered: the data could simply be stolen, or it could be manipulated for other purposes. An unscrupulous insurance company might want this data to identify high-risk clients. Or a pharmaceutical company could tweak the data just enough to make its drugs look like the better treatment option. Both would, naturally, be against the law, but both are possible. Again, CodeMeter can encrypt and sign the data to add an extra layer of security on top of anonymization or pseudonymization.
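The tamper-detection half of that claim – signing data so that manipulation is noticed – can be illustrated with a keyed signature from Python's standard library. This is a conceptual sketch only, not CodeMeter's signing infrastructure; the key and the record contents are hypothetical:

```python
# Sketch: detecting manipulation of a data record with an HMAC signature.
# Illustrative only -- CodeMeter uses its own signing and key infrastructure.
import hashlib
import hmac

SECRET_KEY = b"hypothetical-shared-secret"  # in practice: held in secure storage


def sign(record: bytes) -> str:
    """Compute a keyed SHA-256 signature over a data record."""
    return hmac.new(SECRET_KEY, record, hashlib.sha256).hexdigest()


record = b"patient_age=42;vaccinated=yes"
signature = sign(record)

# Verification succeeds for the authentic record...
assert hmac.compare_digest(sign(record), signature)

# ...but fails for a record someone has tweaked after the fact:
tampered = b"patient_age=42;vaccinated=no"
assert not hmac.compare_digest(sign(tampered), signature)
```

A pharmaceutical company tweaking study data, as in the scenario above, would invalidate every signature it does not hold the key for – which is exactly the property that makes signing a useful complement to anonymization.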
Access the Recordings
In our masterclass, we will first take another look at the basic principles of machine learning and the specific risks and threats associated with it. In the second part, we turn our attention to CodeMeter Protection Suite and its ability to protect ML data models:
- What is Machine Learning?
- The threats
- CodeMeter and the secure ML lifecycle
- CodeMeter Protection Suite
- Feature file encryption
- Data model encryption in practice
Join us for our webinar, discover whether and how machine learning could help you with new products, and see how you can protect your ML data models from thieves, spies, and saboteurs.