Building Privacy-Preserving Deep Learning Systems using Differential Privacy and Federated Learning
Abstract: Deep learning and artificial intelligence can only be as effective as the data used to train the underlying models. Given growing concern about privacy violations and increasing regulation of data usage, a lack of relevant data is impeding the growth of deep learning systems. The problem is most acute in healthcare, where most of the data is locked in separate hospitals, insurance companies, and patient health records. Federated Learning provides a secure way to train deep learning models without pulling the data into one place for training. Federated Learning with Differential Privacy provides a privacy-preserving approach to training and can deliver high prediction accuracy, on par with traditional approaches. I achieved around 87% accuracy with deep learning models trained using federated learning on fundus image datasets distributed across edge devices (mobile phones).
Bibliography/Citations: No additional citations
Additional Project Information
How can deep learning systems be trained and built without violating privacy and regulatory requirements?
Deep learning systems are only as good as the data used to train the underlying models, and data, especially in healthcare, is very difficult to acquire because of privacy and regulatory concerns. In addition, the data is locked in different hospitals, insurance companies, and research laboratories.
Goals / Expected Outcomes
Build an effective and secure approach for accurately training deep learning models that addresses privacy concerns and leverages distributed datasets.
Federated Learning with Differential Privacy provides a secure approach to training models on distributed datasets.
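To make the idea concrete, here is a minimal sketch of one way federated training with a differential-privacy style update could work. It is a pure-Python toy, not the TensorFlow Federated code used in the project: the linear model, the function names (`local_update`, `clip`, `federated_round`), the toy data, and the `clip_norm` and `noise_multiplier` parameters are all assumptions made for illustration. Each client trains on its own data and shares only a weight delta; the server clips each delta, adds Gaussian noise to the sum, and averages. No privacy accounting is done here, so the noise level does not correspond to any stated privacy budget.

```python
import math
import random


def local_update(weights, data, lr=0.1, steps=5):
    """Gradient descent on one client's private (x, y) pairs for y = w*x + b.

    Returns only the weight delta; the raw data never leaves the client.
    """
    w, b = weights
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return [w - weights[0], b - weights[1]]


def clip(delta, clip_norm):
    """Scale delta down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [d * scale for d in delta]


def federated_round(weights, clients, clip_norm=1.0, noise_multiplier=0.0, rng=None):
    """One round: clip each client delta, sum, add Gaussian noise, average."""
    rng = rng or random.Random()
    clipped = [clip(local_update(weights, data), clip_norm) for data in clients]
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clipping bound
    updated = []
    for i, w_i in enumerate(weights):
        noisy_sum = sum(delta[i] for delta in clipped) + rng.gauss(0.0, sigma)
        updated.append(w_i + noisy_sum / len(clients))
    return updated


# Hypothetical toy data: three "devices", each holding points from y = 2x + 1.
clients = [
    [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0)],
    [(x, 2 * x + 1) for x in (0.25, 0.75)],
    [(x, 2 * x + 1) for x in (0.1, 0.9)],
]
weights = [0.0, 0.0]
for _ in range(300):
    weights = federated_round(weights, clients, noise_multiplier=0.0)
print(weights)
```

With `noise_multiplier=0.0` the loop is plain federated averaging with equal client weights; raising it trades accuracy for stronger protection of each client's contribution. Clipping matters because it bounds how much any single device can move the shared model, which is what lets the added noise mask an individual contribution.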
- Evaluate the data security practices currently available and check whether any of the traditional approaches are effective for training. The key security practices are:
  - Data Encryption
  - Data Anonymization
  - Data Randomization
  - Data Federation
  - Differential Privacy
- Identify all the privacy and health regulations that apply to using healthcare data
  - California Consumer Privacy Act (CCPA)
- Identify deep learning requirements against data security approaches and regulatory frameworks
- Assess and score the effectiveness of each of the approaches for deep learning
- Set up a distributed environment for validating distributed deep learning approaches and Federated Learning
- Collect and seed medical fundus image data in the distributed environment. Make sure that no edge device (mobile phone or Raspberry Pi with Coral) holds more than 50 images
- Develop a deep learning model using TensorFlow Federated (TFF) and deploy it in Google Colaboratory
- Train the TFF model with differential privacy and validate its accuracy
- Deploy a distributed CNN model on a central server and place the images on edge devices, with no more than 50 images on each device. Add noise to the images and send them to the centralized model for training. Validate the accuracy of the distributed deep learning model
- Compare the accuracy of the federated learning model with that of the distributed deep learning model
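Two of the data-preparation steps in the plan above can be sketched in a few lines: seeding the devices so that none holds more than 50 images, and adding noise to images before they are sent to the central server. The sketch below is an illustration under assumptions, not the project's actual pipeline. The function names, the round-robin assignment, the choice of Gaussian pixel noise, and the `sigma` value are all hypothetical, since the plan does not specify them. Images are represented as flat lists of pixel values in [0, 1].

```python
import random

MAX_IMAGES_PER_DEVICE = 50  # cap taken from the plan


def shard_for_devices(image_ids, num_devices, cap=MAX_IMAGES_PER_DEVICE, seed=0):
    """Shuffle image ids and deal them round-robin so no device exceeds cap."""
    ids = list(image_ids)
    if len(ids) > num_devices * cap:
        raise ValueError("not enough devices to keep every shard within the cap")
    random.Random(seed).shuffle(ids)
    # Device d receives every num_devices-th id starting at position d.
    return [ids[d::num_devices] for d in range(num_devices)]


def add_gaussian_noise(pixels, sigma=0.05, rng=None):
    """Add zero-mean Gaussian noise to each pixel and clamp back into [0, 1]."""
    rng = rng or random.Random()
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


# Hypothetical usage: 230 images spread over 5 devices.
shards = shard_for_devices(range(230), num_devices=5)
print([len(s) for s in shards])

noisy_image = add_gaussian_noise([0.0, 0.5, 1.0] * 10, sigma=0.1, rng=random.Random(1))
```

Noising raw pixels in this way only obscures the images; on its own it does not give the formal guarantee that differential privacy provides, which is one reason the comparison against the federated approach is informative.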