MSP 2019 – Deep Learning for Data Privacy Classification

The ubiquity of electronic services and communication has allowed organizations to collect increasingly large volumes of data on private citizens. As this trend continues, more advanced and automated methods are required to protect the privacy of these individuals. This project explores several machine learning techniques for classifying arbitrary text documents into three distinct privacy tiers: non-personal information, personal information, and sensitive personal information. We find that applying feedforward neural networks to bag-of-words representations of documents achieves the best performance while keeping training and prediction times low.
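
As a rough illustration of the approach named above, the following Python sketch encodes documents as bag-of-words count vectors and trains a small feedforward network with one hidden layer to output probabilities over the three tiers. It is not the project's implementation: the whitespace tokenizer, the hidden layer size, the learning rate, the toy documents, and all function and class names are assumptions made for this example.

```python
import math
import random
from collections import Counter

# Hypothetical label names, one per privacy tier described in the text.
TIERS = ["non-personal", "personal", "sensitive-personal"]


def build_vocab(docs):
    """Map every distinct lowercase token in the corpus to a column index."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(doc, vocab):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok, n in Counter(doc.lower().split()).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class FeedForwardNet:
    """One ReLU hidden layer followed by a softmax output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                   for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        """Return (hidden activations, class probabilities) for one input."""
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * hi for w, hi in zip(row, h)) + b
                  for row, b in zip(self.w2, self.b2)]
        return h, softmax(logits)

    def train_step(self, x, label, lr=0.1):
        """One SGD step on the cross-entropy loss for a single example."""
        h, probs = self.forward(x)
        # Gradient of cross-entropy w.r.t. the logits: probs - one_hot(label).
        d_out = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
        # Backpropagate into the hidden layer before w2 is modified.
        d_h = [sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
               if h[j] > 0 else 0.0
               for j in range(len(h))]
        for k, d in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * d * hj
            self.b2[k] -= lr * d
        for j, d in enumerate(d_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d
        return -math.log(probs[label])


# Invented toy documents and labels, for illustration only.
docs = [
    "the meeting starts at noon",
    "my name is alice and i live in oslo",
    "patient diagnosed with diabetes",
]
labels = [0, 1, 2]

vocab = build_vocab(docs)
X = [bag_of_words(d, vocab) for d in docs]
net = FeedForwardNet(len(vocab), 8, len(TIERS))
for _ in range(200):
    for x, y in zip(X, labels):
        net.train_step(x, y)

_, probs = net.forward(bag_of_words("alice lives in oslo", vocab))
print(TIERS[probs.index(max(probs))], [round(p, 3) for p in probs])
```

Count vectors and a single hidden layer keep both the encoding and the forward pass cheap, which fits the low training and prediction times reported above. A practical system would likely rely on an established library for vectorization and training instead of hand-written loops.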