"Natural language processing for mobile app privacy compliance" by Peter Story, Sebastian Zimmeck et al.
 

Computer Science

Natural language processing for mobile app privacy compliance

Peter Story, School of Computer Science
Sebastian Zimmeck, Wesleyan University Middletown
Abhilasha Ravichander, School of Computer Science
Daniel Smullen, School of Computer Science
Ziqi Wang, School of Computer Science
Joel Reidenberg, Fordham University
N. Cameron Russell, Fordham University
Norman Sadeh, School of Computer Science

Abstract

Many Internet services collect a flurry of data from their users. Privacy policies are intended to describe the services' privacy practices. However, due to their length and complexity, reading privacy policies is a challenge for end users, government regulators, and companies. Natural language processing holds the promise of helping address this challenge. Specifically, we focus on comparing the practices described in privacy policies to the practices performed by smartphone apps covered by those policies. Government regulators are interested in comparing apps to their privacy policies in order to detect non-compliance with laws, and companies are interested for the same reason. We frame the identification of privacy practice statements in privacy policies as a classification problem, which we address with a three-tiered approach: a privacy practice statement is classified based on a data type (e.g., location), party (i.e., first or third party), and modality (i.e., whether a practice is explicitly described as being performed or not performed). Privacy policies omit discussion of many practices. With negative F1 scores ranging from 78% to 100%, the performance results of this three-tiered classification methodology suggest an improvement over the state of the art. Our NLP analysis of privacy policies is an integral part of our Mobile App Privacy System (MAPS), which we used to analyze 1,035,853 free apps on the Google Play Store. Potential compliance issues appeared to be widespread, and those involving third parties were particularly common.
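
The three-tiered labels described in the abstract (data type, party, modality) can be pictured as a small data structure, and the comparison between a policy and an app as a lookup over those labels. The following is a minimal illustrative sketch, not the MAPS implementation. The names `Practice`, `Party`, `Modality`, and `potential_issues` are hypothetical, and the rule used here (flag a practice observed in the app when the policy has no statement describing it as performed) is an assumption made for illustration; the abstract does not give the exact criterion.

```python
from dataclasses import dataclass
from enum import Enum


class Party(Enum):
    FIRST = "first party"
    THIRD = "third party"


class Modality(Enum):
    PERFORMED = "performed"
    NOT_PERFORMED = "not performed"


@dataclass(frozen=True)
class Practice:
    """A privacy practice identified by data type and party (hypothetical type)."""
    data_type: str  # e.g. "location"
    party: Party


def potential_issues(policy_labels, app_practices):
    """Return app practices that the policy does not describe as performed.

    policy_labels: dict mapping Practice -> set of Modality values that a
        classifier found stated in the policy text.
    app_practices: set of Practice values observed in the app.
    Assumed rule: a practice is flagged when the policy has no "performed"
    statement for it, which covers both omission and explicit denial.
    """
    issues = []
    # Sort for a deterministic result order.
    for practice in sorted(app_practices, key=lambda p: (p.data_type, p.party.value)):
        modalities = policy_labels.get(practice, set())
        if Modality.PERFORMED not in modalities:
            issues.append(practice)
    return issues


# Example: the policy describes first-party location collection only,
# while the app also shares location with a third party.
policy = {Practice("location", Party.FIRST): {Modality.PERFORMED}}
app = {Practice("location", Party.FIRST), Practice("location", Party.THIRD)}
flagged = potential_issues(policy, app)
```

In this example, `flagged` contains only the third-party location practice, because the policy has no statement covering it. Keeping modality as a set per practice allows a policy to contain both kinds of statement about the same practice without losing either.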