PrivOnto: A semantic framework for the analysis of privacy policies

Alessandro Oltramari, Carnegie Mellon University
Dhivya Piraviperumal, Carnegie Mellon University
Florian Schaub, University of Michigan, Ann Arbor
Shomir Wilson, College of Engineering and Applied Science
Sushain Cherivirala, Carnegie Mellon University
Thomas B. Norton, Fordham University
N. Cameron Russell, Fordham University
Peter Story, Carnegie Mellon University
Joel Reidenberg, Fordham University
Norman Sadeh, Carnegie Mellon University

Abstract

Privacy policies are intended to inform users about the collection and use of their data by the websites, mobile apps, and other services or appliances they interact with, including any choices users may have regarding these data practices. However, few users read these often lengthy policies, and those who do struggle to understand them because they are written in convoluted and ambiguous language. One promising approach to this problem is to semi-automatically annotate policies using a combination of crowdsourcing, machine learning, and natural language processing. In this article, we introduce PrivOnto, a semantic framework for representing annotated privacy policies. PrivOnto relies on an ontology developed to represent issues identified as critical by users and/or legal experts. We have used PrivOnto to analyze a corpus of over 23,000 annotated data practices extracted from 115 privacy policies of US-based companies. We introduce a collection of 57 SPARQL queries that extract information from the PrivOnto knowledge base, with the dual objective of (1) answering privacy questions of interest to users and (2) supporting researchers and regulators in analyzing privacy policies at scale. We also present an interactive online tool, built on PrivOnto, that helps users explore our corpus of 23,000 annotated data practices. Finally, we outline future research directions and open challenges in using semantic technologies for privacy policy analysis.