Computer Science
Analyzing privacy policies at scale: From crowdsourcing to automated annotations
Document Type
Article
Abstract
Website privacy policies are often long and difficult to understand. While research shows that Internet users care about their privacy, they do not have the time to understand the policies of every website they visit, and most users hardly ever read privacy policies. Some recent efforts have aimed to use a combination of crowdsourcing, machine learning, and natural language processing to interpret privacy policies at scale, thus producing annotations for use in interfaces that inform Internet users of salient policy details. However, little attention has been devoted to studying the accuracy of crowdsourced privacy policy annotations, how crowdworker productivity can be enhanced for such a task, and the levels of granularity that are feasible for automatic analysis of privacy policies. In this article, we present a trajectory of work addressing each of these topics. We include analyses of crowdworker performance, evaluation of a method to make a privacy-policy-oriented task easier for crowdworkers, a coarse-grained approach to labeling segments of policy text with descriptive themes, and a fine-grained approach to identifying user choices described in policy text. Together, the results from these efforts show the effectiveness of using automated and semi-automated methods for extracting from privacy policies the data practice details that are salient to Internet users' interests. © 2018 Copyright is held by the owner/author(s).
Publication Title
ACM Transactions on the Web
Publication Date
2018
Volume
13
Issue
1
ISSN
1559-1131
DOI
10.1145/3230665
Keywords
Crowdsourcing, Human computer interaction (HCI), Machine learning, Natural language processing, Privacy, Privacy policies
Repository Citation
Wilson, Shomir; Schaub, Florian; Liu, Frederick; Sathyendra, Kanthashree Mysore; Smullen, Daniel; Zimmeck, Sebastian; Ramanath, Rohan; Story, Peter; Liu, Fei; Sadeh, Norman; and Smith, Noah A., "Analyzing privacy policies at scale: From crowdsourcing to automated annotations" (2018). Computer Science. 230.
https://commons.clarku.edu/faculty_computer_sciences/230
APA Citation
Wilson, S., Schaub, F., Liu, F., Sathyendra, K. M., Smullen, D., Zimmeck, S., ... & Smith, N. A. (2018). Analyzing privacy policies at scale: From crowdsourcing to automated annotations. ACM Transactions on the Web (TWEB), 13(1), 1–29.