Computer Science
Quantifying the effect of in-domain distributed word representations: A study of privacy policies
Document Type
Conference Paper
Abstract
Privacy policies are documents that describe what data is collected by a website or an app and how that data is handled. Privacy policies are often long and difficult to understand. Recently, people have started to turn to Natural Language Processing (NLP) to automatically extract statements from the text of these policies. This article reports on a study to evaluate the benefits of using word embeddings in this endeavor, including the benefits of privacy-specific word embeddings. Specifically, we use 150,000 privacy policies to build word vectors in an unsupervised manner. Evaluation is conducted on the OPP-115 corpus of privacy policy annotations. By building privacy-specific embeddings, we hope to accelerate research at the intersection of privacy policies and language technologies.
Publication Title
CEUR Workshop Proceedings
Publication Date
2019
Volume
2335
First Page
46
Last Page
52
ISSN
1613-0073
Repository Citation
Kumar, Vinayshekhar Bannihatti; Ravichander, Abhilasha; Story, Peter; and Sadeh, Norman, "Quantifying the effect of in-domain distributed word representations: A study of privacy policies" (2019). Computer Science. 229.
https://commons.clarku.edu/faculty_computer_sciences/229
APA Citation
Kumar, V. B., Ravichander, A., Story, P., & Sadeh, N. (2019). Quantifying the effect of in-domain distributed word representations: A study of privacy policies. In AAAI Spring Symposium on Privacy-Enhancing Artificial Intelligence and Language Technologies.