Computer Science

Quantifying the effect of in-domain distributed word representations: A study of privacy policies

Document Type

Conference Paper

Abstract

Privacy policies are documents that describe what data a website or an app collects and how that data is handled. They are often long and difficult to understand. Recently, people have begun turning to Natural Language Processing (NLP) to automatically extract statements from the text of these policies. This article reports on a study evaluating the benefits of word embeddings in this endeavor, including the benefits of privacy-specific word embeddings. Specifically, we use 150,000 privacy policies to build word vectors in an unsupervised manner. Evaluation is conducted on the OPP-115 corpus of privacy policy annotations. By building privacy-specific embeddings, we hope to accelerate research at the intersection of privacy policies and language technologies.

Publication Title

CEUR Workshop Proceedings

Publication Date

2019

Volume

2335

First Page

46

Last Page

52

ISSN

1613-0073

APA Citation

Kumar, V. B., Ravichander, A., Story, P., & Sadeh, N. (2019). Quantifying the effect of in-domain distributed word representations: A study of privacy policies. In AAAI Spring Symposium on Privacy-Enhancing Artificial Intelligence and Language Technologies.
