Towards a Classifier for Digital Sensitivity Review
Government records must undergo sensitivity review before they can be released to the official government archives, to prevent the release of sensitive information (such as personal information, or information prejudicial to international relations). Because records are typically reviewed and released only after a period of decades, sensitivity review practices are still based on paper records. The transition to digital records brings new challenges, such as an increased volume of records, that make current practices impractical. In this paper, we describe our current work towards developing a sensitivity review classifier that can identify and prioritise potentially sensitive digital records for review. Using a test collection built from government records with real sensitivities identified by government assessors, we show that considering the entities present in each record can markedly improve upon a text classification baseline.
Keywords: Personal Information; International Relation; Sentiment Analysis; Digital Record; Test Collection