
Towards a Classifier for Digital Sensitivity Review

  • Graham McDonald
  • Craig Macdonald
  • Iadh Ounis
  • Timothy Gollins
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8416)

Abstract

Government records must undergo sensitivity review before they can be released to the official government archives, to prevent sensitive information (such as personal information, or information that is prejudicial to international relations) from being released. Since records are typically reviewed and released only after several decades, sensitivity review practices are still based on paper records. The transition to digital records brings new challenges, such as the increased volume of records, that make current practices impractical. In this paper, we describe our current work towards developing a sensitivity review classifier that can identify and prioritise potentially sensitive digital records for review. Using a test collection built from government records with real sensitivities identified by government assessors, we show that considering the entities present in each record can markedly improve upon a text classification baseline.
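The abstract does not say which classifier or entity features were used. As a minimal sketch of the general idea only, assuming an L2-regularised logistic-regression classifier over term counts as the text baseline (our stand-in, not necessarily the authors' choice), a hypothetical toy gazetteer `ENTITY_GAZETTEER` in place of a real named-entity recogniser, and invented example records, the code below adds entity-type counts as extra features and orders unseen records by estimated sensitivity to form a review queue. The helper names `features`, `train`, `score` and `prioritise` are likewise hypothetical.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy gazetteer standing in for a real named-entity recogniser;
# the paper's actual entity extraction is not described in the abstract.
ENTITY_GAZETTEER = {
    "smith": "PERSON",
    "jones": "PERSON",
    "france": "COUNTRY",
    "russia": "COUNTRY",
    "nato": "ORGANISATION",
}


def features(text, use_entities=True):
    """Term counts (the text baseline), optionally plus entity-type counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    feats = Counter("w=" + t for t in tokens)
    if use_entities:
        for t in tokens:
            if t in ENTITY_GAZETTEER:
                feats["ent=" + ENTITY_GAZETTEER[t]] += 1
    return feats


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(records, labels, use_entities=True, epochs=200, lr=0.1, l2=0.01):
    """Fit L2-regularised logistic regression by batch gradient descent.

    labels: 1 = sensitive, 0 = not sensitive.
    """
    data = [features(r, use_entities) for r in records]
    weights, bias, n = Counter(), 0.0, len(data)
    for _ in range(epochs):
        grad, grad_bias = Counter(), 0.0
        for x, y in zip(data, labels):
            p = sigmoid(bias + sum(weights[f] * v for f, v in x.items()))
            err = p - y  # gradient of the log-loss w.r.t. the logit
            for f, v in x.items():
                grad[f] += err * v
            grad_bias += err
        for f in set(grad) | set(weights):
            weights[f] -= lr * (grad[f] / n + l2 * weights[f])
        bias -= lr * grad_bias / n
    # Remember the feature setting so scoring uses the same representation.
    return weights, bias, use_entities


def score(model, text):
    """Estimated probability that a record is sensitive."""
    weights, bias, use_entities = model
    x = features(text, use_entities)
    return sigmoid(bias + sum(weights[f] * v for f, v in x.items()))


def prioritise(model, records):
    """Review queue: records ordered by estimated sensitivity, highest first."""
    return sorted(records, key=lambda r: score(model, r), reverse=True)


if __name__ == "__main__":
    # Invented toy records; not drawn from the paper's test collection.
    docs = [
        "letter from smith about his medical history",
        "briefing on talks with france and russia",
        "minutes of the canteen committee meeting",
        "schedule for office cleaning",
    ]
    model = train(docs, [1, 1, 0, 0])
    for record in prioritise(model, ["updated cleaning rota for the office",
                                     "note on jones and the nato summit"]):
        print(f"{score(model, record):.3f}  {record}")
```

Setting `use_entities=False` trains the term-only baseline on the same records, so the two variants can be compared; with real data, such a comparison would need held-out records labelled by assessors.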

Keywords

Personal Information, International Relation, Sentiment Analysis, Digital Record, Test Collection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Graham McDonald (1)
  • Craig Macdonald (1)
  • Iadh Ounis (1)
  • Timothy Gollins (1)
  1. School of Computing Science, University of Glasgow, Glasgow, UK
