Towards Maximising Openness in Digital Sensitivity Review Using Reviewing Time Predictions

  • Graham McDonald
  • Craig Macdonald
  • Iadh Ounis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10772)

Abstract

The adoption of born-digital documents, such as email, by governments, including those of the UK and USA, has resulted in a large backlog of born-digital documents that must be sensitivity reviewed before they can be opened to the public, to ensure that no sensitive information, e.g. personal or confidential information, is released. However, it is not practical to review all of the backlog with the available reviewing resources and, therefore, there is a need for automatic techniques to increase the number of documents that can be opened within a fixed reviewing time budget. In this paper, we conduct a user study and use the log data to build models to predict reviewing times for an average sensitivity reviewer. Moreover, we show that using our reviewing time predictions to select the order in which documents are reviewed can markedly increase the ratio of reviewed documents that are released to the public, e.g. +30% for collections with high levels of sensitivity, compared to reviewing by shortest document first. This, in turn, increases the total number of documents that are opened to the public within a fixed reviewing time budget, e.g. an extra 200 documents in 100 hours of reviewing.
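
The idea of ordering documents by predicted reviewing time under a fixed budget can be illustrated with a minimal sketch. The abstract does not specify the ranking function, so everything here is an assumption made for illustration: the `Doc` fields, the hypothetical `p_releasable` score (an estimated probability that a document is not sensitive), the ranking by `p_releasable / predicted_minutes`, and the simplification that the predicted time equals the actual time spent. It is not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    length: int               # document length in words
    predicted_minutes: float  # hypothetical predicted reviewing time
    p_releasable: float       # hypothetical probability the document is not sensitive
    releasable: bool          # ground truth, used only to simulate the outcome


def order_by_shortest_first(docs):
    """Baseline named in the abstract: review the shortest documents first."""
    return sorted(docs, key=lambda d: d.length)


def order_by_expected_release_rate(docs):
    """Assumed ranking: expected releases per predicted minute, highest first."""
    return sorted(
        docs,
        key=lambda d: d.p_releasable / d.predicted_minutes,
        reverse=True,
    )


def review_within_budget(ordered_docs, budget_minutes):
    """Review documents in order until the next one would exceed the budget.

    Returns (number reviewed, number released). The predicted time is
    treated as the time actually spent, which is a simplification.
    """
    spent = 0.0
    reviewed = 0
    released = 0
    for doc in ordered_docs:
        if spent + doc.predicted_minutes > budget_minutes:
            break
        spent += doc.predicted_minutes
        reviewed += 1
        released += int(doc.releasable)
    return reviewed, released


if __name__ == "__main__":
    # Toy collection with made-up values, for illustration only.
    docs = [
        Doc("a", 100, 5.0, 0.2, False),
        Doc("b", 200, 6.0, 0.9, True),
        Doc("c", 300, 4.0, 0.8, True),
        Doc("d", 150, 10.0, 0.3, False),
    ]
    for name, order in [
        ("shortest first", order_by_shortest_first(docs)),
        ("expected release rate", order_by_expected_release_rate(docs)),
    ]:
        reviewed, released = review_within_budget(order, budget_minutes=10)
        print(f"{name}: reviewed={reviewed}, released={released}")
```

In this toy setting the time-aware ordering spends the budget on documents that are both quick to review and likely to be releasable, which is the kind of effect the abstract reports; the numbers themselves are invented and carry no empirical weight.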

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. University of Glasgow, Glasgow, UK