Sentence-Level Novelty Detection in English and Malay

  • Agus T. Kwee
  • Flora S. Tsai
  • Wenyin Tang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5476)


Novelty detection (ND) is the process of identifying novel information in an incoming stream of documents. Although there are many studies of ND on English-language documents, to the best of our knowledge none has been reported on Malay documents. This issue is important because many documents contain a mixture of English and Malay. This paper examines multilingual sentence-level ND in English and Malay documents using TREC 2003 and TREC 2004 Novelty Track data. We describe the text processing for multilingual ND, which consists of language translation, stop-word removal, automatic stemming, and novel sentence detection. We compare the results for sentence-level ND on English and Malay documents and find that they are fairly similar. Therefore, after preprocessing is performed on Malay documents, our ND algorithm appears to be robust in detecting novel sentences and can possibly be extended to other alphabet-based languages.
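
The processing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which novelty metric or threshold was used, so the cosine similarity measure, the 0.6 threshold, the tiny stop-word list, and the toy suffix stripper `crude_stem` are all assumptions made for illustration. The translation step is omitted, so the input is assumed to be already in English.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "was"}


def crude_stem(token):
    """Toy suffix stripper standing in for a real stemmer (hypothetical helper)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(sentence):
    """Lowercase, tokenize, drop stop words, stem, and count the terms."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def detect_novel(sentences, threshold=0.6):
    """Return sentences whose similarity to every earlier sentence is below threshold."""
    history = []  # term vectors of all sentences seen so far
    novel = []
    for sentence in sentences:
        vec = preprocess(sentence)
        max_sim = max((cosine(vec, h) for h in history), default=0.0)
        if max_sim < threshold:
            novel.append(sentence)
        history.append(vec)
    return novel


stream = [
    "The flood damaged the bridge.",
    "Floods damaged bridges.",
    "Rescue teams arrived on Monday.",
]
novel_sentences = detect_novel(stream)
```

In this sketch the second sentence reduces to the same stemmed terms as the first and is therefore treated as redundant, while the third shares no terms with the earlier ones and is kept as novel. For Malay input, the stop-word list and stemmer would need to be replaced with Malay-specific ones, or the text translated first, as the abstract describes.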


Keywords: novelty detection, sentence novelty, multilingual, automatic stemming, stop words removal, Malay





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Agus T. Kwee (1)
  • Flora S. Tsai (1)
  • Wenyin Tang (1)
  1. School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore
