
A New Term Significance Weighting Approach

Journal of Intelligent Information Systems

Abstract

The authors present a new term significance measure that integrates term frequency retrieval characteristics, term frequency, document collection characteristics, and both the term depth and width distribution characteristics. A new concept, the term depth distribution, is introduced and its impact on term significance is analyzed. The authors examine the features of the new measure from two angles: the impact of its variables (parameters) on the measure, and iso-significance contour analyses. An experimental study compared the new approach with two other popular approaches in terms of both efficiency and effectiveness. The results show that the new approach achieves satisfactory performance. Issues for further research on this topic are suggested.



Author information

Corresponding author

Correspondence to Jin Zhang.

About this article

Cite this article

Zhang, J., Nguyen, T.N. A New Term Significance Weighting Approach. J Intell Inf Syst 24, 61–85 (2005). https://doi.org/10.1007/s10844-005-0267-y

  • DOI: https://doi.org/10.1007/s10844-005-0267-y
