Further Investigations for Documents Information Retrieval Based on DWT

  • Conference paper
Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016 (AISI 2016)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 533)

Abstract

In most classical information retrieval models, documents are represented as a bag of words: the models take into account term frequencies (tf) and inverse document frequencies (idf) but ignore term proximity. Recently, proximity among query terms has been observed to improve document retrieval performance. Several retrieval applications have implemented tools that determine term proximity at the query formulation level and rank documents based on the relative positions of the query terms within the documents. These approaches must store all proximity data in the index, which produces a large index and slows the search. More recently, many models have used a term signal representation for each query term, in which the query is transformed from the time domain to the frequency domain using a transformation technique such as the wavelet transform. The Discrete Wavelet Transform (DWT) uses a multiresolution technique in which different frequencies are analyzed at different resolutions. The advantage of the DWT is that it considers the spatial information of the query terms within the document rather than only the term counts. In this paper, in order to improve the ranking score and the run-time efficiency of query resolution while keeping the index at a reasonable size, three types of spectral analysis based on semantic segmentation are carried out, namely sentence-based segmentation, paragraph-based segmentation, and fixed-length segmentation; in addition, different term weightings are applied according to term position.
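
The idea of a term signal can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `term_signal` and `haar_dwt`, the choice of eight bins, the whitespace tokenization, and the use of the orthonormal Haar wavelet are all assumptions made for illustration. The sketch uses fixed-length segmentation only; sentence-based or paragraph-based segmentation would replace the position-to-bin mapping with a sentence or paragraph index.

```python
import math

def term_signal(tokens, term, num_bins=8):
    """Count occurrences of `term` in each of `num_bins` equal-width segments.

    Hypothetical helper: fixed-length segmentation of a tokenized document.
    """
    signal = [0.0] * num_bins
    n = len(tokens)
    for pos, tok in enumerate(tokens):
        if tok == term:
            # Map the token position to its segment (bin) index.
            signal[pos * num_bins // n] += 1.0
    return signal

def haar_dwt(signal):
    """Full orthonormal Haar decomposition of a power-of-two-length signal.

    Returns [overall approximation, coarsest detail, ..., finest details].
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details = []
    s = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Coarser details are placed in front of the finer ones found earlier.
        details = [(a - b) / s for a, b in pairs] + details
        approx = [(a + b) / s for a, b in pairs]
    return approx + details

# Two toy documents with the same term count but different term positions.
doc_a = "wavelet wavelet a b c d e f".split()
doc_b = "wavelet a b c d e f wavelet".split()

sig_a = term_signal(doc_a, "wavelet")
sig_b = term_signal(doc_b, "wavelet")
coeffs_a = haar_dwt(sig_a)
coeffs_b = haar_dwt(sig_b)
```

Both toy documents have the same term frequency, so a bag-of-words model cannot tell them apart, whereas their wavelet coefficients differ because the occurrences fall in different segments. The first coefficient depends only on the total count; the remaining coefficients encode position at successively finer resolutions.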

Acknowledgements

This paper was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, under grant No. (33 611-D1433). The authors, therefore, acknowledge with thanks DSR technical and financial support.

Author information

Corresponding author

Correspondence to Mohamed Yehia Dahab.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Dahab, M.Y., Kamel, M., Alnofaie, S. (2017). Further Investigations for Documents Information Retrieval Based on DWT. In: Hassanien, A., Shaalan, K., Gaber, T., Azar, A., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016. AISI 2016. Advances in Intelligent Systems and Computing, vol 533. Springer, Cham. https://doi.org/10.1007/978-3-319-48308-5_1

  • DOI: https://doi.org/10.1007/978-3-319-48308-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48307-8

  • Online ISBN: 978-3-319-48308-5

  • eBook Packages: Engineering, Engineering (R0)
