Improvement in XML Keyword Search and Ranking for Data Analytics

  • Conference paper
Smart Systems and IoT: Innovations in Computing

Part of the book series: Smart Innovation, Systems and Technologies ((SIST,volume 141))

Abstract

The success of web search engines among ordinary users motivates a keyword search engine for XML databases. (Early search engines required very precise queries that only an expert could write.) An XML search engine typically relies on a DOM parser to parse the XML database. A DOM parser builds a tree that resides entirely in main memory, but an XML database is generally larger than main memory, so a DOM parser is a poor fit for large databases. A SAX parser is used instead. A SAX parser reads the XML file sequentially, character by character, so the whole file never has to be held in main memory, and unlike a DOM parser it builds no tree. A SAX parser also takes less time than a DOM parser. Searching is slow when the database is hit again and again to fetch the same or recently used data. The solution is a simple cache: recently used data is stored in a hash map, because a hash map provides O(1) average-case lookup. Ranking commonly uses only the TF*IDF score to order results, but this does not give the best ranking. Ranking by cosine similarity is a better approach. (Cosine similarity measures how similar two documents are.)
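
The three ideas in the abstract (streaming SAX parsing, a hash-map cache, and cosine-similarity ranking over TF*IDF weights) can be illustrated with a minimal sketch. This is not the authors' implementation: Python's standard `xml.sax` module stands in for whichever SAX parser the paper uses, and the `<doc>` element name, the `ElementTextHandler` and `Searcher` classes, the sample data, and the smoothed IDF formula `log(N/df) + 1` are all assumptions made for illustration.

```python
import math
import xml.sax
from collections import Counter


class ElementTextHandler(xml.sax.ContentHandler):
    """Collect the tokens of each <doc> element while the parser streams events."""

    def __init__(self, tag="doc"):  # "doc" is a hypothetical element name
        super().__init__()
        self.tag = tag
        self.docs = []        # one token list per matched element
        self._buffer = None   # None while outside a matched element

    def startElement(self, name, attrs):
        if name == self.tag:
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self.tag and self._buffer is not None:
            self.docs.append("".join(self._buffer).lower().split())
            self._buffer = None


def tfidf_vectors(docs):
    """Return one sparse TF*IDF vector (dict) per document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumption: smoothed IDF, so terms found in every document keep some weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Searcher:
    """Rank documents by cosine similarity; cache results in a dict (hash map)."""

    def __init__(self, docs):
        self.vectors, self.idf = tfidf_vectors(docs)
        self.cache = {}

    def search(self, query):
        key = " ".join(query.lower().split())
        if key in self.cache:   # O(1) average-case lookup, no re-scoring
            return self.cache[key]
        q = {t: c * self.idf.get(t, 0.0) for t, c in Counter(key.split()).items()}
        scored = [(cosine(q, v), i) for i, v in enumerate(self.vectors)]
        ranked = [i for s, i in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]
        self.cache[key] = ranked
        return ranked


# Hypothetical sample data; a real database would be parsed from a file instead.
SAMPLE = (b"<root><doc>xml keyword search</doc>"
          b"<doc>xml xml parser</doc><doc>cooking recipes</doc></root>")

handler = ElementTextHandler()
xml.sax.parseString(SAMPLE, handler)   # event-driven parse, no tree is built
searcher = Searcher(handler.docs)
print(searcher.search("keyword search"))
print(searcher.search("xml"))
```

In this sketch the handler keeps only the extracted tokens, never a document tree, and a repeated query is answered from the dict without recomputing any scores. The cache here is unbounded; a bounded or least-recently-used policy would be a separate design choice that the abstract does not specify.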

Author information

Corresponding author

Correspondence to Pradeep Tomar.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Yadav, V., Tomar, P., Singh, P., Kaur, G. (2020). Improvement in XML Keyword Search and Ranking for Data Analytics. In: Somani, A.K., Shekhawat, R.S., Mundra, A., Srivastava, S., Verma, V.K. (eds) Smart Systems and IoT: Innovations in Computing. Smart Innovation, Systems and Technologies, vol 141. Springer, Singapore. https://doi.org/10.1007/978-981-13-8406-6_33
