Abstract
The chemical literature is vast, so researchers and readers spend a great deal of time locating specific information. In addition, ambiguity among the names of chemical entities often returns results that are irrelevant to the query. Researchers and readers therefore benefit from a facility for chemistry-oriented search. Domain-specific search engines have become popular because they offer increased accuracy and additional functionality for their domain. In this paper, a domain-specific search application is developed to retrieve relevant chemical documents from a large collection. The application provides advanced features such as a specialized index and re-ranking of documents, using chemical entities, functional groups, and phrases as keywords. At present, 85,680 patents and 66,425 chemical abstracts are indexed with 36,984 entities of 23 different types and 16 functional groups.
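The idea of a specialized index combined with re-ranking can be illustrated with a minimal sketch. This is not the paper's implementation: the toy corpus, the dictionaries `ENTITIES` and `FUNCTIONAL_GROUPS`, and the overlap-count score are all assumptions made for illustration, and dictionary lookup stands in for whatever entity recognition the real system uses.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> text.
DOCS = {
    "d1": "Synthesis of aspirin from salicylic acid using acetic anhydride",
    "d2": "Market overview of pain relief products including aspirin",
    "d3": "Hydrolysis of the ester group in aspirin yields salicylic acid",
}

# Hypothetical dictionaries; a real system would likely use a trained tagger.
ENTITIES = {"aspirin", "salicylic acid", "acetic anhydride"}
FUNCTIONAL_GROUPS = {"ester", "hydroxyl", "carboxyl"}
CHEMICAL_TERMS = ENTITIES | FUNCTIONAL_GROUPS


def find_terms(text, vocabulary):
    """Return vocabulary terms found in text.

    Uses naive case-insensitive substring matching, so "ester" would also
    match "polyester"; acceptable for a sketch only.
    """
    lowered = text.lower()
    return {term for term in vocabulary if term in lowered}


def build_index(docs):
    """Build a keyword inverted index plus per-document chemical annotations."""
    keyword_index = defaultdict(set)
    annotations = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            keyword_index[token].add(doc_id)
        annotations[doc_id] = find_terms(text, CHEMICAL_TERMS)
    return keyword_index, annotations


def search(query, keyword_index, annotations):
    """Retrieve by keyword, then re-rank by chemical terms shared with the query."""
    candidates = set()
    for token in query.lower().split():
        candidates |= keyword_index.get(token, set())
    query_terms = find_terms(query, CHEMICAL_TERMS)
    scored = [(len(query_terms & annotations[d]), d) for d in candidates]
    # Highest overlap first; ties broken by document id for a stable order.
    return [d for _, d in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


if __name__ == "__main__":
    index, notes = build_index(DOCS)
    print(search("aspirin ester", index, notes))  # ['d3', 'd1', 'd2']
```

All three documents match the keyword "aspirin", but d3 is ranked first because it also carries the "ester" functional-group annotation named in the query.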
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Hema, R. (2021). Extraction and Search of Relevant Chemical Documents from the Web. In: Peng, SL., Hao, RX., Pal, S. (eds) Proceedings of First International Conference on Mathematical Modeling and Computational Science. Advances in Intelligent Systems and Computing, vol 1292. Springer, Singapore. https://doi.org/10.1007/978-981-33-4389-4_24
DOI: https://doi.org/10.1007/978-981-33-4389-4_24
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4388-7
Online ISBN: 978-981-33-4389-4
eBook Packages: Intelligent Technologies and Robotics (R0)