ReadFast: Structural Information Retrieval from Biomedical Big Text by Natural Language Processing

Information Reuse and Integration in Academia and Industry

Abstract

While the major search engines are solving the problem of finding needed information on the Web, access to information in Big text, large-scale text datasets, and documents (biomedical literature, e-books, conference proceedings, etc.) is still very rudimentary (Lin and Cohen 2010). Keyword search is often the only way to find the needle in the haystack. The Semantic Web research community has produced an abundance of relevant results that offer more robust access interfaces than keyword search. Here we describe a new information retrieval engine that offers an advanced user experience by combining keyword search with navigation over an automatically inferred hierarchical document index. The internal representation of the browsing index as a collection of UFOs (Gubanov et al. 2009, 2011) yields more relevant search results and improves the user experience.

References

  1. Adelberg B (1998) NoDoSE – a tool for semi-automatically extracting structured and semistructured data from text documents. In: SIGMOD record, Seattle

  2. Agichtein E, Gravano L (2000) Snowball: extracting relations from large plain-text collections. In: ACM DL, San Antonio

  3. Agichtein E, Ipeirotis P, Gravano L (2003) Modeling query-based access to text databases. In: WebDB, San Diego

  4. Agichtein E, Brill E, Dumais S (2006) Improving web search ranking by incorporating user behavior information. In: SIGIR, Seattle

  5. Agrawal S, Chaudhuri S, Das G (2002) Dbxplorer: a system for keyword-based search over relational databases. In: ICDE, San Jose

  6. Anyanwu K, Maduko A, Sheth A (2007) Sparq2l: towards support for subgraph extraction queries in rdf databases. In: WWW, Banff

  7. Arocena GO, Mendelzon AO (1998) Weboql: restructuring documents, databases, and webs. In: ICDE, Orlando

  8. Banko M, Brill E, Dumais S, Lin J (2002) Askmsr: question answering using the worldwide web. In: EMNLP, Philadelphia

  9. Brin S (1998) Extracting patterns and relations from the world wide web. In: EDBT, Valencia

  10. Cai Y, Dong XL, Halevy A, Liu JM, Madhavan J (2005) Personal information management with semex. In: SIGMOD, Baltimore

  11. Califf ME, Mooney RJ (1998) Relational learning of pattern-match rules for information extraction. In: AAAI, Madison

  12. Chakrabarti S (2007) Dynamic personalized pagerank in entity-relation graphs. In: WWW, Banff

  13. Cheng T, Yan X, Chang KCC (2007) Entityrank: searching entities directly and holistically. In: VLDB, Vienna

  14. Crescenzi V, Mecca G (1998) Grammars have exceptions. J Inf Syst (Special issue on Semistructured Data) 23(9):539–565

  15. Crescenzi V, Mecca G, Merialdo P (2001) Roadrunner: towards automatic data extraction from large web sites. In: VLDB, Roma

  16. Crestani F (1997) Application of spreading activation techniques in information retrieval. Artif Intell Rev 11:453

  17. Diederich J, Balke WT, Thaden U (2007) Demonstrating the semantic growbag: automatically creating topic facets for faceteddblp. In: JCDL, Vancouver

  18. Dong X, Halevy A (2007) Indexing dataspaces. In: SIGMOD, Beijing

  19. Downey D, Etzioni O, Soderland S, Weld D (2004) Learning text patterns for web information extraction and assessment. In: AAAI, San Jose

  20. Embley DW, Campbell DM, Jiang YS, Liddle SW, Ng YK, Quass D, Smith RD (1999) Conceptual-model-based data extraction from multiple-record web pages. Data Knowl Eng 31:227–251

  21. Etzioni O, Cafarella M, Downey D, Kok S, Popescu A, Shaked T, Soderland S, Weld D, Yates A (2004) Web-scale information extraction in knowitall. In: WWW, Manhattan

  22. Freitag D (1998) Machine learning for information extraction in informal domains. Ph.D. thesis, Carnegie Mellon University

  23. Gubanov M, Shapiro L (2011) Using unified famous objects (ufo) to automate Alzheimer’s disease diagnosis. In: IEEE BIBM, Atlanta

  24. Gubanov MN, Popa L, Ho H, Pirahesh H, Chang P, Chen L (2009) Ibm ufo repository. In: VLDB, Lyon

  25. Gubanov M, Shapiro L, Pyayt A (2011) Learning unified famous objects (ufo) to bootstrap information integration. In: IEEE IRI, Las Vegas

  26. Hammer J, McHugh J, Garcia-Molina H (1997) Semistructured data: the TSIMMIS experience. In: Proceedings of the East-European workshop on advances in databases and information systems, St. Petersburg

  27. He H, Wang H, Yang J, Yu PS (2007) Blinks: ranked keyword searches on graphs. In: SIGMOD, Beijing

  28. Hearst MA (1992) Automatic acquisition of hyponyms from large text corpora. Technical report S2K-92-09

  29. Hristidis V, Papakonstantinou Y (2002) Discover: keyword search in relational databases. In: VLDB, Hong Kong

  30. Hsu CN, Dung MT (1998) Generating finite-state transducers for semi-structured data extraction from the web. J Inf Syst (Special issue on Semistructured Data) 23(9):521–538

  31. http://www.infocious.com

  32. Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: SIGIR, Athens

  33. Jones KS (1972) A statistical interpretation of term specificity and its application in retrieval. J Doc 60:493–502

  34. Klein D, Manning C (2007) Fast exact inference with a factored model for natural language parsing. In: NIPS, Vancouver

  35. Kushmerick N (2000) Wrapper induction: efficiency and expressiveness. Artif Intell 118:15–68

  36. Laender A, Ribeiro-Neto B, Silva A, Teixeira J (2002) A brief survey of web data extraction tools. In: SIGMOD record, Madison

  37. Laender AHF, Ribeiro-Neto B, da Silva AS (2002) DEByE – data extraction by example. Data Knowl Eng 40(2):121–154

  38. Lin F, Cohen WW (2010) A very fast method for clustering big text datasets. In: ECAI, Lisbon

  39. Liu L, Pu C, Han W (2000) XWRAP: an XML-enabled wrapper construction system for web information sources. In: ICDE, San Diego

  40. Madhavan J, Cohen S, Dong X, Halevy A, Jeffery S, Ko D, Yu C (2007) Navigating the seas of structured web data. In: CIDR, Asilomar

  41. Nie Z, Ma Y, Shi S, Wen JR, Ma WY (2007) Web object retrieval. In: WWW, Banff

  42. Ribeiro-Neto BA, Laender AHF, da Silva AS (1999) Extracting semi-structured data through examples. In: CIKM, Kansas City

  43. Sahuguet A, Azavant F (2001) Building intelligent web applications using lightweight wrappers. Data Knowl Eng 36:283–316

  44. Salton G, Wong A, Yang C (1975) A vector space model for automatic indexing. Commun ACM 18:613–620

  45. Sayyadian M, LeKhac H, Doan A, Gravano L (2007) Efficient keyword search across heterogeneous relational databases. In: ICDE, Istanbul

  46. Sekine S (2006) On-demand information extraction. In: COLING/ACL, Sydney

  47. Soderland S (1999) Learning information extraction rules for semi-structured and free text. Mach Learn 34:233

  48. Udrea O, Getoor L, Miller RJ (2007) Leveraging data and structure in ontology integration. In: SIGMOD, Beijing

  49. Vanderwende L, Kacmarcik G, Suzuki H, Menezes A (2005) Mindnet: an automatically-created lexical resource. In: HLT/EMNLP, Vancouver

Corresponding author

Correspondence to Michael Gubanov.

Copyright information

© 2013 Springer-Verlag Wien

About this chapter

Cite this chapter

Gubanov, M., Shapiro, L., Pyayt, A. (2013). ReadFast: Structural Information Retrieval from Biomedical Big Text by Natural Language Processing. In: Özyer, T., Kianmehr, K., Tan, M., Zeng, J. (eds) Information Reuse and Integration in Academia and Industry. Springer, Vienna. https://doi.org/10.1007/978-3-7091-1538-1_9

  • DOI: https://doi.org/10.1007/978-3-7091-1538-1_9

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-7091-1537-4

  • Online ISBN: 978-3-7091-1538-1

  • eBook Packages: Computer Science (R0)
