Challenges for Dataset Search

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 8421)

Abstract

Ranked search of datasets has emerged as a need as shared scientific archives grow in size and variety. Our own investigations have shown that IR-style, feature-based relevance scoring can be an effective tool for data discovery in scientific archives. However, maintaining interactive response times as archives scale will be a challenge. We report here on our exploration of performance techniques for Data Near Here, a dataset search service. We present a sample of results evaluating filter-restart techniques in our system, including two variations, adaptive relaxation and contraction. We then outline further directions for research in this domain.
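
The abstract does not describe how filter-restart is implemented in Data Near Here, so the following is only a minimal sketch of the general idea as it is commonly understood for top-k queries: apply a cheap filter to shrink the candidate set, rank what survives, and restart with a looser filter if fewer than k candidates pass. The `Dataset` type, the `distance` field, the scoring function, and the fixed `relax_factor` are all hypothetical and chosen for illustration; in particular, the fixed multiplier is a stand-in and is not the adaptive relaxation or contraction variant evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    distance: float  # hypothetical dissimilarity to the query; lower is better


def score(ds: Dataset) -> float:
    """Hypothetical relevance score in (0, 1]; decreases as distance grows."""
    return 1.0 / (1.0 + ds.distance)


def filter_restart_top_k(catalog, k, cutoff, relax_factor=2.0, max_restarts=10):
    """Return (top-k datasets, number of restarts).

    Only datasets with distance <= cutoff are scored. If fewer than k pass,
    the cutoff is relaxed by a fixed factor (an assumption of this sketch)
    and the query is restarted.
    """
    restarts = 0
    while True:
        candidates = [d for d in catalog if d.distance <= cutoff]
        enough = len(candidates) >= k
        exhausted = len(candidates) == len(catalog) or restarts >= max_restarts
        if enough or exhausted:
            ranked = sorted(candidates, key=score, reverse=True)
            return ranked[:k], restarts
        cutoff *= relax_factor  # relax the filter and try again
        restarts += 1


catalog = [
    Dataset("a", 0.5),
    Dataset("b", 1.5),
    Dataset("c", 3.0),
    Dataset("d", 7.0),
    Dataset("e", 20.0),
]
top, restarts = filter_restart_top_k(catalog, k=3, cutoff=1.0)
print([d.name for d in top], restarts)
```

Because the score in this sketch decreases monotonically with the filtered quantity, any k candidates that pass the filter are guaranteed to be the true top k. The trade-off is in the initial cutoff: too tight a filter causes restarts, while too loose a filter scores more datasets than necessary.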

Keywords

  • data discovery
  • querying scientific data
  • ranked search

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Maier, D., Megler, V.M., Tufte, K. (2014). Challenges for Dataset Search. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8421. Springer, Cham. https://doi.org/10.1007/978-3-319-05810-8_1

  • DOI: https://doi.org/10.1007/978-3-319-05810-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05809-2

  • Online ISBN: 978-3-319-05810-8

  • eBook Packages: Computer Science, Computer Science (R0)