SONCA: Scalable Semantic Processing of Rapidly Growing Document Stores

Grzegorowski, Marek; Pardel, Przemysław Wiktor; Stawicki, Sebastian; Stencel, Krzysztof

doi:10.1007/978-3-642-32518-2_9


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 185)


Abstract

Scientific data constitutes a great asset. However, its volume is far greater than any human can comprehend. Therefore, automatic analytical, search and indexing solutions are called for. In this paper we present the architecture and the data model of such a system. SONCA (Search based on ONtologies and Compound Analytics) is a platform for implementing and exploiting intelligent algorithms that identify relations between various types of objects (publications, inventions, scientists and institutions). The results of these algorithms can be used to build semantic search engines, and can also be fed into further analytical algorithms in order to find even more associations. We also present an experimental evaluation of the performance of SONCA. Its results are promising and we argue that SONCA’s architecture is robust.



Author information

Authors and Affiliations

Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, ul. Banacha 2, 02-097, Warsaw, Poland
Marek Grzegorowski, Przemysław Wiktor Pardel, Sebastian Stawicki & Krzysztof Stencel
Institute of Computer Science, University of Rzeszów, ul. Dekerta 2, 35-030, Rzeszów, Poland
Przemysław Wiktor Pardel


Corresponding author

Correspondence to Marek Grzegorowski.

Editor information

Editors and Affiliations

Department of Computer Science, Eindhoven University of Technology, Eindhoven, 5600, Netherlands
Mykola Pechenizkiy
Institute of Computing Science, Poznan University of Technology, ul. Piotrowo 2, Poznan, 60-965, Poland
Marek Wojciechowski


Cite this paper

Grzegorowski, M., Pardel, P.W., Stawicki, S., Stencel, K. (2013). SONCA: Scalable Semantic Processing of Rapidly Growing Document Stores. In: Pechenizkiy, M., Wojciechowski, M. (eds) New Trends in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32518-2_9


DOI: https://doi.org/10.1007/978-3-642-32518-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32517-5
Online ISBN: 978-3-642-32518-2
eBook Packages: Engineering (R0)
