SONCA: Scalable Semantic Processing of Rapidly Growing Document Stores

  • Marek Grzegorowski
  • Przemysław Wiktor Pardel
  • Sebastian Stawicki
  • Krzysztof Stencel
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 185)


Scientific data is a great asset, but its volume far exceeds what any human can comprehend, so automatic analytical, search and indexing solutions are called for. In this paper we present the architecture and the data model of such a system. SONCA (Search based on ONtologies and Compound Analytics) is a platform for implementing and exploiting intelligent algorithms that identify relations between various types of objects (publications, inventions, scientists and institutions). The results of these algorithms can be used to build semantic search engines, and they can also be fed into further analytical algorithms in order to find even more associations. We also present an experimental evaluation of SONCA’s performance. The results are promising, and we argue that SONCA’s architecture is robust.


Intelligent Algorithm Semantic Search Explicit Semantic Analysis Indexing Solution Data Model Performance 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Marek Grzegorowski (1)
  • Przemysław Wiktor Pardel (1, 2)
  • Sebastian Stawicki (1)
  • Krzysztof Stencel (1)
  1. Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
  2. Institute of Computer Science, University of Rzeszów, Rzeszów, Poland
