Leveraging High-Performance Computing Infrastructures to Web Data Analytic Applications by Means of Message-Passing Interface

  • Chapter
Modeling and Processing for Next-Generation Big-Data Technologies

Part of the book series: Modeling and Optimization in Science and Technologies ((MOST,volume 4))

Abstract

Modern computing technologies are becoming increasingly data-centric, addressing a variety of challenges in storing, accessing, processing, and streaming massive amounts of structured and unstructured data effectively. An important analytical task in a number of scientific and technological domains is to retrieve information from all these data, with the aim of gaining deeper insight into the content they represent and obtaining useful, often not explicitly stated, knowledge and facts related to a particular domain of interest. The major issue is the size, structural complexity, and update frequency of the analyzed data (i.e., the ‘big data’ aspect), which make traditional analysis techniques, tools, and infrastructures ineffective. We introduce an innovative approach to parallelising data-centric applications based on the Message-Passing Interface. In contrast to other known parallelisation technologies, our approach enables a very high utilization rate and thus low costs of using production high-performance computing and Cloud computing infrastructures. The advantages of the technique are demonstrated on a challenging Semantic Web application that performs web-scale reasoning.

Author information

Corresponding author

Correspondence to Alexey Cheptsov.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Cheptsov, A., Koller, B. (2015). Leveraging High-Performance Computing Infrastructures to Web Data Analytic Applications by Means of Message-Passing Interface. In: Xhafa, F., Barolli, L., Barolli, A., Papajorgji, P. (eds) Modeling and Processing for Next-Generation Big-Data Technologies. Modeling and Optimization in Science and Technologies, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-319-09177-8_7

  • DOI: https://doi.org/10.1007/978-3-319-09177-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09176-1

  • Online ISBN: 978-3-319-09177-8

  • eBook Packages: Engineering, Engineering (R0)
