Abstract
Modern computing technologies are becoming increasingly data-centric, addressing a variety of challenges in storing, accessing, processing, and streaming massive amounts of structured and unstructured data effectively. An important analytical task in many scientific and technological domains is to retrieve information from these data in order to gain deeper insight into their content and to obtain useful, often not explicitly stated, knowledge and facts related to a particular domain of interest. The major issue is the size, structural complexity, and update frequency of the analyzed data (i.e., the ‘big data’ aspect), which make traditional analysis techniques, tools, and infrastructures ineffective. We introduce an innovative approach, based on the Message-Passing Interface (MPI), to parallelising data-centric applications. In contrast to other known parallelisation technologies, our approach enables a very high utilization rate and thus low costs of using production high-performance computing and Cloud computing infrastructures. The advantages of the technique are demonstrated on a challenging Semantic Web application that performs web-scale reasoning.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Cheptsov, A., Koller, B. (2015). Leveraging High-Performance Computing Infrastructures to Web Data Analytic Applications by Means of Message-Passing Interface. In: Xhafa, F., Barolli, L., Barolli, A., Papajorgji, P. (eds) Modeling and Processing for Next-Generation Big-Data Technologies. Modeling and Optimization in Science and Technologies, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-319-09177-8_7
DOI: https://doi.org/10.1007/978-3-319-09177-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09176-1
Online ISBN: 978-3-319-09177-8
eBook Packages: Engineering, Engineering (R0)