Abstract
Interest in parallel and distributed data mining in grid environments has grown over the past decade. As an important branch of spatial data mining, spatial outlier mining can reveal interesting and unexpected spatial patterns in many applications. This paper proposes a new parallel and distributed spatial outlier mining algorithm (PD-SOM) that detects global and local outliers simultaneously in a grid environment. PD-SOM is based on Delaunay triangulation (D-TIN) and is encapsulated and deployed on a distributed platform to provide a parallel and distributed spatial outlier mining service. A distributed system framework for PD-SOM is designed on top of the geographical knowledge service grid (GeoKSGrid) developed by our research group. A two-step strategy for spatial outlier detection is put forward to support the encapsulation and distributed deployment of the geographical knowledge service. Two key techniques of the service are discussed: the parallel and distributed computation of the Delaunay triangulation, and the implementation of the PD-SOM algorithm. Finally, the efficiency of the spatial outlier mining service is analyzed theoretically; its practicality is confirmed by a demonstrative application to the anomaly analysis of soil geochemical investigation samples from the eastern coastal zone of Fujian, China; and its effectiveness and advantage over the popular spatial outlier mining algorithm SLOM in a balanced, scalable grid environment are verified by comparison when a large number of computing cores are involved.
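To make the idea of triangulation-based outlier scoring concrete, the following is a minimal sketch and not the PD-SOM algorithm itself, whose details the abstract does not give. It assumes the Delaunay triangles are already available as triples of point indices and uses a generic neighborhood-difference score. The function names `neighbors_from_triangles` and `outlier_scores` and the z-score formulation are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def neighbors_from_triangles(triangles):
    """Build point adjacency from triangles given as vertex-index triples."""
    adj = defaultdict(set)
    for a, b, c in triangles:
        adj[a].update((b, c))
        adj[b].update((a, c))
        adj[c].update((a, b))
    return adj


def zscores(xs):
    """Standardize a list; return zeros if the spread is zero."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma if sigma else 0.0 for x in xs]


def outlier_scores(values, triangles):
    """Return (global_z, local_z) for each point.

    global_z: z-score of the attribute over all points (assumed global test).
    local_z:  z-score of the difference between a point's attribute and the
              mean attribute of its triangulation neighbors (assumed local test).
    """
    adj = neighbors_from_triangles(triangles)
    diffs = [
        values[i] - mean(values[j] for j in adj[i]) if adj[i] else 0.0
        for i in range(len(values))
    ]
    return zscores(values), zscores(diffs)


# Hypothetical example: four corner points and a center point (index 4)
# whose attribute value is far from those of its neighbors.
values = [10.0, 11.0, 9.0, 10.0, 50.0]
triangles = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
global_z, local_z = outlier_scores(values, triangles)
```

In this toy input the center point has the largest absolute score under both measures. In a real setting a point can score high locally while remaining unremarkable globally, which is why the two measures are computed separately here.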
Additional information
The first two authors contributed equally to this work.
About this article
Cite this article
Chen, C., Lin, J., Wu, X. et al. Parallel and Distributed Spatial Outlier Mining in Grid: Algorithm, Design and Application. J Grid Computing 13, 139–157 (2015). https://doi.org/10.1007/s10723-015-9326-y
DOI: https://doi.org/10.1007/s10723-015-9326-y