Parallel and Distributed Spatial Outlier Mining in Grid: Algorithm, Design and Application

Journal of Grid Computing

Abstract

Interest in parallel and distributed data mining in grid environments has grown over the past decade. As an important branch of spatial data mining, spatial outlier mining can reveal interesting and unexpected spatial patterns in many applications. In this paper, a new parallel and distributed spatial outlier mining algorithm (PD-SOM) is proposed to simultaneously detect global and local outliers in a grid environment. PD-SOM is a Delaunay triangulation (D-TIN) based approach that is encapsulated and deployed on a distributed platform to provide a parallel and distributed spatial outlier mining service. A distributed system framework for PD-SOM is then designed on top of a geographical knowledge service grid (GeoKSGrid) developed by our research group. A two-step strategy for spatial outlier detection is put forward to support the encapsulation and distributed deployment of the geographical knowledge service, and two key techniques of the service are discussed: parallel and distributed computation of the Delaunay triangulation, and the implementation of the PD-SOM algorithm. Finally, the efficiency of the spatial outlier mining service is analyzed theoretically; its practicality is confirmed by a demonstrative application to the anomaly analysis of soil geochemical investigation samples from the eastern coastal zone of Fujian, China; and the effectiveness and superiority of PD-SOM in a balanced, scalable grid environment are verified through comparison with the popular spatial outlier mining algorithm SLOM when a large number of computing cores are involved.
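To make the distinction between global and local outliers on a Delaunay triangulation concrete, the following is a minimal single-machine sketch. It is not the PD-SOM algorithm, whose details are not given in this abstract. The scoring rules are assumptions chosen for illustration: a local score equal to the absolute difference between a point's attribute and the mean over its Delaunay neighbors, and a global test based on a z-score with a threshold of 2. The function names, the brute-force triangulation, and the toy data are likewise hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if a, b, c are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_neighbors(points):
    """Brute-force Delaunay neighbors: keep every triangle whose circumcircle
    has no other point strictly inside it. Only suitable for tiny inputs.
    Co-circular ties are kept, so both diagonals of a square become edges."""
    nbrs = {i: set() for i in range(len(points))}
    for i, j, k in combinations(range(len(points)), 3):
        cc = circumcircle(points[i], points[j], points[k])
        if cc is None:
            continue
        ux, uy, r2 = cc
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
               for m, (px, py) in enumerate(points) if m not in (i, j, k)):
            for a, b in ((i, j), (j, k), (i, k)):
                nbrs[a].add(b)
                nbrs[b].add(a)
    return nbrs


def local_scores(values, nbrs):
    """Assumed local score: |value - mean of Delaunay-neighbor values|."""
    return [abs(values[i] - mean(values[j] for j in nbrs[i])) if nbrs[i] else 0.0
            for i in range(len(values))]


def global_outliers(values, z_threshold=2.0):
    """Assumed global test: |z-score| over the whole data set exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]


# Toy data: a 3x3 grid of sample sites with one anomalous attribute at the center.
points = [(x, y) for y in range(3) for x in range(3)]
values = [10.0] * 9
values[4] = 50.0

nbrs = delaunay_neighbors(points)
local = local_scores(values, nbrs)
most_local = max(range(len(points)), key=local.__getitem__)
global_idx = global_outliers(values)
print(most_local, global_idx)
```

In this toy data set the center site is flagged by both the local score and the global test. A practical implementation would replace the brute-force triangulation with an efficient one and would partition the points across workers, which is the part the paper addresses.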



Author information

Corresponding author

Correspondence to Jiaxiang Lin.

Additional information

The first two authors contributed equally to this work.

About this article

Cite this article

Chen, C., Lin, J., Wu, X. et al. Parallel and Distributed Spatial Outlier Mining in Grid: Algorithm, Design and Application. J Grid Computing 13, 139–157 (2015). https://doi.org/10.1007/s10723-015-9326-y

  • DOI: https://doi.org/10.1007/s10723-015-9326-y
