
Knowledge and Information Systems, Volume 52, Issue 2, pp 411–443

FIU-Miner (a fast, integrated, and user-friendly system for data mining) and its applications

  • Tao Li
  • Chunqiu Zeng
  • Wubai Zhou
  • Wei Xue
  • Yue Huang
  • Zheng Liu
  • Qifeng Zhou
  • Bin Xia
  • Qing Wang
  • Wentao Wang
  • Xiaolong Zhu
Regular Paper

Abstract

The advent of the Big Data era drives data analysts from different domains to use data mining techniques for data analysis. However, performing data analysis in a specific domain is not trivial; it often requires complex task configuration, onerous integration of algorithms, and efficient execution in distributed environments. Little effort has been devoted to developing effective tools that help data analysts conduct complex data analysis tasks. In this paper, we design and implement FIU-Miner, a Fast, Integrated, and User-friendly system to ease data analysis. FIU-Miner allows users to rapidly configure a complex data analysis task without writing a single line of code. It also helps users conveniently import and integrate different analysis programs. Further, it balances resource utilization and task execution in heterogeneous environments. Case studies of real-world applications demonstrate the efficacy and effectiveness of our proposed system.

Keywords

Feature Selection; Inventory Management; Feature Selection Method; Data Mining Algorithm; Runtime Environment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

We would like to thank the following former members of Knowledge Discovery Research Group (KDRG) at FIU: Dr. Li Zheng, Dr. Lei Li, Dr. Yexi Jiang, Dr. Liang Tang, Dr. Chao Shen, and Dr. Jingxuan Li, for their contributions to the FIU-Miner project. We would also like to thank the High Performance Database Research Center at FIU for the cooperation on spatial data analysis. This project was partially supported by the National Science Foundation under Grants HRD-0833093, CNS-1126619, IIS-1213026, and CNS-1461926, the US Department of Homeland Security’s VACCINE Center under Award Number 2009-ST-061-CI0001, Nanjing University of Posts and Telecommunications under Grants NY214135 and NY215045, Scientific and Technological Support Project (Society) of Jiangsu Province No. BE2016776, Chinese National Natural Science Foundation under Grant 91646116, and an FIU Dissertation Year Fellowship.

Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  • Tao Li (1, 2)
  • Chunqiu Zeng (1)
  • Wubai Zhou (1)
  • Wei Xue (1)
  • Yue Huang (2)
  • Zheng Liu (2)
  • Qifeng Zhou (3)
  • Bin Xia (1)
  • Qing Wang (1)
  • Wentao Wang (1)
  • Xiaolong Zhu (1)

  1. School of Computing and Information Sciences, Florida International University, Miami, US
  2. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China
  3. Automation Department, Xiamen University, Xiamen, China
