Abstract
Fuzzy rough sets have been successfully applied in classification tasks, in particular in combination with OWA operators. There has been a lot of research into adapting algorithms for use with Big Data through parallelisation, but no concrete strategy exists to design a Big Data fuzzy rough sets based classifier. Existing Big Data approaches use fuzzy rough sets for feature and prototype selection, and have often not involved very large datasets. We fill this gap by presenting the first Big Data extension of an algorithm that uses fuzzy rough sets directly to classify test instances, a distributed implementation of FRNN-OWA in Apache Spark. Through a series of systematic tests involving generated datasets, we demonstrate that it can achieve a speedup effectively equal to the number of computing cores used, meaning that it can scale to arbitrarily large datasets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Asfoor, H., et al.: Computing fuzzy rough approximations in large scale information systems. In: 2014 IEEE International Conference on Big Data (Big Data), pp. 9–16. IEEE (2014)
Asfoor, H.M.: Fuzzy rough set approximations in large scale information systems. Master’s thesis, University of Washington (2015)
Baldi, P., Cranmer, K., Faucett, T., Sadowski, P., Whiteson, D.: Parameterized neural networks for high-energy physics. Eur. Phys. J. C 76(5) (2016). Article number 235
Baldi, P., Sadowski, P., Whiteson, D.: Searching for exotic particles in high-energy physics with deep learning. Nat. Commun. 5, 4308 (2014)
Cattral, R., Oppacher, F., Deugo, D.: Evolutionary data mining with automatic rule generalization. Recent Adv. Comput. Comput. Commun. 1(1), 296–300 (2002)
Dua, D., Karra Taniskidou, E.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
Dubois, D., Prade, H.: Rough fuzzy sets and fuzzy rough sets. Int. J. Gen. Syst. 17(2–3), 191–209 (1990)
Hu, Q., Zhang, L., Zhou, Y., Pedrycz, W.: Large-scale multimodality attribute reduction with multi-kernel fuzzy rough sets. IEEE Trans. Fuzzy Syst. 26(1), 226–238 (2018)
Jensen, R., Cornelis, C.: A new approach to fuzzy-rough nearest neighbour classification. In: Chan, C.-C., Grzymala-Busse, J.W., Ziarko, W.P. (eds.) RSCTC 2008. LNCS (LNAI), vol. 5306, pp. 310–319. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88425-5_32
Jensen, R., Mac Parthaláin, N.: Towards scalable fuzzy-rough feature selection. Inf. Sci. 323, 1–15 (2015)
Karau, H., Konwinski, A., Wendell, P., Zaharia, M.: Learning Spark: Lightning-Fast Big Data Analysis. O’Reilly Media Inc, Newton (2015)
Maillo, J., Luengo, J., GarcÃa, S., Herrera, F., Triguero, I.: Exact fuzzy k-nearest neighbor classification for big datasets. In: 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–6. IEEE (2017)
Maillo, J., RamÃrez, S., Triguero, I., Herrera, F.: kNN-IS: an iterative spark-based design of the k-nearest neighbors classifier for big data. Knowl.-Based Syst. 117, 3–15 (2017)
Qian, Y., Wang, Q., Cheng, H., Liang, J., Dang, C.: Fuzzy-rough feature selection accelerator. Fuzzy Sets Syst. 258, 61–78 (2015)
Ramentol, E., et al.: IFROWANN: imbalanced fuzzy-rough ordered weighted average nearest neighbor classification. IEEE Trans. Fuzzy Syst. 23(5), 1622–1637 (2015)
Verbiest, N., Cornelis, C., Herrera, F.: OWA-FRPS: a prototype selection method based on ordered weighted average fuzzy rough set theory. In: Ciucci, D., Inuiguchi, M., Yao, Y., Ślęzak, D., Wang, G. (eds.) RSFDGrC 2013. LNCS (LNAI), vol. 8170, pp. 180–190. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41218-9_19
Verbiest, N., Cornelis, C., Jensen, R.: Fuzzy rough positive region based nearest neighbour classification. In: 2012 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–7. IEEE (2012)
Vluymans, S., et al.: Distributed fuzzy rough prototype selection for big data regression. In: 2015 Annual Conference of the North American Fuzzy Information Processing Society (NAFIPS) held jointly with 2015 5th World Conference on Soft Computing (WConSC), pp. 1–6. IEEE (2015)
Vluymans, S., Fernández, A., Saeys, Y., Cornelis, C., Herrera, F.: Dynamic affinity-based classification of multi-class imbalanced data with one-versus-one decomposition: a fuzzy rough set approach. Knowl. Inf. Syst. 56(1), 55–84 (2018)
Vluymans, S., Sánchez Tarragó, D., Saeys, Y., Cornelis, C., Herrera, F.: Fuzzy rough classifiers for class imbalanced multi-instance data. Pattern Recognit. 53, 36–45 (2016)
Yager, R.R.: On ordered weighted averaging aggregation operators in multicriteria decisionmaking. IEEE Trans. Syst. Man Cybern. 18(1), 183–190 (1988)
Zeng, A., Li, T., Hu, J., Chen, H., Luo, C.: Dynamical updating fuzzy rough approximations for hybrid data under the variation of attribute values. Inf. Sci. 378, 363–388 (2017)
Zeng, A., Li, T., Liu, D., Zhang, J., Chen, H.: A fuzzy rough set approach for incremental feature selection on hybrid information systems. Fuzzy Sets Syst. 258, 39–60 (2015)
Acknowledgement
The research reported in this paper was conducted with the financial support of the Odysseus programme of the Research Foundation – Flanders (FWO). D. Peralta is a Postdoctoral Fellow of the Research Foundation – Flanders (FWO).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lenz, O.U., Peralta, D., Cornelis, C. (2019). A Scalable Approach to Fuzzy Rough Nearest Neighbour Classification with Ordered Weighted Averaging Operators. In: Mihálydeák, T., et al. Rough Sets. IJCRS 2019. Lecture Notes in Computer Science(), vol 11499. Springer, Cham. https://doi.org/10.1007/978-3-030-22815-6_16
Download citation
DOI: https://doi.org/10.1007/978-3-030-22815-6_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22814-9
Online ISBN: 978-3-030-22815-6
eBook Packages: Computer ScienceComputer Science (R0)