Implementation of scalable fuzzy relational operations in MapReduce
One of the main restrictions of relational database models is their lack of support for flexible, imprecise, and vague information in data representation and querying. Imprecision is pervasive in human language; hence, modeling it is crucial for any system that stores and processes linguistic data. Fuzzy set theory provides an effective way to model the imprecision inherent in the meaning of words and propositions drawn from natural language (Zadeh 1965, Inf Control 8(3):338–353, doi: 10.1016/S0019-9958(65)90241-X; Vasant 2013, IGI Global, https://books.google.com/books?id=nt-WBQAAQBAJ). Several works in the last 20 years have used fuzzy set theory to extend relational database models to permit the representation and retrieval of imprecise data. However, to our knowledge, such approaches have not been designed to scale up to very large datasets. In this paper, we use the MapReduce framework to implement flexible fuzzy queries on a large-scale dataset. We develop MapReduce algorithms that enhance the standard relational operations with fuzzy conditional predicates expressed in natural language.
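As a rough illustration of what a fuzzy conditional predicate in a selection could look like, the sketch below evaluates a linguistic term with a trapezoidal membership function inside a map-style function. It is not the paper's algorithm: the names `trapezoid` and `fuzzy_select_map`, the term "moderate delay", its breakpoints (15, 30, 60, 90), the threshold, and the sample records are all hypothetical, and the map phase is simulated with an ordinary Python loop rather than a real MapReduce runtime.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership degree of x in a trapezoidal fuzzy set.

    The set has support (a, d) and core [b, c], with a <= b <= c <= d.
    """
    if b <= x <= c:          # inside the core: full membership
        return 1.0
    if x <= a or x >= d:     # outside the support: no membership
        return 0.0
    if x < b:                # rising edge between a and b
        return (x - a) / (b - a)
    return (d - x) / (d - c)  # falling edge between c and d


def fuzzy_select_map(
    record: Dict[str, float],
    attribute: str,
    membership: Callable[[float], float],
    threshold: float,
) -> Iterator[Tuple[int, float]]:
    """Map step (hypothetical): emit (record id, degree) if the record
    satisfies the fuzzy predicate to at least the given threshold."""
    degree = membership(record[attribute])
    if degree > 0.0 and degree >= threshold:
        yield int(record["id"]), degree


# Hypothetical linguistic term "moderate delay" over a delay in minutes.
def moderate_delay(x: float) -> float:
    return trapezoid(x, 15, 30, 60, 90)


# Made-up sample records, used only to exercise the sketch.
records: List[Dict[str, float]] = [
    {"id": 1, "delay": 10.0},
    {"id": 2, "delay": 20.0},
    {"id": 3, "delay": 45.0},
    {"id": 4, "delay": 75.0},
    {"id": 5, "delay": 100.0},
]

# Simulated map phase: each record is processed independently, so no
# reduce step is needed for a plain selection.
selected = [
    pair
    for r in records
    for pair in fuzzy_select_map(r, "delay", moderate_delay, 0.5)
]
print(selected)
```

Because each record is evaluated on its own, a selection of this kind parallelizes trivially across mappers. Joins and aggregations with fuzzy predicates would need a reduce phase and are not covered by this sketch.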
Keywords: Relational operations, Fuzzy set theory, MapReduce, Fuzzy queries
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
This work does not contain any studies with human participants or animals performed by any of the authors.
- Afrati FN, Sarma AD, Menestrina D, Parameswaran A, Ullman JD (2012) Fuzzy joins using mapreduce. In: 2012 IEEE 28th international conference on data engineering (ICDE). IEEE, pp 498–509
- Atta F, Viglas SD, Niazi S (2011) Sand join: a skew handling join algorithm for google’s mapreduce framework. In: 2011 IEEE 14th international multitopic conference (INMIC), pp 170–175. doi: 10.1109/INMIC.2011.6151466
- Elmeleegy K, Olston C, Reed B (2014) Spongefiles: mitigating data skew in mapreduce using distributed memory. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data. SIGMOD ’14, pp 551–562. ACM, New York. doi: 10.1145/2588555.2595634
- Galindo J (2005) Fuzzy databases: modeling, design and implementation. IGI Global
- Gufler B, Augsten N, Reiser A, Kemper A (2012) Load balancing in mapreduce based on scalable cardinality estimates. In: 2012 IEEE 28th international conference on data engineering (ICDE), pp 522–533. doi: 10.1109/ICDE.2012.58
- Hassan MAH, Bamha M, Loulergue F (2014) Handling data-skew effects in join operations using mapreduce. Proc Comput Sci 29:145–158. doi: 10.1016/j.procs.2014.05.014. 2014 International conference on computational science
- Klir GJ, Clair UHS, Yuan B (1997) Fuzzy set theory: foundations and applications. Prentice Hall. https://books.google.com/books?id=DNxQAAAAMAAJ
- Kwon Y, Balazinska M, Howe B, Rolia J (2012) Skewtune: mitigating skew in mapreduce applications. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data. SIGMOD ’12, pp 25–36. ACM, New York. doi: 10.1145/2213836.2213840
- Ma ZM, Yan L (2010) A literature overview of fuzzy conceptual data modeling. J Inf Sci Eng 26(2):427–441
- Ma ZM, Zhang WJ, Ma WY (2000) Semantic measure of fuzzy data in extended possibility-based fuzzy relational databases. Int J Intell Syst 15(8):705–716. doi: 10.1002/1098-111X(200008)15:8<705::AID-INT2>3.0.CO;2-4
- Petry FE (ed) (1997) Fuzzy databases: principles and applications. Kluwer Academic Publishers, Norwell
- Ramakrishnan SR, Swart G, Urmanov A (2012) Balancing reducer skew in mapreduce workloads using progressive sampling. In: Proceedings of the 3rd ACM symposium on cloud computing. SoCC ’12, pp 16:1–16:14. ACM, New York. doi: 10.1145/2391229.2391245
- US Department of Transportation (2016) Online; accessed 23 Feb 2016. https://www.transportation.gov/
- Vasant P (2013) Handbook of research on novel soft computing intelligent algorithms: theory and practical applications. Advances in computational intelligence and robotics (ACIR) book series. IGI Global. https://books.google.com/books?id=nt-WBQAAQBAJ
- Vernica R, Carey MJ, Li C (2010) Efficient parallel set-similarity joins using mapreduce. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data, pp 495–506. ACM
- Wang Y, Metwally A, Parthasarathy S (2013) Scalable all-pairs similarity search in metric spaces. In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’13, pp 829–837. ACM, New York. doi: 10.1145/2487575.2487625
- Zhang C, Li J, Wu L, Lin M, Liu W (2012) Sej: an even approach to multiway theta-joins using mapreduce. In: 2012 Second international conference on cloud and green computing (CGC), pp 73–80. doi: 10.1109/CGC.2012.9