Volume 97, Issue 4, pp 403–423

MapReduce based location selection algorithm for utility maximization with capacity constraints

  • Yu Sun
  • Jianzhong Qi
  • Rui Zhang
  • Yueguo Chen
  • Xiaoyong Du


Given a set of facilities and a set of clients, where each client is served by her nearest facility and each facility is constrained by a service capacity, we study how to find all locations at which establishing a new facility with a given capacity maximizes the number of served clients (in other words, maximizes the utility of the facilities). This problem is intrinsically difficult. An existing algorithm has exponential complexity, so it does not scale to large data sets. We therefore propose to solve the problem through parallel computing, in particular using MapReduce. We propose an arc-based method that divides the search space into disjoint partitions. For load balancing, we propose a dynamic strategy that assigns partitions to reducers so that the estimated load difference stays within a threshold. We conduct extensive experiments on large real and synthetic data sets. The results demonstrate the efficiency and scalability of the algorithm.
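The problem statement above can be illustrated with a small brute-force sketch. It is not the arc-based MapReduce algorithm of the paper. It assumes one plausible reading of utility (each facility serves at most its capacity among the clients nearest to it), and it assumes a finite list of candidate locations rather than a continuous search space. The names `utility`, `best_locations`, and `candidates` are hypothetical and do not come from the paper.

```python
from math import dist

def utility(facilities, clients):
    """Count served clients (assumed definition, see lead-in).

    facilities: list of ((x, y), capacity) pairs
    clients: list of (x, y) points
    """
    assigned = [0] * len(facilities)
    for c in clients:
        # Each client goes to her nearest facility (ties: lowest index).
        nearest = min(range(len(facilities)),
                      key=lambda i: dist(facilities[i][0], c))
        assigned[nearest] += 1
    # A facility serves no more clients than its capacity allows.
    return sum(min(n, cap) for n, (_, cap) in zip(assigned, facilities))

def best_locations(facilities, clients, candidates, capacity):
    """Return every candidate location that maximizes utility, plus that utility."""
    scores = {p: utility(facilities + [(p, capacity)], clients)
              for p in candidates}
    best = max(scores.values())
    return [p for p in candidates if scores[p] == best], best

facilities = [((0, 0), 1)]
clients = [(0, 1), (0, 2), (10, 0), (10, 1)]
print(best_locations(facilities, clients, [(10, 0), (0, 3)], capacity=2))
# prints ([(10, 0)], 3)
```

In this toy instance the existing facility can serve only one of its four nearest clients. Placing the new facility at (10, 0) takes over two clients and raises the served count from 1 to 3, whereas (0, 3) raises it only to 2.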


Location selection · Capacity constraints · MapReduce





Copyright information

© Springer-Verlag Wien 2014

Authors and Affiliations

  • Yu Sun (1)
  • Jianzhong Qi (1)
  • Rui Zhang (1)
  • Yueguo Chen (2)
  • Xiaoyong Du (2)

  1. Department of Computing and Information Systems, University of Melbourne, Melbourne, Australia
  2. Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing, China
