MobilityMirror: Bias-Adjusted Transportation Datasets

  • Luke Rodriguez
  • Babak Salimi
  • Haoyue Ping
  • Julia Stoyanovich
  • Bill Howe
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 926)


We describe customized synthetic datasets for publishing mobility data. Companies are providing new transportation modalities, and their data is of high value for integrative transportation research, policy enforcement, and public accountability. However, these companies are disincentivized from sharing data not only to protect the privacy of individuals (drivers and/or passengers), but also to protect their own competitive advantage. Moreover, demographic biases arising from how the services are delivered may be amplified if released data is used in other contexts.

We describe a model and algorithm for releasing origin-destination histograms that remove selected biases in the data using causality-based methods. We compute the origin-destination histogram of the original dataset, then adjust the counts to remove undesirable causal relationships that can lead to discrimination or violate contractual obligations with data owners. We evaluate the utility of the algorithm on real data from a dockless bike share program in Seattle and taxi data in New York, and show that these adjusted transportation datasets can retain utility while removing bias in the underlying data.
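
The two steps named above (build the origin-destination histogram, then adjust its counts) can be illustrated with a minimal sketch. It is not the paper's causal algorithm: it assumes the simplest possible adjustment, which makes a hypothetical `group` attribute statistically independent of the origin-destination pair while preserving both marginals. The field names, the function names, and the toy trips are all assumptions made for illustration.

```python
from collections import Counter


def od_histogram(trips):
    """Count trips per (group, origin, destination) cell."""
    return Counter((t["group"], t["origin"], t["destination"]) for t in trips)


def adjust_independent(hist):
    """Replace each count with n(group) * n(origin, destination) / N.

    The result keeps the per-group totals and the per-OD-pair totals of the
    input, but removes any association between group and OD pair.
    This is an illustrative stand-in, not the method from the paper.
    """
    total = sum(hist.values())
    group_totals = Counter()
    od_totals = Counter()
    for (group, origin, dest), count in hist.items():
        group_totals[group] += count
        od_totals[(origin, dest)] += count
    return {
        (group, origin, dest): group_totals[group] * od_count / total
        for group in group_totals
        for (origin, dest), od_count in od_totals.items()
    }


def make_trips(group, origin, dest, n):
    """Build n identical toy trip records (hypothetical schema)."""
    return [{"group": group, "origin": origin, "destination": dest}] * n


# Toy data: group A mostly starts in the north, group B mostly in the south.
trips = (
    make_trips("A", "north", "downtown", 3)
    + make_trips("A", "south", "downtown", 1)
    + make_trips("B", "north", "downtown", 1)
    + make_trips("B", "south", "downtown", 3)
)

hist = od_histogram(trips)
adjusted = adjust_independent(hist)

for cell in sorted(adjusted):
    print(cell, hist[cell], "->", adjusted[cell])
```

In this toy input every adjusted cell becomes 2.0, so the released histogram no longer reveals which group favors which origin, while the total number of trips per group and per origin-destination pair is unchanged.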



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Luke Rodriguez (1), corresponding author
  • Babak Salimi (1)
  • Haoyue Ping (2)
  • Julia Stoyanovich (3)
  • Bill Howe (1)

  1. University of Washington, Seattle, USA
  2. Drexel University, Philadelphia, USA
  3. New York University, New York City, USA
