On the Use of Social Trajectory-Based Clustering Methods for Public Transport Optimization

  • Jordi Nin
  • David Carrera
  • Daniel Villatoro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8313)


Public transport optimization is becoming more difficult and challenging every day, because both the number of transportation options and the number of users keep increasing. Many research contributions on this issue have recently been published under the umbrella of smart cities research. In this work, we sketch a possible framework to optimize the tourist bus service in the city of Barcelona. Our framework will extract information from Twitter and other web services, such as Foursquare, to infer not only the most visited places in Barcelona but also the trajectories and routes that tourists follow. After that, instead of using complex geospatial or trajectory clustering methods, we propose to use simpler clustering techniques such as \(k\)-means or DBSCAN, combined with a proper distance measure over sequences of symbols, so that trajectory information is incorporated into the clustering process.
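The idea of running a simple clustering method over a sequence distance can be illustrated with a minimal sketch. It is not the authors' implementation: the plain Levenshtein edit distance is used here as a stand-in for the OSA distance named in the keywords, the place names and trajectories are hypothetical, and the parameters `eps` and `min_pts` are chosen arbitrarily for the toy data.

```python
from typing import Callable, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of place symbols.

    Assumption: used as a stand-in for the OSA distance, which is not
    reproduced here.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x by y
        prev = curr
    return prev[-1]


def dbscan(items: List[Sequence[str]],
           dist: Callable[[Sequence[str], Sequence[str]], float],
           eps: float,
           min_pts: int) -> List[int]:
    """Basic DBSCAN over an arbitrary distance; label -1 marks noise."""
    n = len(items)
    # Each neighbourhood includes the point itself (distance 0).
    neighbours = [[j for j in range(n) if dist(items[i], items[j]) <= eps]
                  for i in range(n)]
    labels: List = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # core point: keep expanding
    return labels


# Hypothetical trajectories: ordered lists of visited places.
trajectories = [
    ["SagradaFamilia", "ParkGuell", "CampNou"],
    ["SagradaFamilia", "ParkGuell", "Tibidabo"],
    ["SagradaFamilia", "CasaBatllo", "ParkGuell", "CampNou"],
    ["Barceloneta", "PortVell", "Montjuic"],
    ["Barceloneta", "PortVell", "LaRambla", "Montjuic"],
    ["Tibidabo"],
]

labels = dbscan(trajectories, edit_distance, eps=1, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

Because DBSCAN only needs pairwise distances, no coordinates or centroids have to be defined for the trajectories. Using \(k\)-means in the same setting would additionally require a notion of cluster representative for sequences (for example a medoid), which this sketch does not cover.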


Smart cities, Geospatial clustering, Metric spaces, OSA distance, Cloud computing, High performance computing



This work is partially supported by the Ministry of Science and Technology of Spain under contract TIN2012-34557 and by the BSC-CNS Severo Ochoa program (SEV-2011-00067) and with the support of ACC1Ó, the Catalan Agency to promote applied research and innovation; and by the Spanish Centre for Development of Industrial Technology under the INNPRONTA program, project IPT-20111006, “CIUDAD2020”.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Barcelona Supercomputing Center (BSC), Universitat Politècnica de Catalunya (BarcelonaTech), Barcelona, Spain
  2. Barcelona Digital Technology Centre, Barcelona, Spain
