Pairing Users in Social Media via Processing Meta-data from Conversational Files
Massive amounts of data are generated today by users engaging on social media. Although they know that whatever they post on social media can be viewed, downloaded and analyzed by unauthorized entities, a large number of people are still willing to compromise their privacy. This trend may change, however. Improved awareness of how to protect content on social media, coupled with governments creating and enforcing data protection laws, means that in the near future users may become increasingly protective of what they share. Furthermore, new laws could limit what data social media companies can use without explicit consent from users. In this paper, we present and address a relatively new problem in privacy-preserving mining of social media logs. Specifically, the problem is the feasibility of deriving the topology of network communications (i.e., matching senders and receivers in a social network) using only the meta-data of conversational files shared by users, after all identities and content have been anonymized. More explicitly, if users are willing to share only (a) whether each message was sent or received, (b) the temporal ordering of messages and (c) the length of each message (after anonymizing everything else in their social media logs, including usernames), how can the underlying topology of sender-receiver patterns be derived? To address this problem, we present a Dynamic Time Warping-based solution that models the meta-data as a time series. We present a formal algorithm and results in multiple scenarios in which users may or may not arbitrarily delete content before sharing. Our performance results are very favorable when applied in the context of Twitter. Towards the end of the paper, we also present practical applications of our problem and solutions.
To the best of our knowledge, the problem we address and the solution we propose are unique, and they could offer important perspectives for future work on learning from privacy-preserving mining of social media logs.
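The pairing idea described in the abstract can be illustrated with a small sketch. This is not the authors' formal algorithm; it is a minimal illustration under stated assumptions. It assumes each message is encoded as +length if the user sent it and -length if the user received it, kept in temporal order. It also assumes two conversation partners produce mirror-image series, because a sent message on one side is a received message of the same length on the other. Candidates are scored with classic Dynamic Time Warping using an absolute-difference cost. The names `encode`, `dtw_distance` and `best_partner` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical record format (an assumption): ("sent" | "received", message_length)
Message = Tuple[str, int]


def encode(log: List[Message]) -> List[int]:
    """Assumed encoding: +length for sent, -length for received, in temporal order."""
    return [length if direction == "sent" else -length for direction, length in log]


def dtw_distance(a: List[int], b: List[int]) -> float:
    """Classic O(n*m) Dynamic Time Warping with |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b's element is reused
                cost[i][j - 1],      # b advances, a's element is reused
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def best_partner(target: List[Message], candidates: Dict[str, List[Message]]) -> str:
    """Hypothetical pairing rule: pick the candidate whose negated
    (mirrored) series has the smallest DTW distance to the target's series."""
    t = encode(target)
    scores = {
        name: dtw_distance(t, [-x for x in encode(log)])
        for name, log in candidates.items()
    }
    return min(scores, key=scores.get)


if __name__ == "__main__":
    alice = [("sent", 42), ("received", 17), ("sent", 88), ("received", 5)]
    bob = [("received", 42), ("sent", 17), ("received", 88), ("sent", 5)]
    carol = [("sent", 120), ("sent", 3), ("received", 60)]
    print(best_partner(alice, {"bob": bob, "carol": carol}))  # bob
```

The warping step is what makes DTW a plausible fit for the deletion scenarios mentioned above. If one partner removes a message before sharing, the remaining elements can still align, at some extra cost, instead of shifting every later comparison out of step. A real system would also need a threshold or a global assignment across many users, which this sketch does not attempt.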
Keywords: Social media, Privacy, Big-data, Meta-data, Dynamic Time Warping
This work was supported in part by US National Science Foundation (Grant # 1718071). Any opinions, findings and conclusions are those of the authors alone, and do not reflect views of the funding agency.