Abstract
Mobile applications commonly upload non-critical personally identifiable information (PII) from users' devices to the cloud, where application service providers (ASPs) use it for legitimate purposes such as delivering precise, personalized recommendation services. Internet service providers (ISPs) and local network providers also have strong reasons to collect PII, because it enables finer-grained traffic control and security services. However, locating PII accurately within massive volumes of network traffic is like looking for a needle in a haystack. In this paper, we address this challenge with TPII, an efficient and lightweight approach that locates and tracks PII in the HTTP layer reconstructed from raw network traffic. TPII extracts only three features from HTTP fields to characterize user behavior. It then builds a tree-based decision model on these features to mine PII efficiently and accurately. Because it requires no a priori knowledge, TPII can identify any type of PII from any mobile application, which makes it broadly applicable. We evaluate TPII on a real dataset collected from a campus network with more than 13k users. The experimental results show that TPII achieves a precision of 91.72% and a recall of 94.51%. A parallel implementation of TPII can mine and label 213 million records within one hour, which is nearly enough to support 1 Gbps wire-speed inspection in practice. Our approach gives network service providers a practical way to collect PII for better services.
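The following Python sketch derives three quantities from the reported figures that the abstract does not state directly. The first is the F1 score implied by the reported precision and recall. The second is the per-second record rate implied by 213 million records per hour. The third is a break-even mean record size at which that rate would exactly fill a 1 Gbps link. The break-even size is plain arithmetic under an assumption made only for illustration: that every byte on the link belongs to some inspected record. It is not a figure from the paper. The function names are hypothetical.

```python
# Figures taken from the abstract; everything else is derived arithmetic.
PRECISION = 0.9172
RECALL = 0.9451
RECORDS_PER_HOUR = 213_000_000
LINK_RATE_BPS = 1_000_000_000  # 1 Gbps


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def records_per_second(records_per_hour: int) -> float:
    """Convert an hourly record count into a per-second rate."""
    return records_per_hour / 3600


def break_even_record_bytes(link_rate_bps: int, rate_rps: float) -> float:
    """Mean record size (bytes) at which rate_rps would exactly fill the link.

    Illustrative assumption: every byte on the link belongs to some
    inspected record, and records are the unit being processed.
    """
    return link_rate_bps / 8 / rate_rps


f1 = f1_score(PRECISION, RECALL)
rate = records_per_second(RECORDS_PER_HOUR)
size = break_even_record_bytes(LINK_RATE_BPS, rate)

print(f"F1 score:           {f1:.4f}")           # 0.9309
print(f"Records per second: {rate:,.0f}")        # 59,167
print(f"Break-even size:    {size:,.0f} bytes")  # 2,113
```

Under this simplified model, traffic whose records average more than about 2.1 KB would arrive at fewer than 59,167 records per second at full 1 Gbps load. Traffic with smaller records would need a higher processing rate.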
Acknowledgements
The work was supported by the National Natural Science Foundation of China (Grant Nos. 61672101, U1636119, 61866038, 61962059), and 2018 College Students’ Innovation and Entrepreneurship Training Program (D2018127). The authors declare that they have no conflicts of interest regarding the publication of this paper.
Author information
Yi Liu is a PhD candidate in Computer Science at Beijing Institute of Technology, China. He received an MS degree from Xidian University, China, in 2010. He now works at the network information center of Yan'an University, China. His research interests include network information security, network traffic analysis, and network privacy protection.
Tian Song is an associate professor of Computer Science at Beijing Institute of Technology, China. He received his PhD degree from Tsinghua University, China, in 2008. His research interests include network content security, next-generation Internet, and computer architecture.
Lejian Liao is a professor in the School of Computer Science, Beijing Institute of Technology, China. He received his MS degree in 1988 and his PhD degree in 1994, both from the Institute of Computing Technology, Chinese Academy of Sciences, China. His current research interests include Web intelligence, semantic computing, ontology engineering, and constraint-based technologies. He has published more than 100 academic papers as first author or co-author.
About this article
Cite this article
Liu, Y., Song, T. & Liao, L. TPII: tracking personally identifiable information via user behaviors in HTTP traffic. Front. Comput. Sci. 14, 143801 (2020). https://doi.org/10.1007/s11704-018-7451-z
DOI: https://doi.org/10.1007/s11704-018-7451-z