## Abstract

This paper proposes a machine learning application that identifies mobile phone users suspected of involvement in criminal activities. The application characterizes the behavioral patterns of suspect users versus non-suspect users based on usage metadata, such as call duration, call distribution, interaction time preferences, and text-to-call ratios, while avoiding any access to the content of calls or messages. It is based on a targeted Bayesian network learning method, which generates a graphical network that domain experts can use to gain intuitive insights about the key features that help identify suspect users. The method enables experts to manage the trade-off between model complexity and accuracy using information theory metrics. Unlike other graphical Bayesian classifiers, the proposed application accomplishes the task required of a security company, namely an accurate suspect identification rate (recall) of at least 50% with no more than a 1% false identification rate. The targeted Bayesian network learning method is also used for additional tasks, such as anomaly detection, distinguishing between “relevant” and “irrelevant” anomalies, and associating anonymous telephone numbers with existing users by matching behavioral patterns.
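The operating point stated above (recall of at least 50% at no more than a 1% false identification rate) can be checked by thresholding classifier scores. The following is a minimal sketch with synthetic scores; the function name, the score distributions, and the class sizes are all illustrative and are not taken from the paper:

```python
import numpy as np

def recall_at_fpr(scores, labels, max_fpr=0.01):
    """Pick the highest score threshold that keeps the false-positive
    rate within max_fpr, and report the resulting recall."""
    neg = np.sort(scores[labels == 0])[::-1]  # negatives, descending
    # Index of the threshold admitting at most max_fpr of negatives.
    k = int(np.floor(max_fpr * len(neg)))
    threshold = neg[k] if k < len(neg) else neg[-1]
    flagged = scores > threshold
    fpr = flagged[labels == 0].mean()
    recall = flagged[labels == 1].mean()
    return threshold, fpr, recall

# Synthetic, highly imbalanced data: 5000 non-suspects, 50 suspects.
rng = np.random.default_rng(0)
labels = np.r_[np.zeros(5000, dtype=int), np.ones(50, dtype=int)]
scores = np.r_[rng.normal(0, 1, 5000), rng.normal(3, 1, 50)]
threshold, fpr, recall = recall_at_fpr(scores, labels)
```

The sketch illustrates why the constraint is demanding: with such class imbalance, even a 1% false-positive rate flags far more non-suspects (about 50 here) than there are true suspects.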

## References

1. Ben-Akiva, M., Bierlaire, M.: Discrete choice methods and their applications to short term travel decisions. In: Handbook of Transportation Science, pp. 5–33. Springer, New York (1999)
2. Ben-Dov, M., Wu, W., Feldman, R., Cairns, P.A.: Improving knowledge discovery by combining text-mining and link analysis techniques. In: Workshop on Link Analysis, Counter-terrorism, and Privacy, in conjunction with the SIAM International Conference on Data Mining, Lake Buena Vista, Florida (2004)
3. Ben-Gal, I.: Bayesian networks. In: Ruggeri, F., Faltin, F., Kenett, R. (eds.) Encyclopedia of Statistics in Quality and Reliability. Wiley, New Jersey (2007)
4. Bishop, C.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)
5. Bolton, R.J., Hand, D.J.: Statistical fraud detection: a review. Stat. Sci. **17**, 235 (2002)
6. Bouchard, M., Joffres, K., Frank, R.: Preliminary analytical considerations in designing a terrorism and extremism online network extractor. In: Computational Models of Complex Systems, pp. 171–184. Springer International Publishing (2014)
7. Boulton, G.: Open your minds and share your results. Nature **486**(7404), 441 (2012)
8. Burns, K.: In US cities, open data is not just nice to have; it’s the norm. The Guardian, 21 Oct 2013. www.theguardian.com/local-government-network/2013/oct/21/open-data-us-san-francisco (2013)
9. Chickering, D.M., Geiger, D., Heckerman, D.: Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. **20**, 197–243 (1995)
10. Ching, J.Y., Wong, A.K.C., Chan, K.C.C.: Class-dependent discretization for inductive learning from continuous and mixed mode data. IEEE Trans. Pattern Anal. Mach. Intell. **17**(7), 641–650 (1995)
11. Chow, C.K., Liu, C.N.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory **IT-14**, 462–467 (1968)
12. Claeskens, G., Hjort, N.L.: The focused information criterion. J. Am. Stat. Assoc. **98**, 900–945 (2003)
13. De Montjoye, Y.A., Radaelli, L., Singh, V.K.: Unique in the shopping mall: on the reidentifiability of credit card metadata. Science **347**(6221), 536–539 (2015)
14. Duda, R.O., Hart, P.: Pattern Classification and Scene Analysis. Wiley, New York (1973)
15. Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Mach. Learn. **29**, 131–163 (1997)
16. Ganganwar, V.: An overview of classification algorithms for imbalanced datasets. Int. J. Emerg. Technol. Adv. Eng. **2**(4), 42–47 (2012)
17. Grau, J., Ben-Gal, I., Posch, S., Grosse, I.: VOMBAT: prediction of transcription factor binding sites using variable order Bayesian trees. Nucleic Acids Res. **34**(suppl 2), W529–W533 (2006)
18. Gruber, A., Ben-Gal, I.: Efficient Bayesian network learning for optimization in systems engineering. Qual. Technol. Quant. Manag. **9**(1), 97–114 (2012)
19. Heckerman, D.: A tutorial on learning with Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research (1995)
20. Jensen, D., Rattigan, M., Blau, H.: Information awareness: a prospective technical assessment. In: Proceedings of SIGKDD ’03, pp. 378–387 (2003)
21. Kelner, K., Lerner, B.: Learning Bayesian network classifiers by risk minimization. Int. J. Approx. Reason. **53**, 248–272 (2012)
22. Kreykes, B.D.: Data mining and counter-terrorism: the use of telephone records as an investigatory tool in the war on terror. ISJLP **4**, 431 (2008)
23. Marturana, F., Tacconi, S.: A machine learning-based triage methodology for automated categorization of digital media. Digit. Investig. **10**, 193–204 (2013)
24. Mayer, J., Mutchler, P., Mitchell, J.C.: Evaluating the privacy properties of telephone metadata. Proc. Natl. Acad. Sci. **113**(20), 5536–5541 (2016)
25. Mena, J.: Homeland Security Techniques and Technologies. Charles River Media (2007)
26. Meng, G., Dan, L., Ni-hong, W., Li-chen, L.: A network intrusion detection model based on K-means algorithm and information entropy. Int. J. Secur. Appl. **8**(6), 285–294 (2014)
27. Nhauo, D., Sung-Ryul, K.: Classification of malicious domain names using support vector machine and bi-gram method. Int. J. Secur. Appl. **7**(1), 51 (2013)
28. Ng, A., Jordan, M.: On discriminative versus generative classifiers: a comparison of logistic regression and naive Bayes. In: Advances in Neural Information Processing Systems, pp. 841–848 (2002)
29. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco (1988)
30. Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge (2000)
31. Phua, C., Lee, V., Smith, K., Gayler, R.: A comprehensive survey of data mining-based fraud detection research. www.bsys.monash.edu.au/people/cphua (2005)
32. Shmueli, E., Tassa, T.W., Shapira, B., Rokach, L.: Data mining for software trustworthiness. Inf. Sci. **191**, 98–127 (2012)
33. Stolfo, S.J., Fan, W., Lee, W., Prodromidis, A., Chan, P.K.: Cost-based modeling for fraud and intrusion detection: results from the JAM project. In: DARPA Information Survivability Conference and Exposition (DISCEX ’00), IEEE, vol. 2, pp. 130–144 (2000)
34. Williamson, J.: Approximating discrete probability distributions with Bayesian networks. In: Proceedings of the International Conference on Artificial Intelligence in Science and Technology, Hobart, Tasmania (2000)
35. Van Renesse, R., Birman, K., Vogels, W.: Astrolabe: a robust and scalable technology for distributed system monitoring, management, and data mining. ACM Trans. Comput. Syst. **21**, 164–206 (2003)
36. Zhu, D., Premkumar, G., Zhang, X., Chu, C.H.: Data mining for network intrusion detection: a comparison of alternative methods. Decis. Sci. **32**, 635–660 (2001)

## Acknowledgements

This research was partially supported by the Israeli Chief Scientist Magneton program no. 44596, “Target-Based Bayesian Network Modeling for Homeland Security Applications” (Principal Investigator: Prof. Irad Ben-Gal). We are grateful for the support of our colleagues from the industry in this project, as well as for Shai Yanovski’s participation in the project.


## Appendix: the TBNL algorithm


The TBNL algorithm (for more details see [18]) uses a recursive procedure that can be applied to any node that represents a variable. The procedure, called *AddParents* (described below), adds edges from candidate nodes to the node to which the procedure is currently applied: each time, it adds the edge from the node with the highest Information Gain (IG) value. Essentially, *AddParents* is a greedy, forward feature selection procedure, which is similar to the feature selection scheme used by the *adding-arrows* principle [34]. The main difference is that the TBNL algorithm starts with the class variable and then proceeds recursively to the selected parent nodes. In particular, the TBNL algorithm starts by applying the *AddParents* procedure to the target node to select its parents. Then, *AddParents* is applied to each parent sequentially to select its own parents from the set of the target node’s parents. Thus, any node in the network can be a parent of the target node (i.e., corresponding to a limited form of a Markov blanket) while still maintaining the DAG structure. The input parameters of the *AddParents* procedure are as follows: \(X_i\) represents the current node; \(\varvec{\mathrm {T}}_i\) represents the set of the candidate parents of \(X_i\); \(\varvec{\mathrm {C}}\) represents the set of arbitrary constraints on the network, such as the number of permitted parameters; \(\eta _i\) represents a constraint on the maximum allowed mutual information (MI) concerning \(X_i\); and \(\beta _i\) represents the minimum IG “step size” when adding a parent to \(X_i\) in the network.
After applying the *AddParents* procedure to node \(X_i\), the output is the set of parent nodes \(\varvec{\mathrm {Z}}_i\), returned once one of the following stopping conditions is fulfilled: (1) any of the \(\varvec{\mathrm {C}}\) constraints is not met; (2) \(I\left( X_i;\overline{\varvec{\mathrm {Z}}}_i|\varvec{\mathrm {Z}}_i\right) /H\left( X_i\right) <\beta _i\); or (3) the set of candidate parents \(\varvec{\mathrm {T}}_i\) is empty. *AddParents* is a quasi-recursive procedure; namely, the TBNL algorithm explicitly calls it only once. Then, having obtained \(\varvec{\mathrm {Z}}_i\), it iteratively calls \(\varvec{\mathrm {Z}}_j=\textit{AddParents}\left( X_j,\overline{\varvec{\mathrm {Z}}}_i,\varvec{\mathrm {C}},\eta _j,\beta _j\right) \) for each \(X_j\in \varvec{\mathrm {Z}}_i\). Note that the order of the iterations is well defined: the output parents from each iteration directly affect the input of the next step. Such a procedure generates different outputs than those that would have been obtained had the algorithm iterated only after obtaining the full set of parents. Thus, the TBNL algorithm calls \(\varvec{\mathrm {Z}}_t=\textit{AddParents}\left( X_t,\overline{\varvec{\mathrm {X}}}_t,\varvec{\mathrm {C}},\eta _t,\beta _t\right) \), which ultimately results in a DAG \(\mathcal {G}=\lbrace \varvec{\mathrm {Z}}_1,\varvec{\mathrm {Z}}_2,\ldots ,\varvec{\mathrm {Z}}_N\rbrace \).
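The greedy forward selection at the heart of *AddParents* can be sketched in code. The following is an illustrative reconstruction, not the authors' implementation: `entropy` and `cond_mi` are simple empirical plug-in estimators for discrete data, and the single `max_parents` limit stands in for the general constraint set \(\varvec{\mathrm {C}}\) (the \(\eta _i\) MI constraint is omitted for brevity):

```python
from collections import Counter
import numpy as np

def entropy(cols):
    """Empirical joint entropy (in nats) of the tuple-valued columns."""
    counts = Counter(zip(*cols))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return -(p * np.log(p)).sum()

def cond_mi(x, y, z_cols):
    """Empirical conditional mutual information I(X; Y | Z)."""
    if not z_cols:
        return entropy([x]) + entropy([y]) - entropy([x, y])
    return (entropy([x] + z_cols) + entropy([y] + z_cols)
            - entropy([x, y] + z_cols) - entropy(z_cols))

def add_parents(i, candidates, data, max_parents=3, beta=0.01):
    """Sketch of AddParents: greedily add the candidate parent with the
    highest information gain for node i, stopping when the parent limit
    is hit, the relative IG step falls below beta, or candidates run out."""
    x = data[i]
    parents, remaining = [], list(candidates)
    while remaining and len(parents) < max_parents:
        z_cols = [data[j] for j in parents]
        gains = {j: cond_mi(x, data[j], z_cols) for j in remaining}
        best = max(gains, key=gains.get)
        if gains[best] / entropy([x]) < beta:  # IG step size too small
            break
        parents.append(best)
        remaining.remove(best)
    return parents

# Toy check: variable 1 fully determines variable 0, variable 2 is noise.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 2000)
data = {0: y, 1: y, 2: rng.integers(0, 2, 2000)}
parents = add_parents(0, candidates=[1, 2], data=data)
```

In the full algorithm, this routine would first be applied to the target node and then, as described above, iteratively to each returned parent with the candidate set restricted accordingly.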

## About this article

### Cite this article

Gruber, A., Ben-Gal, I.: Using targeted Bayesian network learning for suspect identification in communication networks. *Int. J. Inf. Secur.* **17**, 169–181 (2018). https://doi.org/10.1007/s10207-017-0362-4


### Keywords

- Targeted Bayesian network learning
- Suspect identification
- Behavioral patterns
- Privacy
- Security
- Machine learning
- Cyber crimes
- Criminal behavior