Using targeted Bayesian network learning for suspect identification in communication networks


This paper proposes a machine learning application to identify mobile phone users suspected of involvement in criminal activities. The application characterizes the behavioral patterns of suspect users versus non-suspect users based on usage metadata, such as call duration, call distribution, interaction time preferences, and text-to-call ratios, while avoiding any access to the content of calls or messages. The application is based on the targeted Bayesian network learning (TBNL) method. It generates a graphical network that domain experts can use to gain intuitive insights about the key features that help identify suspect users. The method enables experts to manage the trade-off between model complexity and accuracy using information-theoretic metrics. Unlike other graphical Bayesian classifiers, the proposed application accomplishes the task required of a security company, namely an accurate suspect identification rate (recall) of at least 50% with no more than a 1% false identification rate. The TBNL method is also used for additional tasks, such as anomaly detection, distinguishing between “relevant” and “irrelevant” anomalies, and associating anonymous telephone numbers with existing users by matching behavioral patterns.




  1.

    Ben-Akiva, M., Bierlaire, M.: Discrete choice methods and their applications to short term travel decisions. In: Handbook of Transportation Science, pp. 5–33. Springer, New York (1999)

  2.

    Ben-Dov, M., Wu, W., Feldman, R., Cairns, P.A.: Improving knowledge discovery by combining text-mining and link analysis techniques. In: Workshop on Link Analysis, Counter-terrorism, and Privacy, in conjunction with the SIAM International Conference on Data Mining, Lake Buena Vista, Florida (2004)

  3.

    Ben-Gal, I.: Bayesian networks. In: Ruggeri, F., Faltin, F., Kenett, R. (eds.) Encyclopedia of Statistics in Quality and Reliability. Wiley, New Jersey (2007)

  4.

    Bishop, C.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)

  5.

    Bolton, R.J., Hand, D.J.: Statistical fraud detection: a review. Stat. Sci. 17, 235 (2002)

  6.

    Bouchard, M., Joffres, K., Frank, R.: Preliminary analytical considerations in designing a terrorism and extremism online network extractor. In: Computational Models of Complex Systems, pp. 171–184. Springer International Publishing (2014)

  7.

    Boulton, G.: Open your minds and share your results. Nature 486(7404), 441 (2012)

  8.

    Burns, K.: In US cities, open data is not just nice to have; it’s the norm. The Guardian, 21 Oct 2013

  9.

    Chickering, D.M., Geiger, D., Heckerman, D.: Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. 20, 197–243 (1995)

  10.

    Ching, J.Y., Wong, A.K.C., Chan, K.C.C.: Class-dependent discretization for inductive learning from continuous and mixed mode data. IEEE Trans. Pattern Anal. Mach. Intell. 17(7), 641–650 (1995)

  11.

    Chow, C.K., Liu, C.N.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory IT-14, 462–467 (1968)

  12.

    Claeskens, G., Hjort, N.L.: The focused information criterion. J. Am. Stat. Assoc. 98, 900–945 (2003)

  13.

    De Montjoye, Y.A., Radaelli, L., Singh, V.K.: Unique in the shopping mall: on the reidentifiability of credit card metadata. Science 347(6221), 536–539 (2015)

  14.

    Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)

  15.

    Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Mach. Learn. 29, 131–163 (1997)

  16.

    Ganganwar, V.: An overview of classification algorithms for imbalanced datasets. Int. J. Emerg. Technol. Adv. Eng. 2(4), 42–47 (2012)

  17.

    Grau, J., Ben-Gal, I., Posch, S., Grosse, I.: VOMBAT: prediction of transcription factor binding sites using variable order Bayesian trees. Nucleic Acids Res. 34(suppl 2), W529–W533 (2006)

  18.

    Gruber, A., Ben-Gal, I.: Efficient Bayesian network learning for optimization in systems engineering. Qual. Technol. Quant. Manag. 9(1), 97–114 (2012)

  19.

    Heckerman, D.: A tutorial on learning with Bayesian networks. Microsoft Research Technical Report MSR-TR-95-06 (1995)

  20.

    Jensen, D., Rattigan, M., Blau, H.: Information awareness: a prospective technical assessment. In: Proceedings of SIGKDD ’03, pp. 378–387 (2003)

  21.

    Kelner, K., Lerner, B.: Learning Bayesian network classifiers by risk minimization. Int. J. Approx. Reason. 53, 248–272 (2012)

  22.

    Kreykes, B.D.: Data mining and counter-terrorism: the use of telephone records as an investigatory tool in the war on terror. ISJLP 4, 431 (2008)

  23.

    Marturana, F., Tacconi, S.: A machine learning-based triage methodology for automated categorization of digital media. Digit. Investig. 10, 193–204 (2013)

  24.

    Mayer, J., Mutchler, P., Mitchell, J.C.: Evaluating the privacy properties of telephone metadata. Proc. Natl. Acad. Sci. 113(20), 5536–5541 (2016)

  25.

    Mena, J.: Homeland security techniques and technologies. Charles River Media 198(254), 262–263 (2007)

  26.

    Meng, G., Dan, L., Ni-hong, W., Li-chen, L.: A network intrusion detection model based on K-means algorithm and information entropy. Int. J. Secur. Appl. 8(6), 285–294 (2014)

  27.

    Nhauo, D., Kim, S.-R.: Classification of malicious domain names using support vector machine and bi-gram method. Int. J. Secur. Appl. 7(1), 51 (2013)

  28.

    Ng, A., Jordan, M.: On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In: Advances in Neural Information Processing Systems, pp. 841–848 (2002)

  29.

    Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco (1988)

  30.

    Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge (2000)

  31.

    Phua, C., Lee, V., Smith, K., Gayler, R.: A comprehensive survey of data mining-based fraud detection research (2005)

  32.

    Shmueli, E., Tassa, T.W., Shapira, B., Rokach, L.: Data mining for software trustworthiness. Inf. Sci. 191, 98–127 (2012)

  33.

    Stolfo, S.J., Fan, W., Lee, W., Prodromidis, A., Chan, P.K.: Cost-based modeling for fraud and intrusion detection: results from the JAM project. In: DARPA Information Survivability Conference and Exposition (DISCEX ’00), vol. 2, pp. 130–144. IEEE (2000)

  34.

    Williamson, J.: Approximating discrete probability distributions with Bayesian networks. In: Proceedings of the International Conference on Artificial Intelligence in Science and Technology, Hobart, Tasmania (2000)

  35.

    Van Renesse, R., Birman, K., Vogels, W.: Astrolabe: a robust and scalable technology for distributed system monitoring, management, and data mining. ACM Trans. Comput. Syst. 21, 164–206 (2003)

  36.

    Zhu, D., Premkumar, G., Zhang, X., Chu, C.H.: Data mining for network intrusion detection: a comparison of alternative methods. Decis. Sci. 32, 635–660 (2001)

Acknowledgements


This research was partially supported by the Israeli Chief Scientist Magneton program no. 44596, “Target-Based Bayesian Network Modeling for Homeland Security applications” (Principal Investigator: Prof. Irad Ben-Gal). We are grateful for the support of our colleagues from the industry in this project, as well as for Shai Yanovski’s participation in the project.

Author information



Corresponding author

Correspondence to I. Ben-Gal.

Appendix: the TBNL algorithm


The TBNL algorithm (for more details, see [18]) uses a recursive procedure that can be applied to any node representing a variable. The procedure, called AddParents (described below), adds edges from candidate nodes to the node to which it is currently applied: at each step, it adds the edge from the node with the highest Information Gain (IG) value. Essentially, AddParents is a greedy, forward feature-selection procedure, similar to the feature-selection scheme used by the adding-arrows principle [34]. The main difference is that the TBNL algorithm starts with the class variable and then proceeds recursively to the selected parent nodes. In particular, the TBNL algorithm starts by applying the AddParents procedure to the target node to select its parents. Then, AddParents is applied to each parent sequentially to select its own parents from among the remaining nodes. Thus, any node in the network can be a parent of the target node (i.e., corresponding to a limited form of a Markov blanket) while still maintaining the DAG structure. The input parameters of the AddParents procedure are as follows: \(X_i\) represents the current node; \(\varvec{\mathrm {T}}_i\) represents the set of candidate parents of \(X_i\); \(\varvec{\mathrm {C}}\) represents a set of arbitrary constraints on the network, such as the number of permitted parameters; \(\eta _i\) represents the maximum mutual information (MI) that the selected parents are allowed to capture about \(X_i\); and \(\beta _i\) represents the minimum IG “step size” when adding a parent to \(X_i\). The procedure returns the set of parent nodes \(\varvec{\mathrm {Z}}_i\) once one of the following stopping conditions is fulfilled: (1) any of the \(\varvec{\mathrm {C}}\) constraints is violated; (2) \(I\left( X_i;\overline{\varvec{\mathrm {Z}}}_i|\varvec{\mathrm {Z}}_i\right) /H\left( X_i\right) <\beta _i\); or (3) the set of candidate parents \(\varvec{\mathrm {T}}_i\) is empty.
The AddParents procedure is shown next. The last two code lines imply that it is a quasi-recursive procedure; namely, the TBNL algorithm actually calls AddParents only once. Then, having obtained \(\varvec{\mathrm {Z}}_i\), it iteratively calls \(\varvec{\mathrm {Z}}_j=\textit{AddParents}\left( X_j,\overline{\varvec{\mathrm {Z}}}_i,\varvec{\mathrm {C}},\eta _j,\beta _j\right) \) for each \(X_j\mathrm {\in }\varvec{\mathrm {Z}}_i\). Note that the order of the iterations is well defined: the parents output by each iteration directly affect the input of the next step. Such a procedure generates different outputs than would have been obtained had the algorithm iterated only after obtaining the full set of parents. Thus, the TBNL algorithm first calls \(\varvec{\mathrm {Z}}_t=\textit{AddParents}\left( X_t,\overline{\varvec{\mathrm {X}}}_t,\varvec{\mathrm {C}},\eta _t,\beta _t\right) \), which ultimately results in a DAG \(\mathcal {G}=\lbrace \varvec{\mathrm {Z}}_1,\varvec{\mathrm {Z}}_2,\ldots ,\varvec{\mathrm {Z}}_N\rbrace \).
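The greedy selection and the recursive driver can be sketched in Python. This is a simplified illustration, not the paper's exact procedure: entropies are estimated empirically from discrete data, a single `max_parents` bound stands in for the constraint set \(\varvec{\mathrm {C}}\), and restricting each parent's candidates to not-yet-processed nodes is one simple way of keeping the graph acyclic.

```python
import math
from collections import Counter

def entropy(data, cols):
    """Empirical joint entropy H(cols), in bits, over rows of discrete data."""
    n = len(data)
    counts = Counter(tuple(row[c] for c in cols) for row in data)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def cond_mi(data, x, ys, zs):
    """Conditional mutual information I(x; ys | zs) from empirical entropies."""
    return (entropy(data, [x] + zs) + entropy(data, ys + zs)
            - entropy(data, [x] + ys + zs) - entropy(data, zs))

def add_parents(data, x, candidates, max_parents, eta, beta):
    """Greedy forward parent selection for node x (sketch of AddParents).

    candidates plays the role of T_i, max_parents stands in for C,
    eta caps the fraction of H(x) the parents may capture, and beta is
    the minimum relative information-gain step for adding a parent.
    """
    z, t = [], list(candidates)   # selected parents Z_i, remaining candidates T_i
    h_x = entropy(data, [x])
    while t and len(z) < max_parents:   # stop: T_i empty or constraint C reached
        # Stop when the residual information I(x; T_i | Z_i) / H(x) drops below beta.
        if h_x == 0 or cond_mi(data, x, t, z) / h_x < beta:
            break
        # Add the candidate with the highest information gain given Z_i.
        best = max(t, key=lambda c: cond_mi(data, x, [c], z))
        z.append(best)
        t.remove(best)
        if cond_mi(data, x, z, []) / h_x >= eta:
            break                       # stop: MI budget eta exhausted
    return z

def tbnl(data, target, features, max_parents=2, eta=1.0, beta=0.05):
    """Sketch of the TBNL driver: select the target's parents first, then
    parents for each selected parent sequentially."""
    graph = {target: add_parents(data, target, features, max_parents, eta, beta)}
    processed = {target}
    for xj in graph[target]:
        cands = [f for f in features if f != xj and f not in processed]
        graph[xj] = add_parents(data, xj, cands, max_parents, eta, beta)
        processed.add(xj)
    return graph
```

On toy data in which the target is fully determined by one feature and independent of another, `add_parents` selects the informative feature and then stops, since a single parent already captures the full MI budget.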


Rights and permissions

Reprints and Permissions

About this article


Cite this article

Gruber, A., Ben-Gal, I. Using targeted Bayesian network learning for suspect identification in communication networks. Int. J. Inf. Secur. 17, 169–181 (2018).



  • Targeted Bayesian network learning
  • Suspect identification
  • Behavioral patterns
  • Privacy
  • Security
  • Machine learning
  • Cyber crimes
  • Criminal behavior