A Bayesian-neural-networks framework for scaling posterior distributions over different-curation datasets

  • Research
  • Published in: Journal of Intelligent Information Systems

Abstract

In this paper, we propose and experimentally assess an innovative framework, based on Bayesian Neural Networks (BNNs), for scaling posterior distributions over differently-curated datasets. A further innovation of the proposed study consists in enhancing the accuracy of the Bayesian classifier via intelligent sampling algorithms. The proposed methodology is relevant to emerging application settings, such as provenance detection and analysis, and cybercrime. Our contributions are complemented by a comprehensive experimental evaluation and analysis over both static and dynamic image datasets. The derived results confirm the successful application of the proposed methodology to emerging big data analytics settings.
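To make the framework's main ingredient concrete, the sketch below shows how posterior sampling for a Bayesian classifier can be carried out with Stochastic Gradient Langevin Dynamics (SGLD), one of the gradient-based samplers cited in the References (e.g., Li et al., 2016). This is a minimal illustration, not the authors' implementation: the synthetic data, the single-layer model, and all hyperparameters are assumptions chosen for brevity.

```python
# Minimal sketch (not the paper's implementation): posterior sampling for a
# Bayesian classifier via Stochastic Gradient Langevin Dynamics (SGLD).
# A single-layer (logistic-regression) model stands in for a full BNN;
# data and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (placeholder for an image dataset).
N, D = 1000, 20
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
y = (X @ w_true + 0.5 * rng.normal(size=N) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_log_posterior(w, xb, yb, prior_var=1.0):
    """Gradient of log p(w | data) for Bayesian logistic regression,
    with the minibatch likelihood term rescaled to the full dataset size."""
    p = sigmoid(xb @ w)
    grad_loglik = xb.T @ (yb - p) * (N / len(yb))  # minibatch estimate
    grad_logprior = -w / prior_var                 # Gaussian prior N(0, prior_var)
    return grad_loglik + grad_logprior

# SGLD: gradient ascent on the log-posterior plus Gaussian noise whose
# variance matches the step size, so iterates approximate posterior samples.
w = np.zeros(D)
samples, step, batch = [], 1e-3, 64
for t in range(5000):
    idx = rng.choice(N, size=batch, replace=False)
    g = grad_log_posterior(w, X[idx], y[idx])
    w = w + 0.5 * step * g + np.sqrt(step) * rng.normal(size=D)
    if t > 1000 and t % 10 == 0:  # discard burn-in, thin the chain
        samples.append(w.copy())

samples = np.array(samples)

# Posterior-predictive probabilities: average the classifier over posterior
# samples instead of relying on a single point estimate.
p_pred = sigmoid(X @ samples.T).mean(axis=1)
acc = ((p_pred > 0.5) == y).mean()
print(f"{len(samples)} posterior samples, predictive accuracy: {acc:.3f}")
```

In a full BNN setting, the same sampling loop would be applied to the weights of a (convolutional) network rather than a linear model, and the final averaging step is what turns the collection of weight samples into posterior-predictive class probabilities.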

Availability of Supporting Data

Sources of the data used in this manuscript are reported in the References section.

References

  • Agrawal, D., Bernstein, P., Bertino, E., Davidson, S., Dayal, U., Franklin, M., Gehrke, J., Haas, L., Halevy, A., Han, J., et al. (2011). Challenges and opportunities with big data 2011-1. Purdue University Cyber Center Technical Reports.

  • Aitchison, L. (2021). A statistical theory of cold posteriors in deep neural networks. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021.

  • Al Nuaimi, E., Al Neyadi, H., Mohamed, N., & Al-Jaroodi, J. (2015). Applications of big data to smart cities. Journal of Internet Services and Applications, 6(1), 1–15.

  • Barkwell, K. E., Cuzzocrea, A., Leung, C. K., Ocran, A. A., Sanderson, J. M., Stewart, J. A., & Wodi, B. H. (2018). Big data visualisation and visual analytics for music data mining. In: 22nd International Conference on Information Visualisation, IV 2018, July 10-13, 2018 (pp. 235–240), Fisciano, Italy.

  • Bonifati, A., & Cuzzocrea, A. (2006). Storing and retrieving path fragments in structured P2P networks. Data Knowl. Eng., 59(2), 247–269.

  • Brooks, S., Gelman, A., Jones, G. L., & Meng, X.-L. (2011). Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC.

  • Chakrabarti, A., & Zickler, T. E. (2011). Statistics of real-world hyperspectral images. In: The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, June 20-25, 2011 (pp. 193–200), Colorado Springs, CO, USA.

  • Chen, T., Fox, E. B., & Guestrin, C. (2014). Stochastic gradient Hamiltonian Monte Carlo. In: Proceedings of the 31st International Conference on Machine Learning, ICML 2014, June 21-26, 2014. JMLR Workshop and Conference Proceedings (vol. 32, pp. 1683–1691), Beijing, China.

  • Chen, Y., & Welling, M. (2012). Bayesian structure learning for Markov random fields with a spike and slab prior. In: Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, August 14-18, 2012 (pp. 174–184), Catalina Island, CA, USA.

  • Coronato, A., & Cuzzocrea, A. (2022). An innovative risk assessment methodology for medical information systems. IEEE Trans. Knowl. Data Eng., 34(7), 3095–3110.

  • Cuzzocrea, A. (2013). Analytics over big data: Exploring the convergence of data warehousing, OLAP and data-intensive cloud infrastructures. In: 37th Annual IEEE Computer Software and Applications Conference, COMPSAC 2013, July 22-26, 2013 (pp. 481–483), Kyoto, Japan.

  • Cuzzocrea, A., Soufargi, S., Baldo, A., & Fadda, E. (2022). Scaling posterior distributions over differently-curated datasets: A Bayesian-neural-networks methodology. In: Foundations of Intelligent Systems - 26th International Symposium, ISMIS 2022, October 3-5, 2022, Proceedings. Lecture Notes in Computer Science (vol. 13515, pp. 198–208), Cosenza, Italy.

  • Cuzzocrea, A., Leung, C. K., & MacKinnon, R. K. (2014). Mining constrained frequent itemsets from distributed uncertain data. Future Gener. Comput. Syst., 37, 117–126.

  • DeepMind. (2023). MuJoCo - Advanced Physics Simulation. https://mujoco.org/

  • Furuta, R., Inoue, N., & Yamasaki, T. (2020). PixelRL: Fully convolutional network with reinforcement learning for image processing. IEEE Trans. Multim., 22(7), 1704–1719.

  • Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, July 10-15, 2018. Proceedings of Machine Learning Research (vol. 80, pp. 1856–1865), Stockholmsmässan, Stockholm, Sweden.

  • Heek, J., & Kalchbrenner, N. (2019). Bayesian inference for large scale image classification. arXiv:1908.03491

  • Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res., 15(1), 1593–1623.

  • Hou, J., Zhu, Z., Hou, J., Zeng, H., Wu, J., & Zhou, J. (2022). Deep posterior distribution-based embedding for hyperspectral image super-resolution. IEEE Transactions on Image Processing, 31, 5720–5732.

  • Jin, X., Lee, Y., Fiscus, J. G., Guan, H., Yates, A. N., Delgado, A., & Zhou, D. (2022). MFC-Prov: Media forensics challenge image provenance evaluation and data analysis on large-scale datasets. Neurocomputing, 470, 76–88.

  • Kemp, S. (2023). Exploring public cybercrime prevention campaigns and victimization of businesses: A Bayesian model averaging approach. Comput. Secur., 127, 103089.

  • Koulali, R., Zaidani, H., & Zaim, M. (2021). Image classification approach using machine learning and an industrial Hadoop based data pipeline. Big Data Res., 24, 100184.

  • Leung, C. K., Braun, P., Hoi, C. S. H., Souza, J., & Cuzzocrea, A. (2019). Urban analytics of big transportation data for supporting smart cities. In: Big Data Analytics and Knowledge Discovery - 21st International Conference, DaWaK 2019, August 26-29, 2019, Proceedings. Lecture Notes in Computer Science (vol. 11708, pp. 24–33), Linz, Austria.

  • Leung, C. K., Chen, Y., Hoi, C. S. H., Shang, S., & Cuzzocrea, A. (2020). Machine learning and OLAP on big COVID-19 data. In: 2020 IEEE International Conference on Big Data (IEEE BigData 2020), December 10-13, 2020 (pp. 5118–5127), Atlanta, GA, USA.

  • Leung, C. K., Chen, Y., Hoi, C. S. H., Shang, S., Wen, Y., & Cuzzocrea, A. (2020). Big data visualization and visual analytics of COVID-19 data. In: 24th International Conference on Information Visualisation, IV 2020, September 7-11, 2020 (pp. 415–420), Melbourne, Australia.

  • Li, C., Chen, C., Carlson, D. E., & Carin, L. (2016). Preconditioned stochastic gradient Langevin dynamics for deep neural networks. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016 (pp. 1788–1794), Phoenix, Arizona, USA.

  • Liu, B. (2020). Harnessing low-fidelity data to accelerate Bayesian optimization via posterior regularization. In: 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020, February 19-22, 2020 (pp. 140–146), Busan, Korea (South).

  • Ma, Y., Chen, T., & Fox, E. B. (2015). A complete recipe for stochastic gradient MCMC. In: Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015 (pp. 2917–2925), Montreal, Quebec, Canada.

  • Milinovich, G. J., Magalhães, R. J. S., & Hu, W. (2015). Role of big data in the early detection of Ebola and other emerging infectious diseases. The Lancet Global Health, 3(1), 20–21.

  • Morzfeld, M., Tong, X. T., & Marzouk, Y. M. (2019). Localization for MCMC: Sampling high-dimensional posterior distributions with local structure. J. Comput. Phys., 380, 1–28.

  • Nawaz, M. Z., & Arif, O. (2016). Robust kernel embedding of conditional and posterior distributions with applications. In: 15th IEEE International Conference on Machine Learning and Applications, ICMLA 2016, December 18-20, 2016 (pp. 39–44), Anaheim, CA, USA.

  • Ngiam, K. Y., & Khor, W. (2019). Big data and machine learning algorithms for health-care delivery. The Lancet Oncology, 20(5), 262–273.

  • Nguyen, D. T., Nguyen, S. P., Pham, U. H., & Nguyen, T. D. (2018). A calibration-based method in computing Bayesian posterior distributions with applications in stock market. In: Predictive Econometrics and Big Data. Studies in Computational Intelligence (vol. 753, pp. 182–191).

  • Ollier, V., Korso, M. N. E., Ferrari, A., Boyer, R., & Larzabal, P. (2018). Bayesian calibration using different prior distributions: An iterative maximum a posteriori approach for radio interferometers. In: 26th IEEE European Signal Processing Conference, EUSIPCO 2018, September 3-7, 2018 (pp. 2673–2677), Roma, Italy.

  • OpenAI. (2023). OpenAI Gym Library. https://www.gymlibrary.dev/index.html

  • Orgaz, G. B., Jung, J. J., & Camacho, D. (2016). Social big data: Recent achievements and new challenges. Information Fusion, 28, 45–59.

  • Pearce, T., Tsuchida, R., Zaki, M., Brintrup, A., & Neely, A. (2019). Expressive priors in Bayesian neural networks: Kernel combinations and periodic functions. In: Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI 2019, Tel Aviv, Israel, July 22-25, 2019. Proceedings of Machine Learning Research (vol. 115, pp. 134–144).

  • Pendharkar, P. C. (2017). Bayesian posterior misclassification error risk distributions for ensemble classifiers. Eng. Appl. Artif. Intell., 65, 484–492.

  • Ramamoorthi, R. V., Sriram, K., & Martin, R. (2015). On posterior concentration in misspecified models. Bayesian Analysis, 10(4).

  • Ruli, E., & Ventura, L. (2016). Higher-order Bayesian approximations for pseudo-posterior distributions. Commun. Stat. Simul. Comput., 45(8), 2863–2873.

  • Russom, P. (2011). Big data analytics. TDWI Best Practices Report, Fourth Quarter, 19(4), 1–34.

  • Shokrzade, A., Ramezani, M., Tab, F. A., & Mohammad, M. A. (2021). A novel extreme learning machine based kNN classification method for dealing with big data. Expert Syst. Appl., 183, 115293.

  • Snoek, J., Ovadia, Y., Fertig, E., Lakshminarayanan, B., Nowozin, S., Sculley, D., Dillon, J. V., Ren, J., & Nado, Z. (2019). Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift. In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019 (pp. 13969–13980), Vancouver, BC, Canada.

  • Springenberg, J. T., Klein, A., Falkner, S., & Hutter, F. (2016). Bayesian optimization with robust Bayesian neural networks. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016 (pp. 4134–4142), Barcelona, Spain.

  • Stuart, A. M., & Teckentrup, A. L. (2018). Posterior consistency for Gaussian process approximations of Bayesian posterior distributions. Math. Comput., 87(310), 721–753.

  • Tran, B., Rossi, S., Milios, D., & Filippone, M. (2022). All you need is a good functional prior for Bayesian deep learning. J. Mach. Learn. Res., 23(74), 1–56.

  • Tsai, C.-W., Lai, C.-F., Chao, H.-C., & Vasilakos, A. V. (2015). Big data analytics: A survey. Journal of Big Data, 2(1), 1–32.

  • Wang, X., Li, T., Cheng, Y., & Chen, C. L. P. (2022). Inference-based posteriori parameter distribution optimization. IEEE Trans. Cybern., 52(5), 3006–3017.

  • Wang, J., & Perez, L. (2017). The effectiveness of data augmentation in image classification using deep learning. Convolutional Neural Networks Vis. Recognit., 11(2017), 1–8.

  • Wenzel, F., Roth, K., Veeling, B. S., Swiatkowski, J., Tran, L., Mandt, S., Snoek, J., Salimans, T., Jenatton, R., & Nowozin, S. (2020). How good is the Bayes posterior in deep neural networks really? In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, July 13-18, 2020, Virtual Event. Proceedings of Machine Learning Research (vol. 119, pp. 10248–10259).

  • Xu, Y., Du, B., Zhang, L., Cerra, D., Pato, M., Carmona, E., Prasad, S., Yokoya, N., Hänsch, R., & Saux, B. L. (2019). Advanced multi-sensor optical remote sensing for urban land use and land cover classification: Outcome of the 2018 IEEE GRSS data fusion contest. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 12(6), 1709–1724.

  • Yasuma, F., Mitsunaga, T., Iso, D., & Nayar, S. K. (2010). Generalized assorted pixel camera: Postcapture control of resolution, dynamic range, and spectrum. IEEE Trans. Image Process., 19(9), 2241–2253.

  • Zhu, L., Yu, F. R., Wang, Y., Ning, B., & Tang, T. (2019). Big data analytics in intelligent transportation systems: A survey. IEEE Transactions on Intelligent Transportation Systems, 20(1), 383–398.

Acknowledgements

This research is supported by the ICSC National Research Centre for High Performance Computing, Big Data and Quantum Computing within the NextGenerationEU program (Project Code: PNRR CN00000013).

Funding

Not Applicable.

Author information

Contributions

Alfredo Cuzzocrea: Conceptualization, Methodology, Validation, Resources, Writing - original draft, Writing - review & editing. Alessandro Baldo: Validation, Writing - original draft, Writing - review & editing. Edoardo Fadda: Validation. All authors have reviewed the manuscript.

Corresponding author

Correspondence to Alfredo Cuzzocrea.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research was carried out in the context of the Excellence Chair in Big Data Management and Analytics at the University of Paris City, Paris, France.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cuzzocrea, A., Baldo, A. & Fadda, E. A Bayesian-neural-networks framework for scaling posterior distributions over different-curation datasets. J Intell Inf Syst (2023). https://doi.org/10.1007/s10844-023-00837-6

  • DOI: https://doi.org/10.1007/s10844-023-00837-6
