A data reuse strategy based on deep learning for high dimensional data’s pattern and instance similarity

Abstract

Data reuse is an effective way to save storage space and improve data utilization in data management. Motivated by the successful application of deep learning to text mining, this paper proposes a deep-learning-based data reuse strategy that exploits the pattern similarity and instance similarity of high-dimensional data. Pattern similarity between data dimensions is analyzed by combining traditional feature analysis with a convolutional neural network, in order to optimize the selection of similar dimension pairs across high-dimensional data sets. For instance similarity, a semantic similarity model, IA-LSTM, is designed by incorporating an inner-attention mechanism; it builds association mappings among data entities by computing the similarity of short texts. Based on the pattern and instance similarity obtained by the proposed strategy, reusable data entities are discovered, and a column storage scheme is designed to improve data reuse efficiency.
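The abstract does not give the IA-LSTM equations, so the following is only a minimal sketch of the general idea of scoring two short texts with attention-weighted pooling followed by cosine similarity. It assumes the per-token hidden states have already been produced by some recurrent encoder. The function names, the scoring form `w . tanh(h_t)`, the toy hidden states, and the weight vector `w` are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, w):
    """Pool a sequence of hidden states into one vector.

    Each state h_t gets the score w . tanh(h_t); the scores are
    normalized with softmax and used as weights in a weighted sum.
    This scoring form is an assumption, not the paper's definition.
    """
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h))
              for h in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(a * h[d] for a, h in zip(alphas, hidden_states))
            for d in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical hidden states for two short texts (one vector per token).
# In a real model these would come from an LSTM encoder.
text_a = [[0.2, 0.1], [0.9, 0.4], [0.1, 0.0]]
text_b = [[0.8, 0.5], [0.1, 0.1]]
w = [1.0, 1.0]  # hypothetical attention parameters, learned in practice

vec_a = attention_pool(text_a, w)
vec_b = attention_pool(text_b, w)
similarity = cosine(vec_a, vec_b)
```

In a setting like the one the abstract describes, a similarity score of this kind could be thresholded to decide whether two entity descriptions should be mapped to each other. The threshold and the encoder itself would have to come from the full paper.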

Acknowledgements

The authors acknowledge the support of the National Natural Science Foundation of China (61373160); the research project “Research on Repetition Detection Technology of High Dimensional Data based on Deep Learning” of the Hebei Science and Technology Information Processing Laboratory; the research project “Research on recognition method of knowledge evolution path for sequential associated text based on graph neural network” of the Natural Science Foundation of Hebei Province; and the research project “Knowledge Graph Construction of Multi-Source Domain data based on Knowledge Representation learning” of the Education Department of Hebei Province.

Author information

Corresponding author

Correspondence to Tongrang Fan.

Cite this article

Wu, F., Lv, H., Fan, T. et al. A data reuse strategy based on deep learning for high dimensional data’s pattern and instance similarity. Computing (2021). https://doi.org/10.1007/s00607-021-00964-4

Keywords

  • Data reuse
  • High dimensional data
  • Deep learning
  • Pattern similarity
  • Instance similarity

Mathematics Subject Classification

  • 68P20
  • 68T07
  • 68U15
  • 68T09