A data reuse strategy based on deep learning for high dimensional data’s pattern and instance similarity


Data reuse is an effective way to save storage space and improve data utilization in data management. Given the successful application of deep learning to text mining, this paper proposes a deep-learning-based data reuse strategy that exploits both the pattern similarity and the instance similarity of high-dimensional data. For pattern similarity, traditional feature analysis is combined with a convolutional neural network (CNN) model to analyze the similarity between data dimensions, so as to optimize the selection of similar dimension pairs across high-dimensional data sets. For instance similarity, a semantic similarity model, IA-LSTM, is designed by incorporating an inner-attention mechanism; it computes the similarity of short texts in order to build association mappings between data entities. Using the pattern and instance similarity together, the strategy discovers reusable data entities, and a column-storage scheme is designed to improve data reuse efficiency.
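The abstract only names the IA-LSTM model, so the following is a minimal sketch of the general idea of inner-attention pooling followed by similarity-based entity matching, not the authors' implementation. It assumes that per-token hidden states have already been produced by an LSTM encoder (not shown). The attention is a simplified, parameter-free variant that uses the mean hidden state as the query, whereas a trained model would learn projection weights. The function names, the use of cosine similarity, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def inner_attention_pool(hidden: Sequence[Vector]) -> Vector:
    """Pool per-token hidden states into a single sentence vector.

    Simplified, parameter-free inner attention (an assumption): the sentence's
    own mean hidden state acts as the query, so tokens that agree with the
    overall sentence representation receive larger weights.
    """
    dim = len(hidden[0])
    mean = [sum(h[i] for h in hidden) / len(hidden) for i in range(dim)]
    weights = softmax([dot(h, mean) for h in hidden])
    return [sum(w * h[i] for w, h in zip(weights, hidden)) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def match_entities(
    left: Dict[str, List[Vector]],
    right: Dict[str, List[Vector]],
    threshold: float = 0.9,  # hypothetical cut-off for a "reusable" match
) -> Dict[str, str]:
    """Map each left entity to its most similar right entity above threshold."""
    pooled_right = {k: inner_attention_pool(v) for k, v in right.items()}
    if not pooled_right:
        return {}
    mapping: Dict[str, str] = {}
    for name, states in left.items():
        vec = inner_attention_pool(states)
        best, score = max(
            ((k, cosine(vec, v)) for k, v in pooled_right.items()),
            key=lambda kv: kv[1],
        )
        if score >= threshold:
            mapping[name] = best
    return mapping


if __name__ == "__main__":
    # Toy 2-d "hidden states" standing in for LSTM outputs of short texts.
    left = {"a1": [[1.0, 0.0], [0.8, 0.2]], "a2": [[0.0, 1.0], [0.1, 0.9]]}
    right = {"b1": [[0.9, 0.1]], "b2": [[0.0, 1.0]]}
    print(match_entities(left, right))  # {'a1': 'b1', 'a2': 'b2'}
```

Pooling each short text to one vector before comparing keeps the matching step independent of text length; the resulting mapping is the kind of entity association that the abstract describes feeding into reuse decisions.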

The authors acknowledge the National Natural Science Foundation of China (61373160); the research project “Research on Repetition Detection Technology of High Dimensional Data based on Deep Learning” of the Hebei Science and Technology Information Processing Laboratory; the research project “Research on recognition method of knowledge evolution path for sequential associated text based on graph neural network” of the Natural Science Foundation of Hebei Province; and the research project “Knowledge Graph Construction of Multi-Source Domain data based on Knowledge Representation learning” of the Education Department of Hebei Province.

Wu, F., Lv, H., Fan, T. et al. A data reuse strategy based on deep learning for high dimensional data's pattern and instance similarity. Computing (2021).

Keywords
  • Data reuse
  • High dimensional data
  • Deep learning
  • Pattern similarity
  • Instance similarity

Mathematics Subject Classification
  • 68P20
  • 68T07
  • 68U15
  • 68T09