Challenges and opportunities: from big data to knowledge in AI 2.0



In this paper, we review recent theoretical and technological advances in artificial intelligence (AI) in big data settings. We conclude that integrating data-driven machine learning with human knowledge (common priors or implicit intuitions) can lead to explainable, robust, and general AI through three shifts: from shallow computation to deep neural reasoning; from purely data-driven models to data-driven models combined with structured logic rules; and from task-oriented (domain-specific) intelligence (adherence to explicit instructions) to artificial general intelligence in a general context (the capability to learn from experience). Motivated by these endeavors, the next generation of AI, namely AI 2.0, is positioned to reinvent computing itself, to transform big data into structured knowledge, and to enable better decision-making for our society.

Key words

Deep reasoning; Knowledge base population; Artificial general intelligence; Big data; Cross media







The authors would like to thank the following contributors from the College of Computer Science and Technology, Zhejiang University: Wei CHEN, Xi LI, Si-liang TANG, Zhou ZHAO, Yang YANG, and Zi-cheng LIAO. Special thanks to Zhong-fei (Mark) ZHANG and Ya-hong HAN.



Copyright information

© Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  • Yue-ting Zhuang (1)
  • Fei Wu (1)
  • Chun Chen (1)
  • Yun-he Pan (1)
  1. College of Computer Science and Technology, Zhejiang University, Hangzhou, China
