An Active Workflow Method for Entity-Oriented Data Collection

  • Gaoyang Guo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11158)


In the era of big data, people deal with data all the time. Data collection is the first step and the foundation for many downstream applications. We observe that data collection is often entity-oriented, i.e., people usually collect data related to a specific entity. In most cases, people achieve entity-oriented data collection by manually querying and filtering results from search engines or news applications. However, these manual methods are neither efficient nor effective. In this paper, we consider designing reasonable process rules and integrating artificial intelligence algorithms to help people collect the target data related to a specific entity efficiently and effectively. Concretely, we propose an active workflow method to achieve this goal. The workflow is composed of four processes: task modeling for data collection, Internet data collection, crowdsourcing data collection, and multi-source data aggregation.
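To make the four-process structure concrete, the following is a minimal Python sketch of how such a workflow could be chained. It is an illustration only, not the paper's implementation: every name here (`CollectionTask`, `model_task`, `collect_from_internet`, `collect_from_crowd`, `aggregate`, `run_workflow`) is hypothetical, and each stage is a trivial stand-in (keyword matching, blank filtering, de-duplication) for the rules and algorithms the paper proposes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CollectionTask:
    """Hypothetical task model: the target entity plus search keywords."""
    entity: str
    keywords: List[str] = field(default_factory=list)


def model_task(entity: str) -> CollectionTask:
    # Process 1: task modeling (stub: the lowercased entity name is the only keyword).
    return CollectionTask(entity=entity, keywords=[entity.lower()])


def collect_from_internet(task: CollectionTask, corpus: List[str]) -> List[str]:
    # Process 2: Internet data collection (stub: keyword match on an in-memory corpus).
    return [doc for doc in corpus if any(k in doc.lower() for k in task.keywords)]


def collect_from_crowd(task: CollectionTask, answers: List[str]) -> List[str]:
    # Process 3: crowdsourcing data collection (stub: keep non-empty worker answers).
    return [a.strip() for a in answers if a.strip()]


def aggregate(*sources: List[str]) -> List[str]:
    # Process 4: multi-source aggregation (stub: order-preserving de-duplication).
    seen, merged = set(), []
    for source in sources:
        for item in source:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


def run_workflow(entity: str, corpus: List[str], answers: List[str]) -> List[str]:
    # Chain the four processes in the order listed in the abstract.
    task = model_task(entity)
    web = collect_from_internet(task, corpus)
    crowd = collect_from_crowd(task, answers)
    return aggregate(web, crowd)
```

Calling `run_workflow("Acme", corpus, answers)` with a list of documents and a list of crowd answers returns a single merged list. In a real system each stub would be replaced by the corresponding component, for example a crawler for the Internet stage and a crowdsourcing platform for the crowd stage.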


Keywords: Data collection · Entity-oriented · Workflow



This work was supported in part by the National Key Research and Development Program of China (No. 2017YFC0820402), the Intelligent Manufacturing Comprehensive Standardization and New Pattern Application Project of Ministry of Industry and Information Technology (Experimental validation of key technical standards for trusted services in industrial Internet), and the National Natural Science Foundation of China (No. 61373023).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Software, Tsinghua University, Beijing, China
