Bridging the Gap Between Research and Production with CODE

Part of the Lecture Notes in Computer Science book series (LNAI, volume 11441)


Despite ever-increasing enthusiasm from industry, artificial intelligence, or machine learning, remains a much-hyped area where results tend to be exaggerated or misunderstood. Many novel models proposed in research papers are never deployed to production. The goal of this paper is to highlight four important aspects that are often neglected in real-world machine learning projects, namely Communication, Objectives, Deliverables, and Evaluations (CODE). By carefully considering these aspects, we can avoid common pitfalls and carry out a smoother technology transfer to real-world applications. We draw on our prior experience and mistakes in building a real-world online advertising platform powered by machine learning technology, aiming to provide general guidelines for translating ML research results into successful industry projects.


  • Machine learning
  • Project management
  • Online advertising
  • Real-time bidding

  • DOI: 10.1007/978-3-030-16142-2_22
  • Chapter length: 12 pages


  3. Adding more languages will actually inflate the average accuracy, because most other languages (e.g. Chinese, Korean) can be identified easily from their characters alone and have an accuracy close to 1.
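
The inflation effect described in note 3 can be illustrated with a small arithmetic sketch. It assumes the reported figure is an unweighted average of per-language accuracies, which the note does not state explicitly. The language codes and accuracy values below are hypothetical, invented only for illustration, and are not results from the paper.

```python
from statistics import mean


def average_accuracy(per_language_accuracy):
    """Unweighted average of per-language accuracies (an assumed metric)."""
    return mean(per_language_accuracy.values())


# Hypothetical accuracies, made up for this illustration only.
hard_languages = {"en": 0.90, "es": 0.88, "pt": 0.86}
# Languages with a distinctive script are assumed to score close to 1.
easy_languages = {"zh": 0.99, "ko": 0.99}

before = average_accuracy(hard_languages)
after = average_accuracy({**hard_languages, **easy_languages})

print(f"average over hard languages only: {before:.3f}")
print(f"average after adding easy ones:   {after:.3f}")
```

The average rises even though no individual language improved, which is why an averaged score across a growing set of languages can overstate progress on the difficult ones.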



The first author is supported by scholarships from “The 100th Anniversary Chulalongkorn University Fund for Doctoral Scholarship” and “The 90th Anniversary Chulalongkorn University Fund (Ratchadaphiseksomphot Endowment Fund)”. We would like to thank Assoc. Prof. Peraphon Sophatsathit and the anonymous reviewers for their careful reading and insightful suggestions.

Corresponding author

Correspondence to Yiping Jin.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Jin, Y., Wanvarie, D., Le, P.T.V. (2019). Bridging the Gap Between Research and Production with CODE. In: Yang, Q., Zhou, Z.-H., Gong, Z., Zhang, M.-L., Huang, S.-J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11441. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16141-5

  • Online ISBN: 978-3-030-16142-2

  • eBook Packages: Computer Science, Computer Science (R0)