Abstract
Emerging recommender systems often adopt collaborative filtering techniques to improve recommendation accuracy. Existing collaborative filtering techniques are implemented with either the alternating least squares (ALS) algorithm or the gradient descent (GD) algorithm. However, neither algorithm scales well: ALS suffers from high computational complexity, while GD suffers from severe synchronization overhead and heavy data movement. To address these problems, we propose a Dataflow-based Collaborative Filtering (DCF) algorithm. Specifically, DCF exploits the fine-grained asynchrony of the dataflow model to minimize synchronization overhead, leverages the mini-batch technique to reduce computation and communication complexity, and uses dummy-edge and multicasting techniques to avoid the fine-grained overhead of dependency checking and to reduce data movement. Combining these techniques, DCF significantly improves the performance of collaborative filtering training. Our experiments on a cluster with one master node and ten slave nodes show that DCF achieves a 23× speedup over ALS on Spark and an 18× speedup over GD on GraphLab on public datasets.
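To make the mini-batch GD baseline concrete, the sketch below shows mini-batch stochastic gradient descent for matrix factorization, the training kernel that both GD-based collaborative filtering and DCF build on. This is an illustrative single-machine sketch, not the authors' DCF implementation; all function and parameter names (`minibatch_sgd_mf`, `lr`, `reg`, etc.) are our own choices, and it omits the dataflow scheduling, dummy edges, and multicasting that the paper contributes.

```python
import numpy as np

def minibatch_sgd_mf(ratings, n_users, n_items, rank=8, lr=0.01,
                     reg=0.05, batch_size=256, epochs=20, seed=0):
    """Mini-batch SGD for matrix factorization: approximate R ~= P @ Q.T.

    `ratings` is a list of (user, item, value) triples; returns the
    learned user factors P (n_users x rank) and item factors Q (n_items x rank).
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, rank))
    Q = 0.1 * rng.standard_normal((n_items, rank))
    data = np.asarray(ratings, dtype=float)
    for _ in range(epochs):
        rng.shuffle(data)  # shuffle ratings each epoch (along axis 0)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            u = batch[:, 0].astype(int)
            i = batch[:, 1].astype(int)
            r = batch[:, 2]
            # Prediction error for every rating in the batch.
            err = r - np.sum(P[u] * Q[i], axis=1)
            # L2-regularized gradients; one update per batch amortizes
            # the synchronization cost relative to per-rating SGD.
            grad_p = err[:, None] * Q[i] - reg * P[u]
            grad_q = err[:, None] * P[u] - reg * Q[i]
            # np.add.at accumulates correctly when a user or item
            # appears more than once in the same batch.
            np.add.at(P, u, lr * grad_p)
            np.add.at(Q, i, lr * grad_q)
    return P, Q
```

Larger batches reduce how often workers must exchange factor updates (the communication cost the abstract targets), at the price of staler gradients per step; the paper's dataflow execution further overlaps these updates asynchronously.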
Acknowledgements
We thank our anonymous reviewers for their feedback and suggestions. This work was partially sponsored by the National Basic Research 973 Program of China under Grant 2015CB352403 and the National Natural Science Foundation of China (NSFC) (61602301 & 61632017).
Cite this article
Ju, X., Chen, Q., Wang, Z. et al. DCF: A Dataflow-Based Collaborative Filtering Training Algorithm. Int J Parallel Prog 46, 686–698 (2018). https://doi.org/10.1007/s10766-017-0525-y