Extend UDF Technology for Integrated Analytics

Chen, Qiming; Hsu, Meichun; Liu, Rui

doi:10.1007/978-3-642-03730-6_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5691)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

Running analytics computation inside database engines through UDFs (User Defined Functions) has been extensively investigated, but has not yet become a scalable approach because of two major limitations. First, existing UDFs are not relation-in, relation-out, and schema-aware; as a result they cannot model complex applications and cannot be composed with relational operators in a SQL query. Second, it is difficult to program UDFs that interact efficiently with query processing, since doing so requires hard-to-follow system knowledge beyond analytics expertise. These limitations keep most users from using UDFs for their analytics applications.

To solve these problems, we extend UDF technology in both the semantic and system dimensions. We first expand our investigation of Relation Valued Functions (RVFs), with the goal of having RVF executions tightly integrated with query processing while freeing RVF developers from DBMS internal details. We separate an RVF into two parts: the RVF-shell, which contains the system utilities, and the user-function, which contains application logic only. We provide focused system support based on the notion of invocation pattern, and have developed a mechanism for generating an RVF-shell automatically from the schemas of its argument and return relations, the well-understood invocation pattern, and the common data conversion protocol. A complete RVF is made by plugging the user-function into the RVF-shell.
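The split between shell and user-function can be illustrated with a minimal Python sketch. This is not the authors' implementation, which targets the Postgres engine; it only mimics the idea of generating a wrapper from argument and return schemas. All names here (`make_rvf_shell`, `total_by_region`, the schema representation) are hypothetical, and the invocation-pattern handling described in the abstract is left out.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a schema is an ordered list of (column name, Python type) pairs.
Schema = Sequence[Tuple[str, type]]
Row = Tuple
Relation = List[Dict[str, object]]


def make_rvf_shell(arg_schemas: Sequence[Schema],
                   return_schema: Schema,
                   user_function: Callable[..., Relation]) -> Callable[..., List[Row]]:
    """Generate a hypothetical 'shell' around user_function from the schemas alone."""

    def to_relation(rows: Sequence[Row], schema: Schema) -> Relation:
        # Convert engine-style tuples into schema-aware records.
        relation = []
        for row in rows:
            if len(row) != len(schema):
                raise ValueError(f"expected {len(schema)} columns, got {len(row)}")
            relation.append({name: typ(value)
                             for (name, typ), value in zip(schema, row)})
        return relation

    def from_relation(relation: Relation, schema: Schema) -> List[Row]:
        # Convert the user function's records back to tuples in schema order.
        return [tuple(typ(rec[name]) for name, typ in schema) for rec in relation]

    def shell(*arg_rows: Sequence[Row]) -> List[Row]:
        if len(arg_rows) != len(arg_schemas):
            raise TypeError(f"expected {len(arg_schemas)} argument relations")
        args = [to_relation(rows, schema)
                for rows, schema in zip(arg_rows, arg_schemas)]
        return from_relation(user_function(*args), return_schema)

    return shell


def total_by_region(sales: Relation) -> Relation:
    """User-function: application logic only, no conversion or schema checks."""
    totals: Dict[str, float] = {}
    for rec in sales:
        totals[rec["region"]] = totals.get(rec["region"], 0.0) + rec["amount"]
    return [{"region": r, "total": t} for r, t in sorted(totals.items())]


sales_schema = [("region", str), ("amount", float)]
result_schema = [("region", str), ("total", float)]

# Plug the user-function into the generated shell to obtain the complete function.
rvf = make_rvf_shell([sales_schema], result_schema, total_by_region)
rows = rvf([("east", 10), ("west", 5.5), ("east", 2)])
print(rows)
```

In this sketch the user-function sees only named, typed records, while the conversion and arity checks live entirely in the generated wrapper, which is the division of labor the abstract describes.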

We have prototyped the proposed approach on the open-source database engine Postgres. Our experience reveals its advantages in tightly integrating UDFs with the query executor while relieving analytics users of system details, a fundamental data engineering requirement for making UDF technology practically usable for converging data-intensive analytics and data management.

Author information

Authors and Affiliations

HP Labs, Palo Alto, California, USA
Qiming Chen, Meichun Hsu & Rui Liu
HP Labs, Beijing, China
Qiming Chen, Meichun Hsu & Rui Liu

Editor information

Editors and Affiliations

Department of Computer Science, Aalborg University, Selma Lagerlöfsvej 300, 9220, Aalborg Ø, Denmark
Torben Bach Pedersen
IBM India Research Lab, Plot No. 4, Block C, Institutional Area, Vasant Kunj, 110 070, New Delhi, India
Mukesh K. Mohania
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, 1040, Wien, Austria
A Min Tjoa

About this paper

Cite this paper

Chen, Q., Hsu, M., Liu, R. (2009). Extend UDF Technology for Integrated Analytics. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2009. Lecture Notes in Computer Science, vol 5691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03730-6_21

DOI: https://doi.org/10.1007/978-3-642-03730-6_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03729-0
Online ISBN: 978-3-642-03730-6
eBook Packages: Computer Science, Computer Science (R0)
