Scaling-Up and Speeding-Up Video Analytics Inside Database Engine
Most conventional video processing platforms treat the database merely as a storage engine rather than a computation engine, which causes inefficient data access and massive amounts of data movement. To provide a convergent platform, we push video processing down into the database engine using User Defined Functions (UDFs).
However, existing UDF technology suffers from two major limitations. First, a UDF cannot take a set of tuples as input or produce one as output. This restricts its modeling capability for complex applications, and the resulting tuple-wise pipelined execution is often inefficient and rules out data-parallel computation inside the function. Second, UDFs coded in a non-SQL language such as C either involve hard-to-follow DBMS-internal system calls for interacting with the query executor, or sacrifice performance by converting input objects to strings.
To solve these problems, we realized the notion of a Relation Valued Function (RVF) in an industry-scale database engine. With tuple-set input and output, an RVF offers greater modeling power, better efficiency, and the potential for in-function data-parallel computation. To let RVF execution interact with the query engine efficiently, we introduced the notion of RVF invocation patterns and, based on them, developed RVF containers for focused system support.
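The contrast between a tuple-wise UDF and a relation-valued function can be illustrated outside the database with a minimal Python sketch. It is not the paper's Postgres implementation: the row schema `(frame_id, feature_vector)`, the function names, and the linear decision rule are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical row schema: (frame_id, feature_vector).
Row = Tuple[int, List[float]]


def classify_tuple(features: List[float], weights: List[float], bias: float) -> int:
    """Tuple-wise UDF style: one feature vector in, one label out."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else -1


def classify_relation(rows: List[Row], weights: List[float], bias: float) -> List[Tuple[int, int]]:
    """RVF style: a whole relation in, a whole relation out.

    Because the function sees the full tuple set, it is free to batch,
    reorder, or partition the work internally.
    """
    return [(frame_id, classify_tuple(features, weights, bias))
            for frame_id, features in rows]


# Illustrative data only; a linear decision function stands in for a trained SVM.
frames: List[Row] = [(1, [2.0, 1.0]), (2, [-2.0, 0.1]), (3, [0.0, 4.0])]
weights, bias = [1.0, -0.5], 0.0
labels = classify_relation(frames, weights, bias)
print(labels)
```

In the tuple-wise style the query executor would call `classify_tuple` once per row; in the relation-valued style it hands over the input set once and receives the labeled set back.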
We have prototyped these mechanisms on the Postgres database engine, and tested their power with Support Vector Machine (SVM) classification and learning, the most widely used analytics model for video understanding. Our experience reveals the value of the proposed approach in multiple dimensions: modeling capability, efficiency, in-function data-parallelism with multi-core CPUs, as well as usability; all these are fundamental to converging data-intensive analytics and data management.
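The idea of in-function data parallelism can be sketched as a partition, score, and merge structure inside a single function call. This is a hedged illustration rather than the prototype's code: the names `score_partition` and `classify_parallel`, the row schema, and the linear scoring rule are assumptions. Threads are used here only to keep the sketch portable; in CPython, CPU-bound work would need a process pool (for example `concurrent.futures.ProcessPoolExecutor`) to actually use multiple cores.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[int, List[float]]  # hypothetical schema: (frame_id, feature_vector)


def score_partition(rows: List[Row], weights: List[float], bias: float) -> List[Tuple[int, int]]:
    """Label every row of one partition with a linear decision function."""
    return [
        (fid, 1 if sum(w * x for w, x in zip(weights, feats)) + bias >= 0 else -1)
        for fid, feats in rows
    ]


def classify_parallel(rows: List[Row], weights: List[float], bias: float,
                      workers: int = 4) -> List[Tuple[int, int]]:
    """Split the input relation into chunks, score them concurrently, merge in order."""
    size = max(1, -(-len(rows) // workers))  # ceiling division, at least 1
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order of the input partitions.
        parts = pool.map(lambda p: score_partition(p, weights, bias), partitions)
        return [pair for part in parts for pair in part]


# Illustrative data only.
frames: List[Row] = [(i, [float(i) - 5.0, 1.0]) for i in range(10)]
weights, bias = [1.0, 0.0], 0.0
parallel_labels = classify_parallel(frames, weights, bias)
sequential_labels = score_partition(frames, weights, bias)
```

The partitioned result matches the sequential one row for row, which is what makes this kind of split safe for per-tuple classification; learning steps that need cross-partition state would require a different merge strategy.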