Tractable queries on big data via preprocessing with logarithmic-size output
- 172 Downloads
To provide a dichotomy between those queries that are feasible on big data after appropriate preprocessing and those for which preprocessing does not help, Fan et al. developed the \(\sqcap \)-tractability theory, which provides a formal foundation on the tractability of query classes in the context of big data. Inspired by some technologies used to deal with big data, we introduce a novel notion of \(\sqcap '\)-tractability in this paper. We place a restriction on preprocessing functions, which limits the functions to produce relatively short outputs, at most logarithmic-size of the inputs. We set a complexity class to denote the classes of Boolean queries that are \(\sqcap '\)-tractable and conclude that it is properly contained in that of \(\sqcap \)-tractable query classes, after discovering that a \(\sqcap \)-tractable query class is not \(\sqcap '\)-tractable. With an existing reduction, which does not allow re-factorizing data and query parts, we define complete query classes for the complexity class and give an efficient way to detect such query classes. We also investigate the query classes that can be made \(\sqcap '\)-tractable and prove that all PTIME classes of Boolean queries can be made \(\sqcap '\)-tractable.
KeywordsBig data Complexity class Preprocessing Query Tractability
The authors are very grateful to Professor Wenfei Fan and the anonymous reviewers for their invaluable suggestions. This work was supported by the National Natural Science Foundation of China (Grants Nos. 61370053, 61572003, and 61772035).
- 1.Cao Y, Fan W, Wo T, Yu W (2014) Bounded conjunctive queries. PVLDB 7(12):1231–1242Google Scholar
- 5.Fan W, Li J, Wang X, Wu Y (2012) Query preserving graph compression. In: Proceedings of the ACM 2012 international conference on management of data, pp 157–168Google Scholar
- 6.Fan W, Geerts F, Neven F (2013) Making queries tractable on big data with preprocessing: through the eyes of complexity theory. PVLDB 6(9):685–696Google Scholar
- 7.Fan W, Geerts F, Libkin L (2014) On scale independence for querying big data. In: Proceedings of the ACM 33rd symposium on principles of database systems, pp 51–62Google Scholar
- 8.Fan W, Wang X, Wu Y (2014) Querying big graphs within bounded resources. In: Proceedings of the ACM 2014 international conference on management of data, pp 301–312Google Scholar
- 16.Jung G, Gnanasambandam N, Mukherjee T (2012) Synchronous parallel processing of big-data analytics services to optimize performance in federated clouds. In: IEEE proceedings of the 5th international conference on cloud computing, pp 811–818Google Scholar
- 17.Kang U, Tong H, Sun J, Lin C, Faloutsos C (2011) Gbase: A scalable and general graph management system. In: ACM proceedings of the 17th international conference on knowledge discovery and data mining, pp 1091–1099Google Scholar
- 18.Marz N, Warren J (2015) Big data: principles and best practices of scalable realtime data systems. Manning Publications Co, GreenwichGoogle Scholar
- 21.National Research Council (2013) Frontiers in massive data analysis. The National Academies Press, WashingtonGoogle Scholar
- 22.Papadimitriou CH (2003) Computational complexity. In: Encyclopedia of computer science. Wiley, Chichester, pp 260–265Google Scholar
- 26.Vardi MY (1982) The complexity of relational query languages. In: Proceedings of the 14th Annual ACM Symposium on Theory of Computing, pp 137–146Google Scholar