Encyclopedia of Database Systems

Living Edition
| Editors: Ling Liu, M. Tamer Özsu

Active, Real-Time, and Intellective Data Warehousing

  • Mukesh Mohania
  • Ullas Nambiar
  • Hoang Tam Vo
  • Michael Schrefl
  • Millist Vincent
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7993-3_8-3

Synonyms

Definition

Active data warehousing is the technical ability to capture transactions when they change and integrate them into the warehouse, along with maintaining batch or scheduled cycle refreshes. An active data warehouse offers the possibility of automating routine tasks and decisions. The active data warehouse exports decisions automatically to the online transaction processing (OLTP) systems.

Real-time data warehousing describes a system that reflects the state of the source systems in real time. If a query is run against the real-time data warehouse to understand a particular facet about the business or entity described by the warehouse, the answer reflects the fully up-to-date state of that entity. Most data warehouses have data that are highly latent and thus reflect the business at a point in the past. In contrast, a real-time data warehouse has low latency data and provides current (or real-time) data.

Simply put, a real-time data warehouse can be built using an active data warehouse with a very low latency constraint added to it. An alternate view is to consider active data warehousing as a design methodology suited to tactical decision-making based on very current data, while (near) real-time data warehousing is a collection of technologies that refresh a data warehouse frequently. In particular, a (near) real-time data warehouse is one that acquires, cleanses, transforms, stores, and disseminates information in minutes or seconds after the data has arrived at the source systems. An active data warehouse, by contrast, operates in a non-real-time response mode with one or more OLTP systems.

Intellective data warehousing represents the next generation of systems that incorporate three key cognitive aspects, namely, understanding, contextual awareness, and continuous learning, to provide a complete data-to-knowledge pipeline without notable human intervention [20]. Intellective data warehouse systems leverage recent advances in machine learning and information retrieval to simplify the ingestion and characterization of data and to collect and manage the critical metadata needed for integration, analytical query processing, result summarization, provenance, explanation, and facet analysis.

Historical Background

A data warehouse is a decision support database that is periodically updated by extracting, transforming, and loading operational data from several OLTP databases. In the data warehouse, OLTP data is arranged using the (multi)dimensional data modeling approach (see [1] for a basic approach and [2] for details on translating an OLTP data model into a dimensional model), which classifies data into measures and dimensions. Several multidimensional data models have been proposed [3, 4, 5, 6]; an in-depth comparison is provided by Pedersen and Jensen in [5].

The basic unit of interest in a data warehouse is a measure or fact (e.g., sales), which represents countable, semi-summable, or summable information concerning a business process. An instance of a measure is called a measure value. A measure can be analyzed from different perspectives, which are called the dimensions (e.g., location, product, time) of the data warehouse [7]. A dimension consists of a set of dimension levels (e.g., time: Day, Week, Month, Quarter, Season, Year, ALLTimes), which are organized in multiple hierarchies or dimension paths [6] (e.g., Time[Day] → Time[Month] → Time[Quarter] → Time[Year] → Time[ALLTimes]; Time[Day] → Time[Week] → Time[Season] → Time[ALLTimes]). The hierarchies of a dimension form a lattice having at least one top dimension level and one bottom dimension level.

The measures that can be analyzed by the same set of dimensions are described by a base cube or fact table. A base cube uses level instances of the lowest dimension levels of each of its dimensions to identify a measure value. The relationship between a set of measure values and the set of identifying level instances is called a cell. Loading data into the data warehouse means that new cells are added to base cubes and new level instances are added to dimension levels. If a dimension D is related to a measure m by means of a base cube, then the hierarchies of D can be used to aggregate the measure values of m using operators like SUM, COUNT, or AVG. Aggregating measure values along the hierarchies of different dimensions (i.e., rollup) creates a multidimensional view on the data, which is known as a data cube or cube. Deaggregating the measures of a cube to a lower dimension level (i.e., drilldown) creates a more detailed cube. Selecting the subset of a cube’s cells that satisfy a certain selection condition (i.e., slicing) also creates a more detailed cube.
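To make the cube terminology concrete, the following minimal sketch (in Python, with a hypothetical three-dimensional sales cube and invented level instances) represents a base cube as a mapping from the lowest-level instances of each dimension to a measure value, performs a SUM rollup along the time hierarchy, and slices on the store dimension.

```python
from collections import defaultdict

# Hypothetical base cube: each cell maps level instances of the lowest dimension
# levels (Day, Store, Product) to a measure value (sales quantity).
base_cube = {
    ("2024-01-01", "Linz-01", "ColaZero"): 120,
    ("2024-01-02", "Linz-01", "ColaZero"): 95,
    ("2024-01-01", "Wels-02", "ColaZero"): 60,
}

def day_to_month(day):
    # Hierarchy step Time[Day] -> Time[Month]: "2024-01-01" -> "2024-01"
    return day[:7]

def rollup_time_to_month(cube):
    """Rollup: aggregate the measure along the time hierarchy with SUM,
    producing a coarser cube keyed by (Month, Store, Product)."""
    out = defaultdict(int)
    for (day, store, product), qty in cube.items():
        out[(day_to_month(day), store, product)] += qty
    return dict(out)

def slice_by_store(cube, store):
    """Slicing: keep only the cells whose Store coordinate satisfies the condition."""
    return {coords: qty for coords, qty in cube.items() if coords[1] == store}

monthly = rollup_time_to_month(base_cube)
print(monthly)                          # {('2024-01', 'Linz-01', 'ColaZero'): 215, ...}
print(slice_by_store(monthly, "Linz-01"))
```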

The data warehouses are used by analysts to find solutions for decision tasks by using OLAP (online analytical processing) systems [7]. Decision tasks can be split into three categories, viz., nonroutine, semi-routine, and routine. Nonroutine tasks occur infrequently and/or do not have generally accepted decision criteria. For example, strategic business decisions such as introducing a new brand or changing an existing business policy are nonroutine tasks. Routine tasks, on the other hand, are well-structured problems for which generally accepted procedures exist, and they occur frequently and at predictable intervals. Examples can be found in the areas of product assortment (change price, withdraw product, etc.), in customer relationship management (grant loyalty discounts, etc.), and in many administrative areas (accept/reject a paper based on review scores). Semi-routine tasks are tasks that require a nonroutine solution, e.g., a paper with contradictory ratings must be discussed by the program committee. Since most tasks are likely to be routine, it is logical to automate the processing of such tasks to reduce the delay in decision-making.

Active data warehouses [8] were designed to enable data warehouses to support automatic decision-making when faced with routine decision tasks and routinizable elements of semi-routine decision tasks. The active data warehouse design extends the technology behind active database systems. Active database technology transforms passive database systems into reactive systems that respond to database and external events through the use of rule processing features [9, 10]. Limited versions of active rules exist in commercial database products [11, 12].

Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available. Traditionally, data warehouses were regarded as an environment for analyzing historical data, either to understand what has happened or simply to log the changes as they happened. However, of late, businesses want to use them to predict the future, e.g., to predict customers likely to churn, and thereby seek better control of the business. Nevertheless, until recently, it was not practical to have zero-latency data warehouses – the process of extracting data had too much of an impact on the source systems concerned, and the various steps needed to cleanse and transform the data required multiple temporary tables and took several hours to run. However, the increased visibility of (the value of) warehouse data, and its take-up by a wider audience within the organization, has led to several product developments by IBM [13], Oracle [14], and other vendors that now make real-time data warehousing possible.

Right-time data warehousing is a more sophisticated approach that makes new data quickly available for data warehouses while retaining the insert speeds of bulk loading [24]. The essence of this approach is using a main-memory-based catalyst to provide intermediate storage (“memory tables”) for data warehouse tables; this data is eventually loaded into its final target (the physical data warehouse tables) at the right time. This approach allows for fast insertions of new data from the source systems. A policy can define when to materialize the new data held in the main-memory catalyst into the data warehouse; the data can also be queried on demand while it is still held in the memory tables, so that end users can immediately access changes in the data sources.
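A minimal sketch of the right-time idea, assuming a hypothetical MemoryCatalyst class rather than the actual RiTE [24] interface: inserts land cheaply in a memory table, a simple policy decides when to flush them in bulk to the physical warehouse table, and on-demand queries see the union of both, so changes become visible immediately.

```python
import time

class MemoryCatalyst:
    """Hypothetical main-memory intermediate store for one warehouse table."""

    def __init__(self, flush_every_n_rows=1000, flush_every_seconds=60):
        self.memory_rows = []            # fast inserts land here first
        self.warehouse_rows = []         # stands in for the physical table
        self.flush_every_n_rows = flush_every_n_rows
        self.flush_every_seconds = flush_every_seconds
        self.last_flush = time.time()

    def insert(self, row):
        self.memory_rows.append(row)     # near bulk-load insert speed
        if self._policy_says_flush():
            self.flush()

    def _policy_says_flush(self):
        return (len(self.memory_rows) >= self.flush_every_n_rows
                or time.time() - self.last_flush >= self.flush_every_seconds)

    def flush(self):
        # Materialize the memory table into the warehouse table in one bulk step.
        self.warehouse_rows.extend(self.memory_rows)
        self.memory_rows.clear()
        self.last_flush = time.time()

    def query(self, predicate):
        # On-demand queries see both flushed and not-yet-flushed rows.
        return [r for r in self.warehouse_rows + self.memory_rows if predicate(r)]
```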

Foundations

Enabling Active Data Warehousing

The two example scenarios below describe typical situations in which active rules can be used to automate decision-making:

Scenario 1: Reducing the price of an article. Twenty days after a soft drink has been launched on a market, analysts compare the quantities sold during this period with a standardized indicator. This indicator requires that the total quantity sold during the 20-day period not drop below a threshold of 10,000 items. If the analyzed sales figures are below this threshold, the price of the newly launched soft drink will be reduced by 15%.

Scenario 2: Withdrawing articles from a market. At the end of every quarter, high-priced soft drinks sold in Upper Austrian stores are analyzed. If the sales figures of a high-priced soft drink have dropped continuously, the article is withdrawn from the Upper Austrian market. Analysts inspect sales figures at different granularities of the time dimension and at different granularities of the location dimension. Trend, average, and variance measures are used as indicators in decision-making.

Rules that mimic the analytical work of a business analyst are called analysis rules [8]. The components of analysis rules constitute the knowledge model of an active data warehouse (and also of a real-time data warehouse). The knowledge model determines what an analyst must consider when specifying an active rule to automate a routine decision task.

An analysis rule consists of (i) the primary dimension level and (ii) the primary condition, which identify the objects for which decision-making is necessary; (iii) the event, which triggers rule processing; (iv) the analysis graph, which specifies the cubes for analysis; (v) the decision steps, which represent the conditions under which a decision can be made; and (vi) the action, which represents the rule’s decision task. Below is a brief description of the components of an analysis rule. A detailed discussion is given in [8].

Event: Events are used to specify the time points at which analysis rules should be carried out. Active data warehouses provide three kinds of events: (i) OLTP method events, (ii) relative temporal events, and (iii) calendar events. OLTP method events describe basic events in the data warehouse’s sources. Relative temporal events are used to define a temporal distance between such a basic event and carrying out an analysis rule. Calendar events represent fixed points in time at which an analysis rule may be carried out. Structurally, every event instance is characterized by an occurrence time and by an event identifier. In its event part, an analysis rule refers to a calendar event or to a relative temporal event.

An OLTP method event describes an event in the data warehouse’s source systems that is of interest to analysis rules in the active data warehouse. Besides occurrence time and event identifier, the attributes of an OLTP method event are a reference to the dimension level for which the OLTP method event occurred and the parameters of the method invocation. To make OLTP method events available in data warehouses, a data warehouse designer has to define the schema of OLTP method events and extend the data warehouse’s extract/transform/load mechanism. Since instances of OLTP method events are loaded some time after their occurrence, analysis rules cannot be triggered directly by OLTP method events.

Temporal events determine the time points at which decision-making has to be initiated. Scenario 1 uses the relative temporal event “20 days after launch” while scenario 2 uses the periodic temporal event “end of quarter.” The conditions for decision-making are based on indicators, which have been established in manual decision-making. Each condition refers to a multidimensional cube, and therefore “analyzing” means to evaluate the condition on this cube. Scenario 1 uses a quantity-based indicator, whereas scenario 2 uses value-based indicators for decision-making. The decision whether to carry out the rule’s action depends on the result of evaluating the conditions. The action of scenario 1 is to reduce the price of an article, whereas the action of scenario 2 is to withdraw an article from a market.
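The three kinds of events can be modeled as simple record types. The sketch below uses hypothetical field names that follow the structural description above (occurrence time, event identifier, and, for OLTP method events, the affected dimension-level instance and method parameters); it is not the schema prescribed in [8].

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class CalendarEvent:                        # fixed point in time, e.g., "end of quarter"
    event_id: str
    occurrence_time: datetime

@dataclass
class OLTPMethodEvent:                      # basic event loaded from a source system
    event_id: str
    occurrence_time: datetime
    dimension_level_instance: str           # e.g., the article the method was invoked on
    parameters: dict = field(default_factory=dict)

@dataclass
class RelativeTemporalEvent:                # temporal distance from a basic event
    event_id: str
    base_event: OLTPMethodEvent
    offset: timedelta

    @property
    def occurrence_time(self) -> datetime:
        return self.base_event.occurrence_time + self.offset

# Scenario 1: trigger the analysis rule 20 days after the article was launched.
launch = OLTPMethodEvent("e1", datetime(2024, 3, 1), "ColaZero", {"method": "launch"})
trigger = RelativeTemporalEvent("e2", launch, timedelta(days=20))
print(trigger.occurrence_time)              # 2024-03-21 00:00:00
```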

Primary condition: Several analysis rules may share the same OLTP method as their action. These rules may be carried out at different time points and may utilize different multidimensional analyses. Thus, a certain analysis rule usually analyzes only a subset of the level instances that belong to the rule’s primary dimension level. The primary condition is used to determine for a level instance of the primary dimension level whether multidimensional analysis should be carried out by the analysis rule. The primary condition is specified as a Boolean expression, which refers to the describing attributes of the primary dimension level. If omitted, the primary condition evaluates to TRUE.

Action: The purpose of an analysis rule is to automate decision-making for objects that are available in OLTP systems and in the data warehouse. A decision means to invoke (or not to invoke) a method on a certain object in an OLTP system. In its action part, an analysis rule may refer to a single OLTP method of the primary dimension level, which represents a transaction in an OLTP system. These methods represent the decision space of an active data warehouse. To make the transactional behavior of an OLTP object type available in the active data warehouse, the data warehouse designer must provide (i) the specifications of the OLTP object type’s methods together with required parameters, (ii) the preconditions that must be satisfied before the OLTP method can be invoked in the OLTP system, and (iii) a conflict resolution mechanism, which resolves contradictory decisions of different analysis rules. Since different analysis rules can make a decision for the same level instance of the rules’ primary dimension level during the same active data warehouse cycle, a decision conflict may occur. Such conflicts are known as inter-rule conflicts. To detect inter-rule conflicts, a conflict table covering the OLTP methods of the decision space is used. The tuples of the conflict table have the form <m1, m2, m3>, where m1 and m2 identify two conflicting methods and m3 specifies the conflict resolution method that will finally be executed in the OLTP systems. If a conflict cannot be resolved automatically, it has to be reported to analysts for manual conflict resolution.
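A conflict table of this kind can be sketched as a lookup from a pair of conflicting OLTP methods to the resolution method that is finally executed; the method names below are hypothetical.

```python
# Tuples <m1, m2, m3>: if rules decide both m1 and m2 for the same level instance,
# m3 is the method that is actually executed in the OLTP system.
conflict_table = {
    frozenset({"reduce_price_15pct", "withdraw_from_market"}): "withdraw_from_market",
}

def resolve(decisions):
    """Resolve the decisions made for one level instance in one warehouse cycle."""
    if len(decisions) == 1:
        return decisions[0]
    resolution = conflict_table.get(frozenset(decisions))
    if resolution is None:
        # Unresolvable conflicts are reported to analysts for manual resolution.
        raise ValueError(f"Unresolvable inter-rule conflict: {decisions}")
    return resolution

print(resolve(["reduce_price_15pct", "withdraw_from_market"]))  # withdraw_from_market
```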

Analysis graph: When an analyst queries the data warehouse to make a decision, he or she follows an incremental top-down approach in creating and analyzing cubes. Analysis rules follow the same approach. To automate decision-making, an analysis rule must “know” the cubes that are needed for multidimensional analysis. These cubes constitute the analysis graph, which is specified once by the analyst. The n dimensions of each cube of the analysis graph are classified into one primary dimension, which represents the level instances of the primary dimension level, and n − 1 analysis dimensions, which represent the multidimensional space for analysis. Since a level instance of the primary dimension level is described by one or more cells of a cube, multidimensional analysis means to compare, aggregate, transform, etc., the measure values of these cells. Two kinds of multidimensional analysis are carried out at each cube of the analysis graph: (i) select the level instances of the primary dimension level whose cells comply with the decision-making condition (e.g., withdraw an article if the sales total of the last quarter is below USD 10,000) and (ii) select the level instances of the primary dimension level whose cells comply with the condition under which more detailed analyses (at finer-grained cubes) are necessary (e.g., continue analysis if the sales total of the last quarter is below USD 500,000). The multidimensional analyses carried out on the cubes of the analysis graph are called decision steps. Each decision step analyzes the data of exactly one cube of the analysis graph. Hence, the analysis graph and its decision steps represent the knowledge for multidimensional analysis and decision-making of an analysis rule.
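Putting the components together, the following sketch expresses Scenario 1 as a single-cube analysis rule. The rule structure, the cube access function, and the OLTP call are hypothetical stand-ins for the model in [8], intended only to illustrate how the primary condition, event, decision step, and action fit together.

```python
def sales_in_first_20_days(article):
    # Stand-in for evaluating the decision step's condition on the analysis graph's cube:
    # total quantity sold in the 20 days after launch, as stored in the warehouse.
    fake_cube = {"ColaZero": 8200, "OrangeFizz": 15400}
    return fake_cube[article]

analysis_rule = {
    "primary_dimension_level": "Article",
    "primary_condition": lambda a: a["category"] == "soft drink" and a["new_launch"],
    "event": "20 days after launch",                       # relative temporal event
    "decision_step": lambda name: sales_in_first_20_days(name) < 10_000,
    "action": lambda name: print(f"OLTP call: reduce price of {name} by 15%"),
}

article = {"name": "ColaZero", "category": "soft drink", "new_launch": True}
if analysis_rule["primary_condition"](article) and analysis_rule["decision_step"](article["name"]):
    analysis_rule["action"](article["name"])
```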

Enabling Real-Time Data Warehousing

As mentioned earlier, real-time data warehouses are active data warehouses that are loaded with data having (near) zero latency. Data warehouse vendors have used multiple approaches, such as hand-coded scripting and data extraction, transformation, and loading (ETL) [15] solutions, to serve the data acquisition needs of a data warehouse. However, as users move toward real-time data warehousing, there is a limited choice of technologies that facilitate real-time data delivery. The challenge is to determine the right technology approach or combination of solutions that best meets the data delivery needs. Selection criteria should include considerations for frequency of data, acceptable latency, data volumes, data integrity, transformation requirements, and processing overhead. To solve the real-time challenge, businesses are turning to technologies such as enterprise application integration (EAI) [16] and transactional data management (TDM) [17], which offer high-performance, low-impact movement of data, even at large volumes and with sub-second speed. EAI has greater implementation complexity and maintenance cost and handles smaller volumes of data. TDM provides the ability to capture transactions from OLTP systems, apply mapping, filtering, and basic transformations, and deliver the data directly to the data warehouse. A more detailed study of the challenges and possible solutions involved in implementing a real-time data warehouse is given in [18], while best practices for real-time data warehousing have recently been described in [14].

Near real-time ETL. Some applications do not have a strong demand for real-time data. In this case, true real-time data warehousing is not strictly required, and it is sufficient to simply increase the frequency of data loading, e.g., from daily to twice a day.

Direct trickle feed. This approach enables true real-time data by continuously moving new data from the source systems and updating the fact tables in the data warehouse. However, the constant updates to the fact tables also degrade the query performance of the data warehouse, because they contend with the queries issued by reporting and OLAP tools accessing these fact tables simultaneously.

Trickle and flip. This approach reduces the impact of trickle feed on the query performance of the data warehouse by storing updates to the fact tables in staging tables of the same format. These staging tables are then duplicated and swapped with the fact tables on a periodic basis, ranging from hourly down to every few minutes, to bring the data warehouse up to date.
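The following sketch illustrates a simplified trickle-and-flip variant using an in-memory SQLite database: real-time inserts go only to a staging table of the same shape as the fact table, and a periodic job folds the staged rows into the fact table in one short transaction. A production implementation would duplicate and swap the tables instead; the table names and SQL are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_fact    (day TEXT, store TEXT, product TEXT, qty INTEGER)")
con.execute("CREATE TABLE sales_staging (day TEXT, store TEXT, product TEXT, qty INTEGER)")

def trickle(row):
    # The continuous feed from the source systems goes to the staging table only,
    # so reporting queries on sales_fact are not slowed by constant inserts.
    con.execute("INSERT INTO sales_staging VALUES (?, ?, ?, ?)", row)

def flip():
    # Periodically (hourly down to minutes) fold the staged rows into the fact table
    # in one short transaction, then clear the staging table.
    with con:
        con.execute("INSERT INTO sales_fact SELECT * FROM sales_staging")
        con.execute("DELETE FROM sales_staging")

trickle(("2024-01-01", "Linz-01", "ColaZero", 3))
flip()
print(con.execute("SELECT COUNT(*) FROM sales_fact").fetchone())   # (1,)
```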

External real-time data cache. All the approaches discussed above require the data warehouse to take on additional load to handle the incoming real-time data. A real-time data cache external to the data warehouse can be used instead to load real-time data from the source systems, resolving the query performance and scalability problems by routing any query that accesses real-time data to the cache. This cache can be implemented as a main-memory catalyst as proposed in [24].

Log shipping. This technique was originally designed for database replication and recovery. Updates to the primary database are tracked in a transaction log file, which is periodically transferred to a secondary database and restored at this replica. Because this technique detects changes to data efficiently (i.e., by inspecting the log rather than scanning the entire database), it is commonly used in the above real-time data warehousing approaches for moving changed data from the source systems to, or near, the data warehouse.
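The essence of log shipping for warehouse loading, reading only the change records from the source’s transaction log and replaying them at the replica, can be sketched as follows; the list-of-dicts log format is hypothetical and does not correspond to any vendor’s log layout.

```python
# Hypothetical change records as they might be extracted from a source's transaction log.
transaction_log = [
    {"lsn": 101, "op": "INSERT", "table": "orders", "row": {"id": 1, "amount": 40}},
    {"lsn": 102, "op": "UPDATE", "table": "orders", "row": {"id": 1, "amount": 45}},
    {"lsn": 103, "op": "DELETE", "table": "orders", "row": {"id": 1}},
]

def ship_and_apply(log, replica, last_applied_lsn):
    """Apply only the log records newer than the last shipped position (no full table scan)."""
    for rec in log:
        if rec["lsn"] <= last_applied_lsn:
            continue
        table = replica.setdefault(rec["table"], {})
        if rec["op"] in ("INSERT", "UPDATE"):
            table[rec["row"]["id"]] = rec["row"]
        elif rec["op"] == "DELETE":
            table.pop(rec["row"]["id"], None)
        last_applied_lsn = rec["lsn"]
    return last_applied_lsn

replica = {}
pos = ship_and_apply(transaction_log, replica, last_applied_lsn=100)
print(pos, replica)        # 103 {'orders': {}}
```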

Data stream. The movement of changed data from the source systems to the data warehouse can be viewed as a stream of real-time data events. Therefore, stream analysis and complex event processing techniques can be used to analyze the data in real time, eliminating the reliance on batched or offline updating of the data warehouse [25].
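Treating the change data as an event stream, aggregates can be maintained continuously instead of waiting for a batch load. The sketch below computes per-product sales totals over tumbling time windows; the event format and window size are illustrative.

```python
from collections import defaultdict

def tumbling_window_totals(events, window_seconds=60):
    """Continuously emit per-product sales totals for each tumbling time window,
    instead of waiting for a batch load of the warehouse."""
    current_window, totals = None, defaultdict(int)
    for ts, product, qty in events:                 # events arrive ordered by timestamp
        window = ts - (ts % window_seconds)
        if current_window is not None and window != current_window:
            yield current_window, dict(totals)      # window closed: emit its aggregates
            totals = defaultdict(int)
        current_window = window
        totals[product] += qty
    if current_window is not None:
        yield current_window, dict(totals)

stream = [(0, "ColaZero", 2), (30, "ColaZero", 1), (65, "OrangeFizz", 4)]
for window_start, agg in tumbling_window_totals(stream):
    print(window_start, agg)      # 0 {'ColaZero': 3} then 60 {'OrangeFizz': 4}
```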

Integrating OLTP and OLAP. In recent years, there has been a growing trend toward integrating OLTP and OLAP into a single system. For example, a hybrid cloud storage system for supporting both transactional and analytical workloads was proposed in [26], while main-memory database technologies such as HyPer [27] and SAP HANA [28] have also evolved to serve both OLTP and OLAP within the same database engine. In these systems, OLTP operations access the latest version of the data, whereas OLAP data analysis tasks execute on a recent consistent snapshot of the database.
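The snapshot idea behind these hybrid engines can be illustrated in a few lines: OLTP writes go to the live store, while each OLAP query runs against a consistent copy taken at a point in time. Real engines such as HyPer obtain such snapshots cheaply via copy-on-write virtual memory; the deep copy below is only for illustration.

```python
import copy

live_store = {"orders": {1: {"amount": 40}}}        # latest versions, used by OLTP

def take_snapshot(store):
    # A consistent point-in-time copy for analytics; real engines avoid the expensive
    # deep copy by using copy-on-write virtual-memory snapshots or versioned storage.
    return copy.deepcopy(store)

snapshot = take_snapshot(live_store)

# OLTP keeps updating the live data...
live_store["orders"][1]["amount"] = 45
live_store["orders"][2] = {"amount": 10}

# ...while the OLAP query sees the stable snapshot taken earlier.
print(sum(o["amount"] for o in snapshot["orders"].values()))    # 40
print(sum(o["amount"] for o in live_store["orders"].values()))  # 55
```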

Key Applications

Active and real-time data warehouses enable businesses across all industry verticals to gain competitive advantage by allowing them to run analytics solutions over the most recent data of interest that is captured in the warehouse. This will provide them with the ability to make intelligent business decisions and better understand and predict customer and business trends based on accurate, up-to-the-second data. By introducing real-time flows of information to data warehouses, companies can increase supply chain visibility, gain a complete view of business performance, and increase service levels, ultimately increasing customer retention and brand value.

The following are some additional business benefits of active and real-time data warehousing:
  • Real-time analytics: Real-time analytics is the ability to use all available data to improve performance and quality of service at the moment they are required. It consists of dynamic analysis and reporting right at the moment (or very soon after) the resource (or information) enters the system. In a practical sense, real time is defined by the needs of the consumer (business) and can vary from a few seconds to a few minutes. In other words, anything more frequent than daily updates can be considered real time, because it crosses the overnight-update barrier. With the increasing availability of active and real-time data warehouses, the technology for capturing and analyzing real-time data is becoming widely available; learning how to apply it effectively becomes the differentiator. Early detection of fraudulent activity in financial transactions is a potential environment for applying real-time analytics. For example, credit card companies monitor transactions and activate countermeasures when a customer’s credit transactions fall outside the range of expected patterns. Nevertheless, being able to correctly identify fraud while not offending a well-intentioned, valuable customer is a critical necessity that adds complexity to the potential solution.

  • Maximize ERP investments: With a real-time data warehouse in place, companies can maximize their enterprise resource planning (ERP) technology investment by turning integrated data into business intelligence. ETL solutions act as an integral bridge between ERP systems, which collect high volumes of transactions, and the business analytics used to create data reports.

  • Increase supply chain visibility: Real-time data warehousing helps streamline supply chains through highly effective business-to-business communications and identifies any weak links or bottlenecks, enabling companies to enhance service levels and gain a competitive edge.

  • Live 360° view of customers: Active data warehousing solutions enable companies to capture, transform, and flow all types of customer data into a data warehouse, creating one seamless database that provides a 360° view of the customer. By tracking and analyzing all modes of interaction with a customer, companies can tailor new product offerings, enhance service levels, and ensure customer loyalty and retention.

Future Directions

Data warehousing has greatly matured as a technology discipline; however, enterprises that undertake data warehousing initiatives continue to face fresh challenges that evolve with the changing business and technology environment. Most future needs and challenges will come in the areas of active and real-time data warehousing solutions. Listed below are some future challenges:
  • Integration of OLTP and OLAP systems into a single main-memory database system: Main-memory technologies such as SAP HANA and other in-memory products from Oracle, IBM, Microsoft, Teradata, and Pivotal have started to gain maturity and major adoption in industry. The capability to work in both analytical and transactional workload environments makes these systems highly relevant in scenarios in which real-time in-memory data marts are desirable [29]. These high-speed data marts complement existing large-scale data warehouses in today’s enterprises and provide near real-time analytics and new business insights.

  • Real-time and on-demand integration with heterogeneous data sources: The number of enterprise data sources is growing rapidly, with new types of sources emerging every year. Enterprises want to integrate the unstructured data generated from customer emails, chat and voice call transcripts, feedback, and surveys with other internal data in order to get a complete picture of their customers and integrate internal processes. Other sources of valuable data include ERP programs, operational data stores, packaged and homegrown analytic applications, and existing data marts. The process of real-time and on-demand integration of these sources into a data warehouse can be complicated and is made even more difficult when an enterprise merges with or acquires another enterprise.

  • Integrating with CRM tools: Customer relationship management (CRM) is one of the most popular business initiatives in enterprises today. CRM helps enterprises attract new customers and develop loyalty among existing customers, with the end result of increasing sales and improving profitability. Increasingly, enterprises want to use the holistic view of the customer to deliver value-added services based on the customer’s overall value to the enterprise. This includes automatically identifying when an important life event is happening and sending out emails with necessary information and/or relevant products; gauging the mood of the customer based on recent interactions and alerting the enterprise before it is too late to retain the customer; and, most important of all, identifying customers who are likely to accept suggestions about upgrades of existing products/services or be interested in newer versions. The data warehouse is essential in this integration process, as it collects data from all channels and customer touch points and presents a unified view of the customer to sales, marketing, and customer-care employees. Going forward, data warehouses will have to provide support for analytics tools that are embedded into the warehouse, analyze the various customer interactions continuously, and then use the insights to trigger actions that enable delivery of the abovementioned value-added services. Clearly, this requires an active data warehouse that is tightly integrated with the CRM systems. If the enterprise requires low latency for insight detection and value-added service delivery, then a real-time data warehouse would be needed.

  • Built-in data mining and analytics tools: Users are also demanding more sophisticated business intelligence tools. For example, if a telecom customer calls to cancel his call-waiting feature, real-time analytic software can detect this and trigger a special offer of a lower price in order to retain the customer. The need is to develop a new generation of data mining algorithms that work over data warehouses that integrate heterogeneous data and that have self-learning features. These new algorithms must automate data mining and make it more accessible to mainstream data warehouse users by providing explanations with results, indicating when results are not reliable, and automatically adapting to changes in the underlying predictive models.

  • Intellective data warehousing: Over the last several decades, data warehousing technologies have played an important role in assisting business decision-makers to derive valuable insights from various data sources. Nevertheless, faced with an ever-increasing quantity and diversity of data in their enterprises, from internal and external sources, today’s CDOs (chief data officers) are looking for next-generation systems that can provide a complete data-to-knowledge pipeline without notable human intervention at each step, including data acquisition; selection, cleaning, and transformation; extraction and integration; mining, OLAP, and analytics; and result summarization, provenance, and explanation [21]. Furthermore, in contrast to traditional data warehouse systems, which are mainly query-based and rely heavily on users thinking hard to formulate smart questions for querying relevant information from the system, the next-generation intellective data warehouse systems are changing the nature of data discovery through their capability to understand the content of data and find related information with respect to interesting facets or potentially useful associations [22]. In addition, these systems can present both the known information and the facet analysis to users in novel ways, such as augmented reality visualization.

The need for an intellective data warehouse is further illustrated in the following example. A near up-to-date quantity of a product available across multiple stores can be queried from a centralized data warehouse into which data from the stores’ databases is periodically pushed. Now, given a new order from a customer, the system may improve customer satisfaction if it is able to understand every order line and provide a real-time discount for specific products that are still available in large quantity in the stores. Another example of the need for cognition in this application is the automatic fulfillment of customer orders using data mining and association. Consider a scenario where the customer’s order cannot be fulfilled by the local store (i.e., the query result for local inventory is null). In this case, instead of rejecting the order immediately, which creates customer dissatisfaction, it would be desirable if the system were able to perform reasoning and find other stores that can post the purchased items to the customer’s address within the requested time frame. In addition, the ETL process in a data warehouse can also benefit from the capability of an intellective system to understand input data and reason about the relevance and quality of the incoming data for automated data curation.

Enabling intellective data warehousing: To date, the database, information retrieval, machine learning, and artificial intelligence communities have largely worked in silos when creating intelligent information systems. There is a tremendous opportunity to marry recent advances in these areas and bring cognitive capabilities, including understanding, contextual awareness, and continuous learning, into data warehouse systems in order to make them intelligent and powerful. Such systems recognize semantic relationships between data items and continuously build and update their knowledge bases so that a user’s query can be answered with rich information beyond the known data. They evaluate multiple features on the fly to provide facet analysis and useful contextually associated information pertaining to the results of the user’s query. Recently, various search engines [23] have started to support similar capabilities, but these are limited to keyword-based entity-seeking queries.

Cross-References

Recommended Reading

  1. Kimball R, Strehlo K. Why decision support fails and how to fix it. ACM SIGMOD Rec. 1995;24(3):91–7.
  2. Golfarelli M, Maio D, Rizzi S. Conceptual design of data warehouses from E/R schemes. In: Proceedings of the 31st Annual Hawaii International Conference on System Sciences, Vol. VII; 1998. p. 334–43.
  3. Lehner W. Modeling large scale OLAP scenarios. In: Advances in Database Technology – Proceedings of the 6th International Conference on Extending Database Technology; 1998. p. 153–67.
  4. Samtani S, Mohania M, Kumar V, Kambayashi Y. Recent advances and research problems in data warehousing. In: ER ‘98 Proceedings of the Workshops on Data Warehousing and Data Mining: Advances in Database Technologies; 1998. p. 81–92.
  5. Pedersen TB, Jensen CS. Multidimensional data modeling for complex data. In: Proceedings of the 15th International Conference on Data Engineering; 1999. p. 336–45.
  6. Vassiliadis P. Modeling multidimensional databases, cubes and cube operations. In: Proceedings of the 10th International Conference on Scientific and Statistical Database Management; 1998. p. 53–62.
  7. Mohania M, Samtani S, Roddick J, Kambayashi Y. Advances and research directions in data-warehousing technology. Australas J Inf Syst. 1999;7(1).
  8. Thalhammer T, Schrefl M, Mohania M. Active data warehouses: complementing OLAP with analysis rules. Data Knowl Eng. 2001;39(3):241–69.
  9. ACT-NET Consortium. The active database management system manifesto: a rulebase of ADBMS features. ACM SIGMOD Rec. 1996;25(3).
  10. Simon E, Dittrich A. Promises and realities of active database systems. In: Proceedings of the 21st International Conference on Very Large Data Bases; 1995. p. 642–53.
  11. Brobst S. Active data warehousing: a new breed of decision support. In: Proceedings of the 13th International Workshop on Database and Expert Systems Applications; 2002. p. 769–72.
  12. Brobst S, Rarey J. The five stages of an active data warehouse evolution. Teradata Mag. 2001;3:38–44.
  13.
  14. Best Practices for Real-Time Data Warehousing. Oracle white paper; 2014. http://www.oracle.com/us/products/middleware/data-integration/realtime-data-warehousing-bp-2167237.pdf
  15. Kimball R, Caserta J. The data warehouse ETL toolkit: practical techniques for extracting, cleaning, conforming, and delivering data. Wiley; 2004.
  16. Linthicum DS. Enterprise application integration. Addison-Wesley; 1999.
  17. Improving SOA with GoldenGate TDM technology. GoldenGate white paper; 2007.
  18. Langseth J. Real-time data warehousing: challenges and solutions. DSSResources.COM; 2004.
  19. Paton NW, Diaz O. Active database systems. ACM Comput Surv. 1999;31(1).
  20. High R. The era of cognitive systems: an inside look at IBM Watson and how it works. IBM Corporation Redbooks; 2012.
  21. Abadi D, Agrawal R, Ailamaki A, Balazinska M, Bernstein P, Carey M, Chaudhuri S, Dean J, Doan A, Franklin M, Gehrke J, Haas L, Halevy A, Hellerstein J, Ioannidis Y, Jagadish H, Kossmann D, Madden S, Mehrotra S, Milo T, Naughton J, Ramakrishnan R, Markl V, Olston C, Ooi BC, Re C, Suciu D, Stonebraker M, Walter T, Widom J. The Beckman report on database research. Commun ACM. 2016;59(2):92–9.
  22. Jonas J, Sokol L. Data finds data, Chapter 7. In: Segaran T, Hammerbacher J, editors. Beautiful data: the stories behind elegant data solutions. O’Reilly Media; 2009.
  23. Chirigati F, Liu J, Korn F, Wu YW, Yu C, Zhang H. Knowledge exploration using tables on the web. PVLDB. 2016;10(3):193–204.
  24. Thomsen C, Pedersen TB, Lehner W. RiTE: providing on-demand data for right-time data warehousing. In: Proceedings of the IEEE 24th International Conference on Data Engineering (ICDE); 2008. p. 456–65.
  25. Agrawal D. The reality of real-time business intelligence. In: Castellanos M, Dayal U, Sellis T, editors. Proceedings of the 2nd International Workshop on Business Intelligence for the Real-Time Enterprise (BIRTE 2008). Springer LNBIP, vol. 27; 2009. p. 75–88.
  26. Cao Y, Chen C, Guo F, Jiang D, Lin Y, Ooi BC, Vo HT, Wu S, Xu Q. ES2: a cloud data storage system for supporting both OLTP and OLAP. In: ICDE; 2011. p. 291–302.
  27. Kemper A, Neumann T. HyPer: a hybrid OLTP and OLAP main memory database system based on virtual memory snapshots. In: ICDE; 2011. p. 195–206.
  28. Sikka V, Färber F, Lehner W, Cha SK, Peh T, Bornhövd C. Efficient transaction processing in SAP HANA database: the end of a column store myth. In: SIGMOD; 2012. p. 731–42.
  29.

Copyright information

© Springer Science+Business Media LLC 2018

Authors and Affiliations

  • Mukesh Mohania (1)
  • Ullas Nambiar (2)
  • Hoang Tam Vo (1)
  • Michael Schrefl (3)
  • Millist Vincent (4)
  1. IBM Research, Melbourne, Australia
  2. Zensar Technologies Ltd, Pune, India
  3. University of Linz, Linz, Austria
  4. University of South Australia, Adelaide, Australia

Section editors and affiliations

  • Torben Bach Pedersen (1)
  • Stefano Rizzi (2)
  1. Department of Computer Science, Aalborg University, Aalborg, Denmark
  2. DISI, University of Bologna, Bologna, Italy