Warehousing and Mining Massive RFID Data Sets

  • Jiawei Han
  • Hector Gonzalez
  • Xiaolei Li
  • Diego Klabjan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4093)

Abstract

Radio Frequency Identification (RFID) applications are set to play an essential role in object tracking and supply chain management systems. In the near future, it is expected that every major retailer will use RFID systems to track the movement of products from suppliers to warehouses, store backrooms and eventually to points of sale. The volume of information generated by such systems can be enormous, as each individual item (a pallet, a case, or an SKU) will leave a trail of data as it moves through different locations. We propose two data models for the management of this data. The first is a path cube that preserves object transition information while allowing multi-dimensional analysis of path-dependent aggregates. The second is a workflow cube that summarizes the major patterns and significant exceptions in the flow of items through the system. The design of our models is based on the following observations: (1) items usually move together in large groups through early stages in the system (e.g., distribution centers) and only in later stages (e.g., stores) do they move in smaller groups, (2) although RFID data is registered at the primitive level, data analysis usually takes place at a higher abstraction level, (3) many items have similar flow patterns and only a relatively small number of them truly deviate from the general trend, and (4) only non-redundant flow deviations with respect to previously recorded deviations are interesting. These observations facilitate the construction of highly compressed RFID data warehouses and the exploration of such data warehouses by scalable data mining. In this study we give a general overview of the principles driving the design of our framework. We believe that warehousing and mining RFID data present an interesting application for advanced data mining.
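Observation (1) can be illustrated with a minimal sketch. This is not the paper's path cube construction; it only shows, under assumed inputs, why bulk movement permits compression: items that share a location and time interval are stored as one group record instead of one record per item. The record layout `(item_id, location, time_in, time_out)`, the sample data, and the function name `compress_stays` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw readings: (item_id, location, time_in, time_out).
readings = [
    ("i1", "factory", 0, 5), ("i2", "factory", 0, 5), ("i3", "factory", 0, 5),
    ("i1", "dist_center", 6, 9), ("i2", "dist_center", 6, 9), ("i3", "dist_center", 6, 9),
    ("i1", "store_A", 10, 20), ("i2", "store_A", 10, 20), ("i3", "store_B", 11, 25),
]

def compress_stays(readings):
    """Collapse readings that share a location and time interval into one group record."""
    groups = defaultdict(set)
    for item, loc, t_in, t_out in readings:
        groups[(loc, t_in, t_out)].add(item)
    # One record per (location, interval) group rather than one per item.
    return [
        (loc, t_in, t_out, frozenset(items))
        for (loc, t_in, t_out), items in sorted(groups.items())
    ]

compressed = compress_stays(readings)
for record in compressed:
    print(record)
```

In this toy data the three items travel together through the factory and distribution center and split only at the stores, so nine raw readings collapse into four group records. The saving grows with the size of the groups that move together.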

Keywords

Query Processing; Abstraction Level; Data Cube; High Abstraction Level; Frequent Pattern Mining Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jiawei Han (1)
  • Hector Gonzalez (1)
  • Xiaolei Li (1)
  • Diego Klabjan (1)
  1. University of Illinois at Urbana-Champaign, Urbana, USA
