Towards the Design of a Scalable Email Archiving and Discovery Solution

  • Frank Wagner
  • Kathleen Krebs
  • Cataldo Mega
  • Bernhard Mitschang
  • Norbert Ritter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5207)


In this paper we propose a novel approach to specializing a general-purpose Enterprise Content Management (ECM) system into an Email Archiving and Discovery (EAD) system. The magnitude and range of compliance risks associated with EAD management are driving investment in more effective and efficient approaches to supporting regulatory compliance, legal discovery, and content life-cycle needs. Companies must recognize and address requirements such as legal compliance, electronic discovery, and document retention management. What is needed today are EAD systems capable of processing very high message ingest rates, supporting distributed full-text indexing, and allowing forensic search in support of litigation cases. All this must be provided at the lowest possible cost with respect to archive management and administration. In our approach we introduce a virtualized ECM repository interface in which the key content repository components are wrapped into a set of tightly coupled Grid service entities, so as to achieve scale-out on a cluster of commodity blade hardware that is automatically configured and dynamically provisioned. By doing so, we believe we can leverage the strengths of Relational Database Management Systems and full-text indexes in a managed, clustered environment with minimal operational overhead.
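The abstract describes the approach only at the architectural level. As a rough illustration of what scale-out ingest and distributed full-text indexing can look like, the following Python sketch partitions messages across a hypothetical cluster by hashing the message ID and answers queries by scatter-gather over per-node inverted indexes. Every name here (`Message`, `IndexNode`, `Cluster`, the hash-modulo routing) is an illustrative assumption, not the authors' Grid-service design, which this page does not describe.

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical archived email: only the fields this sketch needs."""
    msg_id: str
    sender: str
    body: str


def tokenize(text: str) -> list[str]:
    # Lower-case alphanumeric tokens; a real archive would use a proper analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class IndexNode:
    """One hypothetical cluster node: a local message store plus an inverted index."""
    name: str
    store: dict = field(default_factory=dict)
    postings: dict = field(default_factory=lambda: defaultdict(set))

    def ingest(self, msg: Message) -> None:
        self.store[msg.msg_id] = msg
        for term in tokenize(msg.body):
            self.postings[term].add(msg.msg_id)

    def search(self, terms: list[str]) -> set[str]:
        # Conjunctive (AND) query against this node's local index only.
        sets = [self.postings.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()


class Cluster:
    """Routes ingest by hashing the message ID; answers queries by scatter-gather."""

    def __init__(self, node_count: int):
        self.nodes = [IndexNode(f"node-{i}") for i in range(node_count)]

    def route(self, msg_id: str) -> IndexNode:
        # Simple hash-modulo placement: the same ID always lands on the same node.
        digest = hashlib.sha1(msg_id.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def ingest(self, msg: Message) -> None:
        self.route(msg.msg_id).ingest(msg)

    def search(self, query: str) -> set[str]:
        terms = tokenize(query)
        hits: set[str] = set()
        for node in self.nodes:  # sequential here; a real cluster would fan out in parallel
            hits |= node.search(terms)
        return hits


cluster = Cluster(node_count=4)
cluster.ingest(Message("m1", "alice@example.com", "Quarterly contract renewal"))
cluster.ingest(Message("m2", "bob@example.com", "Contract dispute notes"))
print(sorted(cluster.search("contract")))  # ['m1', 'm2']
```

Because every message is indexed on exactly one node, adding nodes spreads the ingest load, while each query must visit all nodes. Plain modulo placement remaps most messages when the node count changes; consistent hashing is a common alternative that limits that data movement.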


Keywords: Enterprise Content Management, Email Archiving and Discovery, Legal Compliance, Virtualization, Scale-out, Engineering for Scalability





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Frank Wagner (1)
  • Kathleen Krebs (2)
  • Cataldo Mega (3)
  • Bernhard Mitschang (1)
  • Norbert Ritter (2)

  1. University of Stuttgart, IPVS
  2. University of Hamburg, VSIS
  3. IBM Deutschland Entwicklung GmbH
