Towards the Design of a Scalable Email Archiving and Discovery Solution

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5207))

Abstract

In this paper we propose a novel approach to specializing a general-purpose Enterprise Content Management (ECM) system into an Email Archiving and Discovery (EAD) system. The magnitude and range of compliance risks associated with the management of EAD are driving investment in more effective and efficient approaches to supporting regulatory compliance, legal discovery, and content life-cycle needs. Companies must recognize and address requirements such as legal compliance, electronic discovery, and document retention management. What is needed today are EAD systems capable of processing very high message ingest rates, supporting distributed full-text indexing, and allowing forensic search in support of litigation cases. All this must be provided at the lowest cost with respect to archive management and administration. In our approach we introduce a virtualized ECM repository interface in which the key content repository components are wrapped into a set of tightly coupled Grid service entities, so as to achieve scale-out on a cluster of commodity blade hardware that is automatically configured and dynamically provisioned. By doing so, we believe we can leverage the strengths of Relational Database Management Systems and Full Text Indexes in a managed clustered environment with minimal operational overhead.
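
The scale-out idea in the abstract, a single repository interface in front of many nodes that each hold part of the archive and part of the full-text index, can be illustrated with a minimal sketch. This is not the authors' design: the class names `ArchiveNode` and `VirtualRepository`, the hash-based routing of messages, and the in-memory whitespace-tokenized index are all assumptions made for illustration, standing in for the database and full-text index components the paper refers to.

```python
import hashlib
from collections import defaultdict


class ArchiveNode:
    """One hypothetical cluster node: a message store plus a local full-text index."""

    def __init__(self):
        self.messages = {}             # message_id -> body (stand-in for a database)
        self.index = defaultdict(set)  # term -> ids of messages containing it

    def ingest(self, message_id, body):
        self.messages[message_id] = body
        # Naive tokenization (assumption): lowercase and split on whitespace.
        for term in body.lower().split():
            self.index[term].add(message_id)

    def search(self, term):
        return set(self.index.get(term.lower(), set()))


class VirtualRepository:
    """Hypothetical facade: callers see one repository, data lives on many nodes."""

    def __init__(self, node_count):
        self.nodes = [ArchiveNode() for _ in range(node_count)]

    def _node_for(self, message_id):
        # Route each message to one node by hashing its id (assumed policy).
        digest = hashlib.sha256(message_id.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def ingest(self, message_id, body):
        self._node_for(message_id).ingest(message_id, body)

    def search(self, term):
        # Scatter the query to every node, then merge the partial results.
        hits = set()
        for node in self.nodes:
            hits |= node.search(term)
        return sorted(hits)


repo = VirtualRepository(node_count=4)
repo.ingest("msg-1", "Quarterly audit report attached")
repo.ingest("msg-2", "Lunch on Friday")
repo.ingest("msg-3", "Audit follow-up questions")
print(repo.search("audit"))  # ['msg-1', 'msg-3']
```

In this sketch, ingest touches only one node while search fans out to all of them, which is the usual trade-off of a partitioned index. Automatic configuration and dynamic provisioning of nodes, which the abstract also mentions, are not modeled here.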

Editor information

Paolo Atzeni, Albertas Caplinskas, Hannu Jaakkola

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wagner, F., Krebs, K., Mega, C., Mitschang, B., Ritter, N. (2008). Towards the Design of a Scalable Email Archiving and Discovery Solution. In: Atzeni, P., Caplinskas, A., Jaakkola, H. (eds) Advances in Databases and Information Systems. ADBIS 2008. Lecture Notes in Computer Science, vol 5207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85713-6_22

  • DOI: https://doi.org/10.1007/978-3-540-85713-6_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85712-9

  • Online ISBN: 978-3-540-85713-6

  • eBook Packages: Computer Science, Computer Science (R0)
