Abstract
In this paper we propose a novel approach to specializing a general-purpose Enterprise Content Management (ECM) system into an Email Archiving and Discovery (EAD) system. The magnitude and range of compliance risks associated with the management of EAD are driving investment in more effective and efficient approaches to supporting regulatory compliance, legal discovery, and content life-cycle needs. Companies must recognize and address requirements such as legal compliance, electronic discovery, and document retention management. What is needed today are EAD systems capable of handling very high message ingest rates, supporting distributed full-text indexing, and allowing forensic search in support of litigation cases. All this must be provided at the lowest possible cost with respect to archive management and administration. In our approach we introduce a virtualized ECM repository interface in which the key content repository components are wrapped into a set of tightly coupled Grid service entities, so as to achieve scale-out on a cluster of commodity blade hardware that is automatically configured and dynamically provisioned. By doing so, we believe we can leverage the strengths of Relational Database Management Systems and Full Text Indexes in a managed clustered environment with minimal operational overhead.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wagner, F., Krebs, K., Mega, C., Mitschang, B., Ritter, N. (2008). Towards the Design of a Scalable Email Archiving and Discovery Solution. In: Atzeni, P., Caplinskas, A., Jaakkola, H. (eds) Advances in Databases and Information Systems. ADBIS 2008. Lecture Notes in Computer Science, vol 5207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85713-6_22
DOI: https://doi.org/10.1007/978-3-540-85713-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85712-9
Online ISBN: 978-3-540-85713-6
eBook Packages: Computer Science, Computer Science (R0)