Approximate Object Location and Spam Filtering on Peer-to-Peer Systems
Recent work in P2P overlay networks allows for decentralized object location and routing (DOLR) across networks based on unique IDs. In this paper, we propose an extension to DOLR systems that publishes objects using generic feature vectors instead of content-hashed GUIDs, which enables these systems to locate similar objects. We discuss the design of a distributed text similarity engine, named Approximate Text Addressing (ATA), which is built on top of this extension and locates objects by their text descriptions. We then outline the design and implementation of a motivating application built on ATA: a decentralized spam-filtering service. We evaluate this system with 30,000 real spam email messages and 10,000 non-spam messages and find a spam identification ratio of over 97% with zero false positives.
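The central idea, publishing an object under several feature-derived IDs rather than under a single content hash, can be illustrated with a small single-process sketch. It assumes, as a simplification, that a text's feature vector is the k smallest 64-bit SHA-1 hashes of its fixed-length character windows, and that two texts count as "similar" when at least `threshold` of these features match. `WINDOW`, `NUM_FEATURES`, `feature_vector`, and `ToyDOLR` are hypothetical names, and an in-memory dictionary stands in for the overlay's publish and locate operations. This is not necessarily the feature scheme ATA itself uses.

```python
import hashlib
from collections import defaultdict

# Hypothetical parameters (assumptions, not taken from the paper).
WINDOW = 8          # characters per hashed window
NUM_FEATURES = 10   # features kept per document


def feature_vector(text: str, window: int = WINDOW, k: int = NUM_FEATURES) -> set[int]:
    """Hash every `window`-character substring and keep the k smallest hashes.

    Keeping the smallest hashes is a min-hash style selection: texts that share
    most of their substrings tend to keep many of the same values.
    """
    normalized = " ".join(text.lower().split())
    hashes: set[int] = set()
    for i in range(max(1, len(normalized) - window + 1)):
        chunk = normalized[i : i + window].encode("utf-8")
        # A 64-bit ID derived from SHA-1 stands in for a DOLR GUID.
        hashes.add(int.from_bytes(hashlib.sha1(chunk).digest()[:8], "big"))
    return set(sorted(hashes)[:k])


class ToyDOLR:
    """Hypothetical in-memory stand-in for an overlay: GUID -> object IDs."""

    def __init__(self) -> None:
        self.index: defaultdict[int, set[str]] = defaultdict(set)

    def publish(self, object_id: str, text: str) -> None:
        # Publish the object under every feature, not under one content hash.
        for guid in feature_vector(text):
            self.index[guid].add(object_id)

    def locate_similar(self, text: str, threshold: int) -> set[str]:
        # Count how many of the query's features each stored object shares.
        votes: defaultdict[str, int] = defaultdict(int)
        for guid in feature_vector(text):
            for object_id in self.index.get(guid, ()):
                votes[object_id] += 1
        return {oid for oid, n in votes.items() if n >= threshold}
```

In a spam-filtering setting, each message a user marks as spam would be passed to `publish`, and an incoming message would be flagged when `locate_similar` returns a match; raising `threshold` would make matches stricter, trading detection rate for fewer false positives.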
- Book Title: Middleware 2003
- Book Subtitle: ACM/IFIP/USENIX International Middleware Conference, Rio de Janeiro, Brazil, June 16–20, 2003, Proceedings
- Pages: 1–20
- Series Title: Lecture Notes in Computer Science
- Publisher: Springer Berlin Heidelberg
- Copyright Holder: Springer-Verlag Berlin Heidelberg
- Editor Affiliations: Departamento de Informática, PUC-Rio; Department of Electrical Engineering and Computer Science, Vanderbilt University
- Author Affiliations: Computer Science Division, U.C. Berkeley