RENS – Enabling a Robot to Identify a Person

  • Xin Yan
  • Sabina Jeschke
  • Amit Dubey
  • Marc Wilke
  • Hinrich Schütze
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5928)

Abstract

We outline a web personal information mining system that enables robots, or devices such as mobile phones, that possess a visual perception system to discover a person’s identity and personal information (such as phone number, email, and address) by applying NLP methods to the result of the visual perception. At the core of the system lies a rule-based personal information extraction algorithm that requires no supervision or manual annotation and can easily be applied to other domains such as travel or books. This first implementation served as a proof of concept, and experimental results showed that our annotation-free method is promising and compares favorably to supervised approaches.

Keywords

  • Personal Information
  • Pointwise Mutual Information
  • Test Page
  • Business Card
  • Record Selection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Xin Yan (1)
  • Sabina Jeschke (2)
  • Amit Dubey (4)
  • Marc Wilke (1)
  • Hinrich Schütze (3)

  1. Institute for IT Service Technologies, University of Stuttgart
  2. Center for Learning and Knowledge Management, RWTH Aachen University
  3. Institute for Natural Language Processing, University of Stuttgart
  4. Institute for Communicating and Collaborative Systems, University of Edinburgh