Finding Influence by Cross-Lingual Blog Mining through Multiple Language Lists

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 139)


Blogs have become an important source of information on the internet, and a large amount of Indian-language content is now being generated in the form of blogs. People use them to express their opinions on various situations and events. Blog content may contain named entities: names of people, places, and organizations. These include the names of eminent personalities who are famous within or outside a given language community. The goal of this paper is to find the influence of a personality among cross-language bloggers. Our approach is to collect information from blog pages, remove irrelevant information, and index the named entities along with their probabilities of occurrence. When a user submits a query in an Indian language to find the influence of a personality, we use a cross-language lexicon in the form of multiple-language parallel lists to transliterate the query into other Indian languages, and then mine blogs to return the influence of the personality across Indian-language bloggers. An overview of the system and preliminary results are described.
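The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the two-row `PARALLEL_LIST`, the function names, exact string matching in place of named-entity recognition, a plain list lookup in place of transliteration, and the choice to estimate the probability of occurrence as a name's share of all listed-name mentions within one language. The step that removes irrelevant information from blog pages is not modelled.

```python
from collections import Counter, defaultdict

# Hypothetical parallel list: each row holds one personality's name
# written in several languages (here Hindi "hi" and Telugu "te").
PARALLEL_LIST = [
    {"hi": "गांधी", "te": "గాంధీ"},
    {"hi": "नेहरू", "te": "నెహ్రూ"},
]


def names_by_language(parallel_list):
    """Group the listed names by language code."""
    names = defaultdict(set)
    for row in parallel_list:
        for lang, name in row.items():
            names[lang].add(name)
    return names


def build_index(posts, parallel_list):
    """Index listed names per language with an occurrence probability.

    posts is an iterable of (lang, text) pairs. The probability is
    assumed to be a name's count divided by the count of all listed
    names found in blog posts of the same language.
    """
    names = names_by_language(parallel_list)
    counts = defaultdict(Counter)
    for lang, text in posts:
        for name in names[lang]:
            counts[lang][name] += text.count(name)
    index = {}
    for lang, counter in counts.items():
        total = sum(counter.values())
        index[lang] = {n: c / total for n, c in counter.items()} if total else {}
    return index


def influence(query, query_lang, index, parallel_list):
    """Map the query to its listed forms in other languages and
    return the indexed probability for each language."""
    for row in parallel_list:
        if row.get(query_lang) == query:
            return {
                lang: index.get(lang, {}).get(name, 0.0)
                for lang, name in row.items()
            }
    return {}  # query is not in the parallel list
```

Calling `influence` with a name and its language code returns one score per language in the matching row, which is one simple way to compare how often a personality is mentioned across language communities.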


Cross-lingual, Blog analysis, multilingual





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Search and Information Extraction Lab, IIIT Hyderabad, India
