Information Harvest from Social Network Data (Facebook 100 million URLS)

  • P. Nancy
  • R. Geetha Ramani
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 247)


Online social networks serve as an arena where members get in touch with each other and share information and ideas. Members usually publish a profile consisting of work and education, arts and entertainment, and basic information such as gender and e-mail address. Such profiles make it easier to find people, learn about their interests, and interact with them when needed. The aim of this research is to devise an algorithm that extracts information such as the name, email address, gender and interests of Facebook users from a URL, and that predicts the gender when it is unspecified. The dataset used in this work is a list of 100 million Facebook URLs. This work paves the way for identifying email communities in Facebook. The results show that most Facebook users' email addresses fall under the Yahoo, Hotmail, Gmail and MSN domains, with Yahoo having the most users; the remaining domains each have comparatively few users. The results also show that the most common interest among Facebook members is sports, followed by music, technology, travelling, God, Temple Run and PC gaming.
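The two analyses summarised above, grouping users by email domain and filling in an unspecified gender, can be illustrated with a minimal sketch. The abstract does not describe the authors' algorithm, so everything here is an assumption for illustration only: the record layout, the sample profiles, the `NAME_TO_GENDER` lookup table, and the function names are all hypothetical, and a first-name lookup is just one plausible way to predict gender.

```python
from collections import Counter

# Hypothetical sample records; the real data layout is not given in the abstract.
profiles = [
    {"name": "Alice Smith", "email": "alice@yahoo.com", "gender": "female"},
    {"name": "Bob Jones", "email": "bob@hotmail.com", "gender": None},
    {"name": "Carol White", "email": "carol@Yahoo.com", "gender": None},
    {"name": "Sam Lee", "email": None, "gender": None},
]

# Hypothetical first-name lookup table (an assumption, not the authors' data).
NAME_TO_GENDER = {"alice": "female", "bob": "male", "carol": "female"}


def email_domain(email):
    """Return the lower-cased domain of an address, or None if unusable."""
    if not email or "@" not in email:
        return None
    return email.rsplit("@", 1)[1].strip().lower() or None


def count_domains(records):
    """Tally how many records fall under each email domain."""
    domains = (email_domain(r.get("email")) for r in records)
    return Counter(d for d in domains if d)


def predict_gender(record):
    """Keep a stated gender; otherwise guess from the first name."""
    if record.get("gender"):
        return record["gender"]
    parts = (record.get("name") or "").split()
    first = parts[0].lower() if parts else ""
    return NAME_TO_GENDER.get(first, "unknown")


domain_counts = count_domains(profiles)
genders = [predict_gender(p) for p in profiles]

# Domains ordered from most to least common.
for domain, count in domain_counts.most_common():
    print(domain, count)
print(genders)
```

Lower-casing the domain before counting keeps addresses such as `carol@Yahoo.com` in the same group as `alice@yahoo.com`, and names missing from the lookup table are reported as "unknown" rather than guessed.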


Algorithm, Email address, Extraction, Facebook, Gender, Social network



Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. Department of Information Science and Technology, College of Engineering, Anna University, Guindy, India
