Abstract
Web pages are usually unstructured, so extracting information from them is not trivial. In this paper we describe the process of Information Extraction using researchers’ home pages as an example. For this task we applied Support Vector Machine (SVM), Conditional Random Field (CRF), and Markov Logic Network (MLN) models. The analysis covers English-language texts only.
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Andruszkiewicz, P., Nachyła, B. (2013). Automatic Extraction of Profiles from Web Pages. In: Bembenik, R., Skonieczny, L., Rybinski, H., Kryszkiewicz, M., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 467. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35647-6_25
DOI: https://doi.org/10.1007/978-3-642-35647-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35646-9
Online ISBN: 978-3-642-35647-6
eBook Packages: Engineering, Engineering (R0)