Neural Networks Revisited for Proper Name Retrieval from Diachronic Documents

Illina, Irina; Fohr, Dominique

doi:10.1007/978-3-319-93782-3_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10930)

Included in the following conference series:

Language and Technology Conference

Abstract

Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. Increasing vocabulary coverage requires a very large amount of text data. In this paper, we extend previously proposed neural network models for word embeddings: the word vector representation proposed by Mikolov is enriched by an additional non-linear transformation. This model takes lexical and semantic word relationships into account more effectively. In the context of broadcast news transcription, experimental results show that, in terms of recall, the proposed model is well able to select new relevant proper names.
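The abstract does not specify the architecture, so the following is only a minimal sketch of the general idea, under stated assumptions: a word vector is passed through one additional non-linear layer (tanh is assumed here), and candidate proper names are ranked by cosine similarity to a context vector in the transformed space. The function names, the weights, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math

def nonlinear_transform(vec, weights, bias):
    """Apply one assumed tanh layer: h = tanh(W * vec + b)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, vec)) + b)
        for row, b in zip(weights, bias)
    ]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_proper_names(context_vec, candidates, weights, bias, top_k=2):
    """Rank candidate names by similarity to the context after the transform."""
    query = nonlinear_transform(context_vec, weights, bias)
    scored = [
        (name, cosine(query, nonlinear_transform(vec, weights, bias)))
        for name, vec in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy, hand-picked values (hypothetical; real embeddings would be learned).
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
candidates = {"Alpha": [0.9, 0.1], "Beta": [0.1, 0.9], "Gamma": [-0.8, 0.2]}
ranked = rank_proper_names([1.0, 0.0], candidates, W, b)
```

In this toy setting the candidate whose vector points in the same direction as the context vector is ranked first. In a real system the transformation would be trained jointly with the embeddings rather than fixed by hand.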

References

Baroni, M., Lenci, A.: Distributional memory: a general framework for corpus-based semantics. Comput. Linguist. 36(4), 673–721 (2010)
Bengio, Y., Goodfellow, I., Courville, A.: Deep Learning. MIT Press, Cambridge (2015)
Church, K., Hanks, P.: Word association norms, mutual information, and lexicography. Comput. Linguist. 16(1), 22–29 (1990)
Deng, L., et al.: Recent advances in deep learning for speech research at Microsoft. In: Proceedings of ICASSP (2013)
Federico, M., Bertoldi, N.: Broadcast news LM adaptation using contemporary texts. In: Proceedings of Interspeech, pp. 239–242 (2001)
Fohr, D., Illina, I.: Word space representations and their combination for proper name retrieval from diachronic documents. In: Proceedings of Interspeech (2015)
Friburger, N., Maurel, D.: Textual similarity based on proper names. In: Proceedings of the Workshop Mathematical/Formal Methods in Information Retrieval, pp. 155–167 (2002)
Galliano, S., Geoffrois, E., Mostefa, D., Choukri, K., Bonastre, J.-F., Gravier, G.: The ESTER phase II evaluation campaign for the rich transcription of French broadcast news. In: Proceedings of Interspeech (2005)
Illina, I., Fohr, D., Linares, G.: Proper name retrieval from diachronic documents for automatic transcription using lexical and temporal context. In: Proceedings of SLAM (2014)
Illina, I., Fohr, D., Jouvet, D.: Grapheme-to-phoneme conversion using conditional random fields. In: Proceedings of Interspeech (2011)
Illina, I., Fohr, D., Mella, O., Cerisara, C.: The automatic news transcription system: ANTS, some real time experiments. In: Proceedings of ICSLP (2004)
Kobayashi, A., Onoe, K., Imai, T., Ando, A.: Time dependent language model for broadcast news transcription and its post-correction. In: Proceedings of ICSLP (1998)
Lee, A., Kawahara, T.: Recent development of open-source speech recognition engine Julius. In: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (2009)
Levy, O., Goldberg, Y., Dagan, I.: Improving distributional similarity with lessons learned from word embeddings. Trans. Assoc. Comput. Linguist. 3, 211–225 (2015)
Levy, O., Goldberg, Y.: Neural word embedding as implicit matrix factorization. In: Advances in Neural Information Processing Systems, pp. 2177–2185 (2015)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of Workshop at ICLR (2013)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of NIPS (2013)
Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of NAACL:HLT (2013)
Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of EMNLP (2014)
Schmid, H.: Probabilistic part-of-speech tagging using decision trees. In: Proceedings of ICNMLP (1994)
Stolcke, A.: SRILM - an extensible language modeling toolkit. In: Proceedings of ICSLP (2002)
Turney, P., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Intell. Res. 37(1), 141–188 (2010)

Acknowledgements

This work is funded by the ContNomina project supported by the French National Research Agency (ANR) under contract ANR-12-BS02-0009.

Author information

Authors and Affiliations

Université de Lorraine, LORIA, UMR 7503, 54506, Vandoeuvre-lès-Nancy, France
Irina Illina & Dominique Fohr
Inria, 54600, Villers-lès-Nancy, France
Irina Illina & Dominique Fohr
CNRS, LORIA, UMR 7503, 54506, Vandoeuvre-lès-Nancy, France
Irina Illina & Dominique Fohr

Corresponding author

Correspondence to Irina Illina.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
LIMSI-CNRS, Orsay Cedex, France
Joseph Mariani
Adam Mickiewicz University, Poznań, Poland
Marek Kubis

About this paper

Cite this paper

Illina, I., Fohr, D. (2018). Neural Networks Revisited for Proper Name Retrieval from Diachronic Documents. In: Vetulani, Z., Mariani, J., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2015. Lecture Notes in Computer Science, vol 10930. Springer, Cham. https://doi.org/10.1007/978-3-319-93782-3_2

DOI: https://doi.org/10.1007/978-3-319-93782-3_2
Published: 16 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93781-6
Online ISBN: 978-3-319-93782-3
eBook Packages: Computer Science, Computer Science (R0)
