Abstract
Several language models rely on the assumption that each local context can be modeled as a (potentially oriented) bag of words, and they have proven to be very efficient baselines. Sequence graphs are the natural structures for encoding the information these models use. However, a sequence graph may have several realizations as a sequence, which leads to a degree of ambiguity. In this paper, we study this ambiguity from a combinatorial and computational point of view. In particular, we present theoretical properties of sequence graphs. We introduce several combinatorial problems, organized along three levels of generalisation (window size, graph orientation, and weights), and characterize them with new complexity results. We also establish algorithms, including an integer program to recognize a sequence graph and a dynamic programming formulation to count its distinct realizations.
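As a rough illustration of the ambiguity described above, the following Python sketch builds an undirected, unweighted sequence graph and enumerates its realizations by brute force. It assumes that a window of size w links two symbols whenever they occur at distance at most w - 1 in the sequence; the exact definition used in the paper may differ. The function names `sequence_graph` and `realizations` are hypothetical, and the exhaustive search is for illustration only; it is not the integer program or the dynamic programming formulation mentioned in the abstract.

```python
from itertools import product


def sequence_graph(seq, w=2):
    """Return the edge set of an undirected, unweighted sequence graph.

    Assumption: two symbols are linked when they occur at distance
    1..w-1 in seq. A repeated symbol inside a window yields a self-loop,
    stored as a one-element frozenset.
    """
    edges = set()
    for i, a in enumerate(seq):
        for b in seq[i + 1 : i + w]:
            edges.add(frozenset((a, b)))
    return edges


def realizations(graph, alphabet, n, w=2):
    """Brute force: all length-n sequences over alphabet whose graph equals graph."""
    return [
        "".join(s)
        for s in product(alphabet, repeat=n)
        if sequence_graph(s, w) == graph
    ]


# Two different sequences share the same graph, so the graph is ambiguous.
g = sequence_graph("abc")
print(realizations(g, "abc", 3))  # ['abc', 'cba']
```

Under these assumptions, the path graph a-b-c has two realizations of length 3 and four of length 4, which shows how the number of realizations depends on the sequence length considered.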
Acknowledgments
The authors wish to express their gratitude to Guillaume Fertin and an anonymous reviewer of an earlier version of this manuscript, for their valuable suggestions and constructive criticisms. Sammy Khalife acknowledges Agence Nationale de la Recherche for partially funding this paper.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Khalife, S., Ponty, Y., Bulteau, L. (2021). Sequence Graphs Realizations and Ambiguity in Language Models. In: Chen, CY., Hon, WK., Hung, LJ., Lee, CW. (eds) Computing and Combinatorics. COCOON 2021. Lecture Notes in Computer Science, vol 13025. Springer, Cham. https://doi.org/10.1007/978-3-030-89543-3_13
DOI: https://doi.org/10.1007/978-3-030-89543-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89542-6
Online ISBN: 978-3-030-89543-3
eBook Packages: Computer Science (R0)