Sequence Graphs Realizations and Ambiguity in Language Models

  • Conference paper
Computing and Combinatorics (COCOON 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13025)

Abstract

Several language models treat each local context as a (potentially oriented) bag of words, and such models have proven to be very efficient baselines. Sequence graphs are the natural structures for encoding the information these models use. However, a sequence graph may have several realizations as a sequence, which introduces a degree of ambiguity. In this paper, we study this ambiguity from a combinatorial and computational point of view. In particular, we present theoretical properties of sequence graphs. We introduce several combinatorial problems at three levels of generalisation (window size, graph orientation, and weights), and characterize them with new complexity results. We establish several algorithms, including an integer program to recognize a sequence graph and a dynamic programming formulation to count its distinct realizations.
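
To make "realization" and "ambiguity" concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' construction. It assumes that the weighted sequence graph of a sequence for window size w adds 1 to the edge between the symbols at positions i < j whenever j - i < w (ordered pairs if oriented, unordered otherwise), and it counts realizations by brute force over all sequences of a fixed length n. The names `sequence_graph` and `count_realizations` are hypothetical, and the exhaustive search is a stand-in for checking small cases, not the paper's dynamic programming formulation.

```python
from collections import Counter
from itertools import product


def sequence_graph(seq, w, oriented=False):
    """Weighted sequence graph of `seq` for window size `w` (assumed definition).

    Every pair of positions i < j with j - i < w adds 1 to the weight of the
    edge between seq[i] and seq[j]: the ordered pair (seq[i], seq[j]) when
    `oriented` is True, otherwise the unordered pair stored in sorted order.
    Repeated symbols inside a window produce self-loops such as ('a', 'a').
    """
    edges = Counter()
    for i in range(len(seq)):
        for j in range(i + 1, min(i + w, len(seq))):
            u, v = seq[i], seq[j]
            key = (u, v) if oriented else tuple(sorted((u, v)))
            edges[key] += 1
    return edges


def count_realizations(graph, alphabet, n, w, oriented=False):
    """Brute-force count of length-n sequences over `alphabet` whose weighted
    sequence graph equals `graph`.

    Exponential in n (|alphabet| ** n candidates): a reference for small
    cases only, NOT the dynamic program described in the paper.
    """
    return sum(
        1
        for seq in product(alphabet, repeat=n)
        if sequence_graph(seq, w, oriented) == graph
    )


if __name__ == "__main__":
    s = "abc"
    for w in (2, 3):
        for oriented in (False, True):
            g = sequence_graph(s, w, oriented)
            k = count_realizations(g, "abc", len(s), w, oriented)
            print(f"w={w} oriented={oriented}: {k} realization(s)")
```

Under these assumptions, for s = "abc" the undirected graph with w = 2 has two realizations ("abc" and its reversal "cba"), while with w = 3 every permutation of the three symbols yields the same graph, giving six; orienting the edges leaves "abc" as the only realization in both cases. This illustrates how window size and orientation affect the degree of ambiguity.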

Acknowledgments

The authors wish to express their gratitude to Guillaume Fertin and to an anonymous reviewer of an earlier version of this manuscript for their valuable suggestions and constructive criticism. Sammy Khalife acknowledges the Agence Nationale de la Recherche (ANR) for partially funding this work.

Author information

Correspondence to Sammy Khalife.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Khalife, S., Ponty, Y., Bulteau, L. (2021). Sequence Graphs Realizations and Ambiguity in Language Models. In: Chen, CY., Hon, WK., Hung, LJ., Lee, CW. (eds) Computing and Combinatorics. COCOON 2021. Lecture Notes in Computer Science, vol 13025. Springer, Cham. https://doi.org/10.1007/978-3-030-89543-3_13

  • DOI: https://doi.org/10.1007/978-3-030-89543-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89542-6

  • Online ISBN: 978-3-030-89543-3

  • eBook Packages: Computer Science (R0)
