
Valence extraction using EM selection and co-occurrence matrices


Abstract

This paper discusses two new procedures for extracting verb valences from raw texts, with an application to the Polish language. The first novel technique, the EM selection algorithm, performs unsupervised disambiguation of valence frame forests, obtained by applying a non-probabilistic deep grammar parser and some post-processing to the text. The second new idea concerns the filtering of incorrect frames detected in the parsed text and is motivated by the observation that verbs which take similar arguments tend to have similar frames. This phenomenon is described in terms of newly introduced co-occurrence matrices. Using co-occurrence matrices, we split filtering into two steps. The list of valid arguments is first determined for each verb, whereas the pattern according to which the arguments are combined into frames is computed in the following stage. Our best extracted dictionary reaches an F-score of 45%, compared to an F-score of 39% for the standard frame-based BHT filtering.


References

  • Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34, 555–596.


  • Baker, C. F., & Ruppenhofer, J. (2002). FrameNet’s frames vs. Levin’s verb classes. In Proceedings of the 28th annual meeting of the Berkeley Linguistics Society (pp. 27–38).

  • Bańko, M. (Ed.) (2000). Inny słownik języka polskiego. Warszawa: Wydawnictwo Naukowe PWN.


  • Baum, L. E. (1972). An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities, 3, 1–8.


  • Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.


  • Brent, M. R. (1993). From grammar to lexicon: Unsupervised learning of lexical syntax. Computational Linguistics, 19, 243–262.


  • Briscoe, T., & Carroll, J. (1997). Automatic extraction of subcategorization from corpora. In Proceedings of the 5th ACL conference on applied natural language processing (pp. 356–363). Washington, DC: Morgan Kaufmann.

  • Carroll, G., & Rooth, M. (1998). Valence induction with a head-lexicalized PCFG. In Arbeitspapiere des Instituts für Maschinelle Sprachverarbeitung (No. 4, Vol. 3, pp. 25–54).

  • Chesley, P., & Salmon-Alt, S. (2006). Automatic extraction of subcategorization frames for French. In Proceedings of the language resources and evaluation conference (LREC 2006), Genoa, Italy.

  • Chi, Z., & Geman, S. (1998). Estimation of probabilistic context-free grammars. Computational Linguistics, 24, 299–305.


  • Colmerauer, A. (1978). Metamorphosis grammar. In Natural language communication with computers. Lecture Notes in Computer Science (Vol. 63, pp. 133–189). New York: Springer.

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.


  • Dębowski, Ł., & Woliński, M. (2007). Argument co-occurrence matrix as a description of verb valence. In Z. Vetulani (Ed.), Proceedings of the 3rd language & technology conference, October 5–7, 2007, Poznań, Poland (pp. 260–264).

  • Ersan, M., & Charniak, E. (1995). A statistical syntactic disambiguation program and what it learns. In S. Wermter, E. Riloff, & G. Scheler (Eds.), Learning for natural language processing (pp. 146–159). New York: Springer.


  • Fast, J., & Przepiórkowski, A. (2005). Automatic extraction of Polish verb subcategorization: An evaluation of common statistics. In Z. Vetulani (Ed.), Proceedings of the 2nd language & technology conference, Poznań, Poland, April 21–23, 2005 (pp. 191–195).

  • Gorrell, G. (1999). Acquiring subcategorisation from textual corpora. M. Phil. Dissertation, University of Cambridge.

  • Halford, G. S., Wilson, W. H., & Phillips, S. (1998). Processing capacity defined by relational complexity: Implications for comparative, developmental and cognitive psychology. Behavioral and Brain Sciences, 21(6), 803–864.


  • Jelinek, F. (1997). Statistical methods for speech recognition. Cambridge, MA: The MIT Press.


  • Korhonen, A. (2002). Subcategorization acquisition. Ph.D. Dissertation, University of Cambridge.

  • Kupiec, J. (1992). Robust part-of-speech tagging using a hidden Markov model. Computer Speech and Language, 6, 225–242.


  • Kurcz, I., Lewicki, A., Sambor, J., & Woronczak, J. (1990). Słownik frekwencyjny polszczyzny współczesnej. Kraków: Instytut Języka Polskiego PAN.


  • Lapata, M., & Brew, C. (2004). Verb class disambiguation using informative priors. Computational Linguistics, 30, 45–73.


  • Levin, B. (1993). English verb classes and alternations: A preliminary investigation. Chicago: The University of Chicago Press.


  • Macleod, C., Grishman, R., & Meyers, A. (1994). Creating a common syntactic dictionary of English. In SNLR: International workshop on sharable natural language resources, Nara, August, 1994.

  • Manning, C. (1993). Automatic acquisition of a large subcategorization dictionary from corpora. In Proceedings of the 31st annual meeting of the ACL, Columbus, OH (pp. 235–242).

  • Mayol, L., Boleda, G., & Badia, T. (2005). Automatic acquisition of syntactic verb classes with basic resources. Language Resources and Evaluation, 39, 295–312.


  • McCarthy, D. (2001). Lexical acquisition at the syntax-semantics interface: Diathesis alternations, subcategorization frames and selectional preferences. Ph.D. Thesis, University of Sussex.

  • Merialdo, B. (1994). Tagging English text with a probabilistic model. Computational Linguistics, 20, 155–171.


  • Młynarczyk, A. K. (2004). Aspectual pairing in Polish. Ph.D. Thesis, Universiteit Utrecht.

  • Neal, R., & Hinton, G. (1999). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan (Ed.), Learning in graphical models (pp. 355–368). Cambridge, MA: The MIT Press.


  • Polański, K. (Ed.). (1980–1992). Słownik syntaktyczno-generatywny czasowników polskich. Wrocław: Zakład Narodowy im. Ossolińskich/Kraków: Instytut Języka Polskiego PAN.

  • Przepiórkowski, A. (2006). What to acquire from corpora in automatic valence acquisition. In V. Koseska-Toszewa & R. Roszko (Eds.), Semantyka a konfrontacja językowa (3). Warszawa: Slawistyczny Ośrodek Wydawniczy PAN.

  • Przepiórkowski, A., & Fast, J. (2005). Baseline experiments in the extraction of Polish valence frames. In M. A. Kłopotek, S. T. Wierzchoń, & K. Trojanowski (Eds.), Intelligent information processing and web mining (pp. 511–520). New York: Springer.

  • Przepiórkowski, A., & Woliński, M. (2003). A flexemic tagset for Polish. In Proceedings of morphological processing of Slavic languages (EACL 2003) (pp. 33–40).

  • Rudin, W. (1974). Real and complex analysis. New York: McGraw-Hill.


  • Sarkar, A., & Zeman, D. (2000). Automatic extraction of subcategorization frames for Czech. In Proceedings of the 18th international conference on computational linguistics (COLING 2000), Saarbrücken, Germany (pp. 691–698).

  • Schulte im Walde, S. (2006). Experiments on the automatic induction of German semantic verb classes. Computational Linguistics, 32, 159–194.


  • Surdeanu, M., Morante, R., & Màrquez, L. (2008). Analysis of joint inference strategies for the semantic role labeling of Spanish and Catalan. In Proceedings of the computational linguistics and intelligent text processing 9th international conference (CICLing 2008) (pp. 206–218).

  • Świdziński, M. (1992). Gramatyka formalna języka polskiego. Warszawa: Wydawnictwa Uniwersytetu Warszawskiego.


  • Świdziński, M. (1994). Syntactic dictionary of Polish verbs. Warszawa: Uniwersytet Warszawski/Amsterdam: Universiteit van Amsterdam.

  • Tokarski, J. (1993). Schematyczny indeks a tergo polskich form wyrazowych. Warszawa: Wydawnictwo Naukowe PWN.


  • Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer.


  • Woliński, M. (2004). Komputerowa weryfikacja gramatyki Świdzińskiego. Ph.D. Thesis, Instytut Podstaw Informatyki PAN, Warszawa.

  • Woliński, M. (2005). An efficient implementation of a large grammar of Polish. Archives of Control Sciences, 15(LI)(3), 251–258.

  • Woliński, M. (2006). Morfeusz—A practical tool for the morphological analysis of Polish. In M. A. Kłopotek, S. T. Wierzchoń, & K. Trojanowski (Eds.), Intelligent information processing and web mining (pp. 503–512). New York: Springer.



Acknowledgements

Grateful acknowledgements are due to Marcin Woliński for his help in using Świgra, to Witold Kieraś for retyping samples of the test dictionaries, and to Marek Świdziński for offering the source file of his valence dictionary. The author also thanks Adam Przepiórkowski, Jan Mielniczuk, Laurence Cantrill, and the anonymous reviewers for many helpful comments concerning the composition of this article. The work was supported by the Polish State Research Project 3 T11C 003 28, Automatyczna ekstrakcja wiedzy lingwistycznej z dużego korpusu języka polskiego (Automatic extraction of linguistic knowledge from a large corpus of Polish).

Author information


Corresponding author

Correspondence to Łukasz Dębowski.

Appendices

Appendix 1: A faster reconstruction of the frame set

Although there is no need to compute \(\bar{\mathbf F}(v)\), defined in (5), in order to verify the condition \(f\in\bar{\mathbf F}(v)\) for a given \(f\), the reconstruction \(\bar{\mathbf F}(v)\) can be computed efficiently if it is needed for other purposes. A naive solution, suggested by formula (5), is to search through all elements of the power set \(2^{{\mathbf L}(v)}\) and to check for each element independently whether it belongs to \(\bar{\mathbf F}(v)\). However, we can do this faster by applying dynamic programming.

Firstly, let us enumerate the elements of \({\mathbf L}(v)=\{b_1, b_2, \ldots, b_N\}\). In the following, we will compute the chain of sets \(A_0, A_1, \ldots, A_N\), where \(A_n=\{(B_n\cap f, B_n\setminus f) \,|\, f\in \bar{\mathbf F}(v)\}\) and \(B_n=\{b_1, b_2, \ldots, b_n\}\).

In fact, this chain can be computed by the following iteration:

$$ \begin{aligned} A_0 &= \{(\emptyset,\emptyset)\},\\ A_n &= \left\{ (f\cup\{b_n\},\,g) \,\middle|\, (f,g)\in A_{n-1},\ \forall_{a\in f}\, {\mathbf M}(v)_{b_na}\neq\times,\ \forall_{a\in g}\, {\mathbf M}(v)_{b_na}\notin\{\leftrightarrow,\leftarrow\} \right\}\\ &\quad \cup \left\{ (f,\,g\cup\{b_n\}) \,\middle|\, (f,g)\in A_{n-1},\ \{b_n\}\notin {\mathbf E}(v),\ \forall_{a\in f}\, {\mathbf M}(v)_{b_na}\notin\{\leftrightarrow,\leftarrow\} \right\} \end{aligned} $$

Once the set \(A_N=\{(f, {\mathbf L}(v)\setminus f) | f\in \bar{\mathbf F}(v) \}\) is computed, \(\bar{\mathbf F}(v)\) can be read off easily.
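For illustration, this dynamic programming can be rendered in Python as follows. The sketch assumes, purely for the example, that \({\mathbf M}(v)\) is given as a dictionary mapping argument pairs to the symbols 'x', '<->', '<-' (standing for ×, ↔, ←) and that \({\mathbf E}(v)\) is given as the set of arguments \(b\) with \(\{b\}\in{\mathbf E}(v)\); the original implementation is not reproduced here.

    # Dynamic-programming reconstruction of the frame set F̄(v); a sketch
    # with assumed encodings, not the original implementation.
    #   args -- the list L(v) = [b_1, ..., b_N],
    #   M    -- dict mapping pairs (b, a) to 'x' (×), '<->' (↔) or '<-' (←);
    #           missing pairs impose no constraint,
    #   E    -- the set of arguments b such that {b} belongs to E(v).
    def reconstruct_frames(args, M, E):
        A = {(frozenset(), frozenset())}          # A_0 = {(∅, ∅)}
        for b in args:                            # build A_n from A_{n-1}
            nxt = set()
            for f, g in A:
                # First case: b_n joins the frame part f.
                if (all(M.get((b, a)) != 'x' for a in f)
                        and all(M.get((b, a)) not in ('<->', '<-') for a in g)):
                    nxt.add((f | {b}, g))
                # Second case: b_n joins the complement part g.
                if (b not in E
                        and all(M.get((b, a)) not in ('<->', '<-') for a in f)):
                    nxt.add((f, g | {b}))
            A = nxt
        # A_N = {(f, L(v) \ f) : f in F̄(v)}; read off the frames.
        return {f for f, _ in A}

Since each pair in \(A_{n-1}\) spawns at most two pairs in \(A_n\), the cost is proportional to the sizes of the intermediate sets rather than to the full power set.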

Appendix 2: Parsing of the IPI PAN Corpus

The input of the valence extraction experiment discussed in this paper came from the 250-million-word IPI PAN Corpus of Polish (http://korpus.pl/). The original automatic part-of-speech annotation of the text was removed, since it contained too many errors, and the sentences from the corpus were analyzed using the Świgra parser (Woliński 2004, 2005), see also http://nlp.ipipan.waw.pl/~wolinski/swigra/. Technically, Świgra utilizes two distinct language resources: (1) Morfeusz, a dictionary of inflected words (a.k.a. a morphological analyzer) programmed by Woliński (2006) on the basis of about 20,000 stemming rules compiled by Tokarski (1993), and (2) GFJP, the formal grammar of Polish written by Świdziński (1992). Świdziński's grammar is a DCG-like grammar, close to the format of the metamorphosis grammar of Colmerauer (1978). It comprises 461 rules; examples of its parse trees can be found in Woliński (2004). For the sake of this project, Świgra used a dummy valence dictionary that allowed any verb to take zero or one NP in the nominative (the subject) and any combination of other arguments.

Only a small subset of sentences was actually selected to be parsed with Świgra. The following selection criteria were applied to the whole 250-million-word IPI PAN Corpus (a code sketch follows the list):

  1. The selected sentence had to contain a word recognized by Morfeusz as a verb, and that verb had to occur ≥396 times in the corpus. (396 is the lowest corpus frequency of a verb from the test set described in Sect. 4. The threshold was introduced to speed up parsing without loss of empirical coverage for any verb in the test set. A selected sentence could also contain another, less frequent verb if it was a compound sentence.)

  2. The selected sentence could not be longer than 15 words. (We supposed that the EM selection would find it difficult to select the correct parse for longer sentences.)

  3. At most 5,000 sentences were selected per recognized verb. (We supposed that a frame used less than once per 5,000 verb occurrences would not be considered in the gold-standard dictionaries.)
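The sketch below pictures this filter in Python. All names are hypothetical, and so is the choice to count each sentence under its first frequent verb; the actual selection was performed by scripts over the IPI PAN Corpus format.

    # Sketch of selection criteria (1)-(3); all names are hypothetical.
    #   sentences   -- an iterable of tokenized sentences (lists of words),
    #   corpus_freq -- corpus frequency of each word form,
    #   is_verb     -- a predicate emulating Morfeusz's verb recognition.
    def select_sentences(sentences, corpus_freq, is_verb,
                         min_freq=396, max_len=15, max_per_verb=5000):
        counts = {}
        for sent in sentences:
            if len(sent) > max_len:                      # criterion 2
                continue
            verbs = [w for w in sent
                     if is_verb(w) and corpus_freq.get(w, 0) >= min_freq]
            if not verbs:                                # criterion 1
                continue
            v = verbs[0]                                 # one counter per verb
            if counts.get(v, 0) >= max_per_verb:         # criterion 3
                continue
            counts[v] = counts.get(v, 0) + 1
            yield sent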

In this way, a subset of 1,011,991 sentences (8,727,441 running words) was chosen. All of them were fed to Świgra, but fewer than half (0.48 million sentences) were parsed successfully within the preset time limit of 1 minute per sentence. Detailed statistics are given in Table 4 below. All the thresholds mentioned above were set in advance so that only the most useful parse forests would be computed within a pre-estimated total time of a few months. This was the first experiment in which Świgra was applied to more than several hundred sentences. The parsing actually took 2 months on a single PC.

Table 4 Sizes of the processed parts of the IPI PAN Corpus

Not all information contained in the obtained parse forests was relevant for valence acquisition. Full parses were subsequently reduced to valence frames plus verbs, as in the first displayed example in Sect. 3. First of all, the parse forests for compound sentences were split into separate parse forests for elementary clauses. Then each parse tree was reduced to a string that identifies only the top-most phrases. To decrease the amount of noise in the subsequent EM selection and to speed up computation, we decided to skip the 10% of clauses with the largest number of reduced parses. As a result, we retained only clauses with ≤40 reduced parses.

To improve the EM selection, we also deleted parses that contained certain syntactically idiosyncratic words (mostly the pronouns to (= this), co (= what), and nic (= nothing)) or highly improbable morphological word interpretations (like the second interpretation of albo: 1. the conjunction or; 2. the vocative singular of the noun alb, a kind of liturgical vestment). The stop list of improbable interpretations consisted of 646 word interpretations which never occurred in the SFPW Corpus but were possible interpretations of the most common words according to Morfeusz. The SFPW Corpus is a manually POS-tagged 0.5-million-word corpus prepared for the frequency dictionary of 1960s Polish (Kurcz et al. 1990), which was actually commenced in the 1960s but not published until 1990.

Our format of reduced parses approximates the format of valence frames in Świdziński (1994), so it diverges from the format proposed by Przepiórkowski (2006). To convert a parse in Przepiórkowski's format into ours, the following transformations are performed:

  1. Add the dropped personal subject or the impersonal subject expressed by the ambiguous reflexive marker się when their presence is implied by the verb form.

  2. Remove one nominal phrase in the genitive for negated verbs (an attempt to treat the genitive of negation).

  3. Transform several frequent adjuncts expressed by nominal phrases.

  4. Skip the parse if it contains the pronouns to (= this), co (= what), or nic (= nothing), instead of converting these pronouns into regular nominal phrases.

  5. Remove lemmas from non-verbal phrases and sort the phrases in alphabetical order.

The resulting bank of reduced parse forests included 510,743 clauses with one or more proposed valence frames. We successfully parsed only 3.4 million running words of the whole 250-million-word IPI PAN Corpus, four times less than the 12 million words parsed by Fast and Przepiórkowski (2005). However, our superior results in the valence extraction task indicate that skipping a fraction of the available empirical data is a good idea if the remaining data can be processed more thoroughly and the skipped portion provides no additional information that could be used efficiently.

Appendix 3: The EM selection algorithm

Consider the following abstract statistical task. Let \(Z_1, Z_2, \ldots, Z_M\), with \(Z_i:\Omega\rightarrow J\), be a sequence of discrete random variables and let \(Y_1, Y_2, \ldots, Y_M\) be a random sample of sets, where each set \(Y_i:\Omega\rightarrow 2^J\setminus\{\emptyset\}\) contains the actual value of \(Z_i\), i.e., \(Z_i\in Y_i\). The objective is to guess the conditional distribution of \(Z_i\) given an event \((Y_i=A_i)_{i=1}^M\), \(A_i\subset J\). In particular, we would like to know the conditionally most likely values of \(Z_i\). The exact distribution of \(Y_i\) is not known and would be unfeasible to estimate if we treated the values of \(Y_i\) as atomic entities. We have to solve the task via some rationally motivated assumptions.

Our heuristic solution was the iteration

$$ p_{ji}^{(n)}= \left\{\begin{array}{ll} p_j^{(n)}/\sum_{j'\in A_i} p_{j'}^{(n)}, & j\in A_i,\\ 0, & \hbox{else},\\ \end{array}\right. $$
(11)
$$ p_j^{(n+1)}= {\frac{1}{M}} \sum_{i=1}^{M} p_{ji}^{(n)}, $$
(12)

with \(p_j^{(1)}=1\). We observed that the coefficients \(p_{ji}^{(n)}\) converge to a value that can plausibly be identified with the conditional probability \(P(Z_i=j\,|\,Y_i=A_i)\).
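For illustration, iteration (11)–(12) fits in a few lines of Python. This is a minimal sketch with the initialization \(p_j^{(1)}=1\); the experiments reported in this article used a separate implementation.

    # EM selection, iteration (11)-(12).  samples[i] is the set A_i of
    # alternative values of Z_i.  Returns the limiting p_j and the
    # posteriors p_{ji} approximating P(Z_i = j | Y_i = A_i).
    def em_selection(samples, n_iter=100):
        M = len(samples)
        J = set().union(*samples)
        p = {j: 1.0 for j in J}                     # p_j^{(1)} = 1
        for _ in range(n_iter):
            total = {j: 0.0 for j in J}
            for A in samples:
                z = sum(p[j] for j in A)            # normalizer in Eq. (11)
                for j in A:
                    total[j] += p[j] / z            # p_{ji}^{(n)}, Eq. (11)
            p = {j: total[j] / M for j in J}        # Eq. (12)
        posteriors = [{j: p[j] / sum(p[k] for k in A) for j in A}
                      for A in samples]
        return p, posteriors

In the toy example samples = [{'a', 'b'}, {'a'}, {'b', 'c'}], the iteration approaches \(p_a=p_b=1/2\) and \(p_c=0\): the value 'b' is selected with near certainty in the third set, because 'b' also receives support from the first set while 'c' appears nowhere else.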

Possible applications of iteration (11)–(12), which we call the EM selection algorithm, cover unsupervised disambiguation tasks where the number of different values of \(Y_i\) is very large but the internal ambiguity rate (i.e., the typical cardinality \(|Y_i|\)) is rather small and the alternative choices within \(Y_i\) (i.e., the values of \(Z_i\)) are highly repeatable. There may be many applications of this kind in NLP and bioinformatics. To our knowledge, however, we present the first rigorous treatment of this particular selection problem.

In this appendix, we show that the EM selection algorithm belongs to the class of expectation-maximization (EM) algorithms. For this reason, our algorithm resembles many instances of EM used in NLP, such as the Baum-Welch algorithm for hidden Markov models (Baum 1972) or linear interpolation (Jelinek 1997). However, the distinctive feature of EM selection is normalization (11), which is performed over varying sets \(A_i\), unlike in the typical case of linear interpolation. Moreover, the local maxima of the respective likelihood function form a convex set, so there is no need to take much care in initializing iteration (11)–(12), unlike, e.g., for the Baum-Welch algorithm.

To begin with, we recall the universal scheme of EM (Dempster et al. 1977; Neal and Hinton 1999). Let P(Y|θ) be a likelihood function, where Y is an observed variable and θ is an unknown parameter.

For the observed value Y, the maximum likelihood estimator of θ is

$$ \theta_{\rm MLE}=\mathop{\hbox{arg max}}\limits_\theta P(Y|\theta). $$

When the direct maximization is impossible, we may consider a latent discrete variable Z and function

$$ Q(\theta',\theta'') =\sum_{z} P(Z=z|Y,\theta')\log P(Z=z,Y|\theta''), $$

which is a kind of cross entropy function. The EM algorithm consists of setting an initial parameter value θ1 and iterating

$$ \theta_{n+1}=\mathop{\hbox{arg max}}\limits_\theta Q\left(\theta_{n},\theta\right) $$
(13)

until sufficient convergence of \(\theta_n\) is achieved. It is a general fact that \(P(Y|\theta_{n+1})\ge P(Y|\theta_n)\), but EM is worth considering only if maximization (13) is easy.

Having outlined the general EM algorithm, we come back to the selection problem. The observed variable is \(Y=(Y_1, Y_2, \ldots, Y_M)\), the latent one is \(Z=(Z_1, Z_2, \ldots, Z_M)\), whereas the parameter seems to be \(\theta_n=(p_j^{(n)})_{j\in J}\). The appropriate likelihood function remains to be determined. We may suppose from the problem statement that it factorizes as \(P(Z,Y|\theta)=\prod_{i} P(Z_i,Y_i|\theta)\). Hence \(Q(\theta',\theta'')\) takes the form

$$ Q(\theta',\theta'')=\sum_{i} \sum_{j} P(Z_i=j|Y_i=A_i,\theta') \log P(Z_i=j,Y_i=A_i|\theta''). $$

Assume now

$$ P(Y_i=A|Z_i=j,\theta)= \left\{\begin{array}{ll} g(A), & j\in A,\\ 0, & \hbox{else}, \end{array}\right. $$
(14)
$$ P(Z_i=j|\theta)=p_j $$
(15)

for \(\theta=(p_j)_{j\in J}\) and a parameter-free function \(g(\cdot)\) satisfying

$$ \sum_{A\in 2^J} {\mathbf 1}_{\{j\in A\}} g(A) =1, \quad\forall{j\in J}, $$
(16)

where

$$ {\mathbf 1}_{\{\phi\}}=\left\{ \begin{array}{ll} 1, & \phi \hbox{ is true},\\ 0, & \hbox{else}. \end{array}\right. $$

For example, let \(g(A)=q^{|A|-1}(1-q)^{|J|-|A|}\), where \(|A|\) stands for the cardinality of set \(A\) and \(0\le q\le 1\) is a fixed number not incorporated into \(\theta\). Then the cardinalities of the sets \(Y_i\) are binomially distributed, i.e., \(P(|Y_i|-1\,|\,\theta)\sim B(|J|-1,q)\). This particular form of \(g(A)\), however, is not necessary to satisfy (16). The model (14)–(15) is quite speculative. In the main part of this article, we need to model the probability distribution of the reduced parse forest \(Y_i\) under the assumption that the correct parse \(Z_i\) is an arbitrary element of \(Y_i\). In particular, we have to imagine what \(P(Y_i=A|Z_i=j,\theta)\) is like if \(j\) is a semantically implausible parse. We circumvent the difficulty by stipulating in (14) that this quantity is the same as if \(j\) were the correct parse.

Assumption (14) leads to an EM algorithm which does not depend on the specific choice of the function \(g(\cdot)\). Therefore the algorithm is rather generic. In fact, (14) assures that \(P(Y_i=A_i|\theta)=g(A_i)\,P(Z_i\in A_i|\theta)\) and

$$ P(Z_i=j|Y_i=A_i,\theta)=P(Z_i=j|Z_i\in A_i,\theta). $$
(17)

In consequence, iteration (13) is equivalent to

$$ 0= \left.{\frac{\partial}{\partial p_j}} \left[Q(\theta_n,\theta)-\lambda \left(\sum_{j'\in J} p_{j'} -1\right)\right]\right|_{\theta=\theta_{n+1}} = {\frac{\sum_{i=1}^{M} p_{ji}^{(n)}}{p_j^{(n+1)}}}-\lambda, $$
(18)

where \(p_{ji}^{(n)}=P(Z_i=j\,|\,Z_i\in A_i,\theta_n)\) is given exactly by (11).

If the Lagrange multiplier \(\lambda\) is assigned the value that satisfies the constraint \(\sum_{j'\in J} p_{j'}=1\), then Eq. (18) simplifies to (12). Hence it becomes straightforward that iteration (11)–(12) locally maximizes the log-likelihood

$$ L(\theta):=\log P((Y_i=A_i)_{i=1}^M|\theta) =\log\left[\prod_{i=1}^M {\frac{P(Z_i\in A_i|\theta)}{g(A_i)}}\right], $$
(19)

or simply \(L^{(n+1)}\ge L^{(n)}\) for

$$ L^{(n)}:=L(\theta_n) + \sum_{i=1}^{M} \log g(A_i) =\sum_{i=1}^{M} \log \left[\sum_{j\in A_i} p_j^{(n)}\right],\quad n\ge 2. $$
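This monotonicity gives a cheap convergence test. In terms of the em_selection sketch above, the shifted log-likelihood \(L^{(n)}\) can be computed as follows (again an illustration, not the original code):

    import math

    # L^(n) = sum_i log sum_{j in A_i} p_j^(n); nondecreasing over EM
    # steps, so successive values can be compared to detect convergence.
    def shifted_log_likelihood(samples, p):
        return sum(math.log(sum(p[j] for j in A)) for A in samples)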

Moreover, there is no need to take care in initializing iteration (11)–(12), since the local maxima of function (19) form a convex set \({\mathcal{M}}\), i.e., \(\theta,\theta'\in {\mathcal{M}}\Rightarrow q\theta+(1-q)\theta'\in {\mathcal{M}}\) for \(0\le q\le 1\). Hence that function is, of course, constant on \({\mathcal{M}}\). To show this, observe that the domain of the log-likelihood (19) is the convex compact set \({\mathcal{P}}=\left\{\theta: \sum_j p_j=1,\ p_j\geq 0\right\}\).

The second derivative of L reads

$$ L_{jj'}(\theta):={\frac{\partial^2L(\theta)}{\partial p_{j}\partial p_{j'}}} =-\sum_{i=1}^{M} {\frac{ {\mathbf 1}_{\{j\in A_i\}}{\mathbf 1}_{\{j'\in A_i\}} }{\left(\sum_{j''\in A_i} p_{j''}\right)^2 }}. $$

Since the matrix \(\{L_{jj'}\}\) is negative semidefinite, i.e., \(\sum_{jj'} a_j L_{jj'}(\theta)\, a_{j'}\le 0\), the function \(L\) is concave. As a general fact, a continuous function \(L\) achieves its supremum on a compact set \({\mathcal{P}}\) (Rudin 1974, Theorem 2.10). If additionally \(L\) is concave and its domain \({\mathcal{P}}\) is convex, then the local maxima of \(L\) form a convex set \({\mathcal{M}}\subset {\mathcal{P}}\), where \(L\) is constant and achieves its supremum (Boyd and Vandenberghe 2004, Sect. 4.2.2).


Cite this article

Dębowski, Ł. Valence extraction using EM selection and co-occurrence matrices. Lang Resources & Evaluation 43, 301–327 (2009). https://doi.org/10.1007/s10579-009-9100-5
