Compound-internal anaphora: evidence from acceptability judgements on Italian argumental compounds

The particular properties of argumental compounds in Italian pose interesting theoretical challenges, and investigations of possible syntactic operations within this type of complex words have resulted in conflicting conclusions. Regarding compound-internal anaphora, some researchers exclude the possibility that pronouns can refer to the non-head, while others do not. However, these findings have been based on researchers’ intuitions and on occurrences in language corpora, and while intuitions have been shown to give contrasting results, the absence of a grammatical structure in a corpus should not be taken as evidence that the structure is not possible. The present study aims to experimentally determine the possibility of compound-internal pronominal reference based on structural properties of compounds and referential expressions. Judgements were obtained from 140 Italian native speakers who rated the acceptability of sentences containing a pronoun (null or overt) referring to the argument element of an argumental compound. The results indicate that compound-internal anaphoric reference is acceptable in the case of left-headed compounds and, to a somewhat lesser extent, of verb-noun compounds. The argument element of right-headed compounds, however, does not appear to be available to anaphoric reference. Referential expressions also play a role in the degree of acceptability, with left-headed compounds allowing null form anaphora to a greater extent. These results provide new evidence on compound-internal pronominal reference and give important insights into the processing of argumental compounds.


Introduction
Argumental compounds in Italian show features that make them more accessible to syntax than other types of compounds. However, while most of their syntactic peculiarities are well documented, the acceptability of pronominal reference internal to the compound remains disputed.
Generally, claims that compound-internal anaphoric reference is unacceptable in Italian are based on researchers' intuitions or on the scarcity of this phenomenon in corpora. However, the acceptability of compound-internal anaphoric reference should not be dismissed solely on the basis of individual judgments or of the absence of certain patterns in corpus research. Both approaches have intrinsic limits: intuitions may differ from linguist to linguist, and the scarcity of compound-internal anaphora in corpora does not necessarily indicate that it is unacceptable. Moreover, corpus-based analysis does not allow for more fine-grained considerations of (non-)acceptability constraints.
Drawing on the results of an acceptability judgement task, we provide evidence that Italian argumental compounds do allow pronominal reference to the argument element, depending on their structure and on the type of referential expression (i.e., null vs overt pronoun). We show that the position of the head plays a decisive role: compound-internal anaphora is accepted with left-headed compounds and, to a lesser extent, with exocentric compounds, but not with right-headed compounds. Moreover, we find that left-headed compounds allow null-subject anaphora to a greater extent, possibly due to pragmatic factors.
Hence, the test made it possible to single out fine-grained variables that could not otherwise have been observed in corpus-based research. Our results show the benefit of integrating experimental methods with theoretical considerations and corpus-based research.
The paper is organized as follows: Sect. 2 provides a background introduction to argumental compounds in Italian (Sect. 2.1), an overview of in-word anaphora, including some conflicting positions regarding its acceptability in Italian argumental compounds (Sect. 2.2), and current issues in research (Sect. 2.3). In Sect. 3, we present our study, and in Sect. 4 we discuss the results, which suggest that native speakers accept compound-internal anaphora, with important differences depending on the position of the head and the type of referential expression. Section 5 presents our conclusions.
In this study, we avoid a label focused on the morphological nature of the head and use "argumental compound", following Baroni et al. (2009a). In fact, as Scalise and Guevara (2006) also point out, an argumental interpretation may occur even in the absence of a deverbal element, and deverbal constituents do not necessarily project argument structure. We limit our investigation to the argumental relation, as syntactic considerations underlie our research question and may be crucial in specific syntactic phenomena such as pronominal anaphora. A basic division between argumental and non-argumental compounds was also proposed by Bauer et al. (2013) for English. The division is based on the assumption that argument structure allows a more direct interpretation, being semantically more predictable and constrained, while the interpretation of a predicate-adjunct relation is highly variable and largely determined by the context (see also Mackenzie, 1990; Haspelmath, 2002; Bauer, 2009; Guerrero Medina, 2018, among many others).

Structure of Italian argumental compounds
Italian argumental compounds can be endocentric, i.e., possessing the head inside the compound, or exocentric, i.e., lacking a head constituent. In argumental endocentric compounds, the head selects the non-head (e.g., donatore_HEAD sangue_ARGUMENT 'blood donor', lit. 'donor blood'), while in argumental exocentric compounds the verbal element selects the nominal element based on argumental restrictions (e.g., lava_VERB piatti_ARGUMENT 'dishwasher', lit. 'wash dishes') (Scalise & Guevara, 2006). In endocentric argumental compounds, a noun is selected as the internal argument by another (usually deverbal) noun or nominalization representing the head (Scalise et al., 2005):

(1) trasporto           latte
    transportation.M.SG milk.M.SG
    'milk transportation'

In this case, latte 'milk' is selected by the predicate expressed by the deverbal head trasporto 'transportation'. This structure, where the head is on the left of the compound (N_H N henceforth), is assumed to be the archetypal one in Italian (Scalise, 1990, 1994; Bisetto & Scalise, 1999; Scalise & Fábregas, 2010) and in other Romance languages, as opposed to Germanic languages (Selkirk, 1982; Scalise, 1986; Lieber, 2009; Melloni, 2020). However, the argument can also appear as the first element.
(2) autonoleggio
    car.F.INV-rental.M.SG
    'car rental'

In example (2), auto 'car' is the argument of the nominalized form noleggio 'rental', on the right side of the compound. A right-headed structure (NN_H henceforth) in Italian has been assumed to represent relics of Latin composition or foreign calques (e.g., frutticoltura 'fruit farming', scuola bus 'school bus'; Scalise, 1990, 1994; Masini & Scalise, 2012), to be restricted to a small set of nouns (e.g., auto- as in (2); Iacobini, 2004; Schwarze, 2005; Radimský, 2006; Booij, 2010), or to be subject to phonological constraints (Altakhaineh, 2019). Scalise and Fábregas (2010) claim that the right-headedness of many productive compounds (e.g., autostrada 'highway') represents a learned pattern where the first element is a semi-word (i.e., a learned word that has become a free lexeme), and thus a neoclassical order is present even in words that never existed in the classical languages (see also Iacobini, 2004).
Differences in processing between left- and right-headed compounds have been confirmed experimentally (El Yagoubi et al., 2008; Marelli et al., 2009; Marelli & Luzzatti, 2012; Arcara et al., 2013, 2014). However, right-headed compounds have been argued to represent a productive word-formation process in contemporary Italian (Guevara & Scalise, 2009; Marelli & Luzzatti, 2012; Radimský, 2013a, 2013b, 2015). Radimský (2013b) showed that while NN_H compounds often contain elements derived from neoclassical terms and belong to a specialized lexicon, nowadays they can also be formed with ordinary nouns from the common lexicon, "becoming a vital word-formation paradigm in contemporary Italian" (Radimský, 2013a:44).
Despite their debated status, we included right-headed compounds in our experiment in order to examine their behavior regarding word-internal anaphora and thus shed more light on their properties. Given their increasing presence in the contemporary Italian vocabulary, the position of the head has been argued to be an important criterion for an exhaustive analysis of Italian compounds (Bisetto, 2004; Radimský, 2013b, 2015); hence, for the reasons illustrated here, we believe that our experiment may help answer questions about the nature of this peculiar compound structure.

In Italian, exocentric argumental compounds have a verb + noun structure (VN henceforth). Since neither the verb nor the noun determines the semantic or syntactic properties of the compound, VN compounds do not possess a head (Scalise, 1992b; Bauer, 2010; Masini & Scalise, 2012; Ricca, 2015). These compounds are very productive in the Romance languages (Tekavčić, 1972; Gather, 2001). The syntactic relation between the elements in Italian VN compounds is almost exclusively that of a predicate and its internal argument (see Scalise, 1992b; Scalise et al., 2009, according [...]). The argument of the verb can be either the direct object of a transitive verb, as in (3), or a subject (e.g., batticuore 'heart palpitations', lit. 'pound heart'). However, VN compounds almost exclusively have an agentive interpretation (Bisetto, 1994; Gaeta & Ricca, 2009; Scalise et al., 2009), and a transitive reading is the most common and productive (Bisetto, 1999). Figure 1 shows the typology of Italian argumental compounds.
[...] papers, advertising, bureaucratic documents, web language) and not often in spoken language (Baroni et al., 2009b; Bisetto, 2010, 2015). Their ambiguous nature, at the border between morphology and syntax, has even led some scholars to question whether these structures should be categorized as 'compounds' at all: they are defined as 'compound-like phrases' by Bisetto and Scalise (1999) and Bisetto (2015), and considered to be the remains of 'juxtaposition genitives' of early phases of the language by Delfitto and Paradisi (2009a). Baroni et al. (2009b) do not group these formations into a single class. According to these authors, this structure includes regular compounds (without internal modifiers) as well as instances of "headlinese phrases" (with internal modifiers). Gaeta and Ricca (2009) and Radimský (2015), however, insist that these structures should be included in the group of subordinate compounds instead.
One feature that has been extensively debated is their transparency to insertion: these constructions allow for modification of the head (4a), the non-head (4b) [...]

Modification of the head is considered more problematic by Delfitto and Paradisi (2009a), while Gaeta and Ricca (2009) attest head modification even with non-argumental compounds. Argument modification is more common than head modification (Radimský, 2015), not only with adjectives but also with more complex NPs. Both the head and the non-head may consist of two coordinated nouns, as in (5a) with coordinated heads, in (5b) with two coordinated arguments without the specification of the second, and in (5c), where the two coordinated arguments are modified by an adjective and followed by a relative clause. The acceptability of head deletion under coordination is debated. According to Gaeta and Ricca (2009), it is observed even with non-argumental compounds. Bisetto and Scalise (1999) consider (6a) marginally acceptable [12], while Lieber and Scalise (2006) consider it ungrammatical. According to Delfitto and Paradisi (2009a), it is possible only if the ellipsis is licensed by an indefinite determiner, as in (6b), something that is also suggested by Radimský (2015), who also notes the possibility of head deletion in the absence of a determiner [13], as in (6c) (the examples in (6) are adapted from the ones discussed in these studies): [...]

[...] (Ricca, 2005, 2010; Bisetto, 2015), and even by extremely complex structures (example from Gaeta & Ricca, 2009): [...]

Head deletion has been observed with single, coordinated and modified nouns (Ricca, 2005), and also with very complex NPs (example from Bisetto, 2015): [...]

[12] See also Masini and Scalise's (2012) example: ?Il lavoro consiste in una raccolta-fondi e dati, lit. 'The job consists of collection-funds and data'.
[13] Radimský (2015) emphasizes that the acceptability of (6c) might be due to the interpretation of the objects as two coordinated arguments, i.e., il trasporto [passeggeri e merci] '[passengers-and-goods] transportation', which would make it a case of insertion rather than of head deletion.

In-word anaphora
In his influential paper, Postal (1969), based on introspection about his own dialectal variety of English, identifies several constraints on the acceptability of pronominal anaphora and formulates the generalization that complex words [15] are "anaphoric islands", i.e., they cannot contain a subpart functioning as an antecedent for subsequent anaphora [16]. Hence, while the sentences in (11) are acceptable, those in (12) are, in his view, ungrammatical:

(11) a. Max hunts for wild animals_i but Pete only kills domesticated ones_i.
     b. People with long legs_i don't like people with short ones_i.
     c. Those who teach classical languages_i don't appreciate people who deal with modern ones_i.

(12) a. *Max is a wild-animal_i hunter but Pete only kills domesticated ones_i.
     b. *Long-leg_i-ged people don't like people with short ones_i.
     c. *Classical language_i teachers don't appreciate people who deal with modern ones_i.

[14] It has been emphasized (Bisetto, 2010; Ricca, 2015) that recursive VN compounds should not be confused with structures showing a coordinative relationship between the verbal elements (e.g., lavatergilunotto 'rear window wiper/washer'). The relationship between the elements is in fact structurally different: (10) is a truly recursive compound where the base represents the internal argument of the added verb ([V [VN]]_N), while in the other case the two verbal elements express a coordinative relation and share the same argument ([[VN] [VN]]_N), forming a coordinative compound rather than a subordinate one. Ricca (2015) stresses that recursive [V [VN]]_N structures only exist in compositional constructions, as VV compounds are rare in Romance languages.
[15] In his analysis, "complex words" should not be understood from a morphological point of view only: his investigation touches on many different issues, ranging from morphology to information structure and semantics.
[16] He defines this phenomenon as "outbound anaphora", which is what we investigate in our study. We did not analyze inbound anaphora (i.e., where the referential expression becomes a subpart of a word), as these two phenomena are structurally different (Haspelmath, 2011).
After Postal (1969) made the claim of word-islandhood, a debate arose: "islands" have been argued to be "peninsulas" (Corum, 1973; Browne, 1974; Lieber, 1992), suggesting that this phenomenon does not involve a categorical constraint. Lakoff and Ross (1972) tried to identify factors facilitating outbound anaphora to account for its tendencies and proposed, among other things, that it is more acceptable if the morphologically complex word (in this case, a derived word) containing the antecedent does not c-command the pronoun. Therefore, (13a) is predicted to be less acceptable than (13b):

(13) a. ?*The guitar_i-ist thought that it_i was a beautiful instrument.
     b. ?John became a guitar_i-ist because he thought that it_i was a beautiful instrument.
Their approach, which argues for 'tendencies' rather than 'constraints', is shared by Dressler (1987). He points out that the subparts of words are syntactically inaccessible only if we postulate a unidirectional flow of information, and advocates interactional models instead. He stresses how problematic it is to treat a tendency as an absolute constraint, with ad hoc hypotheses created to confirm such absolutism, and notes that pronominal anaphora can indeed have as its antecedent the argument element of an argumental compound (although he limits this possibility to VN compounds), invoking the important role of semantic transparency, i.e., the clear decomposability of a complex word into its parts. In fact, he specifies not only that the less tightly the lexemes are bonded, the more open to syntax they are (a structural property), but also that the more semantically transparent they are, the more open to syntax they are (a semantic property). Ward et al. (1991) regard in-word anaphora as a gradient phenomenon governed entirely by pragmatic factors. Moreover, contrary to what is claimed by Lakoff and Ross (1972), according to Ward et al. (1991:449) neither the syntactic role of the antecedent nor the morphological relation between antecedent and pronoun is decisive for its acceptability: "the degree to which outbound anaphora is felicitous is determined by the relative accessibility of the discourse entities evoked by word-internal elements, and not by any principles of syntax or morphology".
Based on the results of an experimental study, they show that antecedents of outbound anaphora appear to be more easily accessible if they are already implicitly present in the discourse. They also present an interesting example with an argumental compound:

(14) Although casual cocaine_i use is down, the number of people using it_i routinely has increased.
To account for cases such as (14), they too invoke the notion of semantic transparency.
In this case, cocaine use is easily decomposed because of the interpretation of the argument structure. According to their analysis, since both the predicate use and the argument cocaine are lexically accessible, the discourse entity becomes contextually salient and therefore accessible as the antecedent for a pronoun.

Compound-internal anaphora in Italian argumental compounds
Regarding Italian argumental compounds, opposing views have been proposed. Scalise (1992a) excludes the possibility that one element of a compound can be the antecedent of anaphora. While a word in isolation possesses referential capacity, the same word is assumed to lack this syntactic property when it is part of a compound (in this case a VN argumental compound): [...]

Scalise (1992a) gives example (15b) to demonstrate the postulated anaphoric islandhood of compounds. However, this example is not particularly felicitous, since its ill-formedness can be explained by an abrupt change of subject. Hence, such an example would result in an ill-formed sentence even with an ordinary NP: [...]

[...] In our opinion, (17) does not provide solid proof of unacceptability, because a semantic (and possibly a syntactic) bias may be the reason why this sentence is ill-formed. While 'passenger transportation' refers generically to passengers that can be transported, the act of knowing them in fact implies a specific reference. An example that presents a more natural context for the pronoun does appear more acceptable: [...]

Delfitto and Paradisi (2009a:55-56) also argue that the ill-formedness of (17) is due to factors other than anaphoric opacity and observe that "anaphora is allowed in cases [...] where the resuming pronoun matches the referential features of the non-head constituent to be resumed": [...]

[...] Bisetto and Scalise's (1999) example (17), which may explain its acceptability. First, the compound does not c-command the anaphoric expression, whereas c-command arguably hinders in-word anaphora (Lakoff & Ross, 1972). Second, the referring element questi ultimi_i 'the latter' is not a pronominal form but a full NP, something that is argued to facilitate in-word anaphora (Montermini, 2006). For these reasons, example (19) does not represent true counterevidence to (17).
Lieber and Scalise (2006) reject the possibility of compound-internal pronominal anaphora for NN compounds based on example (17), proposed by Bisetto and Scalise (1999). They acknowledge the long dispute regarding the Lexical Integrity Hypothesis and coreference into complex words, stating that "further investigation is needed on various factors which seem to influence judgments, including differences between derivation and compounding, the type of syntactic construction involved, the typology of the language in question, the productivity of forms, and so on" (Lieber & Scalise, 2006:12). Radimský (2015) points out that Bisetto and Scalise's (1999) comparisons between anaphora in argumental compounds and anaphora in phrases cannot give insights into the relation between morphology and syntax. They compare (17) [...]:

(20) [...] certain.ADJ.F.SG
     'That firm deals with the daily transportation of milk but its freshness is uncertain.'

As Radimský (2015) observes, in (20) the antecedent is a full DP, del latte 'of the milk', whereas in (17) it is a bare noun. However, he does not consider pronominal reference to the non-head acceptable, although he admits the presence of some evidence in corpora 'under certain circumstances', which he explains in terms of discourse phenomena rather than syntactic properties, in agreement with Montermini (2006). He shows an example from Bisetto (2004), who admits that "pronominal reference to the non-head constituent is sometimes possible, even though in sentences that are often peculiar" (Bisetto, 2004:35, our translation): [...]

Bisetto (2004) also states that these structures show no variability in the relation between the constituents, underlining the peculiarity of argumental compounds. Example (21) is cited by Masini and Scalise (2012) as well, to show the anaphoric capacity of non-head elements of N_H N compounds. However, they do not consider pronominal anaphora with VN compounds acceptable.
It is interesting to note that a similar example was used by Dressler (1987) precisely to show its acceptability: [...]

Radimský (2015), who does not accept compound-internal anaphora with NN compounds, also gives an example from Grandi (2006) as an exception, concerning precisely VN compounds: [...]

Grandi (2006:34) describes the structure of VN compounds as showing "a rather low degree of syntactic atomicity, since, in violation of the Lexical Integrity Hypothesis, it allows the relativization of the sole second constituent" (see also Gaeta & Ricca, 2009; Ricca, 2010 for similar considerations). Grandi (2006) also points out that examples such as (24) show that this phenomenon is acceptable not only with new formations but also with compounds that are well established in the lexicon. Regarding the syntactic behaviour of new formations, Ricca (2005) notes that corpora show a permeability of VN compounds to syntax even with nonce words: [...]

Ricca (2005) underlines that a closer look at new formations can provide insights into formation rules, since these instances are not stabilized in speakers' mental lexicon and hence have not undergone idiosyncratic semantic development. Arcodia et al. (2009) consider the phenomenon unusual but do not dismiss it as unacceptable. On the contrary, they acknowledge that the referential capacity of one element strongly challenges views of grammar in which syntactic rules apply only after morphological rules. Baroni et al. (2009b) define the referential capacity of the non-head as one of the peculiar properties of NN argumental compounds, underlining the difference with English, where pronominal reference is not acceptable: [...]

A thorough discussion of the possibility of outbound anaphora in Italian complex words is presented by Montermini (2006). This study analyzes the phenomenon in general, including several kinds of complex words and referential expressions.
The author underlines that demonstratives and full NPs are more acceptable than pronouns, since they involve less referential ambiguity, as in (19). He also points out that the hypothetical universal parameter proposed by Dressler (1987), according to which compounds are more transparent to syntax than derivatives (i.e., complex words formed by attaching bound affixes to a root rather than by combining free lexemes), is questionable, as in Italian it appears to hold for VN compounds much more than for NN compounds. An explanation, according to the author, is the prominent presence of coordinative compounds among NN compounds: the elements of this compound type are on the same syntactic level, which favours referential ambiguity and hence discourages acceptance. This would imply an unbalanced presence of the phenomenon in corpora, rather than a structural difference in acceptability between VN and NN compounds.

Current issues
Even though the previous studies represent important investigations of Italian compounds, we believe that neither individual intuitions nor corpus-based methods can answer the question of whether compound-internal pronominal anaphora is acceptable, and whether there are degrees of acceptability depending on the structure of the compound and the type of referential expression (see the review of the limits of corpus research by Dash & Ramamoorthy, 2019, among others). Experimental evidence is essential to address these questions (see Myers, 2017 on the importance of acceptability judgments).
A methodological issue seems to lie at the root of this uncertainty in the literature. Many researchers draw conclusions from their own intuitions, which are typically shaped by their theoretical positions. In addition to theoretical reflections based on intuitions, much research on Italian compounds is based on corpora. Even though corpus research on Italian compounds has provided and continues to provide important insights (see, among many others, the impressive work of Radimský, 2015), it may not be the best method to investigate specific research questions such as the acceptability of compound-internal pronominal anaphora. As Micheli (2016) underlines, the use of corpora in the investigation of Italian compounds is a challenging research method: compounds in Italian are relatively rare lexical entities and tend to be used in restricted contexts (this appears to be particularly true for argumental compounds; Baroni et al., 2009a, b; Bisetto, 2010, 2015). Due to their low frequency, it is hard to find enough occurrences for linguists to draw conclusions on specific phenomena.
Moreover, negative evidence (or weak positive evidence) in corpus-based research, as Baroni et al. (2009a,b) point out, does not necessarily mean that a phenomenon is unacceptable, because it may reflect other types of bias such as stylistic preferences or pragmatics-related factors.
An experimental investigation of anaphoric reference, however, can answer specific questions and provide insights into structural differences between compounds as well as between referential expressions. This is of crucial importance, especially when reflecting on the conclusions drawn by previous studies. As described in the previous section, Montermini (2006) states that VN compounds are statistically more open to anaphoric reference than NN compounds, because of the high frequency of coordinative NN compounds (which have no dependency relation between the elements, so that reference would be ambiguous). Montermini (2006) expresses doubts about acceptability judgements, stating that these are subject to variation. This variation is precisely what we are interested in, since it appears that we are dealing with tendencies and cannot expect clear-cut answers. He also points out that real instances of word-internal anaphora might be judged unacceptable in an acceptability judgement task. We agree that metalinguistic reflection tends to be more prescriptive in nature; however, we wonder whether this bias could be reduced by explicitly asking informants to think about informal contexts. Finally, if Montermini (2006) is right on this point, our results would, if anything, underestimate the phenomenon, which would then be even more acceptable than our data suggest.
Acknowledging the importance of pragmatic factors (Montermini, 2006; Ward et al., 1991), we note that an acceptability judgment task allows us to keep the same structure across target sentences, so that informants are not biased by differences in information packaging. In every sentence, the pronoun clearly refers back to an element of the compound, and we only ask informants to rate the acceptability of the utterance. Moreover, based on Montermini's (2006) observations on the role of the referential expression, we only used overt direct object pronouns and null subject pronouns, but no demonstratives or full NPs. According to Montermini (2006), the fact that less ambiguous referring expressions are more acceptable than pronouns corroborates a pragmatic account of the acceptability of word-internal anaphora, and this is why we focused on basic forms such as pronouns and zero forms. It is important to underline that Montermini's (2006) analysis encompasses all sorts of complex words and referential expressions; for such a broad scope, compared with our much narrower research question, a corpus-based investigation is more feasible and allows general conclusions to be drawn. However, since we aimed to address more fine-grained issues, it would have been impossible to achieve the same degree of control over the data had we based our investigation on corpora. Montermini (2006) and Radimský (2015) agree that the acceptability of word-internal anaphora is explained by discourse phenomena rather than syntactic properties. We do not deny the role that pragmatic factors play in the acceptability of compound-internal pronominal anaphora. We simply hold (in agreement with Ward et al., 1991) that if this phenomenon were truly unacceptable, there would be no contexts in which it is acceptable: pragmatics cannot turn every unnatural sentence into a natural one.
Moreover, pragmatics may explain why this phenomenon occurs, but it cannot by itself explain how it occurs; since evidence of it has been found in corpora, it is important to analyze the tendencies of compound-internal anaphora, focusing on its form rather than on its function.

Materials
The target argumental compounds were either VN, N_H N or NN_H. We selected items based on the frequency and usage of the compounds and of their argument constituents, and on the morpho-semantic transparency of the constituents. The targets selected for the experiment are listed in Table 1. There were ten of each type, five combined with an overt pronoun and five with a null pronoun, as shown in the table.
In selecting the target items, we aimed to find compounds that were similar in frequency and that contained non-heads that were also similar in frequency. Both the compounds in Table 1 and their argument elements are classified in the online vocabulary Il nuovo vocabolario di base della lingua italiana (De Mauro, 2014) as either fondamentale 'fundamental', di alto uso 'high usage', di alta disponibilità 'high availability', or comune 'common'. In addition, we retrieved corpus frequencies for the compounds and their constituents from itTenTen 2016, a 4.9-billion-word corpus of Internet texts available on SketchEngine (Jakubíček et al., 2013). Figure 2 shows the result. In this figure, the corpus frequencies of the compounds are shown as green boxes in the lower part of the graph, and those of the first and second constituents within each compound type as adjacent boxes above the green boxes. Grey boxes represent the frequencies of the predicate elements of the compounds, and red boxes those of the objects. The frequency range of the compounds is 200 to 5,873, and that of the argument constituents is 56,978 to 1,362,327.
The experimental target sentences consisted of two or more clauses. Two examples are given in (27). The first clause contained one of the compounds, while the last clause contained an overt pronoun (as in 27a) or a null pronoun (as in 27b) referring back to the argument element of the compound in the first clause. Compounds were also matched for the morpho-semantic transparency of their constituents. Moreover, the referring pronoun agreed in number and gender with the argument element but not with the predicate element, as in (28a), so that its antecedent was unambiguous. A total of 20 distractor items were added to the experimental sentences in order to mask the purpose of the experiment and to encourage respondents to use all values on the acceptability scale. Like the target sentences, the distractor items contained compounds. Ten of them were grammatical and ten were not, bringing the total number of sentences presented to the respondents to 50.
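The stimulus design described above is a fully crossed 3 × 2 layout of target items (compound type × pronoun, five items per cell) plus two sets of distractors. The Python sketch below simply enumerates that inventory to make the counts explicit; the item identifiers are hypothetical placeholders, not the actual compounds of Table 1.

```python
from collections import Counter

# Design parameters as described in the Materials section.
COMPOUND_TYPES = ["VN", "N_HN", "NN_H"]
PRONOUNS = ["overt", "null"]
ITEMS_PER_CELL = 5            # compounds per compound type x pronoun cell
DISTRACTORS_PER_STATUS = 10   # ten grammatical and ten ungrammatical distractors


def build_item_list():
    """Return (item_id, category, pronoun) tuples.

    Item ids are hypothetical placeholders, not the compounds in Table 1.
    Distractors get pronoun=None because they lack the overt/null manipulation.
    """
    items = []
    for ctype in COMPOUND_TYPES:
        for pronoun in PRONOUNS:
            for i in range(1, ITEMS_PER_CELL + 1):
                items.append((f"{ctype}-{pronoun}-{i}", ctype, pronoun))
    for status in ("grammatical", "ungrammatical"):
        for i in range(1, DISTRACTORS_PER_STATUS + 1):
            items.append((f"distractor-{status}-{i}", f"distractor_{status}", None))
    return items


items = build_item_list()
targets = [item for item in items if item[2] is not None]
per_type = Counter(category for _, category, _ in targets)

print(len(items), len(targets))  # 50 30
print(dict(per_type))            # {'VN': 10, 'N_HN': 10, 'NN_H': 10}
```

Representing distractors with `pronoun=None` mirrors the fact that they fall outside the pronoun manipulation, which is why they can enter only the first of the two analyses.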

Procedure
The questionnaire was administered online through a web-hosted survey platform. At the outset, informants were told that they were taking part in a study on native speaker judgements and that their task would be to judge the acceptability of 50 sentences on a five-point scale (Dawes, 2008) ranging from completely unnatural to totally natural. Findings from empirical linguistics indicate that Likert scales provide reliable results (Murphy & Vogel, 2008; Weskott & Fanselow, 2011; Juzek, 2015).
Italian N_HN compounds are usually written as separate words, while NN_H and VN compounds are written as single words. This difference in orthography is a potential confounding factor (e.g., Marelli et al., 2015; Juhasz et al., 2005) that we wished to avoid. For this reason, the stimulus sentences were not presented in writing but were generated by a natural-sounding online speech synthesizer. Informants could listen to the sentences as many times as they wanted and had no time limit for responding. They were asked to base their judgements on their intuitions and were informed that there were no right or wrong answers. Finally, participants were told not to judge the sentences by prescriptive criteria but only in terms of how natural they sounded in informal, spoken contexts. After the instructions, informants provided information about their language background, education, gender and age. The sentences were presented in random order.

Results
The test was completed by 93 women and 47 men, all monolingual native speakers of Italian (i.e., Italian was their only native language). Their ages ranged from 23 to 75 years (mean: 51 years). All but two participants reported speaking at least one additional language, mostly English, but also German, Spanish and French. Most lived in Italy at the time of testing, but 13 reported living in another country. The highest educational levels reported by the participants were middle school (1), high school (19), Bachelor's degree (17), Master's degree (81), Ph.D. (16), and other (6).
All respondents rated all sentences. The total number of responses was therefore 7,000: 4,200 to the target sentences and 2,800 to the distractor sentences. Figure 3 shows the response distributions for the target items, separately for those containing null pronouns (top row) and overt pronouns (bottom row).
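The response totals follow directly from the sample and the design, because every participant rated every sentence. The short Python consistency check below, which is not part of the original analysis, recomputes them from the reported figures; the 700 responses per compound type × pronoun cell is derived from these figures rather than stated in the text.

```python
# Sample and design figures as reported in the text.
N_WOMEN, N_MEN = 93, 47
N_TARGETS, N_DISTRACTORS = 30, 20
ITEMS_PER_CELL = 5  # target sentences per compound type x pronoun cell

education = {
    "middle school": 1, "high school": 19, "Bachelor's degree": 17,
    "Master's degree": 81, "Ph.D.": 16, "other": 6,
}

n_participants = N_WOMEN + N_MEN
# Every participant rated every sentence, so the totals are simple products.
total_responses = n_participants * (N_TARGETS + N_DISTRACTORS)
target_responses = n_participants * N_TARGETS
distractor_responses = n_participants * N_DISTRACTORS
responses_per_cell = n_participants * ITEMS_PER_CELL

# The education breakdown should account for every participant.
assert sum(education.values()) == n_participants

print(n_participants, total_responses, target_responses, distractor_responses)
# 140 7000 4200 2800
print(responses_per_cell)  # 700
```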
The distributions in Fig. 3 indicate no clear agreement across participants for any of the three types. However, positive responses appear to outnumber negative responses for N_HN and VN compounds, suggesting that participants may have found these sentences somewhat odd but did not dismiss them as unnatural. Responses to the NN_H compounds, on the other hand, were clearly more negative.
Below, we present the results of two analyses. In the first analysis, the ratings of the target items are compared with those of the distractors, in order to establish whether the target sentences were rated more like the grammatical or more like the ungrammatical distractors. This analysis was done on the full data set. The second analysis focuses on the effect of pronoun (overt or null) on the ratings of the target types, in order to establish whether and how the ratings of the different target types changed depending on whether the pronoun was overt or null. Since distractor sentences did not contain overt or null pronouns, this analysis was done on the target items only. For both analyses, the five scale points were recoded as the values −2 (completely unnatural) to +2 (totally natural). Both analyses consisted of mixed-effects regression modeling, described in more detail below. The analyses were performed in R (version 4.0.2; R Core Team, 2020).

Fig. 3 Response distributions. Bars indicate counts of responses from least to most acceptable.
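The recoding of the five-point scale to the values −2 to +2 centres the scale on its midpoint, so that the sign of a mean rating indicates whether it leans towards natural or unnatural. A minimal Python sketch follows, assuming (this is our assumption, not stated in the text) that raw responses were stored as integers 1 to 5 with 1 = completely unnatural and 5 = totally natural; the example ratings are hypothetical.

```python
from statistics import mean

# Assumed raw coding: 1 = completely unnatural ... 5 = totally natural.
SCALE_MIN, SCALE_MAX = 1, 5
MIDPOINT = (SCALE_MIN + SCALE_MAX) // 2  # 3


def recode(rating: int) -> int:
    """Map a raw 1-5 response to the centred -2..+2 scale."""
    if not SCALE_MIN <= rating <= SCALE_MAX:
        raise ValueError(f"rating must be between {SCALE_MIN} and {SCALE_MAX}, got {rating}")
    return rating - MIDPOINT


def mean_recoded(ratings):
    """Mean of the recoded ratings: > 0 leans natural, < 0 leans unnatural."""
    return mean([recode(r) for r in ratings])


print([recode(r) for r in range(1, 6)])  # [-2, -1, 0, 1, 2]
print(mean_recoded([4, 5, 3, 2, 4]))     # 0.6 (hypothetical responses)
```

Because the recoding is a constant shift, it changes the intercepts of the regression models but not the differences between categories.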

Analysis 1
The effects of item category (fixed effects) were evaluated with the ungrammatical distractors as the reference category to which the other categories were compared (treatment contrast coding). The model also included random intercepts for participants and for items. The overall effect of item category was significant (likelihood ratio test: χ² = 53.174, df = 4, p < .001). Compared with the ungrammatical distractors, ratings were significantly higher for target sentences with VN compounds (EST = 0.937, SE = 0.237, df = 49.95, t = 3.961, p < .001) and for target sentences with N_HN compounds (EST = 1.072, SE = 0.237, df = 49.95, t = 4.532, p < .001), but not for target sentences with NN_H compounds (EST = 0.250, SE = 0.237, df = 49.95, t = 1.057, p = 0.300). Not surprisingly, the difference between ungrammatical and grammatical distractors was significant (EST = 2.081, SE = 0.237, df = 49.95, t = 8.798, p < .001). Figure 4 shows, from left to right, the model-based estimated ratings and 95% confidence intervals for the ungrammatical distractors, the three types of target sentences, and the grammatical distractors. All target sentences were rated higher than the ungrammatical distractors (significantly so for those with VN and N_HN compounds) and lower than the grammatical distractors (especially those with NN_H compounds).
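Under treatment contrast coding, each non-reference category receives its own 0/1 indicator, the reference category (here the ungrammatical distractors) is coded as all zeros, and each estimate (EST) is therefore a difference from the reference. The Python sketch below illustrates this coding and recomputes the Wald ratio t = EST/SE from the reported values; it is an illustration only, not the R model used in the study. The recomputed t values differ slightly from the reported ones because EST and SE are rounded to three decimals.

```python
from typing import Dict

# Item categories in Analysis 1; the ungrammatical distractors are the reference.
LEVELS = ["ungrammatical", "VN", "N_HN", "NN_H", "grammatical"]
REFERENCE = "ungrammatical"


def treatment_dummies(category: str) -> Dict[str, int]:
    """Treatment (dummy) coding: one 0/1 indicator per non-reference level.

    The reference level is coded as all zeros, so its mean is carried by the
    intercept and each coefficient is a difference from the reference.
    """
    if category not in LEVELS:
        raise ValueError(f"unknown category: {category}")
    return {level: int(category == level) for level in LEVELS if level != REFERENCE}


# Reported estimates (difference from the reference), SEs and t values.
REPORTED = {
    "VN": (0.937, 0.237, 3.961),
    "N_HN": (1.072, 0.237, 4.532),
    "NN_H": (0.250, 0.237, 1.057),
    "grammatical": (2.081, 0.237, 8.798),
}

# Wald ratio t = EST / SE, recomputed from the three-decimal values.
t_approx = {level: est / se for level, (est, se, _) in REPORTED.items()}

for level, t in t_approx.items():
    print(f"{level:12s} t = {t:.3f} (reported {REPORTED[level][2]})")
```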

Analysis 2
The second analysis focuses on the target sentences only and on the role of the pronouns in the acceptability judgements. The fixed effects in this analysis were compound type (VN, N_HN or NN_H), pronoun type (null or overt), and their interaction. As in the first analysis, the random effects were random intercepts for items and for participants. As already suggested by the first analysis, sentences containing NN_H compounds were rated lower than those containing VN and N_HN compounds. The overall effect of compound type was significant (likelihood ratio test: χ² = 15.131, df = 2, p = .001). More specifically, the ratings of sentences containing NN_H compounds were significantly lower than those of sentences containing VN compounds (EST = −0.687, SE = 0.199, z = −3.458, p = 0.002; see footnote 20) and those of sentences containing N_HN compounds (EST = 0.822, SE = 0.199, z = 4.137, p < .001). The difference between sentences containing VN compounds and those containing N_HN compounds, on the other hand, was not significant (EST = 0.135, SE = 0.199, z = 0.679, p = 0.776).
Adding pronoun as a fixed effect did not significantly improve the model (likelihood ratio test: χ² = 1.756, df = 1, p = 0.185), and the interaction of pronoun and compound type was only marginally significant (likelihood ratio test: χ² = 5.267, df = 2, p = 0.071). In sum, sentences with VN and N_HN compounds were rated significantly higher than those with NN_H compounds, while the effect of pronoun type was not significant.
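The likelihood ratio tests compare nested models: the statistic is twice the difference in log-likelihood between the fuller and the reduced model, and it is referred to a χ² distribution whose degrees of freedom equal the number of added parameters. The Python sketch below is our own illustration, not the R code used in the study. It computes upper-tail p-values from the closed forms available for df = 1 and for even df, which covers every test reported in the two analyses, and its results agree closely with the reported p-values.

```python
import math
from typing import Dict, Tuple


def lr_statistic(loglik_reduced: float, loglik_full: float) -> float:
    """Likelihood ratio statistic for nested models: 2 * (logLik_full - logLik_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Only closed forms are implemented: df = 1 via the complementary error
    function, and even df via exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i  # builds (x/2)^i / i! incrementally
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("this sketch handles only df = 1 or even df")


# Likelihood ratio tests as reported in Analyses 1 and 2: (chi-square, df).
REPORTED_TESTS: Dict[str, Tuple[float, int]] = {
    "item category (Analysis 1)": (53.174, 4),
    "compound type (Analysis 2)": (15.131, 2),
    "pronoun (Analysis 2)": (1.756, 1),
    "pronoun x compound (Analysis 2)": (5.267, 2),
}

for label, (chi2, df) in REPORTED_TESTS.items():
    print(f"{label}: chi2 = {chi2}, df = {df}, p = {chi2_sf(chi2, df):.3g}")
```

The item-category test of Analysis 1 yields a p-value many orders of magnitude below .001, which is why it is best reported as p < .001 rather than as a rounded zero.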

Discussion
The aim of this experiment was to determine whether Italian native speakers consider compound-internal pronominal reference acceptable, and the degree to which differences in compound structure and referential expression affect its acceptability.
The results of Analysis 1 (Sect. 3.3.1) suggest that compound-internal anaphora is largely acceptable for N_HN and VN compounds, but not for NN_H compounds. This difference was expected and is in line with psycholinguistic evidence. Marelli et al. (2009), for instance, investigated priming effects with N_HN, NN_H and VN compounds and found that, while the mental representation of left-headed and exocentric compounds is tied to both constituents, that of NN_H compounds is strongly tied to the head. This corroborates the theoretical proposal of Di Sciullo and Williams (1987), according to whom left-headed and VN compounds reflect the lexicalization of syntactic structures (a 'flat representation'), while right-headed compounds are true morphological objects (a hierarchical representation). Marelli et al. (2009) argue that the internal syntactic structure of VN compounds makes them similar to VPs, in which the verb is the most important element, both syntactically and semantically. El Yagoubi et al. (2008) and Arcara et al. (2013) also found neurolinguistic evidence of a higher processing load for NN_H compounds than for N_HN compounds. El Yagoubi et al. (2008) propose that this is because the internal order of left-headed compounds mirrors the canonical order of syntactic constituents in Italian, thus drawing conclusions similar to those of Marelli et al. (2009). Interestingly, Arcara et al. (2013) show that canonical order does not fully explain compound processing, since VN and NN_H compounds appear to be similarly affected by decomposition effects. They argue that this might be explained in terms of different grammatical properties leading to a different integration of the constituents, but also in terms of the productivity of these compound types in Italian, which may create expectations about their orthography (see also Arcara et al., 2014).
As the authors underline (in line with Marelli & Luzzatti, 2012), compound processing often involves the interaction of parallel morphological and semantic information; since N_HN and VN compounds differ morphologically and semantically, this arguably plays a role in their processing. For these reasons, we think that limiting the analysis to argumental compounds is important in order to compare similar syntactic and semantic relations.
Regarding further differences between left- and right-headed compounds, Radimský (2013a,b) notes a strong correlation between orthography and the position of the head. In his investigation of "mirror compounds" (i.e., compounds that can be either left- or right-headed, e.g., radio.giornale_H vs. giornale_H radio 'radio news'), he observes that right-headed compounds are consistently spelled as one word (i.e., 'tight compounds'), while left-headed compounds are spelled as two (i.e., 'loose compounds'). While orthography is not a reliable criterion for establishing the degree of 'wordhood' of a linguistic element, it may reflect native speakers' linguistic intuitions; hence, the orthographic fusion of the constituents might be an indicator of unity (Tollemache, 1945; Iacobini, 2010; Gaeta, 2011), reflecting "different mental word-formation models in the mind of language users" (Radimský, 2013a: 48).
We can therefore interpret our results as reflecting the lower accessibility of the argument element of NN_H compounds: being less strongly tied to the mental representation of the compound, this element is less available for pronominal anaphora.
These results raise interesting questions about how complex words are built. If we assume that lexemes are the basis of compounding, our results become difficult to account for. Lexemes are abstract units of lexical organization and lack grammatical properties (e.g., definiteness); hence, they cannot have referential status. Montermini (2010) argues that in some cases (e.g., argumental compounds), concrete word forms appear to be the basis of compound formation. However, he also singles out neoclassical compounds as a particular type because their elements are not independent syntactic units, e.g., cardiologo 'cardiologist': building elements such as cardio- would represent a suppletive form of the autonomous cuore 'heart' (in agreement with Guevara & Scalise, 2009), with a [+bound] feature in their lexical representation (Corbin, 1992). Montermini (2010) states that neoclassical compounds are sometimes based on rules different from those of 'native' compounds, and our results suggest that structure plays a greater role than the mere semantic transparency of the elements (we used only semantically transparent elements in our targets).
These considerations are closely interrelated with those on the flat and hierarchical representations of N_HN and NN_H compounds, respectively, and possibly with mirror compounds. We argue that examples such as lavaggio_H auto 'car wash' or noleggio_H video 'video rental' denote a type of activity or event (expressed by the deverbal noun) specialized by its argument (i.e., 'a kind of washing/rental specific to cars/videos'), while the NN_H counterpart (in these cases auto.lavaggio_H, video.noleggio_H) preferably denotes an entity (i.e., 'the place one goes to have one's car washed or to rent videos'). This semantic specialization may lead to increased lexicalization, which would further increase the opacity of the argument element: the metonymic shift would then be systematic, driven by the structural feature of head position. However, as already noted by Ricca (2015) regarding VN compounds, it is difficult to determine whether this semantic shift should be considered the result of word-formation rules or whether it reflects a more general cross-linguistic polysemy between action nouns and the place where the activity is performed (see footnote 21). Additional investigations are needed to establish whether this phenomenon represents a clear, measurable tendency in the language or whether these examples are merely isolated instances.
The very low acceptance of pronominal anaphora with NN_H compounds may be explained in terms of speakers' world knowledge. As Montermini (2006) states, a predictable relationship between the referent and a derived word would be a facilitating condition for in-word anaphora, and he observes that this would explain why geographical nouns and adjectives are often available for in-word anaphora even when they are not morphologically transparent. The Italian ethnic nouns in (29) are not equally transparent (francese 'French' and Francia 'France' vs. tedesco 'German' and Germania 'Germany'), yet both accept in-word anaphora, which Montermini (2006) explains precisely in terms of the degree of predictability between the referent and the derived word, i.e., speakers' world knowledge (see Bresnan, 1971, who argues that subwords are interpreted as antecedents because they are inferred rather than grammatically assigned). However, and this is why carefully designed experiments are crucial, it is not clear which kinds of inferences and inferred antecedents are acceptable, and further research is needed to investigate the role of other possible factors (e.g., word length, familiarity with referents).
An anonymous reviewer raised the question of whether it is possible at all to disentangle pragmatic factors from purely grammatical ones. In this respect, our test cannot resolve all the complex issues concerning the influence of pragmatics. However, all the items in the test were out-of-context sentences with a very similar structure, and yet a preference for some compound structures over others clearly emerges from the results. This supports our assumption that right-headed compounds are not open to compound-internal anaphora, possibly owing to a qualitative difference in their nature.
Our results also show that sentences with overt pronouns as referential expressions were on average rated as more acceptable than those with null pronouns, although the main effect of pronoun type was not significant. A notable exception is N_HN compounds, for which sentences with null pronouns were more accepted than those with overt pronouns. The contributions of information structure and syntactic role are difficult to separate here: Italian is a null-subject language, and in our materials null pronouns were always subjects, while overt pronouns were always direct objects. One explanation for our results may be related to information structure, since syntactic functions are tightly related to informational functions, and a correlation between subjects and topics is well known (Lambrecht, 1994; see footnote 22). However, future studies will need to establish why this is the case only for N_HN compounds and not for VN compounds. This finding might be linked to that of Arcara et al. (2013) on the similar processing of VN and NN_H compounds: N_HN compounds might allow a more syntactic reading, while the higher lexical cohesion of VN compounds may reflect differences in the autonomy, and hence in the referential capacity, of the argument element, which might in turn influence information structure. To shed light on these issues, further work is needed on Italian overt subject pronouns or on languages that allow the omission of both subject and object pronouns.
An anonymous reviewer also suggested further analyses of semantic transparency, since it may play a role in compound-internal anaphora (see Günther et al., 2020). As we discussed, argument structure has been argued to allow for a direct interpretation because of its semantic predictability; however, we agree that a fine-grained analysis of semantic transparency investigating the constraints on compound-internal anaphora would surely yield interesting results, and we therefore recommend it for future research.
Furthermore, it would be interesting to investigate whether sociolinguistic factors may influence compound-internal pronominal reference. Diatopic and diastratic considerations were not the main focus of our study, but it may be fruitful to investigate this issue further.
Our results shed light on a phenomenon that is still debated in linguistic theory. Even if one were to conclude that compound-internal pronominal reference is unacceptable because it is not rated as highly as the grammatical distractors, one would still have to account for its different degrees of acceptability depending on compound structure. Moreover, if we were tempted to conclude that these sentences are judged acceptable only due to 'pragmatic inference', we would also need to explain why pragmatic inference does not succeed to the same extent in all cases.
Our results show that a closer look at experimental data is crucial, not only for well-established phenomena but especially for phenomena that appear to be on the edge of acceptability yet show tendencies and preferences that need to be taken into account. Theory must draw on experimental evidence, and experiments in turn need to be carefully designed to provide nuanced data that theoretical accounts can build on. For our research questions, an acceptability judgment task proved appropriate not only for verifying the acceptability of the phenomenon but also for capturing subtle variation related to compound structure and to the type of referential expression.

Conclusion
Our findings suggest that compound-internal anaphora is largely accepted with N_HN and VN compounds in Italian, whereas the argument element of NN_H compounds appears to be impenetrable to pronominal anaphora. Moreover, a difference in referential expression was suggested, with N_HN compounds allowing null subject anaphora more than the other compound types. Although previous findings indicated that NN_H compounds are processed differently from N_HN compounds, the literature has yielded less clear results regarding NN compounds compared to VN compounds. The results support theoretical models positing a qualitative difference between compounding structures based on the position of the head, and they suggest that the type of referential expression can facilitate or inhibit compound-internal anaphora. Future research may focus on the impact of a preceding context on the acceptability of such instances. In any case, our results point to the need for tighter integration of experimental methods with theoretical considerations and corpus-based research in investigating the syntactic properties of compounds.