The fundamental left–right asymmetry in the Germanic verb cluster

Cinque (Linguist Inq 36(3):315–332, 2005; Universals of language today. Springer, Dordrecht, pp 165–184, 2009; Functional structure from top to toe. Vol. 9 of The cartography of syntactic structures. Oxford University Press, New York, pp 232–265, 2014a) observes that there is an asymmetry in the possible ordering of dependents of a lexical head before versus after the head. A reflection on some of the concepts needed to develop Cinque’s ideas into a theory of neutral word order reveals that dependents need to be treated separately by class. The resulting system is applied to the problem of word order in the Germanic verb cluster. It is shown that there is an extremely close match between theoretically derived expectations for clusters made up of auxiliaries, modals, causative ‘let’, a main verb, and verbal particles. The facts point to the action of Cinque’s fundamental left–right asymmetry in language in the realm of the verb cluster. At the same time, not all verb clusters fall under Cinque’s generalization, which, therefore, argues against treating all cases of restructuring uniformly.

the extended projection of the noun (DP). The theory is descriptively extremely successful in that, among the elements it applies to, it allows the neutral word orders within DP that are actually attested and disallows those that are not. 1 The theory is also explanatory in the sense of Chomsky (1965): variation correlates strictly with readily observable ordering properties which can be used straightforwardly to set parameters (see Abels 2015 for discussion). Cinque's model of neutral word order in the DP is therefore a promising blueprint for a general theory of neutral word order. This paper discusses the concepts needed to generalize Cinque's theory and situates word order variation in the Germanic verb cluster in the context of a Cinque-style general theory of neutral word order. Abels and Neeleman (2012b) shows that, although Cinque's theory is formulated assuming Kayne's (1994) Linear Correspondence Axiom and a cartographic perspective on structures, it effectively relies on the following assumptions: the hierarchy of nominal modifiers is universally fixed; phrase structure obeys the non-tangling condition, which disallows discontinuous constituents (see Ojeda 2006 for discussion and references); 2 movements deriving neutral orders within DP end in positions strictly c-commanding their traces, are leftward, and affect only constituents containing the lexical noun. This is a promising blueprint for a general theory of unmarked word order because it allows for a straightforward generalization as follows: Phrase structure obeys the non-tangling condition; movements deriving unmarked word order within the extended projection of a lexical head L end in positions strictly c-commanding the trace, are leftward, and affect only constituents containing L. 3 The specific hierarchical arrangement of elements remains the only parochial assumption within the theory. 
Once the hierarchical arrangement is known, there is a prediction about the possible and impossible orders of elements. (Possibility here is understood relative to universal grammar, of course. Individual grammars will allow only a subset of the orders compatible with universal grammar, often only a single one.) While it may seem obvious how to go from here to a general theory of unmarked word order, Sect. 2 will show that, in this overly simplistic form, the theory is too strong and that satellites of a head need to be grouped into distinct classes to be able to generalize Cinque's basic insight in a descriptively adequate and theoretically pleasing way. We can then profitably apply the class-based theory to verb clusters.
1 Dryer (2009) disputes this. See Cinque (2014b) for a convincing empirical rebuttal.
2 The non-tangling condition is usually formulated as follows (Partee et al. 1990: 442): In any well-formed constituent structure tree, for any nodes x and y, if x precedes y, then all nodes dominated by x precede all nodes dominated by y.
Verb clusters were chosen as the target of analysis here for a number of reasons. First, they present a genuine problem of word order; there is a bewildering variability in cluster orders across the Germanic languages and dialects while the meaning expressed is constant. Second, the data are very rich and well documented (Patocka 1997; Seiler 2004; Eroms et al. 2006; Wurmbrand 2006, to appear; Kaufmann 2007; Barbiers et al. 2008; Dubenion-Smith 2010; Louden 2011). The close relatedness of the Germanic languages simplifies the analytic task, since far fewer potentially interfering differences between the languages and dialects need to be controlled than is the case in a typological study like Cinque's. Third, the traditional analysis of verb clusters in West Germanic assumes that clause union or restructuring is a precondition for verb cluster formation so that all verbs in a cluster are within the same CP. CPs are usually viewed as the maximally extended projection of lexical verbs, just as the DPs studied by Cinque are the maximally extended projection of lexical nouns. Finally, verb clusters have eluded a proper theoretical understanding despite the intensive scrutiny they have received (see Wurmbrand 2006, to appear), which means that any constraints on the analysis of verb clusters we may derive from more general theoretical considerations will place welcome boundary conditions on our theorizing about clusters. Successful integration of verb clusters (or a coherent subset of them) into a general theory of word order along the lines sketched above would, concretely, shed light on the following vexing problems. Wurmbrand (2006, to appear) shows that the question of what moves in cluster formation and in which direction is far from settled. If it can be shown that (a coherent subset of) verb clusters with neutral word order fall under a theory that disallows rightward movement and movement of a constituent excluding the lexical head, as I will argue is the case, then neutral cluster orders must be derived without recourse to those devices. A second long-standing problem (see Öhlschläger 1989; Reis 2001; Wurmbrand 2004a; Cinque 2006b a.o.) concerns the lexical versus functional nature of the verbs involved in restructuring and verb clusters. The general theory of word order developed here and building on Cinque's work treats lexical and functional dependents differently. 
It will be shown that along with aspectual and passive auxiliaries, modals and causative 'let' pattern as functional dependents of lexical verbs while perception verbs, verbs of motion, phase verbs and other clustering verbs come out as lexical. 4
The rest of the paper is structured as follows. The next section contains the conceptual heart of the paper. It goes over Cinque's generalization concerning pre-head vs. post-head asymmetries and introduces Abels and Neeleman's version of Cinque's theory of Greenberg's Universal 20. Echoing a point made repeatedly in commentary on Cinque (1999) (Bobaljik 1999; Svenonius 2002; Nilsen 2003), I show in Sect. 2.2 that Cinque's generalization is false as formulated and leads to paradoxes. The problem is traced to a failure to relativize the generalization according to morphosyntactic classes. A suitable reformulation is attempted in Sect. 2.3. Section 3 evaluates the resulting theory against data from three-verb clusters and concludes that clusters made up of auxiliaries, modals, causatives, and a single lexical verb neatly fit into the theory, while clusters involving an expanded set of verbs do not. The section also suggests integrating verbal particles into the system. Section 4 expands the empirical investigation to four-element clusters made up of auxiliaries, modals, causatives, and a lexical verb. The data lend further support to the theory. Section 5 concludes with a discussion of the findings, the prospects of and challenges for a general theory of unmarked word order, and some remarks on the concrete implementation of the theory proposed in Sect. 2.3. The Appendix contains a detailed but ultimately inconclusive discussion of a problematic data point from the Low German variety described in Bölsing (2011).

2 Elements of a universal theory of neutral word order

The fundamental left-right asymmetry
Example (1) is a simplified illustration of the content of Greenberg's (1963) Universal 20, which Cinque's theory of word order takes as its starting point. There are four elements here: the noun (n), a descriptive adjective (a), a numeral (num), and a demonstrative (dem). They are shown in four (of the 24 logically possible) orders; three of these are attested as the neutral word order in some languages, the fourth, marked with an asterisk, is not.
(1) a. dem num a n
    b. n a num dem
    c. n dem num a
    d. *a num dem n
Though inaccurate in some of its details, Greenberg's formulation of the word order universal 5 contains the crucial observation that there is a linear asymmetry. Generalizing from the particular categories, we observe that n is the lexical head of its extended projection and that adjectives, numerals, and demonstratives are, for lack of a better term, satellites of the lexical head within its extended projection. Put in these terms, Greenberg's observation says that satellites preceding the lexical head come in a cross-linguistically fixed neutral order while the neutral order allows for cross-linguistic variation when the satellites follow the lexical head. As is easy to verify using partial DP ellipsis as a diagnostic, the pre-head order directly reflects the hierarchical organization of the satellites, in that satellites further to the left are hierarchically higher than and c-command those further to the right (Abels 2015). Cinque (2009) contains the observation in a general form, abstracting away from specific categories. He discusses a fair number of cases that answer to the description of rigid ordering of satellites (S₁-S₃ in (2)) before the lexical head (L₄ in (2)) and variable ordering after it.
(2) a. S₁ S₂ S₃ L₄
    b. L₄ S₃ S₂ S₁
    c. L₄ S₁ S₂ S₃
    d. *S₃ S₂ S₁ L₄
With the noun as the lexical head, Cinque recapitulates in detail the discussion of Universal 20 from Cinque (2005). He also suggests that if we take S₁-S₃ to be attributive adjectives of size, color, and nationality, respectively, the same pattern obtains and that the same is true if we take directional prepositions and locative prepositions to be S₁ and S₂. He shows that we can also take the verb to be the relevant lexical head. In this realm, he recapitulates in detail the discussion from Cinque (2014a), 6 where mood, tense, and aspect function as S₁-S₃. He briefly suggests that the ordering of circumstantial PPs of time (S₁), place (S₂), and manner (S₃) follows the same pattern and that the same result obtains if we take S₁-S₃ to be adverbs (using 'no longer', 'always', and 'completely'). Without going into further detail but citing research on verb clusters, he suggests that "auxiliary and restructuring (or clause union) verbs (Cinque 2006b)" form part of the same pattern with respect to the lexical verb (Cinque 2009: 168). That is, he takes auxiliary and restructuring verbs to represent the lexical verb's satellites, calling all of them uniformly 'aux' in the one structure provided. Clause union and restructuring will be studied in detail in this paper. All of the above cases, Cinque suggests, give rise to the pattern in (2) and should be given the same account: The elements called the satellites here are treated as functional heads or their specifiers which, in accord with Cinque's cartographic outlook, are assumed to occupy fixed positions in a cross-linguistically rigid underlying hierarchy which directly produces the order in (2a). The remaining possible orders are derived through movement operations while the impossible ones are excluded through (simple) constraints imposed on such movements.
Although in his account of Universal 20 and its exceptions Cinque (2005) adopts Kayne's (1994) Linear Correspondence Axiom, which necessitates specific assumptions about the X-bar theoretic status and syntax of what are assumed to be functional heads and phrasal modifiers, Abels and Neeleman (2012b) have shown that the account follows already from the following substantially weaker assumptions: (i) The underlying hierarchical arrangement of demonstrative, numeral, descriptive adjective, and noun within the extended projection of the noun is fixed in such a way that the demonstrative c-commands the remaining three elements, the numeral c-commands the adjective and the noun, and the adjective c-commands the noun. (ii) Phrase structure obeys the non-tangling condition. (iii) All movement involved in deriving unmarked word orders must move a constituent containing the lexical head. (iv) All such movements land in a position within the extended projection of the noun so that the moved element strictly c-commands (in the sense of sister containment) the launching site of movement. 7 And (v), all such movements must be leftward. 8 The first two assumptions allow the generation of eight orders, all of which are simply alternate linearizations of the same underlying hierarchical structure.
(3) dem₁ num₂ a₃ n₄    n₄ a₃ num₂ dem₁
    dem₁ num₂ n₄ a₃    a₃ n₄ num₂ dem₁
    num₂ a₃ n₄ dem₁    dem₁ n₄ a₃ num₂
    dem₁ a₃ n₄ num₂    num₂ n₄ a₃ dem₁
The orders given in (1a) and (1b) are amongst the eight orders shown in (3) above. The remaining six are also attested as the unmarked orders in the languages of the world. The constrained set of movement operations allowed by assumptions (iii)-(v) adds the possibility of a further six orders, all of which are, again, attested:
(4) dem₁ n₄ num₂ a₃ t    n₄ dem₁ num₂ a₃ t
    n₄ num₂ a₃ t dem₁    n₄ dem₁ a₃ t num₂
    a₃ n₄ dem₁ num₂ t    n₄ a₃ dem₁ num₂ t
This brings the total of attested orders to 14 out of the logically possible 4! = 24. The system allows no further order to be derived and it therefore explains the fact that the remaining 10 orders do not occur as the unmarked order. 9 Consider for example the order dem₁ a₃ num₂ n₄. It is not attested as an unmarked order according to Cinque (2005) (see footnote 1) and it is disallowed by Greenberg's formulation of Universal 20. That it cannot be derived in the system outlined above becomes obvious by considering the fact that (3) shows all the possible orders generated without movement. The target order dem₁ a₃ num₂ n₄ is not among the eight shown in (3). Thus, in order to derive it without movement, either the non-tangling condition would have to be violated or the underlying hierarchy altered:
(5) [tree diagrams for dem₁ a₃ num₂ n₄]
Alternatively, dem₁ a₃ num₂ n₄ could be movement derived, as illustrated in the following structures, the first of which violates condition (iii) and the second of which violates condition (v). Perhaps a more interesting case is presented by the order num₂ n₄ dem₁ a₃, which is allowed under Greenberg's formulation of Universal 20 but is in fact unattested and correctly ruled out by the theory. This order has an obvious derivation violating the c-command condition (iv) and nothing else. 
9 Somali has the pairwise orders n₄ dem₁, n₄ a₃, num₂ n₄, the triples n₄ dem₁ a₃, num₂ n₄ a₃, and, surprisingly, num₂ dem₁ n₄. All four elements give rise to the order num₂ dem₁ n₄ a₃ (Adam 2012). The correct analysis for this pattern is not clear. It may be that numerals are essentially nouns in the language or that demonstratives have an independent second position requirement that is driving the pattern.
Under the derivation depicted above, n₄ first moves to num₂ in what looks like a head-adjunction pattern, and then pied-pipes num₂ around dem₁ in the second step of movement. 10 It is important to appreciate the fact that this is a theory specifically of unmarked, neutral word order. Cinque (2005: 315-316 fn. 2) and Abels and Neeleman (2012b: 29-30) mention a number of cases where word orders in violation of the generalizations just given are, in fact, attested. For certain cases of adjectives appearing in unexpected positions, Cinque suggests that they are counterexamples only superficially and that the correct classification of them as (reduced) relative clauses can bring them into the fold of the theory. For other cases, the offending orders are marked and alternate with non-offending neutral orders. Cinque (2005: 316 fn. 2) cites the case of the restricted a₃ dem₁ num₂ n₄ orders in Romance, which is available only as an alternative order and only for certain adjectives. He suggests that the order is derived by fronting the adjective without concomitant movement of the noun. It follows that marked orders have derivational options open to them that are barred for unmarked orders, here, movement of a satellite without the head. Though pinning down the notion of what counts as a marked order with precision is not easy, we can assume the following as a first approximation: When only one order is acceptable, it is the neutral order. When more than one order is available, criteria like frequency, default vs. restricted distribution, and information-structural neutrality must be used to identify the neutral order (see Dryer 2007: section 2 for discussion). In my discussion of verb clusters below, I adopt the basic line of thinking going back at least to Lenerz (1977) whereby the neutral order should at least allow focus projection (one of Dryer's pragmatic criteria). 
If more than one order allows focus projection, the neutral order will be the one that imposes the fewest further restrictions (Dryer's distributional criteria). Often, the neutral order is also the most frequent (Dryer's frequency criterion).
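The movement-free part of the system, assumptions (i) and (ii), can be checked mechanically. The following is a minimal sketch, not part of Cinque's or Abels and Neeleman's own presentation: it assumes that a fixed hierarchy [dem [num [a n]]] is built bottom-up and that each satellite is placed at the left or the right edge of the structure built so far, which is what flexible linearization under the non-tangling condition amounts to. The function name `base_orders` and the tuple encoding are illustrative choices.

```python
def base_orders(hierarchy):
    """Linearizations of [x1 [x2 ... [x_{n-1} head]]] without movement.

    `hierarchy` lists the elements from highest to lowest; the last
    element is the lexical head. Each sister pair may be ordered either
    way, and no constituent is ever interrupted (non-tangling).
    """
    *satellites, head = hierarchy
    orders = [(head,)]
    # Merge satellites from the lowest to the highest. Each one attaches
    # at the left or the right edge of what has been built so far.
    for sat in reversed(satellites):
        orders = [new
                  for cur in orders
                  for new in ((sat,) + cur, cur + (sat,))]
    return orders


dp = ("dem", "num", "a", "n")
orders = base_orders(dp)

print(len(orders))                         # eight orders, as in (3)
print(("n", "a", "num", "dem") in orders)  # order (1b) is base-generable
print(("dem", "a", "num", "n") in orders)  # dem a num n is not
```

Under these assumptions, every order in (3) comes out, and an order such as dem a num n can only be obtained by giving up non-tangling or the hierarchy, or by movement.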

It's a class society
In order to arrive at a more general theory of neutral word order from the above sketch of Abels and Neeleman's rendition of Cinque's account of Universal 20 and its exceptions, we need to extract the distinctions and operative concepts from this theory. The most fundamental distinction is that between the lexical head and its satellites. The distinction was needed to formulate the generalization and recurs in the theory as the ban against moving constituents not containing the lexical head. The second crucial ingredient is the notion of a syntactic hierarchy in which all satellites c-command the lexical head and are in asymmetric c-command relations with each other. For the head-final order, asymmetric c-command dictates order directly; but the hierarchy is crucially involved in constraining all directly generated orders, (3), and in the statement of what can and cannot move. In fact, given a lexical head with a number of satellites whose underlying hierarchical arrangement is known, we can easily compute whether a particular linear sequencing of these elements is or is not compatible with the theory. This point will become important below. When Cinque speaks of functional heads and modifiers 'associated' with a lexical head, he implicitly invokes a domain, which I have equated with the lexical head's extended projection. It is the domain property that explains why 'the happy students from these countries' with the order a (happy), dem (these), n (countries) does not violate Universal 20: the adjective does not belong to the same domain as the demonstrative. While this is fairly obvious so far, we also need to figure out what the relevant class of satellites is. Cinque (2009: 165) himself says that his "article discusses a pervasive left-right asymmetry found in the order of modifiers and functional heads associated with distinct lexical heads." Modifiers and functional heads mainly exclude arguments. 
The following moves are implicit here: first, satellites are partitioned into classes; second, it is claimed that there are two relevant classes, one made up of modifiers and functional heads and one of the rest (arguments); third, it is suggested that modifiers and functional heads do and arguments do not fall under Cinque's generalization. This section investigates these claims. While the partitioning of satellites into classes seems to be necessary, the remaining two claims do not survive scrutiny.
As we just saw, Cinque's formulation in terms of modifiers and functional heads excludes one important class of a head's satellites: the arguments. While Cinque does not discuss this explicitly, there are theoretical and empirical grounds for excluding arguments. On the theoretical side, there is, of course, a well-established analytic tradition invoking movement of arguments without the lexical head to position them relative to the head and relative to other elements in the head's extended projection (passive subjects, unaccusative subjects, VP-internal subject hypothesis, raising of agents and patients in event nominals,…). Such analyses are incompatible with the restriction that in the derivation of neutral word order nothing moves except as part of a constituent containing the lexical head.
In addition to these analytic considerations, there are also empirical reasons to exclude arguments. Under a number of fairly well-understood conditions, arguments may occupy a position in the immediate vicinity of the lexical head. For objects this is the complement position, that is, the sister of the verb. A standard argument for the availability of a constituent made up exclusively of object and verb comes from examples like (8). Against the background of the V2 property of German, the examples justify a hierarchical arrangement whereby the object (a DP in (8a) and a CP in (8b)) forms a constituent with the main verb. It follows from Cinque's cartographic assumptions (all satellites have cross-linguistically fixed relative hierarchical positions) that the object underlyingly forms a constituent with the verb to the exclusion of the auxiliary. In other words, the object underlyingly occupies a position that is structurally lower than the position of the non-finite modal (sollen in (8a) and können in (8b)), which, in turn, is more closely associated with the verb than the finite verb (hätte in (8a) and musst in (8b)).
According to the hierarchy of satellites given above, (9a) represents the order S₃ S₁ L₄ S₂ and (9b) the order L₄ S₂ S₁ S₃. Neither can be derived in Cinque's (2005) and Abels and Neeleman's (2012b) system. The examples in (9) would necessitate hierarchical positions for the DP and CP objects that are above the finite verb, contradicting the conclusion reached on the basis of (8). 11 Empirically it is therefore clear that auxiliaries and arguments need to be placed in different classes. Cinque achieves this by removing arguments from the system completely and, implicitly, giving arguments the privilege to move independently of the head of the extended projection in the derivation of neutral word orders. 
12 The discussion above has shown that in order to be descriptively adequate, the theory must distinguish between different classes of satellites. We also saw that Cinque, implicitly, makes a two-way distinction between functional heads and modifiers on the one hand and arguments on the other hand. This is a natural move given Cinque's cartographic outlook on syntax. All functional heads are assumed to be arranged in a universally fixed sequence and all modifiers are introduced as specifiers of such (often abstract) heads. However, the attentive reader will no doubt have noticed that the classes of elements used by Cinque to exemplify the generalization were much smaller and seemed very homogeneous: For the most part, the illustrations pick out morphosyntactically coherent classes such as adverbs, PPs, and auxiliaries. The expectation of the theory is that there should be no interesting interactions between these classes; all are functional heads or their specifiers after all. Mixing the smaller classes used for illustrative purposes should therefore be innocuous. Given that the main focus of this paper is on the verb cluster, we will briefly look at interactions between the positioning of various cluster-forming verbs and other elements used by Cinque to illustrate his generalization. We will discover that mixing morphosyntactic classes is not at all innocuous. The problem illustrated above for objects arises in exactly the same form for other types of satellites.
11 Similar issues arise within DP. See Adger (2013), Belk and Neeleman (2015).
12 Of course, there are many other ways of thinking about the problem raised by examples (8) and (9). One could give up the cartographic assumption that if the object is sometimes generated as the sister of the verb, then it always is. Or one might assume that Cinque's and Abels and Neeleman's ban on rightward movement is mistaken, as in Evers' (1975) classic analysis of verb clusters. Or one might assume that objects move independently of the lexical head. … My point in the main text is simply to show that strict adherence to all conditions leads to problems. Cinque sidesteps these issues by excluding arguments from consideration.
Recall that Cinque (2009) mentions both auxiliary verbs and adverbs to illustrate his generalization. Both occur, as heads and modifiers, respectively, in the extended projection of the verb. We should therefore be able to arrange them along a single hierarchy. Any such attempt predicts that the relative order of auxiliaries and adverbs to the left of the lexical head is fixed. This prediction, it turns out, runs into serious problems. The standard German VP-topicalization example in (10a) indicates that the manner adverb 'beautifully' forms a constituent with the verb to the exclusion of the modal and the auxiliary. The same conclusion is suggested by the Zürich German example in (10b) (Martin Salzmann, p.c.), where the auxiliary, modal, and adverb precede the verb in that order. However, both the standard German (10c) and the standard Dutch (10d) (Ad Neeleman, p.c.) require the adverb to be hierarchically higher than the auxiliary.  The problem we run into here is reminiscent of difficulties for Cinque's (1999) attempt to integrate adverbs and auxiliaries into a single linear hierarchy pointed out in Bobaljik (1999), Svenonius (2002), Nilsen (2003). 13 What we observed in the previous paragraph concerning adverbs carries over to PPs. They are mentioned along with auxiliaries by Cinque (2009) as illustrative of his generalization. By cartographic reasoning, it should be possible to integrate both classes into a single hierarchy. We therefore expect a consistent relative order of auxiliaries and PPs to the left of the lexical head and this ordering must reflect the unique hierarchy. This expectation is not met.  As before, the topicalization data from standard German, (11a), suggests attaching the PP to VP below the modals. Zürich German, (11b), confirms this structure. But the standard German, (11c), and standard Dutch, (11d), examples suggest attaching the PP above the modal and the auxiliary. 
The situation is overall similar to that observed in Bobaljik (1999), Svenonius (2002), Nilsen (2003): When the morphosyntactic classes are looked at in isolation, they organize themselves into a neat hierarchy. The model leads to paradoxes when we try to integrate the classes with each other. A more systematic study would investigate further across-class orderings. Such a study would also need to evaluate empirically how the various within-class orders interact with each other. 14 I will not undertake this task here, instead confining myself to the conclusion that Cinque's generalization about word order relative to the lexical head is well supported within a given class but runs into severe trouble when data mixing classes is taken into account.
With the insight that Cinque's generalization must be restricted to coherent classes in the background, we can speculate that Cinque's exclusion of arguments might have been premature. Consider the relative ordering of subject, object, and verb. According to WALS (Haspelmath et al. 2005), of the six logically possible orders, SOV, SVO, and VSO are common. VOS is rare, OVS extremely rare, and OSV is virtually unattested. If we take the underlying hierarchy to be [S [O V] ] with S and O satellites of the lexical verb, we would expect five orders to be possible and one to be impossible: OSV. The data generally go in the right direction, though languages with reported OSV order would need to be investigated carefully to see whether, for example, S and O are members of the same or of different classes in these languages. If one were a DP and the other a PP, for example, we might expect the possibility of OSV orders. Pearson (2000) reports that when we look at double object constructions across languages, there is only one unmarked order in OV languages and two in VO languages.
This generalization holds only when both objects are morphosyntactically similar, that is, in double object constructions rather than to-datives. Assuming the underlying hierarchy to be [ IO [ DO V ] ], the reportedly impossible order is theoretically disallowed.
Once we restrict our attention to morphosyntactically similarly represented arguments, it seems possible that even arguments might fall under Cinque's generalization after all.
The discussion in this subsection makes the point that the facts are incompatible with a combination of strict cartography (all of a head's satellites occupy fixed positions relative to all other satellites of that head on a universally immutable syntactic hierarchy) and Cinque's (2005) and Abels and Neeleman's (2012b) restrictions on movement. Instead, the satellites behave as though they are members of distinct hierarchies separated by morphosyntactic class. Each of these classes behaves individually as predicted by Cinque's and Abels and Neeleman's theories. Constraints on the ordering across classes are as yet mostly unknown. I return to the issue briefly in Sect. 5.

A formulation
The discussion in the previous subsection has shown that Cinque's word order generalization must be relativized to different classes of satellites. Cinque's own theory recognizes this necessity, implicitly, but only introduces two different classes: arguments and everything else. We have seen that this theoretical move is problematic, because it does not eliminate incorrect (paradoxical) cross-category interactions; smaller categories are necessary. We have also seen that Cinque's conclusions concerning arguments might be rash. They might yet turn out to be internally more well-behaved and not to warrant exclusion.
We can now give a provisional statement of the theory as follows:
(13) Let Lₙ be a lexical head and S₁ … Sₙ₋₁ dependents of Lₙ such that
    a. all Sᵢ are members of the same morphosyntactic class and
    b. all Sᵢ occur in the extended projection of Lₙ and
    c. for all pairs Sⱼ, Sⱼ₊₁, Sⱼ is hierarchically more prominent (scope, constituency, selection, government, …) than Sⱼ₊₁
    then possible neutral orders of S₁ … Sₙ₋₁ Lₙ are all those orders given by flexibly linearizing structure [ S₁ [ S₂ … [ Sₙ₋₁ Lₙ ] … ] ] without violating the non-tangling condition and by moving Lₙ or constituents containing Lₙ to strictly c-commanding positions and to the left.
The number of possible orders thus generable is given by the following formula (the nth Catalan number): Cₙ = (2n)! / ((n + 1)! n!). Zero elements can be ordered in one way. One element can be ordered in one way. Two elements can be ordered in two ways, three elements in five, four in fourteen, and ten in 16,796 ways. The number of orders admissible by this theory grows very fast (faster than eⁿ) but still much more slowly than the space of logically possible orders (n!), which amounts to 3,628,800 when n = 10.
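The counts just quoted (1, 1, 2, 5, 14, and 16,796 for ten elements) coincide with the Catalan numbers, and for small n they can be checked by brute force. The sketch below is one possible reading of (13), not the author's implementation. It makes the following assumptions, all of them mine: elements are the integers 1 to n, with n the lexical head; structure is built bottom-up; each satellite merges at the left or right edge; a constituent containing the head may then move to the left edge of the current root; traces are encoded as `None`; and string-vacuous movement is ignored. All function names are hypothetical.

```python
from math import comb


def overt(tree):
    """Pronounced elements of a tree, left to right (traces are silent)."""
    if tree is None:
        return ()
    if isinstance(tree, int):
        return (tree,)
    return overt(tree[0]) + overt(tree[1])


def extractions(tree, head):
    """Yield (moved, remainder) for each proper subconstituent holding head."""
    if not isinstance(tree, tuple):
        return
    for i, child in enumerate(tree):
        if head in overt(child):
            sister = tree[1 - i]

            def rebuild(x, i=i, sister=sister):
                return (x, sister) if i == 0 else (sister, x)

            yield child, rebuild(None)              # move the child itself
            for moved, rest in extractions(child, head):
                yield moved, rebuild(rest)          # or something inside it


def with_movement(tree, head):
    """Close a tree under leftward movement of head-containing constituents."""
    seen, todo = {tree}, [tree]
    while todo:
        cur = todo.pop()
        for moved, rest in extractions(cur, head):
            span = overt(moved)
            if overt(cur)[:len(span)] == span:
                continue                            # already leftmost: vacuous
            new = (moved, rest)                     # landing site c-commands trace
            if new not in seen:
                seen.add(new)
                todo.append(new)
    return seen


def neutral_orders(n):
    """Orders of 1..n (n = lexical head, 1 = highest satellite) allowed by (13)."""
    head = n
    trees = {head}
    for sat in range(n - 1, 0, -1):
        merged = {t2 for t in trees for t2 in ((sat, t), (t, sat))}
        trees = set().union(*(with_movement(t, head) for t in merged))
    return {overt(t) for t in trees}


def catalan(n):
    return comb(2 * n, n) // (n + 1)


for n in range(1, 6):
    print(n, len(neutral_orders(n)), catalan(n))
```

With three elements the only order the sketch fails to derive is `(2, 1, 3)`; with four elements it excludes, among others, `(1, 3, 2, 4)` and `(2, 4, 1, 3)`, the orders dem a num n and num n dem a discussed earlier.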
Very careful readers will no doubt have noticed that the formulation above backs away from an actual commitment to a mechanism deriving the within-class orders. Instead, the formulation above only claims that the typology of within-class orders is the one predicted by Cinque's and Abels and Neeleman's systems. The point will be taken up in Sect. 5.

Three-element clusters
As explained in the introduction, verb clusters were chosen for this study because they present a long-standing problem of word order. Verb clusters involve non-finite embedding with clause-union effects. These effects are often modeled using the assumption that the restructured infinitives are somehow deficient or incomplete (not full CPs) and might therefore be taken to form a single extended projection. This would allow us to treat them in terms of the theory formulated at the end of the last section. The investigation promises to turn up evidence that bears on the following questions: Does restructuring/clause union go hand in hand with the word order restrictions from the general theory of unmarked word order? If not, is there a type of (a degree of) restructuring that does? Can all restructuring verbs be viewed as non-lexical, that is, as functional satellites of the most deeply embedded lexical verb of the cluster? If not, which ones can?
The formulation of the theory at the end of the last section allows us to check whether the order of a given set of elements falls within the theoretically available range, as long as we know what the hierarchical arrangement of the elements is. We expect it to fall within this range, if these elements are satellites of the same lexical head and are members of the same class. For the initial exploration of verb clusters in the first part of this section, I will remain agnostic on the issue of what counts as a satellite and what counts as a separate lexical head, though cases of clear clause-level complementation that show no signs of clause union, restructuring, or coherence will be excluded from the start. Such structures pattern with example (9b). The verbs in such structures are not satellites of each other and CPs are in a class different from verbs.
I will initially assume that verbs, lexical or otherwise, are all members of the same class. The question of (underlying) hierarchical organization of the verbs is usually not contentious. I will follow the general assumption that a verb needs to c-command another to determine its form, 15 and that the relative scope of verbs is another diagnostic for hierarchical arrangement, as is the ability of a particular group of verbs to appear in the pre-field position, before the finite verb, in a verb-second clause. The finite verb, if there is one, is always the highest. These diagnostics generally give the same results and I am not aware of any serious disagreements about this in the literature. 16 I will follow common practice (going back to Bech 1955) and number the verbs by hierarchical prominence. Thus, the English example 'that he might 1 have 2 been 3 seen 4 ' exhibits the 1-2-3-4 order, and its German counterpart 'dass er gesehen 4 worden 3 sein 2 könnte 1 ' displays the opposite 4-3-2-1 order.
We will now look at three-verb clusters. For such clusters there are five theoretically expected orders and one which is ruled out. The underlying hierarchy can be linearized in the following four ways: 1-2-3, 1-3-2, 2-3-1, and 3-2-1. There is one additional order requiring movement: 3-1-2. The remaining logically possible order is 2-1-3. It cannot be generated as an unmarked order without violating some constraint of the theory.
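The split between base-generated orders and the rest can be checked mechanically for three elements. The following Python sketch, with a hypothetical helper `linearizations`, freely orders each pair of sisters in the structure [ 1 [ 2 3 ] ]. Because every constituent stays contiguous, the non-tangling condition is respected by construction. Movement is deliberately not modeled.

```python
from itertools import permutations


def linearizations(n: int):
    """Orders obtained by freely ordering sisters in [1 [2 ... [n-1 n]]]."""
    orders = [(n,)]  # innermost constituent: the lexical head alone
    for satellite in range(n - 1, 0, -1):
        # Place the next-higher satellite to the left or to the right of
        # every linearization of its sister constituent.
        orders = [
            new
            for rest in orders
            for new in ((satellite,) + rest, rest + (satellite,))
        ]
    return set(orders)


base = linearizations(3)
others = set(permutations((1, 2, 3))) - base

print(sorted(base))    # [(1, 2, 3), (1, 3, 2), (2, 3, 1), (3, 2, 1)]
print(sorted(others))  # [(2, 1, 3), (3, 1, 2)]
```

Of the two orders that free linearization leaves over, 3-1-2 is the one that the theory derives by leftward movement of the lowest verb, while 2-1-3 remains underivable. The sketch itself does not distinguish the two.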

2-1-3 in three-verb clusters
This subsection asks whether 2-1-3 cluster orders are attested in a way relevant to the theory. Discussion of all other orders is deferred to the next subsection, which will concentrate on three-verb clusters with auxiliaries, modals, and causatives. Orders other than 2-1-3 will be shown to be attested for the restricted set of restructuring verbs; they are therefore also attested in the unrestricted case. Here, we will discuss reported instances of 2-1-3 cluster orders.

The 2-1-3 cluster order comes up in a number of works as an alternate order. Schmid and Vogel (2004) give the 2-1-3 order in Aux 1 Mod 2 V 3 clusters as a possible order under certain (different) focus conditions in the following varieties: Rheiderländer Platt spoken in eastern Frisia, St. Gallen Swiss German, and the dialect of Meran. Given that 2-1-3 occurs only as an alternating order, we face the challenge of figuring out whether 2-1-3 is the neutral order or not in Schmid and Vogel's data (see the end of Sect. 2.1 above). 2-1-3 is never the only order relative to any focus condition (Schmid and Vogel 2004: 238) and thus, a fortiori, never obligatory in unmarked contexts with projecting focus. Furthermore, there is no dialect in Schmid and Vogel's sample where 2-1-3 would be compatible with all focus conditions (Schmid and Vogel 2004: 238), though all dialects in Schmid and Vogel's sample have such orders. The latter are thus the unmarked orders by the criteria given at the end of Sect. 2.1; the 2-1-3 order by contrast invariably comes out as marked in these dialects. Schmid and Vogel's attestations of 2-1-3 therefore do not counterexemplify the current theory. Similarly, 2-1-3 is given, but as a marked alternate order, for certain three-verb clusters in the Swabian dialect of Stuttgart in Heilmann (1999). Again, since 2-1-3 is a marked alternate order, I do not consider the variety of Swabian documented by Heilmann as a counterexample to the theory. The logic extends to other reports of 2-1-3 as a marked alternate (see for example Schwalm 2013).
The next type of example where we see 2-1-3 orders is illustrated with the West Frisian example below. The example shows an infinitival verb introduced by the cognate of 'to' to the right of the matrix verb. At the same time there is a clear indication that restructuring is happening, since the object of the most deeply embedded verb ('the book') appears to the left of the higher verb ('forbid'). Such examples are referred to in the literature as the third construction (den Besten and Rutten 1989), remnant extraposition (Santorini and Kroch 1991), or Linksverschachtelung (Kvam 1979, 1980). In West Frisian this particular order of verbs is obligatory (de Haan 2010a). 17 The third construction shares word order amongst the verbs, lack of IPP effects, and presence of 'to' with clearly bi-clausal extraposition structures (den Besten and Rutten 1989), which might be taken as evidence against restructuring. On the other hand, placement of verbal dependents shows evidence of clause union (see Wurmbrand 2001; Wöllstein-Leisten 2001; ter Beek 2008 for detailed discussion). Such examples are clearly problematic from the perspective of the current theory, at least if all restructuring verbs are treated as functional dependents of the main verb.
A similarly problematic case of 2-1-3 orders can be found in Zürich German. Lötscher (1978) reports the possibility of 2-1-3 orders in combinations with an auxiliary as the highest, a benefactive verb, perception verb, or phase verb as the second member and a main verb as the third member of the cluster. The relevant structure is illustrated in (18). As with the third construction discussed above, the word order among the verbs and lack of IPP effect are reminiscent of clearly bi-clausal extraposition structures. 18 The possibility of placing the most deeply embedded object in the matrix domain in (18), however, suggests clause union. Indeed, clause union is obligatory: (19) shows that weak pronouns must be placed in the matrix Mittelfeld (M. Salzmann, p.c.), attesting to the obligatory absence of a Wackernagel position in the embedded domain and, hence, attesting to obligatory restructuring. Moreover, the most deeply embedded infinitive is bare, without 'to', which is usually (Bech 1955) taken as a further sign of obligatory restructuring. 19 Finally, Salzmann (2013b) claims that 2-1-3 and 1-2-3 orders are equally unmarked. All of this makes it difficult to write the 2-1-3 order off as an irrelevant, because marked, alternative order. 20 On balance, the examples from Zürich German seem to be a genuine case of an order that would not be expected if benefactive, perception, and phase verbs are treated as functional satellites of the main verb.
A related, even clearer case is presented by examples discussed in Schallert (2014). The examples again involve 2-1-3 orders with benefactives, perception verbs, and inchoatives as the second member and come from the dialects of the Austrian Vorarlberg and from Liechtenstein, which, like Zürich German, belong to the High Alemannic dialect group. While the 2-1-3 order is only an alternate order with benefactives and perception verbs, 2-1-3 is essentially obligatory with inchoatives. As in Zürich German, there are clear signs of restructuring. Interestingly, there are a few varieties that differ from Zürich German further in that we even find the IPP effect, at least when the main verb is intransitive (Schallert 2014: 195-196).

Louden (2011) shows that in current Pennsylvania Dutch the 2-1-3 order is possible and indeed obligatory in clusters with perception verbs, motion verbs, 22 and benefactives as the second member, but not with modals, which exhibit obligatory 3-1-2 order. With causatives as the second member, we find alternation between 3-2-1 and 2-1-3 with a subtle distinction in meaning that Louden takes to indicate a lexical status of the causative when the order is 2-1-3 and a functional status when the order is 3-2-1. None of the verbs that appear in the 2-1-3 order exhibit IPP effects, which are restricted in modern Pennsylvania Dutch to modals. In earlier Pennsylvania Dutch, IPP effects were also, optionally, found with perception verbs and benefactives, which appeared in the 2-1-3 order. Again, the presence of the IPP effect in the older variety strongly suggests that we are dealing with bona fide restructuring and verb clustering. 23

The examples from West Frisian, Zürich German, the Vorarlberg and Liechtenstein, and Pennsylvania Dutch show that there are verb clusters where 2-1-3 is the unmarked or, indeed, the only order.
If, as is commonly assumed, verb clustering requires restructuring, that is, clause union, in some sense, then these data disprove Cinque's (2009: 168) conjecture that the linear order of (all) "auxiliary and restructuring (or clause union) verbs (Cinque 2006[a])" falls under his generalization about word order. A moment's reflection shows, however, that there can be a number of senses of clause union which Cinque's formulation might be conflating. While it seems obvious that (21a) contains two clauses (two lexical verbs and two CPs) and that (21c) contains only a single clause (one lexical V and one CP), the situation is less clear for a structure like (21b), where there is only one CP but two lexical verbs.

While the structure in (21a) is fully recursive in that all functional categories can be repeated, (21c) is non-recursive, and (21b) is partially recursive. 24 It is plausible to treat (21b) as a case of clause union in some sense, due to the lack of intervening CP (Wurmbrand 2014, 2015). However, C 1 , T 2 , and V 3 should probably not be treated as functional satellites of V 5 in this structure. Instead, TP 4 is a verbal argument of V 3 , that is, TP 4 is a satellite of V 3 rather than V 3 being a satellite of V 5 !

Returning to our discussion of verb clusters, it is not at all clear that in the examples of 2-1-3 order discussed in this subsection only the lowest verb is lexical and the remaining verbs are its functional satellites. The class of verbs that appear in unmarked and/or obligatory 2-1-3 clusters as the second member are perception verbs, benefactives, inchoatives, motion verbs, and a large class of verbs involved in the third construction (see ter Beek 2008 for a list of standard Dutch verbs involved in the third construction). These are plausibly analyzed as lexical verbs rather than as functional satellites of the third member of the cluster. If so, these verbs would induce clause union only in the sense of (21b) (see Wurmbrand 2001 contra Cinque 2006a) and would thus not be counterexamples to the theory of neutral word order. I will assume that this is correct and set these examples aside. I briefly return to 2-1-3 clusters in Sect. 5.

Alas, we cannot infer from these attestations whether they represent marked, unmarked, or obligatory orders and I have to set them aside.
22 Barbiers et al. (2008: map 18b) gives the 2-1-3 order as the only order for 'is gone swimming' in the West Frisian variety spoken on Schiermonnikoog.
23 Unfortunately, Louden (2011) does not discuss restructuring diagnostics systematically.

Three-verb clusters with auxiliaries, modals, and causatives
In the previous subsection we considered possible cluster orders for three-verb clusters without putting further restrictions on the verbs involved. We found that there are instances where 2-1-3 is the unmarked or the only order, but we concluded that nevertheless, they may not be counterexamples to the present theory of neutral word order, since in those cases it is dubious that V 1 and V 2 are satellites of V 3 . In the present subsection, we will restrict our attention to a smaller class of cluster-forming verbs, namely the temporal, aspectual, and passive auxiliaries, the modal verbs (all of which have the morphological quirk of being preterite-presents in German), and the causative 'let' and its cognates. These verbs are in many ways the most central members of the cluster-forming verbs. In the varieties investigated here, modals, 'let', and the future auxiliary always take bare infinitives without 'to' 25 and the other auxiliaries take participles. Both are characteristic of verbs that undergo clause union obligatorily, while infinitives with zu can go either way. The modals and 'let' are also the central verbs for the IPP effect, since, in a given variety, if any verbs show the IPP effect, the modals and 'let' do (Schmid 2005). And if any verbs obligatorily trigger IPP, they include the modals and 'let'.

For clusters made up of Aux 1 , Mod 2 , and V 3 , all five theoretically expected orders are attested as neutral orders while 2-1-3 is completely absent as a neutral order. Barbiers (2005) reports that the translation of Standard Dutch (22) with 1-2-3 order into dialectal variants of Dutch, elicited as part of the SAND project, produced 2-3-1 and 3-2-1 variants in substantial numbers. The 1-3-2 order shows up in small numbers, but with a consistent geographical pattern. Barbiers assumes that 1-3-2 is a possible Dutch pattern for this combination of modals and auxiliaries. The remaining pattern (3-1-2) is virtually absent in the SAND data. Seiler (2004) reports Swiss German data for the same type of sentences, (23), and finds the orders 1-2-3 and 3-1-2 to be clearly attested in his sample. For sentences of the type in (24), Patocka (1997: 278) reports three possible orders in the Bavarian dialects of Austria: 1-3-2, 3-1-2, and 1-2-3. Standard German also has 1-3-2 as an unmarked order for Aux 1 Mod 2 V 3 structures. Crucially, none of these authors report the 2-1-3 pattern to be possible.

These findings are consistent with Wurmbrand's (2004b, to appear) assessment of the situation. For Aux 1 Mod 2 V 3 clusters she reports 1-2-3 order for Dutch and Swiss German; 1-3-2 orders for Standard German, the Alemannic Vorarlberg dialect, and certain Swiss German speakers; 3-1-2 orders for various German and Swiss German dialects, as well as the Alemannic Vorarlberg dialect; 2-3-1 orders for Afrikaans and, under certain circumstances, West Flemish; 3-2-1 orders for some German dialects and the Alemannic Vorarlberg dialect; and no 2-1-3 orders.

Indeed, for each of these five orders there are dialects where the order is not only attested and unmarked but in fact obligatory. 1-2-3 is the only possible order in a large part of the area covered by the SAND project (Barbiers et al. 2008: 20a). 3-2-1 is obligatory for example in West Frisian (de Haan 2010b; Barbiers et al. 2008: 20a). 1-3-2 appears to be obligatory in a number of the Dutch dialects where it occurs (Barbiers et al. 2008: 20a); it is also the most unmarked order in Standard German for such clusters (Bader and Schmid 2009). 2-3-1 is the standard order in Afrikaans (Robbers 1997: 57) and a number of places west and south of Antwerp (Barbiers et al. 2008: 20a). 3-1-2, finally, is obligatory in some Bavarian dialects (Eroms 2004; Eroms et al. 2006: map 5), in Eastern Hessian (Schwalm 2013: 63 map 8), and in Pennsylvania Dutch (Louden 2011). 2-1-3 is never obligatory with Aux 1 Mod 2 V 3 clusters.

24 Recursivity distinguishes the structures only on the customary but not logically necessary assumption that extended functional projections are a linearly ordered set. The linearity assumption is fairly often violated in practice (see for example Rizzi 1997; Belletti 2005; Jayaseelan 2001).
25 A reviewer points out that this definition, for better or for worse, excludes verbs like Dutch hoeven 'need', English have to, etc. which do not appear with bare infinitival complements.
The remaining cluster types that can be constructed from auxiliaries, modals, and main verbs are the following: Mod 1 Aux 2 V 3 , Mod 1 Mod 2 V 3 , and Aux 1 Aux 2 V 3 . They show less variability in ordering (see Wurmbrand to appear: table 2 for an overview). Instead of all five orders that we found with Aux 1 Mod 2 V 3 clusters, there are only four orders. The absence of 2-1-3 is unsurprising at this point, but attestations of 2-3-1 as unmarked or obligatory are also missing.
The unexpected 2-1-3 order is not attested as an unmarked order for these types of clusters. When it is claimed to occur, it is a marked, alternative order (Schmid and Vogel 2004; Heilmann 1999). The examples involve causative 2 -modal 1 -verb 3 and there are clear restructuring diagnostics present in the form of the positioning of weak pronouns. Unfortunately, Höhle's discussion of these examples, though noting that the order is unusual, doesn't make it clear whether this order alternates with others. I have not been able to consult Höhle's sources yet and must set these examples aside pending further inquiry. 26

Overall then, when clusters are restricted to those consisting of auxiliaries, modals, causative 'let', and one main verb, we find strong support for the theory of unmarked word order from Sect. 2.

Digression: verbal particles
The evidence reviewed in the previous subsection suggests that auxiliaries, modals, and causative lassen should count as functional satellites of the lexical verb. In Sect. 2, we saw that adverbs and PPs as well as argumental DPs and CPs are members of a different class. The question then arises where verbal particles (also known as separable prefixes in the Germanic OV-languages) fall. There are a number of analyses of particles (den Dikken 1995; Neeleman and Weerman 1993; Ramchand and Svenonius 2002) that treat them as low heads in the clausal spine, lower than the lexical verb. Such treatments raise the possibility that particles, when present, might be the lowest verbal head in the clause, with the main verb and the auxiliaries as its satellites. This subsection briefly explores this possibility. The considerations of the next paragraph lend independent initial plausibility to such an approach.
Like verbs and auxiliaries and unlike arguments and adjuncts, verbal particles cannot be scrambled. This generalization is natural if we treat the particles as verbal. Verbal particles, unlike arguments and adjuncts, cannot be left behind in the middle field under partial VP-topicalization. If we treat the main verb as the particle's satellite, then this observation falls together with the observation that auxiliaries and modals cannot be topicalized to the exclusion of the main verb. The impossibility of fronting auxiliaries and modals to the exclusion of the main verb and the impossibility of fronting the main verb to the exclusion of the particle do not fall together if we treat the particle as a (low) satellite of the main verb. Finally, like particle-less main verbs, particles can be fronted to the pre-field, at least under certain circumstances. These considerations provide initial support for the idea explored below. 27

It should be noted first that construing the particle as the true head and the main verb as its lowest verbal satellite does not change the predictions about expected relative orders of main verb, auxiliaries, and modals. This theoretical point is illustrated for Aux 1 Mod 2 V 3 prt 4 clusters in the structures below.
(27) a. Aux 1 -Mod 2 -V 3 , prt 4 variable

All 14 possible four-element structures have been given above. The five desired orders of Aux 1 Mod 2 V 3 remain derivable, the undesired one continues to be ruled out.

27 The data supporting the claims in this paragraph are well known. See Müller (2003) for relevant examples and references, especially regarding the fronting of particles.
The structures above show that the suggested analysis of the particle predicts that the particle can never be to the right of an auxiliary or modal unless the verb is, too. But the converse need not hold: the particle may be to the left of the auxiliary or modal when the verb is further to the right. This prediction is correct and it captures an important generalization about particle placement in Germanic.
We now turn to the more fine-grained facts of particle placement with respect to verbs. In Table 1, expected and attested orders are marked with a 'yes' followed by (a non-exhaustive sample of) the languages or dialects where the order is found. Expected but unattested orders are marked with '!', and unexpected, unattested orders are marked with 'no'. The inclusion of the Germanic VO-languages in the table is justified on the assumption that verbal particles in the VO-languages are essentially the same phenomenon as particles, i.e., separable prefixes, in the OV-languages.
As can be seen in the table, the theory from Sect. 2 is successful in the sense that none of the 10 theoretically excluded orders are attested.
However, the theory fails to predict a second generalization, which can be observed here. The particle must precede the main verb unless the main verb follows all modals and auxiliaries. The precondition is met in the Germanic VO-languages. In the Germanic OV-languages, particles systematically precede the verb. In some sense this is unproblematic. There is no theoretical reason to expect the Germanic dialect space to perfectly fill out the space of possibilities made available by universal grammar. On the contrary, it is surprising when it does, as, arguably, it does with the four verb clusters discussed in the next section. However, this is clearly not an entirely accidental gap; there is a pattern: V 3 before prt 4 is absent except in the VO-languages, which also happen to show left-to-right scope among verbs.
What might explain why V 3 before prt 4 never occurs when V 1 or V 2 (or both) follow? Building on Wagner (2004, 2005a), Abels (2013) explains the relative rarity of 2-3-1 orders in the verb cluster in terms of prosody. The explanation carries over to V 3 before prt 4 when V 1 or V 2 (or both) follow.
The explanation in a nutshell runs as follows: Wagner (2005a) had formulated generalization (28) about prosodic domain formation for the Germanic languages. Wagner's 'functors' are functional satellites in current terminology; a functor's complement is its underlying sister. Applying this generalization recursively to the five syntactic structures of three-verb clusters derivable in the current system, the following prosodic structures are derived, where the level of relative prosodic projection is indicated by the number of *s and the opening parentheses indicate the left edges of prosodic constituents. With only a single exception, each prosodic constituent indicated above coincides with a syntactic constituent: The one prosodic constituent that does not correspond to a syntactic constituent is 3-1 in (29c). In Abels (2013) I take this observation as the basis for the cross-linguistic and cross-constructional rarity of 2-3-1 cluster orders. Given the mismatch between prosodic and syntactic structures, 2-3-1 will be more frequently misparsed than the orders where there is no syntax-prosody mismatch. This higher frequency of wrong parses will make the acquisition and thus historical transmission of 2-3-1 more difficult than transmission of other orders and, thus, militate against a language developing this pattern.
The relevance of this for current purposes is that the syntax-prosody mismatch that we see with 2-3-1 order also characterizes all orders where V 3 comes before prt 4 when V 1 or V 2 (or both) follow. This explains the systematic aspect of the gaps in the distribution we see in Table 1. 28 The order 4-1-3-2 is the one remaining theoretically possible but unattested order. It does not fall under the generalization just discussed. Koopman and Szabolcsi (2000) rule this order out categorically, but it is possible for some speakers I have consulted as a marked alternative (J. van Craenenbroeck, A. Neeleman, L. Haegeman, L. Aelbrecht, p.c.) in Mod 1 Aux 2 V 3 prt 4 clusters. 29 If the parallel between the syntax of verb clusters and that of DPs we are pursuing here is real, then it is not surprising that it is exactly the 4-1-3-2 order that remains unattested. In fact, the difficulty in attesting it can be taken as a further argument in support of the parallelism, since within the DP 4-1-3-2 orders are so rare that Cinque (2005: 320) described their cross-linguistic frequency as "very rare-possibly spurious." I conclude that important properties of particles follow from the assumption that they are the lexical head of the extended verbal projection, with the main verb as a satellite in the sense of the current theory. The theory of word order thus confirms those views of particles that take them to be low heads in the clausal spine.

Word order in four-element clusters
This section turns to neutral word orders in clusters consisting of four elements, where the satellites are restricted to being auxiliaries, modals, and causative lassen. Table 2 summarizes the data collapsed across all cluster types. The first column represents the order, the second indicates whether the particular order is expected under the theory from Sect. 2. The next column repeats information from Table 1 about clusters with particles. The remaining columns deal with verb clusters where the lowest element is a main verb and the three satellites are drawn from the set of auxiliaries, modals, and causative 'let'. Expected and attested orders are marked 'yes', unattested unexpected ones 'no'. The one expected but unattested order and the one unexpected attested order are marked '!'. The next column shows whether a given order is only attested with 'let'.
As can be seen, there is an extremely good match between theoretically derived expectations and attested facts for four-verb clusters. All but one of the expected orders are attested as unmarked orders and only one of the unexpected ones is. The following paragraphs give a bit more detail, moving from clearly attested to unattested via a less secure gray area.

The simplest cases are represented by the ten orders which are expected, attested, and do not require causative 'let' to be attestable. The notes to the table give references to undisputed attestations of the orders as the only or as an unmarked order. The 2-3-4-1 order is not, as far as I know, attested in this sense in the literature, but it occurs in West Flemish Aux 1 Mod 2 Mod 3 V 4 , in cases where the simpler cluster with a single modal would have 2-3-1 order. See Haegeman (1998a, b, 2001) for details on the form and placement of the auxiliary. Despite the lack of the kind of atlas data that are available for three-element clusters, we already see a very good fit between predictions and data for four-element clusters here.

28 The historical development of particle placement in English (Fischer et al. 2000: ch. 6) is fully consistent with this view.
29 My own essentially anecdotal data from the main text does not allow further generalizations about the areal distribution of this marked order. J. van Craenenbroeck suggests that there might be varieties where this order is default; the relevant maps in Barbiers et al. (2008) suggest Sint-Jozef-Olen (Antwerp), Mol (Antwerp) and Grote-Spouwen (Limburg, BE) as candidate locations because they are shown with obligatory 1-3-2 in Mod 1 Aux 2 V 3 clusters and obligatory particle float in "wants to eat up" or "should throw away."

Notes to Table 2:
… (den Besten 1981: 6)
c Standard Dutch (Geerts et al. 1984: 600)
d Though see example (26) and its discussion above
e Mod 1 Aux 2 Mod 3 V 4 clusters in West Flemish (Haegeman 1998a: 277; den Dikken 1994: 83)
f Standard German in Aux.perf 1 Mod 2 Aux.pass 3 V 4 clusters (Bader and Schmid 2009: 214), Stellingwerfs Mod 1 Mod 2 Aux 3 V 4 clusters (Zwart 1995: 9)
g Afrikaans Mod 1 Aux.perf 2 Aux.pass 3 V 4 (Donaldson 1993: 261 ex. 918)
h Zürich German Aux.perf 1 Aux.pass 2 let 3 V 4 (M. Salzmann, p.c.). The order alternates with 1-3-4-2 and 4-3-2-1 but in the latter case with the full participle of 'let'
i Standard German Mod 1 Aux.perf 2 Aux.pass 3 V 4 clusters; West Frisian all clusters (de Haan 2010b)
j West Flemish Aux 1 Mod 2 Mod 3 V 4 clusters (L. Haegeman, p.c.)
k As an alternate in West Flemish Aux.perf 1 Mod 2 Aux.pass 3 V 4 clusters (L. Haegeman, p.c.)
l Standard German Mod 1 Aux 2 let 3 V 4 "Skandalkonstruktion" (see text discussion)
m Afrikaans Mod 1 Aux.pass 2 let 3 V 4 (Robbers 1997: 64 ex. 46)
n Standard German (Bader and Schmid 2009: 214), Vorarlberg (Schallert 2014)
o Lindhorst Low German (Bölsing 2011) (see Appendix for discussion)
p Spontaneously, at low frequency, possibly as alternate, in Wurmbrand's (2004b) study
While the ten orders above are clearly attested as unmarked orders when we use the strictest criteria and include only auxiliaries and modals as satellites, the remaining four orders are expected under the theory but attested only if we broaden the database to include causatives and particles or admit orders that aren't clearly neutral.
Second, the 2-4-3-1 order is attested robustly in West Flemish when the lowest element is a particle, (32a); otherwise it is attested only as an alternate order in Aux.perf 1 Mod 2 Aux.pass 3 V 4 clusters in West Flemish, (32b), where it alternates with 1-2-4-3, 1-4-3-2, and 4-1-2-3. Example (32b) translates as 'that this book has had to be read'.

Third, the Standard German Mod 1 Aux 2 let 3 V 4 clusters, dubbed the "Skandalkonstruktion" in Vogel (2009) because of the unexpected order and the displaced morphology, produce examples of the 4-2-3-1 order, but again only with 'let', (33).

Finally, 3-4-1-2 is the order given in Robbers (1997: 64 ex. 46) for Mod 1 Aux.pass 2 let 3 V 4 clusters. I am not aware of this order as the neutral order in clusters without 'let'.
This covers the 14 orders expected to exist under the current approach. We now turn to the 10 logically possible orders that are expected not to exist as neutral orders under this approach.
For most of the remaining 10 orders the claim that they do not occur as neutral orders in clusters made up of auxiliaries, modals, 'let', main verbs, and particles is fairly unproblematic. Though Schönenberger (1995) does mention 2-1-3-4, 2-1-4-3, and 4-2-1-3 orders among a long list of other alternate word orders for Swiss German Aux 1 Mod 2 Mod 3 V 4 clusters, it is clear that these are highly marked. 4-2-1-3 orders also show up in Wurmbrand's (2004b) elicitation study for Aux.fut 1 Aux.perf 2 Mod 3 V 4 . In Wurmbrand's study, the order is found only in the Austrian dialect group, where, at 10% of the total data, it is the fourth most popular order. I will assume that 4-2-1-3 is not the (most) neutral order for the speakers who produced it, though the issue bears further investigation. The 1-3-2-4 order from example (26) was discussed above and has to be set aside for lack of further information.
This leaves the 2-1-4-3 order as the most problematic potential counterexample to the theory from Sect. 2. The order is attested in Bölsing's (2011) grammar of the Low German dialect of Lindhorst in a number of very complex examples, where the rest of the description would have predicted 2-4-3-1 orders to surface. This itself is interesting and potentially significant, since 2-4-3-1 is the only expected order that I have not been able to find clear attestations of-at least when particles are excluded. I discuss Bölsing's data in the Appendix. Given that there are some clear errors in Bölsing's description, that the examples show a number of eccentricities beyond the word order, and that a number of important questions remain unanswered, I must set this data point, too, aside for the moment, though not without stressing its potential theoretical importance, which should certainly inspire further study.
This concludes the discussion of the attested cluster orders with auxiliaries, modals, 'let', one main verb, and particles. The fit between theory and data is extremely good. The counterexamples from Bölsing (2011) and Höhle (2006) might necessitate a weakening of the theory, though at this point the argument from these examples is too weak to motivate strong conclusions.

Discussion
The first part of the paper has shown that a general theory of neutral word order implementing Cinque's generalization has great promise but needs to be based on classes of satellites. Cinque's own bi-partition of satellites into arguments versus everything else was shown to be too coarse; once the necessity of a more fine-grained classification of satellites is accepted, arguments might find their place in the theory after all. The next sections have demonstrated that this general theory of unmarked word order describes the range of attested cluster orders of modals, auxiliaries, a main verb and verbal particles nearly perfectly; the inclusion of 'let' into the database increases the fit, though we noted problems with potentially unexpected orders, which might, ultimately, lead to the exclusion of 'let'. Relevant cases need to be studied more closely. Verbs that appear in the third construction, perception verbs, benefactives, phase verbs, and motion verbs cannot be integrated as satellites of the lowest lexical verb, despite the fact that they show evidence of clause union. In terms of the lexical-functional dichotomy, we have strong evidence for treating auxiliaries, modals, and 'let' as functional satellites of the main verb; on the other hand, verbs appearing in the third construction, benefactives, phase verbs, etc. are not functional satellites of the most deeply embedded verb. The hypothesis that verbal particles are the lexical head of the clause is also strongly supported by the data; the gaps in the paradigm with particles can largely be motivated on prosodic grounds (Abels 2013).
Looked at from the perspective of verb clusters, we found that not all verb clusters fall under the theory of neutral word order from Sect. 2. We provisionally traced this observation to the fact that not all members of the cluster are satellites of the most deeply embedded verbal head in the sense of the theory (or, less plausibly, they might be members of distinct classes). Under the current view, the traditional diagnostics for clause union, which show clause union to accompany clustering, must be interpreted as follows. They involve clause union not in the sense that all elements in the unioned clause become satellites of the most deeply embedded verb but in the sense that there is a continuous sequence of verbal projections without an intervening, closing-off CP level, a conclusion already suggested by the recursive properties of verb clusters (Huybregts 1976; Shieber 1985).
The comments in the previous paragraph directly reflect the discussion of (21) in Sect. 3.1. The structures presented there give rise to a natural tri-partition of constructions. Borrowing and adapting terminology from Wurmbrand (2001) we can call them full clausal embedding without restructuring, (21a), functional restructuring, (21c), and lexical restructuring, (21b). I have suggested that only auxiliaries, modals, and causative 'let' participate in functional restructuring. The tri-partition generated by (21) therefore does not map onto established classifications of verb clusters directly. In particular, the established notion of a verb cluster is much broader than what is analyzed here as functional restructuring. It includes many verbs subject to the IPP effect but excluded here. These must be lexical restructuring verbs, (21b), as they are clearly restructuring but give rise to 2-1-3 orders. However, there are further divisions within the lexical restructuring class as envisioned here, since this class will include the elements that give rise to tight verb clusters (as determined by diagnostics such as the IPP or displaced zu) as well as the less tightly integrated verbs in the third construction. Various proposals exist that can be adopted or adapted to implement these further distinctions: the size of the infinitive as in Wurmbrand (2001) and its phasal status as in ter Beek (2008) are the most obvious ones. I will leave open the question of how to correctly make the distinction within the class of lexically restructuring verbs.
An issue that has been kept in the background deliberately is the question of implementation, in particular the role of movement in the system. The formulation in Sect. 2.3 sets up an expectation about orders but does not directly commit to an implementation. I will now briefly discuss implementation more directly. I first address the case of movement in the derivation of within-class orders, then I move on to across-class interactions.
For within-class orders Abels and Neeleman (2012b) have shown that their system, allowing base generation of heads, specifiers, and adjuncts to the left or to the right of their respective sisters, predicts the same typology of orders as Cinque's system, where heads, adjuncts, and specifiers are systematically base generated to the left of their sisters. The data discussed here don't, as far as I can see, allow us to further distinguish the two theories. Both theories furthermore agree that the analysis of certain orders involves movement (3-1-2, 4-1-2-3, 1-4-2-3, 3-4-1-2, 4-3-1-2, 4-1-3-2, and 4-2-3-1). Even these movements are somewhat suspect, however, as they lack key properties otherwise associated with movement. Well-established movement analyses are motivated through their effect at both interfaces: the phonology, where movement is visible through word order, and interpretation, where it is visible as an interpretive effect. The movement operations involved here do not seem to have interpretive effects. In this they resemble head movement under most analyses in the government and binding framework and minimalism; head movement, too, is pure word order movement lacking interpretive effects (see Hall 2015: chapter 3 for recent discussion). The lack of interpretive effects of head movement has led to a number of theories that place head movement in the phonology (Boeckx and Stjepanović 2001) or construe it not as an effect of structural change but directly of linearization (extending ideas in Brody 1997, this line is taken in Abels 2000, 2003; Bye and Svenonius 2012; Adger 2013; Hall 2015). Those systems, however, are too weak to generate the full set of orders required under Cinque's and Abels and Neeleman's theories. In particular, the 3-4-1-2 and 4-3-1-2 orders create difficulties for these theories.
A more radical approach avoiding the need for movement motivated strictly by considerations of word order might be based on Williams (2011, 2013), who incorporates something like Bach's (1987) 'wrap' operation from categorial grammar into the derivation. For the moment, movement seems to be the best tool to approach the facts.
An even more complicated set of questions arises when we look at the relative ordering of elements not in the same class. We observed in Sect. 2.2 that, while it may well be the case that PPs, adverbs, arguments, and auxiliaries obey Cinque's generalization relative to other PPs, adverbs, arguments, and auxiliaries, respectively, this is not true across classes. Given the logic of Cinque's and Abels and Neeleman's theories, this leads to one of the following conclusions: (i) The base hierarchy only fixes the relative positions of elements within a class but allows (a certain) freedom of base hierarchy across classes, or (ii) Cinque's and Abels and Neeleman's assumptions about movement are overly restrictive and the system of movement must be enriched. Neither of these two approaches is a priori implausible.
Flexible base hierarchies of (clustering) verbs with respect to their arguments are frequently assumed in theories with argument-passing mechanisms (Hinrichs and Nakazawa 1989; Haider 1993, 2003; Neeleman and van de Koot 2002). Such theories typically also allow base generated scrambling of arguments with respect to adjuncts, another instance of freedom in base generating cross-class hierarchies. While these theories have a fairly clear and well-worked-out account of argument linking, they require further non-standard assumptions about the syntax-semantics interface that have not yet been worked out. This, I take it, is the main thrust of the argumentation in Wurmbrand (2007: section 4).
On the other side are theories that impose rigid base hierarchies. To solve the problems pointed out in Sect. 2.2 without losing the account of the fundamental left–right asymmetry, such theories can allow an additional type of movement: movement that preserves the relative order of elements within the moving class (Abels 2007). What is meant is illustrated in (34).

From the base structure in (34a), (34b-e) can all be derived without disturbing the base order of the adverbs, but (34f) cannot. In other words, the hierarchical order of adverbs in (34b-e) after movement is always the same as before movement: Adv₁ c-commands Adv₂. The order is disturbed in (34f), which is not order preserving in the intended sense. We can now run Cinque's or Abels and Neeleman's theory on the output trees of these movements and generate exactly the prediction from Sect. 2.3.

How plausible is this? Starke (2001) argues that the central idea of relativized minimality (Rizzi 1990, 2001) and related concepts of locality boils down to order preservation (hierarchically construed) within classes of elements. From that perspective, the movements in (34b-e) are well behaved. However, the postulation of these movements is again somewhat suspect (see Salzmann 2011). They don't seem to have the expected LF effects. They also fail to give rise to the freezing effects that might be expected to accompany movement (though see Williams 2002; Abels 2007 for a possible way out).
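The notion of order preservation within a class can be made concrete with a small check. The Python sketch below is an illustration only: it uses linear precedence as a stand-in for the hierarchical (c-command) relation intended in the text, and the element names, the class labels, and the three strings are hypothetical and are not the structures in (34).

```python
def preserves_class_order(before, after, moving_class, class_of):
    """True if members of `moving_class` keep their relative order.

    `before` and `after` are lists of element names; `class_of` maps each
    name to its satellite class. Linear order is used here as a simplifying
    proxy for c-command (an assumption of this sketch).
    """
    def project(string):
        # Keep only the members of the moving class, in order of appearance.
        return [w for w in string if class_of[w] == moving_class]

    return project(before) == project(after)


# Hypothetical elements and classes, chosen only for illustration.
class_of = {"Adv1": "adv", "Adv2": "adv", "Obj": "arg", "V": "head"}
base = ["Obj", "Adv1", "Adv2", "V"]

# Adv1 crosses an element of a different class: adverb order is preserved.
print(preserves_class_order(base, ["Adv1", "Obj", "Adv2", "V"], "adv", class_of))
# Adv2 crosses Adv1: the relative order of the adverbs is disturbed.
print(preserves_class_order(base, ["Adv2", "Obj", "Adv1", "V"], "adv", class_of))
```

The first call returns True and the second False, mirroring the contrast between order-preserving movement and movement that reorders members of the same class.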
Both the flexible base-generation approach and the movement approach are compatible with the data surveyed here. In particular the O-V₁-V₂ orders and V₂-V₁-V₃ orders of non-restructuring and lexical restructuring (Sect. 3.1) can be base generated if argument passing is assumed, where argument passing from daughter to mother is indicated by a superscript θ:

Alternatively, they can be movement derived. (36a) is self explanatory. The first step of movement in (36b) moves VP₃; this is order preserving in the intended sense, since VP₃ is an argument of V₂ and therefore a different class of satellite from the auxiliary represented here as V₁. Taking the output of this first movement as input, the second movement is a standard case of movement allowed by Cinque's theory. The data discussed in this paper offer no new grounds to decide between these options, as far as I can see.

All in all, these considerations just scratch the surface. I have not made a serious effort here to describe, let alone explain, the ordering among satellites in different classes. Descriptively, we know that satellites from different classes can be interspersed without forming a rigid single hierarchy (Bobaljik 1999; Svenonius 2002; Nilsen 2003), on the left side of the head. We also know that they can stack up on different sides of a head in classical nesting patterns and that two sets of satellites can stack up independently on the left in the cross-serial pattern. The cross-serial pattern might be a special (trivial) case of Bobaljik's 'shuffling together'. What interactions there are to the right of the lexical head is less clear. The order amongst satellites of the same class is more flexible after the head but interactions between classes seem to be curiously restricted.
One observation showing post-head restrictions is the impenetrability of the verb cluster, a property which means that verb clusters may be interrupted by nonverbal material to the left of the head but not to its right (Bobaljik 2004 among others). We also find case adjacency to the right of the verb but not on its left (Haider 2005;Janke and Neeleman 2012), resulting in restricted interactions between argument and adjunct placement. Further, Abels (2007) claims that the mirror image of the Swiss German and Dutch cross-serial pattern is never found, which again suggests an asymmetry in the interactions between orders. A systematic study of these questions and a general theoretical analysis do not exist yet.
A final issue of implementation that I would like to mention here and that has also been kept in the background has to do with the way ordering parameters are stated in the grammar. In Abels and Neeleman (2012b: 66-68) this question is discussed. It is suggested that for treelets consisting of a mother, α, and its two daughters, β and γ, the grammar contains ordering statements of the following form: In the structure [α β γ], order β before/after γ. The description in these statements can mention properties of all three nodes in a binary-branching treelet. Where needed, movement is invoked in addition. Movement is generally a dispreferred option in the derivation of neutral word order. There is probably a learning bias against it. This would explain why 3-1-2 orders are rarer as neutral orders than 1-2-3, 3-2-1, and 1-3-2. (On the rarity of 2-3-1, see Abels 2013 and Sect. 3.3.) Harmonic word order patterns are simpler to describe (and learn) than non-harmonic ones, since fewer specific properties of mother and daughter nodes have to be mentioned in the statements, which, in a harmonic word order language, are therefore fewer and more general. The data on language-internal ordering variation within the NP in Cinque (2005) suggest that this way of formulating the ordering parameters allows for a sensible description in the sense that the different orders available in a given variety can be related by adding a single movement to the system needed for the language more generally, though that movement, of course, can be of a nature not available in the derivation of neutral orders (order-changing movement of constituents not containing the head or rightward movement). The system allows several orders to be available as neutral orders in a given variety, in particular, by simply not fixing the relative order of two sisters in a treelet through a linearization statement.
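The way ordering statements fix a base-generated order, and the sense in which harmonic orders need fewer statements, can be illustrated with a short Python sketch. It is a deliberate simplification and not the formulation in Abels and Neeleman (2012b): each statement here mentions only the satellite daughter of a treelet, whereas the statements described above may mention properties of all three nodes. The function names, the default-plus-exceptions encoding, and the labels "before" and "after" are assumptions introduced for illustration.

```python
from itertools import product


def linearize(n, default, exceptions=None):
    """Base-generated order for the hierarchy 1 > 2 > ... > n (n is the head).

    `default` is one general statement ("before" or "after") ordering every
    satellite relative to its sister; `exceptions` maps individual satellites
    to a more specific statement that overrides the default.
    """
    exceptions = exceptions or {}
    string = [n]
    for satellite in range(n - 1, 0, -1):  # attach the lowest satellite first
        side = exceptions.get(satellite, default)
        string = [satellite] + string if side == "before" else string + [satellite]
    return string


def base_generated(n):
    """All orders obtainable from ordering statements alone, without movement."""
    satellites = range(1, n)
    return {
        tuple(linearize(n, "before", dict(zip(satellites, sides))))
        for sides in product(("before", "after"), repeat=n - 1)
    }


print(linearize(4, "before"))                # harmonic: one general statement
print(linearize(4, "after"))                 # harmonic: one general statement
print(linearize(4, "after", {1: "before"}))  # non-harmonic: one extra statement
print(len(base_generated(4)))
```

On this encoding the harmonic orders 1-2-3-4 and 4-3-2-1 each need a single general statement, while 1-4-3-2 needs one additional specific statement. Statements alone yield eight orders for four elements; orders such as 4-1-2-3 or 3-4-1-2 fall outside this set and require movement in addition.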
Whether the same naturalness will characterize accounts of language internal variation in the domain of the verb cluster under the approach taken here remains to be verified.
To sum up one more time, in this paper, I have highlighted some of the concepts involved in Cinque's theory of Universal 20. It was shown that Cinque's approach carries the promise of providing a genuine theory of word order, since most of the concepts generalize easily and with encouraging results. However, we did see that a classification of satellites is crucial for the theory (reinforcing conclusions in Bobaljik 1999; Svenonius 2002; Nilsen 2003) but left the details to future work. Verb clusters were investigated as a particularly recalcitrant test case. The investigation yielded interesting conclusions about the contentious classification of modals and verbal particles. With these conclusions held firm, we have discovered that verb clusters made up of auxiliaries, modals, 'let', a main verb, and verbal particles obey Cinque's generalization. Hopefully, this investigation will contribute to an ultimate theory of neutral word order both empirically and theoretically.

(38) ek nieme an, hei werd₀ kont₂   gaut schläapen₃ hemn₁
     I  assume    he  will  can.sup good sleep.inf  have.inf
     'I assume that he will have been able to sleep well.' (Bölsing 2011: 215)

The expected form for example (37) is therefore the following:

This expected structure is disfavored for two reasons. First, there is an immediate repetition of hemn, a kind of haplology that tends to be avoided. Indeed, Bölsing (2011: 217 fn 32) suggests that this repetition is the reason for the unusual order in a similar example. He fails to notice, however, that the order also appears in examples where repetition of 'hemn' is not the issue. Alas, Bölsing does not indicate whether the haplological form is possible as an alternative. The second reason why (39) is disfavored has to do with prosody. As reviewed briefly above, Abels (2013) suggests that 2-3-1 orders are rare across dialects because they are prosodically phrased as 2 (3 1, that is, in such a way that the prosodic and the syntactic phrasing do not match.
No other order expected under the current theory has this property. The expected 2-4-3-1 order in (39) would have the prosodic bracketing 2 (4 3 1, which again gives rise to a prosody-syntax mismatch. This is the second reason why (39) may be disfavored.

The haplology reasoning seems sound. Indeed, the dialect arguably has two ways to resolve the haplology. One is to shift from the expected order in (39) to the order in (37). The other is to syncopate one of the instances of the auxiliary. Bölsing (2011: 210-211) observes that example (37) alternates with the following form:

(40) hei werd kont      mor'n    emaat    hemn
     he  will could.sup tomorrow mow.ptcp have.inf
     'He will have been able to have mowed tomorrow.' (Bölsing 2011: 211)

What is puzzling about the form, though Bölsing never comments on this property, is the presence of both a verbal participle and the supine of a modal despite the presence of only a single licensor of such forms, namely, the final hemn. This would cease to be puzzling if we assumed spreading of the participial form (as known from Frisian and Norwegian), but there is no other evidence in the dialect of such a process. We can derive (40) from (39) by syncopating one of the occurrences of the auxiliary and solve the problem of how to license both the participle and the supine. It should be noted that the dialect also allows minimally different forms with only the supine and only the participle. Both of them are glossed with the expected meaning involving a single perfect (either above or below the modal), while Bölsing is puzzled by the double perfect meaning of (40) in the absence of two perfect auxiliaries. If (37) was the only problematic case from Lindhorster Platt, we could probably set it aside as an alternate order: The expected neutral order is (39), which we could then claim is pronounced as (40). Indeed, if evidence can be found that (37) is a marked alternative to (40), this would make (37) irrelevant.
However, the reasoning based on haplology does not carry over to the following examples, all of which are expected to show 2-(5-)4-3-1 order but are given by Bölsing with 2-1-(5-)4-3 order. The reasoning that disfavors the 2-(5-)4-3-1 order on the grounds of a syntax-prosody mismatch does carry over, of course.

… have.inf
'The car will have had to have been painted a long time ago.' (Bölsing 2011: 217)

There is no indication in Bölsing's discussion that these orders alternate, though the text does not exclude the possibility and the forms are introduced as 'rare' but 'possible in principle'.
I should point out another eccentricity of these facts. It is a fairly strong generalization (Haegeman and van Riemsdijk 1986; Salzmann 2011) that in verb clusters non-verbal scope bearing elements like negation take scope over (not necessarily all) material to their right but not over material to their left. The Lindhorster forms under discussion here flout this generalization, as can be seen from the scope of negation in (41a). This may suggest that the problematically positioned supines are not at all part of the cluster. This idea is further supported by the observation that the supines of the modal verbs seem to drift towards the Wackernagel position, preceding all other material in the middle field:

'When I still didn't hear any banging at half past five, I was immediately able to draw some conclusions.'

The positioning of the supine kont ahead of the weak pronouns is very unusual and noteworthy here. Clearly, these examples merit further study. In particular the question whether the unexpected orders alternate with expected ones should be looked into. If not, this might indicate that the categorical ban on rightward movement embraced by Cinque should be replaced by a (strong) preference for leftward movement (of obligatory elements), as in Abels and Neeleman (2012a). The position of the supine (and the auxiliary) in an apparent Wackernagel position together with the scope facts might suggest an altogether different approach, though, based on the idea that the supine behaves as a Wackernagel clitic and moves for that reason.