1 Introduction

This article describes the creation of the FinnTransFrame corpus, a part of the FinnFrameNet project, which, in addition to a large annotated frame-semantic corpus of natural language examples, has created a separate corpus of examples translated from English into Finnish. The database of Finnish semantic frames is based on the original English-language FrameNet housed at the International Computer Science Institute in Berkeley, California. The Finnish FrameNet project started by collecting 90,592 frame examples from the original Berkeley FrameNet. The examples represented 866 different frames and the elements that evoke them. The goal was to collect examples that included different frame elements in order to provide the project with a large collection of different uses of frames. The collected examples were translated by professional translators so that each English frame element would have a Finnish equivalent if possible. As part of the translation quality assurance, we examined how the frames translate into Finnish and to what degree the frame elements correspond in the two languages. The FinnFrameNet project has created two separate databases: one is the FinnTransFrame database (Footnote 1) of translated frame examples and the other is the FinnFrame database (Footnote 2) of annotated authentic Finnish sentences that were retrieved from monolingual online language corpora. The FinnTransFrame database was created in order to enable the evaluation of the cross-lingual projection of frames. The other database has been developed for evaluating how the original English frames apply to authentic Finnish rather than translational Finnish. This article introduces the FinnTransFrame corpus, evaluates frame projection on translated samples, and discusses the problems we discovered.
The research question behind the FinnTransFrame corpus was to what extent the various frames of the original Berkeley FrameNet transfer into Finnish in translated examples: what are the main problems, and how can they be categorized? We discuss some reasons why the frame translation method works in most cases and why it does not work for certain frames. The FinnFrameNet project is a part of the FIN-CLARIN consortium that aims to build a common infrastructure for the digital humanities in Finland.

The term frame refers to mental concepts of real-world scenes that consist of elements that often participate in the scene in question. The theory of Frame Semantics was originally introduced by Fillmore (1976, 1982). Ideally, if a translation is accurate, the frames that can be found in a certain sentence should translate from the original language to the target language. In other words, most of the frames are thought to be universal within a certain cultural region. In order to understand a frame, one must always have some amount of background information that helps in interpreting the frame (Fillmore and Baker 2010: 318). For instance, to understand the weekdays and descriptions about the flow of time, one must have information about how time is seen and counted in Western cultures. Most work on Frame Semantics has shown that frames can be used across typologically different languages and that certain semantic domains appear to be very similar across languages. However, it was noted throughout the project that such equivalence could not always be met. It has also been noted before (Čulo 2013) that a construction comparable in function in the target language might lead to a frame change.

Research results obtained by Padó and Lapata (2009) showed a frame match rate of nearly 72% when a test set of example sentences was translated between English and German. A relevant factor in this is an idea called the primacy of the frame, in which the translator aims to preserve the conceptual information connected with a frame in the source language by picking an adequate frame in the target language (Čulo 2013: 144). In an ideal situation, the two frames are maximally comparable. According to Čulo (ibid.), this percentage is likely to vary depending on, for example, the language pair in question. Even though our study does not replicate the work of Padó and Lapata in terms of material for annotation, it was to be expected that not all the frames would translate from the original language to the target language.

2 Berkeley FrameNet

The Berkeley FrameNet (BFN) project (Footnote 3) is based on the theory of Frame Semantics and is creating an online lexical resource for English (Fillmore and Baker 2010; Ruppenhofer et al. 2010: 5). FrameNet focuses on inspecting and analyzing semantic frames, for example situations, events, processes, states, physical and visual characteristics and a myriad of other phenomena. The core element in a frame is the element that evokes the phenomenon or event in question. These words are called frame evoking elements (FEEs). After a certain frame has been identified, the semantic arguments and complements included in the event or phenomenon, called frame elements (FEs), are also analyzed. Null instantiation is a category for annotating absent semantic constituents or frame elements (Fillmore et al. 2003: 245). The term null refers to the fact that a core frame element is absent. Conceptually salient FEs are not always realized as lexical or phrasal material in a sentence, but they have nevertheless been annotated in BFN since they provide relevant insight into the omissibility of semantic material (Ruppenhofer et al. 2010: 24–26). Annotation of so-called null instantiations is a feature of BFN that was not used in the FinnFrameNet project (Lindén et al. 2017).

All the elements in every frame have been described and labeled according to their semantic role or content. Frame elements are divided into focal core elements and less focal, general peripheral elements according to their relevance to the phenomenon described in the frame. Core elements vary greatly from frame to frame, whereas peripheral elements, for example Time and Place, are more general and used in many frames. In BFN, there are also valence tables for each FEE, illustrating how the semantics of a frame can be realized at the syntactic level. FrameNet is the only lexical database that includes such detailed information about the mapping of semantics to syntax.

Frames vary greatly in scale and in the level of abstractness, as well as in the number of frame elements they incorporate. Semantically extensive and general frames often contain finer, more specific subframes. The same sentence can be annotated as part of a wider frame as well as a more specific frame. The word class of the FEE can also be a defining feature. In BFN, the word class sometimes functions as the distinguishing criterion between frames, as is the case with the frames Sounds and Make_noise, which cover nouns and verbs, respectively. A distinction of this type may exist both between subframes and more general ones. Similar to the frames themselves, specific formal criteria may also apply to frame elements. The element Role in the frame Judgment_communication is syntactically defined as follows: ‘Role is used for the capacity in which the Evaluee is judged, and is expressed in as-PPs.’ (Footnote 4) In theory, there can be a limitless number of different frames, and new frames are created and constantly added to the list.

In addition to the English and Finnish FrameNet projects, FrameNet-inspired research also exists for German (Burchardt et al. 2006), Spanish (Subirats and Petruck 2003), Swedish (Borin et al. 2010), Brazilian Portuguese (Duran and Aluísio 2011), Japanese (Ohara et al. 2004), Chinese (You and Liu 2005), Korean (Hahm et al. 2014), French (Meurs et al. 2008), Danish (Bick 2011), Polish (Zawisławska et al. 2008), Italian (Tonelli and Pianta 2008; Lenci et al. 2010), Slovenian (Lönneker-Rodman 2007) and Hebrew (Hayoun and Elhadad 2015). For a list of FrameNets in other languages, see Appendix. For some valuable information about how FrameNets for languages other than English can be structured using different principles, see Boas (2009). For a more detailed overview of the Finnish FrameNet project and other similar projects, see Lindén et al. (2017).

3 Finnish TransFrame corpus

The fundamental objective of FinnFrameNet, a Finnish version of BFN, was to create a new corpus and at the same time test the hypothesis of frame invariance across languages, and more specifically to see how well the frames of the original BFN transfer into Finnish, i.e. how universal the frames actually are. For the compilation of FinnFrameNet, 90,592 English examples were chosen from the original BFN corpus and translated into Finnish. The English text was given with the FEEs and FEs marked as slots to be replaced by professional translators, who were instructed to emphasize language fluency over word-to-word translations. After this, the translations were used by the FinnFrameNet project for automatically searching for corresponding sentences in the National Library Newspaper corpus (Footnote 5) in order to find attested examples of the frames and to avoid using only translated Finnish. The results of the FinnFrameNet database project will not be discussed further in this article. However, attested instances of all frames have been found and annotated. For more information, see Lindén et al. (2017).

The translated examples constitute a separate corpus, known as the FinnTransFrame corpus. Using this translated corpus, we wanted to examine the reasons why some of the semantic frames do not survive the translation process from one language to another, in our case from English to Finnish. Moreover, we wanted to find out what kind of frames did not cross over, and whether the frames in question share some common denominator that explains this. The applied method is a variation of the annotation projection method described in Padó (2007), where the frame elements from the source language have been directly transferred to another language. The aim is to measure the degree of cross-lingual parallelism, which is the main assumption underlying annotation projection (Padó 2007: 2). It assumes that the annotation is parallel across languages to some degree. If the degree of cross-lingual parallelism is high, automatic annotation of the target language may be possible, which would help in creating additional frame annotated data. This is important as manual annotation is rather costly.

The annotation projection method has mostly been applied to languages closely related to each other, for example English–German (Padó and Lapata 2009). Our aim is to study the level of cross-lingual parallelism that can be observed between Finnish and English, two languages that are unrelated to each other. Only one earlier study concerning Finnish and the applicability of FrameNet frames exists: Borin et al. (2012) have previously tried to find out how well the FrameNet frames correspond and transfer between Swedish and Finnish. Their method was to use words that are linked in parallel corpora of Swedish and Finnish using a bilingual lexicon, thus differing from the method applied in our research. Their main goal of comparing the transferability of frames between languages not demonstrably related to each other was similar to ours. However, Borin et al. only used a very small number of frames, limiting the inspection to frames most commonly appearing in their Swedish corpus. Therefore, although providing an interesting experiment, their results only say something about the transferability of a limited number of common frames. According to Borin et al. (2012: 14), the automatic word alignment with Finnish is generally seen as a complicated task because of the free constituent order and rich morphology of Finnish. This is certainly true. Some of these problems had to be solved in both the FinnFrameNet and FinnTransFrame projects. Both corpora were created to investigate how many of the original English frames and frame elements are directly reusable in Finnish. Our main goal in this article is to test the parallelism of as many frames as possible, and compare the results between different frames: what kind of frames transfer easily, and which frames are difficult? What are the main reasons for the non-parallelism?

Cross-lingual parallelism holds if the linguistic unit and its translational equivalent receive identical analyses (Padó 2007: 3). It is, however, unrealistic to expect any two languages to be in perfect correspondence, even if they are closely related. Cyrus (2006), Padó (2007) and Padó and Lapata (2009) use the term translational shift for the notion that translations deviate in many ways from their originals, which causes non-parallelism of frames. Translational shifts can be grammatical or semantic in nature. A number of studies (for example Padó 2007: 3) have shown that lexical-semantic annotation exhibits a higher parallelism between languages than syntactic annotation, which is partly due to language-specific syntax annotation conventions. Frames are generally based on conceptual structure (Fillmore 1982). Semantic frames constitute generalizations over surface structure, and are therefore less prone to syntactic variation and syntactic annotation conventions. Our focus was primarily on semantics, but because of our direct annotation alignment method, structural factors turned out to be rather important as well. Below we illustrate an ideal case of frame transfer, outline the method used in the analysis in this study and discuss in more detail the results along with the problem types we encountered. For readers unfamiliar with Finnish, we recommend Wikipedia (Footnote 6) for a general overview of the language, its typology and history. We also recommend the web site Venla—Finnish for foreigners (Footnote 7) for an introduction to Finnish for beginners, or Leila White (2008) for an overview of Finnish grammar.

3.1 Ideal case of frame transfer

Ideally, the frames listed in FrameNet are thought to be universal to a certain degree and thus translate from one language to another, given that the translation is accurate. In our project, we wanted to test the assumption of cross-lingual parallelism (Padó 2007), according to which the annotation of the original sentence should also be applicable to the target language. In an ideal translation example, each frame element matches the original frame, as in Example 1, where the frame elements appear in the same order. As in Example 1, only the frame elements and the frame evoking element were translated.

(1) [Agent Duroc] picked up his broad-brimmed black hat, and SET [Theme it] [Goal upon his head]. (Placing)

    [Agent Duroc]…  LAITTOI      [Theme sen]  [Goal päähänsä].
    Duroc           put.3SG.PST  it.ACC       head.ILL.3POSS

A change in word order is a good example of how a problem for the annotation projection method need not be a problem for the translatability of the frame example. An example is considered an ideal case of frame transfer despite a word order change. Often the word order in Finnish does not match the one in English, resulting in a curious order of the Finnish frame elements. This does not pose a problem per se, since it would be unreasonable to expect that the word order always matches perfectly. This has been solved in the data set by numbering the Finnish elements in the correct order, which is demonstrated with subscript numbers in Example 2 below.

(2) Selkirk’s provost, Tom Henderson, [Leader DIRECTOR] [Governed of a textile dyeing company], berated him for damaging the image of the tweed trade. (Leadership)

    … [Leader JOHTAJA]:2  [Governed tekstiilinvärjäysyrityksen]:1
    director              textile.dyeing.company.GEN
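The numbering in Example 2 can be pictured with a small data structure. The following Python sketch is only an illustration and not the project's actual data format: the class `ProjectedElement`, its fields and the function `finnish_surface_order` are hypothetical names, and the sketch assumes that the subscript numbers give the position of each element in the fluent Finnish phrase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectedElement:
    """One translated slot, stored in the order of the English original."""
    label: str             # frame element name (hypothetical field)
    text: str              # Finnish translation of the slot
    finnish_position: int  # assumed meaning of the subscript number


def finnish_surface_order(elements):
    """Return the elements sorted into Finnish word order."""
    return sorted(elements, key=lambda e: e.finnish_position)


# Example 2, listed in English slot order (Leader first, Governed second).
example_2 = [
    ProjectedElement("Leader", "johtaja", 2),
    ProjectedElement("Governed", "tekstiilinvärjäysyrityksen", 1),
]

phrase = " ".join(e.text for e in finnish_surface_order(example_2))
print(phrase)
```

Keeping the elements in English slot order preserves the one-to-one pairing with the source annotation, while the stored positions allow the Finnish phrase to be read in its natural order.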

 

Some frame examples were translated with ease, preserving even the same word order as in the original English sample. Some frames, however, required more effort, and some examples did not translate from English to Finnish without losing the original frame. It should be noted, however, that a problem in the translatability of a certain example does not necessarily mean that the frame is not universal or that it cannot be translated into Finnish at all. Most of the time the problem was just the one single example in which the frame did not transfer or the annotation had to be changed. However, with a sufficiently large number of examples, we can still get an indication of the translatability of each frame.

3.2 Analysis relating to problematic cases of frame transfer

Our null hypothesis was complete frame transfer. If there was even a slight mismatch, the hypothesis was rejected, i.e. the example was categorized as problematic and labeled as an item for further study. In the following Sects. 4 and 5 we describe the two stages of analysis that these problematic examples went through. We wanted to closely inspect the reasons why some examples did not fit the original English frame when translated into Finnish. In Sect. 4, the annotators went through the problematic examples and labeled them according to the reason for mismatch. This resulted in five different groups, out of which the last group consisting of real cases of frame non-parallelism was analyzed in more detail. The process and results of the second round of analysis are described in Sect. 5.
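The strict criterion described above can be stated as a simple check. The Python sketch below is a simplification for illustration only: in the project the judgements were made manually by annotators, and the names `Annotation` and `is_parallel` are hypothetical. Word order is deliberately ignored, since a change in word order alone was not treated as a failure of frame transfer.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Annotation:
    frame: str
    fee: Optional[str]        # None if no word can be marked as frame evoking
    elements: FrozenSet[str]  # names of the overtly realized frame elements


def is_parallel(source: Annotation, target: Annotation) -> bool:
    """True only if frame, FEE and frame elements all carry over."""
    return (
        source.frame == target.frame
        and target.fee is not None
        and source.elements == target.elements
    )


# Example 1: every element has a Finnish counterpart.
ex1_en = Annotation("Placing", "set", frozenset({"Agent", "Theme", "Goal"}))
ex1_fi = Annotation("Placing", "laittoi", frozenset({"Agent", "Theme", "Goal"}))

# Example 11 (Sect. 4.4): the Theme is incorporated into the Finnish verb.
ex11_en = Annotation("Cause_motion", "flung", frozenset({"Agent", "Theme", "Goal"}))
ex11_fi = Annotation("Cause_motion", "heittäytyi", frozenset({"Agent", "Goal"}))

print(is_parallel(ex1_en, ex1_fi))
print(is_parallel(ex11_en, ex11_fi))
```

Any example for which such a check fails would be passed on to the manual categorization described in Sect. 4.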

4 Data analysis method

After the translation of the original English examples, the annotators went over the translated examples evaluating firstly the validity of the Finnish translations, and secondly, whether or not the example sentences still fit the frame in question after going through the translation process. Problematic examples were tagged according to the mismatches we encountered:

  #1. The original English sentence is not really an example of the indicated frame or is otherwise incorrectly annotated in the first place.
  #2. The Finnish translation of the sentence is erroneous.
  #3. There is a problem with the frame evoking element (FEE), such as splitting or missing an equivalent element in the translation.
  #4. There is a problem with a frame element (FE), such as changing the role of the element in the translation.
  #5. The whole frame changes in the translation process.

These mismatch types will be discussed in detail. In some cases, the annotators changed and corrected some of the examples in order to avoid throwing away example sentences that were otherwise appropriate in a frame. These corrections include correcting typos, improving the translation without changing the original meaning, and shifting the division of the elements so that all the elements in the frame have the right lexical equivalent (some of these problems were caused by the collection procedure and were rather superficial). In Sect. 4.1, we give a quantitative overview of the problems. In Sects. 4.2, 4.3, 4.4, 4.5, we discuss problems #1–#4 which were unrelated to non-parallelism. In Sect. 5, we analyze the problems related to non-parallel frames.
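The five tags lend themselves to a small enumeration. The following Python sketch is illustrative only: the names `Problem` and `is_non_parallel_frame` and the sample list of tags are invented for this example and do not come from the project's tooling. An untagged (unproblematic) example is represented by `None`.

```python
from collections import Counter
from enum import Enum


class Problem(Enum):
    SOURCE_ANNOTATION_ERROR = "#1"  # English original wrongly annotated
    TRANSLATION_ERROR = "#2"        # Finnish translation erroneous
    FEE_PROBLEM = "#3"              # frame evoking element split or missing
    FE_PROBLEM = "#4"               # frame element missing or role changed
    FRAME_CHANGE = "#5"             # whole frame changes in translation


def is_non_parallel_frame(tag):
    """Only tag #5 counts as genuine frame non-parallelism."""
    return tag is Problem.FRAME_CHANGE


# Invented sample of tags; None marks an unproblematic example.
tags = [None, Problem.FE_PROBLEM, Problem.FRAME_CHANGE, None,
        Problem.TRANSLATION_ERROR]
counts = Counter(tags)
print(counts[None], counts[Problem.FRAME_CHANGE])
```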

4.1 Quantitative error analysis

The translation data contained 90,592 examples from 866 different frames. Of these, 2383 examples were disqualified either because there was a problem with the original English annotation or the original sentence appeared in the wrong frame (995 examples; 1.1%), or because the Finnish translation was erroneous (1388; 1.5%). These problems (#1 and #2) were unrelated to the question of transferability of frames and the corresponding samples were therefore excluded leaving 88,209 relevant examples for further study.

We found that in 3020 (3.4%) plus 3733 (4.2%) samples, there was something wrong with the annotation of the FEE or a FE, respectively (problems #3 and #4; Footnote 8). Finally, in 3732 (4.2%) samples, while the translation was correct and the annotation unproblematic, the frame was still lost in translation, representing problem #5. These examples were the most interesting for this study and they will therefore be examined in more detail in this article.

After this analysis, we were left with 77,724 examples that were neatly translated and which preserved the original frame in Finnish as well. Based on the results, the success rate for unproblematic frame transfer between English and Finnish is 88.1%. The figures are presented in Table 1.

Table 1 Quantitative results for error analysis
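The counts reported in this section can be re-derived with a few lines of arithmetic. The Python sketch below uses only the figures given in the text; the variable names and the helper `pct` are introduced here for illustration. It assumes, as the reported figures suggest, that the percentages for problems #1 and #2 are relative to all 90,592 examples, whereas those for problems #3 to #5 and the success rate are relative to the 88,209 relevant examples.

```python
TOTAL = 90_592

excluded = {"#1": 995, "#2": 1_388}                 # unrelated to frame transfer
problems = {"#3": 3_020, "#4": 3_733, "#5": 3_732}

relevant = TOTAL - sum(excluded.values())           # examples kept for study
clean = relevant - sum(problems.values())           # unproblematic transfers


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(relevant, clean)
for tag, n in excluded.items():
    print(tag, n, pct(n, TOTAL))
for tag, n in problems.items():
    print(tag, n, pct(n, relevant))
print("success rate", pct(clean, relevant))
```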

Out of the 866 frames, 24 (2.7%) were left completely without an acceptable Finnish equivalent. Even though the result tells something about the difficulty of frame transfer in some cases, it is also a result of the shortage of the original English example sentences; Hunting_success_or_failure, for instance, is a very specific frame with only one example sentence on the BFN website. Therefore, if the single example sentence does not transfer well into Finnish, the transfer rate in this case turns out to be 0%. Out of the 24 frames that ended up not transferring into Finnish, three had only two original English examples in BFN, and 21 only one example. Some frames, on the other hand, have plenty of examples; the frame Clothing had 2176 example sentences, out of which 2136 transferred well into Finnish. Thus, there is some disparity in the success rate when considering frame transfer, and some problematic frames with few examples may be of less consequence if we assume that their few examples indicate that the frames are also infrequent in English. Some of the frames, however, clearly indicate relevant problems in terms of transferability. The question of frame-specific difficulty is further discussed in Sect. 5.5.

4.2 Errors in the original English examples

Examples 3–5 below show some original English sentences appearing in the wrong frame due to mistakes in the annotation of the original corpus (Problem #1). These are usually easy to notice by comparing the name of the frame and the semantic content of the example sentence. Typically, a different meaning of the FEE, which could be evoked in a different context, has triggered a tagging error. The name of the frame in question is in parentheses, and the FEE and its translated equivalent are in capital letters. The name of the frame element is given as a subscript.

(3) BRILLIANCE [Behaviour of the court life] (Mental_property)

    LOISTOKAS  [Behaviour hovielämä]
    brilliant  court.life

(4) [Part INNING] [Whole of a large bay studded with small clay islands called ‘eyes’] (Part_ordered_segment)

    [Part KUIVATUN           SUON]      [Whole suuressa  poukamassa]
    drain.PASS.PTCP.PST.GEN  swamp.GEN  large.INE        bay.INE
    ‘In a large bay of a drained swamp’

(5) He executed an unnecessarily dramatic handbrake turn, followed his quarry for long enough to ascertain its speed, [Time then] [Manner quickly] OVERHAULED [Entity it] (Reforming_a_system)

    Hän…  [Time sitten]  [Manner nopeasti]  OHITTI        [Entity sen]
    he    then           quickly            overhaul.PST  it.ACC

4.3 Errors in the Finnish translation

The following Examples 6 and 7 demonstrate bad or erroneous Finnish translations of the original examples (Problem #2). Some of them are obviously caused by the polysemy of English verbs, which can be tricky for a non-native English speaker to distinguish in some cases.

(6) [Authority The prime minister] [Time yesterday] ruled out posthumous PARDONS [Offender for more than 300 British soldiers] (Pardon)

    [Authority Pääministeri]  [Time eilen]…  ANTEEKSIPYYNTÖJÄ [= apologies]  [Offender yli 300  brittisotilaalle]
    prime.minister            yesterday      apology.PL.PART                 over               british.soldier.ALL

(7) [Perceiver_agentive He] ATTENDED [Manner without wavering] (Perception_active)

    [Perceiver_agentive Hän]  TULI      MUKAAN [= joined]  [Manner empimättä]
    he                        come.PST  along              hesitate.INF.ABE

Other problematic cases reflect inaccuracy in translation, as in Examples 8–10, for which good literal translations exist but were not used.

(8) [Resource_controller Osborn] was [Degree much more] GENEROUS [Behavior in his comments to Mumford] (Stinginess)

    [Resource_controller Osborn]  oli     [Degree paljon]  SUURPIIRTEISEMPI [= general]
    Osborn                        be.PST  much             general.COMP

    [Behavior kommenteissaan  Mumfordille]
    comment.PL.INE.3POSS      Mumford.ALL

(9) [Part_1 Stirling and Mayne] then SEPARATED [Part_2 from the others] (Becoming_separated)

    [Part_1 Stirling ja Mayne]  EROTETTIIN [= were separated]  [Part_2 muista]
    Stirling and Mayne          separate.PASS.PST              other.PL.ELA

(10) [Whole The night] was SPLIT [Means by flame and thunder] (Separating)

    [Whole Yö]  KATKESI [= cut short]  [Means salamoihin  ja   ukkoseen]
    night       break.PST              lightning.PL.ILL   and  thunder.ILL

4.4 Problematic frame evoking elements

There were several kinds of problems with the frame-evoking element. The frame-evoking element (FEE) is assumed to be a single word (or sometimes a lexicalized multi-word construct). Sometimes FEE lexemes (like institutionalize) translate into Finnish as longer constructs (like sijoittaa laitoshoitoon) consisting of a more generic FEE and some other frame element. In a reverse situation, the Finnish expression (tottua) is more compact than its English counterpart (become accustomed to): the FEE together with another word in the phrase combines into a single Finnish word which would deserve FEE status but a suitable frame may not exist. Examples of both cases are presented below.

The translation task aimed for translations in which each English FE and FEE in the frame would get a Finnish language equivalent if possible. However, sometimes the semantically accurate translation required more or fewer frame elements than the original. In other words, a frame element used in an English language frame had to be incorporated in the frame evoking element in Finnish. For example, some Finnish verbs incorporate another lexical element, in which case the English frame element has an equivalent in the Finnish translation only as a morpheme. Example 11 illustrates an instance of Finnish reflexive verbs or so-called UTU-verbs (for example heittäytyä, ‘throw oneself’, which incorporates the element Theme; for UTU-verbs see for example Sulkala and Karjalainen 1992: 133):

(11) Then [Agent he] turned and FLUNG [Theme himself] [Goal into his nanny’s arms]. (Cause_motion)

    Sitten  [Agent hän]  kääntyi   ja   HEITTÄYTYI      [Goal lastenhoitajansa  käsivarsille].
    then    he           turn.PST  and  throw.REFL.PST  nanny.GEN.3POSS         arm.PL.ALL

Sometimes even the English FEE itself is not needed in the corresponding Finnish expression. The frame Becoming contains numerous examples which have been given a flawless Finnish translation retaining the original idea, but the verb become, which is the FEE in Example 12 below, corresponds to zero words in the paired annotations. The expression become accustomed translates into Finnish as a single verb tottua; a literal translation (? Olen tullut tottuneeksi) would be unnatural, if not ungrammatical. All non-auxiliary verbs express either a state continuation or a state change. Sometimes the change is expressed by the verb itself and sometimes we need a verb to indicate the transition. In the Becoming frame, BFN has opted to annotate the change-indicating element as the FEE, and consequently there is no overt corresponding element in Finnish in the become accustomed case.

(12) [Transitional_period During the last three years] [Entity I]’ve BECOME [Final_state accustomed to refinement]. (Becoming)

    [Transitional_period Viimeisen  kolmen     vuoden    aikana]  [Entity minä]  olen
    last.GEN                        three.GEN  year.GEN  during   1SG            be.1SG

    [Final_state tottunut       hienostuneisuuteen].
    become.accustomed.PTCP.PST  refinement.ILL

German and Swedish have discontinuous verb and particle constructs similar to English phrasal verbs and accordingly, FrameNet implementations in those languages have allowed splitting the FEE in these cases. For more information, see Burchardt et al. (2006) for German and Borin et al. (2010) for Swedish. Finnish does not have particle verbs, but we have used this to justify discontinuous FEE annotation in marginal situations where the FEE can still be seen as a single unit.

There were also cases in which the original English frame evoking element required several Finnish words in order to translate. Example 13 shows the FEEs found in the frame Activity_resume.

(13) [Time In May 1990] [Agent the government] RENEWED [Activity protests to Romania about pollution in Ruse by chlorine gas from the neighbouring Romanian town of Giurgiu]. (Activity_resume)

    [Time Toukokuussa  1990]  [Agent hallitus]  ESITTI       uudelleen  [Activity Romanialle
    May.INE            1990   government        present.PST  again      Romania.ALL

    vastalauseensa  Rusen     saastuttamisesta  viereisestä   romanialaisesta  kaupungista
    protest.3POSS   Ruse.GEN  pollute.NMLZ.ELA  adjacent.ELA  Romanian.ELA     town.ELA

    Giurgiusta   tulevalla          kloorikaasulla].
    Giurgiu.ELA  come.PTCP.PRS.ADE  chlorine.gas.ADE

In Example 13, the translation changes both the frame and the frame elements. The Finnish language expression esittää uudelleen vastalauseensa (‘present its protest again’) better suits the frame Cause_to_perceive, whereas the more literal translation uudisti vastalauseensa would sound rather unnatural. The German FrameNet had a different approach to these situations: it allowed assigning two FEEs to a multi-word expression, which would then have two frames constituting a frame group (Čulo 2013: 149–150). We decided not to add this kind of new structure to the conventions we adopted from BFN.

What a language can easily express in one word seems to be decisive in assigning frames for concepts as in Example 14. The frame Institutionalization is defined as follows: “A Patient is committed to the care of a medical Facility by a proper Authority”.

(14) [Patient He] was INSTITUTIONALIZED [Time at the age of eight] [Facility in a school for the destitute]. (Institutionalization)

    [Time Kahdeksan  vuoden    ikäisenä]  [Patient hänet]  oli     SIJOITETTU
    eight.GEN        year.GEN  aged.ESS   he.ACC           be.PST  place.PTCP.PASS.PST

    LAITOSHOITOON           [Facility köyhille  tarkoitettuun     kouluun].
    institutional.care.ILL  poor.PL.ALL         intended.for.ILL  school.ILL

In Finnish, Example 14 is translated with a periphrastic construct sijoitettu laitoshoitoon ‘placed in institutional care’ as the literal translation *laitoshoitoistaa is not used in Finnish. As the construct is not a fixed expression in Finnish, it may not be justified to keep the frame Institutionalization in Finnish. Instead, the Finnish translation should probably have been annotated with a more generic frame such as Placing (with sijoitettu as FEE and laitoshoitoon as Goal) from a Finnish language point of view. However, from a translation unit point of view, the Institutionalization frame is acceptable.

In the following pair of sentences in Example 15, the problem is reversed. The Finnish translation has a separate derived verb for become institutionalised. This calls for an intransitive (inchoative) frame for each such derived verb. We can compare with the frame pairs Attaching/Becoming_attached, Cause_expansion/Expansion etc. As English makes use of transitive verbs in both cases, such frame pairs do not yet exist in the current BFN.

(15) [Patient Some children] become INSTITUTIONALISED [Manner more quickly than others]. (Institutionalization)

    [Patient Jotkut  lapset]   LAITOSTUVAT                [Manner nopeammin  kuin  toiset].
    some.PL          child.PL  institutionalise.REFL.3PL  quick.ADV.COMP     than  other.PL

English is also known to have a fairly productive strategy of forming verbs from nouns by zero-derivation (conversion): the frame Cause_harm includes a lot of these, such as elbow.v, knee.v, bruise.v. Generally, these must be translated into Finnish as ‘hit with the elbow’, ‘kick using the knee’, ‘inflict a bruise’. There are some exceptions such as stone.v which has the frame-preserving translation kivittää (derived from kivi ‘stone.n’).

In Example 16, the English expression be up to is expressed in the Finnish translation by morphology in a grammatical construct rather than by a separate word that could be marked as frame evoking.

(16)

[Circumstances Now a podgy, desperately unfit bar-fly], [Entity he] simply wasn’t UP TO [Activity taking on the Man of Action role that he craved]. (Being_up_to_it)

[Circumstances Turpeana,  toivottoman   huonokuntoisena  baarikärpäsenä]  [Entity häneSTÄ]  EI
podgy.ESS                 hopeless.GEN  unfit.ESS        bar.fly.ESS      he.ELA            NEG

OLLUT      [Activity ottaMAAN  vastaan  Man of Action  -roolia,   jota        hän  himoitsi].
be.CONNEG  take.INF.ILL        on                      role.PART  which.PART  he   desire.PST

If all distinctions in all languages are always encoded as new frames, this will lead to a proliferation of frames in all languages. An alternative route would be to annotate derivational morphemes as FEs corresponding to the Theme in Example 11 or even as FEEs corresponding to the Becoming frame in English in Example 12.

In short, many of the FEEs not neatly translating into Finnish within the original frame are due to the paired format in the translation task, which perhaps required overly rigid word correspondences between the elements in the two languages. Still, many of the examples above show that English and Finnish sometimes differ in which concepts they encode as separate lexical items, and this inevitably also leads to some shifts in translation as will be further discussed in the next section.

4.5 Problematic frame elements

When studying the annotation projection method, examples where even a single frame element did not transfer into Finnish were first marked as problematic and then studied further to find out to what degree the annotations were parallel between the languages. Some examples were rejected because a frame element did not have an equivalent in Finnish and was therefore left empty, or because its annotation (i.e. semantic role) changed. For instance, a structural difference between languages can cause a semantic role to change in translation, as in Example 17 below, where the element Internal_cause (with surprise) changes to Depictive (yllättyneenä) in Finnish. A more literal translation (yllätyksestä) could have been accepted as the original frame element. In Example 18 below, the role Depictive changes into Time, which causes problems for the projection of the semantic labels. In Example 19, the element to himself would be annotated as the Addressee in the frame Communication_manner, but the Finnish word-for-word translation itsekseen should rather be interpreted as the Manner of communication.

(17)

[Sound_source Piper] YELPED [Internal_cause with surprise]. (Make_noise)

[Sound_source Piper]  ÄLÄHTI    [Manner yllättyneenä].
Piper                 yelp.PST  surprise.PTCP.PST.ESS

(18)

[Depictive Moved to rectify this situation], [Agent he] ABANDONED [Theme plans of working in the missionary field] (Abandonment)

[Time Siirryttyään  korjaamaan       tätä       tilannetta]     hän  HYLKÄSI
after.moving.to     rectify.INF.ILL  this.PART  situation.PART  he   abandon.PST

[Theme suunnitelman  tehdä   lähetystyötä]
plan.ACC             do.INF  missionary.work.PART

(19)

[Speaker He] keeps MUMBLING [Addressee to himself]. (Communication_manner)

[Speaker Hän]  jatkaa        JUPINAA        [Manner itsekseen].
he             continue.3SG  mumbling.PART  by.himself

Sometimes an example (that was otherwise good) had to be rejected because an English frame element did not have a match in Finnish. In Finnish, it is common to express the element Degree with a comparative morpheme, e.g. the English expression more polite translates as kohteliaampi. However, this results in the element Degree (more) having no exact one-word equivalent in the Finnish translation. Also, the separate possessive pronoun is usually left out in Finnish and expressed with a clitic, which results in the annotation between the languages not being parallel, as in Example 20 where the Wearer is left out:

(20)

Mitch took off [Wearer his] HELMET. (Accoutrements)

 

Mitch  otti      pois  KYPÄRÄNSÄ.
Mitch  take.PST  away  helmet.ACC.3POSS

It is important to note that the translation example can be a good example of the frame in question despite some problems in a single frame element. A problematic frame element is thus only a problem for the annotation projection method and not for the frame parallelism as such. In our data, a problem caused by an erroneous frame element was mentioned as the reason for rejection for 3733 examples. This corresponds to about 4% of the examples. Only the primary reason for rejection was recorded, which is why annotation projection errors can be found in more than 4% of the translation counterparts.
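The projection check described in this section can be illustrated with a minimal Python sketch. It is not the tooling used in the project: the bracket notation with a space between the role label and its text, the regular expression, and the function names roles and compare_roles are assumptions made purely for illustration. The sketch extracts the role labels from a bracketed sentence pair and reports which source roles have no counterpart in the translation.

```python
import re

# Hypothetical pattern: a role label, whitespace, then the annotated text,
# all inside square brackets, e.g. "[Wearer his]".
FE_PATTERN = re.compile(r"\[(\w+)\s+[^\]]*\]")


def roles(annotated):
    """Return the role labels found in a bracket-annotated sentence."""
    return FE_PATTERN.findall(annotated)


def compare_roles(source, target):
    """Report roles lost from the source and roles new in the target."""
    src, tgt = set(roles(source)), set(roles(target))
    return {"missing": sorted(src - tgt), "added": sorted(tgt - src)}


# Example 17: Internal_cause is replaced by another role in the translation.
ex17_en = "[Sound_source Piper] YELPED [Internal_cause with surprise]."
ex17_fi = "[Sound_source Piper] ÄLÄHTI [Manner yllättyneenä]."
print(compare_roles(ex17_en, ex17_fi))
# {'missing': ['Internal_cause'], 'added': ['Manner']}

# Example 20: the Wearer has no separate word in the Finnish translation.
ex20_en = "Mitch took off [Wearer his] HELMET."
ex20_fi = "Mitch otti pois KYPÄRÄNSÄ."
print(compare_roles(ex20_en, ex20_fi))
# {'missing': ['Wearer'], 'added': []}
```

Because the comparison is set-based, it ignores repeated roles and word order; it only flags pairs whose role inventories differ, which would then still need manual inspection.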

5 Frame non-parallelism

When other problems had been ruled out, we were left with the cases where the primary reason was a frame change in the translation process. We hoped this data would shine a light on the reasons why some of the frames do not survive translation. The example sentences with frame changes went through an inter-annotator assessment, where the aim was to find more specific explanations for the loss of the original frame. The problematic samples were tagged with an appropriate keyword: idiom, syntax or semantics. Below are the criteria for each frame change subcategory:

  1. The expression in the frame is idiomatic or has become a conventionalized metaphor. Metaphoric expressions often go through stages of conventionalized use before becoming idioms or idiomatic expressions, which may make it difficult to draw a line between compositional use of new meanings of the parts and a conventionalized metaphoric expression.

  2. Syntactic reasons cause the change of the frame. This includes cases where the frame is tied to a word class, e.g. the frames Sounds and Make_noise, or the intransitivity/transitivity of the FEE, e.g. the frames Becoming_separated and Separating. Many frames in the original FrameNet have at least implicit restrictions on the word class of the FEE. For example the frame Aesthetics only includes estimates of appearance, which is typically expressed with an adjective, such as ‘ugly’ and ‘elegant’.

  3. The original semantic content otherwise changes or does not come across as intended, thus needing a change of frame.

These subcases are hierarchically organized, i.e. a problem with an idiomatic expression overrides the other two problem types, because idiomatic use is the most specific category. However, often the reason for frame non-parallelism was not so easy to identify, as can be seen from the inter-annotator quantitative results. The results of the inter-annotator analysis are discussed in more detail in Sect. 5.1.

As stated above, the cases where the frame did not transfer from English to Finnish despite the translation being as accurate as possible and other frame elements being correct gave us an explanation of why some frames do not work similarly in the two languages. In reality, the causes of frame change are not mutually exclusive: a syntactic shift often affects semantics as well, idioms may have divergent syntactic structures and so on. Because of this, the selected category indicates the primarily identified (but not the only) motivation for a frame change. Some examples are open to various interpretations, which is also clearly seen in the inter-annotator evaluation results.
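The labelling scheme can be summarized in a short Python sketch. This is an illustration under stated assumptions rather than the procedure actually implemented in the project: the text only says that idiom overrides the other two categories, so ranking syntax above semantics is an assumption here, and the function names primary_label and aggregate are hypothetical. The second function mirrors the unanimous and majority (two out of three) agreement levels reported in Sect. 5.1.

```python
from collections import Counter

# Assumed specificity order. Only "idiom overrides the other two" is stated
# in the text; placing syntax above semantics is an assumption of this sketch.
PRIORITY = ("idiom", "syntax", "semantics")


def primary_label(applicable):
    """Return the most specific label among those that apply to an example."""
    for label in PRIORITY:
        if label in applicable:
            return label
    raise ValueError("no known label applies")


def aggregate(votes):
    """Combine the annotators' primary labels into (label, agreement level)."""
    label, count = Counter(votes).most_common(1)[0]
    if count == len(votes):
        return label, "unanimous"
    if count > len(votes) / 2:
        return label, "majority"
    return None, "no agreement"


print(primary_label({"idiom", "semantics"}))          # idiom
print(aggregate(["semantics"] * 3))                   # ('semantics', 'unanimous')
print(aggregate(["idiom", "semantics", "idiom"]))     # ('idiom', 'majority')
print(aggregate(["idiom", "syntax", "semantics"]))    # (None, 'no agreement')
```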

5.1 Inter-annotator evaluation results

In order to further inspect the reasons for frame non-parallelism, we conducted two experiments on the 3732 examples (4.2% of the total amount) in which frame non-parallelism had been detected. In the first inter-annotator experiment (Experiment 1), three annotators independently went through 2272 translated examples that had previously been tagged to indicate that the example sentence had lost its original frame during the translation process. The example sentences had been verified to be correctly translated into Finnish. The annotators could either tag the example with a more specific explanation for frame non-parallelism (a tag denoting idiomaticness, syntactic reason or semantic reason), or tag the example with some other previously used label, if the example was thought not to be an actual frame transfer problem. They could also remove the error labels altogether, if the example was thought to be valid after all. During the inter-annotator evaluation 52 examples were unanimously found to be correctly annotated after all, and in a further 112 examples two out of three annotators found them to be correctly annotated examples. In total, 164 examples were re-labeled as valid examples. 18 examples were found to be incorrectly translated into Finnish and were thus re-labeled, 9 examples were thought to be incorrectly annotated originally in English and 11 examples were dismissed due to a bad frame evoking element or frame element. Only in 47 examples was there no agreement between annotators on the primary reason for non-frame related rejection. This left 2023 examples that more than one annotator thought were correctly labeled as having a frame transfer problem, meaning they were non-parallel with the original English frame.

The examples that were non-parallel with the original frame were further divided as follows: out of 2023 examples labeled as non-parallel, 1056 were unanimously labeled as either semantics, syntax or idiom related, meaning all annotators agreed on the primary label. Out of these 1056 examples, 1003 were unanimously deemed to have a semantic non-parallelism leading to a change of frame, evidently making it the largest category. 45 examples were unanimously labeled with a syntactic reason for frame change, and only 8 examples were unanimously labeled as pure idioms. The unanimous inter-annotator agreement rate was 52.2%.

Out of the total of 2023 examples labeled with frame non-parallelism, 847 were majority-tagged, meaning that two out of three annotators agreed on the label. 688 examples were majority-tagged with semantic reason for non-parallelism. 59 examples were majority-tagged with syntactic reason. Two out of three annotators saw idioms as the reason for non-parallelism in 100 cases. This left us with 120 examples that were seen to be non-parallel but where the annotators could not agree on the primary reason for frame loss. It is often very difficult to draw the line between different categories and reasons for non-parallelism, for example idioms and more general semantic non-parallelism. Cyrus (2006: 1242) also agrees that finding objective criteria for semantic shifts is problematic. Padó and Lapata (2009: 313) note that frame disagreements often arise even within a single language when conducting inter-annotator analysis. Especially since we decided to treat pure idioms and other semantic non-parallelism as separate cases, the distinction was often rather challenging.

If one includes the majority cases in the inter-annotator agreement result, the total amount adds up to 1903. Out of 2023 examples with frame non-parallelism, the annotators could agree on the primary reason for frame loss in 1903 cases. Semantic non-parallelism was the primary cause in 1691 (83.6%) examples, syntactic reasons for non-parallelism were found in 104 (5.1%) examples and idioms in 108 (5.3%). The majority inter-annotator agreement rate for frame non-parallelism then becomes 94.0%. The figures for the first annotation experiment are presented in Table 2.

Table 2 Frame non-parallelism percentages in experiment 1

In the second experiment, 1460 additional translated examples were discussed between annotators in order to reach a consensus, and tags were successfully agreed for each. 100 examples were dismissed because of erroneous original annotation, and 94 because of errors in the Finnish translation. 6 examples had problems with the frame evoking element, and 11 with a frame element. There were 124 examples that after more thorough inspection were deemed to be valid examples. As for the examples that were deemed to have a problem with frame transferability, in total 1125 examples, the catch-all label ‘semantics’ again remained the most common reason for non-parallelism with 852 examples (75.7%). Syntactic reasons amounted to 178 examples (15.8%), and idioms to 95 (8.4%). The tendency to label mismatches as syntax or idiom-related was slightly higher when annotating simultaneously in a group. When asked why, the annotators said they felt reluctant to use a more specific label when annotating individually but in a group setting they were able to confirm each other’s views of idiomaticness and syntax; the higher rates of idioms, for instance, may also be explained with certain frames that incorporated more idiom-type FEEs being included in the second data set. The figures for the second annotation experiment are presented in Table 3. Table 4 summarizes the results from the two annotation experiments. The explanations for different reasons of frame non-parallelism are further analyzed in Sects. 5.2, 5.3, 5.4.

Table 3 Frame non-parallelism percentages in experiment 2
Table 4 Frame non-parallelism percentages in experiments 1 and 2 in total
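The percentages reported for the two experiments can be recomputed from the counts given in the text. The following Python sketch is only a worked check of that arithmetic; the dictionary layout and the function names shares and combined are illustrative assumptions. The zero for unresolved cases in the second experiment follows from the statement that tags were agreed for every example.

```python
# Counts of non-parallel examples per primary reason, taken from the text.
COUNTS = {
    "experiment 1": {"semantics": 1691, "syntax": 104, "idiom": 108, "no agreement": 120},
    "experiment 2": {"semantics": 852, "syntax": 178, "idiom": 95, "no agreement": 0},
}


def shares(counts):
    """Return each category as a percentage of all non-parallel examples."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def combined(experiments):
    """Add up the category counts over all experiments."""
    merged = {}
    for counts in experiments.values():
        for label, n in counts.items():
            merged[label] = merged.get(label, 0) + n
    return merged


for name, counts in COUNTS.items():
    print(name, sum(counts.values()), shares(counts))

total = combined(COUNTS)
print("total", sum(total.values()), shares(total))
```

With these counts the combined shares come out at 80.8% for semantics, 9.0% for syntax and 6.4% for idioms, over 3148 non-parallel examples.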

5.2 Idioms and idiomatic expressions

Categorized as formulaic language, an idiom’s figurative meaning is different from the literal meaning (McArthur and McArthur 1992). This is the same definition that is used in BFN, where an idiom is a construct that is a semantic unit whose meaning cannot be deduced from the actual meanings of its constituents (Ruppenhofer et al. 2010:31). Some idioms are transparent (Gibbs 1987), i.e. much of their meaning does get through if they are taken (or translated) literally. For example, lay one’s cards on the table meaning to reveal previously unknown intentions, or to reveal a secret is a transparent idiom as most cultures are familiar with card games.

In the ideal case, e.g. … KICK THE BUCKET … is one FEE indicating the Death frame. However, in some example sentences the idioms have obviously been perceived as transparent in BFN by the annotators and parts of idioms have been annotated. In many of these cases, the translation of idiomatic phrases or structures caused a radical change in the sentence, thus leading to frame non-parallelism. Pure idioms, however, are not very common in the data. Only 8 (0.3%) examples were unanimously labeled as idioms in Experiment 1. If one includes the cases that were labeled as idioms by a majority of annotators (2 vs. 1), the total number is 108 examples (4.9% of data in Experiment 1). In Experiment 2, they made up 8.4% of non-parallel cases (95 examples). In total, then, idiom as a reason for frame non-parallelism appeared in 203 examples, totaling 6.4% of all non-parallel cases. As can be seen from the results, especially regarding the small number of unanimous labels for idiom, distinguishing between pure idioms and more general semantic non-parallelism is not straightforward.

It seems that the only cases when the frame changes completely, without a possibility to trace the two frames back into a more general, semantically wider frame higher in the frame hierarchy, arise when the frame changes because of idiomatic or metaphoric language use. This can be seen from Example 21 containing a variant of sweating blood and tears to signify making a substantial effort. One could perhaps argue that Excreting is the incorrect frame also in English for this metaphoric use, but this is the frame the BFN project systematically assigns to this particular use of sweating, i.e. it is not an individual annotation mistake in the BFN database:

(21)

[Excreter I] have SWEATED [Excreta blood] [Purpose to get him to see me at all]. (Excreting)

 

Olen    NÄHNYT        vaivaa        suostutellakseni             hänet   edes
be.1SG  see.PTCP.PST  trouble.PART  persuade.INF.TRANSL.1SG:POSS  he.ACC  even

tapaamaan     minut.
meet.INF.ILL  1SG.ACC

Padó (2007) and Cyrus (2006) call the phenomenon in Example 21 extreme non-parallelism or mutation. It happens in cases where the idiom exists only in the original language, not in the target one. According to Padó (2007: 6), mutation happens when the lexical meaning of the translation differs substantially from the original. Extreme idiosyncrasy includes instances when a lot of world knowledge is needed to interpret the situation, and which are difficult to classify in terms of a general set of lexical relations (ibid.). Idioms are therefore the most extreme case of mutation, possibly resulting in absolute non-parallelism where the frames in question have no common higher denominator.

Sometimes an idiomatic expression and its translation differ less radically. For example, in Example 22 the expression oxygen starvation would translate into Finnish as hapenpuute (‘lack of oxygen’), which can no longer be seen as a representation of the Death frame.

(22)

The complication occurs during the actual process of birth, caused by difficulties in labour, or “anoxia”, known as [Explanation oxygen] STARVATION. (Death)

In Example 23, the idiomatic expression murder a song has an equivalent Finnish expression, but the Finnish idiom uses the verb pahoinpidellä ‘batter’ instead of a verb fitting the Killing frame.

(23)

[Killer somebody like Tommy Drennan] MURDER [Victim a good song] (Killing)

 

[Killer jonkun  Tommy  Drennanin    kaltaisen]  PAHOINPITELEVÄN      [Victim hyvän  laulun]
somebody.GEN    Tommy  Drennan.GEN  alike.GEN   batter.PTCP.PRS.GEN  good.ACC       song.ACC

In Examples 24–25, the English examples are rather well-established metaphors:

(24)

[Accoutrement BELTS] [Material of oak] (Accoutrements)

 

tammiALUEITA [= oak AREAS] (Locale)
oak.area.PL.PART

(25)

[Accoutrement RIBBON] [Material of moonlight] (Accoutrements)

 

kuunSILTA [= BRIDGE of moon(light)] (Roadways)
moon.GEN.bridge

Some of the examples can be translated literally, but there is a chance that the result is too obscure and will lose the original meaning (especially in cases where the idiom exists in English but not in Finnish). For instance, in Example 21 above, it would be possible to translate the expression sweated blood literally as hikoillut verta, but the intended meaning (to do something with great effort) is not necessarily apparent to a Finnish speaker.

5.3 Syntax

Sometimes syntactic reasons cause the frame to change. This happens in (rather rare) cases where the frame is tied to a certain word class, like with the frames Sounds (FEE is a noun) and Make_noise (FEE is a verb), or verb valence requirement of the FEE, like with the frames Separating and Becoming_separated. Many frames in the original BFN have these kinds of restrictions. In Experiment 1, in 45 cases the frame change was due to syntactic reasons according to all three annotators of the project. In 59 examples, two out of three annotators agreed that the reason for frame change was syntactic. This amounts to 104 examples or 5.1% of all non-parallel cases in the first experiment. In Experiment 2, the consensus for syntactic reason for non-parallelism was reached in 178 cases (15.8%). In total, they make up 9.0% of all non-parallel frames. From this we see that syntactic frame non-parallelism is somewhat infrequent.

Intransitivity/transitivity distinctions in particular are a major reason for frame changes. The problem is illustrated in Example 26, where in the Killing frame the FEE is a transitive verb requiring an agent or a cause and an undergoer. The literal translation which preserves the roles of cause and undergoer (Kymmenet tuhannet joutuivat aaltojen hukuttamaksi) is somewhat unnatural in Finnish. The transitive frame Killing usually changes into the intransitive frame Death when translated into Finnish, if the killer is inanimate.

(26)

[Victim Tens of thousands] were DROWNED [Cause by the waves] (Killing)

 

[Protagonist Kymmenet  tuhannet]    HUKKUIVAT      [Explanation aaltoihin]  (Death)
ten.PL                 thousand.PL  drown.PST.3PL  wave.PL.ILL

A second syntactic phenomenon causing frame non-parallelism is category change. In principle, category change, which leads to corresponding words belonging to different syntactic categories (Cyrus 2006: 1242), is unproblematic in frame semantic annotation (see for example Padó 2007). The requirement for two FEEs evoking the same frame is the ability to use the FEEs as semantic paraphrases of each other. Therefore, the FEEs do not have to belong to the same lexical category; it is the common cognitive frame that is the key. For example in the frame Commitment both nouns (oath, promise) and verbs (vow, promise) can be handled as frame evoking.

That said, there are some frames in BFN that are only differentiated by the word class of the FEEs, and these frames often proved to cause problems for frame parallelism. An example of such a differentiation would be the frames Sounds and Make_noise, where the nouns describing sounds (her GIGGLE) belong to the Sounds frame and the verbs describing the sound making (she GIGGLED) belong to the other frame. It can be argued that these sentences allow different frame elements: a predicate in the frame Make_noise can for example include an Addressee (She LAUGHED at me) and a description of a Sound can include the role Path (a HOWL across the air). It can be noted, however, that these frame elements or semantic roles are non-core elements and thus optional. Hence, they do not define the frame, and the sentence She giggled and the phrase Her giggle can be used to verbalize the same real-life event. FrameNet states that the differentiation is due to the fact that the nouns do not denote the sound emission or production. This is, however, only a question of point of view, since in the phrase her giggle the sound emitter is also present and it is a question of interpretation whether it denotes sound production. According to Fillmore and Baker (2010: 335) paraphrases belong to the same frame which leaves it open to debate whether Sounds and Make_noise should be separate frames in the first place.

The fact that the FEE of a frame sometimes needs to be of a certain word class occasionally results in frame change, because the same meaning cannot be conveyed with a FEE of the same word class as in the original English example. For example in the frame People_by_morality the target elements need to be words referring to certain kinds of people, so the FEE needs to be a noun. However, when a corresponding noun does not exist in Finnish, the translation causes the word class to change into a nominalized adjective, as for instance in Example 27.

(27)

[Person She] is not an incompetent but a DEGENERATE. (People_by_morality)

 

[Evaluee Hän]…  TURMELTUNUT.  (Morality_evaluation)
she             corrupt

It is worth pointing out that the frame People_by_morality is in a Using relation with the frame Morality_evaluation, which means that they are closely related and there is clear correspondence in the frame elements.

As expected, in the frame Make_noise, the translation causes frame non-parallelism, because at times it is more natural to translate a verb with a noun, as in Example 28.

(28)

Like the sea it is a music primeval and here is no storm, only [Sound_source the silken waves] [Sound SOUGHING]. (Make_noise)

 

… [Sound_source silkkisten  aaltojen]    HUMINA.  (Sounds)
silken.PL.GEN               wave.PL.GEN  hum

Such cases are rare in our data as (1) most frames do not require a certain category for the FEE and (2) most of the time the translation does not require a category change. However, Example 28 above demonstrates that the two frames could possibly be merged into one frame since the meaning of the sentence is the same regardless of the word class of the FEE.

5.4 Semantics

The semantic reason for frame non-parallelism was by far the most common in the data. As stated above, in the first experiment, 1003 examples were unanimously labeled with a semantic explanation. If you include the majority (2 vs. 1) agreement result (688 examples), in total semantic non-parallelism in Experiment 1 was the primary cause in 1691 examples (83.6% of examples). In the second experiment, in 852 examples (75.7%) the reason for non-parallelism was deemed to be semantic. In total, 2543 examples or 80.8% of all non-parallel cases could be explained with semantics. We decided to treat a more general case of semantic non-parallelism as separate from so-called pure idioms. With semantic non-parallelism the meaning of the frame does not change completely (as in more idiomatic expressions), but it still leads to a frame change because some relevant aspect of the semantics of the original expression fails to appear in the translation. Following the classification of Padó (2007), this can be seen as modification, which assumes that lexical meaning is preserved to a large extent. According to Padó (2007: 5), frames can be parallel for “mild” cases of modification, while “serious” cases can result in non-parallelism.

The following examples demonstrate the problems with different semantic meaning of the sentences in the two languages, even when the translation is the closest possible with a matching number of frame elements. The translated sentences do not fit in the original frames even though the general semantic idea is still the same. The ‘semantics’ tag that we used for labelling this category of frame non-parallelism incorporated various kinds of shifts in the meaning that resulted in frame change. The ways in which these mismatches appeared are too numerous to discuss exhaustively, and the examples are meant to offer a mere snapshot of how the frame semantic content does not always survive the translation process when following a strict interpretation of semantic frames as presented in the BFN project.

(29)

[Agent an employee] has COMPLETED [Activity 20 years of service with the organization] (Activity_finish)

 

työntekijä  ON      ollut        töissä       20  vuotta (?)
employee    be.3SG  be.PTCP.PST  work.PL.INE  20  year.PART

In Example 29, the verb functioning as the FEE of the sentence, to complete, is not usable in this context in Finnish. In Finnish, one does not complete years of service; instead the meaning is best expressed simply by saying something like “an employee has been working for 20 years”. The frame-relevant meaning of completion is thus left out, and its inclusion would require a more unnatural translation into Finnish.

(30)

[Shape_prop 4in] [Shape CUBES] [Substance of timber] (Shapes)

 

[Piece_prop neljän  tuuman]   [Substance puu][Piece PALIKKA]  (Part_piece)
four.GEN            inch.GEN  wood.block

In Example 30, the Finnish translation for cube, palikka, roughly means a brick or a block of some material. The actual word for cube, kuutio, cannot naturally be used to refer to a piece of wood in this way, and so the frame Shapes is no longer present.

(31)

had [Manner artificially] INFLATED [Item the price] (Cause_expansion)

 

olisi    [Manner keinotekoisesti]  NOSTANUT        [Attribute hintaa]  (Cause_change_of_position_on_a_scale)
be.COND  artificially              raise.PTCP.PST  price.PART

In English, the verb inflate can be used metaphorically to refer to rising prices. A verb from the Cause_expansion frame is not available for this purpose in Finnish, and therefore in Example 31, the verb nostaa, to lift, is needed instead.

(32)

[Process This] was LAUNCHED [Place at the meeting]. (Launch_process)

 

[Work Tämä]  JULKAISTIIN       [Place kokouksessa].  (Publishing)
this         publish.PASS.PST  meeting.INE

In Example 32, the verb launched is used to designate a meaning such as to make public, publish, announce or the like. Again, Finnish does not have the option to use the equivalent verb to preserve the original frame. Instead, the more specific verb julkaista ‘to publish’ is preferable.

The Finnish translations emphasized language fluency, which often caused the semantic content to change slightly. The translators were not aware of the purpose of the translations, so the only instructions they were given were to aim for natural Finnish sentences and, whenever possible, to find an equivalent for each frame element, i.e. to aim for structural conformity. The advantage of this method is that the sentences are as natural as possible, which provides the opportunity to observe which frames truly transfer from English to Finnish with ease with all the frame elements included. If the translations had been made with the frame system in mind, it might have resulted in more unnatural example sentences and overly literal translations.

5.5 Easy versus difficult frames

There are frames that translate from English into Finnish more easily than others. In English, frames can be evoked by words in any of the major lexical categories of noun, verb, and adjective, as well as by adverbs and prepositions (Ruppenhofer et al. 2010: 35). With regard to frame example translations, the most unproblematic frames are the ones that typically contain a single noun as the FEE and the only core element. A good example is the frame Food, where the entire frame consists of some concrete food item (FEE) as in Example 33 and its possible attributes. Another similar case is the frame Clothing as in Example 34.

(33)

[Descriptor real] [Type cream] CHEESE (Food)

[Descriptor aito]  [Type kerma]JUUSTO
genuine            cream.cheese

(34)

[Wearer his] [Descriptor ragged] CLOTHING (Clothing)

 

[Wearer hänen]  [Descriptor rähjäiset]  VAATTEENSA
his             ragged.PL               clothes.3:POSS

Most commonly, however, FEEs are verbs. In general, we can conclude that on a higher, more general semantic level in the frame hierarchy the translation process is relatively easy and the frames transfer from one language to another rather well. This is partly due to the fact that the frames higher up in the hierarchy cast a wider net in terms of the types of situations and phenomena conveyed, and are therefore more tolerant of a wider range of alternative translations. For instance, the frame Motion in Examples 35 and 36 includes all kinds of phenomena describing movement, and it incorporates FEEs such as slide, travel and zigzag.

(35)

[Theme The boat] GLIDED [Path underneath a bridge]. (Motion)

 

[Theme Vene]  LIPUI      [Path sillan  alle].
boat          glide.PST  bridge.GEN    under.ALL

(36)

[Theme He] MOVED [Goal over to the window]. (Motion)

 

[Theme Hän]  SIIRTYI   [Goal ikkunaan].
he           move.PST  window.ILL

It is on the lower levels of the frame hierarchy that problems appear, due to idiomatic expressions and other language-specific ways of expressing certain things. As Čulo (2013: 151) states, the diverging frames in the original sentence and in the translation can be related to the original frame by means of frame relations, therefore leading to an interpretation of the scenes consistent enough to be able to convey roughly the same message. Aside from clear idioms and other metaphoric uses of language, the general semantic idea is usually the same even though the translation would trigger a different frame. This can be seen as an argument in favour of the semantic frame universality hypothesis. As Padó (2007: 6) states, there seems to be a tendency in FrameNet to construct more fine-grained frames which require a higher degree of conceptual similarity, which leads to problems from a cross-lingual point of view, since it leads to a higher number of instances with frame non-parallelism. In short, the more fine-grained and specific the frames, the more difficult it is to reach perfect parallelism. This was clearly apparent in our research data as well.

Sometimes it is difficult to distinguish two specialized English frames from each other, when there exists only one way of expressing the general act or phenomenon in question in Finnish. This leads to the use of a more general frame, which is a case of modification (Padó 2007: 4), more specifically a case of generalization. This becomes a problem when trying to translate the more fine-grained frames, as in the following Examples 37–38 where the same semantic idea has been phrased differently in English. The change in the point of view can be difficult to adequately describe in the target language, when the original BFN frames denoting a start of a motion or a journey somewhat lose this meaning in the Finnish translation, resulting in more general frames of movement.

(37)

[Self_mover She] SET OFF [Direction towards the exit]. (Setting_out)

[Self_mover Hän] LÄHTI (= left) [Direction kohti uloskäyntiä]. (Departing)

she leave.PST towards exit.PART

(38)

[Self_mover I] smiled and HEADED OUT [Path on the road]. (Getting_underway)

[Self_mover Minä] SUUNTASIN (= navigated towards) [Path tielle]. (Self_motion)

1SG head.for.PST.1SG road.ALL

Some frames, such as Achieving_first and Successfully_communicate_message, have a translation in Finnish, but the exact semantic content is hard to keep intact. In Example 39, the English verb pioneer (to be the first to do or achieve something) and its Finnish translation, panna alulle (to start or to initiate something without necessarily achieving it first), are not quite the same semantically. The verb pioneer does not have a clear Finnish equivalent.

(39)

[Cognizer the APU] has PIONEERED [New_idea new testing methods]. (Achieving_first)

[Cognizer APU] on PANNUT ALULLE (= initiate, start) [New_idea uusia testausmetodeita]. (Activity_start)

APU be.3SG put.PTCP.PST beginning.ALL

In Example 40, the English verb convey implies that the addressee successfully understands the message the communicator is trying to get across. This meaning is not apparent in the Finnish translation, which results in a more general frame.

(40)

[Communicator He] was admitted to Islington Infirmary, unable to speak, and yet desperately trying to CONVEY [Message a message] [Addressee to his daughter-in-law Margaret]. (Successfully_communicate_message)

[Communicator Hänet] … KERTOA (= tell) [Message viestiään] [Addressee miniälleen Margaretille] (Telling)

he.ACC tell.INF message.PART.3:POSS daughter-in-law.ALL.3:POSS Margaret.ALL

Some frames lack a good Finnish equivalent altogether, as in Example 41. It turns out to be challenging to translate them into Finnish in a concise way that also preserves the semantic content relevant to the original frame. Therefore, no acceptable translated example exists in the data. The frame Rising_to_a_challenge is included in the list of 24 frames that were left without a good Finnish equivalent.

(41)

Yet [Protagonist they] ROSE [Activity to the occasion]. (Rising_to_a_challenge)

? [Protagonist He] NOUSIVAT (= rose) [Activity tilanteen tasalle].

they rise.PST.3PL situation.GEN level.ALL

The translators typically aimed to produce a functional equivalent of the original sentence, but the construction used in the source language may not be available in the target language. The resulting structural adaptations may cause shifts on a grammatical level (see Sect. 5.3), leading to a category change, or to a slightly shifted meaning (see Sect. 5.4). Especially in the former case, the frames are often linked in some way via the frame hierarchy or frame relations (Čulo 2013). For example, Killing is the causative of Dying as in Example 26.
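The idea that a diverging target frame may still be linked to the source frame through a frame relation can be sketched as a simple lookup. The following is a minimal illustration, not the procedure used in the project: the relation table is hand-written and contains only the Killing/Dying pair mentioned above, and the names `FRAME_RELATIONS`, `compare_frames` and the label `causative_of` are assumptions made for this sketch.

```python
# Illustrative only: a hand-written relation table, not the FrameNet database.
# The single entry comes from the text: Killing is the causative of Dying.
FRAME_RELATIONS = {
    ("Killing", "Dying"): "causative_of",
}


def compare_frames(source_frame, target_frame, relations=FRAME_RELATIONS):
    """Classify a source/translation frame pair against the toy table.

    Returns "parallel" if the frames are identical, "related:<relation>" if
    the table links them in either direction, and "no relation listed"
    otherwise (which says nothing about FrameNet itself).
    """
    if source_frame == target_frame:
        return "parallel"
    for pair in ((source_frame, target_frame), (target_frame, source_frame)):
        if pair in relations:
            return "related:" + relations[pair]
    return "no relation listed"


print(compare_frames("Killing", "Killing"))
print(compare_frames("Killing", "Dying"))
print(compare_frames("Setting_out", "Departing"))
```

A fuller version would read the relations from the FrameNet data itself; the last call only shows that the pair is absent from this toy table.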

It would be interesting to see whether the same problems with the same kinds of frames exist between other languages as well, i.e. whether the problems are tied to certain kinds of frames or phenomena. It is also justifiable to ask whether all the very specific subframes in BFN are relevant in the first place. As more frames are created in the original FrameNet project, the semantic division between frames and also inside the frame hierarchy becomes more and more fine-grained. To the extent that it promotes or defines inter-substitutability between expressions of a frame, increased granularity can of course still be defended.

6 Conclusions and further research

The aim of this project was to inspect how well semantic frames established in the BFN project transfer from English to Finnish. This is especially interesting since frame parallelism between unrelated languages has not been studied very extensively. In our data, 88% of the examples translated without problems into Finnish. Frame non-parallelism was caused by syntax, semantics or idiomatic expressions. At times, it was difficult to name the reason for non-parallelism, but in 94% of the non-parallel examples, the majority of the annotators agreed on a reason.
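The two kinds of figures reported above, the share of parallel examples and the share of non-parallel examples with a majority-agreed reason, can be computed along the following lines. This is a hedged sketch under stated assumptions: the function names, the toy data and the annotator labels are invented for illustration and do not reproduce the project's data or its exact agreement criterion.

```python
from collections import Counter


def parallelism_rate(pairs):
    """Share of (source_frame, target_frame) pairs with identical frames."""
    if not pairs:
        raise ValueError("no example pairs given")
    parallel = sum(1 for src, tgt in pairs if src == tgt)
    return parallel / len(pairs)


def majority_reason(labels):
    """Return the reason chosen by a strict majority of annotators, else None."""
    if not labels:
        return None
    reason, count = Counter(labels).most_common(1)[0]
    return reason if count > len(labels) / 2 else None


# Toy data: frame pairs from Examples 37-40 plus one invented parallel pair.
pairs = [
    ("Setting_out", "Departing"),
    ("Getting_underway", "Self_motion"),
    ("Achieving_first", "Activity_start"),
    ("Successfully_communicate_message", "Telling"),
    ("Self_motion", "Self_motion"),
]
print(parallelism_rate(pairs))

# Hypothetical annotator labels for two non-parallel examples.
print(majority_reason(["semantics", "semantics", "idiom"]))
print(majority_reason(["syntax", "semantics", "idiom"]))
```

The strict-majority rule is one possible reading of "the majority of the annotators agreed"; other thresholds would need only a change to the comparison in `majority_reason`.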

We also looked into the success of the annotation projection method and examined whether the English annotations could be projected onto the Finnish translation. There were problems in the annotation projection of frame elements in at least 4% of the example sentences. In addition, some of the English frame-evoking elements could not be translated into Finnish within the same frame. Despite the many interesting problematic cases, the projection still succeeded quite well between Finnish and English in the majority of cases. It would therefore be interesting to examine how well the method works between Finnish and its related languages. Finnish has several related under-resourced languages that could benefit from a functioning annotation projection method since these languages often lack resources that would allow costly manual annotation of large amounts of linguistic data.
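A simple check of whether frame element labels carry over from the source to the translation can be sketched as follows, assuming sentences are written in the bracket notation used in the examples of this article. The pattern `FE_PATTERN` and the functions `frame_elements` and `unprojected_labels` are hypothetical helpers for this sketch, not part of any FrameNet tool, and the shortened Finnish sentence at the end is invented to show a failed projection.

```python
import re

# Matches "[Label text]" spans as used in the article's examples.
FE_PATTERN = re.compile(r"\[(\w+) ([^\]]+)\]")


def frame_elements(annotated):
    """Return {label: text} for each bracketed frame element in a sentence.

    If a label occurs more than once, only its last occurrence is kept.
    """
    return {label: text for label, text in FE_PATTERN.findall(annotated)}


def unprojected_labels(source, target):
    """Frame element labels present in the source but missing in the target."""
    return sorted(set(frame_elements(source)) - set(frame_elements(target)))


# Example 37 from the text.
english = "[Self_mover She] SET OFF [Direction towards the exit]."
finnish = "[Self_mover Hän] LÄHTI [Direction kohti uloskäyntiä]."
print(frame_elements(finnish))
print(unprojected_labels(english, finnish))

# Invented target without a Direction element, to show a missing label.
print(unprojected_labels(english, "[Self_mover Hän] LÄHTI."))
```

The check compares labels only; it does not verify that the Finnish span is an adequate translation of the English one, which still requires human judgement.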

Some of the problems we have encountered and discussed in this article probably apply to other language pairs as well. In particular, the data may be useful for developing automatic translation and for describing semantic relations between other language pairs. Successful transfer of semantic frames between languages unrelated to each other has many important applications. One application area for the FinnTransFrame corpus is the development and evaluation of automatic translation: the corpus could be used to identify the cases where the frame needs to be modified or abandoned to achieve a natural translation result.

The Finnish annotated examples can also be used on their own. FrameNet data has been used for automatic semantic parsing (for example Das et al. 2010): automatically finding words that evoke frames, selecting correct frames for them and locating arguments for each frame. Semantically annotated data can be used for improving question answering, recognizing paraphrases and extracting information (Shen and Lapata 2007; Surdeanu et al. 2003). Encouraged by the success rate of the frame projection of the FrameNet annotation scheme, we have also applied FrameNet annotations to authentic Finnish sentences in a separate FinnFrameNet database project.