Some Key General Features of DICTION Software [Source: Murphy 2013, pp. 60–61, with some minor editing]
The program deploys 10,000 search words divided into 31 word lists … or dictionaries, compiled after the analysis of 20,000 texts of several different types. The dictionaries contain individual words, not phrases, and no word appears in more than one dictionary. Homographs undergo statistical weighting procedures intended to correct for context.
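The dictionary lookup described above can be illustrated with a minimal sketch. The three word lists and their entries are invented placeholders, not taken from the actual DICTION dictionaries, and the homograph weighting is not modelled; the function names are hypothetical.

```python
import re
from collections import Counter

# Toy word lists for illustration only (assumed entries, not DICTION's own).
TOY_DICTIONARIES = {
    "praise": {"good", "loyal", "beautiful"},
    "blame": {"cruel", "stupid", "mean"},
    "hardship": {"fear", "poverty", "war"},
}

def overlapping_words(dictionaries):
    """Return the words listed in more than one dictionary (expected: none)."""
    seen = Counter(word for words in dictionaries.values() for word in words)
    return {word for word, n in seen.items() if n > 1}

def count_hits(text, dictionaries):
    """Count how many tokens of `text` belong to each dictionary."""
    # Individual words only, matching the "words, not phrases" design.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = {name: 0 for name in dictionaries}
    for token in tokens:
        for name, words in dictionaries.items():
            if token in words:
                counts[name] += 1
    return counts
```

Because the lists are mutually exclusive, each token can raise at most one count, so the per-dictionary totals never double-count a word.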
The program’s main strength is that it analyzes texts on the basis of five master variables: activity, optimism, certainty, realism, and commonality. These are ‘created by combining (after standardization) the subaltern variables’ (Hart 2001, p. 45). The subaltern variables mostly represent semantic fields, such as praise, satisfaction, inspiration, blame, hardship, and denial. Four of the variables included in the master variable calculations do not represent semantic fields: insistence (the degree to which a text relies on the repetition of lexical words), embellishment (the ratio of descriptive to functional words), variety (type-token ratio), and complexity (word length).
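Two of the calculated variables, variety and complexity, have simple textbook definitions that can be sketched directly. The tokenizer below is an assumption made for illustration, and DICTION's own tokenization and rounding may differ.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def variety(tokens):
    """Type-token ratio: distinct words divided by total words."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

def complexity(tokens):
    """Average number of characters per word."""
    if not tokens:
        return 0.0
    return sum(len(token) for token in tokens) / len(tokens)
```

For the five-word sentence "The cat saw the dog", `variety` gives 4 distinct words out of 5, and `complexity` gives an average of 3 characters per word.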
According to Hart (2001, p. 43), the master variables, chosen deliberately, are the five elements that ‘provide the most robust understanding’ of a text. They are broadly defined as follows:
ACTIVITY: language featuring movement, change, the implementation of ideas, and the avoidance of inertia.
CERTAINTY: language indicating resoluteness, inflexibility, and completeness and a tendency to speak ex cathedra.
OPTIMISM: language endorsing some person, group, concept or event, or highlighting their positive entailments.
REALISM: language describing tangible, immediate, recognizable matters that affect people’s everyday lives.
COMMONALITY: language highlighting the agreed-upon values of a group and rejecting idiosyncratic modes of engagement.
‘… virtually no statistical relationship exists among the five variables, which means that each cluster sheds new and different light on the passage being examined’ (Hart 2001, p. 45).
DICTION calculates frequency scores for each variable and rates them as being within, above or below a normal range. This range is calculated from a text type which the researcher chooses as comparable to the one under analysis. There are seven broad classes of text types: Business, Daily Life, Entertainment, Journalism, Literature, Politics and Scholarship. These classes are further subdivided into thirty-six individual text types, representing both speech and writing. These texts are not incorporated in the software; only the calculations for each variable in these texts are included in DICTION.
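The rating step can be sketched as a comparison against stored norm statistics for the chosen text type. The sketch assumes that the normal range spans one standard deviation either side of the norm mean; that width, the `Norm` type, and the numbers in the usage example are illustrative assumptions rather than values taken from DICTION.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Norm:
    """Precomputed statistics for one variable in one text type (hypothetical)."""
    mean: float
    sd: float

def normal_range(norm, width=1.0):
    """Return (low, high), assumed here to be mean +/- width * sd."""
    return (norm.mean - width * norm.sd, norm.mean + width * norm.sd)

def rate(score, norm, width=1.0):
    """Rate a score as 'below', 'within' or 'above' the normal range."""
    low, high = normal_range(norm, width)
    if score < low:
        return "below"
    if score > high:
        return "above"
    return "within"
```

With an invented norm of `Norm(mean=50.0, sd=2.0)`, a score of 53.0 is rated "above" and a score of 51.0 "within". Only the summary statistics are needed, which mirrors the point that the reference texts themselves are not shipped with the software.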
Although the initial impression may be one of rigidity, DICTION offers some user-controlled features. The program standardizes scores either on the first 500 words of a text, ignoring the remainder, or by segmenting the text into 500-word units and averaging the scores across the units. …
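The two standardization options can be sketched as follows. The handling of a final unit shorter than 500 words is not specified in the text; dropping it, as done here, is an assumption, and `score_fn` stands in for any per-unit scoring function.

```python
def first_unit(tokens, size=500):
    """Option 1: keep only the first `size` words and ignore the rest."""
    return tokens[:size]

def split_units(tokens, size=500):
    """Option 2: cut the text into consecutive full units of `size` words.

    A shorter trailing remainder is dropped (an assumption of this sketch).
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]

def average_score(tokens, score_fn, size=500):
    """Score every full unit with `score_fn` and return the mean score."""
    units = split_units(tokens, size)
    if not units:
        raise ValueError("text is shorter than one unit")
    return sum(score_fn(unit) for unit in units) / len(units)
```

The first option is cheaper but sensitive to how a text opens; averaging over units uses the whole text at the cost of smoothing out local variation.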
Component Variables of the REALISM Master Variable [Source: Hart and Carroll 2013]
FAMILIARITY: Consists of a selected number of C. K. Ogden’s (1968) operation words which he calculates to be the most common words in the English language. Included are common prepositions (across, over, through), demonstrative pronouns (this, that), interrogative pronouns (who, what), and a variety of particles, conjunctions, and connectives (a, for, so).
SPATIAL AWARENESS: Terms referring to geographical entities, physical distances, and modes of measurement. Included are general geographical terms (abroad, elbow-room, locale, outdoors) as well as specific ones (Ceylon, Kuwait, Poland). Also included are politically defined locations (county, fatherland, municipality, ward), points on the compass (east, southwest), and the globe (latitude, coastal, border, snowbelt), as well as terms of scale (kilometer, map, spacious), quality (vacant, out-of-the-way, disoriented), and change (pilgrimage, migrated, frontier).
TEMPORAL AWARENESS: Terms that fix a person, idea, or event within a specific time interval, thereby signaling a concern for concrete and practical matters. The dictionary designates literal time (century, instant, mid-morning) as well as metaphorical designations (lingering, seniority, nowadays). Also included are calendrical terms (autumn, year-round, weekend), elliptical terms (spontaneously, postpone, transitional), and judgmental terms (premature, obsolete, punctual).
PRESENT CONCERN: A selective list of present-tense verbs extrapolated from C. K. Ogden’s list of general and picturable terms, all of which occur with great frequency in standard American English. The dictionary is not topic specific but points instead to general physical activity (cough, taste, sing, take), social operations (canvass, touch, govern, meet), and task performance (make, cook, print, paint).
HUMAN INTEREST: An adaptation of Rudolf Flesch’s notion that concentrating on people and their activities gives discourse a life-like quality. Included are standard personal pronouns (he, his, ourselves, them), family members and relations (cousin, wife, grandchild, uncle), and generic terms (friend, baby, human, persons).
CONCRETENESS: A large dictionary possessing no thematic unity other than tangibility and materiality. Included are sociological units (peasants, African Americans, Catholics), occupational groups (carpenter, manufacturer, policewoman), and political alignments (Communists, congressman, Europeans). Also incorporated are physical structures (courthouse, temple, store), forms of diversion (television, football, CD-ROM), terms of accountancy (mortgage, wages, finances), and modes of transportation (airplane, ship, bicycle). In addition, the dictionary includes body parts (stomach, eyes, lips), articles of clothing (slacks, pants, shirt), household animals (cat, insects, horse), foodstuffs (wine, grain, sugar), and general elements of nature (oil, silk, sand).
PAST CONCERN: The past-tense forms of the verbs contained in the Present Concern dictionary.
COMPLEXITY: A simple measure of the average number of characters per word in a given input file. Borrows Rudolf Flesch’s (1951) notion that convoluted phrasings make a text’s ideas abstract and its implications unclear (p. 8).
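The combination of these components into a single master score, after standardization, can be sketched as a signed sum of z-scores. The signs are an assumption not stated in the text above: the sketch adds the first six components and subtracts Past Concern and Complexity, and this should be checked against the DICTION manual. The norm values in the usage note are invented, and any constant or rescaling the program applies is omitted.

```python
# Assumed signs: +1 adds a component to the composite, -1 subtracts it.
REALISM_SIGNS = {
    "familiarity": +1,
    "spatial_awareness": +1,
    "temporal_awareness": +1,
    "present_concern": +1,
    "human_interest": +1,
    "concreteness": +1,
    "past_concern": -1,
    "complexity": -1,
}

def z_score(raw, norm_mean, norm_sd):
    """Standardize a raw score against a norm mean and standard deviation."""
    return (raw - norm_mean) / norm_sd

def composite(raw_scores, norms, signs=REALISM_SIGNS):
    """Sum the signed z-scores of all components.

    `raw_scores` maps component name to raw score;
    `norms` maps component name to a (mean, sd) pair.
    """
    total = 0.0
    for name, sign in signs.items():
        norm_mean, norm_sd = norms[name]
        total += sign * z_score(raw_scores[name], norm_mean, norm_sd)
    return total
```

If every component has an invented norm of (10.0, 2.0) and every raw score equals 10.0 except Familiarity at 14.0 and Complexity at 12.0, the composite is 2.0 minus 1.0, that is 1.0. Standardizing first keeps a large dictionary such as Concreteness from dominating the sum merely because its raw counts are higher.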