Using Common Book Groups

A chapter of Code Breaking in the Pacific

Abstract

By late 1941, sufficiently many book groups in the JN-25B code book had been found to enable the compilation of an accurate list of those most commonly found in decrypted messages. The list would be regularly updated as more GATs were decrypted. A new method of attacking the problem of finding additive table entries was now based on the information in this list. This chapter describes the successive refinements of this method that included the application of Bayesian ideas in devising scoring systems for choosing the correct decryption from a small number of possibilities.


Notes

  1.

    The Lietwiler letter is to be found in NARA College Park in RG38, Crane Inactive Stations, 3200/11. It has been made widely accessible by being incorporated in the appendix to the 2001 Ottawa MA thesis of Timothy Wilford, Pearl Harbor Redefined: USN Radio Intelligence in 1941, which is included in the ProQuest electronic database. Wilford later obtained a PhD from the same university; that thesis is also on ProQuest and is of interest. Some of the writings of Wilford’s thesis adviser, Brian Villa (see below), are also quite relevant. Duane Whitlock, in an oral history interview with the National Cryptologic Museum, explains that Lietwiler was intended to be Fabian’s relief but Fabian refused to be relieved. So Fabian, no longer commandant, was evacuated from Manila ahead of Lietwiler and, reaching Melbourne first, became commandant of the relocated Sigint team.

    This letter is mentioned by Wilford in Intelligence and National Security, December 2002, in Note 20 of his paper, and by Phillip Jacobsen in Cryptologia, 27(3), July 2003, 193–205. Sadly Peter Donovan in his first paper on JN-25 (in Cryptologia 28(4), October 2004, 325–340) made no reference to it. The most recent paper about the revisionist line appears to be Signals Intelligence and Pearl Harbor: The State of the Question, by Brian Villa and Timothy Wilford, which appeared in the journal Intelligence and National Security, volume 21, issue 4, August 2006. John Zimmerman’s paper Pearl Harbor Revisionism: Robert Stinnett’s ‘Day of Deceit’, in Intelligence and National Security, vol 17, June 2002, 127–146, should be studied very carefully before any conclusion is drawn about the letter. Chapters 9 and 10 of this book appear to give the first account of the technical matters raised by Lietwiler.

  2.

    Other correspondence from Lietwiler, copied by Wilford (Note 1), gives some hints about the JEEP IV machine. SRH-355 Naval Security Group History to World War II (NARA RG457) states on page 433: ‘Also reported was the arrival of LT Hess, a USNR officer, from the Navy Department, bringing a device known as JEEP IV. This evidently was some sort of mechanical device used for recovering additives (or subtractors) which formed the cipher key for encrypting messages in the Operations code (later JN-25).’

  3.

    Section 5.5 explains the extent to which the USN used ideas from the FECB.

  4.

    In an oral history interview with the National Cryptologic Museum, ‘Ham’ Wright says that in early December 1941 Rochefort’s Combat Intelligence Unit at Pearl Harbor had only the 100 most frequently used groups in order of frequency. Evidently the CIU had to use these to prepare its own table of differences.

  5.

    One could speculate on the extent to which the construction of the 24,000 cards was automated. Presumably Lietwiler considered (almost) halving the length of the table by using minor differences only to be a side issue not worth mentioning.

    By July 1943 the preparation of tables of differences had been automated to a considerable extent. The GYP-1 Bible mentions on page 579 that this could be done using an NC3 and an NC4. These were specially adapted IBM machines manufactured under secret contracts between IBM and the Navy Department. A description of their functions would be beyond the scope of this book. However, Canberra NAA file A425 C1947/514 1947–1949, entitled Hollerith tabulating equipment taken over by RAN from USN, records that Frumel had its NC4.

    Edward Simpson’s Bayes at Bletchley Park in Significance 7(2) 76–80 (May 2010) throws some extra light on Notes 4 and 5.

  6.

    Section 10.9 below explains how reliably found correct decryptions of greater depths (15 or more) may be used to test the correctness of a potential decryption method for small depths (9 or fewer).

  7.

    Edward Simpson confirms Lietwiler’s comment on pages 134–135 of his account in The Bletchley Park Codebreakers, the augmented second edition (2011) of Action This Day, edited by Ralph Erskine and Michael Smith.

    ‘Horizontal decryption’ or ‘horizontal stripping’ would also be useful when it was known that a particular message was likely to contain GATs which were encrypted from a short list of book groups.

  8.

    See Erskine and Smith (cited in Note 7 above), Appendix VII, for ‘At one stage when we were struggling to keep up with the increasing quantity of incoming traffic Washington sent us a dozen or so calculating machines made by the National Cash Register Company’.

  9.

    A list of the principal errors made with JN-25A and JN-25B is given in Chap. 9. This error is the seventh in the list of seven given there.

  10.

    See page 56 of Holmes’ Double-Edged Secrets, Naval Institute Press, 1978.

  11.

    See Erskine and Smith (cited in Notes 7 and 8 above).

  12.

    Alexander’s report is HW 25/1 in the British National Archives and is available on the web.

  13.

    The reference is NARA RG38 Radio Intelligence Publication Collection boxes 169–172.


Appendices

Appendix 1 Minor Differences

Although this Appendix is particularly relevant to the interpretation of Lietwiler’s letter, it is generally applicable to additive cipher systems with groups of any fixed size. Indeed, the examples here are 4-digit groups! Scanning groups play no special role. Section 12.2 examines the use of differences in attacking additive cipher systems in general.

There is a simple device that almost halves the number of differences to be stored. A group x is said to be minor if x ≤ −x, all arithmetic here being digit-wise and without carrying. With a few exceptions, for groups p and q, exactly one of p − q and its negative q − p will be minor. As these two differences occur with the same frequency, in counting frequencies it is sufficient to work with the minor differences. So for the groups:

Table 1

the first 18 (ordinary type) are minor while their negatives, the next 18 (slanted type), are not. The groups made up of the digits 0 and 5 only are their own negatives and would all be taken as minor.
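
The bookkeeping involved is easily mechanized. The following minimal Python sketch assumes the non-carrying (digit-wise mod 10) arithmetic used throughout this book; the two 4-digit groups at the end are invented purely for illustration.

    # A minimal sketch of minor-difference bookkeeping, assuming the
    # non-carrying (digit-wise mod 10) arithmetic of additive systems.

    def false_subtract(p: str, q: str) -> str:
        """The 'difference' p - q: digit-wise subtraction mod 10, no borrowing."""
        return "".join(str((int(a) - int(b)) % 10) for a, b in zip(p, q))

    def negate(x: str) -> str:
        """Digit-wise negation mod 10; groups of 0s and 5s are self-negative."""
        return "".join(str((10 - int(d)) % 10) for d in x)

    def minor_form(x: str) -> str:
        """Whichever of x and -x is minor, i.e. numerically no larger."""
        return min(x, negate(x))

    # Invented 4-digit example groups, purely for illustration:
    p, q = "7381", "2956"
    d = false_subtract(p, q)                  # '5435'
    assert negate(d) == false_subtract(q, p)  # q - p is the negative of p - q
    print(minor_form(d))                      # only the minor form is tabulated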

Appendix 2 Bayesian Inference

In this appendix ‘JN-25X’ denotes an additive system whose book groups are all 5-digit and scannable. The Allied cryptanalysts are assumed to have some statistics on the frequency of the 200 (say) most common book groups. A method is needed to determine whether a proposed decryption of a depth of intercepted GATs should be accepted. As Friedman wrote (quoted in Sect. 4.4):

Like the experimental scientist [the Cryptologist] is observing phenomena or occurrences to determine whether they are random or systematic.

So the cryptologist needed to use statistically sound methods that did not require computation other than the simplest arithmetic. Ideally the system being used would be based on assigning a ‘score’ to each of the 200 (say) most common book groups. Initially it is assumed that no ‘horizontal’ information is available and so the cryptologist should add up the scores in the decrypted depth and then check whether the total exceeds some preset threshold.

               COUNTING     CRUDE     ACCURATE
    20859 √        1          1           6
    91173 √        0          0           0
    76692 √        0          0           3
    14187 √        1          3          23
    58053 √        0          0           0
    23187 √        1          1           9
    13854 √        1          2          14
    03099 √        0          0           5
    63168 √        0          0           0
    TOTAL          4          7          60

Three examples (out of many) of scoring systems are shown above. Each would have its own threshold score for accepting the proposed decryption; this would vary with the number of GATs in the original depth. The ‘counting’ system simply allocates one point for each GAT in the hundred most frequent book groups. This is not particularly natural: nothing is special about 100. But it was used, at least initially, by the Rochefort team at Pearl Harbor in 1942. The ‘crude’ system is in fact less crude than the ‘counting’ one: here 3 points are allocated for the ten most common groups, 2 points for the next twenty and 1 point for the next seventy. The ‘accurate’ system illustrates (with totally contrived scores) the Bayesian method developed by Alan Turing in 1940–1941.

The cryptologist using the ‘accurate’ system would have instructions of the type ‘for a depth of 9 the total score has to be at least 47’. So the decryption shown above should be accepted. As already mentioned, the ‘threshold’ 47 is determined experimentally using depths of 14 or more for which determining the correct decryption is much easier.
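
In modern terms the procedure is a simple table lookup and threshold test. The Python sketch below uses the illustrative ‘accurate’ scores from the table above; in practice the score table would cover the 200 (say) most common book groups and the threshold would be calibrated as just described.

    # Threshold test for a proposed decryption of a depth. The scores are
    # the illustrative 'accurate' values from the table above; unlisted
    # groups score 0.
    SCORES = {"20859": 6, "76692": 3, "14187": 23,
              "23187": 9, "13854": 14, "03099": 5}

    def accept(decrypts, threshold):
        """Accept the decryption if the total score reaches the threshold."""
        return sum(SCORES.get(g, 0) for g in decrypts) >= threshold

    depth_of_9 = ["20859", "91173", "76692", "14187", "58053",
                  "23187", "13854", "03099", "63168"]
    print(accept(depth_of_9, threshold=47))   # True: the total is 60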

Any scoring system working on vertical evidence of an alignment being correct may be adapted to allow for bonus points being awarded for observed horizontal evidence. For example, if the previous column has 12345 in its ninth place and it is known that 63168 often follows 12345, supplementary rules might award bonus points to this decryption. Apparently no documentation has survived on how a numerical scoring system for evaluating vertical evidence may have accommodated scores for non-numerical horizontal evidence.
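
No record of the exact rules survives, as just noted, but a purely hypothetical bonus rule grafted onto the sketch above might look like this (the pair and its bonus value are invented for illustration):

    # Hypothetical horizontal bonus: award extra points when a decrypt is a
    # known frequent successor of the decrypt in the same row of the
    # previous column. Uses SCORES from the earlier sketch.
    FOLLOWS = {("12345", "63168"): 4}   # '63168 often follows 12345'

    def accept_with_bonus(decrypts, prev_column, threshold):
        total = sum(SCORES.get(g, 0) for g in decrypts)
        total += sum(FOLLOWS.get((p, g), 0)
                     for p, g in zip(prev_column, decrypts))
        return total >= threshold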

It is quite possible that Turing became aware that the ‘counting’ method was in use at FECB and started thinking about the general mathematical problem of designing an optimal scoring system. It is also quite possible that he realised from other problems that the general issue of designing scoring systems was of the greatest importance. By around August 1941 he had written a general report on probabilistic methods in cryptology, which may be read in conjunction with a key paper by his assistant, Jack Good.

The method amounts to choosing a constant K > 0 and then assigning the weight \(K\log_{10}(p_g)\) to the book group g. Here \(p_g\) is chosen so that the book group g occurs on average \(p_g\) times in every 33,334 groups of decrypted traffic. Thus a group that occurs 7.3 times in every 10,000 decrypted groups occurs 24.334 times in every 33,334 groups. Turing and Good found it convenient to take K = 20 and then round to the nearest integer. Here \(20\log_{10}(24.334) = 20 \times 1.3862\) is rounded and becomes 28. The unit of weight of evidence implicit in this was called the ‘halfdeciban’ or just ‘hdb’.
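
The worked figure can be checked directly; a minimal sketch follows (the 7.3-per-10,000 frequency is the example from the text, everything else is just the stated formula).

    import math

    def hdb_weight(per_10000, k=20):
        """Half-deciban weight for a group seen per_10000 times per 10,000.

        The frequency is rescaled to a 'per 33,334 groups' basis, so an
        average scannable group (1 in 33,334) gets weight exactly 0.
        """
        p_g = per_10000 * 33334 / 10000
        return round(k * math.log10(p_g))

    print(hdb_weight(7.3))            # 28, matching the worked figure
    print(hdb_weight(10000 / 33334))  # 0: an average group is no evidence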

One evident merit of this system is that if \(p_g = 1\), that is if g occurs at the average frequency of the 33,334 scannable groups, then \(K\log_{10}(p_g) = 0\). The observation of such a g gives no evidence whatsoever as to whether a proposed decryption is correct.

In that era it was standard to use rounded base-10 logarithms to carry out multiplications. Here, in effect, the base-10 logarithms are being rounded to the nearest multiple of 0.05.

An interesting example of this method arises from the frequencies of the 26 letters in ordinary language with British spelling. These are displayed on a ‘per 10,000’ basis immediately below the letters in the table below:

Table 2

The frequencies are then converted by multiplication by 0.0026 to a ‘per 26’ basis (not shown). The third line of the table gives the hdb scores obtained by multiplying the base-10 logarithms of the converted frequencies by 20 and rounding. These can be used to test, by counting the occurrences of the 26 letters in a string of possible text, whether it has the distribution that would arise from natural language. For example,

Table 3
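
By way of illustration only, the same computation can be run on approximate modern estimates of English letter frequencies (these are not the figures from Table 2):

    import math

    # Approximate per-10,000 frequencies for a few English letters. These
    # are common modern estimates, not the figures from Table 2.
    PER_10000 = {"e": 1270, "t": 906, "a": 817, "q": 10, "z": 7}

    def letter_hdb(per_10000):
        """Convert per-10,000 to a 'per 26' basis, then to an hdb score."""
        return round(20 * math.log10(per_10000 * 0.0026))

    for letter, f in PER_10000.items():
        print(letter, letter_hdb(f))   # common letters score positive,
                                       # rare ones heavily negative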

Appendix 3 Turing and Bayes

The following is taken from Hugh Alexander’s Cryptographic History of Work on the German Naval Enigma (GCCS, 1945; see Note 12):

Also in November [1942] Turing left the section for a visit to America. Although this did not mark the official end of his connection with the section he never did any more work in it and therefore it is a fitting place to recognize the great contribution that he made. There should be no question in anyone’s mind that Turing’s work was the biggest factor in Hut 8’s success. In the early days he was the only cryptographer who thought the problem worth tackling and not only was he primarily responsible for the main theoretical work within the Hut (particularly the developing of a satisfactory scoring technique for dealing with Banburismus) but he also shared with Welchman and Keen the chief credit for the invention of the Bombe. It is always difficult to say that anyone is absolutely indispensable but if anyone was indispensable to Hut 8 it was Turing. The pioneer work always tends to be forgotten when experience and routine later make everything seem easy and many of us in Hut 8 felt that the magnitude of Turing’s contribution was never fully realized by the outside world.

Alexander previously gave due credit to the Polish cryptologists for their earlier work on Enigma. This of course included the exploitation of redundant encryption.

Edward Simpson’s paper Bayes at Bletchley Park explains the importance of Bayes in decrypting Enigma.

Turing then worked with the cryptanalytic research section of the US Naval Communications Intelligence Staff. A series of internal Cryptanalytic Research Papers has survived (see Note 13). The introduction to that series states:

RIP 450 is concerned mainly with the techniques themselves, while this series considers the cryptanalytic or mathematical theories which underlie the techniques. On the other hand machine research (from an engineering point of view) is not covered in this series. Some of the papers in this series are expository but most represent original work. It must always be borne in mind that we owe to the British the basic solution of the Enigma, and many of the basic subsidiary techniques, together with the underlying mechanical and mathematical theories. Much of what we call ‘original’ is only a retracing of steps previously taken by the British, and the Editor has striven to point this out in the Index. But there is also a great deal that extends or improves British methods, and some that strikes out in new directions.

The most sensational of these ‘new directions’ is the use of Hall weights (Sect. 15.3).

The 1941 report by Alan Turing on such methods ultimately led to mechanized applications such as Colossus, the machine that attacked the ‘Tunny’ encrypted teleprinter. Some quotations from the 1945 GCCS General Report on Tunny, by Jack Good, Donald Michie and Geoffrey Timms (British National Archives HW 25/4, available on the web), are:

The fact that Tunny can be broken at all depends upon the fact that P, χ, Ψ′, K and D have marked statistical, periodic or linguistic characteristics which distinguish them from random sequences of letters.

The importance of exploiting statistical characteristics is noted in such sentences as:

First method, stage 1. Solution of \(Z = \chi + D\).

Various χ-patterns (or settings) are tried mechanically and the correct one is distinguished by the statistical properties of \(\Delta D\).

Recall that ‘In order to break a machine cipher, two things are needed’.

The special case of Bayes’ Theorem \(\mathbf{O}[H\mid E]/\mathbf{O}[H] = \mathbf{P}[E\mid H]/\mathbf{P}[E\mid \bar{H}] = f\) was first used in Bletchley Park by A. M. Turing. The fact that it was a special case of Bayes’ theorem was pointed out by I. J. Good. The great advance of Turing consisted of the invention and application of the deciban in Hut 8. Deciban is abbreviated to ‘db’ and is defined simply as \(10\log_{10}(f)\), where f is the factor defined above.
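
As a small worked illustration (ours, not the report’s): a piece of evidence four times as likely under H as under \(\bar{H}\) carries

$$10\log_{10}(4) \approx 6\ \text{db},$$

and, since independent factors multiply while their logarithms add, two such independent observations shift the odds on H by about 12 db.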

Turing’s use of the ‘needle in a haystack’ phrase turns up in this text:

In cryptography one looks for needles in haystacks and the object chosen has to have a large factor in favour of being a needle in order to overcome its prior odds. It will be observed that one could take a long time to find the needle if one could not estimate the factor very quickly—hence the necessity of machines in such problems.

It is worth noting here a few lines from Abraham Wald’s highly relevant paper Sequential Tests of Statistical Hypotheses, published in the Annals of Mathematical Statistics 16(2) (1945), pages 117–186, but written in 1943:

The National Defense Research Committee considered these developments sufficiently useful for the war effort to make it desirable to keep the results out of the reach of the enemy.

Appendix 4 Contrived Examples

List A immediately below contains 144 groups in 12 rows of 12:

Table 4

If List A is considered as a depth of 144 GATs, an electronic calculation shows that there are 8363 decrypting groups. As an exercise, how may this be used to produce a depth of 8363 GATs with 144 decrypting groups?
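
The ‘electronic calculation’ can be reproduced by brute force. The sketch below assumes, as in Appendix 2, that a candidate additive counts as decrypting the depth when stripping it from every GAT leaves a scannable group (digit sum divisible by 3), with all arithmetic non-carrying; since Table 4 is not reproduced here, the count of 8363 is taken from the text.

    # Brute-force count of 'decrypting groups' for a depth of GATs.

    def scannable(g):
        """A group is scannable when its digit sum is divisible by 3."""
        return sum(map(int, g)) % 3 == 0

    def strip(gat, additive):
        """Digit-wise subtraction mod 10, no borrowing."""
        return "".join(str((int(a) - int(b)) % 10)
                       for a, b in zip(gat, additive))

    def count_decrypting_groups(depth):
        width = len(depth[0])
        candidates = (str(i).zfill(width) for i in range(10 ** width))
        return sum(all(scannable(strip(gat, d)) for gat in depth)
                   for d in candidates)

    # With the 144 groups of List A as the depth, the text states that
    # this count comes out at 8363.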

Next, the first 9 rows (108 groups) of List A may be used as a new List B which has 9558 decrypting groups.

Alternatively, the last 9 rows (108 groups) of List A may be used as a new List C, which has 9557 decrypting groups.

The reader may wish to seek a depth of 96 with 9408 decrypting groups.

One may also consider List D, which consists of the 72 groups common to both Lists B and C. This, considered as a depth of 72, has 10,752 decrypting groups. List E, consisting of the 54 italicized groups in List A, has 12,288. Next, 18 more groups can be deleted from List E to yield List F:

Table 5

List F has 36 GATs and 13,824 decrypting groups. List G, consisting of the 24 italicized groups of List F, turns out to have 15,552.

It is possible to use any of the above contrived examples to construct a totally non-historical JN-25 that resists some of the usual methods of attack. The use of such a system in the modern era cannot be recommended!


Copyright information

© 2014 Springer International Publishing Switzerland

Cite this chapter

Donovan, P., Mack, J. (2014). Using Common Book Groups. In: Code Breaking in the Pacific. Springer, Cham. https://doi.org/10.1007/978-3-319-08278-3_10
