
1 Introduction

Non-linearity in cryptographic primitives is usually provided by so-called S-Boxes, functions which map a few input bits to a few output bits and which are often specified as look-up tables. These have been a topic of intensive research since their properties are crucial for the resilience of a cipher against differential [13] and linear [4, 5] attacks. Further, the structure or the method used to build the S-Box can provide other benefits.

Indeed, the structure of an S-Box can be leveraged, for instance, to improve the implementation of a primitive using it. The hash function Whirlpool [6] and the block ciphers Khazad [7], Fantomas, Robin [8] and Zorro [9], among others, use \(8 \times 8\)-bit S-Boxes built from smaller \(4 \times 4\) ones, since storing several \(4 \times 4\) permutations as tables of 16 4-bit nibbles is more memory efficient than storing one \(8 \times 8\) permutation as a table of 256 bytes. Beyond this implementation advantage, knowledge of the internal structure helps to produce more efficient masked implementations against side-channel attacks, a notable example being the AES [10] with its algebraic S-Box based on a power function.

In some cases the design process of an S-Box might be kept secret for the purpose of implementing white-box cryptography, as described e.g. in [11]. In this paper, Biryukov et al. describe a memory-hard white-box encryption scheme based on a Substitution-Permutation Network where the S-Boxes are very large and are built using a so-called ASASA or ASASASA structure where “A” denotes an affine layer and “S” a non-linear S-Box layer. Preventing an adversary from decomposing these S-Boxes into their “A” and “S” layers is at the core of the security claims for this scheme.

Moreover, such memory-hard white-box implementations with hidden component structure can be of use in crypto-currencies, for example in cases where an entity is interested in issuing a crypto-currency of its own. One of the dangers is that powerful adversaries may launch a 51 % attack, taking control of the mining process. Memory-hard S-Boxes with hidden structure can offer a distinct advantage in such a setting since an efficient implementation of the proof-of-work function may be kept secret by the owners of the currency.

Examples of algorithms for which the components are known but the rationale behind their choice is not (at least at the time of release), are the block ciphers designed by or with the help of the US National Security Agency (NSA), namely the DES [12], Skipjack [13], SIMON and SPECK [14] (the last two do not use S-Boxes though). Although the design criteria for the S-Boxes of DES were later released [15] they were kept secret for 20 years in order to hide the existence of differential cryptanalysis, a technique only known by IBM and NSA at the time. Skipjack also uses an S-Box, denoted F, which is a permutation of \(\{ 0,1 \}^{8}\). However, nothing was known so far about how this S-Box was chosen.

Our Contribution. Different methods can be used to recover the hidden structure of an S-Box. We propose that a cryptanalyst follows the strategy given below to try and decompose an unknown S-Box S:

  1. Draw the “Pollock” visual representation of the LAT and DDT of S (see Sect. 4).

  2. Check whether the linear and differential properties of S are compatible with a random function/permutation (see Sect. 2).

  3. Compute the signature \(\sigma (S)\) of S.

  4. If \(\sigma (S)\) is even, you may:

    (a) Try an attack on SASAS [16],

    (b) Try to distinguish S from a Feistel Network with XOR, using the distinguishers in [17],

    (c) If one of the Feistel Network distinguishers worked, run DecomposeFeistel \((S,R, \oplus )\) for an appropriate R (see Sect. 3.2).

  5. Regardless of \(\sigma (S)\), run DecomposeFeistel \((S,R, \boxplus )\) for \(R \in [2, 5]\) (see Sect. 3.2).

  6. Regardless of \(\sigma (S)\), run BreakArithmetic(S) (see Sect. 3.1).

We study in Sect. 2 the seemingly average linear properties of F. After a careful investigation and despite the fact that these properties are not impressive, we show that the probability for a random permutation of \(\{ 0,1 \}^{8}\) to have linear properties at least as good as those of F is negligible. This implies three things. First, F was not chosen uniformly at random. Second, F is very unlikely to have been picked among random candidates according to some criteria. Third, the method used to build it improved the linear properties. We also provide a candidate algorithm which can be used to generate S-Boxes with very similar differential and linear properties.

In Sect. 3 we consider the general problem of decomposing an S-Box with hidden structure and describe two algorithms which can be used to decompose S-Boxes based on: (a) multiple iterations of simple arithmetic operations (e.g. those found in a typical microprocessor) and (b) Feistel Networks with up to five independent rounds. The first algorithm is an optimised tree-search and the second one involves a SAT-solver.

Finally, we show in Sect. 4 how visual representations of the difference distribution table (DDT) or the linear approximation table (LAT) of an S-Box can help a cryptographer to spot non-randomness at a glance. As a bonus, we present an algorithm which generates non-bijective S-Boxes such that large sets of entries in their DDT are set according to the designer’s choices. We illustrate it by embedding images in the visual representation of the S-Box’s DDT.

2 Partially Reverse-Engineering the S-Box of Skipjack

2.1 Overview of the S-Box of Skipjack and Useful Definitions

Skipjack is a block cipher with a block size of 64 bits and key size of 80 bits. The interested reader may refer to the official specification [13] or to the best attack on the cipher [18], an impossible differential attack leveraging its particular round structure. Further analysis trying to discover the design criteria of Skipjack is given in [19, 20].

Skipjack’s specification contains an 8\(\,\times \,\)8 bit bijective S-Box which is called “F-Table” and which is given as a lookup table (we list it in Appendix A). In order to study it we need to introduce the following concepts.

Definition 1

(Permutations Set). We denote \(\mathfrak {S}_{2^{n}}\) the set of all the permutations of \(\{ 0,1 \}^n\).

Definition 2

(Difference Distribution Table). Let \(s : \{ 0,1 \}^n \rightarrow \{ 0,1 \}^n\) be a function. Its difference distribution table (DDT) is a \(2^n \times 2^n\) matrix where the number at line i and column j is

$$\begin{aligned} d_{i,j} = \#\{ x \in \{ 0,1 \}^n\;\, | \;\, s(x \oplus i) \oplus s(x) = j \}. \end{aligned}$$

The maximum coefficient in this table (minus the first line and column) is the differential uniformity of s which we denote \(\varDelta (s)\): \(\varDelta (s) = \max _{i>0, j>0}(d_{i,j})\).
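Definition 2 translates directly into code. The following is a minimal Python sketch, not taken from the original work: the function names, the list-based representation of an S-Box and the toy 3-bit example (the cube function over \(GF(2^3)\), with the modulus \(x^3+x+1\) chosen here as an assumption) are purely illustrative.

```python
def ddt(s):
    """Difference distribution table of s, given as a list of 2**n integers."""
    size = len(s)
    table = [[0] * size for _ in range(size)]
    for i in range(size):              # input difference (line index)
        for x in range(size):
            table[i][s[x ^ i] ^ s[x]] += 1   # output difference (column index)
    return table


def differential_uniformity(s):
    """Largest DDT entry outside the first line and the first column."""
    table = ddt(s)
    size = len(s)
    return max(table[i][j] for i in range(1, size) for j in range(1, size))


def gf8_mul(a, b):
    """Multiplication in GF(2^3) modulo x^3 + x + 1 (illustrative choice)."""
    result = 0
    for _ in range(3):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return result


# Toy S-Box: x -> x^3 over GF(2^3), a hypothetical example for the sketch.
cube = [gf8_mul(x, gf8_mul(x, x)) for x in range(8)]
```

Every line of a DDT sums to \(2^n\), which gives a cheap sanity check on any implementation. Applied to an 8-bit table such as F, the double loop performs \(2^{16}\) table look-ups.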

Differential cryptanalysis relies on finding differential transitions with high probabilities, i.e. pairs (a, b) such that \(s(x \oplus a) \oplus s(x) = b\) has many solutions, which is equivalent to \(d_{a,b}\) being high. Therefore, cryptographers usually attempt to use S-Boxes s with as low a value of \(\varDelta (s)\) as possible. A differentially 2-uniform function, the best possible, is called Almost Perfect Nonlinear (APN). The existence of APN permutations of \(GF(2^n)\) for even n was only proved recently by Browning et al. [21] in the case \(n=6\), while the case \(n=8\) and beyond still remains an open problem. Hence, the differential uniformity of the S-Boxes of the AES [10] and of most modern S-Box based ciphers is equal to 4.

The distribution of the coefficients in the DDT of Skipjack is summarized in Table 1 along with the theoretical distribution identified in [22] for a random permutation of \(GF(2^8)\). As we can see, it is differentially 12-uniform, the same as would be expected from a random permutation, which is surprising since minimizing the differential uniformity is usually one of the cornerstones of provable resilience against differential attacks.

Table 1. Distribution of the coefficients in the DDT of F.

We briefly mention the linear properties of F before studying them thoroughly in Sect. 2.2. In particular, we define the Linear Approximations Table of an S-Box.

Definition 3

(Linear Approximations Table). Let \(s : \{ 0,1 \}^n \rightarrow \{ 0,1 \}^n\) be a function. Its linear approximations table (LAT) is a \(2^n \times 2^n\) matrix where the number at line i and column j is

$$\begin{aligned} c_{i, j} = \#\{ x \in \{ 0,1 \}^n \;\,| \;\,x \cdot i = s(x) \cdot j \} - 2^{n-1} = \frac{1}{2} \sum _{x \in \{ 0,1 \}^{n}} (-1)^{i \cdot x \oplus j \cdot s(x)} \end{aligned}$$

with “\(\cdot \)” denoting the scalar product. The maximum absolute value of the \(c_{i, j}\) is the linearity of s, \(\varLambda (s)\), where \(\varLambda (s) = \max _{i>0, j>0}(| c_{i, j} |)\).

The quantity \(c_{i, j}\) has different names in the literature. It is called “bias” or “Imbalance” of the Boolean function \(x \mapsto i \cdot x \oplus j \cdot s(x)\) in, for example, [22]. In papers from the Boolean functions community, it is more often defined in terms of Walsh Spectrum, the Walsh Spectrum of a Boolean function being the multiset \(\{ c_{i, j} / 2 \}_{i \ge 0, j \ge 0}\). The maximum coefficient in the LAT of F is \(\varLambda (F) = 28\) and it occurs in absolute value 3 times.
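Definition 3 can be sketched in the same way. The Python code below is an illustrative, deliberately naive implementation; the names `dot`, `lat` and `linearity` are assumptions made for this sketch, and a practical implementation would rather use a fast Walsh-Hadamard transform.

```python
def dot(a, b):
    """Scalar product over GF(2): parity of the bitwise AND of a and b."""
    return bin(a & b).count("1") & 1


def lat(s):
    """Linear approximations table of s, given as a list of 2**n integers."""
    size = len(s)
    half = size // 2
    return [
        [sum(dot(x, i) == dot(s[x], j) for x in range(size)) - half
         for j in range(size)]
        for i in range(size)
    ]


def linearity(s):
    """Largest absolute LAT entry outside the first line and first column."""
    table = lat(s)
    size = len(s)
    return max(abs(table[i][j]) for i in range(1, size) for j in range(1, size))
```

For a permutation, the first line and the first column are zero except for \(c_{0,0} = 2^{n-1}\), and all coefficients are even. The cubic cost of this version (\(2^{24}\) scalar products for an 8-bit S-Box) is acceptable for a single table but not for large experiments.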

For the sake of completeness, we also give the sizes of the cycles in which F can be decomposed: 2, 10, 45, 68, 131.

2.2 The Linear Properties are Too Good to be True

Figure 1 contains the distribution of the value of the coefficients of the LAT (minus the first line and column) along with the theoretical proportions for a random permutation of \(GF(2^8)\) described below.

Fig. 1. Coefficients of the LAT of F, random permutations and some outputs of Improve-\(R(s)\).

The probability distribution for the coefficients \(c_{i, j}\) in the LAT of a permutation of \(\mathfrak {S}_{2^{n}}\) is described in [23]:

$$\begin{aligned} P[c_{i, j} = 2z] = \frac{{2^{n-1} \atopwithdelims ()2^{n-2} + z}^2}{{2^n \atopwithdelims ()2^{n-1}}}. \end{aligned}$$

Using Sect. 3.4 of [22], we derived that \(\varLambda (s)\) has a mean over all permutations \(s \in \mathfrak {S}_{2^{8}}\) of approximately 34.8 which is notably larger than for F since \(\varLambda (F)~=~28\).

Given the probability distribution of the coefficients of the LAT, it is easy to compute the probability that \(\varLambda (f) \le 28\) assuming that f is a permutation chosen uniformly at random and that the coefficients’ values correspond to independent samples of the same distribution. Note that there are only \((2^8-1)^2\) such trials because the first line and column are ignored here.

$$\begin{aligned} P[\varLambda (f) \le 28] = \Big ( \sum _{k=-14}^{14} P[c_{i, j} = 2k] \Big )^{(2^8-1)^2} \approx 2^{-25.62}. \end{aligned}$$

This probability is low but it would be feasible to generate a set of about \(2^{26}\) random permutations from \(\mathfrak {S}_{2^{8}}\) and compute the LAT of each of them. In such a set, the best S-Box s should verify \(\varLambda (s) = 28\). However, we must also take into account that, in order to resist linear cryptanalysis, it is desirable not only to have a low maximum value but also to have few occurrences of it. In this regard, F and its only three occurrences of 28 could almost be considered as having a maximum value of 26, for which \(P[\varLambda (f) \le 26] = 2^{-66.4}\).
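The estimate above can be re-derived numerically. The following Python sketch evaluates the distribution from [23] with exact binomial coefficients and then applies the independence assumption stated in the text; the function names are assumptions of this sketch, and the result is returned as a base-2 logarithm to avoid manipulating tiny floating-point values.

```python
from math import comb, log2


def lat_coefficient_probability(n, z):
    """P[c_{i,j} = 2z] for a uniformly random permutation of {0,1}^n."""
    k = 2 ** (n - 2) + z
    if k < 0 or k > 2 ** (n - 1):
        return 0.0
    return comb(2 ** (n - 1), k) ** 2 / comb(2 ** n, 2 ** (n - 1))


def log2_prob_linearity_at_most(n, bound):
    """log2 of P[Lambda(f) <= bound], treating coefficients as independent."""
    z_max = bound // 2
    p_single = sum(
        lat_coefficient_probability(n, z) for z in range(-z_max, z_max + 1)
    )
    trials = (2 ** n - 1) ** 2      # first line and column are ignored
    return trials * log2(p_single)
```

Calling `log2_prob_linearity_at_most(8, 28)` should give a value close to the exponent reported above, up to rounding.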

More rigorously, we compute the probability of having at most q coefficients equal to 28 in absolute value in the LAT of a permutation picked uniformly at random from \(\mathfrak {S}_{2^{8}}\). If we let \(p(2k) = P[c_{i, j} = 2k]\), then this probability is equal to \(P_{28, q}\) where

$$\begin{aligned} P_{28, q} = \sum _{j = 0}^{q} \Big [ {(2^8-1)^2 \atopwithdelims ()j} \big ( p(28) + p(-28) \big )^{j} \Big ( \sum _{k = -13}^{13} p(2k) \Big )^{(2^8-1)^2 - j} \Big ]. \end{aligned}$$

Unsurprisingly, we find that this probability is equal to \(2^{-66.4}\) for \(q=0\), i.e. the probability to have \(\varLambda (s) \le 26\). It also converges to \(2^{-25.6} = P[\varLambda (s) \le 28]\) when q increases. For \(q=3\), the case of Skipjack’s F, we find:

$$\begin{aligned} P_{28,3} = 2^{-54.4}. \end{aligned}$$
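The expression for \(P_{28,q}\) is a truncated binomial-type sum and can be evaluated directly. The sketch below is an illustration under the same independence assumption as in the text; the function names are hypothetical, and plain floating-point arithmetic is used, which is adequate for small q only.

```python
from math import comb, log2


def p(n, z):
    """P[c_{i,j} = 2z] for a uniformly random permutation of {0,1}^n."""
    k = 2 ** (n - 2) + z
    if k < 0 or k > 2 ** (n - 1):
        return 0.0
    return comb(2 ** (n - 1), k) ** 2 / comb(2 ** n, 2 ** (n - 1))


def log2_prob_at_most_q_maxima(n, top, q):
    """log2 of the probability that no |c| exceeds `top` and that at most
    q coefficients satisfy |c| = top (first line and column ignored)."""
    trials = (2 ** n - 1) ** 2
    z_top = top // 2
    at_top = p(n, z_top) + p(n, -z_top)
    below = sum(p(n, z) for z in range(-(z_top - 1), z_top))
    total = sum(
        comb(trials, j) * at_top ** j * below ** (trials - j)
        for j in range(q + 1)
    )
    return log2(total)
```

With `n = 8` and `top = 28`, the values for `q = 0` and `q = 3` can be compared with the exponents quoted in the text; increasing q makes the result approach the exponent of \(P[\varLambda (s) \le 28]\).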

The probability for a random permutation to have linear properties comparable to those of Skipjack’s F is thus at most \(2^{-54.4}\). Hence, we claim:

  • F was not chosen uniformly at random in \(\mathfrak {S}_{2^{8}}\),

  • the designers of Skipjack did not generate many random permutations and then pick the best according to some criteria, as they would have needed to generate at least about \(2^{55}\) S-Boxes,

  • the method used to build F improved its linear properties.

2.3 A Possible Design Criterion

We tried to create an algorithm capable of generating S-Boxes with linear and differential properties similar to those of F. It turns out that such an algorithm is easy to write. First, we introduce a quantity we denote \(R(f)\) and define as follows:

$$\begin{aligned} R(f) = \sum _{\ell \ge 0} N_{\ell } \cdot 2^{\ell }, \end{aligned}$$

where \(N_{\ell }\) counts coefficients with absolute value \(\ell \) in the LAT of f: \(N_{\ell } = \#\{ c_{i, j} \in (\text {LAT of f}), |c_{i, j}| = \ell \}\).

Algorithm 1 starts from a random permutation s of \(\mathfrak {S}_{2^{8}}\) and returns a new permutation \(s'\) such that \(R(s') < R(s)\) and such that \(s'\) is identical to s except for two entries x and y which are swapped: \(s'(x) = s(y)\) and \(s'(y) = s(x)\). It works by identifying one of the highest coefficients in the LAT, removing it by swapping two entries and checking whether \(R(s)\) was actually improved. This algorithm can be used in two different ways: either we keep iterating it until it reaches a point at which no swap can improve \(R(s)\) or we stop as soon as \(R(s)\) is below an arbitrary threshold.

Algorithm 1. Improve-\(R(s)\).
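To make the role of \(R\) concrete, the following Python sketch computes \(R(s)\) and performs one improvement step. It is not Algorithm 1: instead of targeting one of the highest LAT coefficients, it naively tries every swap and returns the first one which lowers \(R\). The function names are hypothetical, and ignoring the first line and column of the LAT when computing \(R\) is an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations


def dot(a, b):
    """Scalar product over GF(2)."""
    return bin(a & b).count("1") & 1


def lat(s):
    size = len(s)
    half = size // 2
    return [[sum(dot(x, i) == dot(s[x], j) for x in range(size)) - half
             for j in range(size)] for i in range(size)]


def r_value(s):
    """R(s) = sum over l of N_l * 2**l (first line and column ignored)."""
    table = lat(s)
    size = len(s)
    counts = Counter(
        abs(table[i][j]) for i in range(1, size) for j in range(1, size)
    )
    return sum(n_l * 2 ** l for l, n_l in counts.items())


def improve_r_naive(s):
    """Return a copy of s with two outputs swapped and a strictly smaller R,
    or None if no single swap improves R."""
    current = r_value(s)
    for x, y in combinations(range(len(s)), 2):
        candidate = list(s)
        candidate[x], candidate[y] = candidate[y], candidate[x]
        if r_value(candidate) < current:
            return candidate
    return None
```

Iterating `improve_r_naive` until it returns `None` corresponds to the first usage described above, and stopping once `r_value` falls below a threshold corresponds to the second. This naive search recomputes the whole LAT for every swap, so it is only practical for small S-Boxes.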

We implemented both variants. For the second one, we stop when \(R(s) < 10^{10}\) because \(R(F) \approx 10^{9.92}\). We denote \(N_{\ell }\) the average number of coefficients with absolute value \(\ell \) in the LAT or the DDT of the S-Boxes obtained. For the LAT, \(\log _{2}(N_{\ell })\) is given in Table 2 and in Fig. 1; for the DDT it is in Table 3. “Random” corresponds to the average over 200 S-Boxes picked uniformly at random in \(\mathfrak {S}_{2^{8}}\); “F” to the distribution for the S-Box of Skipjack; “F-like” to the average over 100 S-Boxes obtained using Improve-\(R()\) and stopping when \(R(s) < 10^{10}\); “best” to the average over 100 S-Boxes obtained using Improve-\(R()\) and stopping only when it fails.

Table 2. Distribution of \(\log _{2}(N_{\ell })\) in the LAT of different S-Boxes.
Table 3. Distribution of \(\log _{2}(N_{\ell })\) in the DDT of different S-Boxes.

Using Improve-\(R()\) with an appropriate threshold allows us to create S-Boxes with both linear and differential properties very close to F. However, in order to achieve this, we need to choose a threshold value computed from F and which does not correspond to anything specific. In fact, to the best of our knowledge, the quantity \(R(s)\) does not have any particular importance unlike for instance the linearity \(\varLambda (s)\). Still, replacing \(R(s)\) by the linearity \(\varLambda (s)\) or a pair \((\varLambda (s), \#\{ (i,j), c_{i, j} = \varLambda (s) \})\) yields S-Boxes which are very different from F. Such S-Boxes indeed have a value of \(N_{\varLambda (s)-2}\) much higher than in the random case, which is not the case for F.

While our definition of \(R(s)\) may seem arbitrary, it is the only one we could find that leads to linear properties similar to those of F. For instance, it may have been tempting to base \(R(s)\) on the square of \(\ell \), which is used when computing the correlation potential of a linear trail, a quantity useful when looking for linear attacks. We would thus define \(R(s) = \sum _{\ell \ge 0} N_{\ell } \ell ^{2}\). However, this quantity is worthless as an optimization criterion since it is constant: Parseval’s equality on the Walsh spectrum of a Boolean function imposes that the sum of the \((c_{i, j})^{2}\) over each column is equal to \(2^{2n-2}\).

To conclude: we have found new non-random properties of the S-Box of Skipjack which improve its strength against linear cryptanalysis, and we developed an algorithm which could be used to generate such S-Boxes.
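The Parseval argument is easy to check experimentally. The short Python sketch below (with hypothetical function names) sums the squared LAT coefficients of each column; with the normalisation of Definition 3, every column should sum to \(2^{2n-2}\) for any function s, whether or not it is a permutation.

```python
def dot(a, b):
    """Scalar product over GF(2)."""
    return bin(a & b).count("1") & 1


def lat(s):
    size = len(s)
    half = size // 2
    return [[sum(dot(x, i) == dot(s[x], j) for x in range(size)) - half
             for j in range(size)] for i in range(size)]


def column_square_sums(s):
    """For each column j, the sum over all lines i of c_{i,j} squared."""
    table = lat(s)
    size = len(s)
    return [sum(table[i][j] ** 2 for i in range(size)) for j in range(size)]
```

For a 4-bit S-Box every entry of the returned list equals \(2^{6} = 64\), so a criterion built from \(\sum _{\ell } N_{\ell } \ell ^{2}\) cannot discriminate between candidates.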

2.4 Public Information About the Design of Skipjack

The only information indirectly published by the NSA on Skipjack corresponds to an “Interim Report” [24] written by external cryptographers and it contains no information on the specifics of the design. The most relevant parts of this report as far as the S-Box is concerned are the following ones.

SKIPJACK was designed to be evaluatable [...]. In summary, SKIPJACK is based on some of NSA’s best technology. Considerable care went into its design and evaluation in accordance with the care given to algorithms that protect classified data.

Furthermore, after the “leakage” of an alleged version of Skipjack to Usenet, Schneier replied with a detailed analysis of the cipher [26] which contained in particular the following quote indicating that the S-Box was changed in August 1992.

The only other thing I found [through documents released under FOIA] was a SECRET memo. [...] The date is 25 August 1992. [...] [P]aragraph 1 reads:

  1. (U) The enclosed Informal Technical Report revises the F-table in SKIPJACK

  2. No other aspect of the algorithm is changed.

Note also that the first linear cryptanalysis of DES [4] had not yet been published in August 1992 when the F-Table was changed. Gilbert et al. suggested at CRYPTO’90 [27] using linear equations to help with key guessing in a differential attack on FEAL. This block cipher was later attacked at CRYPTO’91 [28] and EUROCRYPT’92 [29] directly using linear equations involving plaintext, ciphertext and key bits. We can but speculate about a connection between these papers and the change of the S-Box of Skipjack.

3 Algorithm Decomposing Particular Structures

A powerful tool able to discard quickly some possible structures for an S-Box is its signature, as shown in Lemma 1.

Definition 4

(Permutation Signature). A permutation s of \(\{0,1\}^n\) has an odd signature if and only if it can be decomposed into an odd number of transpositions, a transposition being a function permuting two elements of \(\{0,1\}^n\). Otherwise, its signature is even.

The signature of \(f \circ g\) is even if and only if f and g have the same signature.
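The signature can be read off the cycle decomposition, since a cycle of length k is a product of \(k-1\) transpositions. The following Python sketch uses hypothetical function names and is given purely as an illustration of Definition 4.

```python
def cycle_lengths(s):
    """Lengths of the cycles of the permutation s (a list of integers)."""
    seen = [False] * len(s)
    lengths = []
    for start in range(len(s)):
        if seen[start]:
            continue
        length = 0
        x = start
        while not seen[x]:
            seen[x] = True
            x = s[x]
            length += 1
        lengths.append(length)
    return lengths


def signature_is_odd_from_cycles(lengths):
    """A cycle of length k decomposes into k - 1 transpositions."""
    return sum(length - 1 for length in lengths) % 2 == 1


def signature_is_odd(s):
    return signature_is_odd_from_cycles(cycle_lengths(s))
```

Applied to the cycle sizes of F listed in Sect. 2.1 (2, 10, 45, 68, 131), the number of transpositions is \(1 + 9 + 44 + 67 + 130 = 251\), so the signature of F is odd.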

Lemma 1

The following \(b \times b\) permutations always have an even signature:

  • Feistel Networks using XOR to combine the output of the Feistel function with the other branch,

  • Substitution-Permutation Networks for which the diffusion layer is linear in \(GF(2)^b\) or can be decomposed into a sequence of permutations ignoring a fraction of the internal state.

Proof

Let b be the block size of the block ciphers considered. The proof for the case of Feistel Networks with XOR can be found in [30].

Let us look at substitution permutation networks. An S-Box layer consists of the parallel application of several invertible S-Boxes operating on n bits, with n dividing b. This operation can be seen as the successive application of the S-Box on each n-bit block, one after another. Such an operation ignores \(b-n\) bits, meaning that its cycle decomposition consists of \(2^{b-n}\) replicas of the same set of cycles. Since \(2^{b-n}\) is even, the application of each S-Box is even, which in turn implies that the successive application of the S-Box on each block is even. More generally, any permutation which can be decomposed into a sequence of sub-permutations ignoring a fraction of the internal state is even. The fact that permutations linear in \(GF(2)^b\) are even is shown in the proof of Lemma 2 in [31].    \(\square \)

The restriction put on the diffusion layer of SPNs is usually not important, e.g. the diffusion layer of the AES fits the requirement. However, for small block sizes, it must be taken into account.
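The Feistel part of Lemma 1 can be checked experimentally on small instances. The Python sketch below builds the codebook of an XOR-based Feistel Network from arbitrary round functions and computes its signature; the round convention \((L, R) \mapsto (R, L \oplus f(R))\) and all names are assumptions made for this illustration.

```python
def feistel_xor(round_functions, n):
    """Codebook of a Feistel Network on 2n bits.

    Each round function is a list of 2**n integers (it need not be a
    permutation) and each round maps (L, R) to (R, L xor f(R)).
    """
    mask = (1 << n) - 1
    table = []
    for block in range(1 << (2 * n)):
        left, right = block >> n, block & mask
        for f in round_functions:
            left, right = right, left ^ f[right]
        table.append((left << n) | right)
    return table


def signature_is_odd(s):
    """Parity of the number of transpositions, via the cycle lengths."""
    seen = [False] * len(s)
    transpositions = 0
    for start in range(len(s)):
        x = start
        length = 0
        while not seen[x]:
            seen[x] = True
            x = s[x]
            length += 1
        transpositions += max(length - 1, 0)
    return transpositions % 2 == 1
```

For 4-bit branches, `signature_is_odd(feistel_xor(fs, 4))` should be `False` whatever the round functions `fs` are, in line with the lemma.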

So far, we have shown that F was deliberately constructed rather than picked out of a set of random S-Boxes according to some criteria. The signature of F is odd, so Lemma 1 implies that F cannot be a Feistel Network with XOR. The generic attack on the SASAS structure [16] fails on F, meaning that it is not a simple SPN either. Finally, F is not affine equivalent to a monomial of \(GF(2^n)\) like, for instance, the S-Box of the AES. Indeed, such functions have the same coefficients in the lines of their DDT, only the order is different. This observation led to the definition of the differential spectrum by Blondeau et al. [32]. It also implies that, for a monomial, the number of coefficients equal to d in its DDT must be a multiple of \(2^n-1\). As this is not the case for F, we can also rule out this structure.

However, this is not sufficient to conclude that F does not have a particular structure. It could be based on simple operations such as rotations, addition modulo \(2^{n}\) and multiplication available in a typical microprocessor (thus offering the designer the benefit of a memory-efficient implementation) or on a Feistel Network which uses modular addition to combine the output of the Feistel function with the other branch. We study these two possibilities in this section by first describing an algorithm capable of decomposing S-Boxes built from multiple simple arithmetic operations and then by presenting a new attack recovering all Feistel functions of a small Feistel Network of up to 5 rounds regardless of whether XOR or modular addition is used.

The purpose of the algorithms we present in this section can be linked to the more general Functional Decomposition Problem (FDP) tackled notably over two rounds in [33]. In this paper, Faugère et al. introduce a general algorithm capable of decomposing \(h = (h_1,...,h_u)\) into \(\big (f_1(g_1,...,g_n), ..., f_u(g_1,...,g_n) \big )\) where the \(h_i\)’s, \(f_i\)’s and \(g_i\)’s are polynomials of n variables. The time complexity of this algorithm (see Theorem 3 of [33]) is lower bounded by \(\text {O}\big (n^{3 \cdot (d_f d_g - 1)}\big )\) where \(d_f\) (respectively \(d_g\)) is the maximum algebraic degree of the \(f_i\)’s (respectively the \(g_i\)’s). Note that this lower bound on the time complexity is not tight. In fact, the ratio n / u of the number of input variables over the number of coordinates of h is also of importance, the lower being the better.

3.1 Iterated Simple Arithmetic Permutation

A plausible assumption for an efficient yet compact S-Box design is that the S-Box is constructed using a formula containing basic instructions available in the microprocessor. Indeed, a simple code:

figure b

generates an S-Box which may have a differential uniformity better than that of Skipjack’s F for a proper choice of the constants a, b, c, d and e.

We introduce BreakArithmetic(s), an optimized tree-search capable of recovering the simple operations used to create an S-Box constructed as an arbitrary sequence of basic processor instructions. It is based on the following observation. Suppose that \(s = \phi _{r} \circ ... \circ \phi _{1}\), where each \(\phi _{i}\) is one of the following algebraic operations: constant XOR, constant addition modulo \(2^{n}\), multiplication by a constant modulo \(2^{n}\) and bit rotation by a constant. Then \(s \circ \phi _{1}^{-1} = \big ( \phi _{r} \circ ... \circ \phi _{1} \big ) \circ \phi _{1}^{-1} = \phi _{r} \circ ... \circ \phi _{2}\), meaning that \(s \circ \phi _{1}^{-1}\) is “less complex”, or “closer to the identity”, than s itself. The aim of this algorithm is to peel off the \(\phi _{i}\)’s one after another by performing a tree-search among all possible simple operations, considering first the operations which bring us closest to the identity.

In order for this to work, we need to capture the concept of “distance to the identity” using an actual metric which can be implemented efficiently. We chose to base this metric on the DDT since it is less expensive to compute than the LAT. We define the following metric: \(M(s) = \sum _{\ell \ge 2}N_{\ell } (\ell - 2)^{2}\). Our tree-search favours candidates \(\phi _{1}\) such that \(M(s \circ \phi _{1}^{-1})\) is closer to \(M(\text {Id})\), where \(\text {Id}\) is the identity function.

Our implementation of this algorithm is for example capable of recovering the decomposition of \(s : x \mapsto \psi \big ( \psi \big ( \psi (x) \big ) \big )\) with \(\psi : x \mapsto 0xa7 \cdot \big ( (3 \cdot x \oplus 0x53) >>> 4 \big ) \oplus 0x8b\). However, our algorithm could not find any such decomposition for Skipjack’s F despite running for 96 hours on a CUDA computer with more than 1000 cores for fast computation of the DDT.
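One level of such a search can be sketched as follows. This Python code is an illustration only and not the implementation discussed above: the function names are hypothetical, the metric is computed over the whole DDT (an assumption), candidates are merely ranked rather than explored recursively, and the reading of the formula for \(\psi \) (operator precedence, “>>>” as a rotation on 8 bits) is also an assumption.

```python
def ddt_counts(s):
    """Map each value l to N_l, the number of DDT entries equal to l."""
    size = len(s)
    counts = {}
    for i in range(size):
        row = [0] * size
        for x in range(size):
            row[s[x ^ i] ^ s[x]] += 1
        for value in row:
            counts[value] = counts.get(value, 0) + 1
    return counts


def metric(s):
    """M(s) = sum over l >= 2 of N_l * (l - 2)**2."""
    return sum(n_l * (l - 2) ** 2 for l, n_l in ddt_counts(s).items() if l >= 2)


def rotl(x, r, n):
    """Rotate the n-bit integer x to the left by r bits."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def candidate_operations(n):
    """Yield (name, table) for every simple bijective operation on n bits."""
    size = 1 << n
    for c in range(1, size):
        yield f"xor {c:#x}", [x ^ c for x in range(size)]
        yield f"add {c:#x}", [(x + c) % size for x in range(size)]
    for c in range(3, size, 2):          # odd multipliers are invertible
        yield f"mul {c:#x}", [(x * c) % size for x in range(size)]
    for r in range(1, n):
        yield f"rotl {r}", [rotl(x, r, n) for x in range(size)]


def peel_candidates(s, n):
    """Rank operations phi by |M(s o phi^-1) - M(Id)|, most promising first."""
    size = 1 << n
    target = metric(list(range(size)))
    ranked = []
    for name, phi in candidate_operations(n):
        inverse = [0] * size
        for x, y in enumerate(phi):
            inverse[y] = x
        peeled = [s[inverse[x]] for x in range(size)]
        ranked.append((abs(metric(peeled) - target), name, peeled))
    ranked.sort(key=lambda item: item[0])
    return ranked


def psi(x):
    """The 8-bit example from the text, under the reading stated above."""
    t = ((3 * x) & 0xFF) ^ 0x53
    t = ((t >> 4) | (t << 4)) & 0xFF     # rotation by 4 bits
    return ((0xA7 * t) & 0xFF) ^ 0x8B
```

A full search would recurse on the best-ranked `peeled` tables until the identity is reached. In pure Python, ranking all candidates for an 8-bit S-Box requires several hundred DDT computations, which is why a fast DDT routine matters in practice.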

3.2 Decomposing Feistel Structures

Another possible structure for F which is compatible with its having an odd signature is a Feistel Network where the XOR is replaced by a modular addition. In this section, we describe an algorithm which uses a SAT-solver to recover the Feistel functions of small Feistel Networks which use either XOR or modular addition. We describe below the key idea of this attack, namely the encoding of the truth table of each Feistel function using Boolean variables and then how we can use this encoding to actually decompose a small Feistel Network.

Methods to distinguish Feistel Networks from random permutations have been actively investigated, notably in the work by Luby and Rackoff [34] as well as by Patarin [35, 36]. Here, we present a method which goes beyond distinguishing: it actually recovers all the Feistel functions for up to 5 rounds of Feistel Networks with low branch width.

Encoding of the Feistel Function. Let \(f : \{ 0,1 \}^{n} \rightarrow \{ 0,1 \}^{n}\) be an unknown function. We associate to each of its output bits i on each possible input x a unique variable \(z_{i}^{x}\). The truth-table of f is thus as shown in Table 4 for \(n=3\). We encode the fact that a vector of Boolean variables \(y_{i}, i \in [0, n-1]\) is the output of f given input variables \(x_{i}, i \in [0,n-1]\) using the truth-table of f by building a CNF involving \(\{ x_{i} \}_{i < n}, \{ y_{i} \}_{i < n}\) and \(\{ z_{i}^{x} \}_{i < n, ~x < 2^{n}}\) which is true if and only if \((y_{n-1},...,y_{0}) = f(x_{n-1}, ..., x_{0})\).

Table 4. The variables used to encode an unknown function \(f : \{ 0,1 \}^{3} \rightarrow \{ 0,1 \}^{3}\), where (\(y_{2},y_{1},y_{0}) = f(x_{2},x_{1},x_{0})\).

We denote \(\text {bit}_{i}(b)\) the i-th bit of the binary expansion of any integer \(b < 2^{n}\) in little-endian notation so that \(b = \sum _{i < n} \text {bit}_{i}(b)2^{i}\). We also denote \(a^{1}\) the variable a itself and \(a^{0}\) its negation. The procedure used to build this CNF is based on the following implication: if \(\{ x_{i} \}_{i<n}\) corresponds to the binary expansion of an integer \(x < 2^{n}\) and \(\{ y_{i} \}_{i<n}\) to the binary expansion of the integer \(y = f(x)\), then \(y_{i} \oplus z_{i}^{x} = 0\) for all \(i < n\). Using the notations we just introduced, this idea can be written as n implications, the conjunction of which for \(j < n\) must hold:

$$\begin{aligned} \big ( \bigwedge _{i<n} x_{i}^{\text {bit}_{i}(x)} \big ) \implies \big ( y_{j} \oplus z_{j}^{x} = 0\big ). \end{aligned}$$

Each of these can be turned into a CNF made of two clauses using that \((a \implies b) \equiv (a^{0} \vee b^{1})\), that \((a \oplus b = 0) \equiv \big ( (a^{1} \vee b^{0}) \wedge (a^{0} \vee b^{1})\big )\) and basic Boolean algebra as follows:

$$\begin{aligned} \Big ( \big ( \bigvee _{i<n} x_{i}^{1-\text {bit}_{i}(x)} \big ) ~\vee ~ y_{j}^{1} \vee (z_{j}^{x})^{0} \Big ) \wedge \Big ( \big ( \bigvee _{i<n} x_{i}^{1-\text {bit}_{i}(x)} \big ) ~\vee ~ y_{j}^{0} \vee (z_{j}^{x})^{1} \Big ). \end{aligned}$$

If we concatenate the CNF generated in this way for all values of \(x < 2^{n}\), we obtain a CNF which we denote “\(\text {CNF}\big (f, \{ x_{i} \} , \{y_{i}\} \big )\)” with \(2n 2^{n}\) clauses involving \(n 2^{n} + 2 n\) variables. It holds if and only if the assignment of the variables \(\{ x_{i} \}_{i<n}\) and \(\{ y_{i} \}_{i<n}\) is such that \((y_{n-1},...,y_{0}) = f(x_{n-1},...,x_{0})\).
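The construction of \(\text {CNF}\big (f, \{ x_{i} \} , \{y_{i}\} \big )\) can be sketched in a few lines of Python. Clauses are represented as lists of signed integers in the DIMACS convention (a negative integer denotes a negated variable); the function names and the variable numbering are assumptions of this sketch, and `satisfied` is only a brute-force checker, not a SAT-solver.

```python
def cnf_of_function(n, x_vars, y_vars, z_vars):
    """Clauses forcing (y_{n-1}, ..., y_0) = f(x_{n-1}, ..., x_0).

    x_vars[i] and y_vars[i] are the variables of bit i, and z_vars[x][j]
    is the variable holding bit j of f(x) in the unknown truth table.
    """
    clauses = []
    for x in range(1 << n):
        # All guard literals are false exactly when the x variables
        # spell out the binary expansion of the integer x.
        guard = [-x_vars[i] if (x >> i) & 1 else x_vars[i] for i in range(n)]
        for j in range(n):
            clauses.append(guard + [y_vars[j], -z_vars[x][j]])
            clauses.append(guard + [-y_vars[j], z_vars[x][j]])
    return clauses


def satisfied(clauses, assignment):
    """Evaluate a CNF under a dict mapping each variable to a Boolean."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )
```

The number of clauses returned is \(2n 2^{n}\), as stated above. Once the \(z\) variables are fixed to the truth table of a concrete function, the formula holds exactly for the pairs \((x, y)\) with \(y = f(x)\).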

Generating the Full CNF and Solving. Using \(\text {CNF}\big (f, \{x_{i}\} , \{y_{i}\} \big )\), a SAT-solver and the full codebook of an S-Box \(S : \{ 0,1 \}^{2n} \rightarrow \{ 0,1 \}^{2n}\), we can recover the Feistel functions used to generate S if it was indeed generated using a Feistel network or prove that it was not constructed in this fashion using DecomposeFeistel \((S, R, \text {operation})\) (see Algorithm 2). To describe it, we introduce variables \(\{ x_{i}^{r} \}_{i<2n}, \{ y_{i}^{r} \}_{i<n}\) and, if the combining function is a modular addition instead of a XOR, \(\{ c_{i}^{r} \}_{i<n}\) for \(r < R\), where R is the assumed number of rounds. These are summarized in Fig. 2.

Fig. 2. The variables used to encode round r of a Feistel Network operating on blocks of 2n bits.

The general idea consists in building the CNF representation of the fact that \(S(p)=c\) for each input/output pair (p, c) separately, concatenating these CNFs and then having a SAT-solver solve the CNF obtained in this fashion. Each Feistel function is associated with a unique set of \(n 2^{n}\) variables as described in the previous section. These are used when encoding that half of the internal state at round \(r+1\) of the Feistel Network goes through the corresponding Feistel function. The only difficulty left is the combination of the left branch with the output of the Feistel function. In the case where a XOR is used, we can simply encode that \(x_{i}^{r+1} = y_{i}^{r} \oplus x_{i+n}^{r}\) separately for each bit i. However, in the case of a modular addition, we need to introduce a new set of variables for each evaluation of the addition corresponding to the carry bits: \(\{ c_{i}^{r} \}_{i<n}\). The addition is then encoded into a CNF using the CNF encoding of the following equations:

figure c
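As an illustration of how carry bits allow a modular addition to be expressed bit by bit, the Python sketch below uses the standard ripple-carry relations, namely a sum bit equal to \(a_{i} \oplus b_{i} \oplus c_{i}\), an initial carry \(c_{0} = 0\) and a next carry equal to the majority of \(a_{i}\), \(b_{i}\) and \(c_{i}\). These relations and the function name are assumptions of this sketch and may differ in form from the equations used in the encoding above.

```python
def add_with_carries(a, b, n):
    """Addition modulo 2**n computed bit by bit with explicit carry bits."""
    carry = 0                      # c_0 = 0
    result = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        result |= (a_i ^ b_i ^ carry) << i
        # Next carry: majority of the two operand bits and the current carry.
        carry = (a_i & b_i) | (a_i & carry) | (b_i & carry)
    return result                  # the final carry is dropped (mod 2**n)
```

Each of these bitwise relations involves at most four Boolean variables, so it converts into a small, fixed set of clauses per bit position.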

A useful heuristic when trying to decompose more than 4 rounds is to look for decompositions with particular patterns in the sequence of the Feistel functions. For instance, decomposing a 5-rounds Feistel Network with round functions \((S_{a},S_{b},S_{c},S_{d},S_{a})\) is easier than decomposing a similar structure with round functions \((S_{a},S_{b},S_{c},S_{d},S_{e})\) if this knowledge is hard-coded in the CNF by using the same sets of variables to encode both \(S_{e}\) and \(S_{a}\). In this case, DecomposeFeistel \((S,R,\text {operation})\) also takes the assumed sequence of the S-Boxes as an additional input.

Another improvement comes from the observation that constants can be XOR-ed (or added/subtracted) in the input of Feistel functions in the first \(R-2\) rounds — provided they are cancelled by XOR-ing (or adding/substracting) in the later rounds — without changing the output of the function. Using this, we can arbitrarily decide that the first Feistel functions all map, say, 0 to 0. This simplification of the CNF helps the SAT-solver a lot and is actually necessary to attack 5 independent rounds.

We implemented Algorithm 2 and used the SAT-solver Minisat [37] to solve the CNF formula generated. The time taken to decompose S-Boxes actually made of small Feistel Networks is smaller than the time taken to discard an S-Box which is not based on such a structure. Decomposing \(8 \times 8\) S-Boxes built using 4-rounds Feistel Networks, regardless of whether \(\oplus \) or \(\boxplus \) is used, takes less than a second on a regular desktop PCFootnote 5 and discarding S-Boxes built in other ways requires about 5 seconds. Decomposing 5-rounds requires a bit less than a minute but discarding this structure takes longer, for instance 3 min to prove that F is not a 5-rounds \(\oplus \)-Feistel and 23 min to show that is it not a 5-rounds \(\boxplus \)-Feistel. It is also possible to attack larger instances provided enough RAM is available. A 4-rounds Feistel Network corresponding to a \(14 \times 14\) S-Box can be broken in about 2 hours using up to about 38 Go of RAMFootnote 6.

The CNF formulas equivalent to F being a Feistel Network with 3, 4 or 5 rounds, using either \(\oplus \) or \(\boxplus \), are all unsatisfiable, meaning that F is not a Feistel Network with at most 5 rounds.

For the sake of completeness, we mention the existence of another time efficient attack on 5-round Feistel Networks by Gaëtan Leurent based on a boomerang-like property [38]. Indeed one of the open problems is how far cryptanalytic techniques can go in analysis of ciphers with small block, where the full code-book is available to the attacker.

4 From an S-Box to a Picture and Back Again

In order to distinguish an S-Box from a random one we propose a new method which we call Pollock’s Pattern Recognition (Footnote 7). It is based on turning the DDT and the LAT of the S-Box into a picture and then using the natural pattern-finding power of the human eye to identify non-random properties. We also describe a method to perform (partially) the inverse operation: Seurat’s Steganography (Footnote 8). It creates an S-Box such that an image is embedded in the picture representation of its DDT.

4.1 Pollock’s Pattern Recognition

As is clear from Sect. 2, the distribution of the coefficients in the LAT of an S-Box provides a powerful tool to distinguish a random-looking S-Box from a permutation chosen uniformly at random from the set of all permutations. We suggest here another method for looking at these coefficients which can also be applied to the DDT. The idea is to look at the whole table at once, be it a DDT or LAT, and then rely on the pattern matching capabilities of the human eye and brain to possibly rule out that the S-Box was chosen uniformly at random. In order to look at the whole table, we associate different colors with the values of the coefficients. Exactly which color scale to use is a question which can only be answered by trying different ones. As an illustration of the power of this method, we provide in Appendix B pictures allowing us to discard the randomness of 4 S-Boxes using merely a quick glance.

  • Zorro. The S-Box of this cipher [9] is based on a 4-rounds Feistel Network with a complex diffusion layer. As a consequence, the algorithm presented in Sect. 3.2 fails on it. The picture representation of its LAT, given in Fig. 4a, contains “stripes”. These correspond to coefficients equal to 6 (orange) and 2 (green). These never appear for half of the input masks according to a repeating pattern. Such a behaviour is not expected from a random permutation. The color scheme was chosen so as to highlight this property. We note that the congruence modulo \(2^k\) for some k of the coefficients of the LAT is related to the algebraic degree of \(i \cdot x \oplus j \cdot S(x)\) as explained for example in [39] (Proposition 6.1).

  • CLEFIA. This block cipher [40] uses two distinct S-Boxes. The one denoted \(S_{0}\) has a particular structure based on smaller 4\(\,\times \,\)4 S-Boxes. The LAT of this S-Box is given in Fig. 4b: note the “dents” on the top and left side of the picture as well as the low number of colors compared to Fig. 4c which also depicts a LAT and uses the same color-scale. This low number of colors is a consequence of the fact that no coefficient in the LAT is congruent to 2 modulo 4 which in turn is related to this S-Box having an algebraic degree equal to 6 on all of its coordinates. Neither this nor the “dents” are expected from a random permutation.

  • SAFER+. This block cipher [41] uses an S-Box based on exponentiation in \(\mathbb {Z}/256\mathbb {Z}\). Its LAT is given in Fig. 4c; note in particular the vertical lines which appear in this representation.

  • Arithmetic. The DDT can also be used in the same fashion. For example, we can look at the DDT of an S-Box generated using a simple algebraic expression similar to those discussed in Sect. 3.1, namely \(s : x \mapsto \psi \big ( \psi (x) \big )\) with \(\psi : x \mapsto 3 \cdot \big ( (3 \cdot x \oplus 0x53) >>> 4 \big ) \oplus 0x8b\). The representation of its DDT is in Fig. 4d. Note the white rectangles corresponding to subsets of impossible differentials and the loose similarity between the top left and bottom right quadrants on one hand and the top right and bottom left quadrants on the other hand. None of these characteristics are expected from the DDT of a random permutation. Note that with 3 iterations of \(\psi \) this S-Box becomes reasonably good.

However, we were not able to spot any particular pattern in the Pollock representation of either the DDT or the LAT of Skipjack’s F. Such representations are given respectively in Figs. 3a and 3b in Appendix B. We used the function matrix_plot from the SAGE [42] software package to draw the Pollock representations.
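A Pollock representation can be reproduced along the following lines. This Python sketch is only an approximation of our setup: it swaps SAGE's matrix_plot for matplotlib's `imshow`, and the helper names (`rotl8`, `psi`, `ddt`, `lat`, `pollock`) are hypothetical. For the arithmetic S-Box it assumes that products are taken modulo 256, that \(3 \cdot x \oplus 0x53\) means \((3 \cdot x) \oplus 0x53\) and that \(>>>\) is an 8-bit rotation. The LAT convention assumed here is the number of x such that \(a \cdot x = b \cdot S(x)\), minus half the table size.

```python
def rotl8(x, r):
    """Rotate an 8-bit value by r positions (for r = 4 the direction does not matter)."""
    return ((x << r) | (x >> (8 - r))) & 0xFF

def psi(x):
    """Assumed reading of psi: 3 * (((3 * x) xor 0x53) >>> 4) xor 0x8b, modulo 256."""
    return (3 * rotl8((3 * x & 0xFF) ^ 0x53, 4) ^ 0x8B) & 0xFF

SBOX = [psi(psi(x)) for x in range(256)]

def ddt(sbox):
    """table[a][b] = number of x such that S(x ^ a) ^ S(x) = b (square S-Box assumed)."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for a in range(n):
        for x in range(n):
            table[a][sbox[x ^ a] ^ sbox[x]] += 1
    return table

def lat(sbox):
    """table[a][b] = #{x : a.x = b.S(x)} - n/2; cubic time, slow for 8-bit tables."""
    n = len(sbox)
    parity = [bin(v).count("1") & 1 for v in range(n)]
    return [[sum(parity[a & x] == parity[b & sbox[x]] for x in range(n)) - n // 2
             for b in range(n)] for a in range(n)]

def pollock(table, cmap="nipy_spectral"):
    """Draw the table as a picture; the color scale is a free parameter to experiment with."""
    import matplotlib.pyplot as plt  # only needed for drawing
    plt.imshow(table, cmap=cmap, interpolation="nearest")
    plt.colorbar()
    plt.show()
```

Calling `pollock(ddt(SBOX))` draws the DDT of the arithmetic S-Box under the assumptions above; different values of `cmap` correspond to the different color scales that have to be tried.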

4.2 Seurat’s Steganography

In this section, we present an algorithm allowing the creation of a non-bijective S-Box such that the picture representation of its DDT contains a particular image. Since we draw this image dot after dot like in pointillism and since it hides said image, we call the method we present below Seurat’s Steganography. The pictures we embed are black and white, the white parts corresponding to places where differentials are impossible and black parts to places where the differentials have non-zero probability.

The Algorithm. We define white and black equations as those giving the corresponding pixel color in the Pollock representation of the DDT of an S-Box.

  • White Equations. \(W_{a,b} : \forall x \in \{ 0,1 \}^m, ~ S(x+a) + S(x) \ne b\).

  • Black Equations. \(B_{a,b} : \exists x \in \{ 0,1 \}^m, ~ S(x+a) + S(x) = b\).

The inputs considered in this Section are:

  • B The complete list of the black equations.

  • \({T}_{w}\) A table of booleans of size \(u \times v\) (the dimensions of the image) where \(T_w[a,b]\) is false if and only if the pixel at \((a,b)\) cannot be white.

  • S A partially unspecified S-Box such that all equations \(B_j\) for \(j < i\) hold and such that none of the \(W_j\) has a solution for any j.

  • i The index of the equation in B for which we need to find a solution.

We first need a sub-routine checking if adding an entry \(S(x)=y\) to a partially assigned S-Box, i.e. an S-Box for which some of the outputs are unspecified, leads to at least one of the white equations not holding anymore. It is described in Algorithm 3.

Algorithm 3. \(\text {CheckW}(S,x,y,T_w)\): checks whether adding \(S(x)=y\) breaks a white equation.

We now describe Seurat’s Steganography, namely Algorithm 4, which uses two lists of equations to iteratively build an S-Box such that a particular picture appears in its DDT. It works by first making a list L of all the ways entries could be added to the S-Box in order to satisfy the black equation \(B_i\). If none are found, the function fails. The function is finally called recursively on the candidates found to look for a solution for the next equation. If no solution is found for the next equation, the function fails.

Algorithm 4. Seurat’s Steganography.

Some optimizations are possible. First of all, it is not necessary to write this algorithm using recursion. It is also not necessary to let L be as large as possible. In fact \(| L | \le 2\) is sufficient, although \(| L | = 1\) does not work unless the picture is very simple. It is also possible to allow some noise by tweaking \(\text {CheckW}(S,x,y,T_w)\) to return true with low probability for pairs \((x,y)\) even if they blacken a white pixel.
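The two routines can be sketched in Python as follows. This is a simplified reading of Algorithms 3 and 4 under stated assumptions, not a transcription of them: differences are XORs, `white[a][b]` is a full-size boolean table that is true where the pixel must stay white, every black equation has \(a \ne 0\) and is not marked white, unspecified entries are `None`, and the names `check_w`, `holds`, `extensions`, `seurat` and `fill` are hypothetical. The final greedy `fill` step, which completes the remaining entries, is an addition made for the illustration.

```python
import random

def check_w(S, x, y, white):
    """True if setting S[x] = y blackens no white pixel."""
    return not any(y2 is not None and x2 != x and white[x ^ x2][y ^ y2]
                   for x2, y2 in enumerate(S))

def holds(S, a, b):
    """True if the black equation B_{a,b} already has a solution in the partial S."""
    return any(S[x] is not None and S[x ^ a] is not None and S[x] ^ S[x ^ a] == b
               for x in range(len(S)))

def extensions(S, a, b, white, n_values, limit, rng):
    """Up to `limit` extended copies of S in which B_{a,b} holds (the list L)."""
    found = []
    xs = list(range(len(S)))
    rng.shuffle(xs)
    for x in xs:
        if S[x] is not None:
            continue                      # the symmetric case is reached through x ^ a
        partner = S[x ^ a]
        ys = [partner ^ b] if partner is not None else rng.sample(range(n_values), n_values)
        for y in ys:
            T = list(S)
            if not check_w(T, x, y, white):
                continue
            T[x] = y
            if partner is None:
                if not check_w(T, x ^ a, y ^ b, white):
                    continue
                T[x ^ a] = y ^ b
            found.append(T)
            if len(found) == limit:
                return found
    return found

def seurat(S, black, white, n_values, i=0, limit=2, rng=None):
    """Satisfy black[i], black[i+1], ... recursively; None signals failure."""
    rng = rng or random.Random(0)
    if i == len(black):
        return S
    a, b = black[i]
    options = [S] if holds(S, a, b) else extensions(S, a, b, white, n_values, limit, rng)
    for T in options:
        result = seurat(T, black, white, n_values, i + 1, limit, rng)
        if result is not None:
            return result
    return None

def fill(S, white, n_values):
    """Greedily assign the unspecified entries without blackening a white pixel."""
    S = list(S)
    for x in range(len(S)):
        if S[x] is None:
            ys = [y for y in range(n_values) if check_w(S, x, y, white)]
            if not ys:
                return None
            S[x] = ys[0]
    return S
```

The parameter `limit` plays the role of the bound on \(| L |\). Noise could be allowed by letting `check_w` return true with a small probability even when a white pixel is hit.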

Two outputs of this algorithm are presented in Appendix C: the S-Boxes are given along with the Pollock representation of their DDT, which clearly shows the pictures we chose to embed in them. The differential and linear properties of the S-Box described in Table 6 are close to what would be expected from a random function (differential uniformity of 14, linearity of 39), meaning that it could be used in a context where an \(8 \times 8\) random function would be sufficient.

Counting Possible S-Boxes. Let S be a random function from \(\{ 0,1 \}^m\) to \(\{ 0,1 \}^n\). Then \(W_{a,b}\) holds if and only if \(d_{a,b} = 0\), which happens with probability \(P[d_{a,b} = 0] = \exp \big ( - 2^{m-n-1} \big )\) because the coefficients in the DDT of a random function follow approximately a Poisson distribution with parameter \(2^{m-n-1}\), that is 1/2 when \(m=n\) (see [22]). Hence, if we have b black equations, w white ones and if we consider that their having solutions are independent events, then the probability that an S-Box has the correct image at the center of its DDT is \(P_{\text {success}} = \big ( \exp (-2^{m-n-1}) \big )^{w} \times \big ( 1-\exp (-2^{m-n-1}) \big )^b\). In the case where \(m=n\), we use that \(\log _{2}(\exp (-1/2)) \approx -0.72\) and that \(\log _{2}(1-\exp (-1/2)) \approx -1.35\) to approximate this probability by

$$\begin{aligned} P_{\text {success}} = 2^{-(0.72 \cdot w + 1.35 \cdot b)}. \end{aligned}$$

As there are \(2^{n 2^n}\) possible \(n \times n\) S-Boxes, we expect to have very roughly the following amount of solutions:

$$\begin{aligned} N_{\text {Solutions}} = 2^{n 2^n - (0.72 \cdot w + 1.35 \cdot b)}. \end{aligned}$$

Therefore, we need \(0.72 \cdot w + 1.35 \cdot b < n 2^{n}\) in order to have a non-empty set of S-Boxes with the image we want inside their DDT. Black pixels are about twice as expensive as white ones according to this model. However, in practice, it is only possible to build an S-Box such that its DDT contains a black square of size \(22 \times 22\) or a white one of size \(62 \times 62\) without any noise, meaning that black pixels are, from our algorithm’s point of view, about 8 times more expensive. Stirling’s formula gives an approximate number of \(2^{(n-1.44) \cdot 2^n}\) permutations of \(\{0,1\}^n\), so we need that \(0.72 \cdot w + 1.35 \cdot b < (n - 1.44) 2^{n}\) in order for permutations with the correct black/white pixels to exist with non-negligible probability. However, our algorithm will require significant changes in order to search for permutations.
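The constants of this model can be recomputed with a few lines of Python. The sketch below is only a numerical check under the independence assumption made above. The function names `pixel_costs` and `log2_expected_solutions` are hypothetical.

```python
import math

def pixel_costs(m, n):
    """Cost in bits of one white and one black pixel for a random m-to-n-bit function."""
    lam = 2.0 ** (m - n - 1)             # Poisson parameter of the model
    p_white = math.exp(-lam)             # probability that a DDT coefficient is 0
    return -math.log2(p_white), -math.log2(1.0 - p_white)

def log2_expected_solutions(n, w, b):
    """log2 of the rough number of n x n S-Boxes with w white and b black pixels."""
    cost_white, cost_black = pixel_costs(n, n)
    return n * 2 ** n - (cost_white * w + cost_black * b)

COST_WHITE, COST_BLACK = pixel_costs(8, 8)
```

For \(m=n\), `COST_WHITE` is about 0.72 and `COST_BLACK` about 1.35, which are the coefficients of w and b in the bound. A positive value of `log2_expected_solutions(n, w, b)` means that the model predicts at least one solution.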

Since our algorithm does not require the pixels to be organised inside a square, we can also use it to force white or black pixels to appear anywhere in the DDT of an S-Box. This could be used to place a sort of trapdoor by for instance ensuring that a truncated differential compatible with the general structure of a cipher is present. Another possible use could be to “sign” an S-Box: Alice would agree with Bob to generate an S-Box for him and tell him beforehand where some black/white pixels will be. Bob can then check that those are placed as agreed.
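Bob's check in this scenario is cheap, since each agreed pixel only requires one pass over the S-Box. A minimal Python sketch, assuming XOR differences and using the hypothetical names `pixel_is_black` and `check_signature`, could look as follows.

```python
def pixel_is_black(sbox, a, b):
    """True if the differential a -> b is possible, i.e. the DDT pixel (a, b) is black."""
    return any(sbox[x ^ a] ^ sbox[x] == b for x in range(len(sbox)))

def check_signature(sbox, white_pixels, black_pixels):
    """True if every agreed white pixel is white and every agreed black pixel is black."""
    return (all(not pixel_is_black(sbox, a, b) for a, b in white_pixels)
            and all(pixel_is_black(sbox, a, b) for a, b in black_pixels))
```

Here `white_pixels` and `black_pixels` are the lists of positions \((a,b)\) that Alice announced in advance.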

5 Conclusion

Knowledge of the internal structure of an S-Box gives clear advantages to the designer of a cipher in terms of efficient or side-channel resistant implementation. It is also crucial in the white-box or crypto-currency setting. Hiding the S-Box’s structure can also be a way to hide superior cryptanalysis techniques or trapdoors.

In this paper we have introduced several approaches and algorithms to decompose an S-Box with unknown structure and we illustrated them by studying the S-Box of the NSA’s block cipher Skipjack. This allowed us to rule out some possible structures, and to prove that its linear properties are too unlikely to have happened at random. We also provided an algorithm capable of generating very similar S-Boxes (Table 5).

An open problem related to this work is the study of block ciphers with small block sizes: how far can cryptanalysis go given a whole codebook? How many rounds of small-block Feistel Network or SPN is it feasible to break?