Chapter 4 Cryptography:the secret life of primes(1 / 1)

The reader will now appreciate that the collection of counting numbers has, from the earliest times, been recognized as the repository of riddles and secrets, many of which have never been revealed to this day. For many of us, this is enough to justify the continued serious study of numbers but others may take a different attitude. Intriguing and difficult as these conundrums may be, it might be imagined that they have little bearing on the rest of human wisdom. But that would be a mistake.

Over the last few decades it has emerged that ordinary secrets, of thekindsweallindulgeinfromtimetotime,canbecodedas secrets about numbers. This has now all been put into practice and our most precious secrets, whether they be commercial or military, personal orfinancial, political or downright scandalous,can all be protected on the Internet by masking them using secrets about ordinary counting numbers.

Secrets turned into numbers

How is all this possible?Any information, whether it be a poem or a bank statement, a blueprint for a weapon or a computer program, can be described in words. We may, however, need to augment the alphabet that is used to make up our words beyond the ordinary letters of the alphabet. We may include number symbols, punctuation symbols including special symbols for space between ordinary words, but it is nonetheless the case that all the information we wish to transfer, including instructions for producing pictures and diagrams, can be expressed using words from an alphabet of, let us say, no more than one thousand symbols. We can count these symbols and so represent each symbol uniquely as a number. Since numbers are cheap and inexhaustible, it may be convenient to use numbers all with the same number of digits for this purpose(so, for example, every symbol was represented uniquely by its own four-digit PIN). We could string the symbols together as required to give one big long numberthattoldtheentirestory.Wecanevenworkinbinaryif we wish and so devise a way of translating any information into one long string of 0s and 1s. Every message we might ever want to send could then be coded as a binary string and then decoded at the other end by a suitably programmed computer, to be compiled in ordinary language that we can all comprehend. This then is the first realization:in order to send messages between one person and another, it is enough, both in theory and in practice, to be able to send numbers from one person to another.

Turning messages into numbers, however, is not the big idea. To be sure, the exact process by which all the information is digitized may be hidden from the general public, but nonetheless is not the source of protection from eavesdroppers. Indeed, from the point of view of cryptography, we may identify any message, the so-called plaintext, with the number that represents it and thereby think of that number as the plaintext itself, as it is assumed that anyone has access to the wherewithal that will allow one to be transformed into the other. Secrecy only comes on to the scene when we mask these plaintext numbers with other numbers.

Let me introduce you to thefictitious characters that populate the various scenarious of cryptography, which is the study of ciphers(secret codes). We imagine Alice and Bob, who want to communicate with each other, without being overheard by the eavesdropper, Eve. Instinctively, we might sympathize with Alice and Bob, picturing Eve as up to no good, but of course the reverse may be true, with Eve representing a noble policing authority striving to protect us all from the evil plots of Bob and Alice.

Whatever the moral standing of the participants, there is an age-old approach that Alice and Bob may employ to cut Eve out of the conversation even if Eve intercepts messages that pass between them. They can encrypt the data using a cipher key that is known only to Alice and Bob. What they may arrange to do is to meet in a secure environment where they exchange with one another a secret number(let us say 57)and then return home.When the time comes, Alice will want to send a message to Bob and, just to illustrate the point, suppose that message can be represented by a single digit between 1 and 9. On the big day, Alice wants to send the message‘8’to Bob. She takes her message and adds the secret ingredient, that is to say she masks its true value by adding 57 and so sends the message to Bob, across an insecure channel, of 8+57=65. Bob receives this message and subtracts the secret number to retrieve Alice’s plaintext 65-57=8.The nefarious Eve, however, has a good idea what these two are up to and indeed does manage to intercept the enciphered message,65. But what can she do with it?She may know, as we do, that Alice has sent one of the nine possible messages 1,2,3,···,9to Bob and also knows that she has encrypted it by adding a number to the message, which must therefore lie between 65-9=55and65-1=64. However, because she cannot tell which of these nine masking numbers has been used(she isn’t in on the secret), she is none the wiser as to the actual plaintext message that Alice sent to Bob, which is still just as likely to be any one of the nine possibilities. All she knows is that Alice has sent a message to Bob but has no idea what it is.

It might seem that Alice and Bob are now impervious to the malice of Eve and can communicate with impunity using the magic number 57 to disguise all they have to say. That, however,isn’t quite the case. They would be well advised to change that number, indeed they are better off using a new secret number every time because if they don’t, the system will begin to leak information to Eve. For example, say in a future week Alice wants to send to Bob the same message number 8. Everything would run as before and once again Eve would intercept the mysterious number 65 from the airwaves, but this time it would tell her something. Eve would know that, whatever this message is, it is the same message that Alice sent to Bob in thefirst week–this is just the sort of thing Alice and Bob would not want Eve to know.

This, however, looks to be no big problem for Alice and Bob.When theyfirst meet up to‘exchange keys’, instead of agreeing on just one secret number, Alice could provide Bob with a long ordered list of thousands of secret numbers, to be used one after another, thus avoiding the possibility of meaningful coincidences in their publicly available communications.

And this is indeed what is done in practice. This kind of cipher system is known in the trade as a one-time pad.Thesenderand receiver mask their plaintext with a single-use number from the‘pad’. That leaf of the pad is then discarded by both the sender and receiver after the message has been sent and deciphered. The one-time pad represents a completely secure system in that the insecure message that travels in the public domain contains no information about the content of the plaintext. To decipher it, the interceptor needs to get hold of that pad in order to obtain the encryption–decryption key.

Keysandkeyexchange

It would seem then that the problem of secure communication is completely solved by the one-time pad and, in a way, that is true.The difficulty with ciphers like the one-time pad, however, is that they require the participants to exchange a key in order to use them. In practice, this takes a lot of effort. For high-level communications, such as those between the White House and the Kremlin, money is no object and the necessary exchanges are carried out under conditions of maximum security. In the everyday world on the other hand, all sorts of people and institutions need to communicate with one another in a confidential fashion. The participants cannot afford the time and energy required to secure key exchanges and, even if this were arranged by a trusted third party, it can be an expensive business.

The common drawback of all ciphers that had been used for thousands of years up until the 1970s was that they were allsymmetric ciphers, meaning that the encryption and decryption keys were essentially the same. Whether it was the simple alphabet-shift cipher of Julius Caesar, or the complex Enigma Cipher of the Second World War, they all suffered from the common weakness that once an adversary learned how you were encoding your messages, they could decode them just as well as you. In order to make use of a symmetric cipher, the communicating partners needed to exchange the cipher key in a secure way.

It seemed to have been tacitly assumed that this was an unavoidable principle of secret codes–for a cipher to be used the partners needed, somehow or other, to exchange the key to the cipher and to keep it secret from the enemy. Indeed, this might be regarded as mathematical common sense.

This is the kind of assumption that makes a mathematician suspicious. We are dealing with what is essentially a mathematical situation, so one would expect such a‘principle’to be well founded and represented by some form of mathematical theorem. Yet there was no such theorem, and the reason that there was no such theorem was that the principle simply is not valid, as the following thought experiment reveals.

Transmission of a secure message from Alice to Bob does not in itself necessitate the exchange of the key to a cipher, for they can proceed as follows. Alice writes her plaintext message for Bob, and places it in a box that she secures with her own padlock. Only Alice has the key to this lock. She then posts the box to Bob, who of course cannot open it. Bob, however, then adds a second padlock to the box, for which he alone possesses the key. The box is then returned to Alice, who then removes her own lock, and sends the box for a second time to Bob. This time, Bob may unlock the box and read Alice’s message, secure in the knowledge that the meddling Eve could not have peeked at the contents during the delivery process. In this way, a secret message may be securely sent on an insecure channel without Alice and Bob ever exchanging keys. This imaginary scenario shows that there is no law that says that a key must change hands in the exchange of secure messages.In a real system, Alice and Bob’s‘locks’might be their own coding of the message rather than a physical device separating the would-be eavesdropper from the plaintext. Alice and Bob may then use this initial exchange to set up an ordinary symmetric cipher that would be used to mask all their future communication.

Indeed, this is the way a secure communication channel is often established in the real world. Replacing physical locking devices by personal codes is not, however, so easy to do. Unlike the locks,the encodings of Alice and Bob may interfere with one another,making the unscrambling(that is, the unlocking)that is carried outfirst by Alice and then by Bob unworkable. However, that this method can be effective wasfirst publicly demonstrated by Whitfield Diffie and Martin Hellman in 1976.

A second related approach is the idea of asymmetric or public key cryptography in which everyone publishes their own public key that is then used to encipher messages meant for that person.However, each person also holds a private key, without which the messages enciphered using their own unique public key cannot be read. In terms of the padlock metaphor, Alice provides Bob with a box in which to place his plaintext message together with an open padlock(her public key)to which she alone holds the key(her private key).

A workable public key system might seem too much to ask for as the twin requirements of security and ease of use seem to conflict.Fast, safe encryption is, however, available to the general public on the Internet, even if they barely realize that it is there, safeguarding their interests. And it is all down to numbers, and prime numbers at that.

How secret primes protect our secrets

Remember that every plaintext message is regarded just as a single number, so it is natural to try to mask this number using other numbers. The most common way to do this is through employing the so-called RSA enciphering process, published in 1978 by its founders, Ron Rivest, Adi Shamir, and Leonard Adleman. In RSA,each person’s private key consists of three numbers, p, q,andd,where p and q are(very large)prime numbers and the third ingredient d is Alice’s secret deciphering number, the role of which will be explained in due course. Alice provides the public with n=pq, the product of her two secret primes, and an enciphering number e(which is an ordinary whole number, in no way related to the special constant called e mentioned in Chapter 2).

A simple example for the purposes of illustration would be for Alice to have the primes p=5andq=13sothatn=5×13=65.If Alice sets her enciphering number to be e=11, then her public key would be(n,e)=(65,11). To encrypt a message m, Bob only needs nand e. However, to decipher the encrypted message E(m)that Bob transmits to Alice requires the deciphering number d,which in this case turns out to be d=35, as we shall show a little further on. The mathematics that allows d to be calculated requires that the primes p and q are known. In this toy example,given that n=65, anyone would soon discover that p=5and q=13. However, if the primes p and q are extremely large(typically they are hundereds of digits in length), this task becomes a practical impossibility for almost any computer system,at least in a reasonably short time, such as two or three weeks. In summary, the RSA system of enciphering is based on the empirical fact that it is prohibitively difficult tofind the prime factors of a very, very large number n. The clever part, which we shall explain in the remainder of the chapter, lies in devising a way that the message number mcan be enciphered just using the publicly known numbers nand e but, in practice, deciphering requires possession of the prime factors of n.

Here is how it works. What Bob sends through the ether is not m itself but the remainder when me is divided by n. Alice can then recover mby taking this remainder r and similarly calculating the remainder when rd is divided by n. The underlying mathematics ensures that the outcome for Alice is the original message m,which can then be decoded into ordinary plaintext by Alice’s computer system. This is, of course, happening seamlessly behind the scenes for any real-life Alice and Bob.

It would seem that the only thing that Eve lacks that really matters is this deciphering number d. If Eve knew that, she could decipher the message just as well as Alice. It turns out that d is a solution of a certain equation. Solving this equation is computationally quite easy and relies on the Euclidean Algorithm, published in the Books of Euclid in 300 BC. That is not the difficulty. The trouble is that it is not possible tofind out exactly what equation to solve unless you know at least one of the primes p and q, and that is the obstacle that stops Eve in her tracks.

We can explain more about how the numbers involved in all this work into the system. First, there is apparently quite a problem with Bob’s initial task. The number mis big, the number nis monstrous(of the order of 200 digits)and even if e is not that large, the number me is going to be extremely large as well. After calculating it, we have to divide me by the number nto get the remainder r, which represents the enciphered text. It might seem that the calculations are too unwieldy to be practical. We should be aware that even though modern computers are extremely powerful, they yet have their limitations. When calculations involve very high powers, they can exceed the capacity of any computer system. We certainly cannot assume that any practical calculation that we set for a computer can be done in a short period of time.

The saving grace for Bob is that it is possible tofind the required remainder r, without doing the long division at all. Indeed, the remainders just depend on remainders, and here is an example to illustrate the point. What are thefinal two digits of 739?(Thatisto say, what is the remainder when this number is divided by 100?)In order to answer this question, we might begin by calculating thefirst few powers of 7:71=7,72=49,73=343,74=2,401,75=16,807,···. It will soon become clear, however,that the sheer size of these numbers is going to become unmanageable well before we get anywhere near 739. On the other hand, as we calculate one power after another, we see a pattern emerging. The key observation is that, as we calculate succeeding powers, thefinal two digits of the answer depend only on thefinal two digits of the preceding number, as when we carry out the multiplication, digits in the hundreds column and beyond have no effect on what end up in the units and tens columns.

What is more, since 74 has 01 as thefinal digit pair, the next four powers will end in 07,49,43, and then 01 again. Hence, as we compute succeeding powers, the pattern of the last two digits will simply repeat this cycle of length four, over and over again. To return to the question in hand, since 39=4×9+3, we will pass through this four-cycle nine times and then take three more steps in calculating thefinal two digits of 739, which must therefore be 43.

And this works quite generally. In order tofind the remainder when some power ab is divided by nsay, we need only take the remainder r when a is divided by nand keep track of the remainders as we take successive powers of r.Whenweworkwith the remainder r, which will be a number in the range from 0 to n-1, mathematicians say that we are working modulo n,discarding any higher multiples of nthatmayarise,astheyleavea remainder of 0 when divided by n, and so cannot contribute to the value of thefinal remainder r.

All this serves to give an indication that the number sent by Bob to Alice, me modulo n, can indeed be calculated without too much effort on behalf of Bob’s computer. All the same, the numbers involved are in practice mighty big, so more explanation is needed to show that they can be handled. The large powers involved in computing me can be dealt with in stages by a process known asfast exponentiation. Without going into detail, the method involves successive squaring and multiplying of powers to arrive at me modulo n with the binary form of e guiding the algorithm through to quicklyfind the required remainder in relatively few steps.

Euclid shows Alice how tofind her deciphering number

Alice’s computer canfind d using an algebraic tool that is over2,300 years old, the Euclidean Algorithm, which will be explained in a moment. Eve’s computer could of course do the same thing if it just knew which equation to solve. However, since p and q are privatetoAlice,sois(p-1)(q-1)and Eve does not know where to begin.

Returning to the Euclidean Algorithm, this begins from the observation that it is possible tofind the highest common factor(hcf)of two numbers a>b by successive subtraction.(The hcf is also known as the gcd–greatest common divisor.)We just note that r=a-b has the property that any common factor of any two of the three numbers a,b,andr will also be a factor of the third.Forexample,ifc is a common factor of a and b,sothata=ca1 and b=cb1 say, we see that r=a-b=ca1-cb1=c(a1-b1), giving us a factorization of r involving the divisor c. In particular, the hcf of a and b is the same as the hcf of b and r. Since both these numbers are less than a, we now have the same problem but applied to a smaller number pair. Repetition of this idea then will eventually lead to a pair where the hcf is obvious.(Indeed, the two numbers in hand will eventually be the same, for if not we could proceed one more step;their common value is then the number we seek.)

Forexample,tofindthehcfofa=558 and b=396, thefirst subtraction would give us r=558-396=162, so our new pair would be 396 and 162. Since 396-162=234, our third pair becomes 234 and 162, and as we continue the full list of number pairs is:

and so the hcf of 558 and 396 is 18.

It is possible to write down the hcf of a number pair from the prime factorizations of the numbers in question. In this example,558=2×32×31, while 396=22×32×11;taking the common power of each prime entering into the factorizations, we obtain the hcf as 2×32=18. Nevertheless, for larger numbers it takes much less work to use Euclid’s Algorithm as it is generally easier to perform subtractions than tofind prime factorizations.

Another bonus of the Euclidean Algorithm is that it is always possible to work it backwards and in so doing express the hcf in terms of the original two numbers. To see this in action in the previous example, it is best to compress the calculation when the same number appears several times over in the course of the subtractions, representing this as a single equation as follows:

558=396+162

396=2×162+72

162=2×72+18

72=4×18.

Beginning with the second to last line, we now use each little equation to eliminate the intermediate remainders, one at a time.In this example, by usingfirst the penultimate equation, and then the one above that we obtain:

18=162-2×72=162-2×(396-2×162)=5×162-2×396

andfinally using thefirst equation we can eliminate thefirst intermediate remainder of 162:

=5×(558-396)-2×396=5×558-7×396=18.

48=4×11+4

11=2×4+3

4=1×3+1confirming that the hcf of k and e is indeed 1. Reversing the algorithmweobtain:

1=4-3=4-(11-2×4)=3×4-11=3(48-4×11)-11=3×48-13×11.

This gives an initial value of d=-13 as the solution to the requirement that 11d leaves remainder 1 upon division by 48, so in order to get a positive value of d in the required range we add 48to this number to get d=48-13=35.

And in passing it is well to point out that the Euclidean Algorithm provides the missing link in our proof of the uniqueness of prime factorization as it allows us to verify the euclidean property that if aprimep is a factor of the product ab,sothatab=pc say, then p is a factor of at least one of a and b. The reason for this is that if p is not afactorofa then, since p is prime, the hcf of a and p is 1. By reversing the Euclidean Algorithm when applied to the pair a and p, we can thenfind integers r and s say such that ra+sp=1.This is enough to show that p is then a factor of b for, since ab=pc,we have:

b=b×1=b(ra+sp)=r(ab)+psb=r(pc)+psb=p(rc+sb).

This is the required factorization of b that features the prime p as afactor.

In conclusion, the number theory underlying RSA enciphering makes the system sound, although various protocols that have not been explained here must be respected in order to safeguard the integrity of the system. There are issues of authentification(what if Eve contacts Alice pretending to be Bob?), non-repudiation(what if Bob pretends that it was Eve who sent his message to Alice?), and identity fraud(what if Alice abuses confidential identification sent to her by Bob and tries to impersonate him online?). Moreover, other weaknesses in the system can be exposed when predictable or repeated messages proliferate.However, these difficulties may potentially arise in any public key cryptosystem. They can be overcome and in the main are unrelated to the underlying mathematical techniques that ensure high quality and robust encyryption.

This chapter has demonstrated a major application of prime numbers and the theory of divisibility and remainders. The ancient mathematics of Euclid and the 18th-century contribution of Euler allows this to be explained, not only in broad principle,but infine detail.

The first part of our book closes with Chapter 5 which introduces some special classes of integers associated with the enumeration of some naturally occurring groupings.