Baillehache Pascal's personal website

Vanilla RSA 64bits implementation in C

RSA is a cryptosystem upon which secure communication on the Internet relies. It is based on two keys, one public (everyone knows it) used to cipher data, and one private (only the reader knows it) used to decipher data. Its security comes from the difficulty of factoring the product of large prime numbers (which are used for the key generation).

It had of course been on my to-study list for a while, and now the time has come to lose a few hours down this rabbit hole. This article is the resulting cheat sheet. Disclaimer: I'm not an expert in cipher algorithms; this article is the result of personal study with material available on the web and motivated by curiosity only. So, double check what I say, and you're responsible for what you do with it (anyway you won't be able to do much with it).

Key generation

To create the public and private keys one needs two prime numbers \(p\) and \(q\). They must be chosen randomly and be very large (several hundred digits) to achieve any significant level of security. How to obtain them is actually trickier than it seems, so I'll just consider we have them. But we haven't even started and there is already a practical problem: even a uint64_t gives you only 19 digits. So we need a big int implementation, which I don't have (yet, obviously). Rabbit hole I said...

For now I'll satisfy myself with uint64_t. It immediately discards the implementation below for practical use as it limits it to easy-to-crack keys, but stays valid in principle and serves its educational purpose. Everything works exactly the same with any primes, and I'll use super small ones here for tests.

First, we calculate what's called the modulus: \(n=pq\). Except for potential overflow nothing complicated here.

Next, we calculate what's called the totient of the modulus: \(\lambda=lcm(p-1,q-1)\). And here comes a nice occasion to get lost in the weeds... The previous equation is the one given in the Wikipedia article, but if you crawl the web you'll easily find people saying that the totient is \(\phi=(p-1)(q-1)\), definitely not the same thing. The web is a pile of crap, sure, but Introduction to algorithms also uses \(\phi\) (at least in the fourth edition). I have as much faith in such a book as I have in Wikipedia, so what? Well this post gives us the answer: \(\phi\) is Euler's totient, used in the original RSA spec, and \(\lambda\) is Carmichael's totient, used in modern implementations. Conclusion, I'll stick with \(\lambda\).

The least common multiple can be computed as follows: \(lcm(a,b)=|ab|/gcd(a,b)\) (we can ignore the absolute value here as \(a\) and \(b\) will always be strictly positive) and the greatest common divisor can be computed using Stein's algorithm (also known as the binary GCD algorithm).

uint64_t GCD(
  uint64_t a,
  uint64_t b) {
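  // The loops below never terminate on a zero operand, so handle it first.
  if(a == 0) return b;
  if(b == 0) return a;
  uint8_t shift = 0;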
  while(((a | b) & 1) == 0) {
    ++shift;
    a >>= 1;
    b >>= 1;
  }
  while((a & 1) == 0) a >>= 1;
  do {
    while((b & 1) == 0) b >>= 1;
    if(a > b) {
      uint64_t const t = b;
      b = a;
      a = t;
    }
    b -= a;
  } while(b != 0);
  return a << shift;
}

uint64_t LCM(
  uint64_t const a,
  uint64_t const b) {
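  // Divide before multiplying: a is a multiple of GCD(a, b), so the result
  // is the same but the intermediate value is less likely to overflow.
  return (a / GCD(a, b)) * b;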
}
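
To make the difference between \(\phi\) and \(\lambda\) concrete, here is a minimal sketch that computes both for a pair of primes. The names gcd_euclid, euler_phi_pq and carmichael_lambda_pq are mine (they are not part of the code of this article), and the sketch assumes \(p\) and \(q\) are distinct primes small enough for the products to fit in a uint64_t.

```c
#include <stdint.h>

// Plain Euclidean gcd, enough for this illustration.
uint64_t gcd_euclid(uint64_t a, uint64_t b) {
  while (b != 0) {
    uint64_t const t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Euler's totient of n = p * q, assuming p and q are distinct primes.
uint64_t euler_phi_pq(uint64_t p, uint64_t q) {
  return (p - 1) * (q - 1);
}

// Carmichael's totient of n = p * q, assuming p and q are distinct primes.
// Dividing before multiplying keeps the intermediate value small.
uint64_t carmichael_lambda_pq(uint64_t p, uint64_t q) {
  return (p - 1) / gcd_euclid(p - 1, q - 1) * (q - 1);
}
```

With \(p=61\) and \(q=53\) this gives \(\phi=3120\) and \(\lambda=780\). \(\lambda\) always divides \(\phi\), which is why both lead to a working private exponent, the one derived from \(\lambda\) simply being smaller or equal.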

Next, we choose what's called the public exponent \(e\) such that \(2\lt e\lt \lambda\) and \(e\) is coprime to \(\lambda\) (i.e. \(gcd(e,\lambda)=1\)). That's another tricky choice to make as it affects the resilience to attacks and the speed of ciphering. According to the Wikipedia article 65537 is a common choice. Then, referring to this article to handle small \(\lambda\) in my toy examples, I'll use the largest suitable value in \([3,5,17,257,65537]\) (the Fermat primes).

uint64_t GetPublicExplonent(uint64_t const lambda) {
  uint64_t const candidates[5] = {65537, 257, 17, 5, 3};
  for(int iCandidate = 0; iCandidate < 5; iCandidate += 1) {
    if(
      candidates[iCandidate] < lambda &&
      // A prime candidate is coprime to lambda iff it does not divide lambda.
      (lambda % candidates[iCandidate]) != 0
    ) {
      return candidates[iCandidate];
    }
  }
  return 3;
}
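
The requirement on \(e\) can also be checked directly with a gcd, which works even if the candidates are not prime. The following is only a sketch of that idea: pick_public_exponent and gcd_euclid are hypothetical names, and returning 0 when no candidate fits is my own convention.

```c
#include <stddef.h>
#include <stdint.h>

// Plain Euclidean gcd, enough for this illustration.
uint64_t gcd_euclid(uint64_t a, uint64_t b) {
  while (b != 0) {
    uint64_t const t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Returns the largest candidate e with e < lambda and gcd(e, lambda) == 1,
// or 0 when no candidate fits (assumed convention for this sketch).
uint64_t pick_public_exponent(uint64_t lambda) {
  uint64_t const candidates[] = {65537, 257, 17, 5, 3};
  size_t const nb = sizeof(candidates) / sizeof(candidates[0]);
  for (size_t i = 0; i < nb; i += 1) {
    uint64_t const e = candidates[i];
    if (e < lambda && gcd_euclid(e, lambda) == 1) return e;
  }
  return 0;
}
```

For example with \(p=103\) and \(q=5\), \(\lambda=lcm(102,4)=204\) is a multiple of 17, so 17 must be skipped and the sketch falls back to 5.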

The public key consists of \(n\) and \(e\).

Next, we calculate what's called the private exponent using the modular multiplicative inverse: \(d\equiv e^{-1}\pmod{\lambda}\). The congruence can be rewritten as \(ed\equiv 1\pmod{\lambda}\), and is equivalent to solving \(ed+\lambda k=1\) for \((d,k)\), which is a linear Diophantine equation. As, by definition of \(e\), \(gcd(e,\lambda)=1\), that equation can be solved using the extended Euclidean algorithm, which finds the solution \((x,y)\) to the equation \(ax+by=gcd(a,b)\).

However there is another technical complication here. The extended Euclidean algorithm needs signed integers, while we've been using unsigned ones so far. Using signed ones instead is not satisfying as it reduces the range of values we can use for the keys, and it creates other complications later anyway. Fortunately, Jeffrey Hurchalla provides a solution for the modular multiplicative inverse with unsigned integers in this article. Converted to C it becomes:

uint64_t GcdDecompositionUnsignedInput(
  uint64_t a,
  uint64_t b,
  int64_t* x,
  int64_t* y) {
  int64_t x1 = 1;
  int64_t y1 = 0;
  uint64_t a1 = a;
  int64_t x0 = 0;
  int64_t y0 = 1;
  uint64_t a2 = b;
  uint64_t q = 0;
  while (a2 != 0) {
    int64_t x2 = x0 - ((int64_t)q) * x1;
    int64_t y2 = y0 - ((int64_t)q) * y1;
    x0 = x1;
    y0 = y1;
    uint64_t a0 = a1;
    x1 = x2;
    y1 = y2;
    a1 = a2;
    q = a0 / a1;
    a2 = a0 - q * a1;
  }
  *x = x1;
  *y = y1;
  return a1;
}

Note that it may still return a negative value for \(d\), but as it's calculated modulo \(\lambda\) it's trivial to bring it back to a positive value.
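
To illustrate that last step, here is a minimal sketch of a modular inverse which brings a negative coefficient back into \([0,\lambda)\). It is not the unsigned version above: mod_inverse_small is a hypothetical helper using plain signed arithmetic, so it is only meant for small toy values, and returning 0 when no inverse exists is my own convention.

```c
#include <stdint.h>

// Returns d in [0, m) such that (e * d) % m == 1, or 0 if e has no inverse
// modulo m. Signed arithmetic: only suitable for small toy values.
int64_t mod_inverse_small(int64_t e, int64_t m) {
  int64_t r0 = m;
  int64_t r1 = e % m;
  int64_t t0 = 0;
  int64_t t1 = 1;
  // Invariant: t0 * e == r0 (mod m) and t1 * e == r1 (mod m).
  while (r1 != 0) {
    int64_t const q = r0 / r1;
    int64_t const r2 = r0 - q * r1;
    int64_t const t2 = t0 - q * t1;
    r0 = r1;
    r1 = r2;
    t0 = t1;
    t1 = t2;
  }
  if (r0 != 1) return 0;  // gcd(e, m) != 1: no inverse exists
  if (t0 < 0) t0 += m;    // bring the coefficient back into [0, m)
  return t0;
}
```

With \(e=17\) and \(\lambda=20\) the algorithm ends on the coefficient \(-7\), and \(-7+20=13\) is the private exponent: \(17\times13=221=11\times20+1\).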

The private key consists of \(n\) and \(d\).

And now we have everything we need for key generation:

typedef struct RSACipherKey {
  uint64_t n;
  uint64_t e;
  uint64_t d;
} RSACipherKey;

RSACipherKey GenerateKeys(
  uint64_t const p,
  uint64_t const q) {
  assert(UINT64_MAX / p >= q);
  RSACipherKey keys = {0};
  keys.n = p * q;
  uint64_t const lambda = LCM(p - 1, q - 1);
  keys.e = GetPublicExplonent(lambda);
  int64_t x = 0;
  int64_t y = 0;
  uint64_t gcd = GcdDecompositionUnsignedInput(keys.e, lambda, &x, &y);
  assert(gcd == 1);
  if(x > 0) keys.d = (uint64_t)x;
  else keys.d = lambda - (uint64_t)(-x);
  return keys;
}

Here are some examples to test the implementation.

p	q	d	n	e
61	53	173	3233	257
11	5	13	55	17
7	17	17	119	17
53	59	1109	3127	257
11922649	74112287	426228107209745	883614784488263	65537

Ciphering/Deciphering

Ciphering a value \(m\) is then performed as follows: \(m'=m^e \bmod n\). Deciphering the value back is performed as follows: \(m=m'^d \bmod n\). Now we need a way to calculate the modular exponentiation (given the large numbers we're manipulating, direct calculation is not an option). This can be done using exponentiation by squaring. Buuut, here again overflows are waiting around the corner to smash you in the face. The correct way to implement it is:

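uint64_t MultMod(
  uint64_t a,
  uint64_t b,
  uint64_t c) {
  uint64_t res = 0;
  a = a % c;
  while (b > 0) {
    // res and a are both smaller than c, so (c - a) cannot underflow.
    // Comparing against it avoids computing (res + a) or (a * 2) directly,
    // which would overflow when c is larger than 2^63.
    if (b % 2 == 1) res = (res >= c - a) ? res - (c - a) : res + a;
    a = (a >= c - a) ? a - (c - a) : a + a;
    b /= 2;
  }
  return res;
}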

uint64_t PowMod(
  uint64_t a,
  uint64_t b,
  uint64_t c) {
  uint64_t res = 1;
  uint64_t mask = 0x8000000000000000;
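  // Skip the leading zero bits of the exponent. Testing mask too avoids an
  // endless loop when b is 0 (the function then returns 1).
  while(mask && !(b & mask)) mask >>= 1;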
  while(mask) {
    res = MultMod(res, res, c);
    if(b & mask) {
      res = MultMod(res, a, c);
    }
    mask >>= 1;
  }
  return res;
}
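
As a sanity check of the two formulas, the sketch below ciphers then deciphers every possible value for a small key and verifies that each one comes back unchanged. modpow_small and round_trip_all are hypothetical names, and the sketch assumes \(n\lt2^{32}\) so that plain multiplications cannot overflow (hence no MultMod here).

```c
#include <stdint.h>

// Exponentiation by squaring, valid only while n < 2^32 so that the
// intermediate products fit in a uint64_t.
uint64_t modpow_small(uint64_t base, uint64_t exp, uint64_t n) {
  uint64_t res = 1 % n;
  base %= n;
  while (exp > 0) {
    if (exp & 1) res = (res * base) % n;
    base = (base * base) % n;
    exp >>= 1;
  }
  return res;
}

// Returns 1 if every m in [0, n) survives ciphering then deciphering,
// 0 as soon as one value does not.
int round_trip_all(uint64_t n, uint64_t e, uint64_t d) {
  for (uint64_t m = 0; m < n; m += 1) {
    uint64_t const c = modpow_small(m, e, n);
    if (modpow_small(c, d, n) != m) return 0;
  }
  return 1;
}
```

Calling round_trip_all(3233, 257, 173) or round_trip_all(55, 17, 13) with the keys of the table above returns 1, while a wrong private exponent such as round_trip_all(55, 17, 12) returns 0.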

The Chinese remainder theorem can also be used to optimise the calculation under the condition you know the private key (i.e. can be applied to decipher only). I'll lazily satisfy myself with exponentiation by squaring here.
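
For the record, here is what the Chinese remainder theorem variant could look like on toy values. This is a sketch under assumptions, not the implementation used in this article: decipher_crt_small and modpow_small are hypothetical names, \(p\) and \(q\) are assumed to be distinct odd primes with \(n\lt2^{32}\), and the inverse of \(q\) modulo \(p\) is obtained with Fermat's little theorem rather than with the extended Euclidean algorithm.

```c
#include <stdint.h>

// Exponentiation by squaring, valid only while n < 2^32 so that the
// intermediate products fit in a uint64_t.
uint64_t modpow_small(uint64_t base, uint64_t exp, uint64_t n) {
  uint64_t res = 1 % n;
  base %= n;
  while (exp > 0) {
    if (exp & 1) res = (res * base) % n;
    base = (base * base) % n;
    exp >>= 1;
  }
  return res;
}

// Deciphers c with two small exponentiations (mod p and mod q) instead of
// one large exponentiation mod n, then recombines the two results.
uint64_t decipher_crt_small(uint64_t c, uint64_t d, uint64_t p, uint64_t q) {
  uint64_t const dp = d % (p - 1);
  uint64_t const dq = d % (q - 1);
  // q^(p-2) mod p is the inverse of q modulo p (Fermat's little theorem).
  uint64_t const qinv = modpow_small(q, p - 2, p);
  uint64_t const m1 = modpow_small(c, dp, p);
  uint64_t const m2 = modpow_small(c, dq, q);
  // h = qinv * (m1 - m2) mod p, written to avoid an unsigned underflow.
  uint64_t const h = (qinv * ((m1 + p - (m2 % p)) % p)) % p;
  return m2 + h * q;
}
```

With \(p=61\), \(q=53\) and \(d=173\) the two reduced exponents are \(d_p=53\) and \(d_q=17\), and decipher_crt_small(modpow_small(72, 257, 3233), 173, 61, 53) gives back 72. The gain is that the exponents and moduli are about half the size of \(d\) and \(n\).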

However, the smart reader you are will immediately notice that as we use modulo \(n\) the ciphered/deciphered value cannot be greater than or equal to \(n\). How then can we cipher/decipher arbitrarily large messages? Well, in practice we don't. RSA is used with a very large \(n\) to cipher very short messages whose integer representation is smaller than \(n\).

For example, a 2048-bit key with a large value could cipher messages almost up to 2048/8=256 bytes long. It isn't much, but it is sufficient to cipher the key used by another type of ciphering algorithm (like AES), more suited to large amounts of data. Why, then, don't we simply use the other ciphering algorithm only? RSA is slow but allows information to be exchanged securely, while AES is fast but needs a way to exchange its key securely. With both working in tandem you get the best of both worlds.

uint64_t Cipher(
  uint64_t const message,
  RSACipherKey const key) {
  assert(message < key.n);
  return PowMod(message, key.e, key.n);
}

uint64_t Decipher(
  uint64_t const message,
  RSACipherKey const key) {
  assert(message < key.n);
  return PowMod(message, key.d, key.n);
}

A little test on "Hello world!" and ciphering/deciphering byte per byte:

void TestCipherDecipher(
  char const* const message,
  uint64_t const p,
  uint64_t const q) {
  printf("Test Cipher/Decipher (p=%lu, q=%lu) \"%s\"\n", p, q, message);
  RSACipherKey key = GenerateKeys(p, q);
  size_t const len = strlen(message);
  uint64_t cipheredMessage[len];
  uint8_t decipheredMessage[len];
  for(size_t i = 0; i < len; i += 1) {
    cipheredMessage[i] = Cipher((uint64_t)(message[i]), key);
    decipheredMessage[i] = (uint8_t)Decipher(cipheredMessage[i], key);
    assert(decipheredMessage[i] == message[i]);
  }
  printf("message: ");
  for(size_t i = 0; i < len; i += 1) {
    printf("%u ", message[i]);
  }
  printf("\n");
  printf("cipher: ");
  for(size_t i = 0; i < len * 8; i += 1) {
    printf("%u ", ((uint8_t*)cipheredMessage)[i]);
  }
  printf("\n");
  printf("decipher: ");
  for(size_t i = 0; i < len; i += 1) {
    printf("%u ", decipheredMessage[i]);
  }
  printf("\n");
}

Using \(p=61\) and \(q=53\):

message: 72 101 108 108 111 32 119 111 114 108 100 33 
cipher: 24 4 0 0 0 0 0 0 58 8 0 0 0 0 0 0 39 8 0 0 0 0 0 0 39 8 0 0 0 0 0 0 112 5 0 0 0 0 0 0 28 1 0 0 0 0 0 0 193 5 0 0 0 0 0 0 112 5 0 0 0 0 0 0 95 5 0 0 0 0 0 0 39 8 0 0 0 0 0 0 5 5 0 0 0 0 0 0 133 1 0 0 0 0 0 0 
decipher: 72 101 108 108 111 32 119 111 114 108 100 33

And using \(p=11922649\) and \(q=74112287\):

Test Cipher/Decipher (p=11922649, q=74112287) "Hello world!"
message: 72 101 108 108 111 32 119 111 114 108 100 33 
cipher: 220 58 76 242 180 169 2 0 23 194 253 231 225 49 1 0 157 119 92 118 68 236 2 0 157 119 92 118 68 236 2 0 11 30 249 231 14 83 2 0 160 239 169 136 150 206 1 0 200 240 209 55 250 115 0 0 11 30 249 231 14 83 2 0 253 122 64 58 214 43 0 0 157 119 92 118 68 236 2 0 247 91 91 213 98 168 1 0 207 161 186 89 243 84 1 0 
decipher: 72 101 108 108 111 32 119 111 114 108 100 33

Block ciphering

Still, we should be able to cipher messages of any size. The (not so) simple way to do it is to split the original message into blocks small enough that their integer representation is smaller than \(n\). That's what I did in the previous example, but going down to byte per byte is a bit extreme. Let's see a more efficient way to do it (even if in practice it's never used that way, cf. the "padding" section below).

To ensure that a message is smaller than \(n\), its size \(s\) (in bytes) must be such that \(n\gt2^{8s}-1\), or \(log_2(n+1)/8\gt s\). However, the result of ciphering may be as large as \(n-1\) and may then require up to \(s+1\) bytes. This leads to the following scheme: for ciphering, cipher per block of \(s\) bytes of data padded with 0s up to \(s+1\) bytes, and for deciphering, decipher per block of \(s+1\) bytes and discard the extra byte. \(log2()\) uses floating point values, and given the values we are manipulating here we certainly want to stay away from accuracy problems. Instead I'll calculate it that way:

size_t GetBlockSize(uint64_t const modulus) {
  uint64_t size = 1;
  while(size < 8 && (1ULL << (8 * size)) < modulus) size += 1;
  assert(size <= 8);
  return (size_t)(size - 1);
}

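For example, if \(n=3233\) then \(s=1\), and if \(n=883614784488263\) then \(s=6\). Note that any \(n\le256\) gives \(s=0\), so block ciphering needs a modulus larger than 256 (which rules out the smallest test keys above, such as \(n=55\)).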
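
Another way to get the same value with integers only is to count how many times the modulus can be shifted right by one byte, which amounts to \(\lfloor log_{256}(n)\rfloor\). This is a sketch of an alternative formulation, with block_size_bytes as a hypothetical name. It differs from the function above only when \(n\) is an exact power of 256, which cannot happen for a product of two odd primes.

```c
#include <stddef.h>
#include <stdint.h>

// Returns the largest s such that 2^(8 * s) <= n, i.e. the number of whole
// bytes a value can use while being guaranteed to stay smaller than n
// (0 when n < 256, including n == 0).
size_t block_size_bytes(uint64_t n) {
  size_t s = 0;
  while (n >= 256) {
    n >>= 8;
    s += 1;
  }
  return s;
}
```

block_size_bytes(3233) gives 1 and block_size_bytes(883614784488263) gives 6, as in the examples above.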

Here lies another trap: when padding and discarding, one must take into account the endianness of the architecture the code is running on. Padding the wrong way will make your message's integer representation larger, which is exactly the opposite of what we are trying to do. A simple test for endianness can be performed as follows:

bool IsBigEndian(void) {
  uint16_t const a = 0x0100;
  return (*((uint8_t*)&a) == 1);
}

If you're compiling with GCC you can also use the predefined macro __BYTE_ORDER__ (cf here).
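
An alternative that sidesteps the endianness test entirely is to build the integer with shifts instead of going through a pointer to its bytes, since shifts operate on values and not on the memory layout. This is not what the code of this article does, just a sketch of the idea with hypothetical names (pack_bytes_le, unpack_bytes_le), using the convention that the first byte is the least significant one.

```c
#include <stddef.h>
#include <stdint.h>

// Packs up to 8 bytes into an integer, the first byte being the least
// significant one, whatever the endianness of the machine.
uint64_t pack_bytes_le(uint8_t const* bytes, size_t count) {
  uint64_t value = 0;
  for (size_t i = 0; i < count && i < sizeof(uint64_t); i += 1) {
    value |= (uint64_t)bytes[i] << (8 * i);
  }
  return value;
}

// Writes the count least significant bytes of value, lowest byte first.
void unpack_bytes_le(uint64_t value, uint8_t* bytes, size_t count) {
  for (size_t i = 0; i < count && i < sizeof(uint64_t); i += 1) {
    bytes[i] = (uint8_t)(value >> (8 * i));
  }
}
```

For example, packing the two bytes 72 and 101 gives \(72+101\times256=25928\), and unpacking 1048 into two bytes gives 24 and 4 on any architecture.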

And now we're ready to cipher/decipher several bytes at once:

typedef struct RSACipherData {
  size_t size;
  uint8_t* data;
} RSACipherData;

RSACipherData CipherBlock(
  RSACipherData const message,
   RSACipherKey const keys) {
  RSACipherData result = {0};
  size_t const blockSizeIn = GetBlockSize(keys.n);
  size_t const blockSizeOut = blockSizeIn + 1;
  size_t nbBlock = message.size / blockSizeIn;
  size_t const nbByteLeft = message.size - nbBlock * blockSizeIn;
  nbBlock += (nbByteLeft > 0 ? 1 : 0);
  result.data = malloc(nbBlock * blockSizeOut);
  assert(result.data != NULL);
  result.size = nbBlock * blockSizeOut;
  for(size_t iBlock = 0; iBlock < nbBlock; iBlock += 1) {
    uint64_t inp = 0;
    uint8_t* p = (uint8_t*)&inp;
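    // On a big-endian machine the low-order bytes sit at the end of inp,
    // so write the block there to keep its value below 2^(8 * blockSizeIn).
    if(IsBigEndian()) {
      p += sizeof(uint64_t) - blockSizeIn;
    }
    for(size_t iByte = 0; iByte < blockSizeIn; iByte += 1) {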
      if(iBlock * blockSizeIn + iByte < message.size) {
        *p = message.data[iBlock * blockSizeIn + iByte];
        p += 1;
      }
    }
    uint64_t const out = PowMod(inp, keys.e, keys.n);
    p = (uint8_t*)&out;
    if(IsBigEndian()) {
      p += sizeof(uint64_t) - blockSizeOut;
    }
    for(size_t iByte = 0; iByte < blockSizeOut; iByte += 1) {
      result.data[iBlock * blockSizeOut + iByte] = *p;
      p += 1;
    }
  }
  return result;
}

RSACipherData DecipherBlock(
  RSACipherData const message,
  RSACipherKey const keys) {
  RSACipherData result = {0};
  size_t const blockSizeOut = GetBlockSize(keys.n);
  size_t const blockSizeIn = blockSizeOut + 1;
  size_t nbBlock = message.size / blockSizeIn;
  result.data = malloc(nbBlock * blockSizeOut);
  assert(result.data != NULL);
  result.size = nbBlock * blockSizeOut;
  for(size_t iBlock = 0; iBlock < nbBlock; iBlock += 1) {
    uint64_t inp = 0;
    uint8_t* p = (uint8_t*)&inp;
    if(IsBigEndian()) {
      p += sizeof(uint64_t) - blockSizeIn;
    }
    for(size_t iByte = 0; iByte < blockSizeIn; iByte += 1) {
      *p = message.data[iBlock * blockSizeIn + iByte];
      p += 1;
    }
    uint64_t const out = PowMod(inp, keys.d, keys.n);
    p = (uint8_t*)&out;
    if(IsBigEndian()) {
      p += sizeof(uint64_t) - blockSizeOut;
    }
    for(size_t iByte = 0; iByte < blockSizeOut; iByte += 1) {
      result.data[iBlock * blockSizeOut + iByte] = *p;
      p += 1;
    }
  }
  return result;
}

Testing again on "Hello world!":

void TestCipherDecipherBlock(
  char const* const str,
  uint64_t const p,
  uint64_t const q) {
  printf("Test Cipher/Decipher block (p=%lu, q=%lu) \"%s\"\n", p, q, str);
  RSACipherKey keys = GenerateKeys(p, q);
  RSACipherData message = {
    .data = (uint8_t*)str,
    .size = strlen(str),
  };
  RSACipherData cipheredMessage = CipherBlock(message, keys);
  RSACipherData decipheredMessage = DecipherBlock(cipheredMessage, keys);
  printf("message: ");
  for(size_t i = 0; i < message.size; i += 1) {
    printf("%u ", message.data[i]);
  }
  printf("\n");
  printf("cipher: ");
  for(size_t i = 0; i < cipheredMessage.size; i += 1) {
    printf("%u ", cipheredMessage.data[i]);
  }
  printf("\n");
  printf("decipher: ");
  for(size_t i = 0; i < decipheredMessage.size; i += 1) {
    printf("%u ", decipheredMessage.data[i]);
  }
  printf("\n");
  assert(memcmp(message.data, decipheredMessage.data, message.size) == 0);
  free(cipheredMessage.data);
  free(decipheredMessage.data);
}

Using \(p=61\) and \(q=53\):

Test Cipher/Decipher block (p=61, q=53) "Hello world!"
message: 72 101 108 108 111 32 119 111 114 108 100 33 
cipher: 24 4 58 8 39 8 39 8 112 5 28 1 193 5 112 5 95 5 39 8 5 5 133 1 
decipher: 72 101 108 108 111 32 119 111 114 108 100 33

And using \(p=11922649\) and \(q=74112287\):

Test Cipher/Decipher block (p=11922649, q=74112287) "Hello world!"
message: 72 101 108 108 111 32 119 111 114 108 100 33 
cipher: 138 97 65 100 14 121 1 115 255 122 205 84 225 2 
decipher: 72 101 108 108 111 32 119 111 114 108 100 33

Great, we surely deserve a cookie and a cup of coffee for going so far! But the story does not end here, far from it...

Padding scheme and Signing

In the present "vanilla" implementation, RSA is a deterministic cipher algorithm (the same input always gives the same output), which makes it actually quite insecure. To correct this, what's called a padding scheme (not the same as the padding in block ciphering) is introduced to add randomness to the ciphered message.

As explained in the previous section, we can cipher several bytes at once up to a certain number depending on the modulus. A padding scheme sacrifices a few of these bytes to insert padding bits (part random, part specified by the scheme) instead of the message data. Of course, the whole message is still encrypted, it may just take a few more blocks. There are several schemes, each in several versions, the rabbit hole keeps getting deeper and deeper and I'm running out of time so I'll give up here. Just as a note for my future self, the currently recommended scheme is OAEP and more details can be found in RFC 8017.

In addition to the padding scheme, there is yet another operation which adds even more security. Even if the whole ciphering model described so far is secure, anyone can still send a message claiming he/she is someone else. The signing mechanism prevents this. The sender hashes the message and ciphers the result with his/her own private key, which gives the signature. Which hash function is used depends on the implementation (I've found references to MD5 and SHA1, both of which are nowadays considered too weak for signatures). The signature is appended to the message, which can then be ciphered with the receiver's public key. The receiver deciphers the received message with his/her private key, removes the signature (whose size is known by specification), deciphers the signature with the sender's public key and checks that it matches the hash of the deciphered message. If it does, the identity of the sender is assured (unless the private key has been compromised of course).
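
The sign/verify part of that exchange can be sketched on toy values as follows. Everything here is an assumption made for illustration: the names (toy_hash, sign_small, verify_small, modpow_small) are hypothetical, \(n\) is assumed smaller than \(2^{32}\), and toy_hash is just the sum of the bytes modulo \(n\), which is in no way a cryptographic hash and only stands in for one.

```c
#include <stdint.h>

// Exponentiation by squaring, valid only while n < 2^32 so that the
// intermediate products fit in a uint64_t.
uint64_t modpow_small(uint64_t base, uint64_t exp, uint64_t n) {
  uint64_t res = 1 % n;
  base %= n;
  while (exp > 0) {
    if (exp & 1) res = (res * base) % n;
    base = (base * base) % n;
    exp >>= 1;
  }
  return res;
}

// Toy digest: sum of the bytes modulo n. NOT a cryptographic hash, it only
// stands in for one in this sketch.
uint64_t toy_hash(char const* msg, uint64_t n) {
  uint64_t h = 0;
  for (; *msg != '\0'; msg += 1) h = (h + (unsigned char)*msg) % n;
  return h;
}

// The sender ciphers the digest with his/her private exponent d.
uint64_t sign_small(char const* msg, uint64_t n, uint64_t d) {
  return modpow_small(toy_hash(msg, n), d, n);
}

// The receiver deciphers the signature with the sender's public exponent e
// and compares it with the digest of the message actually received.
int verify_small(char const* msg, uint64_t sig, uint64_t n, uint64_t e) {
  return modpow_small(sig, e, n) == toy_hash(msg, n);
}
```

With the key \((n,e,d)=(3233,257,173)\), a signature computed for "Hello world!" verifies against "Hello world!" but not against "Hello world?", whose digest is different.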

Conclusion

The implementation introduced here is far from complete, or even useful. Nonetheless it's already capable of ciphering/deciphering messages of any size using the original RSA algorithm. It allowed me to gain a good comprehension of that algorithm, and I hope to revisit it in the future to add a padding scheme, signatures, and refactor everything with an implementation of large integers. I also hope it may help someone else to better understand how that fundamental algorithm works. You can also check this article by Allan, which motivated me to spend time on this algorithm.

All the code in this article is available as one C file here. Compile with gcc -o rsa rsa.c.

2024-10-01