Tuesday, 20 January 2015

Introduction to SSL and Public Key Cryptography

This week at work, my boss asked me to set up SSL for a server that hosts an older version of an internal Seneca web application built around BigBlueButton. I didn't understand much about how SSL worked, so I decided to look into it to better understand what I was actually doing. I knew it had something to do with security, and I remembered all the fuss about the Heartbleed bug last year, but beyond that I knew nothing.

SSL stands for "Secure Sockets Layer". Newer versions of SSL are called "TLS", for "Transport Layer Security", but people still often refer to it as "SSL", so I will too.

SSL is a protocol through which two computers can establish a secure connection to send information back and forth. Once such a connection is established, anyone listening in on it won't be able to understand any of it: all of the messages will look like gibberish. So you could think of it as a protocol that lets two computers invent a language that only they can understand :)

But how does it do that?


Background


Before we can talk about SSL, we need a bit of background. Imagine that there are three algorithms, which we can call the "encryption algorithm", the "decryption algorithm", and the "key-pair generation algorithm". We don't need to know how they are implemented, since each family of these algorithms works differently, and they all use a bunch of crazy math that most of us don't understand. What matters is how these three algorithms are related to each other, which we will talk about next.

The key-pair generation algorithm uses a randomly generated number to output two really big numbers, which we call "key1" and "key2". These two keys are magic: when used with the other two algorithms (the encryption and decryption algorithms), either key can be used to decrypt a message that was encrypted with the other key. If we were to think of these algorithms as functions, they would be prototyped something like this:

/* Takes as input a random number, and outputs a pair of (really big) numbers. */

   keyPair keyGen(randomNumber);


/* Takes as input a plain-text message and a key (generated by the key-pair
 * algorithm), and outputs an encrypted string (called "ciphertext"). */

   cipherText encrypt(message, key);


/* Takes as input ciphertext produced by the encryption algorithm above, and
 * outputs a string. This string will be either more gibberish or the original
 * message used in the encryption algorithm, depending on whether the right key was used. */

   string decrypt(cipherText, key);

The idea is that the encryption algorithm takes a message that we want to encrypt, together with one of the numbers from our key generation algorithm. The decryption algorithm will only be able to recover the original message if the key that it uses is the OTHER key that the key generation algorithm created.

Put another way, key1 can encrypt a message, but it cannot be used to decrypt the resulting ciphertext; for all practical purposes, the only number that can is key2. The opposite holds as well: key2 can be used to encrypt messages, but only key1 will be able to decrypt the resulting ciphertext. When two keys are related like this (each key can decrypt only the messages encrypted with the other), it is called "asymmetric cryptography".
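To make the relationship between key1 and key2 concrete, here is a toy sketch in C. It assumes RSA, one well-known asymmetric algorithm (the post itself doesn't name one), and uses tiny textbook numbers; real keys are hundreds of digits long. The names toy_keygen, toy_encrypt and toy_decrypt are hypothetical stand-ins for the prototypes above, and messages are represented as numbers smaller than n rather than strings.

```c
#include <stdint.h>

/* A toy key pair: both keys share the modulus n; key1 and key2 are exponents. */
typedef struct {
    uint64_t n;
    uint64_t key1;
    uint64_t key2;
} keyPair;

/* (base^exp) mod n by repeated squaring; needs n < 2^32 so products fit in 64 bits. */
static uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t n) {
    uint64_t result = 1;
    base %= n;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % n;
        base = (base * base) % n;
        exp >>= 1;
    }
    return result;
}

/* Returns x with (a * x) % m == 1 (extended Euclidean algorithm); assumes gcd(a, m) == 1. */
static uint64_t mod_inverse(uint64_t a, uint64_t m) {
    int64_t old_r = (int64_t)a, r = (int64_t)m;
    int64_t old_s = 1, s = 0;
    while (r != 0) {
        int64_t q = old_r / r, tmp;
        tmp = old_r - q * r; old_r = r; r = tmp;
        tmp = old_s - q * s; old_s = s; s = tmp;
    }
    old_s %= (int64_t)m;
    return (uint64_t)(old_s < 0 ? old_s + (int64_t)m : old_s);
}

/* Hypothetical toy keyGen: real RSA turns a random number into two huge primes;
 * here the caller passes small primes p, q and the exponent to use as key1. */
keyPair toy_keygen(uint64_t p, uint64_t q, uint64_t key1) {
    keyPair kp;
    kp.n = p * q;
    kp.key1 = key1;
    kp.key2 = mod_inverse(key1, (p - 1) * (q - 1));
    return kp;
}

/* In RSA, encrypting and decrypting are the same operation (raise to the key's
 * power, mod n); only the key you use differs. */
uint64_t toy_encrypt(uint64_t message, uint64_t key, uint64_t n) {
    return mod_pow(message, key, n);
}

uint64_t toy_decrypt(uint64_t cipherText, uint64_t key, uint64_t n) {
    return mod_pow(cipherText, key, n);
}
```

With p = 61, q = 53 and key1 = 17, toy_keygen gives n = 3233 and key2 = 2753. Encrypting the "message" 65 with key1 gives 2790, and decrypting 2790 with key2 gives back 65. It works in the other direction too: encrypt with key2, decrypt with key1, and 65 comes back.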


Public Key Cryptography


Whenever two computers want to communicate securely, they typically use a secure protocol (think of HTTP vs HTTPS). But how would this work? We can apply what we learned above to see how! 

Let us designate one of those computers as the client and the other as the server, though the same would apply in a peer-to-peer arrangement. Both client and server generate a key-pair using the keyGen algorithm above. (The server usually generates its key-pair once and keeps it for a long time, because, as we'll see when we talk about certificate authorities, its public key has to be vouched for by a trusted third party. A client, by contrast, can simply generate a fresh key-pair whenever it needs one.) Each computer designates one of its keys as its "public key" and the other as its "private key". The private key is stored on the local computer and should never be given to anyone under any circumstances. The public key, on the other hand, can be given to anyone who asks for it; there is no danger in doing so.

Continuing with the example, the two computers would exchange public keys so that now, secure communication can take place. Whenever the client wants to send a secure message to the server, it would encrypt the message with the server's public key. Then, since only the server knows its own private key, it would be the only computer that would be capable of decrypting the message. Likewise, when the server wants to send a message to the client, the server encrypts the message using the client's public key, so that only the client could decrypt the message with the client's private key. 

Great! Well... not really... This works in theory, but there are two major problems.

First, how do two computers initially exchange public keys? The request for the server's public key is necessarily insecure (since the client doesn't yet know the server's public key). A malicious computer could intercept this request and pretend to be the server by replying to the client with the malicious computer's own public key, and likewise pretend to be the client by handing that same malicious public key to the server. The malicious computer would then sit between the client and the server: it decrypts every message with its "malicious private key", reads it, re-encrypts it with the real recipient's public key, and forwards it to the destination, without either the client or the server ever knowing. This is called a "man-in-the-middle" attack.
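Here is a toy C simulation of that attack. It assumes RSA as the public-key algorithm and uses tiny textbook key pairs; the key values and the mitm_relay function are hypothetical illustrations. The client believes it received server.com's public key, but it actually got the attacker's, so the attacker can read the message and still deliver it intact.

```c
#include <stdint.h>

typedef struct { uint64_t n, e; } publicKey;   /* safe to hand out */
typedef struct { uint64_t n, d; } privateKey;  /* never leaves its owner */

/* (base^exp) mod n by repeated squaring; needs n < 2^32 so products fit in 64 bits. */
static uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t n) {
    uint64_t result = 1;
    base %= n;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % n;
        base = (base * base) % n;
        exp >>= 1;
    }
    return result;
}

static uint64_t encrypt_for(publicKey k, uint64_t m)   { return mod_pow(m, k.e, k.n); }
static uint64_t decrypt_with(privateKey k, uint64_t c) { return mod_pow(c, k.d, k.n); }

/* Hypothetical toy keys: 3233 = 61 * 53 and 2773 = 47 * 59. */
static const publicKey  SERVER_PUBLIC    = {3233, 17};
static const privateKey SERVER_PRIVATE   = {3233, 2753};
static const publicKey  ATTACKER_PUBLIC  = {2773, 17};
static const privateKey ATTACKER_PRIVATE = {2773, 157};

typedef struct {
    uint64_t attacker_read;    /* what the man in the middle saw */
    uint64_t server_received;  /* what the server finally decrypted */
} relay_result;

/* Sends one message (a number below 2773) from the client to the server
 * while the attacker sits in the middle. */
relay_result mitm_relay(uint64_t message) {
    relay_result r;

    /* The client asked for the server's key, but the attacker answered. */
    publicKey key_client_received = ATTACKER_PUBLIC;
    uint64_t from_client = encrypt_for(key_client_received, message);

    /* The attacker decrypts and reads the message... */
    r.attacker_read = decrypt_with(ATTACKER_PRIVATE, from_client);
    /* ...then re-encrypts it with the REAL server key and forwards it. */
    uint64_t to_server = encrypt_for(SERVER_PUBLIC, r.attacker_read);

    /* The server decrypts normally; nothing looks wrong from its side. */
    r.server_received = decrypt_with(SERVER_PRIVATE, to_server);
    return r;
}
```

Calling mitm_relay(65) reports that the attacker read 65 and that the server also received 65, which is exactly why the attack is so hard to notice.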

The second problem is cost. Public-key encryption algorithms are insanely strong: it would take a gazillion years to figure out a computer's private key knowing only its public key and some ciphertext. But that strength relies on arithmetic with huge numbers (hundreds of digits long), which is far slower than other kinds of encryption. That makes it impractical to encrypt every single message exchanged between client and server this way, as it would greatly reduce performance.
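For a rough sense of that cost, here is a small C helper that counts the multiplications performed by square-and-multiply, the usual way of computing the huge powers that RSA (used here as an assumed example algorithm) relies on. The name modexp_multiplications is hypothetical, and the count ignores real-world optimizations.

```c
#include <stdint.h>

/* Counts the multiplications that left-to-right square-and-multiply performs
 * when computing base^exponent: one squaring for every bit after the leading 1,
 * plus one extra multiplication for every 1 bit after the leading one. */
unsigned modexp_multiplications(uint64_t exponent) {
    unsigned bits = 0, ones = 0;
    while (exponent > 0) {
        bits++;
        ones += (unsigned)(exponent & 1u);
        exponent >>= 1;
    }
    if (bits == 0)
        return 0;  /* base^0 = 1, nothing to multiply */
    return (bits - 1) + (ones - 1);
}
```

For the toy exponent 2753 (binary 101011000001) it returns 15. For a 2048-bit exponent with roughly half of its bits set, the same formula gives about 2047 + 1023 = 3070 multiplications, and each of those multiplies 2048-bit numbers rather than ordinary machine integers. That work would be repeated for every block of data sent.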


Certificate Authorities


The solution to the man-in-the-middle problem mentioned above is to use trusted organizations called "certificate authorities" to vouch that the public key you receive really does belong to who you think it does. A server's public key is delivered inside a "certificate", which a certificate authority has digitally signed with its own private key after confirming who owns the site. In our man-in-the-middle example, once our client receives the server's certificate, it checks that signature using the certificate authority's public key. If our message was intercepted and we were given back a phony public key by a malicious agent, the signature wouldn't match, and the client would know something was wrong. If you've ever seen "site is not secure" or "Your connection is not private" messages, that is because the site's certificate could not be verified, for example because it wasn't signed by a certificate authority your browser trusts, or because it was issued for a different site name. This could mean that someone is trying to hack you. But often, it simply means that the site owners didn't want to pay a certificate authority to verify who they are, yet still wanted secure connections to their site, so they "self-signed" their certificate instead. This sort of defeats the purpose of using SSL, but it is perhaps better than no security at all.
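One common way to picture this check is as a digital signature. In RSA terms (assumed here, with small textbook-sized numbers), the certificate authority "encrypts" a fingerprint of the site's name and public key with its private key, and anyone holding the authority's public key can "decrypt" it and compare. The sketch below uses a deliberately simple checksum in place of a real cryptographic hash such as SHA-256, and all names (ca_issue, browser_verify, swap_key) and key values are hypothetical.

```c
#include <stdint.h>

/* (base^exp) mod n by repeated squaring; needs n < 2^32 so products fit in 64 bits. */
static uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t n) {
    uint64_t result = 1;
    base %= n;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % n;
        base = (base * base) % n;
        exp >>= 1;
    }
    return result;
}

/* Hypothetical toy CA key pair: CA_N = 1009 * 1013. A browser would ship with
 * the public half (CA_N, CA_E); only the certificate authority knows CA_D. */
#define CA_N 1022117u
#define CA_E 17u
#define CA_D 180017u

typedef struct {
    const char *domain;  /* the site this certificate is for */
    uint64_t key_n;      /* that site's public key */
    uint64_t key_e;
    uint64_t signature;  /* CA's signature over the three fields above */
} certificate;

/* Toy fingerprint of the certificate's contents. This checksum is trivially
 * forgeable and only stands in for a real cryptographic hash. */
static uint64_t fingerprint(certificate c) {
    uint64_t sum = 0;
    for (const char *p = c.domain; *p != '\0'; p++)
        sum += (unsigned char)*p;
    return (sum * 4099 + c.key_n * 31 + c.key_e) % CA_N;
}

/* The CA "encrypts" the fingerprint with its PRIVATE key. */
certificate ca_issue(const char *domain, uint64_t key_n, uint64_t key_e) {
    certificate c = {domain, key_n, key_e, 0};
    c.signature = mod_pow(fingerprint(c), CA_D, CA_N);
    return c;
}

/* The browser "decrypts" the signature with the CA's PUBLIC key and compares. */
int browser_verify(certificate c) {
    return mod_pow(c.signature, CA_E, CA_N) == fingerprint(c);
}

/* What a man in the middle might try: keep the CA's signature but swap in its own key. */
certificate swap_key(certificate c, uint64_t key_n, uint64_t key_e) {
    c.key_n = key_n;
    c.key_e = key_e;
    return c;
}
```

A certificate issued for "server.com" with key (3233, 17) passes browser_verify, but the same certificate with an attacker's key (2773, 17) swapped in fails, because the fingerprint no longer matches what the authority signed.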

Browsers come with a built-in list of certificate authorities and their public keys, so they can check a certificate's signature locally, without first having to fetch the authority's public key over an insecure connection. For example, on Chrome (version 39), to view the recognized certificate authorities, I can select:

        1. => "Settings"
        2. => "Show advanced settings"
        3. => "HTTPS/SSL"
        4. => "Manage Certificates"
        5. => "Authorities"


Session Keys


The solution to the second problem (that public-key encryption and decryption are CPU-intensive) is to avoid using the public keys to encrypt every message. The public keys are only used to set up a "handshake" and to verify the identities of the computers communicating.

Basically, once the server and client trust each other, they agree on a "session key": a number used with a different, much less expensive encryption algorithm that uses the same key both to encrypt and to decrypt messages. (This is called "symmetric" encryption, in contrast to the asymmetric kind above.) The session key is agreed upon using public-key encryption, but once both sides have it, the less expensive algorithm is used instead.
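As a toy illustration of the symmetric idea, here is a C sketch in which the same function and the same session key both encrypt and decrypt. It XORs each byte with a stream of pseudo-random bytes generated from the key. This is NOT secure; it merely stands in for a real symmetric cipher such as AES, and toy_session_cipher and round_trip_ok are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* xorshift64: a simple pseudo-random generator (NOT cryptographically secure). */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Toy symmetric cipher: XORs each byte with a keystream derived from the session
 * key. Since (b ^ k) ^ k == b, running it twice with the same key restores the
 * input, so this one function both encrypts and decrypts. */
void toy_session_cipher(unsigned char *buf, size_t len, uint64_t session_key) {
    uint64_t state = session_key ? session_key : 1;  /* xorshift state must be nonzero */
    for (size_t i = 0; i < len; i++)
        buf[i] ^= (unsigned char)xorshift64(&state);
}

/* Encrypts a copy of msg, decrypts it again with the same key, and reports
 * whether the original came back. */
int round_trip_ok(const char *msg, uint64_t session_key) {
    unsigned char buf[256];
    size_t len = strlen(msg);
    if (len > sizeof buf)
        return 0;
    memcpy(buf, msg, len);
    toy_session_cipher(buf, len, session_key);  /* encrypt */
    toy_session_cipher(buf, len, session_key);  /* decrypt with the SAME key */
    return memcmp(buf, msg, len) == 0;
}
```

The design point is the symmetry: both sides only need to share one number, and each message costs a handful of cheap operations per byte instead of big-number arithmetic.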

As the name suggests, the session key only lasts for one session. Symmetric algorithms are much faster than public-key ones, and with a properly sized key they are not meaningfully easier to crack. And because every session gets a fresh key, even if one session key were somehow compromised, only that one session's messages would be exposed.


SSL


Now we can finally get some understanding of how SSL would work. Let us suppose that a web browser, call it simply "myBrowser", wants to connect to a web server, call it "server.com", using HTTPS. Something like this would happen:

1. myBrowser examines the URL of the request. Since the request uses HTTPS, myBrowser connects to server.com (usually on port 443, since port 80 is conventionally for plain old HTTP) and asks for server.com's public key, sending along a random number and the list of encryption algorithms it supports. This message is unencrypted.

2. server.com receives the request and responds with its certificate, which contains its public key, along with a random number of its own and the encryption algorithms it has chosen. This message is unencrypted.

3. myBrowser receives the certificate and checks that it was signed by a certificate authority it trusts, using that authority's public key from its built-in list. If myBrowser cannot verify server.com's certificate, it warns the user.

4. If server.com's certificate checks out, myBrowser generates a big random number, called the "pre-master secret".

5. myBrowser sends an "OK, let's use SSL!" message to server.com containing the pre-master secret. This message is encrypted using server.com's public key, so only server.com can read it.

6. server.com decrypts the message using its private key. Now both sides know the pre-master secret.

7. myBrowser and server.com each do the same bunch of math on the pre-master secret (together with random numbers that the two sides exchanged in their first messages) to create the "master secret", and from it the "session key". Since both start from the same numbers, they end up with the same session key, without the session key itself ever being sent over the network.

8. Both sides send a final "finished" message encrypted with the new session key, to confirm that everything worked.

9. Now, myBrowser and server.com send messages back and forth, encrypting and decrypting them with the less expensive "session key" encryption algorithm instead of with public and private keys.
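The key trick in this handshake is that a secret number only ever travels encrypted under server.com's public key, and a session key is then derived from it with the same math on both sides. Below is a toy C simulation of that idea. It assumes RSA with tiny textbook numbers for server.com's key pair; derive_session_key is a hypothetical stand-in (64-bit FNV-1a hashing) for the real key-derivation function, and the random values are arbitrary.

```c
#include <stdint.h>

/* (base^exp) mod n by repeated squaring; needs n < 2^32 so products fit in 64 bits. */
static uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t n) {
    uint64_t result = 1;
    base %= n;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % n;
        base = (base * base) % n;
        exp >>= 1;
    }
    return result;
}

/* Hypothetical toy key pair for server.com (3233 = 61 * 53). */
#define SERVER_N 3233u
#define SERVER_E 17u    /* public: sent to myBrowser in the certificate */
#define SERVER_D 2753u  /* private: never leaves server.com */

/* Hypothetical key derivation: mixes the shared secret with both random
 * values using 64-bit FNV-1a. Same inputs always give the same key. */
static uint64_t derive_session_key(uint64_t secret, uint64_t client_random,
                                   uint64_t server_random) {
    uint64_t inputs[3] = {secret, client_random, server_random};
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a offset basis */
    for (int i = 0; i < 3; i++) {
        for (int byte = 0; byte < 8; byte++) {
            h ^= (inputs[i] >> (8 * byte)) & 0xFFu;
            h *= 1099511628211ULL;         /* FNV-1a prime */
        }
    }
    return h;
}

typedef struct {
    uint64_t on_the_wire;  /* the only form in which the secret is ever sent */
    uint64_t browser_key;
    uint64_t server_key;
} handshake_result;

/* secret must be smaller than SERVER_N in this toy version. */
handshake_result simulate_handshake(uint64_t secret, uint64_t client_random,
                                    uint64_t server_random) {
    handshake_result r;
    /* myBrowser encrypts its secret with server.com's public key. */
    r.on_the_wire = mod_pow(secret, SERVER_E, SERVER_N);
    /* server.com recovers the secret with its private key. */
    uint64_t recovered = mod_pow(r.on_the_wire, SERVER_D, SERVER_N);
    /* Both sides run the same derivation on the same inputs. */
    r.browser_key = derive_session_key(secret, client_random, server_random);
    r.server_key = derive_session_key(recovered, client_random, server_random);
    return r;
}
```

For example, simulate_handshake(65, 1111, 2222) puts 2790 on the wire, and browser_key and server_key come out identical. With full-size keys, an eavesdropper who sees 2790 and the two random values still can't compute the session key without server.com's private key.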


Conclusion


It was a lot of fun learning the basics of how SSL works! As a bonus, along the way I picked up a bunch of cool and fancy sounding terms like "symmetric session key" to help me sound smart :P

What was really clever was how two computers can establish a secure connection in the first place: if it is done properly, there is no point at which someone can intercept a message and do any harm. All of the data that a malicious computer can intercept and actually understand is public anyway, and everything secret or sensitive is encrypted in a way that is practically impossible to crack. Cool, huh?