Adam's Blog<br />
Adam Sharpe<br />
<br />
Introduction to SSL and Public Key Cryptography<br />
Published 2015-01-20<br />
<br />
This week for work, I was asked by my boss to set up SSL for a server that hosts an older version of an internal Seneca web application built around BigBlueButton. I didn't understand very much about how SSL worked, so I decided to look into it a little bit to better understand what I was actually doing. I knew it had something to do with security, and I remember all that fuss about the Heartbleed bug last year, but beyond that I knew nothing.<br />
<div>
<br />
SSL stands for "Secure Sockets Layer". Newer versions of SSL are called "TLS", for "Transport Layer Security", but people still often refer to it as "SSL", so I will too.<br />
<br />
SSL is a protocol through which two computers can establish a secure connection to send information back and forth. Once such a connection is established, anyone listening in on it should not be able to understand anything: all of the messages will look like gibberish. So, I guess you could think of it like a protocol that lets two computers invent a language that only they can understand :)<br />
<br />
But how does it do that?<br />
<br />
<br /></div>
<div>
<h3>
Background</h3>
</div>
<div>
<br /></div>
<div>
Before we can talk about SSL, we need a bit of background information. We can imagine that there are three algorithms which we can call the "encryption algorithm", the "decryption algorithm", and the "key-pair generation algorithm". We don't need to know how they are implemented, as each set of algorithms does it differently. Also, they all use a bunch of crazy math that most of us don't understand. What matters is how these three algorithms are related to each other, which we will talk about next.</div>
<div>
<br /></div>
<div>
The key-pair generation algorithm uses a randomly generated number to output two really big numbers, which we call "key1" and "key2". These two keys are magic, because when used with the other two algorithms (the encryption and decryption algorithms), either key can be used to decrypt a message that was encrypted with the other key. (RSA is one well-known scheme that works this way.) If we were to think of these algorithms as functions, they would be prototyped something like this:</div>
<div>
<!-- HTML generated using hilite.me --><br />
<div style="background: #202020; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"><span style="color: #d0d0d0;">/* Takes as input a random number, and outputs a pair of (really big) numbers. */</span>
<span style="color: #d0d0d0;"> keyPair keyGen(randomNumber);</span>
<span style="color: #d0d0d0;">/* Takes as input, a plain-text message, and a key (generated from the key-pair</span>
<span style="color: #d0d0d0;"> * algorithm) and outputs an encrypted string (called "ciphertext"). */</span>
<span style="color: #d0d0d0;"> cipherText encrypt(message, key);</span>
<span style="color: #d0d0d0;">/* Takes as input, ciphertext, encrypted by the encryption algorithm above, and</span>
<span style="color: #d0d0d0;"> * outputs a string. This string will either be more gibberish, or the original message</span>
<span style="color: #d0d0d0;"> * used in the encryption algorithm, depending on whether the right key was used. */</span>
<span style="color: #d0d0d0;"> string decrypt(cipherText, key);</span>
</pre>
</td></tr>
</tbody></table>
</div>
</div>
<div>
<br />
The idea is that the encryption algorithm takes a message that we want to encrypt, together with one of the numbers from our key generation algorithm. The decryption algorithm will only be able to recover the original message if the key that it uses is the OTHER key that the key generation algorithm created.</div>
<div>
<br /></div>
<div>
Put another way, key1 can encrypt a message, but it cannot be used to decrypt the resulting ciphertext, and neither can any other number except one: key2. Only key2 can decrypt a message encrypted with key1. The opposite holds as well: key2 can be used to encrypt messages, but only key1 will be able to decrypt the resulting ciphertext. Cryptography built on a pair of keys like this, where each key decrypts only what the other one encrypted, is called "asymmetric cryptography". </div>
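<div>
<br /></div>
<div>
To make the key1/key2 relationship concrete, here is a minimal sketch in C using textbook RSA with absurdly small numbers. Everything in it is my own illustration: the names ToyKey, KEY1, KEY2, mod_pow, toy_encrypt and toy_decrypt are made up, and the pair (17, 2753) with modulus 3233 is a classroom-sized example, not something a real key generator would output. Real keys are hundreds of digits long, and real implementations add padding on top of this.</div>

```c
#include <assert.h>
#include <stdint.h>

/* A toy key: an exponent plus the modulus shared by both keys of a pair. */
typedef struct {
    uint64_t exponent;
    uint64_t modulus;
} ToyKey;

/* Hypothetical pair built from the primes 61 and 53 (61 * 53 = 3233).
 * 17 * 2753 = 46801 = 15 * 3120 + 1, where 3120 = 60 * 52. */
static const ToyKey KEY1 = {17, 3233};
static const ToyKey KEY2 = {2753, 3233};

/* Computes (base ^ exponent) mod modulus by repeated squaring. */
uint64_t mod_pow(uint64_t base, uint64_t exponent, uint64_t modulus)
{
    uint64_t result = 1;
    base %= modulus;
    while (exponent > 0) {
        if (exponent & 1)
            result = (result * base) % modulus;
        base = (base * base) % modulus;
        exponent >>= 1;
    }
    return result;
}

/* In textbook RSA, encrypting and decrypting are the same operation;
 * only the key differs. The message must be smaller than the modulus. */
uint64_t toy_encrypt(uint64_t message, ToyKey key)
{
    assert(message < key.modulus);
    return mod_pow(message, key.exponent, key.modulus);
}

uint64_t toy_decrypt(uint64_t cipher_text, ToyKey key)
{
    return mod_pow(cipher_text, key.exponent, key.modulus);
}
```

<div>
With these definitions, toy_decrypt(toy_encrypt(65, KEY1), KEY2) gives back 65, and so does the swapped order (encrypt with KEY2, decrypt with KEY1). Decrypting with the same key that did the encrypting does not give back 65.</div>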
<div>
<br />
<br /></div>
<div>
<h3>
Public Key Cryptography</h3>
</div>
<div>
<br /></div>
<div>
Whenever two computers want to communicate securely, they typically use a secure protocol (think of HTTP vs HTTPS). But how would this work? We can apply what we learned above to see how! </div>
<div>
<br /></div>
<div>
Let us designate one of those computers as a client, and one as a server, though the same would apply were it a peer-to-peer arrangement. Both client and server generate a pair of keys using the keyGen algorithm above. (The server would usually only generate the key-pair once, whereas the client would generate a new key-pair for every server it connects to, and for every session. We'll see why in a second, when we talk about certificate authorities.) Each computer designates one of those keys as its "public key", and the other as its "private key". The private key is stored on the local computer, and should not be given out to anyone under any circumstances. The public key, on the other hand, can be given to anyone who asks for it; there is no danger in doing so.</div>
<div>
<br /></div>
<div>
Continuing with the example, the two computers would exchange public keys so that now, secure communication can take place. Whenever the client wants to send a secure message to the server, it would encrypt the message with the server's public key. Then, since only the server knows its own private key, it would be the only computer that would be capable of decrypting the message. Likewise, when the server wants to send a message to the client, the server encrypts the message using the client's public key, so that only the client could decrypt the message with the client's private key. </div>
<div>
<br /></div>
<div>
Great! Well... not really... This works in theory, but there are two major problems.</div>
<div>
<br /></div>
<div>
First, how do two computers initially exchange public keys? The request for the server's public key is necessarily insecure (since the client doesn't yet know the server's public key). A malicious computer could intercept this message, pretend to be the server by replying to the client with the malicious computer's own public key, and forward the request to the server with that same malicious public key in place of the client's. Then the malicious computer would act like a middle agent between the client and the server: decrypting every message with its "malicious private key", reading it, re-encrypting it with the real public key of the intended recipient, and forwarding it on without either client or server ever knowing. This is called a "man in the middle" attack.</div>
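<div>
<br /></div>
<div>
Here is a rough sketch in C of one direction of that attack, again with toy textbook-RSA numbers. All of the names (ToyKey, SERVER_PUBLIC, MALLORY_PUBLIC, client_send, mallory_relay, server_receive) and both key pairs are hypothetical and far too small to be secure; the point is only to show that the client has no way of noticing that it encrypted for the wrong key.</div>

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t exponent;
    uint64_t modulus;
} ToyKey;

/* Hypothetical toy key pairs: server uses 61 * 53, Mallory uses 11 * 13. */
static const ToyKey SERVER_PUBLIC = {17, 3233};
static const ToyKey SERVER_PRIVATE = {2753, 3233};
static const ToyKey MALLORY_PUBLIC = {7, 143};
static const ToyKey MALLORY_PRIVATE = {103, 143};

/* Computes (base ^ exponent) mod modulus by repeated squaring. */
uint64_t mod_pow(uint64_t base, uint64_t exponent, uint64_t modulus)
{
    uint64_t result = 1;
    base %= modulus;
    while (exponent > 0) {
        if (exponent & 1)
            result = (result * base) % modulus;
        base = (base * base) % modulus;
        exponent >>= 1;
    }
    return result;
}

/* Encrypts or decrypts, depending on which key of a pair is supplied. */
uint64_t apply_key(uint64_t value, ToyKey key)
{
    assert(value < key.modulus);
    return mod_pow(value, key.exponent, key.modulus);
}

/* The client encrypts with whatever key it BELIEVES belongs to the server. */
uint64_t client_send(uint64_t message, ToyKey believed_server_key)
{
    return apply_key(message, believed_server_key);
}

/* Mallory decrypts with her own private key, keeps a copy of the plaintext,
 * then re-encrypts with the server's real public key and forwards it. */
uint64_t mallory_relay(uint64_t intercepted, uint64_t *stolen_plaintext)
{
    *stolen_plaintext = apply_key(intercepted, MALLORY_PRIVATE);
    return apply_key(*stolen_plaintext, SERVER_PUBLIC);
}

/* The server decrypts normally and sees nothing unusual. */
uint64_t server_receive(uint64_t cipher_text)
{
    return apply_key(cipher_text, SERVER_PRIVATE);
}
```

<div>
If the client was tricked into calling client_send(42, MALLORY_PUBLIC), then mallory_relay recovers 42 for herself, and server_receive on the forwarded value also yields 42, so neither end sees anything wrong.</div>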
<div>
<br /></div>
<div>
The second problem is that public key encryption algorithms are insanely strong, and that strength comes at a price. It would take a gazillion years to figure out what a computer's private key is, knowing only the ciphertext. Why is this a problem? Because the math that makes it so strong is also computationally expensive, so it is not feasible to encrypt every single message exchanged between client and server this way; doing so would greatly reduce performance.</div>
<div>
<br />
<br /></div>
<div>
<h3>
Certificate Authorities</h3>
</div>
<div>
<br /></div>
<div>
The solution to the man-in-the-middle problem mentioned above is to use trusted organizations called "certificate authorities" to verify that the public key you receive back really does belong to who you think it does. The server doesn't send its public key on its own; it sends it inside a "certificate" that a certificate authority has digitally signed. In our man-in-the-middle example above, once our client receives the server's certificate, it checks the certificate authority's signature on it. If our message was intercepted, and we were given back a phony public key (by a malicious agent), the signature check would fail. If you've ever seen those "site is not secure" or "Your connection is not private" messages, that is because the site's certificate could not be traced back to a certificate authority that your browser trusts. This could mean that someone is trying to hack you. But often, it is simply because the site owners didn't want to pay a certificate authority to verify who they are, but still wanted to have encrypted connections to their site, so they "self-signed" their certificate instead. This sort of defeats the purpose of using SSL, since nobody vouches for the key, but it is perhaps better than no security at all.</div>
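<div>
<br /></div>
<div>
A hedged sketch in C of what "signed by a certificate authority" could look like with the toy numbers from before. The names CA_PUBLIC, CA_PRIVATE, toy_digest, ca_sign and certificate_is_valid are all invented for this example, and toy_digest is a stand-in for a real cryptographic hash. The idea it shows: the authority encrypts a digest of the server's key with the authority's private key, and anyone holding the authority's public key can check it.</div>

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t exponent;
    uint64_t modulus;
} ToyKey;

/* Hypothetical toy key pair for the certificate authority (61 * 53). */
static const ToyKey CA_PUBLIC = {17, 3233};
static const ToyKey CA_PRIVATE = {2753, 3233};

/* Computes (base ^ exponent) mod modulus by repeated squaring. */
uint64_t mod_pow(uint64_t base, uint64_t exponent, uint64_t modulus)
{
    uint64_t result = 1;
    base %= modulus;
    while (exponent > 0) {
        if (exponent & 1)
            result = (result * base) % modulus;
        base = (base * base) % modulus;
        exponent >>= 1;
    }
    return result;
}

/* Toy stand-in for a cryptographic hash of a public key.
 * The result is always below 1000, so it fits under the CA modulus. */
uint64_t toy_digest(ToyKey key)
{
    return (key.exponent * 31 + key.modulus) % 1000;
}

/* Done once by the authority: sign the digest with its PRIVATE key. */
uint64_t ca_sign(ToyKey subject_key)
{
    assert(toy_digest(subject_key) < CA_PRIVATE.modulus);
    return mod_pow(toy_digest(subject_key), CA_PRIVATE.exponent, CA_PRIVATE.modulus);
}

/* Done by the browser: undo the signature with the authority's PUBLIC key
 * and compare the result with the digest of the key that was presented. */
int certificate_is_valid(ToyKey presented_key, uint64_t signature)
{
    uint64_t recovered = mod_pow(signature, CA_PUBLIC.exponent, CA_PUBLIC.modulus);
    return recovered == toy_digest(presented_key);
}
```

<div>
If the authority signed the server key {7, 143}, then certificate_is_valid accepts that key with that signature, but rejects a different key such as {3, 187} presented with the same signature.</div>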
<div>
<br /></div>
<div>
Browsers come built-in with a list of certificate authorities and their public keys, so that the browser can check a certificate authority's signature without having to trust anything it receives over the network. For example, on Chrome (version 39), to view the recognized certificate authorities, I can select:</div>
<div>
<br /></div>
<div>
1. => "Settings"<br />
2. => "Show advanced settings"<br />
3. => "HTTPS/SSL"<br />
4. => "Manage Certificates"<br />
5. => "Authorities"<br />
<br />
<br /></div>
<div>
<h3>
Session Keys</h3>
<br />
The solution to the second problem (that encrypting and decrypting is CPU intensive) is to not use the public keys to encrypt every message. The public keys are just used to set up a "handshake", and to verify the identities of the computers communicating.<br />
<br />
Basically, once the server and client trust each other, they generate a "session key", which is a number to be used with a different, less expensive encryption algorithm to both encrypt and decrypt messages. The process of agreeing on this session key is done through public key encryption, but once agreed upon, this less expensive encryption technique is used instead.</div>
<div>
<br /></div>
<div>
As the name suggests, the session key only lasts for a short time. The cheaper algorithm is not necessarily easier to crack, but even if an attacker did manage to recover a session key, it would only unlock that one session. By the time anyone would be able to crack it, the session key would have expired.</div>
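<div>
<br /></div>
<div>
To show what "one key both encrypts and decrypts" means, here is a deliberately silly symmetric cipher in C. The function name xor_cipher and the key bytes are my own invention, and repeating-key XOR is NOT what SSL uses and is trivially breakable; it just shares the one property that matters here, namely that the same key and the same cheap operation work in both directions.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy symmetric cipher: XOR every byte with a byte of the repeating
 * session key. Running it a second time with the same key undoes it. */
void xor_cipher(uint8_t *data, size_t length,
                const uint8_t *session_key, size_t key_length)
{
    assert(key_length > 0);
    for (size_t i = 0; i < length; i++)
        data[i] ^= session_key[i % key_length];
}
```

<div>
Calling xor_cipher once on a buffer scrambles it, and calling it again with the same session key restores the original bytes. There is no second key involved, which is why both sides have to agree on the session key in secret first.</div>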
<div>
<br />
<br /></div>
<div>
<h3>
SSL</h3>
</div>
<div>
<br /></div>
<div>
Now we can finally get some understanding of how SSL would work. Let us suppose that a web browser, call it simply "myBrowser", wants to connect to a web server, call it "server.com", using HTTPS. Something like this would happen:</div>
<div>
<br /></div>
<div>
1. myBrowser examines the URL of the request. Since the request is using HTTPS, myBrowser will issue a request (usually to port 443, since port 80 is conventionally for plain old HTTP) for server.com's public key (and some other SSL stuff). This message is unencrypted.</div>
<div>
<br /></div>
<div>
2. server.com will receive the request, and respond with its public key (certificate, and some other SSL information). This message is unencrypted. </div>
<div>
<br /></div>
<div>
3. myBrowser will receive this public key, and verify it by checking the certificate authority's signature on the certificate, using the certificate authority's public key that ships with the browser. If myBrowser cannot verify the public key from server.com, then myBrowser warns the user.</div>
<div>
<br /></div>
<div>
4. If server.com's public key is verified by a certificate authority, then myBrowser will generate a key-pair, and use one key as its private key. myBrowser will also generate a big random number.</div>
<div>
<br /></div>
<div>
5. myBrowser sends an "OK, let's use SSL!" message to server.com. In this message, it sends myBrowser's public key, and the big random number it just generated. This message will be encrypted using server.com's public key.</div>
<div>
<br /></div>
<div>
6. server.com will decrypt the message using its private key. It uses the big random number sent by myBrowser, does a bunch of math on it, and creates another big random number, called a "master secret". It then creates a "session key" from this master secret, that it will use to encrypt and decrypt all messages exchanged with myBrowser.</div>
<div>
<br /></div>
<div>
7. server.com sends this master secret back to myBrowser. This message will be encrypted using myBrowser's public key.</div>
<div>
<br /></div>
<div>
8. myBrowser will decrypt the message using its private key. myBrowser performs the same math on the master secret that server.com did, to generate the same session key that server.com did.</div>
<div>
<br /></div>
<div>
9. Now, myBrowser and server.com send messages back and forth both encrypting and decrypting their messages with the less expensive "session key encryption algorithms", instead of public keys and private keys.</div>
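<div>
<br /></div>
<div>
The key-agreement part of those steps can be sketched in C with the same toy numbers as before. This is a simplification of my own, not the real protocol: here both sides derive the session key directly from the number myBrowser picked, rather than the server sending a master secret back. The names browser_wrap_secret, server_unwrap_secret and derive_session_key are hypothetical, and derive_session_key is just arbitrary mixing standing in for the "bunch of math".</div>

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t exponent;
    uint64_t modulus;
} ToyKey;

/* Hypothetical toy key pair for server.com (61 * 53 = 3233). */
static const ToyKey SERVER_PUBLIC = {17, 3233};
static const ToyKey SERVER_PRIVATE = {2753, 3233};

/* Computes (base ^ exponent) mod modulus by repeated squaring. */
uint64_t mod_pow(uint64_t base, uint64_t exponent, uint64_t modulus)
{
    uint64_t result = 1;
    base %= modulus;
    while (exponent > 0) {
        if (exponent & 1)
            result = (result * base) % modulus;
        base = (base * base) % modulus;
        exponent >>= 1;
    }
    return result;
}

/* Browser side: encrypt the random number with the server's public key. */
uint64_t browser_wrap_secret(uint64_t random_number)
{
    assert(random_number < SERVER_PUBLIC.modulus);
    return mod_pow(random_number, SERVER_PUBLIC.exponent, SERVER_PUBLIC.modulus);
}

/* Server side: recover the random number with the server's private key. */
uint64_t server_unwrap_secret(uint64_t wrapped)
{
    return mod_pow(wrapped, SERVER_PRIVATE.exponent, SERVER_PRIVATE.modulus);
}

/* Both sides: turn the shared number into a session key. The mixing
 * constants are arbitrary; any function works if both sides use the same one. */
uint32_t derive_session_key(uint64_t shared_secret)
{
    return (uint32_t)(shared_secret * 2654435761u) ^ 0x5bd1e995u;
}
```

<div>
If myBrowser picks 1234, the value that travels over the wire is browser_wrap_secret(1234), which is not 1234. After server_unwrap_secret, both sides hold 1234 and derive_session_key gives them the same session key without that key ever being sent.</div>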
<div>
<div>
<br />
<br class="Apple-interchange-newline" /></div>
<div>
<h3>
Conclusion</h3>
</div>
</div>
<div>
<br /></div>
<div>
It was a lot of fun learning the basics of how SSL works! As a bonus, along the way I picked up a bunch of cool and fancy sounding terms like "symmetric session key" to help me sound smart :P</div>
<div>
<br /></div>
<div>
What was really clever, was how two computers can establish a secure connection in the first place, and that if it is done properly, then there is no point at which someone can intercept a message and do any harm. All of the data that a malicious computer can intercept and actually understand is public anyway, and all the stuff that is secret or sensitive cannot be cracked. Cool, huh?</div>
<div>
<br /></div>
Assembly Generated from Function Calls on x86-64<br />
Published 2014-09-22<br />
<br />
Two weeks ago in SPO600 we were given a task: compile a hello world C program, look at the Assembly code that gets generated then modify the code in small ways and notice how the Assembly code changes.<br />
<br />
A second and separate task we were given was to learn about some feature of Assembly, teach it to the other students in the class in the form of a short presentation, and blog about what we discovered. I chose to investigate what happens when a function gets called in C, in x86-64 Assembly, and in particular what happens to the arguments passed into the function.<br />
<br />
These two tasks are two separate labs, but since they are similar (and I'm lazy :P), I will combine them into a single blog entry.<br />
<br />
WARNING/DISCLAIMER! What follows is a combination of personal research from reading materials I found on the web, trial and error with compiling and objdump-ing, and at times, wild speculation based on what I'm observing. Don't trust anything I say as authoritative!<br />
<br />
A function that only references local variables and arguments is a standalone entity. At compile-time, it has no knowledge of where the arguments came from, or what values they should have. Therefore, when a function begins execution, it must look elsewhere to obtain the value of its arguments, as they are not defined within the function itself. Functions need to make assumptions about where to look for arguments, where to look for return values, and what stuff remains the same after a different function gets called and then returns. For a given computer system, such a set of rules governing the placement of arguments, return values, and other expected behavior is called the "calling convention" for that system.<br />
<br />
A stack frame (I've also seen this called an "activation record") of a function at some moment in time, is the region of memory where the function stores local variables, its arguments, and information needed to restore the state of the caller upon returning. I gave the definition with respect to "some moment in time", because according to my understanding, it is possible for a stack frame to grow and shrink throughout the duration of the function's execution.<br />
<br />
Both I and most of the material I read visualize memory with the following assumptions about orientation. This is important to clarify, so that when I write about one location being 'above' or 'below' another, or about the direction of memory growth, your mental picture is the same as mine. Throughout the rest of this post I will assume that:<br />
<br />
1. Higher memory addresses are visualized as being above lower memory addresses.<br />
<br />
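2. The stack (and with it, the current stack frame) is a stack data structure that grows downward, towards lower memory addresses.<br />
<br />
In this simplified picture, that implies that if I have two local variables, X and Y, and Y was declared after X, then Y will have an address that is less than X's. (A real compiler is free to lay out locals in a different order.)<br />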
<br />
I read some basic tutorials, and watched some videos about what happens when a function gets called. Typically, most explanations I saw told a story about what 'would' happen in an 'ideal case', but the tedious details of what actually happens are very specific to an instruction set architecture and an operating system. The 'ideal case' scenario would go something like this:<br />
<br />
A function begins execution. There are registers that hold what are called the stack pointer (SP) and base pointer (BP) of the function's activation record. Just above the base pointer is the address that would have been held by the program counter (PC), had the function not been called. This is where the function will return to, by loading this value back into the PC.<br />
<br />
Assuming that the size of pointer types is 8, then just above these 8 bytes should be the arguments of the function. How much space each argument takes is inferred from its type. So, for example, if my function is passed an int, an int, and a long double, in that order, and assuming that the sizes of int and long double are 4 and 10 respectively, then these arguments can be addressed by BP + 8, BP + 12, and BP + 16 respectively. Remember! We are adding 8 to account for the saved PC of the function that called us!<br />
<br />
How did those values get there? It was the responsibility of the function that called our function to put them there, as well as to set the SP to point to the right place. So, suppose we wanted to call another function: it would be OUR responsibility to decrement the SP by enough to store the values of the arguments to that function, and to put the right values in there. Whenever a local variable is declared, the stack pointer moves down as many bytes as are needed to make room for that local variable. So, with the numbers we've been using so far, that would be 4 bytes for an int declaration, 8 for a pointer declaration, and 10 for a long double declaration.<br />
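<br />
The offset arithmetic of this 'ideal case' is simple enough to write down as a little C helper. This is only a sketch of the idealized model described above (arguments packed with no padding, right after the saved PC); the function name ideal_argument_offsets and its parameters are made up for illustration, and real x86-64 code does not lay arguments out this way.<br />
<br />

```c
#include <assert.h>
#include <stddef.h>

/* Fills offsets[i] with the distance above BP of argument i in the
 * idealized model: the saved PC occupies the first return_address_size
 * bytes above BP, and the arguments follow with no padding. */
void ideal_argument_offsets(const size_t *sizes, size_t count,
                            size_t return_address_size, size_t *offsets)
{
    assert(sizes != NULL && offsets != NULL);
    size_t next = return_address_size;
    for (size_t i = 0; i < count; i++) {
        offsets[i] = next;
        next += sizes[i];
    }
}
```

<br />
Feeding it the sizes {4, 4, 10} with an 8-byte saved PC produces the offsets 8, 12 and 16, matching BP + 8, BP + 12 and BP + 16 from the example.<br />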
<br />
But alas! Things are not really this simple on x86-64 and most 'real' architectures, mostly because compilers can optimize this behavior, and because of alignment issues.<br />
<br />
1. First, we must manually store the value of the caller's BP, by pushing it onto the stack. Then we must set our BP to be equal to our SP, which was decremented by the caller.<br />
<br />
2. Compilers are smart enough not to have to move the SP for every declaration and function call, but can move it by the right size just once at the beginning of the function. So, if I declare four ints, and then call a function that accepts two ints, then (ignoring alignment) the stack pointer would be decremented by 24 bytes at the very beginning.<br />
<br />
3. Whenever a function is called, the stack pointer must point to an address that is a multiple of 16. Also, the compiler may allocate more memory than you would expect, aligning some variables in a way that wastes memory but improves speed.<br />
<br />
4. Perhaps most relevant to the code below, the caller will, whenever possible, pass the values of the arguments to a called function using registers directly as opposed to pushing them onto the memory stack. The called function can infer whether to look in the registers or above the base pointer for a particular argument from its type and its position in the argument list.<br />
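<br />
The arithmetic behind points 2 and 3 can be sketched in a few lines of C. The helper names align_up and frame_size are my own, and this only models the "reserve everything once, rounded up to 16" idea; an actual compiler is free to reserve more, less, or nothing at all.<br />
<br />

```c
#include <assert.h>
#include <stddef.h>

/* Rounds size up to the next multiple of alignment (a power of two). */
size_t align_up(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Bytes to subtract from SP once, at function entry, to cover the locals
 * plus any outgoing stack arguments, padded to a multiple of 16. */
size_t frame_size(size_t local_bytes, size_t outgoing_argument_bytes)
{
    return align_up(local_bytes + outgoing_argument_bytes, 16);
}
```

<br />
For four int locals and two int arguments, frame_size(4 * 4, 2 * 4) is 32: the 24 bytes from point 2, rounded up to the next multiple of 16 because of point 3.<br />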
<br />
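As for point 4, here is a rough C model of how the calling convention used on x86-64 Linux hands out registers, as far as I understand it: integer-like arguments go to rdi, rsi, rdx, rcx, r8 and r9 in order, float and double arguments go to xmm0 through xmm7, and anything left over (including long double) goes on the stack. The names ArgClass and assign_argument_locations are made up, and the model ignores structs and other special cases.<br />
<br />

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_INTEGER, ARG_FLOATING, ARG_LONG_DOUBLE } ArgClass;

static const char *const INTEGER_REGISTERS[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};
static const char *const SSE_REGISTERS[] = {
    "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"
};

/* Writes into locations[i] the register that would carry argument i,
 * or "stack" when no suitable register is left for it. */
void assign_argument_locations(const ArgClass *classes, size_t count,
                               const char **locations)
{
    assert(classes != NULL && locations != NULL);
    size_t next_integer = 0;
    size_t next_sse = 0;
    for (size_t i = 0; i < count; i++) {
        if (classes[i] == ARG_INTEGER && next_integer < 6)
            locations[i] = INTEGER_REGISTERS[next_integer++];
        else if (classes[i] == ARG_FLOATING && next_sse < 8)
            locations[i] = SSE_REGISTERS[next_sse++];
        else
            locations[i] = "stack";
    }
}
```

<br />
For the argument list (int, double, int, long double) this assigns rdi, xmm0, rsi and stack. Note that the integer and floating point counters advance independently of each other.<br />
<br />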
So let's start compiling functions to see what actually happens! :D First, let's compile 6 simple functions which are exactly the same except for the types of the arguments and return value:<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #202020; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"><span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #447fcf;">i</span><span style="color: #d0d0d0;">(</span><span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">a,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">b,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">c,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">d,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">e,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">f)</span>
<span style="color: #d0d0d0;">{</span>
<span style="color: #6ab825; font-weight: bold;">return</span> <span style="color: #d0d0d0;">(a</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">c)</span> <span style="color: #d0d0d0;">*</span> <span style="color: #d0d0d0;">(d</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">e</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">f);</span>
<span style="color: #d0d0d0;">}</span>
<span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #447fcf;">c</span><span style="color: #d0d0d0;">(</span><span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">a,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">b,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">c,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">d,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">e,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">f)</span>
<span style="color: #d0d0d0;">{</span>
<span style="color: #6ab825; font-weight: bold;">return</span> <span style="color: #d0d0d0;">(a</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">c)</span> <span style="color: #d0d0d0;">*</span> <span style="color: #d0d0d0;">(d</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">e</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">f);</span>
<span style="color: #d0d0d0;">}</span>
<span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #447fcf;">ll</span><span style="color: #d0d0d0;">(</span><span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #d0d0d0;">a,</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #d0d0d0;">b,</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #d0d0d0;">c,</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #d0d0d0;">d,</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #d0d0d0;">e,</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #d0d0d0;">f)</span>
<span style="color: #d0d0d0;">{</span>
<span style="color: #6ab825; font-weight: bold;">return</span> <span style="color: #d0d0d0;">(a</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">c)</span> <span style="color: #d0d0d0;">*</span> <span style="color: #d0d0d0;">(d</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">e</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">f);</span>
<span style="color: #d0d0d0;">}</span>
<span style="color: #6ab825; font-weight: bold;">float</span> <span style="color: #447fcf;">f</span><span style="color: #d0d0d0;">(</span><span style="color: #6ab825; font-weight: bold;">float</span> <span style="color: #d0d0d0;">a,</span> <span style="color: #6ab825; font-weight: bold;">float</span> <span style="color: #d0d0d0;">b,</span> <span style="color: #6ab825; font-weight: bold;">float</span> <span style="color: #d0d0d0;">c,</span> <span style="color: #6ab825; font-weight: bold;">float</span> <span style="color: #d0d0d0;">d,</span> <span style="color: #6ab825; font-weight: bold;">float</span> <span style="color: #d0d0d0;">e,</span> <span style="color: #6ab825; font-weight: bold;">float</span> <span style="color: #d0d0d0;">f)</span>
<span style="color: #d0d0d0;">{</span>
<span style="color: #6ab825; font-weight: bold;">return</span> <span style="color: #d0d0d0;">(a</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">c)</span> <span style="color: #d0d0d0;">*</span> <span style="color: #d0d0d0;">(d</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">e</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">f);</span>
<span style="color: #d0d0d0;">}</span>
<span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #447fcf;">d</span><span style="color: #d0d0d0;">(</span><span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">a,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">b,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">c,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">d,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">e,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">f)</span>
<span style="color: #d0d0d0;">{</span>
<span style="color: #6ab825; font-weight: bold;">return</span> <span style="color: #d0d0d0;">(a</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">c)</span> <span style="color: #d0d0d0;">*</span> <span style="color: #d0d0d0;">(d</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">e</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">f);</span>
<span style="color: #d0d0d0;">}</span>
<span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #447fcf;">ld</span><span style="color: #d0d0d0;">(</span><span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">a,</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">b,</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">c,</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">d,</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">e,</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">f)</span>
<span style="color: #d0d0d0;">{</span>
<span style="color: #6ab825; font-weight: bold;">return</span> <span style="color: #d0d0d0;">(a</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">c)</span> <span style="color: #d0d0d0;">*</span> <span style="color: #d0d0d0;">(d</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">e</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">f);</span>
<span style="color: #d0d0d0;">}</span></pre>
</td></tr>
</tbody></table>
</div>
<br />
Let's take a look at the Assembly output. I turned on some basic optimization ("-O1" flag) because it makes the calling convention more readily transparent. For example, I noticed that without optimization, the compiler would 'always' store the arguments from their registers onto its stack frame, even if it was not necessary. The Assembly output:<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #202020; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"><span style="color: #d0d0d0;">ex1.o:</span> file format <span style="color: #ed9d13;">elf64-x86-64</span>
Disassembly of section <span style="color: #d0d0d0;">.text:</span>
<span style="color: #3677a9;">0000000000000000</span> <span style="color: #d0d0d0;"><</span><span style="color: #447fcf;">i</span><span style="color: #d0d0d0;">>:</span>
<span style="color: #d0d0d0;"> 0: 01 f7 add %esi,%edi</span>
<span style="color: #d0d0d0;"> 2: 01 fa add %edi,%edx</span>
<span style="color: #d0d0d0;"> 4: 41 01 c8 add %ecx,%r8d</span>
<span style="color: #d0d0d0;"> 7: 45 01 c1 add %r8d,%r9d</span>
<span style="color: #d0d0d0;"> a: 89 d0 mov %edx,%eax</span>
<span style="color: #d0d0d0;"> c: 41 0f af c1 imul %r9d,%eax</span>
<span style="color: #d0d0d0;"> 10: c3 retq</span>
<span style="color: #3677a9;">0000000000000011</span> <span style="color: #d0d0d0;"><</span><span style="color: #447fcf;">c</span><span style="color: #d0d0d0;">>:</span>
<span style="color: #d0d0d0;"> 11: 01 f7 add %esi,%edi</span>
<span style="color: #d0d0d0;"> 13: 01 fa add %edi,%edx</span>
<span style="color: #d0d0d0;"> 15: 41 01 c9 add %ecx,%r9d</span>
<span style="color: #d0d0d0;"> 18: 45 01 c8 add %r9d,%r8d</span>
<span style="color: #d0d0d0;"> 1b: 44 89 c0 mov %r8d,%eax</span>
<span style="color: #d0d0d0;"> 1e: 0f af c2 imul %edx,%eax</span>
<span style="color: #d0d0d0;"> 21: c3 retq</span>
<span style="color: #3677a9;">0000000000000022</span> <span style="color: #d0d0d0;"><</span><span style="color: #447fcf;">ll</span><span style="color: #d0d0d0;">>:</span>
<span style="color: #d0d0d0;"> 22: 48 01 f7 add %rsi,%rdi</span>
<span style="color: #d0d0d0;"> 25: 48 01 fa add %rdi,%rdx</span>
<span style="color: #d0d0d0;"> 28: 49 01 c8 add %rcx,%r8</span>
<span style="color: #d0d0d0;"> 2b: 4d 01 c1 add %r8,%r9</span>
<span style="color: #d0d0d0;"> 2e: 48 89 d0 mov %rdx,%rax</span>
<span style="color: #d0d0d0;"> 31: 49 0f af c1 imul %r9,%rax</span>
<span style="color: #d0d0d0;"> 35: c3 retq</span>
<span style="color: #3677a9;">0000000000000036</span> <span style="color: #d0d0d0;"><</span><span style="color: #447fcf;">f</span><span style="color: #d0d0d0;">>:</span>
<span style="color: #d0d0d0;"> 36: f3 0f 58 c8 addss %xmm0,%xmm1</span>
<span style="color: #d0d0d0;"> 3a: f3 0f 58 d1 addss %xmm1,%xmm2</span>
<span style="color: #d0d0d0;"> 3e: f3 0f 58 e3 addss %xmm3,%xmm4</span>
<span style="color: #d0d0d0;"> 42: f3 0f 58 ec addss %xmm4,%xmm5</span>
<span style="color: #d0d0d0;"> 46: f3 0f 59 d5 mulss %xmm5,%xmm2</span>
<span style="color: #d0d0d0;"> 4a: 0f 28 c2 movaps %xmm2,%xmm0</span>
<span style="color: #d0d0d0;"> 4d: c3 retq</span>
<span style="color: #3677a9;">000000000000004e</span> <span style="color: #d0d0d0;"><</span><span style="color: #447fcf;">d</span><span style="color: #d0d0d0;">>:</span>
<span style="color: #d0d0d0;"> 4e: f2 0f 58 c8 addsd %xmm0,%xmm1</span>
<span style="color: #d0d0d0;"> 52: f2 0f 58 d1 addsd %xmm1,%xmm2</span>
<span style="color: #d0d0d0;"> 56: f2 0f 58 e3 addsd %xmm3,%xmm4</span>
<span style="color: #d0d0d0;"> 5a: f2 0f 58 ec addsd %xmm4,%xmm5</span>
<span style="color: #d0d0d0;"> 5e: f2 0f 59 d5 mulsd %xmm5,%xmm2</span>
<span style="color: #d0d0d0;"> 62: 66 0f 28 c2 movapd %xmm2,%xmm0</span>
<span style="color: #d0d0d0;"> 66: c3 retq</span>
<span style="color: #3677a9;">0000000000000067</span> <span style="color: #d0d0d0;"><</span><span style="color: #447fcf;">ld</span><span style="color: #d0d0d0;">>:</span>
<span style="color: #d0d0d0;"> 67: db 6c 24 18 fldt 0x18(%rsp)</span>
<span style="color: #d0d0d0;"> 6b: db 6c 24 08 fldt 0x8(%rsp)</span>
<span style="color: #d0d0d0;"> 6f: de c1 faddp %st,%st(1)</span>
<span style="color: #d0d0d0;"> 71: db 6c 24 28 fldt 0x28(%rsp)</span>
<span style="color: #d0d0d0;"> 75: de c1 faddp %st,%st(1)</span>
<span style="color: #d0d0d0;"> 77: db 6c 24 48 fldt 0x48(%rsp)</span>
<span style="color: #d0d0d0;"> 7b: db 6c 24 38 fldt 0x38(%rsp)</span>
<span style="color: #d0d0d0;"> 7f: de c1 faddp %st,%st(1)</span>
<span style="color: #d0d0d0;"> 81: db 6c 24 58 fldt 0x58(%rsp)</span>
<span style="color: #d0d0d0;"> 85: de c1 faddp %st,%st(1)</span>
<span style="color: #d0d0d0;"> 87: de c9 fmulp %st,%st(1)</span>
<span style="color: #d0d0d0;"> 89: c3 retq</span>
</pre>
</td></tr>
</tbody></table>
</div>
<br />
The important thing to notice is that integer and floating-point arguments are placed in particular registers, and the order is always the same. For integer types, it's %rdi, %rsi, %rdx, %rcx, %r8, %r9. For floats and doubles, it's %xmm0, %xmm1, ..., %xmm7. The return value is stored in the 'A' register (%rax, or a narrower part of it such as %eax) for integer types, and in %xmm0 for floats and doubles.<br />
<br />
However, for long doubles, I am a little bit confused by what I am seeing (maybe someone who understands better can chime in?). After reading the calling convention portion of the System V ABI for x86-64, I assumed that long double arguments would be pushed onto the FPU stack if they can fit into those registers. On my system they should: the extended-precision value is 80 bits (10 bytes, with CHAR_BIT == 8), and the FPU stack registers are 80 bits wide. Instead, what I am seeing is the long double being put 16 bytes above the base pointer. (Those 16 bytes are where the saved program counter and the caller's base pointer are stored.) Perhaps long doubles must be padded to 16 bytes? But then why is the return value left in the %st register (the top of the FPU stack)? Weird...<br />
<br />
In any case, there were four interesting cases that came to mind:<br />
<br />
1. There are arguments of different types in different combinations.<br />
<br />
2. There are lots of arguments: specifically, more arguments than there are registers of the appropriate type to store them.<br />
<br />
3. The size of the type of some of the arguments or the return value is too wide to fit into registers (a structure type with many fields, for example).<br />
<br />
4. The function accepts a variable number of arguments.<br />
<br />
I will write about the fourth case, functions of a variable number of arguments, in a separate blog entry.<br />
<br />
Let's start with the case when there are arguments of different types:<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #202020; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;">1
2
3
4</pre>
</td><td><pre style="line-height: 125%; margin: 0;"><span style="color: #6ab825; font-weight: bold;">float</span> <span style="color: #447fcf;">diff_arg_types</span><span style="color: #d0d0d0;">(</span><span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">i,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">c,</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #d0d0d0;">ll,</span> <span style="color: #6ab825; font-weight: bold;">float</span> <span style="color: #d0d0d0;">f,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">d,</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">ld,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">x,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">y,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">z)</span>
<span style="color: #d0d0d0;">{</span>
<span style="color: #6ab825; font-weight: bold;">return</span> <span style="color: #d0d0d0;">(i</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">c</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">ll</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">x</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">y</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">x)</span> <span style="color: #d0d0d0;">*</span> <span style="color: #d0d0d0;">(f</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">d</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">ld);</span>
<span style="color: #d0d0d0;">}</span>
</pre>
</td></tr>
</tbody></table>
</div>
<br />
This function produces the following assembly (this time, with no optimizations, since I want to be very explicit about which registers correspond to which arguments):<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #202020; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46</pre>
</td><td><pre style="line-height: 125%; margin: 0;"><span style="color: #3677a9;">0000000000000000</span> <span style="color: #d0d0d0;"><</span><span style="color: #447fcf;">diff_arg_types</span><span style="color: #d0d0d0;">>:</span>
<span style="color: #d0d0d0;"> 0: 55 push %rbp</span>
<span style="color: #d0d0d0;"> 1: 48 89 e5 mov %rsp,%rbp</span>
<span style="color: #d0d0d0;"> 4: 89 7d fc mov %edi,-0x4(%rbp)</span>
<span style="color: #d0d0d0;"> 7: 89 f0 mov %esi,%eax</span>
<span style="color: #d0d0d0;"> 9: 48 89 55 f0 mov %rdx,-0x10(%rbp)</span>
<span style="color: #d0d0d0;"> d: f3 0f 11 45 ec movss %xmm0,-0x14(%rbp)</span>
<span style="color: #d0d0d0;"> 12: f2 0f 11 4d e0 movsd %xmm1,-0x20(%rbp)</span>
<span style="color: #d0d0d0;"> 17: 89 4d e8 mov %ecx,-0x18(%rbp)</span>
<span style="color: #d0d0d0;"> 1a: 44 89 45 dc mov %r8d,-0x24(%rbp)</span>
<span style="color: #d0d0d0;"> 1e: 44 89 4d d8 mov %r9d,-0x28(%rbp)</span>
<span style="color: #d0d0d0;"> 22: 88 45 f8 mov %al,-0x8(%rbp)</span>
<span style="color: #d0d0d0;"> 25: 0f be 55 f8 movsbl -0x8(%rbp),%edx</span>
<span style="color: #d0d0d0;"> 29: 8b 45 fc mov -0x4(%rbp),%eax</span>
<span style="color: #d0d0d0;"> 2c: 01 d0 add %edx,%eax</span>
<span style="color: #d0d0d0;"> 2e: 48 63 d0 movslq %eax,%rdx</span>
<span style="color: #d0d0d0;"> 31: 48 8b 45 f0 mov -0x10(%rbp),%rax</span>
<span style="color: #d0d0d0;"> 35: 48 01 c2 add %rax,%rdx</span>
<span style="color: #d0d0d0;"> 38: 8b 45 e8 mov -0x18(%rbp),%eax</span>
<span style="color: #d0d0d0;"> 3b: 48 98 cltq</span>
<span style="color: #d0d0d0;"> 3d: 48 01 c2 add %rax,%rdx</span>
<span style="color: #d0d0d0;"> 40: 8b 45 dc mov -0x24(%rbp),%eax</span>
<span style="color: #d0d0d0;"> 43: 48 98 cltq</span>
<span style="color: #d0d0d0;"> 45: 48 01 c2 add %rax,%rdx</span>
<span style="color: #d0d0d0;"> 48: 8b 45 e8 mov -0x18(%rbp),%eax</span>
<span style="color: #d0d0d0;"> 4b: 48 98 cltq</span>
<span style="color: #d0d0d0;"> 4d: 48 01 d0 add %rdx,%rax</span>
<span style="color: #d0d0d0;"> 50: 48 89 45 c8 mov %rax,-0x38(%rbp)</span>
<span style="color: #d0d0d0;"> 54: df 6d c8 fildll -0x38(%rbp)</span>
<span style="color: #d0d0d0;"> 57: f3 0f 10 45 ec movss -0x14(%rbp),%xmm0</span>
<span style="color: #d0d0d0;"> 5c: 0f 5a c0 cvtps2pd %xmm0,%xmm0</span>
<span style="color: #d0d0d0;"> 5f: f2 0f 58 45 e0 addsd -0x20(%rbp),%xmm0</span>
<span style="color: #d0d0d0;"> 64: f2 0f 11 45 c0 movsd %xmm0,-0x40(%rbp)</span>
<span style="color: #d0d0d0;"> 69: dd 45 c0 fldl -0x40(%rbp)</span>
<span style="color: #d0d0d0;"> 6c: db 6d 10 fldt 0x10(%rbp)</span>
<span style="color: #d0d0d0;"> 6f: de c1 faddp %st,%st(1)</span>
<span style="color: #d0d0d0;"> 71: de c9 fmulp %st,%st(1)</span>
<span style="color: #d0d0d0;"> 73: d9 5d d4 fstps -0x2c(%rbp)</span>
<span style="color: #d0d0d0;"> 76: f3 0f 10 45 d4 movss -0x2c(%rbp),%xmm0</span>
<span style="color: #d0d0d0;"> 7b: f3 0f 11 45 c0 movss %xmm0,-0x40(%rbp)</span>
<span style="color: #d0d0d0;"> 80: 8b 45 c0 mov -0x40(%rbp),%eax</span>
<span style="color: #d0d0d0;"> 83: 89 45 c0 mov %eax,-0x40(%rbp)</span>
<span style="color: #d0d0d0;"> 86: f3 0f 10 45 c0 movss -0x40(%rbp),%xmm0</span>
<span style="color: #d0d0d0;"> 8b: 5d pop %rbp</span>
<span style="color: #d0d0d0;"> 8c: c3 retq</span>
</pre>
</td></tr>
</tbody></table>
</div>
<br />
There's a lot going on here, but we don't care about most of it at the moment. First look at lines 4 through 12. It looks like the compiler is just using the next available register for each type! For example, it uses the integer registers until it hits the float and the double, puts those two arguments in %xmm0 and %xmm1, and then continues with the final three int arguments in %rcx, %r8, and %r9. The long double is placed 16 bytes above the base pointer: on line 35 we see the value at that location being pushed onto the FPU stack.<br />
<br />
Now! Let's see what will happen if we pass in more arguments than there are registers to store those arguments. My C code:<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #202020; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"> 1
2
3
4
5
6
7
8
9
10
11
12
13
14</pre>
</td><td><pre style="line-height: 125%; margin: 0;"><span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #447fcf;">i</span><span style="color: #d0d0d0;">(</span><span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">a,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">b,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">c,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">d,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">e,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">f,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">g,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">h,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">i,</span> <span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">j)</span>
<span style="color: #d0d0d0;">{</span>
<span style="color: #6ab825; font-weight: bold;">return</span> <span style="color: #d0d0d0;">(a</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">c</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">d</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">e)</span> <span style="color: #d0d0d0;">*</span> <span style="color: #d0d0d0;">(f</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">g</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">h</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">i</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">j);</span>
<span style="color: #d0d0d0;">}</span>
<span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #447fcf;">c</span><span style="color: #d0d0d0;">(</span><span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">a,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">b,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">c,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">d,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">e,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">f,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">g,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">h,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">i,</span> <span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">j)</span>
<span style="color: #d0d0d0;">{</span>
<span style="color: #6ab825; font-weight: bold;">return</span> <span style="color: #d0d0d0;">(a</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">c</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">d</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">e)</span> <span style="color: #d0d0d0;">*</span> <span style="color: #d0d0d0;">(f</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">g</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">h</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">i</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">j);</span>
<span style="color: #d0d0d0;">}</span>
<span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #447fcf;">d</span><span style="color: #d0d0d0;">(</span><span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">a,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">b,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">c,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">d,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">e,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">f,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">g,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">h,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">i,</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">j)</span>
<span style="color: #d0d0d0;">{</span>
<span style="color: #6ab825; font-weight: bold;">return</span> <span style="color: #d0d0d0;">(a</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">c</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">d</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">e)</span> <span style="color: #d0d0d0;">*</span> <span style="color: #d0d0d0;">(f</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">g</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">h</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">i</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">j);</span>
<span style="color: #d0d0d0;">}</span>
</pre>
</td></tr>
</tbody></table>
</div>
<br />
The Assembly (this time with optimizations turned on again):<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #202020; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39</pre>
</td><td><pre style="line-height: 125%; margin: 0;"><span style="color: #3677a9;">0000000000000000</span> <span style="color: #d0d0d0;"><</span><span style="color: #447fcf;">i</span><span style="color: #d0d0d0;">>:</span>
<span style="color: #d0d0d0;"> 0: 01 f7 add %esi,%edi</span>
<span style="color: #d0d0d0;"> 2: 01 fa add %edi,%edx</span>
<span style="color: #d0d0d0;"> 4: 01 d1 add %edx,%ecx</span>
<span style="color: #d0d0d0;"> 6: 41 01 c8 add %ecx,%r8d</span>
<span style="color: #d0d0d0;"> 9: 44 03 4c 24 08 add 0x8(%rsp),%r9d</span>
<span style="color: #d0d0d0;"> e: 44 89 c8 mov %r9d,%eax</span>
<span style="color: #d0d0d0;"> 11: 03 44 24 10 add 0x10(%rsp),%eax</span>
<span style="color: #d0d0d0;"> 15: 03 44 24 18 add 0x18(%rsp),%eax</span>
<span style="color: #d0d0d0;"> 19: 03 44 24 20 add 0x20(%rsp),%eax</span>
<span style="color: #d0d0d0;"> 1d: 41 0f af c0 imul %r8d,%eax</span>
<span style="color: #d0d0d0;"> 21: c3 retq</span>
<span style="color: #3677a9;">0000000000000022</span> <span style="color: #d0d0d0;"><</span><span style="color: #447fcf;">c</span><span style="color: #d0d0d0;">>:</span>
<span style="color: #d0d0d0;"> 22: 41 01 f8 add %edi,%r8d</span>
<span style="color: #d0d0d0;"> 25: 44 01 c6 add %r8d,%esi</span>
<span style="color: #d0d0d0;"> 28: 01 f2 add %esi,%edx</span>
<span style="color: #d0d0d0;"> 2a: 01 d1 add %edx,%ecx</span>
<span style="color: #d0d0d0;"> 2c: 44 02 4c 24 20 add 0x20(%rsp),%r9b</span>
<span style="color: #d0d0d0;"> 31: 44 89 c8 mov %r9d,%eax</span>
<span style="color: #d0d0d0;"> 34: 02 44 24 08 add 0x8(%rsp),%al</span>
<span style="color: #d0d0d0;"> 38: 02 44 24 10 add 0x10(%rsp),%al</span>
<span style="color: #d0d0d0;"> 3c: 02 44 24 18 add 0x18(%rsp),%al</span>
<span style="color: #d0d0d0;"> 40: 0f af c1 imul %ecx,%eax</span>
<span style="color: #d0d0d0;"> 43: c3 retq</span>
<span style="color: #3677a9;">0000000000000044</span> <span style="color: #d0d0d0;"><</span><span style="color: #447fcf;">d</span><span style="color: #d0d0d0;">>:</span>
<span style="color: #d0d0d0;"> 44: f2 0f 58 c8 addsd %xmm0,%xmm1</span>
<span style="color: #d0d0d0;"> 48: f2 0f 58 d1 addsd %xmm1,%xmm2</span>
<span style="color: #d0d0d0;"> 4c: f2 0f 58 da addsd %xmm2,%xmm3</span>
<span style="color: #d0d0d0;"> 50: f2 0f 58 e3 addsd %xmm3,%xmm4</span>
<span style="color: #d0d0d0;"> 54: f2 0f 58 f5 addsd %xmm5,%xmm6</span>
<span style="color: #d0d0d0;"> 58: f2 0f 58 fe addsd %xmm6,%xmm7</span>
<span style="color: #d0d0d0;"> 5c: f2 0f 58 7c 24 08 addsd 0x8(%rsp),%xmm7</span>
<span style="color: #d0d0d0;"> 62: 66 0f 28 ef movapd %xmm7,%xmm5</span>
<span style="color: #d0d0d0;"> 66: f2 0f 58 6c 24 10 addsd 0x10(%rsp),%xmm5</span>
<span style="color: #d0d0d0;"> 6c: f2 0f 59 e5 mulsd %xmm5,%xmm4</span>
<span style="color: #d0d0d0;"> 70: 66 0f 28 c4 movapd %xmm4,%xmm0</span>
<span style="color: #d0d0d0;"> 74: c3 retq</span>
</pre>
</td></tr>
</tbody></table>
</div>
<br />
Here we can see that the compiler uses as many registers as it can, and when it runs out, it places the remaining arguments on the stack, just above the return address. Also note that each stack argument smaller than 8 bytes is padded to occupy exactly 8 bytes. In function 'c', for example, where all the arguments are characters, the seventh, eighth, ninth, and tenth arguments are stored at 0x8, 0x10, 0x18, and 0x20 above the stack pointer, respectively. These are eight-byte chunks. (Note: With optimization turned on, the function does not push %rbp onto the stack and assign a new value to it, so it reaches 8 bytes above the STACK POINTER, and NOT 16 bytes above the BASE POINTER as in example 2. I apologize for the confusion.)<br />
<br />
Similarly, with the double arguments, the first eight are stored in %xmm0 - %xmm7, and the last two are stored at %rsp + 0x8 and %rsp + 0x10.<br />
<br />
Now, the last case of interest is when the values we pass to the function, or return from it, are too wide for registers:<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #202020; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19</pre>
</td><td><pre style="line-height: 125%; margin: 0;"><span style="color: #6ab825; font-weight: bold;">typedef</span> <span style="color: #6ab825; font-weight: bold;">struct</span> <span style="color: #d0d0d0;">{</span>
<span style="color: #6ab825; font-weight: bold;">char</span> <span style="color: #d0d0d0;">c;</span>
<span style="color: #6ab825; font-weight: bold;">int</span> <span style="color: #d0d0d0;">i;</span>
<span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #d0d0d0;">ll;</span>
<span style="color: #6ab825; font-weight: bold;">float</span> <span style="color: #d0d0d0;">f;</span>
<span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">d;</span>
<span style="color: #6ab825; font-weight: bold;">long</span> <span style="color: #6ab825; font-weight: bold;">double</span> <span style="color: #d0d0d0;">ld;</span>
<span style="color: #d0d0d0;">}</span> <span style="color: #d0d0d0;">big_struct;</span>
<span style="color: #d0d0d0;">big_struct</span> <span style="color: #447fcf;">fun</span><span style="color: #d0d0d0;">(big_struct</span> <span style="color: #d0d0d0;">b1,</span> <span style="color: #d0d0d0;">big_struct</span> <span style="color: #d0d0d0;">b2)</span>
<span style="color: #d0d0d0;">{</span>
<span style="color: #d0d0d0;">big_struct</span> <span style="color: #d0d0d0;">b1b2</span> <span style="color: #d0d0d0;">=</span> <span style="color: #d0d0d0;">{.c</span> <span style="color: #d0d0d0;">=</span> <span style="color: #d0d0d0;">b1.c</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b2.c,</span>
<span style="color: #d0d0d0;">.i</span> <span style="color: #d0d0d0;">=</span> <span style="color: #d0d0d0;">b1.i</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b2.i,</span>
<span style="color: #d0d0d0;">.ll</span> <span style="color: #d0d0d0;">=</span> <span style="color: #d0d0d0;">b1.ll</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b2.ll,</span>
<span style="color: #d0d0d0;">.f</span> <span style="color: #d0d0d0;">=</span> <span style="color: #d0d0d0;">b1.f</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b2.f,</span>
<span style="color: #d0d0d0;">.d</span> <span style="color: #d0d0d0;">=</span> <span style="color: #d0d0d0;">b1.d</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b2.d,</span>
<span style="color: #d0d0d0;">.ld</span> <span style="color: #d0d0d0;">=</span> <span style="color: #d0d0d0;">b2.ld</span> <span style="color: #d0d0d0;">+</span> <span style="color: #d0d0d0;">b2.ld</span> <span style="color: #d0d0d0;">};</span>
<span style="color: #6ab825; font-weight: bold;">return</span> <span style="color: #d0d0d0;">b1b2;</span>
<span style="color: #d0d0d0;">}</span></pre>
</td></tr>
</tbody></table>
</div>
<br />
And the Assembly:<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #202020; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<table><tbody>
<tr><td><pre style="line-height: 125%; margin: 0;"> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21</pre>
</td><td><pre style="line-height: 125%; margin: 0;"><span style="color: #3677a9;">0000000000000000</span> <span style="color: #d0d0d0;"><</span><span style="color: #447fcf;">fun</span><span style="color: #d0d0d0;">>:</span>
<span style="color: #d0d0d0;"> 0: 48 89 f8 mov %rdi,%rax</span>
<span style="color: #d0d0d0;"> 3: 8b 4c 24 0c mov 0xc(%rsp),%ecx</span>
<span style="color: #d0d0d0;"> 7: 03 4c 24 3c add 0x3c(%rsp),%ecx</span>
<span style="color: #d0d0d0;"> b: 48 8b 54 24 10 mov 0x10(%rsp),%rdx</span>
<span style="color: #d0d0d0;"> 10: 48 03 54 24 40 add 0x40(%rsp),%rdx</span>
<span style="color: #d0d0d0;"> 15: f3 0f 10 4c 24 18 movss 0x18(%rsp),%xmm1</span>
<span style="color: #d0d0d0;"> 1b: f3 0f 58 4c 24 48 addss 0x48(%rsp),%xmm1</span>
<span style="color: #d0d0d0;"> 21: f2 0f 10 44 24 20 movsd 0x20(%rsp),%xmm0</span>
<span style="color: #d0d0d0;"> 27: f2 0f 58 44 24 50 addsd 0x50(%rsp),%xmm0</span>
<span style="color: #d0d0d0;"> 2d: db 6c 24 58 fldt 0x58(%rsp)</span>
<span style="color: #d0d0d0;"> 31: d8 c0 fadd %st(0),%st</span>
<span style="color: #d0d0d0;"> 33: 0f b6 74 24 38 movzbl 0x38(%rsp),%esi</span>
<span style="color: #d0d0d0;"> 38: 40 02 74 24 08 add 0x8(%rsp),%sil</span>
<span style="color: #d0d0d0;"> 3d: 40 88 37 mov %sil,(%rdi)</span>
<span style="color: #d0d0d0;"> 40: 89 4f 04 mov %ecx,0x4(%rdi)</span>
<span style="color: #d0d0d0;"> 43: 48 89 57 08 mov %rdx,0x8(%rdi)</span>
<span style="color: #d0d0d0;"> 47: f3 0f 11 4f 10 movss %xmm1,0x10(%rdi)</span>
<span style="color: #d0d0d0;"> 4c: f2 0f 11 47 18 movsd %xmm0,0x18(%rdi)</span>
<span style="color: #d0d0d0;"> 51: db 7f 20 fstpt 0x20(%rdi)</span>
<span style="color: #d0d0d0;"> 54: c3 retq</span>
</pre>
</td></tr>
</tbody></table>
</div>
<br />
From this we can infer that the two structs are laid out one after the other on the stack, above the stack pointer. Each field of one struct is added to the corresponding field of the other, and the result is kept in a register. For example, the value at %rsp + 0xc is added to the value at %rsp + 0x3c and stored in %ecx, the value at %rsp + 0x10 is added to the value at %rsp + 0x40 and stored in %rdx, and so on. What's interesting is how the struct is returned. The calling function is expected to put into the register %rdi the base address of a memory location in which the callee is supposed to store the return value. Thus, on lines 16 through 20 we see the values in the registers that hold the results of our previous calculations being stored at offsets from the address in %rdi.<br />
<br />
I felt I learnt a lot from investigating the x86-64 calling conventions on my machine. However, I now have more questions than when I started :) Many of them can probably be answered by a combination of further experimentation and reading documentation and standards, but alas, that is a topic for another blog post! The questions at the front of my mind at the moment are:<br />
<br />
1. Why aren't arguments of type long double passed through the FPU register stack?<br />
<br />
2. What happens if the size of the struct, and the types of the fields are changed? Are structs ever passed in registers?<br />
<br />
3. Tricky alignment questions (really, I just want a set of explicit alignment rules).<br />
<br />
4. In the last example, I am having trouble understanding lines 14 and 15. I know from reading parts of the ABI standard that the address where a struct is to be put is stored in %rdi. But here, it looks like %rdi is being manipulated in some way. Also, the first field of the first struct begins at 0xc bytes above the stack pointer. But here it looks like the computer is grabbing data at 0x8 bytes above the stack pointer? But this leaves only 4 bytes of meaningful data between 0x8 and 0xc. What is this data and what the heck does it have to do with %rdi (the address of where to store the return value)?