May 26th, 2009 | Published in Google Mac Blog
(Note: this is one of our occasional extra-geeky technical posts. If this isn't your thing, don't worry; our usual non-technical stuff will be back soon.)
If you're a programmer, there are many reasons why you might want to go exploring the inner workings of Mac OS X. You might want to learn how Apple achieves interesting effects. Or perhaps you're just curious about how things work. (We're all adults here, so I won't lecture you about the dangers of using private or undocumented interfaces in your apps.)
In any case, though, you need to know how to read assembly, either PowerPC (if you have an older Mac) or x86 (if you have anything recent). While there are good resources available to learn about reading PowerPC assembly for exploration, there are fewer about x86. Despite the present and future of the Mac being x86, it seems like people have lots of anxiety about having to work with it.
I think the problem is not a lack of documentation on x86 assembly, but a surfeit of it. Most of it is Windows- or DOS-centric, usually with syntax that doesn't apply (Intel syntax vs the AT&T syntax that GCC uses), and with the aim of teaching how to write it. But reading x86 assembly really isn't that hard. If all you want to do is learn how to read the code generated by GCC, it's probably just as easy as PowerPC.
The other day I was investigating how window minimization and window titles work. While exploring, I took notes on my discoveries. Let's touch on two functions, in both PowerPC and x86 flavors.
Before we begin, I'm going to assume that you're comfortable with assembly in general (though not necessarily with any particular one). If you have the latest developer tools, launch Shark (in /Developer/Applications/Performance Tools), and in the Help menu you can access various ISA references. In addition, Apple has ABI documentation for both the PowerPC and x86. I'm going to go over each function twice (once for PowerPC and once for x86); feel free to skim the PowerPC version if you're accustomed to it. And finally, this is only for the 32-bit version of each platform; things change even more with 64 bits.
The trail always begins with a public call that uses the SPI that you want to figure out. In this case, I chose SetWindowTitleWithCFString because it has to somehow set the title of a window even if it's minimized. I went with Carbon because sometimes the dynamic nature of Objective-C with Cocoa makes tracing code harder.
<+0>: mflr r0 // save linkage
<+4>: stmw r30,-8(r1) // stash r30, r31
<+8>: mr r30,r4 // save r4 (new title)
<+12>: stw r0,8(r1) // save old PC (return address)
<+16>: stwu r1,-80(r1) // allocate 80-byte stack frame
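This is the prologue of the function. The PowerPC doesn't have a dedicated stack pointer (convention is to use r1 for that), so the common way of implementing calls, pushing the return address onto the stack, doesn't work. Instead, the PowerPC has a link register and an instruction, bl (branch and link), that branches and puts the old PC value (the return address) into the link register. Thus, almost every function starts with mflr r0, to pull the old PC into a usable register. Then in <+4> we save off some registers that we're going to smash. Every function needs scratch registers to hold local variables, and values that must survive a call are usually kept in the high-numbered registers. Those are non-volatile (callee-saved), so a function must stash the caller's values before using them. The stmw (store multiple words) instruction is useful for ditching many high registers on the stack at once: it stores the named register and every register above it through r31, so stmw r30 saves both r30 and r31. Then in <+12> we drop the old PC onto the stack, and in <+16> stwu allocates 80 bytes of stack while storing the old stack pointer (the back chain) at the bottom of the new frame.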
A note on parameter passing. Integer-sized parameters (the only kind we'll be dealing with today) are passed into a function starting with r3 and going up through the registers (r4, r5, and so on, through r10). Return values are returned in r3. So we see that in <+8> we stick away the pointer to the new title, which arrived in r4, in r30 (whose previous value was stored on the stack earlier). r4 won't survive the upcoming call, but r30 will.
<+20>: bl 0x92881384 <_Z13GetWindowDataP15OpaqueWindowPtr>
<+24>: li r0,-5600 // errInvalidWindowRef
<+28>: cmpwi cr7,r3,0 // if no window data, bail
<+32>: beq- cr7,0x928d2ae0 <+60>
<+36>: cmpwi cr7,r30,0 // if no string to set, bail
<+40>: li r0,-50 // paramErr
<+44>: beq- cr7,0x928d2ae0 <+60>
<+48>: mr r4,r30
This is where we must start making inferences as to what the code is doing. Fortunately, we have the symbols, so it's not too hard. We see that the WindowRef is used as the parameter to a C++ function, GetWindowData(OpaqueWindowPtr*), since the WindowRef was passed in r3 and r3 wasn't altered before the call. In addition, note that the function's return value, being in r3, overwrites the WindowRef, which wasn't saved in a high register. That's fine, as the WindowRef was just an index into a table and won't be needed further.
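At this point we run some checks. We compare both r3 (the WindowData pointer returned by GetWindowData) and r30 (the new title) to zero, and if either is zero we jump to the end with r0 set to the appropriate error code. (The end of the function will move r0 into r3 for return.) Because li doesn't touch the condition register, the compiler is free to load each error code before the branch that might need it.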
The PowerPC condition register has eight 4-bit condition fields, cr0 through cr7. Why are we using cr7 here? Probably because cr7 is volatile (caller-saved), so we can get away with not saving and restoring it.
<+52>: bl 0x928d2af8 <_ZN10WindowData14SetTitleCommonEPK10__CFString>
<+56>: li r0,0 // return noErr
<+60>: addi r1,r1,80 // tear down stack frame and return
<+64>: mr r3,r0
<+68>: lwz r0,8(r1)
<+72>: lmw r30,-8(r1)
<+76>: mtlr r0
<+80>: blr
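The rest is pretty simple. We call a member function, WindowData::SetTitleCommon(const __CFString*) (that is, it takes a CFStringRef), and then do the common teardown. Since it's a member function, the WindowData pointer still sitting in r3 is passed as the implicit this parameter, and <+48> loads the title into r4 as the second parameter. Afterwards we restore the stack pointer, put the return value into r3, restore the registers, move the old PC back into the link register, and branch to the link register (blr), returning us to our caller.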
The PowerPC register file is really easy: r0 through r31. x86 has fewer registers, and they've historically had different roles (accumulator, base, source index, destination index, and so on). Seriously, forget about that. There are eight registers you care about. eax, ebx, ecx, edx, esi, and edi are all general-purpose registers. esp is the stack pointer. ebp is the frame pointer. That's it.
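PowerPC assembly reads right-to-left (except for stores): the destination comes first, so mr r30,r4 copies r4 into r30. x86 AT&T syntax in general reads left-to-right: the destination comes last, so mov %esp,%ebp copies esp into ebp.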
<+0>: push %ebp // make stack frame
<+1>: mov %esp,%ebp // make stack frame
<+3>: push %esi // stash %esi
<+4>: sub $0x14,%esp // make stack frame
On 32-bit x86, parameters are passed on the stack. They're placed at the top of the stack, with the rightmost parameters at the highest addresses. To execute the function, the caller used the call instruction, which pushes the return address (the old PC) onto the stack, so even before we hit <+0> the parameters are four bytes above the stack pointer. In <+0> we save off the old frame pointer, and in <+1> we establish our own stack frame. At this point ebp is fixed for the entire function. In <+3> we save the old values of registers we're going to use, and in <+4> we allocate space on the stack.
This is a perfect example of an ideal stack frame. ebp is the frame pointer; it points (into the stack) at the saved old frame pointer. ebp+4 holds the return address into the function that called us. ebp+8 is the first parameter passed in, ebp+12 is the second, and so on. Immediately below ebp are the values saved from registers, which will be restored before the return. And below that is a bunch of stack space used either for register spillage or for calling subsequent functions. One interesting note is that parameters are rarely pushed onto the stack for a call. The stack pointer doesn't move once we make it past the prologue; we just write the outgoing parameters into the memory at and right above esp (the stack pointer) and make the call.
<+7>: mov 0x8(%ebp),%eax // get WindowRef in %eax
<+10>: mov 0xc(%ebp),%esi // get new title in %esi
The parameters are passed on the stack. Since fiddling in memory is slow, we pull the values into registers. It's actually pretty analogous to how things go on PowerPC. There, low registers like r3 are reused for parameter passing, so important values are kept in the high registers. On x86 the parameters go on the stack, and values are kept in registers when possible. Why esi? Mostly because, like ebx and edi, it's callee-saved, so the title will survive the call to GetWindowData. That's also why the prologue pushed it at <+3>.
<+13>: mov %eax,(%esp) // put WindowRef on the stack
<+16>: call 0x92dfb8f6 <_Z13GetWindowDataP15OpaqueWindowPtr>
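With the PowerPC, you can tell how many parameters a function has by seeing how many registers starting with r3 are loaded. Here, we look instead at the register-indirect stores relative to esp: (%esp) is the first parameter, 0x4(%esp) the second, and so on. Only (%esp) is set before this call, so GetWindowData takes a single parameter.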
<+21>: mov %eax,%edx // stick WindowData into %edx
<+23>: mov $0xffffea20,%eax // errInvalidWindowRef
<+28>: test %edx,%edx // if no window data, bail
<+30>: je 0x92e4bb04 <+54>
<+32>: test %esi,%esi // if no string to set, bail
<+34>: mov $0xffce,%ax // paramErr
<+38>: je 0x92e4bb04 <+54>
Return values come back from functions in eax, but otherwise this is pretty much the same. The only thing of interest to note is the clever use of x86's peculiar register structure. In <+23> the constant 0xffffea20 (-5600, errInvalidWindowRef) is loaded into eax. But at <+34> the constant 0xffce is loaded into ax. Since ax is just an alias for the lower 16 bits of eax, the upper half of the register is left as 0xffff, and we get the full constant 0xffffffce (-50, paramErr) in eax. Why do this? Because loading a 32-bit constant takes 5 bytes, while loading a 16-bit constant only takes 4.
<+40>: mov %esi,0x4(%esp) // load new title as param 2
<+44>: mov %edx,(%esp) // load WindowData as param 1
<+47>: call 0x92e4bb0c <_ZN10WindowData14SetTitleCommonEPK10__CFString>
<+52>: xor %eax,%eax // return noErr
Same stuff as before. The one thing to note is the zeroing of eax with an xor. It's just a fancy trick: the generated code is smaller (2 bytes instead of 5) and typically no slower than the equivalent mov $0x0,%eax.
<+54>: add $0x14,%esp // tear down stack frame and return
<+57>: pop %esi
The mirror image of the stack frame creation.
That's one function down and one left to go. Next time, we'll take a look at a function that behaves a little differently than this one did.