June 4th, 2009 | Published in Google Mac Blog
(Note: this is another of our occasional extra-geeky technical posts. If this isn't your thing, don't worry; our usual non-technical stuff will be back soon.)
Welcome back. In our last post we went through a simple function that made calls to other functions, and touched on stack frames and parameter passing. This time let's talk about a different function. We'll spend less time on the things we've already seen, and more on the more advanced things this function does.
<+0>: mflr r0 // save linkage
<+4>: stmw r28,-16(r1) // stash r28, r29, r30, r31
<+8>: mr r30,r3 // save r3 (WindowData)
<+12>: bcl- 20,4*cr7+so,0x928d2bd4 <+16>
<+16>: mflr r31 // get ip in r31
Short story: <+12> is an unconditional branch-and-link.
Long story: On the PowerPC, instructions like bge, etc. are just aliases for a more primitive branch instruction, bc (branch conditional). In this case, the first parameter is 20 (0b10100), which indicates “branch always”. Since it's always going to branch, the second parameter doesn't matter, so it was set to all 1 bits (31, which the disassembler prints as 4*cr7+so).
Why do this? Because we're going to need to access some PC-relative data, and the PowerPC chip has no PC-relative addressing mode. And the register move instructions can't access the PC register. Therefore we cheat in a way by taking an unconditional jump to the next address. Since it's a branch and link, the link register is filled with the next address (in this case, that equals the address just jumped to) which can be moved to a normal register.
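Why branch-conditional with a condition of “branch always”? A plain bl to the next instruction would load the link register just as well, since b and bl take relative targets too. But bcl 20,31 to the next instruction is the form the PowerPC architecture recommends for reading the current address: processors that predict return addresses recognize it and don't mistake it for a real subroutine call.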
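To make the operand encoding concrete, here is a small sketch that pulls the fields out of the instruction word. The word 0x429f0005 does not appear in the listing; it is what bcl 20,31 with a displacement of +4 should assemble to under the standard B-form layout (6-bit opcode, 5-bit BO, 5-bit BI, 14-bit displacement, AA bit, LK bit). The struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a PowerPC B-form (branch conditional) instruction.
struct BranchCond {
    unsigned opcode;    // primary opcode; 16 means bc
    unsigned bo;        // branch options; 0b10100 (20) means "branch always"
    unsigned bi;        // which condition register bit to test (0..31)
    int32_t  disp;      // signed byte displacement from this instruction
    bool     absolute;  // AA bit: target is absolute rather than relative
    bool     link;      // LK bit: save the return address in the link register
};

BranchCond DecodeBranchCond(uint32_t word) {
    BranchCond f;
    f.opcode   = word >> 26;
    f.bo       = (word >> 21) & 0x1f;
    f.bi       = (word >> 16) & 0x1f;
    // The displacement field sits in the low 16 bits with its two lowest
    // bits borrowed by AA and LK; mask those off, then sign-extend.
    f.disp     = static_cast<int16_t>(word & 0xfffc);
    f.absolute = (word & 2) != 0;
    f.link     = (word & 1) != 0;
    return f;
}

// The disassembler prints BI as 4*crN+bit, where bit 0..3 is lt, gt, eq, so.
unsigned CrField(unsigned bi) { return bi / 4; }
unsigned CrBit(unsigned bi)   { return bi % 4; }
```

Decoding 0x429f0005 this way gives BO = 20, BI = 31 (field cr7, bit 3, which is so), a displacement of 4, and the link bit set, matching what the disassembler printed at <+12>.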
<+20>: stw r0,8(r1)
<+24>: stwu r1,-80(r1) // make stack frame
<+28>: addis r28,r31,3533
<+32>: bl 0x928d2c50 <_Z15GetTitleForDockP10WindowData>
<+36>: lbz r0,-3364(r28) // haul initialization boolean into r0
This is where intuition comes in. We're hauling in some random byte from a PC-relative address. (lbz is load byte and zero, which loads one byte from memory and clears the high bits of the register.) What's byte sized? A Boolean (the Carbon type; GCC makes C++ bools 4 bytes). Why a Boolean? Probably a flag. And with the value of the byte gating the call to RegisterAsDockClientPriv, it's a safe bet that it's an initialization flag.
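The PC-relative arithmetic can be checked by hand. The sketch below redoes it in C++, assuming r31 holds 0x928d2bd4 (the address the listing prints for <+16>, which only holds for this particular load address). The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// addis rD,rA,imm : rD = rA + (sign-extended imm << 16)
constexpr uint32_t Addis(uint32_t ra, int16_t imm) {
    return ra + (static_cast<uint32_t>(static_cast<int32_t>(imm)) << 16);
}

// Effective address of a D-form load or store such as lbz rD,disp(rA).
constexpr uint32_t EffectiveAddress(uint32_t ra, int16_t disp) {
    return ra + static_cast<uint32_t>(static_cast<int32_t>(disp));
}

constexpr uint32_t kR31 = 0x928d2bd4;             // address of <+16>, captured by mflr
constexpr uint32_t kR28 = Addis(kR31, 3533);      // <+28>: add the high half of the offset
constexpr uint32_t kFlagAddress =
    EffectiveAddress(kR28, -3364);                // <+36> and <+60>: add the low half
```

Under that assumption the flag lives at 0xa05a1eb0. The offset is split across two instructions because a single PowerPC instruction only has room for a 16-bit immediate.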
<+40>: mr r29,r3 // stash new title into r29
<+44>: cmpwi cr7,r0,0 // was initialized?
<+48>: bne- cr7,0x928d2c04 <+64> // if so, skip
<+52>: bl 0x9287f864 <_Z24RegisterAsDockClientPrivv> // else initialize
<+56>: li r0,1 // and set flag
<+60>: stb r0,-3364(r28) // as being initialized
<+64>: mr r3,r30
<+68>: mr r4,r29
<+72>: bl 0x928d2c68 <SyncPlatformWindowTitle> // call with (WindowData, new title)
<+76>: lwz r0,344(r30) // pull (WindowData + 344)
<+80>: andis. r2,r0,64 // and pull a flag bit out of it (minimized?)
More intuition here. r30 contains a pointer to the WindowData class instance, and we're accessing some word 344 bytes in. We don't care about the destination register (we don't touch r2 again in this function), but don't miss the name of the opcode: “andis.” Remember that the period means to update the condition register (cr0), which is what the branch that follows tests.
Once again, this is obviously a flag (bit-sized this time). But what does it mean? Context tells us that we only call CoreDockSetItemTitle when it's set. Thus, it's a safe guess that this is the is-minimized flag.
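As a rough model of what <+80> does, the sketch below mimics the andis. instruction and then works out which byte of WindowData the mask actually touches. The names are made up for illustration, and the byte lookup assumes a non-zero mask read from a big-endian word.

```cpp
#include <cassert>
#include <cstdint>

struct AndisResult {
    uint32_t value;  // what lands in the destination register (r2 here)
    bool     eq;     // the EQ bit of cr0: set when the result is zero
};

// andis. rA,rS,imm : AND rS with (imm << 16) and record the outcome in cr0.
AndisResult AndisDot(uint32_t rs, uint16_t imm) {
    uint32_t value = rs & (static_cast<uint32_t>(imm) << 16);
    return AndisResult{value, value == 0};
}

// For a big-endian word loaded from wordOffset, find the byte that holds
// the highest set bit of mask. Assumes mask is non-zero.
uint32_t ByteOffsetOfMask(uint32_t wordOffset, uint32_t mask) {
    uint32_t byteInWord = 0;
    while (byteInWord < 3 && (mask & 0xff000000u) == 0) {
        mask <<= 8;      // step down from the most significant byte
        ++byteInWord;
    }
    return wordOffset + byteInWord;
}
```

With an immediate of 64 the mask is 0x00400000, so the flag is bit 0x40 of the byte at WindowData+345. When that bit is clear the result is zero, EQ is set, and the beq- at <+84> is taken.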
<+84>: beq- 0x928d2c38 <+116> // if not minimized, skip this step
<+88>: addi r1,r1,80
<+92>: lwz r3,196(r30) // load WID
How do I know that WindowData+196 is the CoreGraphics WID (CGWindowID; see CGWindow.h)? I used Quartz Debug to look at the window list for a sample app. The app only had one window, and the listed WID matched.
<+96>: mr r4,r29 // load new title
<+100>: lwz r0,8(r1)
<+104>: lmw r28,-16(r1) // tear down stack frame
<+108>: mtlr r0
<+112>: b 0x92b58ce4 <dyld_stub_CoreDockSetItemTitle>
Note that the stack frame teardown appears twice in this function: once here and once on the ordinary return path. In this case we're tail calling CoreDockSetItemTitle so that it's as if our caller called it directly. This is equivalent to the code return CoreDockSetItemTitle(wid, newTitle). Note from the setup of r3 and r4 that we can deduce the parameter types. Can we figure out the return type, though? Not really. The calling code ignores it, so we can ignore it too.
<+116>: addi r1,r1,80
<+120>: li r3,0
<+124>: lwz r0,8(r1)
<+128>: lmw r28,-16(r1)
<+132>: mtlr r0
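Putting the pieces together, the PowerPC listing reads like the C++ below. This is a reconstruction from the disassembly, not Apple's source: the function name UpdateDockTitle, the WindowData layout, the string type, the int return type, and the stub bodies (which only log calls so the control flow can be exercised) are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct WindowData {          // hypothetical layout; the real class is much larger
    uint32_t wid;            // the word at WindowData+196 in the listing
    bool     minimized;      // the flag bit tested at WindowData+344
};

static std::vector<std::string> gCalls;        // call log, for demonstration only
static bool gRegisteredAsDockClient = false;   // the byte loaded by lbz at <+36>

// Stand-ins for the real routines.
static const char* GetTitleForDock(WindowData*) { gCalls.push_back("GetTitleForDock"); return "title"; }
static void RegisterAsDockClientPriv() { gCalls.push_back("RegisterAsDockClientPriv"); }
static void SyncPlatformWindowTitle(WindowData*, const char*) { gCalls.push_back("SyncPlatformWindowTitle"); }
static int  CoreDockSetItemTitle(uint32_t, const char*) { gCalls.push_back("CoreDockSetItemTitle"); return 7; }

int UpdateDockTitle(WindowData* window) {
    const char* title = GetTitleForDock(window);        // <+32>, result kept in r29

    if (!gRegisteredAsDockClient) {                     // <+36> .. <+48>
        RegisterAsDockClientPriv();                     // <+52>
        gRegisteredAsDockClient = true;                 // <+56>, <+60>
    }

    SyncPlatformWindowTitle(window, title);             // <+64> .. <+72>

    if (window->minimized)                              // <+76> .. <+84>
        return CoreDockSetItemTitle(window->wid, title);  // <+88> .. <+112>: the tail call
    return 0;                                           // <+116> onward
}
```

Because the call to CoreDockSetItemTitle is the last thing the function does, the compiler can pop the frame first and then branch with b rather than bl, which is why the teardown sequence shows up on both paths.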
Now here's the same function, compiled for x86:
<+0>: push %ebp // make stack frame
<+1>: mov %esp,%ebp
<+3>: sub $0x28,%esp
<+6>: mov %ebx,-0xc(%ebp) // save %ebx
<+9>: call 0x92e4bbe4 <+14>
<+14>: pop %ebx // IP -> %ebx
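We're doing the same trick here to get the PC into a register. It looks clumsy, but 32-bit x86 leaves little choice: branches and calls are PC-relative, yet data accesses are not (that only arrived with RIP-relative addressing in x86-64), and no instruction reads %eip directly. So we call the very next instruction, which pushes the return address, and pop it into a normal register.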
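The call-then-pop sequence is easy to trace with a toy model of the stack. The sketch below is only that, a model: the Cpu struct and helpers are made up, and the addresses are the ones this listing happens to show. The displacement 0xd51a36c used in the follow-up note is the one the listing adds to %ebx a few instructions later.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Cpu {
    uint32_t eip = 0;
    uint32_t ebx = 0;
    std::vector<uint32_t> stack;
};

// call target: push the address of the next instruction, then jump.
void Call(Cpu& cpu, uint32_t target, uint32_t instructionLength) {
    cpu.stack.push_back(cpu.eip + instructionLength);
    cpu.eip = target;
}

// pop %ebx: take the top of the stack into %ebx.
void PopEbx(Cpu& cpu) {
    cpu.ebx = cpu.stack.back();
    cpu.stack.pop_back();
    cpu.eip += 1;  // the listing shows pop %ebx occupying one byte (<+14> to <+15>)
}

Cpu RunPicPrologue() {
    Cpu cpu;
    cpu.eip = 0x92e4bbdf;         // address of <+9>, five bytes before <+14>
    Call(cpu, 0x92e4bbe4, 5);     // <+9>: call <+14>
    PopEbx(cpu);                  // <+14>: pop %ebx
    return cpu;
}
```

After these two instructions %ebx holds 0x92e4bbe4, the address of the pop itself, and the stack is back where it started. Adding 0xd51a36c to that base gives 0xa0365f50 for the initialization flag in this build.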
<+15>: mov %esi,-0x8(%ebp) // save %esi
<+18>: mov 0x8(%ebp),%esi // WindowData -> %esi
<+21>: mov %edi,-0x4(%ebp) // save %edi
This almost looks like it was compiled by a different compiler. In the previous function, edi and esi are pushed, and then the stack pointer dropped. Here, we create the stack space and then move the contents of three registers (ebx, esi, and edi) into it. I suspect that things change once we also have to save ebx, though I don't know why.
<+24>: mov %esi,%eax // %esi (WindowData) -> %eax
<+26>: call 0x92e4bc40 <_Z15GetTitleForDockP10WindowData>
Whoa. If we're calling a function, we're supposed to set up the parameter via stack-relative addressing off esp. What's going on here?
The point of an ABI is that it's a documented way for functions to call each other. But if a function, say GetTitleForDock(WindowData*), is a short one that's not public and is only used under controlled circumstances, why worry about setting up the stack? In this particular case, GetTitleForDock happens to be a nine-instruction routine. Not worth the hassle of a stack frame, so it's reasonable to pass in the one parameter in eax.
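GCC and Clang let you ask for this kind of register-based convention explicitly on 32-bit x86 with the regparm attribute. Nothing in the disassembly tells us whether the original source did so or whether the compiler chose it on its own for a file-local function. The sketch below, with a hypothetical stand-in for GetTitleForDock and a made-up WindowData layout, only shows what requesting it looks like.

```cpp
#include <cassert>

#if defined(__i386__) && (defined(__GNUC__) || defined(__clang__))
// regparm(1): pass the first integer or pointer argument in %eax
// instead of on the stack. Only meaningful on 32-bit x86.
#define FIRST_ARG_IN_EAX __attribute__((regparm(1)))
#else
#define FIRST_ARG_IN_EAX  // other targets: fall back to the default convention
#endif

struct WindowData {        // hypothetical layout
    const char* title;
};

// File-local helper: not part of any public ABI, so its calling
// convention is free to differ from the documented one.
FIRST_ARG_IN_EAX
static const char* GetTitleForDock(WindowData* window) {
    return window->title;
}

const char* TitleOf(WindowData* window) {
    return GetTitleForDock(window);
}
```

Callers inside the same file see the attribute and load %eax before the call, just as <+24> does. Anything outside the file never sees the function at all, which is what makes bending the ABI safe here.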
<+31>: cmpb $0x0,0xd51a36c(%ebx) // test initialization boolean
<+38>: mov %eax,%edi // window title > %edi
<+40>: jne 0x92e4bc0c <+54> // if initialized, skip
<+42>: call 0x92df9fe0 <_Z24RegisterAsDockClientPrivv> // else initialize
<+47>: movb $0x1,0xd51a36c(%ebx) // and set flag as being initialized
<+54>: mov %edi,0x4(%esp) // new title (param 2)
<+58>: mov %esi,(%esp) // WindowData (param 1)
<+61>: call 0x92e4bc52 <SyncPlatformWindowTitle>
<+66>: xor %eax,%eax // clear %eax (noErr?)
<+68>: testb $0x2,0x159(%esi) // test flag (WindowData + 0x159) (minimized?)
<+75>: je 0x92e4bc35 <+95> // if not minimized, skip this step
<+77>: mov %edi,0x4(%esp) // new title (param 2)
<+81>: mov 0xc4(%esi),%eax // (WindowData + 0xC4) WID
<+87>: mov %eax,(%esp) // (param 1)
<+90>: call 0xa0a52ad1 <dyld_stub_CoreDockSetItemTitle>
<+95>: mov -0xc(%ebp),%ebx
<+98>: mov -0x8(%ebp),%esi
<+101>: mov -0x4(%ebp),%edi
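It's worth cross-checking the minimized flag between the two listings. The sketch below compares the byte each architecture tests and mirrors the PowerPC bit mask within its byte. The mirroring reflects my assumption that this is the same bit-field member on both sides, allocated from the most significant bit on big-endian PowerPC and from the least significant bit on little-endian x86. The constant names are made up.

```cpp
#include <cassert>
#include <cstdint>

// Mirror the bits of a byte: bit 7 <-> bit 0, bit 6 <-> bit 1, and so on.
constexpr uint8_t MirrorBits(uint8_t b) {
    return static_cast<uint8_t>(
        ((b & 0x01) << 7) | ((b & 0x02) << 5) | ((b & 0x04) << 3) | ((b & 0x08) << 1) |
        ((b & 0x10) >> 1) | ((b & 0x20) >> 3) | ((b & 0x40) >> 5) | ((b & 0x80) >> 7));
}

// PowerPC: lwz r0,344(r30) followed by andis. r2,r0,64
constexpr uint32_t kPpcWordOffset = 344;
constexpr uint32_t kPpcMask       = 64u << 16;           // 0x00400000
constexpr uint32_t kPpcByteOffset = kPpcWordOffset + 1;  // that mask lies in the word's second byte
constexpr uint8_t  kPpcByteMask   = static_cast<uint8_t>(kPpcMask >> 16);  // 0x40

// x86: testb $0x2,0x159(%esi)
constexpr uint32_t kX86ByteOffset = 0x159;
constexpr uint8_t  kX86ByteMask   = 0x02;
```

Both versions test the byte at WindowData+345, and 0x40 mirrored within a byte is 0x02, so the two listings appear to agree on which flag they're reading even though the instructions look nothing alike.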
Poking around in assembly isn't something most of us do every day. But whether you need it for debugging your own code or exploring someone else's, it's a skill that is definitely worth learning. PowerPC and x86 processors may have rather different histories, but the code that's generated for either is certainly not as intractable as some suggest.
Go exploring, and have fun.
(Thanks to my editor, Scott Knaster, and to David Shayer, whose introductory session on PowerPC assembly at the legendary MacHack conference started me on this path.)