So we've got most of those rubber bricks off the bottom of the pool, but there's some treasure down there, too.
Putting That Wrong Island in the Rear View Mirror --
Single-stack No Frame Example: 6800
As I have said, even though we've just looked at an example of how split-stack stack frames can be done on the 6800, and we've even seen a parallel example of single-stack stack frames on the same CPU, I do not recommend stack frames.
But I think I have made it clear that, if you have to do stack frames, I recommend split-stack over single-stack.
In this chapter we are going to look at the same functional example of three kinds of addition using a single stack without a stack frame.
Single-stack no frame, if you are allowed to do it and learn how to do it right, will produce cleaner, more optimal code than single-stack with stack frames.
But I'm going to repeat myself. I cannot recommend this. You have to track
what is on that stack, and the return address just gets in the way of your
calculations and your memory. It's a bit (16 bits on the 6800) of distracting
data that isn't relevant to the calculations the function is doing, and every
time you look for something on the stack, it either sticks out like a sore
thumb, distracting you, or you forget it's there and miss what you are aiming
at. And walk on it. Or try to get it from where it isn't and end up executing
data or garbage instead of instructions.
But we have to keep track of that anyway, really, even though a frame pointer can help. If we don't know what's there, we don't know where we've put things, and that's a terrible state for a program (and a programmer) to be in -- and that's one reason people avoid reading the assembly language output of compilers.
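To make that bookkeeping concrete, here is a minimal Python sketch of the 6800 stack rules the listing below relies on. It assumes the standard 6800 behavior: a push stores a byte at S and then decrements S, a pull increments S and then loads, TSX gives X = S + 1, and JSR pushes the return address low byte first. The class `Stack6800` and its method names are hypothetical, made up for illustration; it is a tracing aid, not an emulator. It puts numbers on the complaint above: after the call, the two parameters are no longer at 0,X and 2,X but at 2,X and 4,X, because the return address is sitting at 0,X.

```python
# Hypothetical toy model of the 6800 single stack, for tracing offsets only.
# Assumes 64 KB of byte memory, big-endian 16-bit cells, and 6800 push/pull rules.

class Stack6800:
    def __init__(self, top):
        self.mem = bytearray(0x10000)
        self.s = top - 1              # like TXS: S sits one byte below the top cell

    def psh(self, byte):              # PSHA / PSHB: store, then decrement S
        self.mem[self.s] = byte & 0xFF
        self.s = (self.s - 1) & 0xFFFF

    def pul(self):                    # PULA / PULB: increment S, then load
        self.s = (self.s + 1) & 0xFFFF
        return self.mem[self.s]

    def push16(self, value):          # PSHB then PSHA: high byte ends up lower
        self.psh(value & 0xFF)
        self.psh((value >> 8) & 0xFF)

    def jsr(self, return_address):    # JSR pushes PC low byte first, then high
        self.push16(return_address)

    def tsx(self):                    # TSX: X = S + 1
        return (self.s + 1) & 0xFFFF

    def cell(self, x, offset):        # LDAA offset,X / LDAB offset+1,X
        return (self.mem[x + offset] << 8) | self.mem[x + offset + 1]


stk = Stack6800(0x2000)
stk.push16(0x1234)                    # left operand, pushed first
stk.push16(0xCDEF)                    # right operand
stk.jsr(0x3040)                       # made-up return address
x = stk.tsx()
print(hex(stk.cell(x, 0)), hex(stk.cell(x, 2)), hex(stk.cell(x, 4)))
# prints: 0x3040 0xcdef 0x1234
```

Those offsets, 0 for the return address, 2 for the right operand, and 4 for the left, are the ones UADD16 uses in the listing.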
Just looking at the code below, you may not see how much we've ripped out --
that's because we've been hiding what we could in subroutines. But tracing
through the code should feel rather different, because you can hide code from
the programmer, but you can't hide it from the processor.
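One of the things hidden in a subroutine is PSH16I, which reads its 16-bit constant out of the instruction stream right after the JSR and then returns past it. A rough Python model of that trick follows, under the same assumptions about 6800 push and pull behavior. The function name `psh16i` and the addresses are hypothetical, chosen only for illustration.

```python
# Hypothetical model of the PSH16I trick: the constant follows JSR PSH16I in
# the instruction stream, so the pushed return address is really a pointer to it.

def psh16i(mem, s):
    """Run PSH16I on byte memory `mem` with stack pointer `s`.

    Returns (new_s, resume_pc). The constant replaces the return address.
    """
    x = s + 1                          # TSX
    ret = (mem[x] << 8) | mem[x + 1]   # LDX 0,X: address of the inline constant
    hi = mem[ret]                      # LDAA 0,X
    lo = mem[ret + 1]                  # LDAB 1,X
    s += 2                             # INS / INS: drop the return address
    mem[s] = lo                        # PSHB
    s -= 1
    mem[s] = hi                        # PSHA
    s -= 1
    return s, ret + 2                  # JMP 2,X: resume after the constant


mem = bytearray(0x10000)
mem[0x3103] = 0x12                     # FCB $12, after a 3-byte JSR at $3100
mem[0x3104] = 0x34                     # FCB $34
s = 0x20FF
mem[s] = 0x03                          # JSR pushes the return address low byte...
s -= 1
mem[s] = 0x31                          # ...then the high byte
s -= 1
s, pc = psh16i(mem, s)
top = (mem[s + 1] << 8) | mem[s + 2]
print(hex(top), hex(pc), hex(s))
# prints: 0x1234 0x3105 0x20fd
```

S ends up exactly where it was just after the JSR: the constant simply takes the return address's place, and execution resumes two bytes past where the return address pointed.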
You'll really want to compare the code with the stack frame version, and re-read the code and the comments. Take time to trace through both, watching the source as you do.
* 16-bit addition as example of single-stack discipline sans stack frame on 6800,
* with test code
* Joel Matthew Rees, October, November 2024
*
OPT 6800
NATWID EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
ORG $80 ; MDOS says this is a good place for user stuff.
*
ENTRY JMP START
NOP ; Just want even addressed pointers for no reason.
NOP ; bumper
NOP ; 6 bytes to this point.
SSAVE RMB 2 ; a place to keep S so we can return clean
RMB 4 ; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK RMB 2 ; For saving an index register temporarily in leaf functions only
DWORK RMB 2 ; For saving D temporarily in leaf functions only
RETVHI RMB 2 ; high half of 32-bit return values (because we can't push X easily)
RETVLO RMB 2 ; 16-bit return values and low half (because loading and saving is redundant)
LB_BASE RMB 2 ; For process local variables
HPPTR RMB 2 ; heap pointer (not yet managed)
HPALL RMB 2 ; heap allocation pointer
HPLIM RMB 2 ; heap limit
* End of pseudo-registers
RMB 4 ; bumper
GAP1 RMB 2 ; Mark the bottom of the gap
*
*
*
ORG $2000 ; Give the DP room.
LB_ADDR RMB 4 ; a little bumper space
FINAL RMB 4 ; 32-bit Final result in DP variable (to show we can)
FINALX EQU 4
RMB 4 ; buffer
STKLIM RMB 192 ; roughly 16 to 20 levels of call
STKLIMX EQU FINALX+8
STKBAS RMB 8 ; for canary return
STKSZ EQU 192 ; for EXORsim assembler limits
STKBASX EQU STKLIMX+192 ; must be STKLIMX+STKSZ -- assembler won't take symbol
STKFAK RMB 2 ; fake frame pointer, self-link
STKFAKX EQU STKBASX+8 ; 6801 is post-dec (post-store-decrement) push
STKBMP RMB 4 ; a little bumper space
STKBMPX EQU STKFAKX+2 ; But we are going to init S through X
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE RMB 1 ; $1024 or something ; Not using or managing heap yet.
HBASEX EQU STKBMPX+4
*HLIM RMB 4 ; bumper
*HLIMX EQU HBASEX+$100 ; 1024
*
*
ORG $3000
CDBASE JMP ERROR ; more bumpers
NOP
*STKBASM FDB STKBASX ; Doesn't work within EXORsim assembler limits after all
*HBASEXM FDB HBASEX ; by avoiding splitting large constants up at assemble time
*
INISTK LDX #LB_ADDR ; set up process local space
STX LB_BASE ; local space functional
LDAA LB_BASE ; bootstrap own stack
LDAB LB_BASE+1
* ADDB STKBASM+1
* ADCA STKBASM
LDX #STKBASX ; Instead of FDB
STX XWORK
ADDB XWORK+1
ADCA XWORK
*
STAB XWORK+1 ; initial stack pointer
STAA XWORK
*
LDX #STKUNDR ; for fake return address
STX DWORK ; save it for a moment
*
PULA ; pop real return address
PULB
LDX XWORK ; ready own stack pointer
STS SSAVE ; save stack pointer from monitor ROM
TXS ; move to our own stack (let TXS convert it)
PSHB ; put return address on own stack
PSHA ; stack now ready for interrupts, utility routines
*
LDAA DWORK ; error handler for fake return
LDAB DWORK+1
STAA 0,X ; in the cell beyond empty stack pointer
STAB 1,X
STAA 2,X ; and the next cell, for good measure
STAB 3,X
*
LDAA LB_BASE
LDAB LB_BASE+1
PSHB
PSHA
* JSR PSH16I
* FDB HBASEX ; EXORsim's interactive assembler doesn't like FDBs.
LDX #HBASEX
JSR SPSHX
*
JSR UADD16
STAA HPPTR ; as if we were ready to use heap
STAB HPPTR+1
STAA HPALL
STAB HPALL+1
* JSR PSH16I ; FDBs
* FDB CDBASE
* JSR PSH16I
* FDB (-4) ; extra bumper
* JSR UADD16
LDX #CDBASE
STX XWORK
LDAA XWORK
LDAB XWORK+1
SUBB #4
SBCA #0
*
STAA HPLIM
STAB HPLIM+1
RTS ; finally done, now can return
*
***
* Since negative index offsets are so expensive,
* we want to create a stack frame with only positive offsets.
* And we want the frame pointer to be pushed after the call,
* on entry to the local context.
* And the saved frame pointer needs to link to the previous one.
* And when we restore the previous frame,
* we need to be able to restore the previous frame base.
*
* Cross-section of general frame structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP} ] for calling routine
* [PARAM ] from calling routine
* [RETADR ] to calling routine
* [LOCVAR ] for called -- current -- routine
* [TEMP ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call
*
* Broader cross-section, showing chaining for routine 3, in-flight:
* [RETADR1 ]
* [LOCVAR2 ]
* [TEMP2 ]
* [PARAM3 ]
* [RETADR2 ]
* [LOCVAR3 ]
* [TEMP3 ]
* [(PARAM4)] <= SP (return stack pointer (6800 S is byte below))
***
*
***
* Utility routines
*
* Push low half of return value
* (Didn't use it there, don't use it here.)
PSHLH TSX
LDAA 0,X ; return address
LDAB 1,X
PSHB
PSHA
LDAA RETVLO
LDAB RETVLO+1
STAA 0,X
STAB 1,X
RTS
*
* Avoid the math to split 16-bit constants into two 8-bit loads,
* and push them while we are here.
* The constant follows the call in the instruction stream.
* Leaves constant in A:B, as well.
PSH16I TSX ; point to top of return address stack
LDX 0,X ; point into the instruction stream
LDAA 0,X ; high byte from instruction stream
LDAB 1,X ; low byte from instruction stream
INS ; drop the return address we almost have in X
INS
PSHB ; replace it with the constant
PSHA
JMP 2,X ; return to the byte after the constant.
*
* 8 bytes for the meat of this vs. 3 for the call.
* We end up using it a lot since EXORsim's interactive assembler doesn't do FDBs.
SPSHX STX XWORK
DES
DES
TSX
LDAA 2,X
LDAB 3,X
STAA 0,X
STAB 1,X
LDAA XWORK
LDAB XWORK+1
STAA 2,X
STAB 3,X
RTS
*
* 6 bytes for the meat of this vs. 3 for the call, instead of FDB
* (Didn't use it there, don't use it here.)
TXD STX XWORK
LDAA XWORK
LDAB XWORK+1
RTS
*
* Utility 16-bit add, leave result in A:B
UADD16 TSX ; no frame
LDAB 5,X ; left
ADDB 3,X ; right ; because we can
LDAA 4,X ; left
ADCA 2,X ; right
LDX 0,X
UADROP INS ; drop return address and parameters
INS
INS
INS
INS
INS
JMP 0,X ; return via X
*
* Utility 16-bit sub, leave result in A:B
* (Didn't use it there, don't use it here.)
USUB16 TSX ; no frame
LDAB 5,X ; left
SUBB 3,X ; right ; because we can
LDAA 4,X ; left
SBCA 2,X ; right
LDX 0,X
BRA UADROP ; drop return address and parameters
*
*
* We really don't want to put S in a temp if we can avoid it
ALOCS8 PULA
PULB
ALOS8I DES
DES
ALOS6I DES
DES
ALOS4I DES
DES
ALOS2I DES
DES
PSHB
PSHA
RTS
*
ALOCS6 PULA
PULB
BRA ALOS6I
*
ALOCS4 PULA
PULB
BRA ALOS4I
*
ALOCS2 PULA
PULB
BRA ALOS2I
*
INI0_8 CLRA
CLRB
* call with initialization value in A:B
INIS8 TSX
INIT8 STAA 8,X
STAB 9,X
INIT6 STAA 6,X
STAB 7,X
INIT4 STAA 4,X
STAB 5,X
INIT2 STAA 2,X
STAB 3,X
RTS ; 0,X is return address!
*
INI0_6 CLRA
CLRB
* call with initialization value in A:B
INIS6 TSX
BRA INIT6 ; join the shared stores above (INIS6 would loop forever)
*
INI0_4 CLRA
CLRB
* call with initialization value in A:B
INIS4 TSX
BRA INIT4 ; join the shared stores above
*
INI0_2 CLRA
CLRB
* call with initialization value in A:B
INIS2 TSX
BRA INIT2 ; join the shared stores above
*
DROP8 PULA
PULB
INS
INS
DROP6I INS
INS
INS
INS
INS
INS
PSHB
PSHA
RTS
*
DROP6 PULA
PULB
BRA DROP6I
*
*
* Stack at entry
* when functions are called by MAIN
* with two parameters
* We will return results in RETVHI:RETVLO in direct page
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ]
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ]
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
* 16-bit left 1st pushed, right 2nd
* output parameter:
* 17-bit sum in 32-bit in RETVHI:RETVLO
* Does not alter the parameters.
ADD16S TSX ; no local variables
LDAA #(-1) ; prepare for sign extension
TST 4,X ; the left-hand operand sign bit
BMI ADD16SR
CLRA ; zero extend
ADD16SR PSHA ; push left extension (only need one byte, though, really)
PSHA ; left sign cell below X now
LDAA #(-1) ; reload
TST 2,X ; the right-hand operand sign bit
BMI ADD16SL
CLRA ; zero extend
ADD16SL PSHA ; push right extension
PSHA
TSX ; point to sign extensions
LDAA 8,X ; left-hand low cell
LDAB 9,X
ADDB 7,X ; right-hand low cell
ADCA 6,X
STAA RETVLO ; save low half of result
STAB RETVLO+1
LDAA 2,X ; left-hand extension
LDAB 3,X
ADCB 1,X ; right-hand extension
ADCA 0,X
STAA RETVHI ; Save high half of result
STAB RETVHI+1
INS ; drop sign extension temporaries
INS ; 4 INS is one byte more than JSR DROP4
INS
INS
RTS ; result is in RETVHI:RETVLO
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
* 16-bit left, right
* output parameter:
* 17-bit sum in 32-bit RETVHI:RETVLO
ADD16U TSX ; no local allocations
LDAA 4,X ; left
LDAB 5,X
ADDB 3,X ; right
ADCA 2,X
STAA RETVLO ; save low half
STAB RETVLO+1
LDAB #0
ADCB #0
STAB RETVHI+1 ; save carry bit in high half
CLR RETVHI ; will never carry beyond bit 17
RTS
*
* Etc.
*
***
*
* Stack at entry when functions are called by MAIN
* with two parameters, a pointer and an addend
* (no local variables)
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ]
* [32:VAR1_1]
* [32:VAR1_2] <= PARAM2_1
* [PARAM2_1] (pointer)
* [PARAM2_2] (addend)
* [RETADR1 ]
*
* Instead of walking the stack, pass in a pointer --
* Add 16-bit signed parameter
* to 32 bit caller's 2nd 32-bit internal variable.
* input parameters:
* 16-bit pointer to 32-bit integer
* 16-bit addend
* no output parameter:
ADD16SI TSX ; no own local variables
LDAA #(-1)
TST 2,X ; high byte of addend parameter
BMI ADD16SIP
CLRA
ADD16SIP PSHA ; save the sign extension half
PSHA
LDX 4,X ; get pointer to target
LDAA 2,X ; target low
LDAB 3,X
TSX ; SP[ sign, retadr, addend, long ptr ]
ADDB 5,X ; addend parameter (stack is two lower, now)
ADCA 4,X
LDX 6,X ; target pointer
STAA 2,X ; save result low half away
STAB 3,X
LDAA 0,X ; target high half
LDAB 1,X
TSX
ADCB 1,X ; sign extension half
ADCA 0,X
LDX 6,X ; target
STAA 0,X ; save result high half away
STAB 1,X
INS ; three bytes for INS and RTS vs. two bytes for branch
INS
RTS ; no result to load
*
*
***
* Stack after variable allocation
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ]
* [32:VAR1_1]
* [32:VAR1_2] <= SP
*
MAIN JSR ALOCS8 ; 2 calls, 6 bytes, vs. 1 CLRA + 8 pushes, 9 bytes
JSR INI0_8
TSX
*
JSR PSH16I
* FDB $1234 ; parameters
FCB $12
FCB $34
JSR PSH16I
* FDB $CDEF
FCB $CD
FCB $EF
JSR ADD16U ; result in RETVHI:RETVLO should be $E023
INS ; drop one parameter, reuse other
INS
TSX
LDAA RETVLO ; four extra bytes compared to calling PSHLH
LDAB RETVLO+1
STAA 0,X
STAB 1,X
JSR PSH16I
* FDB $8765
FCB $87
FCB $65
JSR ADD16S ; result in RETVHI:RETVLO should be $FFFF6788
TSX ; reuse both parameters
LDAA RETVHI
LDAB RETVHI+1
STAA 4,X ; 2nd local variable high half
STAB 5,X
LDAA RETVLO
LDAB RETVLO+1
STAA 6,X
STAB 7,X
STX XWORK ; calculate address of second variable
LDAB XWORK+1
ADDB #4
STAB 3,X
LDAA XWORK
ADCA #0 ; don't lose the carry
STAA 2,X
LDAB #$A5
STAB 0,X ; $A5
STAB 1,X ; $A5A5
JSR ADD16SI ; result in 2nd variable should be FFFF0D2D
INS ; drop the parameters
INS
INS
INS
TSX
LDAA 2,X ; low half
LDAB 3,X
LDX LB_BASE ; store it in FINAL, in process local space
STAA FINALX+2,X
STAB FINALX+3,X
TSX
LDAA 0,X ; high half
LDAB 1,X
LDX LB_BASE
STAA FINALX,X
STAB FINALX+1,X
JSR DROP8
RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
*
* Stack after initialization:
* [STKUNDR ]
* [STKUNDR ]STKBAS <= SP
*
***
*
START NOP
JSR INISTK
NOP
*
JSR MAIN
*
DONE NOP
ERROR NOP ; define error labels as something not DONE, anyway
STKUNDR NOP
LDS SSAVE ; restore the monitor stack pointer
NOP
NOP ; landing pad to set breakpoint at
NOP
NOP
LDX $FFFE ; alternatively, jmp through reset vector
JMP 0,X
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor,
* but not necessarily to your program in a runnable state.
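If you want to check the expected results in MAIN's comments without running EXORsim, the arithmetic can be modeled in a few lines of Python. This is a sketch of what ADD16U, ADD16S, and ADD16SI compute, not of how they shuffle bytes on the stack; the Python function names just echo the assembly labels. Extending each signed operand to a full $FFFF or $0000 cell mirrors what the TST/BMI/CLRA sequences push.

```python
# Sketch of the three additions in the listing, checked against MAIN's comments.

MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def sign_extend16(v):
    """$FFFF if bit 15 is set, else $0000, like LDAA #(-1) / TST / BMI / CLRA."""
    return MASK16 if v & 0x8000 else 0

def add16u(left, right):
    """Unsigned 16 + 16 bits, 17-bit sum kept in 32 bits (RETVHI:RETVLO)."""
    return (left + right) & MASK32

def add16s(left, right):
    """Signed 16 + 16 bits into 32 bits, by sign-extending both operands."""
    lo = left + right
    hi = sign_extend16(left) + sign_extend16(right) + (lo >> 16)  # carry from ADDB/ADCA
    return ((hi & MASK16) << 16) | (lo & MASK16)

def add16si(target32, addend):
    """Add a signed 16-bit addend into a 32-bit variable, as ADD16SI does."""
    lo = (target32 & MASK16) + addend
    hi = (target32 >> 16) + sign_extend16(addend) + (lo >> 16)
    return ((hi & MASK16) << 16) | (lo & MASK16)

# Follow MAIN's sequence of calls.
r1 = add16u(0x1234, 0xCDEF)           # reused as ADD16S's left operand
r2 = add16s(r1 & MASK16, 0x8765)      # stored in the second local variable
r3 = add16si(r2, 0xA5A5)              # added in place through the pointer
print(hex(r1), hex(r2), hex(r3))
# prints: 0xe023 0xffff6788 0xffff0d2d
```

Read as a signed 32-bit number, $FFFF6788 is -39032, which is -8157 plus -30875, the signed readings of $E023 and $8765.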
I probably spent a good six hours or more figuring out all the places I had
messed up the offsets and lost track of what was on the stack. Sure, that was
because I'd quit using the frame pointers for reference. It was also
because I was running low on sleep. But it was more so because of that
distracting presence of the return addresses right in the middle of the
data.
If you haven't traced through this code, do so. Otherwise, you won't really
believe me.
And then go take a look at the split-stack version of this.
[JMR202411260841 addendum:]
Speaking of the split-stack version, while working through that, I realized I could have used a load effective address routine here for calculating the address of the second local variable in MAIN,
something like
* Add D to S and load to X as a pointer
LEADSX TSX ; make it a pointer
INX ; adjust for return address the cheap way
INX
STX XWORK
ADDB XWORK+1
STAB XWORK+1
ADCA XWORK
STAA XWORK
LDX XWORK
RTS
[JMR202411260841 addendum end.]
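The arithmetic LEADSX does is small enough to model directly. This is a minimal Python sketch, assuming LEADSX is called from MAIN at the point where MAIN computes the second local variable's address, with 4 in A:B. The function name `leadsx` and the sample stack pointer value are hypothetical.

```python
# Sketch of LEADSX: the address the caller's own TSX would give, plus A:B.

def leadsx(s, d):
    """s is S inside LEADSX (after the JSR pushed 2 bytes); d is A:B.

    Everything wraps at 16 bits, as the 6800 registers do.
    """
    x = (s + 1) & 0xFFFF       # TSX
    x = (x + 2) & 0xFFFF       # INX / INX: step over LEADSX's own return address
    return (x + d) & 0xFFFF    # ADDB/ADCA through XWORK, then LDX XWORK


main_s = 0x20B0                # made-up value of S in MAIN before the call
inside_s = (main_s - 2) & 0xFFFF
addr = leadsx(inside_s, 4)
print(hex(addr))
# prints: 0x20b5
```

The two INX instructions are what make the result match what MAIN would have computed with its own TSX followed by adding 4: inside LEADSX, S is two bytes lower because of the JSR.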
(Note that, this time, I'm not suggesting you move ahead if you are getting
tired. You've come this far, it's only a little farther along this path until you can decide whether I'm a fool for thinking split stack with no stack frames is so great -- or maybe see what I see.)