Leaving those rubber bricks at the side of the pool, let's keep going down for more treasure.
Ascending the Right Island -- Split-stack No Frame Example: 6800
At this point, from working through the single-stack example for the 6800 without stack frames, you might be seeing the reasoning behind stack frames. It can be really difficult figuring out where your data is and where it should be heading without some frame of reference, and stack frames do provide a frame of reference when you're deep in the arcane definitions of some routine.
But building the code to support stack frames tends to consume time and energy that you'd rather devote to the actual problem at hand, unless your CPU provides high-level support for the frames. It tends to end up a mixed blessing at best, with net costs usually, in my opinion, outweighing benefits, even when your CPU supports it.
Here on the 6800, we can see those costs most clearly by looking carefully at the code I present here: read the source code in a text editor while stepping through it in the simulator, and compare it with the split-stack stack frame version and the single-stack versions.
Before you get to wondering why anyone wanted to use a stack frame in the first place, it's worth noting that the utility of stack frames becomes especially apparent in very large procedures with complex logic. When your procedure extends to hundreds of lines of code (or more) with dozens of variables (or more), you can use tools in the assembler to name your local variables by their offset from the frame base pointer, and that helps greatly in managing the complexity.
And it helps in constructing compilers, especially in the initial "bootstrap stages" of development. The compiler may be able to manage constructing and tearing down the frames more easily than it could keep track of changing offsets.
But.
The frames get in the way.
Especially when return addresses are inside the stack frames, they get in the way.
All the benefits of stack frames can, in fact, be found in this simple example of split-stack frameless coding discipline. You might think it's just my opinion, but I'll explain further as we go.
I think the code explains itself, particularly when comparing it to the split-stack example with stack frames and the single-stack example without frames that we just finished.
One thing that might be a point of interest: I had thought I would use an ADDDX (add double accumulator to X) routine in MAIN,
* Could use this in the single-stack no frames example, too.
LEADPX  LDX PSP ; Add D to PSP and load to X
* Add A:B to X through XWORK.
* Returned in both X and A:B.
* But we won't use it, after all.
ADDDX   STX XWORK
        ADDB XWORK+1
        ADCA XWORK ; high byte must take the carry from the low byte
        STAA XWORK
        STAB XWORK+1
        LDX XWORK
        RTS
to calculate the effective address of the variable that we are passing, but it worked out to be a wash. It took almost as much code to set it up as to just do it there in place.
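If you want to check the arithmetic away from the simulator, here is a minimal Python sketch of what a routine like ADDDX has to compute: a 16-bit add done one byte at a time, low byte first, with the carry from the low byte fed into the high byte (on the 6800, that is an ADDB followed by an ADCA). The function name add_d_to_x and the sample stack addresses are made up for illustration; they are not taken from the listing.

```python
def add_d_to_x(a, b, x):
    """Model adding the 16-bit A:B pair to a 16-bit X through a work cell."""
    x_hi, x_lo = (x >> 8) & 0xFF, x & 0xFF  # like STX XWORK: split X into bytes
    low = b + x_lo                          # low bytes first
    carry = low >> 8                        # carry out of the low byte
    high = a + x_hi + carry                 # high bytes plus that carry
    a, b = high & 0xFF, low & 0xFF
    return a, b, (a << 8) | b               # result in both A:B and X


# Hypothetical parameter stack pointers, with an offset of 4 in A:B.
for psp in (0x2010, 0x20FE):
    a, b, x = add_d_to_x(0x00, 0x04, psp)
    print(f"${psp:04X} + 4 = ${x:04X}")
```

The second sample address is chosen so that the low byte wraps around. Dropping the carry there would leave the pointer 256 bytes short.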
Read the code, step through it, and compare it to what we've worked through so far. Note in particular how we pass the return values back here, and how that differs from the method we use when working with the various kinds of stack frames, and even from the method of the frameless single-stack discipline:
* 16-bit addition as example of split-stack frame-free discipline on 6800
* using the direct page,
* with test code
* Joel Matthew Rees, October, November 2024
*
        OPT 6800
NATWID  EQU 2 ; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
        ORG $80 ; MDOS says this is a good place for usr stuff.
*
ENTRY   JMP START
        NOP ; Just want even addressed pointers for no reason.
        NOP ; bumper
        NOP ; 6 bytes to this point.
SSAVE   RMB 2 ; a place to keep S so we can return clean
        RMB 4 ; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK   RMB 2 ; For saving an index register temporarily
DWORK   RMB 2 ; For saving D temporarily
PSP     RMB 2 ; parameter stack pointer
LB_BASE RMB 2 ; For process local variables
HPPTR   RMB 2 ; heap pointer (not yet managed)
HPALL   RMB 2 ; heap allocation pointer
HPLIM   RMB 2 ; heap limit
* End of pseudo-registers
        RMB 4 ; bumper
GAP1    RMB 2 ; Mark the bottom of the gap
*
*
*
        ORG $2000 ; Give the DP room.
LB_ADDR RMB 4 ; a little bumper space
FINAL   RMB 4 ; 32-bit Final result in DP variable (to show we can)
FINALX  EQU 4
        RMB 4 ; Put a bumper after the process static variables
SSTKLIM RMB 64 ; 16 levels of call
SSTKLMX EQU FINALX+8
SSTKBAS RMB 6 ; for canary return
SSTKBSX EQU SSTKLMX+64
SSTKFAK RMB 2 ; fake frame pointer, self-link
SSTKFAX EQU SSTKBSX+6 ; 6801 is post-dec (post-store-decrement) push
SSTKBMP RMB 4 ; a little bumper space
SSTKBMX EQU SSTKFAX+2 ; But we are going to init S through X
PSTKLIM RMB 64 ; 16 levels of call at two parameters per call
PSTKLMX EQU SSTKBMX+4
PSTKBAS RMB 4 ; bumper space -- parameter stack is pre-dec
PSTKBSX EQU PSTKLMX+64
PSTKBMP RMB 4 ; a little bumper space
PSTKBMX EQU PSTKBSX+4
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE   RMB 1 ; $1024 or something ; Not using or managing heap yet.
HBASEX  EQU PSTKBMX+4
*HLIM RMB 4 ; bumper
*HLIMX EQU HBASEX+$100 ; 1024
*
*
        ORG $3000
CDBASE  JMP ERROR ; more bumpers
        NOP
*
INISTKS LDX #LB_ADDR ; set up process local space
        STX LB_BASE
        LDAA LB_BASE ; bootstrap own return stack
        LDAB LB_BASE+1
        LDX #SSTKBSX ; Instead of FDB
        STX XWORK
        ADDB XWORK+1
        ADCA XWORK
        STAB XWORK+1 ; initial return stack pointer
        STAA XWORK
*
        LDX #SSTKNDR ; for fake return address
        STX DWORK ; save it for a moment
        PULA ; pop real return address
        PULB
        LDX XWORK ; ready own return stack pointer
        STS SSAVE ; save stack pointer from monitor ROM
        TXS ; move to our own stack (let TXS convert it)
        PSHB ; put return address on own stack
        PSHA ; stack now ready for interrupts
*
        LDAA DWORK ; error handler for fake return
        LDAB DWORK+1
        STAA 0,X ; in the cell beyond empty stack pointer
        STAB 1,X ; prime the return stack with error handler
        STAA 2,X ; second fake return to error handler
        STAB 3,X
*
        LDAA LB_BASE ; bootstrap parameter stack
        LDAB LB_BASE+1
        LDX #PSTKBSX ; Instead of FDB
        STX XWORK
        ADDB XWORK+1 ; initial parameter stack pointer
        ADCA XWORK
        STAA PSP ; parameter stack now ready
        STAB PSP+1
*
        LDAA LB_BASE ; set up heap as if we actually had one
        LDAB LB_BASE+1
        LDX #HBASEX ; Instead of FDB
        STX XWORK
        ADDB XWORK+1 ; calculate EA
        ADCA XWORK
        STAA HPPTR
        STAB HPPTR+1
        STAA HPALL ; as if the heap were functional
        STAB HPALL+1
        LDX #CDBASE
        STX XWORK
        LDAA XWORK
        LDAB XWORK+1
        SUBB #4
        SBCA #0
        STAA HPLIM
        STAB HPLIM+1
        RTS
*
*
***
* General structure of the stacks,
*
* return stack is only the return address
* (and maybe extremely ephemeral temporaries):
* [PRETADR ]
* [RETADR ]
*
* order of elements on the parameter stack,
* when they are present:
* [PARAMETERS ]
* [VARIABLES ]
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES ]
* [TEMPORARIES]
* [PARAMETERS ] <= PSP
*
* Result is returned on parameter stack
*
***
* Utility routines
*
* Could use this in the single-stack no frames example, too.
*LEADPX LDX PSP ; Add D to PSP and load to X
* Add A:B to X through XWORK.
* Returned in both X and A:B.
* But we won't use it, after all.
*ADDDX STX XWORK
* ADDB XWORK+1
* ADDA XWORK
* STAA XWORK
* STAB XWORK+1
* LDX XWORK
* RTS
*
PPOPD   LDX PSP
        LDAA 0,X
        LDAB 1,X
        INX
        INX
        STX PSP
        RTS
*
* This saves bytes:
ALCL2   CLRA
        CLRB ; fall through
*
PPSHD   LDX PSP
        DEX
        DEX
        STX PSP
        STAA 0,X
        STAB 1,X
        RTS
*
* Compromise between speed and reusability
* Returns with PSP in X.
* Enter here to load PSP and initialize to 0
* 8 bytes
ALCL8   CLRA
        CLRB
* Enter here with initial value in A:B
ALCLD8  LDX PSP
* Enter here with PSP loaded and initial value in D
ALCLI8  DEX
        DEX
        STAA 0,X
        STAB 1,X
ALCLI6  DEX
        DEX
        STAA 0,X
        STAB 1,X
ALCLI4  DEX
        DEX
        STAA 0,X
        STAB 1,X
ALCLI2  DEX ; PPSHD usually costs less.
        DEX
        STAA 0,X
        STAB 1,X
        STX PSP
        RTS
*
* six bytes
ALCL6   CLRA
        CLRB
ALCLD6  LDX PSP
        BRA ALCLI6
*
* four bytes
ALCL4   CLRA
        CLRB
ALCLD4  LDX PSP
        BRA ALCLI4
*
* two bytes
*ALCL2 CLRA
* CLRB
* LDX PSP
* BRA ALCLI2
*
*
PDROP8  LDAB #8 ; saves two bytes, 7 vs. 3
PDROP_B CLRA
* Add A:B to PSP -- negative for allocation, positive for deallocation
ADDPSP  ADDB PSP+1
        ADCA PSP
        STAA PSP
        STAB PSP+1
        LDX PSP ; return with X ready
        RTS
*
PDROP6  LDAB #6
        BRA PDROP_B
*
PDROP4  LDAB #4
        BRA PDROP_B
*
PDROP2  LDAB #2 ; JSR is 3 bytes, LDX PSP; INX; INX; STX PSP is 6
        BRA PDROP_B
*
*
***
* Return stack when functions are called by MAIN
* Return stack on entry, after link:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ]
* [RETADR1 ]
*
* Parameter stack when called by MAIN
* with two 32-bit local variables
* and two 16-bit parameters,
* after mark (no local allocation)
* [<unknown>]
* [32:VAR1_1--]
* [32:VAR1_2--]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
****
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
* 16-bit left, right
* output parameter:
* 17-bit sum in 32-bit
ADD16S  LDX PSP
        LDAB #(-1) ; default negative
        TBA
        JSR ALCLI4 ; allocate 2 temporary cells and init (leaves PSP in X)
        TST 6,X ; the left-hand operand sign bit
        BMI ADD16SR
        CLR 2,X ; positive
        CLR 3,X
ADD16SR TST 4,X ; the right-hand operand sign bit
        BMI ADD16SL
        CLR 0,X ; positive
        CLR 1,X
ADD16SL LDAA 6,X ; left hand
        LDAB 7,X
        ADDB 5,X ; right hand
        ADCA 4,X
        STAA 6,X ; store low half
        STAB 7,X
        LDAA 2,X
        LDAB 3,X
        ADCB 1,X
        ADCA 0,X
        STAA 4,X ; store high half
        STAB 5,X
        JSR PDROP4
        RTS
*
* The alternative, without link, mark, or restore?
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
* 16-bit left, right in 2 16-bit
* output parameter:
* 17-bit sum in 32-bit
ADD16U  LDX PSP
        LDAA 2,X ; left
        LDAB 3,X
        ADDB 1,X ; add right
        ADCA 0,X
        STAA 2,X ; save low in left side
        STAB 3,X
        LDAB #0 ; extend
        ADCB #0 ; extend Carry unsigned (could ROL)
        STAB 1,X ; re-use right side to store high half
        CLR 0,X ; only bit 16 of the sum can be set, so the top byte is 0
        RTS
*
* Etc.
*
*
***
* Parameter stack when called by MAIN
* with two 16-bit parameters (pointer and addend),
* after mark (no local allocation)
* [<unknown> ]
* [32:VAR1_1 ]
* [32:VAR1_2 ]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
* Instead of walking the stack, pass in a pointer --
* Add 16-bit signed parameter
* to the caller's 2nd 32-bit internal variable.
* input parameters:
* 16-bit pointer to 32-bit integer
* 16-bit addend
* no output parameter
ADD16SI LDAB #(-1) ; make a temporary -1
        TBA
        JSR PPSHD ; default to signed (leaves PSP in X)
        TST 2,X ; test high byte
        BMI ADD16SP
        CLR 0,X ; zero extend
        CLR 1,X
ADD16SP LDX 4,X ; get pointer to target
        LDAA 2,X ; target low
        LDAB 3,X
        LDX PSP
        ADDB 3,X ; parameter
        ADCA 2,X
        LDX 4,X ; pointer to target
        STAA 2,X ; update low half with result
        STAB 3,X
        LDAA 0,X ; target, high half
        LDAB 1,X
        LDX PSP
        ADCB 1,X ; sign extension half
        ADCA 0,X
        LDX 4,X ; target
        STAA 0,X ; update high half
        STAB 1,X
        JSR PDROP6 ; drop temporary and parameters
        RTS
*
*
***
* Return stack on entry:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ] <= RSP
*
*
* Parameter stack after mark and local allocation
* [<unknown>]
* [VAR1_1--]
* [VAR1_2--] <= PSP
*
MAIN    JSR ALCL8 ; allocate and clear 8 bytes
        LDAA #$12
        LDAB #$34
        JSR PPSHD
        LDAA #$CD
        LDAB #$EF
        JSR PPSHD
        JSR ADD16U ; 32-bit result on parameter stack should be $0000E023
        LDAA #$87 ; ADD16U leaves PSP in X
        LDAB #$65
        STAA 0,X ; reuse low half of result space, overwrite high half
        STAB 1,X
        JSR ADD16S ; result on parameter stack should be $FFFF6788
        LDAA 2,X ; result low half -- ADD16S leaves PSP in X
        LDAB 3,X ; put result away
        STAA 6,X ; to 2nd local variable low half
        STAB 7,X
        LDAA 0,X ; result high half
        LDAB 1,X
        STAA 4,X ; to 2nd local variable high half
        STAB 5,X
        STX XWORK ; instead of JSR ADDDX:
        LDAB XWORK+1 ; LDAB #4; CLRA; JSR ADDDX; LDX PSP; STAB 3,X; STAA 2,X
        LDAA XWORK ; Moving results around takes a lot of code,
        ADDB #4 ; So just do it here.
        ADCA #0
        STAB 3,X
        STAA 2,X
        LDAA #$A5
        TAB ; don't really need to use both, just making things clear.
        STAA 0,X
        STAB 1,X
        JSR ADD16SI ; result in 2nd variable should be FFFF0D2D (Carry set)
        LDAA 2,X ; 2nd variable low half -- ADD16SI leaves PSP in X
        LDAB 3,X
        LDX LB_BASE
        STAA FINALX+2,X
        STAB FINALX+3,X
        LDX PSP
        LDAA 0,X
        LDAB 1,X
        LDX LB_BASE
        STAA FINALX,X
        STAB FINALX+1,X
        JSR PDROP8 ; ADD16SI also dropped its arguments for us, so only locals
        RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
*
***
* Return stack will only contain return addresses (and very ephemeral temporaries):
* [RETADRNN ]
*
* Return stack after initialization:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS <= RSP
*
*
* Parameter stack after initialization:
* [<unknown>]PSTKBAS <= PSP
*
START   JSR INISTKS
*
        JSR MAIN
*
*
DONE    NOP
ERROR   NOP ; define error labels as something not DONE, anyway
SSTKNDR NOP
        LDS SSAVE ; restore the monitor stack pointer
        NOP
        NOP
        NOP ; another landing pad to set breakpoint at
        NOP
        LDX $FFFE
        JMP 0,X ; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor,
* but not necessarily to your program in a runnable state.
As always, I have tested this code, and it produces the correct results without stack frames, passing both input and return parameters on the stack, except for utility routines which use lower level register protocols not available to higher-level routines.
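As a cross-check on the expected values noted in the comments of MAIN, here is a rough Python model of the same sequence of calls. It is only a sketch under simplifying assumptions: memory is modeled as 16-bit cells instead of bytes, pointers are cell indexes, and the names Machine, widen, push32, top32 and run_main are hypothetical helpers that do not appear in the assembly. What it keeps from the listing is the discipline: parameters go onto the parameter stack, results come back on it in place of the parameters, and the pointer-passing routine drops its own arguments.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


class Machine:
    """Hypothetical model: memory as 16-bit cells, parameter stack growing down."""

    def __init__(self, size=16):
        self.mem = [0] * size
        self.psp = size                  # empty stack: PSP just past the end

    def push(self, value):
        self.psp -= 1                    # pre-decrement, then store
        self.mem[self.psp] = value & MASK16

    def pop(self):
        value = self.mem[self.psp]
        self.psp += 1
        return value

    def push32(self, value):
        self.push(value)                 # low half goes deeper
        self.push(value >> 16)           # high half ends up on top

    def top32(self):
        return (self.mem[self.psp] << 16) | self.mem[self.psp + 1]


def widen(value):
    """Sign-extend a 16-bit value to 32 bits."""
    return value | 0xFFFF0000 if value & 0x8000 else value


def add16u(m):
    right, left = m.pop(), m.pop()
    m.push32(left + right)               # 17-bit sum in a 32-bit result


def add16s(m):
    right, left = m.pop(), m.pop()
    m.push32((widen(left) + widen(right)) & MASK32)


def add16si(m):
    addend, ptr = m.pop(), m.pop()       # the routine drops its own arguments
    target = (m.mem[ptr] << 16) | m.mem[ptr + 1]
    total = (target + widen(addend)) & MASK32
    m.mem[ptr], m.mem[ptr + 1] = total >> 16, total & MASK16


def run_main():
    m = Machine()
    for _ in range(4):                   # two zeroed 32-bit locals, like ALCL8
        m.push(0)
    var2 = m.psp                         # cell address of the 2nd local
    m.push(0x1234)
    m.push(0xCDEF)
    add16u(m)
    r1 = m.top32()
    m.mem[m.psp] = 0x8765                # overwrite high half, keep low as left operand
    add16s(m)
    r2 = m.top32()
    m.mem[var2], m.mem[var2 + 1] = m.mem[m.psp], m.mem[m.psp + 1]
    m.mem[m.psp + 1] = var2              # pointer to the 2nd local
    m.mem[m.psp] = 0xA5A5                # addend on top
    add16si(m)
    r3 = (m.mem[var2] << 16) | m.mem[var2 + 1]
    return r1, r2, r3, m.psp == var2     # last item: only the locals remain


r1, r2, r3, balanced = run_main()
print(f"${r1:08X} ${r2:08X} ${r3:08X} balanced={balanced}")
```

The three values should match the ones noted beside the JSR ADD16U, JSR ADD16S and JSR ADD16SI lines in MAIN.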
I will be pointing you back here later. If this talk about stack frames and parameter passing methods seems a little fuzzy at this point, it's okay to move ahead for now.
You may want to move ahead with getting numeric output in binary, or you might want to see how single-stack, no-frame parameter passing works on the 6801, next.