Wednesday, October 23, 2024

ALPP 02-19 -- Ascending the Wrong Island -- Single-stack Stack Frame Example: 6809

Well, this one did sit at the bottom of the pool for a little while. Real-world stuff interfered. Concrete examples are still useful.

This is more concrete work toward elucidating the problems with single-stack stack frames on the 6801. Here I'm translating the concrete example for the 68000 to the 6809.

Again, I do not recommend a single-stack discipline. But most of the current "modern" software engineering infrastructure is built on this discipline, so it helps to have code that allows us to compare the single-stack approach with the split-stack approach. I am providing examples of both for the 6809 here, with the split-stack example following the single-stack example.

With the 6809 version written and checked, it should become possible to write a concrete example for the 6801 and even the 6800.

Again, I'm leaving the discussion for the comments, in the (not quite realistic) hope that the comments will be more accurate than free-form prose.

* 16-bit addition as example of single-stack stack frame discipline on 6809
* using the direct page,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$2000	; MDOS says this is a good place for usr stuff.
*	SETDP	$20	; for some other assemblers
	SETDP	$2000	; for EXORsim
*
ENTRY	LBRA	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP
SSAVE	RMB	2	; a place to keep S so we can return clean
SSAVEX	EQU	6	; manufacture offsets for assemblers that can't do SSAVE-ENTRY
USAVE	RMB	2	; just for kicks, save U, too
USAVEX	EQU	SSAVEX+2
DPSAVE	RMB	2	; a place to keep DP so we can return clean
DPSAVEX	EQU	USAVEX+2
	RMB	4	; bumper
XWORK	RMB	2	; For saving an index register temporarily
XWORKX	EQU	DPSAVEX+6
HPPTR	RMB	2	; heap pointer (not yet managed)
HPPTRX	EQU	XWORKX+2
HPALL	RMB	2	; heap allocation pointer
HPALLX	EQU	HPPTRX+2
	RMB	4	; bumper
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	HPALLX+6
GAP1	RMB	2	; Mark the bottom of the gap
GAP1X	EQU	FINALX+4
*
LB_ADDR	EQU	ENTRY
*
*
	SETDP	0	; Not yet set up
	ORG	$2100	; Give the DP room.
	RMB	4	; a little bumper space
SSTKLIM	RMB	96	; roughly 16 levels of call
SSTKLIMX	EQU	$104
* 			; 6809 is pre-dec (pre-store-decrement) push
SSTKBAS	RMB	6	; for canary return
SSTKBASX	EQU	SSTKLIMX+96
SSTKFAK	RMB	2	; fake frame pointer, self-link
SSTKFAKX	EQU	SSTKBASX+6
SSTKBMP	RMB	4	; a little bumper space
SSTKBMPX	EQU	SSTKFAKX+2
*
HBASE	RMB	$1024		; Not using or managing heap yet.
HBASEX	EQU	SSTKBMPX+4
HLIM	RMB	4	; bumper
HLIMX	EQU	HBASEX+$1024
*
*
* If we had DP relative in postbyte,
* and if DP were defined for 2-byte transfers as DP:00,
* we could do this:
*INISTK	LEAX	0,DP
*	LEAY	ENTRY,PCR	; Set up new DP base
*	TFR	Y,DP		; I think this would actually work, but isn't documented.
*	STX	<DPSAVE
* (If wishes were fishes ....)
* Calculate DP because we don't have DP relative in index postbyte:
INISTK	TFR	DP,A
	CLRB
	TFR	D,X		; save old DP base for a moment
	LEAY	ENTRY,PCR	; Set up new DP base
	TFR	Y,D
	TFR	A,DP		; Now we can access DP variables correctly.
*	SETDP	$20	; some other assemblers
	SETDP	$2000	; EXORsim
	STX	<DPSAVE		; technically only need to save high byte
	STU	<USAVE
	PULS	X		; get return address
	STS	<SSAVE		; Save what the monitor gave us.
	LEAS	SSTKFAKX,Y	; Move to our own stack
	STS	,S		; self-link as fake frame pointer
	LEAY	STKUNDR,PCR	; fake return to stack underflow handler
	PSHS	Y		; Using U would conflict with frame pointer use
*	STS	,--S		; This would not work even if emulated correctly
	LEAU	-2,S		; self-link as fake frame pointer
	PSHS	U		; U is FP, S and U equal
	PSHS	Y		; one more fake return to handler
* Because we don't have DP (long) relative in postbyte,
* and can't do
*	LEAY	HBASEX,DP
* calculate it:
	CLRB			; A still has run-time DP
	ADDD	#HBASEX		; calculate EA
	TFR	D,Y		; as if we actually had a heap
	STY	<HPPTR
	STY	<HPALL
	JMP	,X	; return via X
*
***
* Stack after link (PSHS U; TFR S,U) when functions are called by MAIN
* with two parameters
* (no local allocation in the callee)
* We will return the result in D:X
* [<SELF>  ] <= <SELF>
* [STKUNDR ]
* [<SELF>  ] <= <SELF>,FRMPTRX
* [STKUNDR ]SSTKBAS
* [FRMPTRX=SSTKBAS+NATWID ] <= FRMPTR0
* [RETADR0 ] 
* [FRMPTR0 ] <= FRMPTR1
* [--------]
* [--------] 
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ] 
* [FRMPTR1 ] <= FP,SP
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left 1st pushed, right 2nd
* output parameter:
*   17-bit sum in 32-bit D:X D high, X low
ADD16S	PSHS	U	; mark
	TFR	S,U	; link, no allocate
	LDX	#-1	; sign extend right
	TST	4,U	; sign bit, anyway
	BMI	ADD16SR
	LEAX	1,X	; 0
ADD16SR	PSHS	X	; push right extension
	LDX	#-1	; negative
	LDD	6,U	; left
	BMI	ADD16SL
	LEAX	1,X	; 0
ADD16SL	PSHS	X	; push left extension
	ADDD	4,U	; add right
	TFR	D,X	; save low
	PULS	D	; get left sign extension
	ADCB	1,S	; carry is still safe
	ADCA	,S	; high word complete
	TFR	U,S	; result is in D:X	
	PULS	U	; unlink
	RTS		; C, N valid, Z not valid
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit D:X D high
ADD16U	PSHS	U	; mark
	TFR	S,U	; link, no allocate
	LDD	6,U	; left
	ADDD	4,U	; add right
	TFR	D,X	; save low
	LDD	#0	; extend
	ADCB	#0	; extend Carry unsigned (could ROL in)
	TFR	U,S	; unlink (unnecessary here, but ...)
	PULS	U
	RTS		; C, N valid, Z not valid
*
* Etc.
*
***
* Stack after link (PSHS U; TFR S,U) when functions are called by MAIN
* with one parameter
* (no local allocation in the callee)
* No result is returned; the caller's variable is updated in place
* [<SELF>  ] <= <SELF>
* [STKUNDR ]
* [<SELF>  ] <= <SELF>,FRMPTRX
* [STKUNDR ]SSTKBAS
* [FRMPTRX=SSTKBAS+NATWID ] <= FRMPTR0
* [RETADR0 ] 
* [FRMPTR0 ] <= FRMPTR1
* [VAR1_1--]
* [VAR1_2--] 
* [PARAM2_1]
* [RETADR1 ] 
* [FRMPTR1 ] <= FP,SP
*
* To show how to walk the stack --
* Add 16-bit signed parameter
* to the caller's 2nd 32-bit local variable.
* input parameter:
*   16-bit addend
* target variable in caller:
*   2nd 32-bit variable at offset -4*NATWID (-8) from caller's FP
* no output parameter:
SUB16SI	PSHS	U	; mark
	TFR	S,U	; link, no allocate
	LDY	,U	; get caller's FP back
	LDX	#-1	; sign extend (only) parameter
	TST	4,U
	BMI	SUB16SIP
	LEAX	1,X
SUB16SIP	PSHS	X
	LDD	-6,Y	; caller's 2nd variable, low
	ADDD	4,U	; 1st (only) parameter
	STD	-6,Y	; update low half
	LDD	-8,Y	; caller's 2nd variable, high
	ADCB	1,S
	ADCA	,S
	STD	-8,Y
	TFR	U,S	; unlink
	PULS	U
	RTS		; C, N valid, Z not valid
*
*
***
* Stack after link and local allocation in MAIN
* [<SELF>  ] <= <SELF>
* [STKUNDR ]
* [<SELF>  ] <= <SELF>,FRMPTRX
* [STKUNDR ]SSTKBAS
* [FRMPTRX=SSTKBAS+NATWID ] <= FRMPTR0
* [RETADR0 ] 
* [FRMPTR0 ] <= FP
* [32:VAR1_1]
* [32:VAR1_2] <= SP
*
MAIN	PSHS	U	; mark
	TFR	S,U	; link
*	LEAS	-8,S	; allocate 2 32-bit variables
*	LDD	#0	; (showing how to access)
*	STD	-8,U	; clear the variables
*	STD	-6,U	; there is a slightly faster way
*	STD	-4,U
*	STD	-2,U
* slightly faster, fewer bytes, too:
	LDD	#0
	TFR	D,X
	PSHS	D,X
	PSHS	D,X
*
	LDX	#$1234
*	PSHS	X	; yes we could push D and X together
	LDD	#$CDEF
*	PSHS	D
	PSHS	D,X
	LBSR	ADD16U	; result in D:X should be $0000E023
	LEAS	4,S	; could reuse instead of dropping
*	PSHS	X
	LDD	#$8765
*	PSHS	D
	PSHS	D,X	; left = $E023 (low word still in X), right = $8765
	LBSR	ADD16S	; result in D:X should be $FFFF6788 (and carry set)
	LEAS	4,S
	STD	-8,U
	STX	-6,U
	LDD	#$A5A5
	PSHS	D
	LBSR	SUB16SI		; result in 2nd variable should be $FFFF0D2D (Carry set)
	LDD	-8,U
	STD	<FINAL
	LDD	-6,U
	STD	<FINAL+2
	TFR	U,S	; unlink: drop locals and the leftover $A5A5 parameter
	PULS	U
	RTS		; C, N valid, Z not valid
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP (S)
***
* (who knows?) <= FP (U)
***
*
* Stack after initialization:
* [<SELF>  ] <= <SELF>
* [STKUNDR ]
* [<SELF>  ] <= <SELF>,FP
* [STKUNDR ]SSTKBAS <= SP
***
* Stack after link (at call to MAIN)
* [<SELF>  ] <= <SELF>
* [STKUNDR ]
* [<SELF>  ] <= <SELF>,FRMPTRX
* [STKUNDR ]SSTKBAS
* [FRMPTRX=SSTKBAS+NATWID ] <= SP,FP
*
START	NOP
	LBSR	INISTK
	NOP
*
	PSHS	U	; mark
	TFR	S,U	; link
*
	LBSR	MAIN
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
STKUNDR	NOP
	LDS	<SSAVE	; restore the monitor stack pointer
	LDU	<USAVE	; restore U
	LDD	<DPSAVE	; restore the monitor DP last
	TFR	A,DP
	SETDP	0	; For lack of a better way to set it.
	NOP
	NOP		; landing pad to set breakpoint at
	NOP
	NOP
	JMP	[$FFFE]	; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.
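The expected values in MAIN's comments can be checked outside the simulator. Here is a small Python model (not 6809 code, and not part of the original listing) of the arithmetic that ADD16U, ADD16S and SUB16SI perform: sign-extend each 16-bit operand to 32 bits the way the pushed $FFFF/$0000 words do, add, and keep 32 bits plus the carry out of the top byte. The function names only mirror the listing's labels. Note that the left operand of the ADD16S call is $E023, the low word of the ADD16U result still sitting in X, not $1234.

```python
# Python model (not 6809 code) of the arithmetic in ADD16U, ADD16S, SUB16SI.
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def to32(word):
    """Sign-extend a 16-bit word to 32 bits, as the pushed $FFFF/$0000 words do."""
    extension = MASK16 if word & 0x8000 else 0
    return (extension << 16) | (word & MASK16)


def add16u(left, right):
    """ADD16U: unsigned 16 + 16; the high word is just the carry."""
    return ((left & MASK16) + (right & MASK16)) & MASK32


def add16s(left, right):
    """ADD16S: sign-extend both, add; also return the carry out of the top byte."""
    total = to32(left) + to32(right)
    return total & MASK32, total > MASK32


def sub16si(target32, addend):
    """SUB16SI: add a sign-extended 16-bit addend to a 32-bit variable."""
    total = (target32 & MASK32) + to32(addend)
    return total & MASK32, total > MASK32


first = add16u(0x1234, 0xCDEF)
second, carry2 = add16s(first & MASK16, 0x8765)  # left is $E023, still in X
third, carry3 = sub16si(second, 0xA5A5)
print(f"${first:08X}")                    # $0000E023
print(f"${second:08X} carry={carry2}")    # $FFFF6788 carry=True
print(f"${third:08X} carry={carry3}")     # $FFFF0D2D carry=True
```

Because both operands are sign-extended before the add, a 17-bit signed sum such as $7FFF + $7FFF comes out as $0000FFFE with no carry, which is the "without losing precision" point in ADD16S's header comment.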

Note, in the above, that separating the STKUNDR and ERROR labels from DONE (each labels its own NOP) makes it possible to set a breakpoint at DONE that will not be hit if the code fails to finish properly. That would be the case if it returned to STKUNDR via a fake return or (hypothetically) jumped to ERROR.

Again, I've tested the code. It runs. It builds the stack frames and tears them down as advertised. And, as always, I will not guarantee that this code can be generalized. Nor will I guarantee that it can be generated by any real compiler.
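To make the single-stack frame layout concrete without an emulator, here is a byte-level Python sketch (again a model, not 6809 code). It assumes ENTRY lands at $2000, so the fake frames built by INISTK sit at $2164 through $216B, and it uses made-up placeholder addresses for STKUNDR and the return addresses. It shows SUB16SI finding its parameter at 4,U, walking to the caller's frame through the saved U at ,U, and one unbalanced unlink-and-return out of START's frame landing on the fake return to STKUNDR.

```python
# Byte-level model (not 6809 code) of the single-stack frames in the listing.
# Assumptions: ENTRY is at $2000; STKUNDR and the return addresses below are
# hypothetical placeholders, not values from the real assembly.
SSTKFAK = 0x216A    # SSTKFAKX + $2000
STKUNDR = 0x2300    # hypothetical address of the STKUNDR label
RET_MAIN = 0x2280   # hypothetical return address into START
RET_SUB = 0x2290    # hypothetical return address into MAIN


class Machine:
    def __init__(self):
        self.mem = bytearray(0x10000)
        self.s = 0  # hardware stack pointer S
        self.u = 0  # frame pointer U

    def read16(self, addr):
        # 6809 words are big-endian: high byte at the lower address
        return (self.mem[addr] << 8) | self.mem[addr + 1]

    def write16(self, addr, value):
        self.mem[addr] = (value >> 8) & 0xFF
        self.mem[addr + 1] = value & 0xFF

    def push16(self, value):
        # PSHS pre-decrements S, then stores
        self.s -= 2
        self.write16(self.s, value)

    def pull16(self):
        value = self.read16(self.s)
        self.s += 2
        return value

    def link(self):
        # PSHS U ; TFR S,U
        self.push16(self.u)
        self.u = self.s

    def unlink(self):
        # TFR U,S ; PULS U
        self.s = self.u
        self.u = self.pull16()


def init_stack(m):
    """Like INISTK: a self-linked fake frame and two fake returns to STKUNDR."""
    m.s = SSTKFAK
    m.write16(m.s, m.s)   # STS ,S   (self-link)
    m.push16(STKUNDR)     # PSHS Y
    m.u = m.s - 2         # LEAU -2,S
    m.push16(m.u)         # PSHS U   (self-link, S == U)
    m.push16(STKUNDR)     # PSHS Y   (S == SSTKBAS)


def sub16si(m):
    """Like SUB16SI: add the parameter at 4,U to the caller's variable at -8."""
    m.link()
    caller_fp = m.read16(m.u)                  # LDY ,U
    addend = m.read16(m.u + 4)
    addend32 = ((0xFFFF if addend & 0x8000 else 0) << 16) | addend
    target = (m.read16(caller_fp - 8) << 16) | m.read16(caller_fp - 6)
    total = (target + addend32) & 0xFFFFFFFF
    m.write16(caller_fp - 8, total >> 16)      # STD -8,Y
    m.write16(caller_fp - 6, total & 0xFFFF)   # STD -6,Y
    m.unlink()
    return m.pull16()                          # RTS


m = Machine()
init_stack(m)
m.link()                         # START: PSHS U ; TFR S,U
m.push16(RET_MAIN)               # LBSR MAIN
m.link()                         # MAIN:  PSHS U ; TFR S,U
main_fp = m.u
m.s -= 8                         # two 32-bit locals (memory starts zeroed)
m.write16(main_fp - 8, 0xFFFF)   # 2nd variable = $FFFF6788, as after ADD16S
m.write16(main_fp - 6, 0x6788)
m.push16(0xA5A5)                 # PSHS D
m.push16(RET_SUB)                # LBSR SUB16SI
ret_sub = sub16si(m)
result = (m.read16(main_fp - 8) << 16) | m.read16(main_fp - 6)
m.unlink()                       # MAIN unlinks and returns ...
ret_main = m.pull16()
m.unlink()                       # ... then one unbalanced unlink/RTS in START
underflow_target = m.pull16()
print(f"MAIN FP ${main_fp:04X}, result ${result:08X}")  # MAIN FP $215E, result $FFFF0D2D
print(underflow_target == STKUNDR)                       # True
```

Unwinding once more from there would pop the self-linked frame at $2166 and land on STKUNDR again, which is why INISTK plants two fake returns.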

Again, for comparison and for grins, let's see what it might look like with split stacks and a literal frame pointer.

* 16-bit addition as example of split-stack stack frame discipline on 6809
* using the direct page,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$2000	; MDOS says this is a good place for usr stuff.
*	SETDP	$20	; for lwasm and some other assemblers
	SETDP	$2000	; for EXORsim and some other assemblers
*
ENTRY	LBRA	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP
SSAVE	RMB	2	; a place to keep S so we can return clean
SSAVEX	EQU	6	; manufacture offsets for assemblers that can't do SSAVE-ENTRY
USAVE	RMB	2	; just for kicks, save U, too
USAVEX	EQU	SSAVEX+2
DPSAVE	RMB	2	; a place to keep DP so we can return clean
DPSAVEX	EQU	USAVEX+2
	RMB	4	; bumper
XWORK	RMB	2	; For saving an index register temporarily
XWORKX	EQU	DPSAVEX+6
FMTMP	RMB	2	; For saving the stack mark in Y temporarily
FMTMPX	EQU	XWORKX+2
HPPTR	RMB	2	; heap pointer (not yet managed)
HPPTRX	EQU	FMTMPX+2
HPALL	RMB	2	; heap allocation pointer
HPALLX	EQU	HPPTRX+2
	RMB	4	; bumper
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	HPALLX+6
GAP1	RMB	2	; Mark the bottom of the gap
GAP1X	EQU	FINALX+4
*
LB_ADDR	EQU	ENTRY
*
*
	SETDP	0	; Not yet set up
	ORG	$2100	; Give the DP room.
	RMB	4	; a little bumper space
SSTKLIM	RMB	96	; roughly 24 levels of call at 4 bytes per level
SSTKLIMX	EQU	$104
* 			; 6809 is pre-dec (pre-store-decrement) push
SSTKBAS	RMB	6	; for canary return
SSTKBASX	EQU	SSTKLIMX+96
SSTKFAK	RMB	2	; fake frame pointer, self-link
SSTKFAKX	EQU	SSTKBASX+6
SSTKBMP	RMB	4	; a little bumper space
SSTKBMPX	EQU	SSTKFAKX+2
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKLIMX	EQU	SSTKBMPX+4
PSTKBAS	RMB	4	; bumper space -- parameter stack is pre-dec
PSTKBASX	EQU	PSTKLIMX+64
*
HBASE	RMB	$1024		; Not using or managing heap yet.
HBASEX	EQU	PSTKBASX+4
HLIM	RMB	4	; bumper
HLIMX	EQU	HBASEX+$1024
*
*
* If we had DP relative in postbyte,
* and if DP were defined for 2-byte transfers as DP:00,
* we could do this:
*INISTK	LEAX	0,DP
*	LEAY	ENTRY,PCR	; Set up new DP base
*	TFR	Y,DP		; I think this would actually work, but isn't documented.
*	STX	<DPSAVE
* (If wishes were fishes ....)
* Calculate DP because we don't have DP relative in index postbyte:
INISTKS	TFR	DP,A
	CLRB
	TFR	D,X		; save old DP base for a moment
	LEAY	ENTRY,PCR	; Set up new DP base
	TFR	Y,D
	TFR	A,DP		; Now we can access DP variables correctly.
*	SETDP	$20	; some other assemblers
	SETDP	$2000	; EXORsim
	STX	<DPSAVE		; technically only need to save high byte
	STU	<USAVE
	PULS	X		; get return address
	STS	<SSAVE		; Save what the monitor gave us.
	LEAS	SSTKFAKX,Y	; Move to our own return stack
	LEAU	PSTKBASX,Y	; and our own parameter stack
	LEAY	STKUNDR,PCR	; fake return to stack underflow handler
	PSHS	Y
	PSHS	U		; fake link to empty stack
	PSHS	Y		; one more fake return to stack underflow handler
	CLRB			; A still has run-time DP
	ADDD	#HBASEX		; calculate EA
	TFR	D,Y		; as if we actually had a heap
	STY	<HPPTR
	STY	<HPALL
	JMP	,X	; return via X
*
*
***
* Return stack when functions are called by MAIN
* Return stack on entry:
* [STKUNDR ]
* [<EMPTYP>]
* [STKUNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>]
* [RETADR0 ]
* [FRMPTR0==<EMPTYP>]
* [RETADR1 ] <= RSP
*
* Return stack after link:
* [STKUNDR ]
* [<EMPTYP>]
* [STKUNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>]
* [RETADR0 ]
* [FRMPTR0==<EMPTYP>]
* [RETADR1 ]
* [FRMPTR1 ] <= RSP
*
* Parameter stack when called by MAIN
* with two 32-bit local variables
* and two 16-bit parameters,
* after mark (no local allocation)
* [<unknown>] <= FRMPTR0,FRMPTR1
* [32:VAR1_1--]
* [32:VAR1_2--]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP,FP
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left pushed 1st, right 2nd
* output parameter:
*   17-bit sum in 32-bit, left on the parameter stack in place of the inputs
ADD16S	PSHS	Y	; link, mark, and restore could be optimized out.
	TFR	U,Y	; mark
	LDX	#-1	; sign extend right
	TST	,Y	; sign bit, anyway (Use Y to show it can be used.)
	BMI	ADD16SR
	LEAX	1,X	; 0
ADD16SR	PSHU	X	; push right extension
	LDX	#-1	; negative
	LDD	2,Y	; left
	BMI	ADD16SL
	LEAX	1,X	; 0
ADD16SL	PSHU	X	; push left extension
	ADDD	,Y	; add right
	STD	2,Y	; save low
	PULU	D	; get left sign extension
	ADCB	1,U	; carry is still safe
	ADCA	,U++	; high word complete, tricky postinc
	STD	,Y
	PULS	Y	; restore FP
	RTS		; C, N valid, Z not valid
*
* Alternative: no link, mark, or restore:
*ADD16S	LDX	#-1	; sign extend right
*	TST	,U	; sign bit, anyway
*	BMI	ADD16SR
*	LEAX	1,X	; 0
*ADD16SR	PSHU	X	; push right extension
*	LDX	#-1	; negative
*	LDD	4,U	; left
*	BMI	ADD16SL
*	LEAX	1,X	; 0
*ADD16SL	PSHU	X	; push left extension
*	ADDD	4,U	; add right
*	STD	6,U	; save low
*	PULU	D	; get left sign extension
*	ADCB	1,U	; carry is still safe
*	ADCA	,U++	; high word complete, sneaky postinc
*	STD	,U
*	RTS		; C, N valid, Z not valid
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right (32 bits of parameter stack)
* output parameter:
*   17-bit sum in 32-bit, left on the parameter stack in place of the inputs
ADD16U	PSHS	Y	; link, mark, and restore could be optimized out.
	TFR	U,Y	; mark
	LDD	2,Y	; left
	ADDD	,Y	; add right
	STD	2,Y	; save low
	LDD	#0	; extend
	ROLB		; extend Carry unsigned (could ADCB #0)
	STD	,Y
	PULS	Y	; restore FP
	RTS		; C, N valid, Z not valid
*
* Etc.
*
*
***
* Parameter stack when called by MAIN
* with one 16-bit parameter,
* after mark (no local allocation)
* [<unknown>] <= FRMPTR0,FRMPTR1
* [32:VAR1_1--]
* [32:VAR1_2--]
* [16:PARAM2_1] <= PSP,FP
*
* To show how to walk the stack --
* Add 16-bit signed parameter
* to the caller's 2nd 32-bit local variable.
* input parameter:
*   16-bit addend (sign-extended to 32 bits internally)
* target variable in caller:
*   2nd 32-bit variable at offset -4*NATWID (-8) from caller's FP
* no output parameter:
SUB16SI	PSHS	Y	; link, mark, and restore could be optimized out.
	TFR	U,Y	; mark
	LDX	,S	; get caller's FP back
	LDD	#-1	; sign extend (single) parameter
	TST	,Y
	BMI	SUB16SIP
	LDD	#0
SUB16SIP	PSHU	D	; save sign extension
	LDD	-6,X	; caller's 2nd variable, low
	ADDD	,Y	; single parameter
	STD	-6,X	; update low half
	LDD	-8,X	; caller's 2nd variable, high
	ADCB	1,U	; sign extension low byte
	ADCA	,U	; high byte
	STD	-8,X	; store result
	LEAU	2,Y	; drop sign extension and the parameter
	PULS	Y	; restore FP
	RTS		; C, N valid, Z not valid
*
*
*
***
* Return stack on entry:
* [STKUNDR ]
* [<EMPTYP>]
* [STKUNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>]
* [RETADR0 ] <= RSP
*
* Return stack after link:
* [STKUNDR ]
* [<EMPTYP>]
* [STKUNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>]
* [RETADR0 ]
* [FRMPTR0==<EMPTYP>] <= RSP
*
* Parameter stack after mark and local allocation
* [<unknown>] <= FP,FRMPTR0
* [VAR1_1--]
* [VAR1_2--] <= PSP
*
MAIN	PSHS	Y	; link
	TFR	U,Y	; mark
	LDD	#0	; allocate and initialize
	TFR	D,X
	PSHU	D,X
	PSHU	D,X
	LDX	#$1234
	LDD	#$CDEF
	PSHU	D,X
	LBSR	ADD16U	; 32-bit result on parameter stack should be $0000E023
	LEAU	2,U	; drop high part (could be optimized out).
	LDD	#$8765
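	PSHU	D	; right = $8765; left is the $E023 still on the stack
	LBSR	ADD16S	; result on parameter stack should be $FFFF6788 (and carry set)
	PULU	D,X	; D = high word, X = low word
	STX	-6,Y	; save into 2nd local variable, low
	STD	-8,Y	; and high
	LDD	#$A5A5
	PSHU	D
	LBSR	SUB16SI		; result in 2nd variable should be $FFFF0D2D (Carry set)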
	LDD	-6,Y
	STD	<FINAL+2
	LDD	-8,Y
	STD	<FINAL
	TFR	Y,U	; unmark: drop locals and leftovers from the parameter stack
	PULS	Y	; restore caller's FP
	RTS
*
*
***
* Stacks at START:
* (what BIOS/OS gave us) <= RSP (S)
***
* (who knows?) <= PSP (U), FP (Y)
***
*
***
* Return stack will always be in pairs:
* [RETADRNN  ]
* [CALLERFMNN]
*
* Return stack after initialization:
* [STKUNDR ]
* [<EMPTYP>]
* [STKUNDR ]SSTKBAS <= RSP
*
* Return stack after link:
* [STKUNDR ]
* [<EMPTYP>]
* [STKUNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>] <= RSP
*
* Parameter stack after initialization, mark:
* [<unknown>] <= PSP,FP==<EMPTYP>
*
START	LBSR	INISTKS
	PSHS	U	; link
	TFR	U,Y	; mark in Y (will often not be used).
*
	LBSR	MAIN
*
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
STKUNDR	NOP
	LDS	<SSAVE	; restore the monitor stack pointer
	LDU	<USAVE	; restore U
	LDD	<DPSAVE	; restore the monitor DP last
	TFR	A,DP
	SETDP	0	; For lack of a better way to set it.
	NOP
	NOP		; landing pad to set breakpoint at
	NOP
	NOP
	JMP	[$FFFE]	; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.
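For the split-stack version, the same kind of Python sketch (a model under the same $2000 assumption, with a hypothetical placeholder return address) keeps return addresses and saved frame pointers on S and everything else on U. The link/mark/restore in ADD16U and ADD16S is left out, as the listing's own alternative ADD16S shows it can be; SUB16SI keeps it, because it reads the saved Y back from ,S to find MAIN's frame. Parameter cleanup after SUB16SI is not modeled, since it does not affect the result.

```python
# Python model (not 6809 code) of the split-stack MAIN in the listing above.
# Assumptions: ENTRY is at $2000 (so SSTKBAS is $2164 and the empty parameter
# stack is $21B0); RET is a hypothetical placeholder return address.
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF
RET = 0x2290


class SplitMachine:
    def __init__(self, rsp, psp):
        self.mem = bytearray(0x10000)
        self.s = rsp   # return stack pointer (S)
        self.u = psp   # parameter stack pointer (U)
        self.y = 0     # frame pointer / mark (Y), unknown at START

    def read16(self, addr):
        return (self.mem[addr] << 8) | self.mem[addr + 1]   # big-endian

    def write16(self, addr, value):
        self.mem[addr] = (value >> 8) & 0xFF
        self.mem[addr + 1] = value & 0xFF

    def pshs(self, value):   # pre-decrement S, then store
        self.s -= 2
        self.write16(self.s, value)

    def puls(self):
        value = self.read16(self.s)
        self.s += 2
        return value

    def pshu(self, value):   # same discipline on U
        self.u -= 2
        self.write16(self.u, value)

    def pulu(self):
        value = self.read16(self.u)
        self.u += 2
        return value


def to32(word):
    return ((MASK16 if word & 0x8000 else 0) << 16) | word


def replace_with_sum(m, signed):
    """ADD16S/ADD16U: overwrite left (2,U) and right (,U) with the 32-bit sum."""
    right, left = m.read16(m.u), m.read16(m.u + 2)
    total = (to32(left) + to32(right)) if signed else (left + right)
    total &= MASK32
    m.write16(m.u + 2, total & MASK16)   # low word where left was
    m.write16(m.u, total >> 16)          # high word on top


def sub16si(m):
    """SUB16SI: find MAIN's frame through the FP saved on the return stack."""
    m.pshs(m.y)                  # PSHS Y
    m.y = m.u                    # TFR U,Y
    caller_fp = m.read16(m.s)    # LDX ,S
    target = (m.read16(caller_fp - 8) << 16) | m.read16(caller_fp - 6)
    total = (target + to32(m.read16(m.y))) & MASK32
    m.write16(caller_fp - 8, total >> 16)
    m.write16(caller_fp - 6, total & MASK16)
    m.y = m.puls()               # PULS Y


def call(m, routine, *args):
    """LBSR ... RTS: the return address lives only on the return stack."""
    m.pshs(RET)
    routine(m, *args)
    return m.puls()


m = SplitMachine(rsp=0x2164, psp=0x21B0)
m.pshs(m.u)          # START: PSHS U (link)
m.y = m.u            #        TFR U,Y (mark)
m.pshs(RET)          # LBSR MAIN
m.pshs(m.y)          # MAIN: PSHS Y
m.y = m.u            #       TFR U,Y
main_fp = m.y
for _ in range(4):
    m.pshu(0)        # PSHU D,X twice: two zeroed 32-bit locals
m.pshu(0x1234)       # PSHU D,X pushes X (left) first ...
m.pshu(0xCDEF)       # ... then D (right)
call(m, replace_with_sum, False)
first = (m.read16(m.u) << 16) | m.read16(m.u + 2)
m.u += 2             # LEAU 2,U: drop the high word, keep $E023 as left
m.pshu(0x8765)
call(m, replace_with_sum, True)
high = m.pulu()      # PULU D,X
low = m.pulu()
m.write16(main_fp - 6, low)
m.write16(main_fp - 8, high)
m.pshu(0xA5A5)
call(m, sub16si)
final = (m.read16(main_fp - 8) << 16) | m.read16(main_fp - 6)
print(f"${first:08X} ${(high << 16) | low:08X} ${final:08X}")
# prints: $0000E023 $FFFF6788 $FFFF0D2D
```

In this model the return stack only ever receives return addresses and saved frame pointers; every operand, intermediate value and result travels through U.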

As a reminder, we've already seen what this kind of code looks like without stack frames.

At some point relatively soon, I'm going to try to convert this example to the 6801. It should be concrete enough. But the 6801 version will involve a lot of pointer manipulation done the hard way, so I'm going to do another chapter on address math first. If you aren't interested in long sequences of INX and DEX, go ahead and move on to getting numeric output in binary.

