Wednesday, November 27, 2024

ALPP 02-30 -- Ascending the Right Island -- Split-stack No Frame Example: 6800

Leaving those rubber bricks at the side of the pool, let's keep going down for more treasure.


(Title Page/Index)

 

At this point, from working through the single-stack example for the 6800 without stack frames, you might be seeing the reasoning behind stack frames. It can be really difficult figuring out where your data is and where it should be heading without some frame of reference, and stack frames do provide a frame of reference when you're deep in the arcane definitions of some routine. 

But building the code to support the stack frames tends to consume time and energy that you'd rather devote to the actual problem at hand, unless your CPU provides high-level support for the frames. It tends to end up a mixed blessing at best, with net costs usually, in my opinion, outweighing benefits, even when your CPU supports them.

Here on the 6800, we can see those costs most clearly by looking carefully at the code I present here, reading the source code in a text editor while stepping through it in the simulator, and comparing it with the split-stack stack frame version and the single-stack versions. 

Before you get to wondering why anyone wanted to use a stack frame in the first place, it's worth noting that the utility of stack frames becomes especially apparent in very large procedures with complex logic. When your procedure extends to hundreds of lines of code (or more) with dozens of variables (or more), you can use tools in the assembler to name your local variables by their offset from the frame base pointer, and that helps greatly in managing the complexity.
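As a rough sketch of what that naming can look like in 6800 assembler: the names FRAMEP, LCOUNT, LTOTAL, and GETTOT below are hypothetical illustrations, not part of this chapter's code, and the sketch assumes the frame base is kept in a direct-page pseudo-register.

```asm
* Hypothetical example -- FRAMEP, LCOUNT, LTOTAL, GETTOT are not in this chapter's code.
	ORG	$80
FRAMEP	RMB	2	; assumed pseudo-register holding the frame base
LCOUNT	EQU	0	; 16-bit local at frame base + 0
LTOTAL	EQU	2	; 16-bit local at frame base + 2
*
	ORG	$3000
GETTOT	LDX	FRAMEP	; X = frame base
	LDAA	LTOTAL,X	; high byte of LTOTAL
	LDAB	LTOTAL+1,X	; low byte of LTOTAL
	RTS
```

The body of the routine refers to LTOTAL by name, so if the layout of the locals changes, only the EQU lines need to change.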

And it helps in constructing compilers, especially in the initial "bootstrap stages" of development. The compiler may be able to manage constructing and tearing down the frames more easily than it could handle remembering changing offsets.

But.

The frames get in the way. 

Especially when return addresses are inside the stack frames, they get in the way.

All the benefits of stack frames can, in fact, be found in this simple example of split-stack frameless coding discipline. You might think it's just my opinion, but I'll explain further as we go.

I think the code explains itself, particularly when you compare it to the split-stack example with stack frames and to the single-stack example without frames that we just finished.

One thing that might be of interest: I had thought I would use an ADDDX (add double accumulator to X) routine in MAIN,

* Could use this in the single-stack no frames example, too.
LEADPX	LDX	PSP	; Add D to PSP and load to X
* Add A:B to X through XWORK.
* Returned in both X and A:B.
* But we won't use it, after all.
ADDDX	STX	XWORK
	ADDB	XWORK+1
	ADCA	XWORK	; carry in from the low byte
	STAA	XWORK
	STAB	XWORK+1
	LDX	XWORK
	RTS	

to calculate the effective address of the variable that we are passing, but it worked out to be a wash. It took almost as much code to set it up as to just do it there in place.
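For comparison, the ADDDX version that a comment in MAIN alludes to would look something like the following sketch. It assumes X still holds PSP on entry, as it does at that point in MAIN, and that the parameter slots are laid out as in the listing.

```asm
* Sketch of the alternative using ADDDX; assumes X = PSP on entry.
	LDAB	#4	; offset of the 2nd local variable from PSP
	CLRA
	JSR	ADDDX	; A:B = X + 4 (X now holds the sum, not PSP)
	LDX	PSP	; get PSP back
	STAB	3,X	; store the pointer as the parameter, low byte
	STAA	2,X	; high byte
```

Assuming direct-page addressing for XWORK and PSP, this comes to roughly 12 bytes at the call site against roughly 14 for the in-place sequence in MAIN, plus the cost of the JSR and RTS at run time, so the saving is marginal.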

Read the code, step through it, and compare it to what we've worked through so far. Note in particular how we are passing the return values back here, how it differs from what we do when working with the various kinds of stack frames, and how it differs even from the method of the frameless single-stack discipline:

* 16-bit addition as example of split-stack frame-free discipline on 6800
* using the direct page,
* with test code
* Joel Matthew Rees, October, November 2024
*
	OPT	6800
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for usr stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily
DWORK	RMB	2	; For saving D temporarily
PSP	RMB	2	; parameter stack pointer
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
	RMB	4	; Put a bumper after the process static variables
SSTKLIM	RMB	64	; 16 levels of call
SSTKLMX	EQU	FINALX+8
SSTKBAS	RMB	6	; for canary return
SSTKBSX	EQU	SSTKLMX+64
SSTKFAK	RMB	2	; fake frame pointer, self-link
SSTKFAX	EQU	SSTKBSX+6	; 6800 is post-dec (post-store-decrement) push
SSTKBMP	RMB	4	; a little bumper space
SSTKBMX	EQU	SSTKFAX+2	; But we are going to init S through X
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKLMX	EQU	SSTKBMX+4
PSTKBAS	RMB	4	; bumper space -- parameter stack is pre-dec
PSTKBSX	EQU	PSTKLMX+64
PSTKBMP	RMB	4	; a little bumper space
PSTKBMX	EQU	PSTKBSX+4
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	PSTKBMX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
*
INISTKS	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE
	LDAA	LB_BASE		; bootstrap own return stack
	LDAB	LB_BASE+1
	LDX	#SSTKBSX	; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1
	ADCA	XWORK
	STAB	XWORK+1		; initial return stack pointer
	STAA	XWORK
*
	LDX	#SSTKNDR	; for fake return address
	STX	DWORK		; save it for a moment
	PULA		; pop real return address
	PULB
	LDX	XWORK	; ready own return stack pointer
	STS	SSAVE	; save stack pointer from monitor ROM
	TXS		; move to our own stack (let TXS convert it)
	PSHB		; put return address on own stack
	PSHA		; stack now ready for interrupts
*
	LDAA	DWORK	; error handler for fake return
	LDAB	DWORK+1
	STAA	0,X	; in the cell beyond empty stack pointer
	STAB	1,X	; prime the return stack with error handler
	STAA	2,X	; second fake return to error handler
	STAB	3,X
* 
	LDAA	LB_BASE		; bootstrap parameter stack
	LDAB	LB_BASE+1
	LDX	#PSTKBSX	; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1		; initial parameter stack pointer
	ADCA	XWORK
	STAA	PSP		; parameter stack now ready
	STAB	PSP+1
*
	LDAA	LB_BASE		; set up heap as if we actually had one
	LDAB	LB_BASE+1
	LDX	#HBASEX		; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1		; calculate EA
	ADCA	XWORK
	STAA	HPPTR
	STAB	HPPTR+1
	STAA	HPALL		; as if the heap were functional
	STAB	HPALL+1
	LDX	#CDBASE
	STX	XWORK
	LDAA	XWORK
	LDAB	XWORK+1
	SUBB	#4
	SBCA	#0
	STAA	HPLIM
	STAB	HPLIM+1
	RTS
*
*
***
* General structure of the stacks, 
*
* return stack is only the return address
* (and maybe extremely ephemeral temporaries):
* [PRETADR   ]
* [RETADR    ]
*
* order of elements on the parameter stack,
* when they are present:
* [PARAMETERS ]
* [VARIABLES  ]
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES  ]
* [TEMPORARIES]
* [PARAMETERS ] <= PSP
*
* Result is returned on parameter stack
*
***
* Utility routines
*
* Could use this in the single-stack no frames example, too.
*LEADPX	LDX	PSP	; Add D to PSP and load to X
* Add A:B to X through XWORK.
* Returned in both X and A:B.
* But we won't use it, after all.
*ADDDX	STX	XWORK
*	ADDB	XWORK+1
*	ADCA	XWORK
*	STAA	XWORK
*	STAB	XWORK+1
*	LDX	XWORK
*	RTS	
*
PPOPD	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	INX
	INX
	STX	PSP
	RTS
*
* This saves bytes:
ALCL2	CLRA
	CLRB	; fall through
* 
PPSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STAA	0,X
	STAB	1,X
	RTS
*
* Compromise between speed and reusability
* Returns with PSP in X.
* Enter here to load PSP and initialize to 0
* 8 bytes
ALCL8	CLRA	
	CLRB
* Enter here with initial value in A:B
ALCLD8	LDX	PSP
* Enter here with PSP loaded and initial value in D
ALCLI8	DEX
	DEX
	STAA	0,X
	STAB	1,X
ALCLI6	DEX
	DEX
	STAA	0,X
	STAB	1,X
ALCLI4	DEX
	DEX
	STAA	0,X
	STAB	1,X
ALCLI2	DEX		; PPSHD usually costs less.
	DEX
	STAA	0,X
	STAB	1,X
	STX	PSP
	RTS
*
* six bytes
ALCL6	CLRA
	CLRB
ALCLD6	LDX	PSP
	BRA	ALCLI6
*
* four bytes
ALCL4	CLRA
	CLRB
ALCLD4	LDX	PSP
	BRA	ALCLI4
*
* two bytes
*ALCL2	CLRA
*	CLRB
*	LDX	PSP
*	BRA	ALCLI2
*
*
PDROP8	LDAB	#8	; saves two bytes, 7 vs. 3
PDROP_B	CLRA
* Add A:B to PSP -- negative for allocation, positive for deallocation
ADDPSP	ADDB	PSP+1
	ADCA	PSP
	STAA	PSP
	STAB	PSP+1
	LDX	PSP	; return with X ready
	RTS
*
PDROP6	LDAB	#6
	BRA	PDROP_B	
*
PDROP4	LDAB	#4
	BRA	PDROP_B	
*
PDROP2	LDAB	#2	; JSR is 3 bytes, LDX PSP; INX; INX; STX PSP is 6
	BRA	PDROP_B	
*
*
***
* Return stack when functions are called by MAIN
* Return stack on entry, after link:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ]
* [RETADR1 ]
*
* Parameter stack when called by MAIN
* with two 32-bit local variables
* and two 16-bit parameters,
* after mark (no local allocation)
* [<unknown>]
* [32:VAR1_1--]
* [32:VAR1_2--]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
****
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit
ADD16S	LDX	PSP
	LDAB	#(-1)	; default negative
	TBA
	JSR	ALCLI4	; allocate 2 temporary cells and init (leaves PSP in X)
	TST	6,X	; the left-hand operand sign bit
	BMI	ADD16SR
	CLR	2,X	; positive
	CLR	3,X
ADD16SR	TST	4,X	; the right-hand operand sign bit
	BMI	ADD16SL
	CLR	0,X	; positive
	CLR	1,X
ADD16SL	LDAA	6,X	; left hand 
	LDAB	7,X
	ADDB	5,X	; right hand
	ADCA	4,X
	STAA	6,X	; store low half
	STAB	7,X
	LDAA	2,X
	LDAB	3,X
	ADCB	1,X
	ADCA	0,X
	STAA	4,X	; store high half
	STAB	5,X
	JSR	PDROP4
	RTS
*
* The alternative, without link, mark, or restore?
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right in 2 16-bit
* output parameter:
*   17-bit sum in 32-bit
ADD16U	LDX	PSP
	LDAA	2,X	; left
	LDAB	3,X
	ADDB	1,X	; add right
	ADCA	0,X
	STAA	2,X	; save low in left side
	STAB	3,X
	LDAB	#0	; extend
	ADCB	#0	; extend Carry unsigned (could ROL)
	STAB	1,X	; re-use right side to store high half
	CLR	0,X	; only bit 16 of the sum can be affected
	RTS
*
* Etc.
*
*
***
* Parameter stack when called by MAIN
* with two 16-bit parameters (pointer and addend),
* after mark (no local allocation)
* [<unknown>  ]
* [32:VAR1_1  ]
* [32:VAR1_2  ]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
* Instead of walking the stack, pass in a pointer --
* Add 16-bit signed parameter
* to caller's 2nd 32-bit internal variable.
* input parameters:
*   16-bit pointer to 32-bit integer
*   16-bit addend
* no output parameter:
ADD16SI	LDAB	#(-1)	; make a temporary -1
	TBA
	JSR	PPSHD	; default to signed (leaves PSP in X)
	TST	2,X	; test high byte
	BMI	ADD16SP
	CLR	0,X	; zero extend
	CLR	1,X
ADD16SP	LDX	4,X	; get pointer to target
	LDAA	2,X	; target low
	LDAB	3,X
	LDX	PSP
	ADDB	3,X	; parameter
	ADCA	2,X
	LDX	4,X	; pointer to target
	STAA	2,X	; update low half with result
	STAB	3,X
	LDAA	0,X	; target, high half
	LDAB	1,X
	LDX	PSP
	ADCB	1,X	; sign extension half
	ADCA	0,X
	LDX	4,X	; target
	STAA	0,X	; update high half
	STAB	1,X
	JSR	PDROP6	; drop temporary and parameters
	RTS
*
*
***
* Return stack on entry:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ] <= RSP
*
*
* Parameter stack after mark and local allocation
* [<unknown>]
* [VAR1_1--]
* [VAR1_2--] <= PSP
*
MAIN	JSR	ALCL8	; allocate and clear 8 bytes
	LDAA	#$12
	LDAB	#$34
	JSR	PPSHD
	LDAA	#$CD
	LDAB	#$EF
	JSR	PPSHD
	JSR	ADD16U	; 32-bit result on parameter stack should be $0000E023
	LDAA	#$87	; ADD16U leaves PSP in X
	LDAB	#$65
	STAA	0,X	; reuse low half of result space, overwrite high half
	STAB	1,X
	JSR	ADD16S	; result on parameter stack should be $FFFF6788
	LDAA	2,X	; result low half -- ADD16S leaves PSP in X
	LDAB	3,X	; put result away
	STAA	6,X	; to 2nd local variable low half
	STAB	7,X
	LDAA	0,X	; result high half
	LDAB	1,X
	STAA	4,X	; to 2nd local variable high half
	STAB	5,X
	STX	XWORK	; instead of JSR ADDDX: 
	LDAB	XWORK+1	; LDAB #4; CLRA; JSR ADDDX; LDX PSP; STAB 3,X; STAA 2,X
	LDAA	XWORK	; Moving results around takes a lot of code,
	ADDB	#4 	; So just do it here.
	ADCA	#0
	STAB	3,X
	STAA	2,X
	LDAA	#$A5
	TAB		; don't really need to use both, just making things clear.
	STAA	0,X
	STAB	1,X
	JSR	ADD16SI	; result in 2nd variable should be FFFF0D2D (Carry set)
	LDAA	2,X	; 2nd variable low half -- ADD16SI leaves PSP in X
	LDAB	3,X
	LDX	LB_BASE
	STAA	FINALX+2,X
	STAB	FINALX+3,X
	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	LDX	LB_BASE
	STAA	FINALX,X
	STAB	FINALX+1,X
	JSR	PDROP8	; ADD16SI also dropped its arguments for us, so only locals
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
*
***
* Return stack will only contain return addresses (and very ephemeral temporaries):
* [RETADRNN  ]
*
* Return stack after initialization:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS <= RSP
*
*
* Parameter stack after initialization:
* [<unknown]PSTKBAS <= PSP
*
START	JSR	INISTKS
*
	JSR	MAIN
*
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
SSTKNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP
	NOP		; another landing pad to set breakpoint at
	NOP
	LDX	$FFFE
	JMP	0,X	; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

As always, I have tested this code, and it produces the correct results without stack frames, passing both input and return parameters on the stack, except for utility routines which use lower level register protocols not available to higher-level routines. 
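To make the calling convention concrete, here is a minimal sketch of a caller that picks the result up with PPOPD instead of indexing through X. It assumes it is assembled together with the listing above (PPSHD, PPOPD, ADD16U, XWORK, and DWORK all come from there). The label USEADD is hypothetical and is not part of the tested code.

```asm
* Hypothetical caller -- relies on PPSHD, PPOPD, ADD16U, XWORK, DWORK from the listing above.
USEADD	LDAA	#$12
	LDAB	#$34
	JSR	PPSHD	; push left operand $1234
	LDAA	#$CD
	LDAB	#$EF
	JSR	PPSHD	; push right operand $CDEF
	JSR	ADD16U	; 32-bit sum now occupies the operands' 4 bytes
	JSR	PPOPD	; pop high half of sum into A:B
	STAA	XWORK	; park it in a pseudo-register
	STAB	XWORK+1
	JSR	PPOPD	; pop low half of sum into A:B
	STAA	DWORK
	STAB	DWORK+1
	RTS		; parameter stack is back where it started
```

The return stack is touched only by JSR and RTS here, so there is no return address to step over on the way to the result.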

I will be pointing you back here later. If this talk about stack frames and parameter passing methods seems a little fuzzy at this point, it's okay to move ahead for now.

You may want to move ahead with getting numeric output in binary, or you might want to see how single-stack, no-frame parameter passing works on the 6801, next.




 

 

 

 
