Friday, November 29, 2024

ALPP 02-32 -- More Ascending the Right Island -- Split-stack No Frame Example: 6801

Still digging into that treasure from the bottom of the pool.

  Ascending the Right Island --
Split-stack No Frame Example:
6801

(Title Page/Index)

About the only thing I want to point out here is that, with the support for 16-bit operations on the 6801, it becomes easier to see how moving the return addresses onto their own stack allows a more seamless approach to passing parameters than the single-stack no-frame example we just finished. 

Hopefully the code is mostly self-explanatory by now. (We've been looking at the meat of it for so long ...)

Compare with both the single-stack example for the 6801 and the split-stack example for the 6800 to help see what is and is not going on.
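
To make the split concrete, here is a small Python model. It is my own illustration, not a translation of the listing. A byte array stands in for memory. `call`/`ret` mimic JSR/RTS on the return stack, where S post-decrements and the low byte is pushed first. `ppshd`/`ppopd` mimic the listing's PPSHD/PPOPD on a pre-decrementing parameter stack. The starting addresses are arbitrary. What it shows is that a call moves only S, so the parameters sit at the same PSP-relative offsets whether you look from the caller or from the callee.

```python
# Model of split return/parameter stacks on a 6800-family CPU (illustration only).
class SplitStacks:
    def __init__(self, rsp: int, psp: int) -> None:
        self.mem = bytearray(0x10000)  # flat 64K address space
        self.s = rsp    # return stack: S points at the next free byte
        self.psp = psp  # parameter stack: PSP points at the top cell

    def call(self, ret_addr: int) -> None:
        """Like JSR: push the return address, low byte first, post-decrementing S."""
        for byte in (ret_addr & 0xFF, (ret_addr >> 8) & 0xFF):
            self.mem[self.s] = byte
            self.s = (self.s - 1) & 0xFFFF

    def ret(self) -> int:
        """Like RTS: pull the high byte, then the low byte, pre-incrementing S."""
        self.s = (self.s + 1) & 0xFFFF
        hi = self.mem[self.s]
        self.s = (self.s + 1) & 0xFFFF
        lo = self.mem[self.s]
        return (hi << 8) | lo

    def ppshd(self, d: int) -> None:
        """Like PPSHD: pre-decrement PSP by 2, then store D big-endian."""
        self.psp = (self.psp - 2) & 0xFFFF
        self.mem[self.psp] = (d >> 8) & 0xFF
        self.mem[(self.psp + 1) & 0xFFFF] = d & 0xFF

    def ppopd(self) -> int:
        """Like PPOPD: load D from the top cell, then drop the cell."""
        d = (self.mem[self.psp] << 8) | self.mem[(self.psp + 1) & 0xFFFF]
        self.psp = (self.psp + 2) & 0xFFFF
        return d


stacks = SplitStacks(rsp=0x204B, psp=0x2096)
stacks.ppshd(0x1234)      # caller pushes two parameters
stacks.ppshd(0xCDEF)
top_before = stacks.psp
stacks.call(0x3050)       # the call touches only the return stack
print(stacks.psp == top_before)   # prints True
```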

As always, read the code and step through it:

* 16-bit addition as example of split-stack frame-free discipline on 6801
* using the direct page,
* with test code
* Joel Matthew Rees, October, November 2024
*
	OPT	6801
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for usr stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily
DWORK	RMB	2	; For saving D temporarily
PSP	RMB	2	; parameter stack pointer
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
	RMB	4	; Put a bumper after the process static variables
SSTKLIM	RMB	64	; 16 levels of call
SSTKLMX	EQU	FINALX+8
SSTKBAS	RMB	4	; for canary return
SSTKBSX	EQU	SSTKLMX+64
SSTKFAK	RMB	2	; fake frame pointer, self-link
SSTKFAX	EQU	SSTKBSX+4	; 6801 is post-dec (post-store-decrement) push
SSTKBMP	RMB	4	; a little bumper space
SSTKBMX	EQU	SSTKFAX+2	; But we are going to init S through X
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKLMX	EQU	SSTKBMX+4
PSTKBAS	RMB	4	; bumper space -- parameter stack is pre-dec
PSTKBSX	EQU	PSTKLMX+64
PSTKBMP	RMB	4	; a little bumper space
PSTKBMX	EQU	PSTKBSX+4
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	PSTKBMX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
*
INISTKS	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE
	LDD	LB_BASE		; bootstrap own return stack
	ADDD	#SSTKBSX
	STD	XWORK
	LDX	XWORK		; initial return stack pointer
*
	LDD	#SSTKNDR
	STD	0,X	; in the cell beyond empty stack pointer
	STD	2,X	; and the next cell, for good measure
	PULA		; pop real return address
	PULB
	STS	SSAVE	; save stack pointer from monitor ROM
	TXS		; move to our own stack (let TXS convert it)
	PSHB		; put return address on own stack
	PSHA		; stack now ready for interrupts
*
	LDD	LB_BASE		; bootstrap parameter stack
	ADDD	#PSTKBSX
	STD	PSP		; parameter stack now ready
*
	LDAA	LB_BASE		; set up heap as if we actually had one
	LDAB	LB_BASE+1
	ADDD	#HBASEX
	STD	HPPTR
	STD	HPALL		; as if the heap were functional
	LDD	#CDBASE
	SUBD	#4
	STD	HPLIM
	RTS
*
*
***
* General structure of the stacks, 
*
* return stack is just the return address:
* [PRETADR   ]
* [RETADR    ] <= SP
*
* order of elements on the parameter stack,
* when they are present:
* [PARAMETERS ]
* [VARIABLES  ]
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES  ]
* [TEMPORARIES]
* [PARAMETERS ] <= PSP
*
* Result is returned on parameter stack
*
***
*
* Utility routines
*
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
* This saves bytes:
ALCL2	CLRA
	CLRB	; fall through
* 
PPSHD	LDX	PSP
ALCLI2	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
* Compromise between speed and reusability
* Returns with PSP in X.
* Enter here to load PSP and initialize to 0
* 8 bytes
ALCL8	CLRA	
	CLRB
* Enter here with initial value in A:B
ALCLD8	LDX	PSP
* Enter here with PSP loaded and initial value in D
ALCLI8	DEX
	DEX
	STD	0,X
ALCLI6	DEX
	DEX
	STD	0,X
ALCLI4	DEX
	DEX
	STD	0,X
	BRA	ALCLI2
*
* six bytes
ALCL6	CLRA
	CLRB
ALCLD6	LDX	PSP
	BRA	ALCLI6
*
* four bytes
ALCL4	CLRA
	CLRB
ALCLD4	LDX	PSP
	BRA	ALCLI4
*
*
***
* Return stack when functions are called by MAIN
* Return stack on entry
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ]
* [RETADR1 ] <= SP
*
* Parameter stack when called by MAIN
* with two 32-bit local variables
* and two 16-bit parameters,
* after entry (before temporary allocation)
* [<unknown>]
* [32:VAR1_1--]
* [32:VAR1_2--]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
****
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit
ADD16S	LDD	#(-1)	; default negative
	JSR	ALCLD4	; returns with PSP in X
	TST	6,X	; the left-hand operand sign bit
	BMI	ADD16SR
	CLR	2,X	; positive
	CLR	3,X
ADD16SR	TST	4,X	; the right-hand operand sign bit
	BMI	ADD16SL
	CLR	0,X	; positive
	CLR	1,X
ADD16SL	LDD	6,X	; left hand 
	ADDD	4,X	; right hand
	STD	6,X	; store low half
	LDD	2,X
	ADCB	1,X
	ADCA	0,X
	STD	4,X
*
	LDAB	#4	; shorter and faster than 4*INX, walks on B
	ABX
	STX	PSP	; drop the temporaries
	RTS
*
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right in 32-bit
* output parameter:
*   17-bit sum in 32-bit
ADD16U	LDX	PSP
	LDD	2,X	; left
	ADDD	0,X	; add right
	STD	2,X	; save low
	LDD	#0	; extend
	ROLB		; extend Carry unsigned (could ADC #0)
	STD	0,X	; re-use right side to store high half
*
	RTS		; PSP unchanged
*
* Etc.
*
*
***
* Parameter stack when called by MAIN
* with two 16-bit parameters (pointer and addend),
* after entry (before temporary allocation)
* [<unknown>  ]
* [32:VAR1_1  ]
* [32:VAR1_2  ]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
* Instead of walking the stack, pass in a pointer --
* Add 16-bit signed parameter
* to 32 bit caller's 2nd 32-bit internal variable.
* input parameter:
*   16-bit pointer to 32-bit integer
*   16-bit addend
* no output parameter:
ADD16SI	LDD	#(-1)	; make a temporary -1
	JSR	PPSHD	; (default to signed) returns with PSP in X, 2 bytes on stack
	TST	2,X	; test parameter high byte
	BMI	ADD16SP
	CLR	0,X	; zero extend
	CLR	1,X
ADD16SP	LDX	4,X	; pointer to caller's local
	LDD	2,X	; caller's 2nd variable, low
	LDX	PSP
	ADDD	2,X	; parameter
	LDX	4,X	; pointer
	STD	2,X	; update low half with result
	LDD	0,X	; 2nd variable, high half
	LDX	PSP
	ADCB	1,X	; sign extension half
	ADCA	0,X
	LDX	4,X	; pointer
	STD	0,X	; update high half
*
	LDX	PSP
	LDAB	#6	; drop sign temporary and two parameters
	ABX
	STX	PSP
	RTS
*
*
***
* Return stack on entry:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ] <= SP
*
* Parameter stack after local allocation
* [<unknown>]
* [VAR1_1--]
* [VAR1_2--] <= PSP
*
MAIN	JSR	ALCL8	; allocate and clear 8 bytes
*
	LDD	#$1234
	JSR	PPSHD
	LDD	#$CDEF
	JSR	PPSHD
	JSR	ADD16U	; 32-bit result on parameter stack should be $0000E023
	LDX	PSP	; order is okay, low half where we want it (PSP returned in X anyway)
	LDD	#$8765	; reuse high half
	STD	0,X
	JSR	ADD16S	; result on parameter stack should be $FFFF6788
	LDX	PSP	; (PSP returned in X anyway)
	LDD	2,X	; result low half
	STD	6,X	; to 2nd local variable low half
	LDD	0,X	; result high half
	STD	4,X	; to 2nd local variable high half
	LDD	PSP	; address of 2nd local variable
	ADDD	#4
	STD	2,X	; pointer is 1st arg
	LDD	#$A5A5
	STD	0,X	; 2nd arg (addend)
	JSR	ADD16SI	; result in 2nd variable should be FFFF0D2D (Carry set)
	LDX	PSP	; unnecessary, ...
	LDD	2,X	; 2nd variable low half
	LDX	LB_BASE
	STD	FINALX+2,X
	LDX	PSP
	LDD	0,X
	LDX	LB_BASE
	STD	FINALX,X
*
	LDD	PSP
	ADDD	#8	; deallocate the locals
	STD	PSP
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
*
***
* Return stack will be just the return address:
* [RETADRNN  ]
*
* Return stack after initialization:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS <= SP
*
*
* Parameter stack after initialization, mark:
* [<unknown]PSTKBAS <= PSP
*
START	JSR	INISTKS
*
	JSR	MAIN
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
SSTKNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP
	NOP		; another landing pad to set breakpoint at
	NOP
	LDX	$FFFE
	JMP	0,X	; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

Again, I have tested the code and it produces the correct results without stack frames.
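
If you want to check the expected values in MAIN's comments without firing up the simulator, here is a small Python model of just the arithmetic. It is my own sketch. The function names echo ADD16U, ADD16S, and ADD16SI, but the model ignores the stacks and works on plain integers masked to 16 and 32 bits. The signed versions follow the same plan as the assembly: add the low halves, then add the sign extensions ($FFFF or $0000) plus the carry.

```python
# Arithmetic-only model of ADD16U, ADD16S and ADD16SI (no stacks modeled).
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def sign_extend16(v: int) -> int:
    """High half of the 32-bit sign extension: $FFFF if bit 15 is set, else $0000."""
    return MASK16 if v & 0x8000 else 0

def add16u(left: int, right: int) -> int:
    """Unsigned 16 + 16 -> 32; the carry out of the low half becomes bit 16."""
    return (left + right) & MASK32

def add16s(left: int, right: int) -> int:
    """Signed 16 + 16 -> 32: add low halves, then both sign extensions plus carry."""
    low = left + right
    carry = low >> 16
    high = (sign_extend16(left) + sign_extend16(right) + carry) & MASK16
    return (high << 16) | (low & MASK16)

def add16si(target: int, addend: int) -> int:
    """32-bit target plus a sign-extended 16-bit addend, as ADD16SI does through its pointer."""
    low = (target & MASK16) + addend
    carry = low >> 16
    high = ((target >> 16) + sign_extend16(addend) + carry) & MASK16
    return (high << 16) | (low & MASK16)


# The sequence MAIN runs:
r1 = add16u(0x1234, 0xCDEF)
r2 = add16s(r1 & MASK16, 0x8765)   # reuse the low half, as MAIN does
r3 = add16si(r2, 0xA5A5)
print(f"{r1:08X} {r2:08X} {r3:08X}")   # prints 0000E023 FFFF6788 FFFF0D2D
```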

If you think you've seen enough for now, feel free to skip ahead to getting numeric output in binary. Otherwise, I'll be cleaning the stack frame support code out of the 6809 examples next. 

[JMR202412050835 daydream addendum:]

Before we leave this topic behind, if you've been following what I've been talking about on this detour, I could mention a bit more of my daydreams.

I think I've mentioned it in passing, but I have often thought it was unfortunate that Motorola didn't push the opcodes around a bit and keep the direct addressing mode for the unary-operator instructions -- INC, ROL, TST, etc. (They did so on the 6809.) 

In fact, I would have preferred that they had kept direct page and left out extended mode for unary instructions. (They did exactly that for the 6805.)

"Unary" operators on the 68XX CPUs are mostly read-modify-write instructions that would benefit greatly, in terms of timing and object code efficiency, from having short-addressed versions, and they would also help make the direct page area even more of a pseudo-register memory file.

But we didn't really understand the principles of locality in coding back then. Shifting ourselves back to the context of the 1960s and '70s, we can understand why they saw it as a reasonable tradeoff, and why they wanted to leave as many op-codes as possible available for "inherent" mode operators that didn't seem to fit the unary/binary operator partition they were using -- like Add B to A (ABA), et al.

If they had, or if, in producing the 6801 as an object-code compatible upgrade to the 6800, they had been willing to produce a mnemonic-level compatible (but object-code incompatible) version of the 6801 with direct-page versions of the unary operators -- daydream warning! -- it should have been possible to shave at least two cycles off the timing, compared to the 6801's extended mode timing (6 cycles extended, vs. potentially 4 cycles direct-page), giving more meaning to the idea of pseudo-registers -- or making the direct page more of a static cache. 

And if the RAM were going to be built-in (as it pretty much always was in 6801 SOCs), it might even have been possible to shave off yet another cycle, bringing DP variables within a cycle of accumulator timing.

And ... well, the 6801 has 16-bit shifts of the double accumulator, so why not have 16-bit shifts and increments/decrements for direct page variables? Yeah, maybe that's just being greedy.

And, then, here's yet another step out into alternate reality -- a couple of extra address lines (48-pin DIP packages?) for address space, and it would be possible to distinguish between accessing code, data, stack, and the direct page, helping expand the address range beyond the tight squeeze of 64K.

Erk. Lost in my daydreams again. No wonder it takes me so long to get things done.

Okay, moving on to the 6809 examples, or skipping ahead to getting numeric output in binary.

[JMR202412050835 daydream addendum end.]


Thursday, November 28, 2024

ALPP 02-31 -- More Looking in the Rear-view Mirror -- Single-stack No Frame Example: 6801

More treasure from the bottom of the pool.

  More Looking in the Rear View Mirror --
Single-stack No Frame Example:
6801

Not much to say that I haven't already said. We've seen frameless for the 6800, both the single-stack frameless discipline of one chapter back and the split-stack frameless discipline that we just finished. I'm not sure but what I should leave the 6801, 6809, and 68000 versions as exercises for the interested reader, but I'm a sucker for easy puzzles, so I'll post them anyway. There are plenty of things an interested reader can think of to try for him- or herself.

One thing to pay attention to as you go through is the fact that I have left the utility routines out. Doing them in-line is not that much more code than a JSR, and I didn't want to hide what's going on. That's how much of an improvement the 6801 is over the 6800.

The down side of doing it in-line (by hand) is that there are more opportunities for mistakes.
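
To put rough numbers on "not that much more code than a JSR", here is a quick Python tally. It uses the 6801 instruction lengths for the addressing modes MAIN uses: JSR extended and LDX immediate are 3 bytes, LDAB immediate is 2, and PSHX, INS, TSX, TXS, and ABX are 1 byte each. It counts call-site bytes only. A shared utility routine's own body is paid for once, however many times it is called.

```python
# Call-site byte counts on the 6801 (instruction lengths for the modes used in MAIN).
SIZES = {"JSR ext": 3, "LDX #imm": 3, "LDAB #imm": 2,
         "PSHX": 1, "INS": 1, "TSX": 1, "TXS": 1, "ABX": 1}

def size(seq: list[str]) -> int:
    """Total bytes for a sequence of instructions."""
    return sum(SIZES[op] for op in seq)

alloc_inline = ["LDX #imm"] + ["PSHX"] * 4       # MAIN: LDX #0, then four PSHX
alloc_call = ["JSR ext"]                         # call a shared allocate-8 routine
drop8_ins = ["INS"] * 8                          # drop 8 bytes one at a time
drop8_abx = ["TSX", "LDAB #imm", "ABX", "TXS"]   # MAIN's exit sequence

for name, seq in [("alloc inline", alloc_inline), ("alloc via JSR", alloc_call),
                  ("drop 8 with INS", drop8_ins), ("drop 8 with ABX", drop8_abx)]:
    print(f"{name}: {size(seq)} bytes")
```

The four PSHX instructions alone come to one byte more than the JSR, which matches the comment in MAIN.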

Go ahead and read the code and compare, and if you are not sure you understand what's going on, single-step through the code.
* 16-bit addition as example of single-stack no frame discipline on 6801,
* with test code
* Joel Matthew Rees, October, November 2024
*
	OPT	6801
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for usr stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily
DWORK	RMB	2	; For saving D temporarily
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
	RMB	4	; buffer
STKLIM	RMB	192	; roughly 16 to 20 levels of call
STKLIMX	EQU	FINALX+8
STKBAS	RMB	4	; for canary return
STKBASX	EQU	STKLIMX+192
STKFAK	RMB	2	; fake frame pointer, self-link
STKFAKX	EQU	STKBASX+4	; 6801 is post-dec (post-store-decrement) push
STKBMP	RMB	4	; a little bumper space
STKBMPX	EQU	STKFAKX+2	; But we are going to init S through X
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	STKBMPX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
INISTK	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE		; local space functional
	LDD	LB_BASE		; bootstrap own stack
	ADDD	#STKBASX
	STD	XWORK	; avoid using BIOS stack
	LDX	XWORK	; ready own stack pointer
*
	PULA		; pop real return address
	PULB
	STS	SSAVE	; save stack pointer from monitor ROM
	TXS		; move to our own stack (let TXS convert it)
	PSHB		; put return address on own stack
	PSHA		; stack now ready for interrupts, utility routines
*
	LDD	#STKUNDR
	STD	0,X	; in the cell beyond empty stack pointer
	STD	2,X	; and the next cell, for good measure
*
	LDD	LB_BASE	
	ADDD	#HBASEX
	STD	HPPTR		; as if we were ready to use heap
	STD	HPALL
	LDD	#CDBASE
	SUBD	#4
	STD	HPLIM
	RTS		; finally done, now can return
*
***
* Not generating a stack frame
*
* Cross-section of general stack structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP}  ] for calling routine
* [PARAM   ] from calling routine
* [RETADR  ] to calling routine
* [LOCVAR  ] for called -- current -- routine
* [TEMP    ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call
*
* Broader cross-section, showing nesting for routine 3, in-flight:
* [RETADR1 ] 
* [LOCVAR2 ]
* [TEMP2   ]
* [PARAM3  ]
* [RETADR2 ]
* [LOCVAR3 ]
* [TEMP3   ]
* [(PARAM4)] <= SP (return stack pointer (6800 S is byte below))
***
*
***
* Utility routines left out
*
* Let the caller do allocation after.
*
* Stack at entry, before allocation
* when functions are called by MAIN
* with two 32-bit local variables and two 16-bit parameters
* We will return result in D:X
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ] 
* [32:VAR1_1]
* [32:VAR1_2]
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ] <= SP (return stack pointer (6800 S is byte below))
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left 1st pushed, right 2nd
* output parameter:
*   17-bit sum in 32-bit D:X D high, X low
* Does not alter the parameters.
ADD16S	TSX		; no local allocations
	LDAA	#(-1)	; prepare for sign extension
	TST	4,X	; the left-hand operand sign bit
	BMI	ADD16SR
	CLRA		; zero extend
ADD16SR	PSHA		; push left extension
	PSHA		; left sign cell below X now
	LDAA	#(-1)	; reload
	TST	2,X	; the right-hand operand sign bit
	BMI	ADD16SL
	CLRA		; zero extend
ADD16SL	PSHA		; push right extension
	PSHA
	TSX		; point to sign extensions (4 temporary bytes on stack)
	LDD	8,X	; left-hand low cell
	ADDD	6,X	; right-hand low cell
	STD	XWORK	; save low half of result
	LDD	2,X	; left-hand extension
	ADCB	1,X	; right-hand extension
	ADCA	0,X	; high half done
*
	INS		; fastest to just drop the temporaries
	INS
	INS
	INS
	LDX	XWORK	; get low half of result
	RTS		; result is in D:X
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit D:X D high
ADD16U	TSX		; no local allocations
	LDD	4,X	; left
	ADDD	2,X	; right
	STD	XWORK	; save low half
	LDD	#0
	ADCB	#0
*
	LDX	XWORK	; get low half of result
	RTS		; result is in D:X
*
* Etc.
*
***
*
* Stack on entry when functions are called by MAIN
* with two input parameters
* (no LINK here, and no local variables)
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ] 
* [32:VAR1_1]
* [32:VAR1_2] 
* [PARAM2_1] (pointer)
* [PARAM2_2] (addend)
* [RETADR1 ] <= SP (return stack pointer (6800 S is byte below))
*
* To show how to access caller's local through pointer
* instead of walking stack --
* Add 16-bit signed parameter
* to 32 bit caller's 32-bit internal variable.
* input parameter:
*   16-bit pointer to 32-bit integer
*   16-bit addend
* no output parameter:
ADD16SI	TSX		; no local allocations up front
	LDAA	#(-1)
	TST	2,X	; high byte of parameter
	BMI	ADD16SIP
	CLRA
ADD16SIP	PSHA	; save the sign extension half (2 temporary bytes on stack)
	PSHA
	LDX	4,X	; get caller's pointer
	LDD	2,X	; caller's 2nd variable, low
	TSX
	ADDD	4,X	; parameter
	LDX	6,X	; caller's pointer
	STD	2,X	; save result low half away
	LDD	0,X	; caller's 2nd variable, high
	TSX
	ADCB	1,X	; sign extension half
	ADCA	0,X
	LDX	6,X	; caller's pointer
	STD	0,X	; save result high half away
*
	INS		; drop temporary 
	INS
	RTS		; no result to load
*
*
***
* Stack after local allocation
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ] 
* [32:VAR1_1]
* [32:VAR1_2] <= SP
*
MAIN	LDX	#0
	PSHX		; four pushes is only one byte more than a call. 
	PSHX
	PSHX
	PSHX
*
	LDX	#$1234	; parameters
	PSHX
	LDX	#$CDEF
	PSHX
	JSR	ADD16U	; result in D:X should be $E023
	INS		; could reuse instead of dropping
	INS
	INS
	INS
	PSHX		; low half
	LDX	#$8765
	PSHX
	JSR	ADD16S	; result in D:X should be $FFFF6788
	STX	XWORK
	STD	DWORK
	INS		; could reuse instead of dropping
	INS
	INS
	INS
	TSX
	LDD	XWORK
	STD	2,X
	LDD	DWORK
	STD	0,X
*	LDAB	#0	; calculate pointer
*	ABX		; would use ABX here if there were an offset.
	PSHX
	LDX	#$A5A5
	PSHX
	JSR	ADD16SI		; result in 2nd variable should be FFFF0D2D
	INS		; drop parameters
	INS
	INS
	INS
	TSX
	LDD	2,X		; low half
	LDX	LB_BASE		; store it in FINAL, in process local space
	STD	FINALX+2,X
	TSX
	LDD	0,X		; high half
	LDX	LB_BASE
	STD	FINALX,X
*
	TSX
	LDAB	#8
	ABX
	TXS
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
* (who knows?) <= VBP
***
*
* Stack after initialization:
* [STKUNDR ]
* [STKUNDR ]STKBAS <= SP
***
*
START	NOP
	JSR	INISTK
	NOP
*
	JSR	MAIN
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
STKUNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad to set breakpoint at
	NOP
	NOP
	LDX	$FFFE	; alternatively, jmp through reset vector
	JMP	0,X
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.
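
The "(6800 S is byte below)" notes and the TSX / LDAB #8 / ABX / TXS exit sequence in MAIN both rest on the 6800/6801 convention that S points at the next free byte, one below the top of the stack. TSX hands you S+1, and TXS stores X-1, so the pair round-trips. Here is a small Python model of just that convention, with 16-bit wraparound. It is my own sketch, not code from the listing.

```python
# 6800/6801 stack-pointer conventions, modeled with 16-bit wraparound.
def tsx(s: int) -> int:
    """TSX: X gets S+1, the address of the byte on top of the stack."""
    return (s + 1) & 0xFFFF

def txs(x: int) -> int:
    """TXS: S gets X-1."""
    return (x - 1) & 0xFFFF

def abx(x: int, b: int) -> int:
    """ABX: add accumulator B to X, treating B as unsigned."""
    return (x + (b & 0xFF)) & 0xFFFF

def drop_bytes(s: int, n: int) -> int:
    """MAIN's deallocation: TSX; LDAB #n; ABX; TXS."""
    return txs(abx(tsx(s), n))

def drop_with_ins(s: int, n: int) -> int:
    """The same drop done with n INS instructions."""
    for _ in range(n):
        s = (s + 1) & 0xFFFF
    return s


print(hex(drop_bytes(0x20C3, 8)), hex(drop_with_ins(0x20C3, 8)))   # prints 0x20cb 0x20cb
```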

If you've seen enough, binary output is still waiting. (And it will still be waiting in a few more hours or days, really.) 

If not, split stack with no stack frames is also great on the 6801, even a bit better than what we saw here.

 -- 

Maybe this would be a good place to bring up (again?) my regret that Motorola didn't include an SBX (subtract B from X) instruction in the 6801. It would have been useful in the stack allocation code, as you can see from where I used (and didn't use) ABX. It would also have been useful to have an add-immediate-to-index (AIX) op-code. It could have been 16-bit or signed 8-bit, handling both allocation and deallocation. Or it could have been unsigned 8-bit, paired with a subtract-immediate-from-X (SIX?) instruction.
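
Here is a small Python sketch of the difference. ABX is real. The `sbx` and `aix8` functions below model the hypothetical SBX and a signed 8-bit AIX, which the 6801 does not have. ABX treats B as unsigned, so it can only move X upward, which is deallocation on a downward-growing stack. Allocation therefore falls back on a run of DEX, or on routing the pointer through D.

```python
# ABX (real) versus hypothetical SBX / AIX, all with 16-bit wraparound.
def abx(x: int, b: int) -> int:
    """Real 6801 ABX: X += B, with B unsigned (0..255)."""
    return (x + (b & 0xFF)) & 0xFFFF

def sbx(x: int, b: int) -> int:
    """HYPOTHETICAL SBX: X -= B, with B unsigned."""
    return (x - (b & 0xFF)) & 0xFFFF

def aix8(x: int, imm: int) -> int:
    """HYPOTHETICAL AIX with a signed 8-bit immediate."""
    imm &= 0xFF
    if imm & 0x80:
        imm -= 0x100
    return (x + imm) & 0xFFFF

def alloc_with_dex(x: int, n: int) -> int:
    """What the listings actually do to allocate: one DEX per byte."""
    for _ in range(n):
        x = (x - 1) & 0xFFFF
    return x


x = 0x2100
print(hex(abx(x, 0xF8)))           # prints 0x21f8: B = $F8 adds 248, it does not subtract 8
print(hex(alloc_with_dex(x, 8)))   # prints 0x20f8
print(hex(sbx(x, 8)), hex(aix8(x, -8)))   # prints 0x20f8 0x20f8
```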

Yeah, more daydreams. Sorry. --


Wednesday, November 27, 2024

ALPP 02-30 -- Ascending the Right Island -- Split-stack No Frame Example: 6800

Leaving those rubber bricks at the side of the pool, let's keep going down for more treasure.

  Ascending the Right Island --
Split-stack No Frame Example:
6800

At this point, from working through the single-stack example for the 6800 without stack frames, you might be seeing the reasoning behind stack frames. It can be really difficult figuring out where your data is and where it should be heading without some frame of reference, and stack frames do provide a frame of reference when you're deep in the arcane definitions of some routine. 

But building the code to support the stack frames tends to consume time and energy that you'd rather devote to the actual problem at hand, unless your CPU provides high-level support for the frames. It tends to end up a mixed blessing at best, with net costs usually, in my opinion, outweighing benefits, even when your CPU supports it.

Here on the 6800, we can see those costs most clearly by looking carefully at the code I present here, reading the source code in a text editor while stepping through it in the simulator, and comparing it with the split-stack stack frame version and the single-stack versions. 

Before you get to wondering why anyone wanted to use a stack frame in the first place, it's worth noting that stack frames' utility became especially apparent in very large procedures with complex logic. When your procedure extends to hundreds of lines of code (or more) with dozens of variables (or more), you use tools in the assembler to name your local variables by their offset from the frame base pointer, and it helps greatly to manage the complexity. 

And it helps in constructing compilers, especially in the initial "bootstrap stages" of development. The compiler may be able to manage constructing and tearing down the frames more easily than it could handle remembering changing offsets.
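
To see what "remembering changing offsets" amounts to, here is a small Python sketch. The helper names are my own, not from any assembler or compiler. It lists the stack from oldest to newest push and computes each slot's offset from X after a TSX. It then computes the offset from a frame pointer fixed at entry. The layout has the same shape as the single-stack 6801 ADD16S: two 16-bit parameters above the return address, then four bytes of sign-extension temporaries.

```python
# Offsets seen from the stack pointer shift with every push;
# offsets from a frame pointer fixed at entry do not.
Stack = list[tuple[str, int]]   # (slot name, size in bytes), oldest push first

def sp_offsets(stack: Stack) -> dict[str, int]:
    """Offset of each slot's first byte from X after TSX (X = newest byte)."""
    offsets: dict[str, int] = {}
    above = 0
    for name, size in reversed(stack):
        offsets[name] = above
        above += size
    return offsets

def fp_offsets(stack: Stack, entry_len: int) -> dict[str, int]:
    """Offsets from a frame pointer set when only the first entry_len slots existed."""
    pushed_since_entry = sum(size for _, size in stack[entry_len:])
    return {name: off - pushed_since_entry for name, off in sp_offsets(stack).items()}


on_entry = [("left", 2), ("right", 2), ("retadr", 2)]
with_temps = on_entry + [("left_ext", 2), ("right_ext", 2)]
print(sp_offsets(on_entry))        # prints {'retadr': 0, 'right': 2, 'left': 4}
print(sp_offsets(with_temps))      # left moves to 8, right to 6
print(fp_offsets(with_temps, 3))   # left stays 4, right stays 2; temporaries go negative
```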

But.

The frames get in the way. 

Especially when return addresses are inside the stack frames, they get in the way.

All the benefits of stack frames can, in fact, be found in this simple example of split-stack frameless coding discipline. You might think it's just my opinion, but I'll explain further as we go.

I think the code explains itself, particularly when comparing it to the split-stack example with stack frames and the single-stack example without frames, that we just finished.

One point of interest: I had thought I would use an ADDDX (add double accumulator to X) routine in MAIN, 

* Could use this in the single-stack no frames example, too.
LEADPX	LDX	PSP	; Add D to PSP and load to X
* Add A:B to X through XWORK.
* Returned in both X and A:B.
* But we won't use it, after all.
ADDDX	STX	XWORK
	ADDB	XWORK+1
	ADCA	XWORK	; high byte, with carry from the low byte
	STAA	XWORK
	STAB	XWORK+1
	LDX	XWORK
	RTS	

to calculate the effective address of the variable that we are passing, but it worked out to be a wash. Took almost as much code to set it up as to just do it there in place.
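
Whether it's done in ADDDX or in place, the 6800 has to add a 16-bit value to X eight bits at a time. The high bytes must be added with carry (ADCA, as INISTKS does), or a carry out of the low byte is silently lost. Here is a small Python model of that byte-wise add. It is my own sketch. Passing `use_carry=False` shows what a plain ADDA on the high byte would produce.

```python
# Byte-wise 16-bit add, the way the 6800 has to do it with 8-bit accumulators.
def addd_x(d: int, x: int, use_carry: bool = True) -> int:
    a, b = (d >> 8) & 0xFF, d & 0xFF        # A:B
    xh, xl = (x >> 8) & 0xFF, x & 0xFF      # XWORK, XWORK+1
    low = b + xl                            # ADDB XWORK+1
    carry = (low >> 8) if use_carry else 0  # ADCA uses this carry; ADDA would not
    high = (a + xh + carry) & 0xFF          # ADCA XWORK
    return (high << 8) | (low & 0xFF)


print(hex(addd_x(0x0004, 0x20FE)))                   # prints 0x2102
print(hex(addd_x(0x0004, 0x20FE, use_carry=False)))  # prints 0x2002 (carry lost)
```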

Read the code, step through it, compare to what we've worked through so far. Note in particular how we are passing the return values back here, and how it is different from the way we use when working with various kinds of stack frames, and even different from the method of the frameless single-stack discipline:

* 16-bit addition as example of split-stack frame-free discipline on 6800
* using the direct page,
* with test code
* Joel Matthew Rees, October, November 2024
*
	OPT	6800
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for usr stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily
DWORK	RMB	2	; For saving D temporarily
PSP	RMB	2	; parameter stack pointer
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
	RMB	4	; Put a bumper after the process static variables
SSTKLIM	RMB	64	; 16 levels of call
SSTKLMX	EQU	FINALX+8
SSTKBAS	RMB	6	; for canary return
SSTKBSX	EQU	SSTKLMX+64
SSTKFAK	RMB	2	; fake frame pointer, self-link
SSTKFAX	EQU	SSTKBSX+6	; 6800 is post-dec (post-store-decrement) push
SSTKBMP	RMB	4	; a little bumper space
SSTKBMX	EQU	SSTKFAX+2	; But we are going to init S through X
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKLMX	EQU	SSTKBMX+4
PSTKBAS	RMB	4	; bumper space -- parameter stack is pre-dec
PSTKBSX	EQU	PSTKLMX+64
PSTKBMP	RMB	4	; a little bumper space
PSTKBMX	EQU	PSTKBSX+4
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	PSTKBMX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
*
INISTKS	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE
	LDAA	LB_BASE		; bootstrap own return stack
	LDAB	LB_BASE+1
	LDX	#SSTKBSX	; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1
	ADCA	XWORK
	STAB	XWORK+1		; initial return stack pointer
	STAA	XWORK
*
	LDX	#SSTKNDR	; for fake return address
	STX	DWORK		; save it for a moment
	PULA		; pop real return address
	PULB
	LDX	XWORK	; ready own return stack pointer
	STS	SSAVE	; save stack pointer from monitor ROM
	TXS		; move to our own stack (let TXS convert it)
	PSHB		; put return address on own stack
	PSHA		; stack now ready for interrupts
*
	LDAA	DWORK	; error handler for fake return
	LDAB	DWORK+1
	STAA	0,X	; in the cell beyond empty stack pointer
	STAB	1,X	; prime the return stack with error handler
	STAA	2,X	; second fake return to error handler
	STAB	3,X
* 
	LDAA	LB_BASE		; bootstrap parameter stack
	LDAB	LB_BASE+1
	LDX	#PSTKBSX	; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1		; initial parameter stack pointer
	ADCA	XWORK
	STAA	PSP		; parameter stack now ready
	STAB	PSP+1
*
	LDAA	LB_BASE		; set up heap as if we actually had one
	LDAB	LB_BASE+1
	LDX	#HBASEX		; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1		; calculate EA
	ADCA	XWORK
	STAA	HPPTR
	STAB	HPPTR+1
	STAA	HPALL		; as if the heap were functional
	STAB	HPALL+1
	LDX	#CDBASE
	STX	XWORK
	LDAA	XWORK
	LDAB	XWORK+1
	SUBB	#4
	SBCA	#0
	STAA	HPLIM
	STAB	HPLIM+1
	RTS
*
*
***
* General structure of the stacks, 
*
* return stack is only the return address
* (and maybe extremely ephemeral temporaries):
* [PRETADR   ]
* [RETADR    ]
*
* order of elements on the parameter stack,
* when they are present:
* [PARAMETERS ]
* [VARIABLES  ]
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES  ]
* [TEMPORARIES]
* [PARAMETERS ] <= PSP
*
* Result is returned on parameter stack
*
***
* Utility routines
*
* Could use this in the single-stack no frames example, too.
*LEADPX	LDX	PSP	; Add D to PSP and load to X
* Add A:B to X through XWORK.
* Returned in both X and A:B.
* But we won't use it, after all.
*ADDDX	STX	XWORK
*	ADDB	XWORK+1
*	ADCA	XWORK
*	STAA	XWORK
*	STAB	XWORK+1
*	LDX	XWORK
*	RTS	
*
PPOPD	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	INX
	INX
	STX	PSP
	RTS
*
* This saves bytes:
ALCL2	CLRA
	CLRB	; fall through
* 
PPSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STAA	0,X
	STAB	1,X
	RTS
*
* Compromise between speed and reusability
* Returns with PSP in X.
* Enter here to load PSP and initialize to 0
* 8 bytes
ALCL8	CLRA	
	CLRB
* Enter here with initial value in A:B
ALCLD8	LDX	PSP
* Enter here with PSP loaded and initial value in D
ALCLI8	DEX
	DEX
	STAA	0,X
	STAB	1,X
ALCLI6	DEX
	DEX
	STAA	0,X
	STAB	1,X
ALCLI4	DEX
	DEX
	STAA	0,X
	STAB	1,X
ALCLI2	DEX		; PPSHD usually costs less.
	DEX
	STAA	0,X
	STAB	1,X
	STX	PSP
	RTS
*
* six bytes
ALCL6	CLRA
	CLRB
ALCLD6	LDX	PSP
	BRA	ALCLI6
*
* four bytes
ALCL4	CLRA
	CLRB
ALCLD4	LDX	PSP
	BRA	ALCLI4
*
* two bytes
*ALCL2	CLRA
*	CLRB
*	LDX	PSP
*	BRA	ALCLI2
*
*
PDROP8	LDAB	#8	; saves two bytes, 7 vs. 3
PDROP_B	CLRA
* Add A:B to PSP -- negative for allocation, positive for deallocation
ADDPSP	ADDB	PSP+1
	ADCA	PSP
	STAA	PSP
	STAB	PSP+1
	LDX	PSP	; return with X ready
	RTS
*
PDROP6	LDAB	#6
	BRA	PDROP_B	
*
PDROP4	LDAB	#4
	BRA	PDROP_B	
*
PDROP2	LDAB	#2	; JSR is 3 bytes, LDX PSP; INX; INX; STX PSP is 6
	BRA	PDROP_B	
*
*
***
* Return stack when functions are called by MAIN
* Return stack on entry:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ]
* [RETADR1 ]
*
* Parameter stack when called by MAIN
* with two 32-bit local variables
* and two 16-bit parameters,
* after mark (no local allocation)
* [<unknown>]
* [32:VAR1_1--]
* [32:VAR1_2--]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
****
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit
ADD16S	LDX	PSP
	LDAB	#(-1)	; default negative
	TBA
	JSR	ALCLI4	; allocate 2 temporary cells and init (leaves PSP in X)
	TST	6,X	; the left-hand operand sign bit
	BMI	ADD16SR
	CLR	2,X	; positive
	CLR	3,X
ADD16SR	TST	4,X	; the right-hand operand sign bit
	BMI	ADD16SL
	CLR	0,X	; positive
	CLR	1,X
ADD16SL	LDAA	6,X	; left hand 
	LDAB	7,X
	ADDB	5,X	; right hand
	ADCA	4,X
	STAA	6,X	; store low half
	STAB	7,X
	LDAA	2,X
	LDAB	3,X
	ADCB	1,X
	ADCA	0,X
	STAA	4,X	; store high half
	STAB	5,X
	JSR	PDROP4
	RTS
*
* The alternative, without link, mark, or restore?
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right in 2 16-bit
* output parameter:
*   17-bit sum in 32-bit
ADD16U	LDX	PSP
	LDAA	2,X	; left
	LDAB	3,X
	ADDB	1,X	; add right
	ADCA	0,X
	STAA	2,X	; save low in left side
	STAB	3,X
	LDAB	#0	; extend
	ADCB	#0	; extend Carry unsigned (could ROL)
	STAB	1,X	; re-use right side to store high half
	CLR	0,X	; only bit 8 can be affected
	RTS
*
* Etc.
*
*
***
* Parameter stack when called by MAIN
* with two 16-bit parameters (pointer and addend),
* after mark (no local allocation)
* [<unknown>  ]
* [32:VAR1_1  ]
* [32:VAR1_2  ]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
* Instead of walking the stack, pass in a pointer --
* Add 16-bit signed parameter
* to 32 bit caller's 2nd 32-bit internal variable.
* input parameter:
*   16-bit pointer to 32-bit integer
*   16-bit addend
* no output parameter:
ADD16SI	LDAB	#(-1)	; make a temporary -1
	TBA
	JSR	PPSHD	; default to signed (leaves PSP in X)
	TST	2,X	; test high byte
	BMI	ADD16SP
	CLR	0,X	; zero extend
	CLR	1,X
ADD16SP	LDX	4,X	; get pointer to target
	LDAA	2,X	; target low
	LDAB	3,X
	LDX	PSP
	ADDB	3,X	; parameter
	ADCA	2,X
	LDX	4,X	; pointer to target
	STAA	2,X	; update low half with result
	STAB	3,X
	LDAA	0,X	; target, high half
	LDAB	1,X
	LDX	PSP
	ADCB	1,X	; sign extension half
	ADCA	0,X
	LDX	4,X	; target
	STAA	0,X	; update high half
	STAB	1,X
	JSR	PDROP6	; drop temporary and parameters
	RTS
*
*
***
* Return stack on entry:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ] <= RSP
*
*
* Parameter stack after mark and local allocation
* [<unknown>]
* [VAR1_1--]
* [VAR1_2--] <= PSP
*
MAIN	JSR	ALCL8	; allocate and clear 8 bytes
	LDAA	#$12
	LDAB	#$34
	JSR	PPSHD
	LDAA	#$CD
	LDAB	#$EF
	JSR	PPSHD
	JSR	ADD16U	; 32-bit result on parameter stack should be $0000E023
	LDAA	#$87	; ADD16U leaves PSP in X
	LDAB	#$65
	STAA	0,X	; reuse low half of result space, overwrite high half
	STAB	1,X
	JSR	ADD16S	; result on parameter stack should be $FFFF6788
	LDAA	2,X	; result low half -- ADD16S leaves PSP in X
	LDAB	3,X	; put result away
	STAA	6,X	; to 2nd local variable low half
	STAB	7,X
	LDAA	0,X	; result high half
	LDAB	1,X
	STAA	4,X	; to 2nd local variable high half
	STAB	5,X
	STX	XWORK	; instead of JSR ADDDX: 
	LDAB	XWORK+1	; LDAB #4; CLRA; JSR ADDDX; LDX PSP; STAB 3,X; STAA 2,X
	LDAA	XWORK	; Moving results around takes a lot of code,
	ADDB	#4 	; So just do it here.
	ADCA	#0
	STAB	3,X
	STAA	2,X
	LDAA	#$A5
	TAB		; don't really need to use both, just making things clear.
	STAA	0,X
	STAB	1,X
	JSR	ADD16SI	; result in 2nd variable should be FFFF0D2D (Carry set)
	LDAA	2,X	; 2nd variable low half -- ADD16SI leaves PSP in X
	LDAB	3,X
	LDX	LB_BASE
	STAA	FINALX+2,X
	STAB	FINALX+3,X
	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	LDX	LB_BASE
	STAA	FINALX,X
	STAB	FINALX+1,X
	JSR	PDROP8	; ADD16SI also dropped its arguments for us, so only locals
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
*
***
* Return stack will only contain return addresses (and very ephemeral temporaries):
* [RETADRNN  ]
*
* Return stack after initialization:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS <= RSP
*
*
* Parameter stack after initialization:
* [<unknown]PSTKBAS <= PSP
*
START	JSR	INISTKS
*
	JSR	MAIN
*
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
SSTKNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP
	NOP		; another landing pad to set breakpoint at
	NOP
	LDX	$FFFE
	JMP	0,X	; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

As always, I have tested this code, and it produces the correct results without stack frames, passing both input and return parameters on the stack, except for utility routines, which use lower-level register protocols not available to higher-level routines. 
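If you want to check the expected values in the MAIN comments without an emulator, here is a small C model of the three additions. It is only a sketch: the function names are hypothetical, and it models the arithmetic, not the stack handling. Like the assembly, it builds an extension half of 0x0000 or 0xFFFF for each operand, adds the low halves, then adds the extension halves with the carry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of ADD16U, ADD16S and ADD16SI (arithmetic only). */

/* Extension half, as built with LDAB #(-1) / TST / CLR in the listings. */
static uint16_t ext_half(uint16_t v)
{
    return (v & 0x8000u) ? 0xFFFFu : 0x0000u;
}

/* Combine a low-half sum with two high halves plus the carry out of the low half. */
static uint32_t join_with_carry(uint32_t low_sum, uint16_t high_a, uint16_t high_b)
{
    uint16_t carry = (uint16_t)(low_sum >> 16);            /* Carry left by ADDB/ADCA */
    uint16_t high = (uint16_t)(high_a + high_b + carry);   /* ADCB/ADCA on the high cells */
    return ((uint32_t)high << 16) | (uint16_t)low_sum;
}

/* ADD16U: both operands zero-extended, so only bit 16 of the result can be set. */
static uint32_t add16u(uint16_t left, uint16_t right)
{
    return join_with_carry((uint32_t)left + right, 0, 0);
}

/* ADD16S: both operands sign-extended. */
static uint32_t add16s(uint16_t left, uint16_t right)
{
    return join_with_carry((uint32_t)left + right, ext_half(left), ext_half(right));
}

/* ADD16SI: sign-extended addend added into a 32-bit target in place. */
static void add16si(uint32_t *target, uint16_t addend)
{
    uint32_t low_sum = (*target & 0xFFFFu) + addend;
    *target = join_with_carry(low_sum, (uint16_t)(*target >> 16), ext_half(addend));
}
```

Chained the way MAIN chains them, add16u(0x1234, 0xCDEF), then add16s(0xE023, 0x8765), then add16si on that result with 0xA5A5, give $0000E023, $FFFF6788 and $FFFF0D2D.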

I will be pointing you back here later. If this talk about stack frames and parameter passing methods seems a little fuzzy at this point, it's okay to move ahead for now.

You may want to move ahead with getting numeric output in binary, or you might want to see how single-stack, no-frame parameter passing works on the 6801, next.


(Title Page/Index)


 

 

 

 

Sunday, November 24, 2024

ALPP 02-29 -- Putting That Wrong Island in the Rear-view Mirror -- Single-stack No Frame Example: 6800

So we've got most of those rubber bricks off the bottom of the pool but there's some treasure down there, too.

  Putting That Wrong Island in the Rear View Mirror --
Single-stack No Frame Example:
6800

(Title Page/Index)

 

As I have said, even though we’ve just looked at an example of how split-stack stack frames can be done on the 6800 and we've even seen a parallel example of single-stack stack frames on the same, I do not recommend stack frames. 

But I think I have made it clear that, if you have to do stack frames, I recommend split-stack over single-stack.

In this chapter we are going to look at the same functional example of three kinds of addition using a single stack without a stack frame.

Single-stack no frame, if you are allowed to do it and learn how to do it right, will produce cleaner, more optimal code than single-stack with stack frames.

But I'm going to repeat myself. I cannot recommend this. You have to track what is on that stack, and the return address just gets in the way of your calculations and your memory. It's a bit (16 bits on the 6800) of distracting data that isn't relevant to the calculations the function is doing, and every time you look for something on the stack, it either sticks out like a sore thumb, distracting you, or you forget it's there and miss what you are aiming at. And walk on it. Or try to get it from where it isn't and end up executing data or garbage instead of instructions.

What we have to acknowledge is that, without the frame pointer(s), we end up having to track how much of what we have on the stack at any particular point in the code.

But we have to keep track of that anyway, really, even though a frame pointer can help. If we don't know what's there, we don't know where we've put things, and that's a terrible state for a program (and a programmer) to be in -- and that's one reason people avoid reading the assembly language output of compilers.
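To make that bookkeeping concrete, here is a small C model of the 6800 hardware stack. The struct and function names are hypothetical, and it assumes the 6800's post-decrement push (store at S, then decrement) and that TSX loads X with S+1. It replays the entry to the single-stack ADD16SI below: the addend starts at 2,X and the pointer at 4,X, but once the two-byte sign-extension temporary is pushed, the same cells have to be found at 4,X and 6,X.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 6800 hardware stack, for checking offsets by hand.
   A push stores at S and then decrements S; TSX loads X with S + 1,
   so 0,X is always the most recently pushed byte. */
typedef struct {
    uint8_t mem[256];
    uint16_t s;          /* S: next free byte */
} Stack6800;

static void psh(Stack6800 *st, uint8_t byte)   /* PSHA / PSHB */
{
    st->mem[st->s] = byte;
    st->s--;
}

/* PSHB then PSHA, as in the listings: the high byte ends up at the lower address. */
static void push16(Stack6800 *st, uint16_t v)
{
    psh(st, (uint8_t)(v & 0xFF));
    psh(st, (uint8_t)(v >> 8));
}

static uint16_t tsx(const Stack6800 *st)       /* TSX */
{
    return (uint16_t)(st->s + 1);
}

/* The 16-bit cell at off,X. */
static uint16_t cell(const Stack6800 *st, uint16_t x, uint16_t off)
{
    return (uint16_t)((st->mem[x + off] << 8) | st->mem[x + off + 1]);
}
```

Push a pointer, an addend and a return address, and the return address is at 0,X, the addend at 2,X, the pointer at 4,X; push two more bytes and TSX again, and everything has moved up by two.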

Just looking at the code below, you may not see how much we've ripped out -- that's because we've been hiding what we could in subroutines. But tracing through the code should feel rather different, because you can hide code from the programmer, but you can't hide it from the processor.

You'll really want to compare the code with the stack frame version, and re-read the code and the comments. Take time to trace through both, watching the source as you do.

* 16-bit addition as example of single-stack discipline sans stack frame on 6800,
* with test code
* Joel Matthew Rees, October, November 2024
*
	OPT	6800
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for user stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily in leaf functions only
DWORK	RMB	2	; For saving D temporarily in leaf functions only
RETVHI	RMB	2	; high half of 32-bit return values (because we can't push X easily)
RETVLO	RMB	2	; 16-bit return values and low half (because loading and saving is redundant)
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
	RMB	4	; buffer
STKLIM	RMB	192	; roughly 16 to 20 levels of call
STKLIMX	EQU	FINALX+8
STKBAS	RMB	8	; for canary return
STKSZ	EQU	192	; for EXORsim assembler limits
STKBASX	EQU	STKLIMX+192	; must be STKLIMX+STKSZ -- assembler won't take symbol
STKFAK	RMB	2	; fake frame pointer, self-link
STKFAKX	EQU	STKBASX+8	; 6801 is post-dec (post-store-decrement) push
STKBMP	RMB	4	; a little bumper space
STKBMPX	EQU	STKFAKX+2	; But we are going to init S through X
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	STKBMPX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
*STKBASM	FDB	STKBASX	; Doesn't work within EXORsim assembler limits after all
*HBASEXM	FDB	HBASEX	; by avoiding splitting large constants up at assemble time
*
INISTK	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE		; local space functional
	LDAA	LB_BASE		; bootstrap own stack
	LDAB	LB_BASE+1
*	ADDB	STKBASM+1
*	ADCA	STKBASM
	LDX	#STKBASX	; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1
	ADCA	XWORK
*
	STAB	XWORK+1		; initial stack pointer
	STAA	XWORK
*
	LDX	#STKUNDR	; for fake return address
	STX	DWORK		; save it for a moment
*	
	PULA		; pop real return address
	PULB
	LDX	XWORK	; ready own stack pointer
	STS	SSAVE	; save stack pointer from monitor ROM
	TXS		; move to our own stack (let TXS convert it)
	PSHB		; put return address on own stack
	PSHA		; stack now ready for interrupts, utility routines
*
	LDAA	DWORK	; error handler for fake return
	LDAB	DWORK+1
	STAA	0,X	; in the cell beyond empty stack pointer
	STAB	1,X
	STAA	2,X	; and the next cell, for good measure
	STAB	3,X
*
	LDAA	LB_BASE	
	LDAB	LB_BASE+1
	PSHB
	PSHA
*	JSR	PSH16I 
*	FDB	HBASEX	; EXORsim's interactive assembler doesn't like FDBs.
	LDX	#HBASEX
	JSR	SPSHX
*
	JSR	UADD16
	STAA	HPPTR		; as if we were ready to use heap
	STAB	HPPTR+1
	STAA	HPALL
	STAB	HPALL+1
*	JSR	PSH16I	; FDBs
*	FDB	CDBASE
*	JSR	PSH16I
*	FDB	(-4)		; extra bumper
*	JSR	UADD16
	LDX	#CDBASE
	STX	XWORK
	LDAA	XWORK
	LDAB	XWORK+1
	SUBB	#4
	SBCA	#0
*
	STAA	HPLIM
	STAB	HPLIM+1
	RTS		; finally done, now can return
*
***
* Since negative index offsets are so expensive,
* we want to create a stack frame with only positive offsets.
* And we want the frame pointer to be pushed after the call,
* on entry to the local context.
* And the saved frame pointer needs to link to the previous one.
* And when we restore the previous frame, 
* we need to be able to restore the previous frame base.
*
* Cross-section of general frame structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP}  ] for calling routine
* [PARAM   ] from calling routine
* [RETADR  ] to calling routine
* [LOCVAR  ] for called -- current -- routine
* [TEMP    ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call
*
* Broader cross-section, showing chaining for routine 3, in-flight:
* [RETADR1 ] 
* [LOCVAR2 ]
* [TEMP2   ]
* [PARAM3  ]
* [RETADR2 ]
* [LOCVAR3 ]
* [TEMP3   ]
* [(PARAM4)] <= SP (return stack pointer (6800 S is byte below))
***
*
***
* Utility routines
*
* Push low half of return value
* (Didn't use it there, don't use it here.)
PSHLH	TSX
	LDAA	0,X		; return address
	LDAB	1,X
	PSHB
	PSHA
	LDAA	RETVLO
	LDAB	RETVLO+1
	STAA	0,X
	STAB	1,X
	RTS
*
* Avoid the math to split 16-bit constants into two 8-bit loads,
* and push them while we are here.
* The constant follows the call in the instruction stream.
* Leaves constant in A:B, as well.
PSH16I	TSX		; point to top of return address stack
	LDX	0,X	; point into the instruction stream
	LDAA	0,X	; high byte from instruction stream
	LDAB	1,X	; low byte from instruction stream	
	INS		; drop the return address we almost have in X
	INS
	PSHB		; replace it with the constant
	PSHA
	JMP	2,X	; return to the byte after the constant.
*
* 8 bytes for the meat of this vs. 3 for the call.
* We end up using it a lot since EXORsim's interactive assembler doesn't do FDBs.
SPSHX	STX	XWORK
	DES
	DES
	TSX
	LDAA	2,X
	LDAB	3,X
	STAA	0,X
	STAB	1,X
	LDAA	XWORK
	LDAB	XWORK+1
	STAA	2,X
	STAB	3,X
	RTS
*
* 6 bytes for the meat of this vs. 3 for the call, instead of FDB
* (Didn't use it there, don't use it here.)
TXD	STX	XWORK
	LDAA	XWORK
	LDAB	XWORK+1
	RTS
*
* Utility 16-bit add, leave result in A:B
UADD16	TSX		; no frame
	LDAB	5,X	; left
	ADDB	3,X	; right		; because we can
	LDAA	4,X	; left
	ADCA	2,X	; right
	LDX	0,X
UADROP	INS		; drop return address and parameters
	INS
	INS
	INS
	INS
	INS
	JMP	0,X	; return via X
*
* Utility 16-bit sub, leave result in A:B
* (Didn't use it there, don't use it here.)
USUB16	TSX		; no frame
	LDAB	5,X	; left
	SUBB	3,X	; right		; because we can
	LDAA	4,X	; left
	SBCA	2,X	; right
	LDX	0,X
	BRA	UADROP	; drop return address and parameters
*
*
* We really don't want to put S in a temp if we can avoid it
ALOCS8	PULA
	PULB
ALOS8I	DES
	DES
ALOS6I	DES
	DES
ALOS4I	DES
	DES
ALOS2I	DES
	DES
	PSHB
	PSHA
	RTS
*
ALOCS6	PULA
	PULB
	BRA	ALOS6I
*
ALOCS4	PULA
	PULB
	BRA	ALOS4I
*
ALOCS2	PULA
	PULB
	BRA	ALOS2I
*
INI0_8	CLRA
	CLRB
* call with initialization value in A:B
INIS8	TSX
INIT8	STAA	8,X
	STAB	9,X
INIT6	STAA	6,X
	STAB	7,X
INIT4	STAA	4,X
	STAB	5,X
INIT2	STAA	2,X
	STAB	3,X
	RTS		; 0,X is return address!
*
INI0_6	CLRA
	CLRB
* call with initialization value in A:B
INIS6	TSX
	BRA	INIT6
*
INI0_4	CLRA
	CLRB
* call with initialization value in A:B
INIS4	TSX
	BRA	INIT4
*
INI0_2	CLRA
	CLRB
* call with initialization value in A:B
INIS2	TSX
	BRA	INIT2
*
DROP8	PULA
	PULB
	INS
	INS
DROP6I	INS
	INS
	INS
	INS
	INS
	INS
	PSHB
	PSHA
	RTS
*
DROP6	PULA
	PULB
	BRA	DROP6I
*
*
* Stack at entry
* when functions are called by MAIN
* with two parameters
* We will return results in RETVHI:RETVLO in direct page
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ] 
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ] 
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left 1st pushed, right 2nd
* output parameter:
*   17-bit sum in 32-bit in RETVHI:RETVLO
* Does not alter the parameters.
ADD16S	TSX		; no local variables
	LDAA	#(-1)	; prepare for sign extension
	TST	4,X	; the left-hand operand sign bit
	BMI	ADD16SR
	CLRA		; zero extend
ADD16SR	PSHA		; push left extension (only need one byte, though, really)
	PSHA		; left sign cell below X now
	LDAA	#(-1)	; reload
	TST	2,X	; the right-hand operand sign bit
	BMI	ADD16SL
	CLRA		; zero extend
ADD16SL	PSHA		; push right extension
	PSHA
	TSX		; point to sign extensions
	LDAA	8,X	; left-hand low cell
	LDAB	9,X
	ADDB	7,X	; right-hand low cell
	ADCA	6,X
	STAA	RETVLO	; save low half of result
	STAB	RETVLO+1
	LDAA	2,X	; left-hand extension
	LDAB	3,X
	ADCB	1,X	; right-hand extension
	ADCA	0,X
	STAA	RETVHI	; Save high half of result
	STAB	RETVHI+1
	INS		; drop sign extension temporaries
	INS		; 4 INS is one byte more than JSR DROP4
	INS
	INS
	RTS		; result is in RETVHI:RETVLO
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit RETVHI:RETVLO
ADD16U	TSX		; no local allocations
	LDAA	4,X	; left
	LDAB	5,X
	ADDB	3,X	; right
	ADCA	2,X
	STAA	RETVLO	; save low half
	STAB	RETVLO+1
	LDAB	#0
	ADCB	#0
	STAB	RETVHI+1	; save carry bit in high half
	CLR	RETVHI		; will never carry beyond bit 17
	RTS
*
* Etc.
*
***
*
* Stack on entry when functions are called by MAIN
* with two input parameters
* (no local variables)
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ] 
* [32:VAR1_1]
* [32:VAR1_2] <= PARAM2_1
* [PARAM2_1] (pointer)
* [PARAM2_2] (addend)
* [RETADR1 ] 
*
* Instead of walking the stack, pass in a pointer --
* Add 16-bit signed parameter
* to 32 bit caller's 2nd 32-bit internal variable.
* input parameters:
*   16-bit pointer to 32-bit integer
*   16-bit addend
* no output parameter:
ADD16SI	TSX		; no own local variables 
	LDAA	#(-1)
	TST	2,X	; high byte of addend parameter
	BMI	ADD16SIP
	CLRA
ADD16SIP	PSHA	; save the sign extension half
	PSHA
	LDX	4,X	; get pointer to target
	LDAA	2,X	; target low
	LDAB	3,X
	TSX		; SP[ sign, retadr, addend, long ptr ]
	ADDB	5,X	; addend parameter (stack is two lower, now)
	ADCA	4,X
	LDX	6,X	; target pointer
	STAA	2,X	; save result low half away
	STAB	3,X
	LDAA	0,X	; target high half
	LDAB	1,X
	TSX
	ADCB	1,X	; sign extension half
	ADCA	0,X
	LDX	6,X	; target
	STAA	0,X	; save result high half away
	STAB	1,X
	INS		; three bytes for INS and RTS vs. two bytes for branch
	INS
	RTS		; no result to load
*
*
***
* Stack after variable allocation
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ] 
* [32:VAR1_1]
* [32:VAR1_2] <= SP
*
MAIN	JSR	ALOCS8	; 2 calls, 6 bytes vs. 1 clr + 8 pushes, 9 bytes
	JSR	INI0_8
	TSX
*
	JSR	PSH16I
*	FDB	$1234	; parameters
	FCB	$12
	FCB	$34
	JSR	PSH16I
*	FDB	$CDEF
	FCB	$CD
	FCB	$EF
	JSR	ADD16U	; result in RETVHI:RETVLO should be $E023
	INS		; drop one parameter, reuse other
	INS
	TSX
	LDAA	RETVLO	; four extra bytes compared to calling PSHLH
	LDAB	RETVLO+1
	STAA	0,X
	STAB	1,X	
	JSR	PSH16I
*	FDB	$8765
	FCB	$87
	FCB	$65
	JSR	ADD16S	; result in RETVHI:RETVLO should be $FFFF6788
	TSX		; reuse both parameters
	LDAA	RETVHI
	LDAB	RETVHI+1
	STAA	4,X		; 2nd local variable high half
	STAB	5,X
	LDAA	RETVLO
	LDAB	RETVLO+1
	STAA	6,X
	STAB	7,X
	STX	XWORK	; calculate address of second variable
	LDAB	XWORK+1
	ADDB	#4
	STAB	3,X
	LDAA	XWORK
	ADCA	#0	; don't lose the carry
	STAA	2,X
	LDAB	#$A5
	STAB	0,X	; $A5
	STAB	1,X	; $A5A5
	JSR	ADD16SI		; result in 2nd variable should be FFFF0D2D
	INS			; drop the parameters
	INS
	INS
	INS
	TSX
	LDAA	2,X		; low half
	LDAB	3,X
	LDX	LB_BASE		; store it in FINAL, in process local space
	STAA	FINALX+2,X
	STAB	FINALX+3,X
	TSX
	LDAA	0,X		; high half
	LDAB	1,X
	LDX	LB_BASE
	STAA	FINALX,X
	STAB	FINALX+1,X
	JSR	DROP8
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
*
* Stack after initialization:
* [STKUNDR ]
* [STKUNDR ]STKBAS <= SP
*
***
*
START	NOP
	JSR	INISTK
	NOP
*
	JSR	MAIN
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
STKUNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad to set breakpoint at
	NOP
	NOP
	LDX	$FFFE	; alternatively, jmp through reset vector
	JMP	0,X
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

I probably spent a good six hours or more figuring out all the places I had messed up the offsets and lost track of what was on the stack. Sure, that was because I'd quit using the frame pointers for reference. It was also because I was running low on sleep. But it was more so because of that distracting presence of the return addresses right in the middle of the data.

If you haven't traced through this code, do so. Otherwise, you won't really believe me.

And then go take a look at the split-stack version of this.

[JMR202411260841 addendum:]

Speaking of the split-stack version, while working through that, I realized I could have used a load effective address routine here for calculating the address of the second local variable in MAIN, something like

* Add D to S and load to X as a pointer
LEADSX	TSX	; make it a pointer
	INX	; adjust for return address the cheap way
	INX
	STX	XWORK
	ADDB	XWORK+1
	STAB	XWORK+1
	ADCA	XWORK
	STAA	XWORK
	LDX	XWORK
	RTS
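A C model of LEADSX, for checking its arithmetic. The function name is hypothetical, and s is assumed to be the value of S when LEADSX executes its TSX, that is, after JSR has pushed the two-byte return address. Called with A:B = 4, it should land on the same address MAIN builds by hand with STX XWORK / ADDB #4 / ADCA #0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of LEADSX. s is the 6800 S register when LEADSX
   executes TSX (after JSR pushed the return address); a:b is the offset. */
static uint16_t leadsx(uint16_t s, uint8_t a, uint8_t b)
{
    uint16_t x = (uint16_t)(s + 1);           /* TSX */
    x = (uint16_t)(x + 2);                    /* INX; INX: step over the return address */
    uint8_t xhi = (uint8_t)(x >> 8);          /* STX XWORK: high byte at XWORK */
    uint8_t xlo = (uint8_t)(x & 0xFF);        /* low byte at XWORK+1 */
    unsigned lo = (unsigned)b + xlo;          /* ADDB XWORK+1 */
    unsigned carry = lo >> 8;                 /* STAB leaves Carry alone */
    unsigned hi = (unsigned)a + xhi + carry;  /* ADCA XWORK */
    return (uint16_t)(((hi & 0xFFu) << 8) | (lo & 0xFFu));   /* STAA/STAB; LDX XWORK */
}
```

If MAIN's own TSX would give some X, LEADSX called from MAIN with A:B = 4 returns X+4, the address of the second local variable. An S value near the end of a page is a good test case, since it exercises the carry into the high byte.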

[JMR202411260841 addendum end.]

(Note that, this time, I'm not suggesting you move ahead if you are getting tired. You've come this far, it's only a little farther along this path until you can decide whether I'm a fool for thinking split stack with no stack frames is so great -- or maybe see what I see.)

 (Title Page/Index)

 


 

 

Thursday, November 21, 2024

ALPP 02-28 -- Walking the Pontoons -- Split-stack Stack Frame Example: 6800

Well, this should be the last of the rubber bricks at the bottom of the pool, taking, as predicted, a long time, much longer than the 6809 example.

  Walking the Pontoons --
Split-stack Stack Frame Example:
6800

(Title Page/Index)

Now that we have worked out the concrete single-(combined)-stack stack frame example for the 6800, let's get a look at how the split stack discipline/paradigm improves things on this processor. 

The conversion of both the split-stack and single-stack stack frame code from the 68000 to the 6809 was straightforward. 

From the 6809 to the 6801, the single-stack conversion was rather hairy. It surprised me a little that the conversion from the 6801 to the 6800 was also rather hairy. The difference a few instructions makes can be rather dramatic, and the less support you have for address math, the more the combined single stack shows up as a bottleneck.

By comparison, the conversion of the split-stack code was a little tricky from the 6809 to the 6801, but quite straightforward from the 6801 to the 6800. This really isn't a surprise, since the big jump from the 6809 to the 6801 is the code to support the software stack, and that code moved to the 6800 is primarily a matter of doing 16-bit additions and subtractions a byte at a time, which we now know is not that hard.

It's mostly just a matter of doing the less-significant byte before the more-significant, and remembering to include the carry on the more significant byte -- mostly just replacing one instruction with two.
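That byte-at-a-time pattern, modeled in C. This is a sketch: AccAB and the two function names are hypothetical, with A holding the more-significant byte and B the less-significant, as in the listings. The subtraction case is the one INISTKS uses to compute HPLIM as CDBASE minus 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of 16-bit arithmetic done a byte at a time on the 6800. */
typedef struct {
    uint8_t a;   /* accumulator A: high byte */
    uint8_t b;   /* accumulator B: low byte */
} AccAB;

/* ADDB #lo; ADCA #hi: low byte first, then the high byte plus the carry. */
static AccAB add16_bytewise(AccAB d, uint8_t hi, uint8_t lo)
{
    unsigned low = (unsigned)d.b + lo;            /* ADDB sets Carry on overflow */
    unsigned carry = low >> 8;                    /* 0 or 1 */
    unsigned high = (unsigned)d.a + hi + carry;   /* ADCA includes the Carry */
    AccAB r = { (uint8_t)high, (uint8_t)low };
    return r;
}

/* SUBB #lo; SBCA #hi: Carry acts as the borrow out of the low byte. */
static AccAB sub16_bytewise(AccAB d, uint8_t hi, uint8_t lo)
{
    unsigned borrow = d.b < lo;                   /* SUBB sets Carry on borrow */
    AccAB r = { (uint8_t)(d.a - hi - borrow), (uint8_t)(d.b - lo) };
    return r;
}
```

For example, $1234 + $CDEF gives $E023, and $3000 - 4 borrows out of the low byte to give $2FFC.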

Well, and some places where you use PSHX and PULX on the 6801, where on the 6800 you instead need to save X to a temporary and, as immediately as possible, grab it back in one or both accumulators and push it or do math on it. Or the reverse order, popping the address to accumulator(s), saving in the proper order to adjacent temporary bytes, maybe working on it, and loading it back into X as soon as possible. Oh, and make sure the stack pointer gets updated in the appropriate places.
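The PSHX/PULX replacement can be modeled in C as well. This is a sketch under assumptions: Cpu6800, pshx_emul and pulx_emul are hypothetical names, xwork stands in for the XWORK direct-page temporary, and S is kept to 8 bits to fit the toy memory. It checks that the emulated push leaves the same big-endian bytes on the stack that a 6801 PSHX would, and that the emulated pull gets X back.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: emulating the 6801's PSHX/PULX on a 6800 through a
   direct-page temporary. Memory is big-endian; pushes are post-decrement. */
typedef struct {
    uint8_t mem[256];
    uint8_t s;            /* 6800 S: next free byte */
    uint8_t xwork[2];     /* stands in for XWORK */
} Cpu6800;

/* STX XWORK; LDAA XWORK; LDAB XWORK+1; PSHB; PSHA */
static void pshx_emul(Cpu6800 *c, uint16_t x)
{
    c->xwork[0] = (uint8_t)(x >> 8);     /* STX stores the high byte first */
    c->xwork[1] = (uint8_t)(x & 0xFF);
    uint8_t a = c->xwork[0], b = c->xwork[1];
    c->mem[c->s--] = b;                  /* PSHB */
    c->mem[c->s--] = a;                  /* PSHA */
}

/* PULA; PULB; STAA XWORK; STAB XWORK+1; LDX XWORK */
static uint16_t pulx_emul(Cpu6800 *c)
{
    uint8_t a = c->mem[++c->s];          /* PULA: pre-increment pull */
    uint8_t b = c->mem[++c->s];          /* PULB */
    c->xwork[0] = a;
    c->xwork[1] = b;
    return (uint16_t)((c->xwork[0] << 8) | c->xwork[1]);
}
```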

I used both accumulators to make it clear what I was doing, but if you need to keep one accumulator available for something else, the process can be done just as efficiently with one accumulator. Even the addition and subtraction works in the same number of instructions with a single accumulator, you just have to complete moving and/or working on one byte before starting on the other.

This is, as I say, all straightforward -- because you separate the stacks. When you're working on parameters, the parameter pointer is in X. When you're working on the return stack, you've pulled S into X. When you are pointing at other things, the pointer for that is in X.

When you have the stacks combined, there are times when you find yourself needing to point at two things at once, and finding yourself with only one X. Sure, you can save X off to a temporary, but then you have to be careful how you use that temporary for other things, or the code balloons in your face (if it doesn't just blow up).

Yes, this is a problem in allocating temporary variables and in being disciplined in how you use them, but it's also a problem in optimization against the discipline. You want to make efficient use of the on-CPU resources if you can.

I did hit one snag in the un-mark routine. With the 6801, the split stack made it unnecessary to actually have an un-mark routine, but the 6800 used enough instructions to make it worthwhile. But, in replacing the PULX instruction in that routine, I used INX instead of INS to update the return stack pointer, and lost a couple of hours staring at the code and tracing through it and wondering why the stack pointer ended up pointing to a frame boundary instead of a return address when the code was supposed to be returning from a subroutine.

And that was the hardest problem going from 6801 to 6800 in the split-stack discipline.
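The effect of that INX/INS mix-up can be reproduced with a small C model of the return stack. RetStack, push16, pull16 and unmk are hypothetical names, and the addresses in the usage are made up. The model assumes each call leaves its return address with the caller's saved frame pointer above it, as MARK arranges. With INS, the saved frame pointer is dropped and the next RTS finds the real return address; with INX, S never moves, so the next RTS pops the saved frame pointer instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 6800 return stack (post-decrement push,
   pre-increment pull, big-endian cells) and of the UNMK routine. */
typedef struct {
    uint8_t mem[256];
    uint8_t s;        /* 6800 S: next free byte */
    uint16_t fp;      /* stands in for the FP pseudo-register */
} RetStack;

static void push16(RetStack *r, uint16_t v)        /* PSHB; PSHA */
{
    r->mem[r->s--] = (uint8_t)(v & 0xFF);
    r->mem[r->s--] = (uint8_t)(v >> 8);
}

static uint16_t pull16(RetStack *r)                /* PULA; PULB (or RTS) */
{
    uint8_t a = r->mem[++r->s];
    uint8_t b = r->mem[++r->s];
    return (uint16_t)((a << 8) | b);
}

/* UNMK. With buggy != 0, INX; INX stands in for INS; INS: X moves, S does not. */
static void unmk(RetStack *r, int buggy)
{
    uint16_t ret = pull16(r);                      /* PULA; PULB: UNMK's return address */
    uint16_t x = (uint16_t)(r->s + 1);             /* TSX */
    r->fp = (uint16_t)((r->mem[x] << 8) | r->mem[x + 1]);  /* LDX 0,X; STX FP */
    if (buggy)
        x += 2;                                    /* INX; INX: only X changes */
    else
        r->s += 2;                                 /* INS; INS: drop the saved FP */
    push16(r, ret);                                /* PSHB; PSHA */
}
```

In both cases FP gets restored correctly, which is part of what made the bug hard to spot; only the second RTS shows the difference.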

I want to note that I realized, while debugging this, that the FINAL variable really should have some bumper space after it. We know our test routines won't get anywhere even close to overwriting it. And it's only used at the end, so, even if a stack overflow overwrote it, it wouldn't matter. But it still should have the buffer. Or, at least, I should be testing it to be still zero before we store the final value. It's something to keep in mind.
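One way to do that zero check, sketched in C. FinalArea and store_final_checked are hypothetical names, and the 4-byte bumper is the one FINAL should have had, not something in the listing. The store is refused unless FINAL and its bumper are still zero, so an overflow that reached that far would be noticed instead of silently overwritten.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: FINAL followed by the bumper it should have had. */
typedef struct {
    uint8_t final[4];    /* stands in for FINAL (32-bit, big-endian) */
    uint8_t bumper[4];   /* bumper space after FINAL */
} FinalArea;

static int all_zero(const uint8_t *p, int n)
{
    for (int i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Returns 1 after storing value, or 0 if FINAL or its bumper was already disturbed. */
static int store_final_checked(FinalArea *area, uint32_t value)
{
    if (!all_zero(area->final, 4) || !all_zero(area->bumper, 4))
        return 0;
    for (int i = 0; i < 4; i++)
        area->final[i] = (uint8_t)(value >> (24 - 8 * i));   /* high byte first */
    return 1;
}
```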

For reference, here's what the two stacks look like, from the discussion of the two-stack discipline on the 6801, the return stack:

* [PRETADR   ]
* [PCALLERFRM]
* [RETADR    ]
* [CALLERFRM ] <= RSP

and the (conceptual) parameter stack:

* [VARIABLES  ] <= CALLERFRM
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES  ] <= FP
* [TEMPORARIES]
* [PARAMETERS ] <= PSP

again, keeping offsets positive because we don't have the 6809's fancy negative offsets. 
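Here is the same discipline as a small C model, to show how a routine reaches its caller's variables through the saved frame pointer. It is a sketch under assumptions: SplitStacks, mark, unmark and add16si_walk are hypothetical names; return addresses are left to C's own call stack, so rstk holds only the saved frame pointers MARK would put beside them; and the sign extension lives in a C variable rather than a parameter-stack temporary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: rstk holds saved frame pointers; the parameter stack
   grows downward with pre-decrement pushes; PSP/FP are byte offsets into pstk. */
typedef struct {
    uint8_t pstk[64];
    uint16_t psp;        /* parameter stack pointer */
    uint16_t fp;         /* frame pointer */
    uint16_t rstk[16];   /* saved frame pointers */
    int rsp;             /* cells in use on rstk */
} SplitStacks;

static uint16_t pcell(const SplitStacks *m, uint16_t base, uint16_t off)
{
    return (uint16_t)((m->pstk[base + off] << 8) | m->pstk[base + off + 1]);
}

static void pset(SplitStacks *m, uint16_t base, uint16_t off, uint16_t v)
{
    m->pstk[base + off] = (uint8_t)(v >> 8);       /* big-endian, like the 6800 */
    m->pstk[base + off + 1] = (uint8_t)(v & 0xFF);
}

static void ppshd(SplitStacks *m, uint16_t d)      /* PPSHD */
{
    m->psp -= 2;
    pset(m, m->psp, 0, d);
}

static void mark(SplitStacks *m)                   /* MARK */
{
    m->rstk[m->rsp++] = m->fp;   /* caller's FP goes on the return stack */
    m->fp = m->psp;              /* new frame starts at the parameters */
}

static void unmark(SplitStacks *m)                 /* UNMK */
{
    m->fp = m->rstk[--m->rsp];
}

/* ADD16SI, split-stack frame version: add the sign-extended addend at 0,FP
   into the caller's 2nd 32-bit variable, reached through the saved FP. */
static void add16si_walk(SplitStacks *m)
{
    mark(m);
    uint16_t caller = m->rstk[m->rsp - 1];         /* TSX; LDX 0,X */
    uint16_t addend = pcell(m, m->fp, 0);
    uint16_t ext = (addend & 0x8000u) ? 0xFFFFu : 0x0000u;
    uint32_t low = (uint32_t)pcell(m, caller, 2) + addend;
    uint16_t high = (uint16_t)(pcell(m, caller, 0) + ext + (low >> 16));
    pset(m, caller, 2, (uint16_t)low);
    pset(m, caller, 0, high);
    m->psp = (uint16_t)(m->fp + 2);                /* drop the parameter */
    unmark(m);
}
```

Set up the way MAIN does it (MARK, allocate 8 bytes, point FP at them, put $FFFF6788 in the second variable, push $A5A5), add16si_walk leaves $FFFF0D2D in the caller's variable, with PSP and FP back where MAIN had them.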

I've left some routines I didn't end up using in the code; don't let that bother you.

So, the code, with the reminder that the bulk of the discussion is in the code itself. Please don't skip reading the code, watching out for comments that I forgot to update after fixing something.
* 16-bit addition as example of split-stack stack-frame discipline on 6800
* using the direct page,
* with test code
* Joel Matthew Rees, October, November 2024
*
	OPT	6800
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for usr stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily
DWORK	RMB	2	; For saving D temporarily
ALCOUNT	RMB	1	; for counting utility routines
	RMB	1	; reserved
PSP	RMB	2	; parameter stack pointer
FP	RMB	2	; frame pointer
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4	; Should've had a bumper after this (but it's only used at the end?)
SSTKLIM	RMB	64	; 16 levels of call
SSTKLMX	EQU	FINALX+4
SSTKBAS	RMB	6	; for canary return
SSTKBSX	EQU	SSTKLMX+64
SSTKFAK	RMB	2	; fake frame pointer, self-link
SSTKFAX	EQU	SSTKBSX+6	; 6801 is post-dec (post-store-decrement) push
SSTKBMP	RMB	4	; a little bumper space
SSTKBMX	EQU	SSTKFAX+2	; But we are going to init S through X
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKLMX	EQU	SSTKBMX+4
PSTKBAS	RMB	4	; bumper space -- parameter stack is pre-dec
PSTKBSX	EQU	PSTKLMX+64
PSTKBMP	RMB	4	; a little bumper space
PSTKBMX	EQU	PSTKBSX+4
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	PSTKBMX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
*
INISTKS	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE
	LDAA	LB_BASE		; bootstrap own return stack
	LDAB	LB_BASE+1
	LDX	#SSTKBSX	; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1
	ADCA	XWORK
	STAB	XWORK+1		; initial return stack pointer
	STAA	XWORK
*
	LDX	#SSTKNDR	; for fake return address
	STX	DWORK		; save it for a moment
	PULA		; pop real return address
	PULB
	LDX	XWORK	; ready own return stack pointer
	STS	SSAVE	; save stack pointer from monitor ROM
	TXS		; move to our own stack (let TXS convert it)
	PSHB		; put return address on own stack
	PSHA		; stack now ready for interrupts
*
	LDAA	DWORK	; error handler for fake return
	LDAB	DWORK+1
	STAA	0,X	; in the cell beyond empty stack pointer
	STAB	1,X	; prime the return stack with error handler
	STAA	4,X	; second fake return to error handler
	STAB	5,X
* 
	LDAA	LB_BASE		; bootstrap parameter stack
	LDAB	LB_BASE+1
	LDX	#PSTKBSX	; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1		; initial parameter stack pointer
	ADCA	XWORK
	STAA	PSP		; parameter stack now ready
	STAB	PSP+1
	STAA	FP		; initial frame pointer
	STAB	FP+1
	TSX			; pointing at return address below fake
	STAA	4,X		; empty parameter stack frame pointer for fake frame,
	STAB	5,X		; stacks primed
*
	LDAA	LB_BASE		; set up heap as if we actually had one
	LDAB	LB_BASE+1
	LDX	#HBASEX		; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1		; calculate EA
	ADCA	XWORK
	STAA	HPPTR
	STAB	HPPTR+1
	STAA	HPALL		; as if the heap were functional
	STAB	HPALL+1
	LDX	#CDBASE
	STX	XWORK
	LDAA	XWORK
	LDAB	XWORK+1
	SUBB	#4
	SBCA	#0
	STAA	HPLIM
	STAB	HPLIM+1
	RTS
*
*
***
* General structure of the stacks, 
*
* return stack is always in pairs:
* [PRETADR   ]
* [PCALLERFRM]
* [RETADR    ]
* [CALLERFRM ] <= RSP
*
* order of elements on the parameter stack,
* when they are present:
* [PARAMETERS ]
* [VARIABLES  ] <= CALLERFRM
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES  ] <= FP
* [TEMPORARIES]
* [PARAMETERS ] <= PSP
*
* Result is returned on parameter stack
*
***
* Utility routines
PPOPD	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	INX
	INX
	STX	PSP
	RTS
*
PPSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STAA	0,X
	STAB	1,X
	RTS
*
* subroutine to make sure we don't forget anything
MARK	PULA	; move the return address down
	PULB
	DES
	DES
	TSX	; to stack the mark
	PSHB
	PSHA
	LDAA	FP
	LDAB	FP+1
	STAA	0,X	; mark old FP, no allocate
	STAB	1,X
	LDX	PSP
	STX	FP	; update frame pointer
	RTS
*
UNMK	PULA		; get the return address out of the way
	PULB
	TSX		; point to the return stack
	LDX	0,X	; get the old frame pointer
	STX	FP	; and restore it
	INS		; drop the old frame pointer
	INS
	PSHB		; put the return address back
	PSHA
	RTS
*
* Compromise between speed and reusability
* Enter here to load PSP and initialize to 0
* 8 bytes
ALCL8	CLRA	
	CLRB
* Enter here with initial value in A:B
ALCLD8	LDX	PSP
* Enter here with PSP loaded and initial value in D
ALCLI8	DEX
	DEX
	STAA	0,X
	STAB	1,X
ALCLI6	DEX
	DEX
	STAA	0,X
	STAB	1,X
ALCLI4	DEX
	DEX
	STAA	0,X
	STAB	1,X
ALCLI2	DEX
	DEX
	STAA	0,X
	STAB	1,X
	STX	PSP
	RTS
*
* six bytes
ALCL6	CLRA
	CLRB
ALCLD6	LDX	PSP
	BRA	ALCLI6
*
* four bytes
ALCL4	CLRA
	CLRB
ALCLD4	LDX	PSP
	BRA	ALCLI4
*
* two bytes
ALCL2	CLRA
	CLRB
	LDX	PSP
	BRA	ALCLI2
*
*
PDROP_8	LDAB	#8	; saves two bytes, 7 vs. 3
	CLRA
* Add A:B to PSP -- negative for allocation, positive for deallocation
ADDPSP	ADDB	PSP+1
	ADCA	PSP
	STAA	PSP
	STAB	PSP+1
	LDX	PSP	; return with X ready
	RTS
*
* Just use ADDPSP with a negative 8 or whatever
*PDROP_8	LDAB	#8	; saves two bytes, 7 vs. 3
* deallocate count in B
*PDROPB	LDX	PSP	; 5 bytes to deallocate in-line
*	ABX		; vs. 3 bytes to call this.
*	STX	PSP	; ABX is useful for deallocation
*	RTS		; 5 bytes vs. 7 total
*
*
***
* Return stack when functions are called by MAIN
* Return stack on entry, after link:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>]
* [RETADR0 ]
* [FRMPTR0==<EMPTYP>]
* [RETADR1 ]
* [FRMPTR1 ] <= RSP
*
* Parameter stack when called by MAIN
* with two 32-bit local variables
* and two 16-bit parameters,
* after mark (no local allocation)
* [<unknown>] <= FRMPTR0
* [32:VAR1_1--]
* [32:VAR1_2--] <= FRMPTR1
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP,FP
*
****
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit
ADD16S	JSR	MARK	; mark, no allocate, X is PSP
*
	LDAB	#(-1)	; default negative
	TBA
	JSR	ALCLI4	; allocate 2 temporary cells and init
	LDX	FP	; (could have optimized that to 2 bytes.)
	TST	2,X	; the left-hand operand sign bit
	BMI	ADD16SR
	LDX	PSP
	CLR	2,X	; positive
	CLR	3,X
ADD16SR	LDX	FP
	TST	0,X	; the right-hand operand sign bit
	BMI	ADD16SL
	LDX	PSP
	CLR	0,X	; positive
	CLR	1,X
ADD16SL	LDX	FP
	LDAA	2,X	; left hand 
	LDAB	3,X
	ADDB	1,X	; right hand
	ADCA	0,X
	STAA	2,X	; store low half
	STAB	3,X
	LDX	PSP
	LDAA	2,X
	LDAB	3,X
	ADCB	1,X
	ADCA	0,X
	LDX	FP	; wouldn't need to do this if we tracked PSP extras
	STAA	0,X
	STAB	1,X
*
	STX	PSP	; drop the temporaries
	JSR	UNMK
	RTS
*
* The alternative without link, mark, or restore will be shown in the no-frame case.
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit 
ADD16U	JSR	MARK	; mark, no allocate, X is PSP
*
	LDX	FP
	LDAA	2,X	; left
	LDAB	3,X
	ADDB	1,X	; add right
	ADCA	0,X
	STAA	2,X	; save low
	STAB	3,X
	LDAB	#0	; extend
	ADCB	#0	; extend Carry unsigned (could ROL)
	STAB	1,X	; re-use right side to store high half
	CLR	0,X	; only bit 8 can be affected
*
	JSR	UNMK
	RTS
*
* Etc.
*
*
***
* Parameter stack when called by MAIN
* with one 16-bit parameter,
* after mark (no local allocation)
* [<unknown>  ] <= FRMPTR0
* [32:VAR1_1  ]
* [32:VAR1_2  ] <= FRMPTR1
* [16:PARAM2_1] <= PSP,FP
*
* To show how to walk the stack --
* Add 16-bit signed parameter
* to 32 bit caller's 2nd 32-bit internal variable.
* input parameter:
*   16-bit addend
* target parameter in caller
*   2nd 32-bit variable, at offset 0 from the caller's FP
* no output parameter:
ADD16SI	JSR	MARK
*
	LDAB	#(-1)	; make a temporary -1
	TBA
	JSR	ALCLI2	; (default to signed)
	LDX	FP
	TST	0,X	; test high byte
	BMI	ADD16SP
	LDX	PSP
	CLR	0,X	; zero extend
	CLR	1,X
ADD16SP	TSX
	LDX	0,X	; caller's FP
	LDAA	2,X	; caller's 2nd variable, low
	LDAB	3,X
	LDX	FP
	ADDB	1,X	; parameter
	ADCA	0,X
	TSX
	LDX	0,X
	STAA	2,X	; update low half with result
	STAB	3,X
	LDAA	0,X	; 2nd variable, high half
	LDAB	1,X
	LDX	PSP
	ADCB	1,X	; sign extension half
	ADCA	0,X
	TSX
	LDX	0,X
	STAA	0,X	; update high half
	STAB	1,X
*
	LDX	FP
	INX		; drop parameter
	INX	
	STX	PSP	; and sign temporary goes bye-bye, too
	JSR	UNMK
	RTS
*
*
***
* Return stack on entry:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>]
* [RETADR0 ] <= RSP
*
* Return stack after link:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>]
* [RETADR0 ]
* [FRMPTR0==<EMPTYP>] <= RSP
*
* Parameter stack after mark and local allocation
* [<unknown>] <= FRMPTR0
* [VAR1_1--]
* [VAR1_2--] <= PSP,FP
*
MAIN	JSR	MARK
	JSR	ALCL8	; allocate and clear 8 bytes
	STX	FP	; Point FP to base of local variables.
*
	LDAA	#$12
	LDAB	#$34
	JSR	PPSHD
	LDAA	#$CD
	LDAB	#$EF
	JSR	PPSHD
	JSR	ADD16U	; 32-bit result on parameter stack should be $0000E023
	LDAA	#$87
	LDAB	#$65
	LDX	PSP	; reuse parameter space, since order is okay
	STAA	0,X
	STAB	1,X
	JSR	ADD16S	; result on parameter stack should be $FFFF6788
	LDX	PSP
	LDAA	2,X	; result low half
	LDAB	3,X
	LDX	FP
	STAA	2,X	; to 2nd local variable low half
	STAB	3,X
	LDX	PSP
	LDAA	0,X	; result high half
	LDAB	1,X
	LDX	FP
	STAA	0,X	; to 2nd local variable high half
	STAB	1,X
	LDAA	#$A5
	TAB	
	JSR	PPSHD
	JSR	ADD16SI	; result in 2nd variable should be FFFF0D2D (Carry set)
	LDX	FP
	LDAA	2,X	; 2nd variable low half
	LDAB	3,X
	LDX	LB_BASE
	STAA	FINALX+2,X
	STAB	FINALX+3,X
	LDX	FP
	LDAA	0,X
	LDAB	1,X
	LDX	LB_BASE
	STAA	FINALX,X
	STAB	FINALX+1,X
*
	JSR	PDROP_8
	JSR	UNMK
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
*
***
* Return stack will always be in pairs:
* [RETADRNN  ]
* [CALLERFMNN]
*
* Return stack after initialization:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS <= RSP
*
* Return stack after saving previous mark:
* [SSTKNDR ]
* [<EMPTYP>]
* [SSTKNDR ]SSTKBAS
* [FRMPTRm1==<EMPTYP>] <= RSP
*
* Parameter stack after initialization, mark:
* [<unknown]PSTKBAS <= PSP,FP==<EMPTYP>
*
START	JSR	INISTKS
	JSR	MARK	
*	LDAA	PSP
*	LDAB	PSP+1
*	JSR	PPSHD	; empty previous mark
*	STX	FP	; empty new mark
*
	JSR	MAIN
*
	JSR	UNMK
*
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
SSTKNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP
	NOP		; another landing pad to set breakpoint at
	NOP
	LDX	$FFFE
	JMP	0,X	; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

As always, I have tested this code. It handles the stack frames properly and gets the right result in the right place. And, as always, I do not guarantee that this code can be generalized or automatically produced, by hand or by compiler.
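To double-check the values the MAIN comments predict ($0000E023, then $FFFF6788, then $FFFF0D2D with Carry set), here is a small Python model of the three additions. It is only a sketch of the arithmetic under the assumption that the routines do what their comments say; the function names are hypothetical, and the stacks and frames are left out entirely.

```python
# Hypothetical Python model of the arithmetic in ADD16U, ADD16S and ADD16SI.
# Only the values are modelled; stacks, frames and registers are ignored.

MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def sign_cell(value):
    """The 16-bit extension cell the signed routines build: $FFFF or $0000."""
    return 0xFFFF if value & 0x8000 else 0x0000


def add16u(left, right):
    """Unsigned 16 + 16 -> 32: the carry out of bit 15 lands in bit 16."""
    return (left + right) & MASK32


def add16s(left, right):
    """Signed 16 + 16 -> 32: add the low cells, then add the two
    extension cells plus the carry, as the ADDB/ADCA chain does."""
    low = left + right
    carry = low >> 16
    high = (sign_cell(left) + sign_cell(right) + carry) & MASK16
    return (high << 16) | (low & MASK16)


def add16si(target, addend):
    """Add a sign-extended 16-bit addend into a 32-bit variable.
    Returns the new value and the final Carry flag."""
    total = target + ((sign_cell(addend) << 16) | addend)
    return total & MASK32, total >> 32


# The sequence MAIN runs: the low half of the first result is reused as
# the left operand of the second call, and $A5A5 is added in last.
first = add16u(0x1234, 0xCDEF)
second = add16s(first & MASK16, 0x8765)
third, carry = add16si(second, 0xA5A5)
print(f"{first:08X} {second:08X} {third:08X} carry={carry}")
# prints 0000E023 FFFF6788 FFFF0D2D carry=1
```

Modelling the signed add as two separately extended cells, rather than as one Python signed addition, keeps it closer to how the 8-bit code has to build the high half.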

As I keep saying, we have seen what this kind of code looks like without stack frames. I plan to come back and do these examples without stack frames, so we can get a clearer picture of how the disciplines affect the code. It may take several weeks, as I have to get back to some sort of day job to pay the rent.

I now (25 Nov.) have the functionality of this code done in single-stack discipline, without stack frames. Have a look at it.

At some point in the tutorial, where we've gone far enough to need to reference these concepts again, I will point you back here. If you haven't really followed what the stack frames business is all about, you will want to come back and re-read this then. You may want to, anyway.

For the time being, here's some simpler stuff -- getting numeric output in binary.



Tuesday, November 19, 2024

ALPP 02-27 -- Ascending the Wrong Island -- Single-stack Stack Frame Example: 6800

And this one has been sitting at the bottom of the pool for a while, as predicted, even longer than the 6809 example.

  Ascending the Wrong Island --
Single-stack Stack Frame Example:
6800


Now that we have seen how we can implement concrete examples of both single-stack and split-stack stack frames on the 6801, let's see if we can get a better feel for what the 6801's extensions buy us, by repeating those implementations using only the 6800's original instruction set.

The usual caveat -- I do not recommend stack frames, and I especially do not recommend combining parameters and return addresses on a single stack. Part of the reason we're doing this is to study addressing techniques, but the other part is to convince ourselves that we don't want to do this.

I started by working out an implementation of PUSHX and POPX routines, since the PSHX and PULX instructions featured so prominently in the 6801 code. Late at night, when I had time to work on this, I typed without thinking, probably in 6809 mode or something,

PUSHX	STX	XWORK
	LDAA	XWORK
	LDAB	XWORK+1
	PSHB
	PSHA
	RTS
*
POPX	PULA
	PULB
	STAA	XWORK
	STAB	XWORK+1
	LDX	XWORK
	RTS

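As we know, the results of this code would be humorous: PUSHX leaves X sitting on top of its own return address, so the RTS "returns" to whatever address was in X, and POPX pulls its own return address into X instead of the value. I laughed at myself and went to bed.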

(If these were macros, or if we were doing it in-line, this would actually be exactly what we'd do -- leaving off the RTS, of course. And, of course, if we needed to do a software stack on the 6809, the push and pop routines would be even more straightforward.)

But we have to dance around the return address. So it ends up something like this:

SPSHX	STX	XWORK
	DES
	DES
	TSX
	LDAA	2,X
	LDAB	3,X
	STAA	0,X
	STAB	1,X
	LDAA	XWORK
	LDAB	XWORK+1
	STAA	2,X
	STAB	3,X
	RTS
*
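SPULX	TSX
	LDX	2,X	; the value under the return address
	STX	XWORK
	TSX		; point back at the stack
	LDAA	0,X	; return address
	LDAB	1,X
	STAA	2,X	; move it up over the pulled value
	STAB	3,X
	LDX	XWORK
	INS		; drop the old return address slot
	INS
	RTS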

which is disgustingly long. But necessary. Because of the return address dance.
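The return address dance is easier to see in a model. The Python sketch below is a hypothetical simulation, not 6800 code: a byte array stands in for memory, and S follows the 6800 convention of pointing at the next free byte (a push stores, then decrements). It shows why the naive PUSHX ends up "returning" to the value it meant to push, and how a push and pull that move the return address out of the way keep the caller's return intact. The return addresses are made up.

```python
class Stack6800:
    """Hypothetical model of a 6800-style stack: S points at the next free
    byte; a push stores then decrements, a pull increments then loads."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.s = size - 1

    def push16(self, word):
        """PSHB then PSHA (and JSR with the PC): high byte ends up on top."""
        self.mem[self.s] = word & 0xFF
        self.mem[self.s - 1] = (word >> 8) & 0xFF
        self.s -= 2

    def pull16(self):
        """PULA then PULB (and RTS)."""
        self.s += 2
        return (self.mem[self.s - 1] << 8) | self.mem[self.s]

    def peek16(self, offset):
        """TSX, then LDAA offset,X / LDAB offset+1,X."""
        x = self.s + 1
        return (self.mem[x + offset] << 8) | self.mem[x + offset + 1]

    def poke16(self, offset, word):
        """TSX, then STAA offset,X / STAB offset+1,X."""
        x = self.s + 1
        self.mem[x + offset] = (word >> 8) & 0xFF
        self.mem[x + offset + 1] = word & 0xFF


def naive_pushx(stack, x):
    """PUSHX as first typed: the push lands on top of the return address."""
    stack.push16(x)
    return stack.pull16()              # RTS "returns" to the value of X


def spshx(stack, x):
    """SPSHX: open a hole, move the return address down, tuck X above it."""
    stack.s -= 2                       # DES, DES
    stack.poke16(0, stack.peek16(2))   # return address moves down
    stack.poke16(2, x)                 # X goes where the return address was
    return stack.pull16()              # RTS to the real caller


def spulx(stack):
    """SPULX: fetch the cell under the return address, move the return
    address up over it, then drop the old return address slot."""
    x = stack.peek16(2)                # the value being pulled
    stack.poke16(2, stack.peek16(0))   # return address moves up
    stack.s += 2                       # INS, INS
    return x, stack.pull16()           # pulled value, RTS target


bad = Stack6800()
bad.push16(0x3010)                     # JSR with a made-up return address
print(hex(naive_pushx(bad, 0xBEEF)))   # prints 0xbeef: RTS went astray

good = Stack6800()
good.push16(0x3010)                    # JSR SPSHX
print(hex(spshx(good, 0xBEEF)), hex(good.peek16(0)))   # prints 0x3010 0xbeef
```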

With that written, I was at least confident (the next time I could work on it) that the same stack frames we used on the 6801 would be workable. (If you don't have the 6801 code open in another browser window for reference, go ahead and open it up; you'll want it handy to compare.) And if the stack frame would be the same, I could just convert the link and unlink from the 6801 code:

LINKF	DES
	DES
	DES
	DES
	TSX
	LDD	4,X
	STD	0,X
	LDD	VBP
	STD	4,X
	LDD	FP
	STD	2,X
	INX
	INX
	STX	FP
	STX	VBP	
	RTS
*
* No return value on stack
UNLKF	TSX
	LDD	2,X	; get old FP, dodge return address
	STD	FP	
	LDD	4,X	; old VBP
	STD	VBP
	LDD	0,X	; return address
	STD	4,X	; copy it so we can return
	INS		; drop 4 bytes
	INS
	INS
	INS
	RTS

It was a little bit trickier, since we don't have PSHX and PULX on the 6800, but it wasn't too bad.
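Because the frame layout carries over unchanged, the offsets the 6800 routines use after TSX (saved FP at 0,X, saved VBP at 2,X, return address at 4,X, parameters from 6,X up) should follow directly from LINKF. Here is a hypothetical Python simulation of the 6800 LINKF and UNLKF from the listing below, on a small byte-array stack with 6800 push/pull conventions. The return addresses and the caller's FP and VBP values are made up.

```python
class Machine:
    """Hypothetical model: a 64-byte 6800-style stack (S points at the next
    free byte) plus the FP and VBP pseudo-registers from the direct page."""

    def __init__(self, fp, vbp):
        self.mem = bytearray(64)
        self.s = 63
        self.fp, self.vbp = fp, vbp

    def push16(self, word):            # PSHB, PSHA (JSR pushes the PC the same way)
        self.mem[self.s] = word & 0xFF
        self.mem[self.s - 1] = (word >> 8) & 0xFF
        self.s -= 2

    def pull16(self):                  # PULA, PULB (RTS pulls the PC the same way)
        self.s += 2
        return (self.mem[self.s - 1] << 8) | self.mem[self.s]

    def rd(self, x, off):              # LDAA off,X / LDAB off+1,X
        return (self.mem[x + off] << 8) | self.mem[x + off + 1]

    def wr(self, x, off, word):        # STAA off,X / STAB off+1,X
        self.mem[x + off] = (word >> 8) & 0xFF
        self.mem[x + off + 1] = word & 0xFF


def linkf(m, ret):
    """JSR LINKF, then LINKF's body; returns where its RTS goes."""
    m.push16(ret)                      # JSR LINKF
    m.s -= 4                           # DES x4
    x = m.s + 1                        # TSX
    m.wr(x, 0, m.rd(x, 4))             # move LINKF's return address down
    m.wr(x, 4, m.vbp)                  # caller's VBP above the routine's return address
    m.wr(x, 2, m.fp)                   # caller's FP: the frame link
    m.fp = m.vbp = x + 2               # INX, INX, STX FP, STX VBP
    return m.pull16()                  # RTS


def unlkf(m, ret):
    """JSR UNLKF, then UNLKF's body; returns where its RTS goes."""
    m.push16(ret)                      # JSR UNLKF
    x = m.fp                           # LDX FP
    m.vbp = m.rd(x, 2)                 # restore the caller's VBP
    m.wr(x, 2, m.pull16())             # PULA/PULB, park return address in VBP's slot
    m.s = x - 1                        # TXS: drops temporaries and locals
    m.fp = m.rd(x, 0)                  # LDX 0,X / STX FP
    m.s += 2                           # INS, INS
    return m.pull16()                  # RTS


m = Machine(fp=0x1111, vbp=0x2222)     # made-up caller frame pointers
m.push16(0x1234)                       # left parameter
m.push16(0xCDEF)                       # right parameter
m.push16(0x3100)                       # JSR ADD16S (made-up return address)
linkf(m, 0x3050)
x = m.s + 1                            # TSX, as ADD16S does
print([hex(m.rd(x, off)) for off in (0, 2, 4, 6, 8)])
# prints ['0x1111', '0x2222', '0x3100', '0xcdef', '0x1234']
```

Pushing a couple of temporaries after LINKF and then calling UNLKF should restore FP and VBP and leave the routine's own return address on top, which is what the test below checks.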

And then I proceeded to work on converting the addition routines. (And in the process realized I had misnamed SUB16whatever, but I've taken care of that now.)

And I discovered that moving the return value back into the X and A:B registers at the end of the routines was waste motion. 

You have to use the accumulators and the index register just to get in and out of your subroutines' stack frames, so keeping the return value in them means you're just thrashing the stack at procedure entrance and exit.

So I added two variables in the direct page for the return values, RETVHI and RETVLO.

And I paused for a moment to consider whether they would also have been useful in the 6801 code. I think it would be a wash, really, because using these direct page variables for the return values means that the caller has to load them, and the 6801 does have PSHX and PULX.

Lots of stuff like that came up during the conversion.

Another issue that came up was that the EXORsim interactive assembler currently has a bug that makes the FDB (form double byte constant) directive unusable. (I should try to fix it, or at least contact Joe about it, but I want to get this done first.)

I brought in the PSH16I routine to load 16-bit literals out of the instruction stream:

PSH16I	TSX		; point to top of return address stack
	LDX	0,X	; point into the instruction stream
	LDAA	0,X	; high byte from instruction stream
	LDAB	1,X	; low byte from instruction stream	
	INS		; drop the return address we almost have in X
	INS
	PSHB		; replace it with the constant
	PSHA
	JMP	2,X	; return to the byte after the constant.

But you have to follow that call with the two bytes you want to push onto the stack, and for addresses and 16-bit offsets that means an FDB --

And EXORsim's interactive assembler doesn't help us split addresses up, so there were several places where I loaded the 16-bit address or offset into the index register and called SPSHX, or did similar things.
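Here is the instruction-stream trick PSH16I plays, as a small Python model. The stack is just a list of 16-bit cells, the addresses are made up, and split16 is a hypothetical helper showing the high/low split that the pair of FCBs does by hand in place of an FDB.

```python
def split16(word):
    """What the FCB workaround does by hand: high byte, then low byte."""
    return (word >> 8) & 0xFF, word & 0xFF


def psh16i(memory, stack):
    """Hypothetical model of PSH16I. `stack` is a list of 16-bit cells whose
    last element is the return address JSR pushed, i.e. the address of the
    inline constant. Returns (constant, address to resume at)."""
    ret = stack.pop()                               # TSX / LDX 0,X, then INS, INS
    value = (memory[ret] << 8) | memory[ret + 1]    # LDAA 0,X / LDAB 1,X
    stack.append(value)                             # PSHB, PSHA
    return value, ret + 2                           # JMP 2,X skips the constant


# JSR PSH16I at $3100 (a made-up address) is three bytes long,
# so the inline constant starts at $3103.
memory = bytearray(0x10000)
memory[0x3103:0x3105] = bytes(split16(0x1234))     # FCB $12 / FCB $34
stack = [0x3103]                                    # pushed by the JSR
value, resume = psh16i(memory, stack)
print(hex(value), hex(resume), [hex(cell) for cell in stack])
# prints 0x1234 0x3105 ['0x1234']
```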

And the stack (and runtime) initialization (INISTK is a misnomer, isn't it?) needs these routines, so I moved things around in there so that I'd have the stack ready early. Some of the math had to be done by hand in the process of setting the stack up, and I ended up doing some other things by hand anyway.

But you'll see a foreshadowing of why this is all so meaningless in UADD16, a routine that exists just for the initialization code.

* Utility 16-bit add, leave result in A:B
UADD16	TSX		; no frame
	LDAB	5,X	; left
	ADDB	3,X	; right		; because we can
	LDAA	4,X	; left
	ADCA	2,X	; right
	LDX	0,X
UADROP	INS		; drop return address and parameters
	INS
	INS
	INS
	INS
	INS
	JMP	0,X	; return via X

I didn't end up using USUB16, but I left it in for your enjoyment.
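UADD16 can slip LDAA in between ADDB and ADCA because loads on the 6800 leave the Carry flag alone. Here is a byte-wise Python model of UADD16 and USUB16 (hypothetical function names), checking that the two-step carry and borrow chains agree with plain 16-bit arithmetic.

```python
def uadd16(left, right):
    """Byte-wise model of UADD16: ADDB on the low bytes, then ADCA on the
    high bytes. The LDAA in between leaves the Carry flag alone."""
    low = (left & 0xFF) + (right & 0xFF)            # ADDB 3,X
    carry = low >> 8
    high = (left >> 8) + (right >> 8) + carry       # ADCA 2,X
    return ((high & 0xFF) << 8) | (low & 0xFF)      # result in A:B


def usub16(left, right):
    """Byte-wise model of USUB16: SUBB, then SBCA with the borrow."""
    low = (left & 0xFF) - (right & 0xFF)            # SUBB 3,X
    borrow = 1 if low < 0 else 0
    high = (left >> 8) - (right >> 8) - borrow      # SBCA 2,X
    return ((high & 0xFF) << 8) | (low & 0xFF)


# The heap base INISTK gets from UADD16, and the CDBASE - 4 heap limit it
# computes inline with SUBB/SBCA, assuming LB_BASE holds $2000.
print(hex(uadd16(0x2000, 0x00D6)), hex(usub16(0x3000, 0x0004)))
# prints 0x20d6 0x2ffc
```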

Anyway, it was by no means as straightforward as I had hoped (of course). I ended up trying a number of things that didn't help, like defining PUSHD and POPD routines, and a SBX routine.

Admittedly, some of the complexities could be avoided by simply restricting the stack from crossing a 256-byte boundary, and leaving LOUD notes in the comments about ALWAYS choosing the size and location so that it doesn't. You'll note that this is actually the case here: the size and location would allow us to optimize out the carries in the stack pointer arithmetic.

But, although shouting with capitals can be done in plain text, colored text cannot, so I don't think it's a wise example ...

No, that's not it. It just doesn't really solve enough of the problems to make stack frames a reasonable option. Try it yourself if you are not convinced.
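For the record, here is a quick check that this layout really does keep the stack in one 256-byte page. The Python sketch just redoes the EQU arithmetic from the 6800 listing below, assuming LB_BASE holds $2000 as INISTK sets it (STKSZ stands in for the literal 192 the assembler needed), and confirms the high byte stays the same from the deepest stack byte up through the fake self-linked frame.

```python
# Offsets copied from the listing's EQUs; INISTK adds them to LB_BASE ($2000).
LB_ADDR = 0x2000
FINALX = 4
STKLIMX = FINALX + 4
STKSZ = 192
STKBASX = STKLIMX + STKSZ        # the listing writes STKLIMX+192 by hand
STKFAKX = STKBASX + 8
STKBMPX = STKFAKX + 2
HBASEX = STKBMPX + 4

stack_low = LB_ADDR + STKLIMX            # deepest byte the stack may reach
stack_high = LB_ADDR + STKFAKX + 1       # last byte of the self-linked fake frame
same_page = (stack_low >> 8) == (stack_high >> 8)

print(f"stack ${stack_low:04X}-${stack_high:04X}, heap ${LB_ADDR + HBASEX:04X}, one page: {same_page}")
# prints stack $2008-$20D1, heap $20D6, one page: True
```

With everything inside page $20, an offset added to a stack address can never carry into the high byte, which is why the ADCA #0 steps could be dropped for this particular layout (and only for it).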

As always, read the code and the comments. Don't assume I remembered to fix the comments after every edit. If the comments don't seem to match the code, they may not, or the code may be doing things you don't understand. Take time to work through it and be sure.

* 16-bit addition as example of single-stack stack frame discipline on 6800,
* with test code
* Joel Matthew Rees, October, November 2024
*
	OPT	6800
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for user stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily in leaf functions only
DWORK	RMB	2	; For saving D temporarily in leaf functions only
RETVHI	RMB	2	; high half of 32-bit return values (because we can't push X easily)
RETVLO	RMB	2	; 16-bit return values and low half (because loading and saving is redundant)
FP	RMB	2	; frame pointer
VBP	RMB	2	; variable base pointer
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
STKLIM	RMB	192	; roughly 16 to 20 levels of call
STKLIMX	EQU	FINALX+4
STKBAS	RMB	8	; for canary return
STKSZ	EQU	192	; for EXORsim assembler limits
STKBASX	EQU	STKLIMX+192	; must be STKLIMX+STKSZ -- assembler won't take symbol
STKFAK	RMB	2	; fake frame pointer, self-link
STKFAKX	EQU	STKBASX+8	; 6800 is post-dec (post-store-decrement) push
STKBMP	RMB	4	; a little bumper space
STKBMPX	EQU	STKFAKX+2	; But we are going to init S through X
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	STKBMPX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
*STKBASM	FDB	STKBASX	; Doesn't work within EXORsim assembler limits after all
*HBASEXM	FDB	HBASEX	; by avoiding splitting large constants up at assemble time
*
INISTK	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE		; local space functional
	LDAA	LB_BASE		; bootstrap own stack
	LDAB	LB_BASE+1
*	ADDB	STKBASM+1
*	ADCA	STKBASM
	LDX	#STKBASX	; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1
	ADCA	XWORK
*
	STAB	XWORK+1		; initial stack pointer
	STAA	XWORK
*
	LDX	#STKUNDR	; for fake return address
	STX	DWORK		; save it for a moment
*	
	PULA		; pop real return address
	PULB
	LDX	XWORK	; ready own stack pointer
	STS	SSAVE	; save stack pointer from monitor ROM
	TXS		; move to our own stack (let TXS convert it)
	PSHB		; put return address on own stack
	PSHA		; stack now ready for interrupts, utility routines
*
	LDAA	DWORK	; error handler for fake return
	LDAB	DWORK+1
	STAA	0,X	; in the cell beyond empty stack pointer
	STAB	1,X
	STAA	6,X	; full fake frame
	STAB	7,X
	LDAA	XWORK	; calculate final self-link
	LDAB	XWORK+1
	ADDB	#8
	ADCA	#0
	STAA	4,X	; fake VBP
	STAB	5,X
	STAA	8,X	; final self-link
	STAB	9,X
	INX		; prepare first fake stack frame links
	INX
	STX	FP	; get frame pointers ready
	STX	VBP
	STX	0,X	; first self-link for list terminator
*
	LDAA	LB_BASE	
	LDAB	LB_BASE+1
	PSHB
	PSHA
*	JSR	PSH16I 
*	FDB	HBASEX	; EXORsim's interactive assembler doesn't like FDBs.
	LDX	#HBASEX
	JSR	SPSHX
*
	JSR	UADD16
	STAA	HPPTR		; as if we were ready to use heap
	STAB	HPPTR+1
	STAA	HPALL
	STAB	HPALL+1
*	JSR	PSH16I	; FDBs
*	FDB	CDBASE
*	JSR	PSH16I
*	FDB	(-4)		; extra bumper
*	JSR	UADD16
	LDX	#CDBASE
	STX	XWORK
	LDAA	XWORK
	LDAB	XWORK+1
	SUBB	#4
	SBCA	#0
*
	STAA	HPLIM
	STAB	HPLIM+1
	RTS		; finally done, now can return
*
***
* Since negative index offsets are so expensive,
* we want to create a stack frame with only positive offsets.
* And we want the frame pointer to be pushed after the call,
* on entry to the local context.
* And the saved frame pointer needs to link to the previous one.
* And when we restore the previous frame, 
* we need to be able to restore the previous frame base.
*
* Cross-section of general frame structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP}  ] for calling routine
* [PARAM   ] from calling routine
* [RETADR  ] to calling routine
* [VARBP   ] base of local variables in calling routine
* [FRMLNK  ] at entry to calling routine
* [LOCVAR  ] for called -- current -- routine
* [TEMP    ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call
*
* Broader cross-section, showing chaining for routine 3, in-flight:
* [RETADR1 ] 
* [VARBP1  ]
* [FRMLNK2 ] <= FRMLNK3
* [LOCVAR2 ] <= VARBP2
* [TEMP2   ]
* [PARAM3  ]
* [RETADR2 ]
* [VARBP2  ]
* [FRMLNK3 ] <= FP (frame pointer)
* [LOCVAR3 ] <= VBP (variable base pointer)
* [TEMP3   ]
* [(PARAM4)] <= SP (return stack pointer (6800 S is byte below))
***
*
***
* Utility routines
*
* Push low half of return value
PSHLH	TSX
	LDAA	0,X		; return address
	LDAB	1,X
	PSHB
	PSHA
	LDAA	RETVLO
	LDAB	RETVLO+1
	STAA	0,X
	STAB	1,X
	RTS
*
* Avoid the math to split 16-bit constants into two 8-bit loads,
* and push them while we are here.
* The constant follows the call in the instruction stream.
* Leaves constant in A:B, as well.
PSH16I	TSX		; point to top of return address stack
	LDX	0,X	; point into the instruction stream
	LDAA	0,X	; high byte from instruction stream
	LDAB	1,X	; low byte from instruction stream	
	INS		; drop the return address we almost have in X
	INS
	PSHB		; replace it with the constant
	PSHA
	JMP	2,X	; return to the byte after the constant.
*
* 8 bytes for the meat of this vs. 3 for the call.
* We end up using it a lot since EXORsim's interactive assembler doesn't do FDBs.
SPSHX	STX	XWORK
	DES
	DES
	TSX
	LDAA	2,X
	LDAB	3,X
	STAA	0,X
	STAB	1,X
	LDAA	XWORK
	LDAB	XWORK+1
	STAA	2,X
	STAB	3,X
	RTS
*
* 6 bytes for the meat of this vs. 3 for the call, instead of FDB
TXD	STX	XWORK
	LDAA	XWORK
	LDAB	XWORK+1
	RTS
*
* Utility 16-bit add, leave result in A:B
UADD16	TSX		; no frame
	LDAB	5,X	; left
	ADDB	3,X	; right		; because we can
	LDAA	4,X	; left
	ADCA	2,X	; right
	LDX	0,X
UADROP	INS		; drop return address and parameters
	INS
	INS
	INS
	INS
	INS
	JMP	0,X	; return via X
*
* Utility 16-bit sub, leave result in A:B
USUB16	TSX		; no frame
	LDAB	5,X	; left
	SUBB	3,X	; right		; because we can
	LDAA	4,X	; left
	SBCA	2,X	; right
	LDX	0,X
	BRA	UADROP	; drop return address and parameters
*
* Let the caller do allocation after.
LINKF	DES		; allocate room to push to
	DES
	DES
	DES
	TSX
	LDAA	4,X	; return address
	LDAB	5,X	; not sure of any reason to use or not use B
	STAA	0,X	; move it down to new top of stack
	STAB	1,X
	LDAA	VBP	; copy VBP and FP above return address
	LDAB	VBP+1
	STAA	4,X
	STAB	5,X
	LDAA	FP
	LDAB	FP+1
	STAA	2,X
	STAB	3,X
	INX
	INX
	STX	FP
	STX	VBP
	RTS
*
* No return value on stack
UNLKF	LDX	FP
	LDAA	2,X	; old VBP
	LDAB	3,X
	STAA	VBP
	STAB	VBP+1
	PULA		; get the return address	
	PULB
	STAA	2,X	; put return address in place
	STAB	3,X
	TXS		; drop temporaries and locals
	LDX	0,X	; get old FP
	STX	FP
	INS
	INS
	RTS
*
* We really don't want to put S in a temp if we can avoid it
ALOCS8	PULA
	PULB
ALOS8I	DES
	DES
ALOS6I	DES
	DES
ALOS4I	DES
	DES
ALOS2I	DES
	DES
	PSHB
	PSHA
	RTS
*
ALOCS6	PULA
	PULB
	BRA	ALOS6I
*
ALOCS4	PULA
	PULB
	BRA	ALOS4I
*
ALOCS2	PULA
	PULB
	BRA	ALOS2I
*
INI0_8	CLRA
	CLRB
* call with initialization value in A:B
INIS8	TSX
INIT8	STAA	8,X
	STAB	9,X
INIT6	STAA	6,X
	STAB	7,X
INIT4	STAA	4,X
	STAB	5,X
INIT2	STAA	2,X
	STAB	3,X
	RTS		; 0,X is return address!
*
INI0_6	CLRA
	CLRB
* call with initialization value in A:B
INIS6	TSX
	BRA	INIT6
*
INI0_4	CLRA
	CLRB
* call with initialization value in A:B
INIS4	TSX
	BRA	INIT4
*
INI0_2	CLRA
	CLRB
* call with initialization value in A:B
INIS2	TSX
	BRA	INIT2
*
DROP8	PULA
	PULB
	INS
	INS
DROP6I	INS
	INS
	INS
	INS
	INS
	INS
	PSHB
	PSHA
	RTS
*
DROP6	PULA
	PULB
	BRA	DROP6I
*
*
* Stack after LINK and allocation
* when functions are called by MAIN
* with two parameters
* We will return results in RETVHI:RETVLO in direct page
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ]
* [<SELF>  ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX  ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ] 
* [VARBP0  ]
* [FRMLNKX ] <= FRMLNK0
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ] 
* [VARBP1  ]
* [FRMLNK0 ] <= FP,SP,VBP
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left 1st pushed, right 2nd
* output parameter:
*   17-bit sum in 32-bit RETVHI:RETVLO
* Does not alter the parameters.
ADD16S	JSR	LINKF
	TSX		; no local allocations
*
	LDAA	#(-1)	; prepare for sign extension
	TST	8,X	; the left-hand operand sign bit
	BMI	ADD16SR
	CLRA		; zero extend
ADD16SR	PSHA		; push left extension
	PSHA		; left sign cell below X now
	LDAA	#(-1)	; reload
	TST	6,X	; the right-hand operand sign bit
	BMI	ADD16SL
	CLRA		; zero extend
ADD16SL	PSHA		; push right extension
	PSHA
	TSX		; point to sign extensions
	LDAA	12,X	; left-hand low cell
	LDAB	13,X
	ADDB	11,X	; right-hand low cell
	ADCA	10,X
	STAA	RETVLO	; save low half of result
	STAB	RETVLO+1
	LDAA	2,X	; left-hand extension
	LDAB	3,X
	ADCB	1,X	; right-hand extension
	ADCA	0,X
	STAA	RETVHI	; Save high half of result
	STAB	RETVHI+1
*
	JSR	UNLKF	; drops temporaries
	RTS		; result is in RETVHI:RETVLO
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit RETVHI:RETVLO
ADD16U	JSR	LINKF
	TSX		; no local allocations
*
	LDAA	8,X	; left
	LDAB	9,X
	ADDB	7,X	; right
	ADCA	6,X
	STAA	RETVLO	; save low half
	STAB	RETVLO+1
	LDAB	#0
	ADCB	#0
	STAB	RETVHI+1	; save carry bit in high half
	CLR	RETVHI		; will never carry beyond bit 17
*
	JSR	UNLKF	; drops temporaries
	RTS		; result is in RETVHI:RETVLO
*
* Etc.
*
***
*
* Stack after LINK #0 when functions are called by MAIN
* with one input parameter
* (#0 means no local variables)
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ]
* [<SELF>  ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX  ]
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ] 
* [VARBP0  ]
* [FRMLNKX ] <= FRMLNK0
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [RETADR1 ] 
* [VARBP1  ]
* [FRMLNK0 ] <= FP,SP,VBP
*
* To show how to walk the stack --
* Add 16-bit signed parameter
* to 32 bit caller's 2nd 32-bit internal variable.
* input parameter:
*   16-bit addend
* target parameter in caller
*   2nd 32-bit variable, at offset 0 from the caller's VBP
* no output parameter:
ADD16SI	JSR	LINKF
	TSX		; no local variables 
*
	LDAA	#(-1)
	TST	6,X	; high byte of parameter
	BMI	ADD16SIP
	CLRA
ADD16SIP	PSHA	; save the sign extension half
	PSHA
	LDX	2,X	; get caller's VBP
	LDAA	2,X	; caller's 2nd variable, low
	LDAB	3,X
	LDX	FP
	ADDB	7,X	; parameter
	ADCA	6,X
	LDX	2,X	; caller's VBP
	STAA	2,X	; save result low half away
	STAB	3,X
	LDAA	0,X	; caller's 2nd variable, high
	LDAB	1,X
	TSX
	ADCB	1,X	; sign extension half
	ADCA	0,X
	LDX	FP
	LDX	2,X
	STAA	0,X	; save result high half away
	STAB	1,X
*
	JSR	UNLKF	; drops temporaries 
	RTS		; no result to load
*
*
***
* Stack after LINK
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ] 
* [<SELF>  ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX  ] 
* [FRMLNKY=STKBAS+NATWID ] <= FRMLNKX,VARBP0
* [RETADR0 ] 
* [VARBP0  ]
* [FRMLNKX ] <= FP
* [32:VAR1_1]
* [32:VAR1_2] <= SP,VBP
*
MAIN	JSR	LINKF
	JSR	ALOCS8	; 2 calls, 6 bytes vs. 1 clr + 8 pushes, 9 bytes
	JSR	INI0_8
	TSX
	STX	VBP	; link and allocate complete
*
	JSR	PSH16I
*	FDB	$1234	; parameters
	FCB	$12
	FCB	$34
	JSR	PSH16I
*	FDB	$CDEF
	FCB	$CD
	FCB	$EF
	JSR	ADD16U	; result in RETVHI:RETVLO should be $0000E023
	INS		; drop one parameter, reuse other
	INS
	TSX
	LDAA	RETVLO	; four extra bytes compared to calling PSHLH
	LDAB	RETVLO+1
	STAA	0,X
	STAB	1,X	
	JSR	PSH16I
*	FDB	$8765
	FCB	$87
	FCB	$65
	JSR	ADD16S	; result in RETVHI:RETVLO should be $FFFF6788
	INS		; drop one parameter, reuse other
	INS
	LDX	VBP
	LDAA	RETVHI
	LDAB	RETVHI+1
	STAA	0,X
	STAB	1,X
	LDAA	RETVLO
	LDAB	RETVLO+1
	STAA	2,X
	STAB	3,X
	TSX
	LDAB	#$A5
	STAB	0,X	; $A5
	STAB	1,X	; $A5A5
	JSR	ADD16SI		; result in 2nd variable should be FFFF0D2D
	LDX	VBP		; get the result from our variable
	LDAA	2,X		; low half
	LDAB	3,X
	LDX	LB_BASE		; store it in FINAL, in process local space
	STAA	FINALX+2,X
	STAB	FINALX+3,X
	LDX	VBP
	LDAA	0,X		; high half
	LDAB	1,X
	LDX	LB_BASE
	STAA	FINALX,X
	STAB	FINALX+1,X
*
	JSR	UNLKF
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
* (who knows?) <= VBP
***
*
* Stack after initialization:
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ] 
* [<SELF>  ] <= <SELF>,FP,VBP
* [STKUNDR ]STKBAS <= SP
***
* Stack after LINK (at call to MAIN)
* [<SELF>  ] <= <SELF>,VARBPY
* [STKUNDR ]
* [VARBPY  ] 
* [<SELF>  ] <= <SELF>,VARBPX,FRMLNKY
* [STKUNDR ]STKBAS
* [VARBPX  ] 
* [FRMLNKY=STKBAS+NATWID ] <= SP,FP,VBP
*
START	NOP
	JSR	INISTK
	NOP
*
	JSR	LINKF
*
	JSR	MAIN
*
	JSR	UNLKF
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
STKUNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad to set breakpoint at
	NOP
	NOP
	LDX	$FFFE	; alternatively, jmp through reset vector
	JMP	0,X
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

I had to use my own assembler to clean up some mistakes, but the code assembles and runs correctly in EXORsim. As always, I will make no guarantees that this code is appropriate to be generalized for compilers and such.
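ADD16SI finds its caller's variables by following the saved VBP, and the self-links that INISTK plants are what terminate the chain of saved frame pointers. Here is a hypothetical word-level Python model of those frames: 2-byte cells keyed by address, with FP and VBP as plain variables. The STKUNDR address and the return addresses are made up; the stack addresses follow from the listing's EQUs with LB_BASE at $2000.

```python
STKUNDR = 0x3400                    # hypothetical: the real label address depends on assembly


def push(cells, top, word):
    """A 2-byte push, or the return address a JSR leaves; returns the new top."""
    cells[top - 2] = word
    return top - 2


def inistk(lb_base=0x2000, stkbasx=0xC8):
    """Mirror the stores INISTK makes. Returns (cells, top, fp, vbp)."""
    x = lb_base + stkbasx           # initial stack top, STKBAS
    cells = {
        x: STKUNDR,                 # fake return address at 0,X
        x + 4: x + 8,               # fake VBP
        x + 6: STKUNDR,             # full fake frame
        x + 8: x + 8,               # final self-link
    }
    fp = x + 2                      # INX, INX
    cells[fp] = fp                  # STX 0,X: first self-link, the list terminator
    return cells, x, fp, fp


def link(cells, top, fp, vbp):
    """Net effect of LINKF: caller's VBP, then caller's FP, below the top.
    FP and VBP both end up pointing at the saved FP (the frame link)."""
    top = push(cells, top, vbp)
    top = push(cells, top, fp)
    return top, top, top


def frame_chain(cells, fp):
    """Follow the saved FP links until a frame links to itself."""
    chain = [fp]
    while cells[fp] != fp:
        fp = cells[fp]
        chain.append(fp)
    return chain


cells, top, fp, vbp = inistk()
top, fp, vbp = link(cells, top, fp, vbp)      # START: JSR LINKF
top = push(cells, top, 0x3300)                # JSR MAIN (made-up return address)
top, fp, vbp = link(cells, top, fp, vbp)      # MAIN: JSR LINKF
top -= 8                                      # ALOCS8: two 32-bit locals
vbp = top                                     # TSX, STX VBP
top = push(cells, top, 0xA5A5)                # ADD16SI's parameter
top = push(cells, top, 0x3200)                # JSR ADD16SI (made-up return address)
top, fp, vbp = link(cells, top, fp, vbp)      # ADD16SI: JSR LINKF

print([f"${a:04X}" for a in frame_chain(cells, fp)],
      f"${cells[fp + 2]:04X}", f"${cells[fp + 6]:04X}")
# prints ['$20AE', '$20BE', '$20C4', '$20CA'] $20B6 $A5A5
```

In this model, cells[fp + 2] is MAIN's VBP (its second variable) and cells[fp + 6] is the parameter, which correspond to the 2,X and 6,X offsets ADD16SI uses after TSX.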

We've seen what this kind of code looks like without stack frames, but now that I have the split-stack version of this code up, I'm planning to do the functionality without frames so you can really see it and compare. 

Again, if you're getting worn out, go ahead and move on to getting numeric output in binary.

