Friday, November 29, 2024

ALPP 02-32 -- More Ascending the Right Island -- Split-stack No Frame Example: 6801

Still digging into that treasure from the bottom of the pool.

Ascending the Right Island -- Split-stack No Frame Example: 6801

(Title Page/Index)

 

About the only thing I want to point out here is that, with the 6801's support for 16-bit operations, it becomes easier to see how splitting the return addresses off onto their own stack allows a more seamless approach to passing parameters than the single-stack no-frame example we just finished.
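To make that point concrete away from the assembler, here is a minimal host-side sketch in Python. It is only a model: the function names (`call_single`, `add_single`, `call_split`, `add_split`) and the return address `0x3000` are made up for illustration, and `add_single` shows just one way a single-stack callee might cope, not necessarily what the earlier example does.

```python
def call_single(stack, func, ret_addr):
    """Single stack: the call leaves the return address on top of the parameters."""
    stack.append(ret_addr)
    func(stack)
    return stack.pop()          # the return pops the address back off


def add_single(stack):
    """The callee has to get the return address out of the way first."""
    ret = stack.pop()
    right = stack.pop()
    left = stack.pop()
    stack.append((left + right) & 0xFFFF)
    stack.append(ret)           # and put it back before returning


def call_split(rstack, pstack, func, ret_addr):
    """Split stacks: return addresses live on their own stack."""
    rstack.append(ret_addr)
    func(pstack)
    return rstack.pop()


def add_split(pstack):
    """The callee sees its parameters directly at the top of the parameter stack."""
    right = pstack.pop()
    left = pstack.pop()
    pstack.append((left + right) & 0xFFFF)


if __name__ == "__main__":
    single = [0x1234, 0xCDEF]
    call_single(single, add_single, 0x3000)
    rstack, pstack = [], [0x1234, 0xCDEF]
    call_split(rstack, pstack, add_split, 0x3000)
    print(hex(single[0]), hex(pstack[0]))
```

Both versions leave the same sum behind, but only the single-stack callee ever has to touch the return address.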

Hopefully the code is mostly self-explanatory by now. (We've been looking at the meat of it for so long ...)

Compare with both the single-stack example for the 6801 and the split-stack example for the 6800 to help see what is and is not going on.

As always, read the code and step through it:

* 16-bit addition as example of split-stack frame-free discipline on 6801
* using the direct page,
* with test code
* Joel Matthew Rees, October, November 2024
*
	OPT	6801
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for usr stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* and cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily
DWORK	RMB	2	; For saving D temporarily
PSP	RMB	2	; parameter stack pointer
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
	RMB	4	; Put a bumper after the process static variables
SSTKLIM	RMB	64	; 16 levels of call
SSTKLMX	EQU	FINALX+8
SSTKBAS	RMB	4	; for canary return
SSTKBSX	EQU	SSTKLMX+64
SSTKFAK	RMB	2	; fake frame pointer, self-link
SSTKFAX	EQU	SSTKBSX+8	; 6801 is post-dec (post-store-decrement) push
SSTKBMP	RMB	4	; a little bumper space
SSTKBMX	EQU	SSTKFAX+2	; But we are going to init S through X
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKLMX	EQU	SSTKBMX+4
PSTKBAS	RMB	4	; bumper space -- parameter stack is pre-dec
PSTKBSX	EQU	PSTKLMX+64
PSTKBMP	RMB	4	; a little bumper space
PSTKBMX	EQU	PSTKBSX+4
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	PSTKBMX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
*
INISTKS	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE
	LDD	LB_BASE		; bootstrap own return stack
	ADDD	#SSTKBSX
	STD	XWORK
	LDX	XWORK		; initial return stack pointer
*
	LDD	#SSTKNDR
	STD	0,X	; in the cell beyond empty stack pointer
	STD	2,X	; and the next cell, for good measure
	PULA		; pop real return address
	PULB
	STS	SSAVE	; save stack pointer from monitor ROM
	TXS		; move to our own stack (let TXS convert it)
	PSHB		; put return address on own stack
	PSHA		; stack now ready for interrupts
*
	LDD	LB_BASE		; bootstrap parameter stack
	ADDD	#PSTKBSX
	STD	PSP		; parameter stack now ready
*
	LDD	LB_BASE		; set up heap as if we actually had one
	ADDD	#HBASEX
	STD	HPPTR
	STD	HPALL		; as if the heap were functional
	LDD	#CDBASE
	SUBD	#4
	STD	HPLIM
	RTS
*
*
***
* General structure of the stacks, 
*
* return stack is just the return address:
* [PRETADR   ]
* [RETADR    ] <= SP
*
* order of elements on the parameter stack,
* when they are present:
* [PARAMETERS ]
* [VARIABLES  ]
* [TEMPORARIES]
* [PARAMETERS ]
* [VARIABLES  ]
* [TEMPORARIES]
* [PARAMETERS ] <= PSP
*
* Result is returned on parameter stack
*
***
*
* Utility routines
*
PPOPD	LDX	PSP
	LDD	0,X
	INX
	INX
	STX	PSP
	RTS
*
* This saves bytes:
ALCL2	CLRA
	CLRB	; fall through
* 
PPSHD	LDX	PSP
ALCLI2	DEX
	DEX
	STX	PSP
	STD	0,X
	RTS
*
* Compromise between speed and reusability
* Returns with PSP in X.
* Enter here to load PSP and initialize to 0
* 8 bytes
ALCL8	CLRA	
	CLRB
* Enter here with initial value in A:B
ALCLD8	LDX	PSP
* Enter here with PSP loaded and initial value in D
ALCLI8	DEX
	DEX
	STD	0,X
ALCLI6	DEX
	DEX
	STD	0,X
ALCLI4	DEX
	DEX
	STD	0,X
	BRA	ALCLI2
*
* six bytes
ALCL6	CLRA
	CLRB
ALCLD6	LDX	PSP
	BRA	ALCLI6
*
* four bytes
ALCL4	CLRA
	CLRB
ALCLD4	LDX	PSP
	BRA	ALCLI4
*
*
***
* Return stack when functions are called by MAIN
* Return stack on entry
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ]
* [RETADR1 ] <= SP
*
* Parameter stack when called by MAIN
* with two 32-bit local variables
* and two 16-bit parameters,
* after entry (before temporary allocation)
* [<unknown>]
* [32:VAR1_1--]
* [32:VAR1_2--]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
****
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit
ADD16S	LDD	#(-1)	; default negative
	JSR	ALCLD4	; returns with PSP in X
	TST	6,X	; the left-hand operand sign bit
	BMI	ADD16SR
	CLR	2,X	; positive
	CLR	3,X
ADD16SR	TST	4,X	; the right-hand operand sign bit
	BMI	ADD16SL
	CLR	0,X	; positive
	CLR	1,X
ADD16SL	LDD	6,X	; left hand 
	ADDD	4,X	; right hand
	STD	6,X	; store low half
	LDD	2,X
	ADCB	1,X
	ADCA	0,X
	STD	4,X
*
	LDAB	#4	; shorter and faster than 4*INX, walks on B
	ABX
	STX	PSP	; drop the temporaries
	RTS
*
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right in 32-bit
* output parameter:
*   17-bit sum in 32-bit
ADD16U	LDX	PSP
	LDD	2,X	; left
	ADDD	0,X	; add right
	STD	2,X	; save low
	LDD	#0	; extend
	ROLB		; extend Carry unsigned (could ADC #0)
	STD	0,X	; re-use right side to store high half
*
	RTS		; PSP unchanged
*
* Etc.
*
*
***
* Parameter stack when called by MAIN
* with two 16-bit parameters,
* after entry (before temporary allocation)
* [<unknown>  ]
* [32:VAR1_1  ]
* [32:VAR1_2  ]
* [16:PARAM2_1]
* [16:PARAM2_2] <= PSP
*
* Instead of walking the stack, pass in a pointer --
* Add 16-bit signed parameter
* to the caller's 2nd 32-bit internal variable.
* input parameter:
*   16-bit pointer to 32-bit integer
*   16-bit addend
* no output parameter:
ADD16SI	LDD	#(-1)	; make a temporary -1
	JSR	PPSHD	; (default to signed) returns with PSP in X, 2 bytes on stack
	TST	2,X	; test parameter high byte
	BMI	ADD16SP
	CLR	0,X	; positive: extend with zeros
	CLR	1,X
ADD16SP	LDX	4,X	; pointer to caller's local
	LDD	2,X	; caller's 2nd variable, low
	LDX	PSP
	ADDD	2,X	; parameter
	LDX	4,X	; pointer
	STD	2,X	; update low half with result
	LDD	0,X	; 2nd variable, high half
	LDX	PSP
	ADCB	1,X	; sign extension half
	ADCA	0,X
	LDX	4,X	; pointer
	STD	0,X	; update high half
*
	LDX	PSP
	LDAB	#6	; drop sign temporary and two parameters
	ABX
	STX	PSP
	RTS
*
*
***
* Return stack on entry:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS
* [RETADR0 ] <= SP
*
* Parameter stack after local allocation
* [<unknown>]
* [VAR1_1--]
* [VAR1_2--] <= PSP
*
MAIN	JSR	ALCL8	; allocate and clear 8 bytes
*
	LDD	#$1234
	JSR	PPSHD
	LDD	#$CDEF
	JSR	PPSHD
	JSR	ADD16U	; 32-bit result on parameter stack should be $0000E023
	LDX	PSP	; order is okay, low half where we want it (PSP returned in X anyway)
	LDD	#$8765	; reuse high half
	STD	0,X
	JSR	ADD16S	; result on parameter stack should be $FFFF6788
	LDX	PSP	; (PSP returned in X anyway)
	LDD	2,X	; result low half
	STD	6,X	; to 2nd local variable low half
	LDD	0,X	; result high half
	STD	4,X	; to 2nd local variable high half
	LDD	PSP	; address of 2nd local variable
	ADDD	#4
	STD	2,X	; pointer is 1st arg
	LDD	#$A5A5
	STD	0,X	; addend is 2nd arg
	JSR	ADD16SI	; result in 2nd variable should be $FFFF0D2D (Carry set)
	LDX	PSP	; unnecessary, ...
	LDD	2,X	; 2nd variable low half
	LDX	LB_BASE
	STD	FINALX+2,X
	LDX	PSP
	LDD	0,X
	LDX	LB_BASE
	STD	FINALX,X
*
	LDD	PSP
	ADDD	#8	; deallocate the locals
	STD	PSP
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
* (who knows?) <= FP
***
*
***
* Return stack will be just the return address:
* [RETADRNN  ]
*
* Return stack after initialization:
* [SSTKNDR ]
* [SSTKNDR ]SSTKBAS <= SP
*
*
* Parameter stack after initialization, mark:
* [<unknown]PSTKBAS <= PSP
*
START	JSR	INISTKS
*
	JSR	MAIN
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
SSTKNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP
	NOP		; another landing pad to set breakpoint at
	NOP
	LDX	$FFFE
	JMP	0,X	; alternatively, jmp through reset vector
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

Again, I have tested the code and it produces the correct results without stack frames.
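If you don't have a simulator handy, the expected values in MAIN's comments can be cross-checked on the host. The following is a rough Python model, not the test harness used for the listing: the class name `ParamStackModel`, the memory size, and the starting PSP are all assumptions for illustration, and the signed routines skip the sign-extension temporaries and compute only the same net effect on the parameter stack.

```python
class ParamStackModel:
    """Big-endian byte memory plus a downward-growing parameter stack."""

    def __init__(self, size=0x100):
        self.mem = bytearray(size)
        self.psp = size - 0x10  # hypothetical starting PSP, not the listing's address

    def rd16(self, addr):
        return (self.mem[addr] << 8) | self.mem[addr + 1]

    def wr16(self, addr, value):
        self.mem[addr] = (value >> 8) & 0xFF
        self.mem[addr + 1] = value & 0xFF

    def rd32(self, addr):
        return (self.rd16(addr) << 16) | self.rd16(addr + 2)

    def ppshd(self, d):
        """Model of PPSHD: pre-decrement PSP, store a 16-bit cell."""
        self.psp -= 2
        self.wr16(self.psp, d)

    def add16u(self):
        """Model of ADD16U: the 32-bit sum replaces the two 16-bit operands."""
        x = self.psp
        total = self.rd16(x + 2) + self.rd16(x)
        self.wr16(x + 2, total & 0xFFFF)  # low half where left was
        self.wr16(x, total >> 16)         # carry becomes the high half

    def add16s(self):
        """Model of ADD16S: sign-extend both operands, add with carry."""
        x = self.psp
        left, right = self.rd16(x + 2), self.rd16(x)
        ext_l = 0xFFFF if left & 0x8000 else 0
        ext_r = 0xFFFF if right & 0x8000 else 0
        low = left + right
        high = ext_l + ext_r + (low >> 16)
        self.wr16(x + 2, low & 0xFFFF)
        self.wr16(x, high & 0xFFFF)

    def add16si(self):
        """Model of ADD16SI: add a signed 16-bit addend through a pointer."""
        x = self.psp
        addend, ptr = self.rd16(x), self.rd16(x + 2)
        ext = 0xFFFF if addend & 0x8000 else 0
        low = self.rd16(ptr + 2) + addend
        high = self.rd16(ptr) + ext + (low >> 16)
        self.wr16(ptr + 2, low & 0xFFFF)
        self.wr16(ptr, high & 0xFFFF)
        self.psp += 4  # drop the pointer and the addend


def run_main():
    """Follow MAIN's sequence of pushes and calls; return the three results."""
    m = ParamStackModel()
    for _ in range(4):             # ALCL8: two zeroed 32-bit locals
        m.ppshd(0)
    m.ppshd(0x1234)
    m.ppshd(0xCDEF)
    m.add16u()
    r_unsigned = m.rd32(m.psp)
    m.wr16(m.psp, 0x8765)          # reuse the high half as the right operand
    m.add16s()
    r_signed = m.rd32(m.psp)
    x = m.psp
    m.wr16(x + 6, m.rd16(x + 2))   # copy result into the 2nd local, low half
    m.wr16(x + 4, m.rd16(x))       # then high half
    m.wr16(x + 2, x + 4)           # pointer to the 2nd local
    m.wr16(x, 0xA5A5)              # addend
    m.add16si()
    return r_unsigned, r_signed, m.rd32(m.psp)


if __name__ == "__main__":
    print([f"${r:08X}" for r in run_main()])
```

The three values correspond to the comments on the ADD16U, ADD16S, and ADD16SI calls in MAIN.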

If you think you've seen enough for now, go ahead and skip to getting numeric output in binary. Otherwise, I'll be cleaning the stack frame support code out of the 6809 examples next.

[JMR202412050835 daydream addendum:]

Before we leave this topic behind, if you've been following what I've been talking about on this detour, I could mention a bit more of my daydreams.

I think I've mentioned it in passing, but I have often thought it was unfortunate that Motorola didn't push the opcodes around a bit and keep direct-mode addressing for the unary operator instructions -- INC, ROL, TST, etc. (They did so on the 6809.)

In fact, I would have preferred that they had kept direct page and left out extended mode for unary instructions. (They did exactly that for the 6805.)

"Unary" operators on the 68XX CPUs are mostly read-modify-write instructions that would benefit greatly, in terms of timing and object code efficiency, from having short-addressed versions, and they would also help make the direct page area even more of a pseudo-register memory file.

But we didn't really understand principles of locality in coding back then. Shifting ourselves back to the context of the 1960s and '70s, we can understand why they saw it as a reasonable tradeoff, and why they wanted to leave as many op-codes as possible available for "inherent" mode operators that didn't seem to fit the unary/binary operator partition they were using -- like Add B to A (ABA), et al.

If they had, or if, in producing the 6801 as an object-code compatible upgrade to the 6800, they had been willing to produce a mnemonic-level compatible but object-code incompatible version of the 6801 with direct-page versions of the unary operators -- daydream warning! -- it should have been possible to shave at least two cycles off the timing compared to the 6801's extended mode (6 cycles extended vs. potentially 4 cycles direct-page), giving more meaning to the idea of pseudo-registers -- or making the direct page more of a static cache.

And if the RAM were going to be built-in (as it pretty much always was in 6801 SOCs), it might even have been possible to shave off yet another cycle, bringing DP variables within a cycle of accumulator timing.
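For a feel of what those numbers would mean, here is the arithmetic as a tiny Python sketch. Only the 6-cycle extended figure and the 2-cycle accumulator figure describe the real 6801. The 4-cycle and 3-cycle entries are the daydream's guesses, and the names `CYCLES` and `cycles_saved` are made up for this illustration.

```python
# Cycles per read-modify-write operation (INC, ROL, etc.).
CYCLES = {
    "extended": 6,        # real 6801 extended mode, as quoted above
    "direct": 4,          # hypothetical direct-page version
    "direct_onchip": 3,   # hypothetical, with built-in RAM
    "accumulator": 2,     # real 6801 inherent accumulator forms (INCA, ROLB, ...)
}


def cycles_saved(n_ops, before, after):
    """Total cycles saved over n_ops operations by switching addressing forms."""
    return n_ops * (CYCLES[before] - CYCLES[after])


if __name__ == "__main__":
    for mode in ("direct", "direct_onchip"):
        saved = cycles_saved(1000, "extended", mode)
        print(f"{mode}: {saved} cycles saved per 1000 operations")
```

On these assumptions, a loop doing a thousand pseudo-register updates would save 2000 cycles with direct-page forms and 3000 with the on-chip variant, which is where "within a cycle of accumulator timing" comes from.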

And ... well, the 6801 has 16-bit shifts of the double accumulator, so why not have 16-bit shifts and increments/decrements for direct page variables? Yeah, maybe that's just being greedy.

And, then, here's yet another step out into alternate reality -- a couple of extra address lines (48-pin DIP packages?) for address space, and it would be possible to distinguish between accessing code, data, stack, and the direct page, helping expand the address range beyond the tight squeeze of 64K.
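As a sketch of that arithmetic: two extra lines select among four spaces of 64K each. The encoding below (which space gets which number, and the name `physical_address`) is purely an assumption for illustration, since no such part exists.

```python
# Hypothetical assignment of the two extra address lines to access types.
SPACES = ("code", "data", "stack", "direct page")


def physical_address(space, addr16):
    """Combine a 2-bit space selector with the CPU's ordinary 16-bit address."""
    return (SPACES.index(space) << 16) | (addr16 & 0xFFFF)


if __name__ == "__main__":
    total = len(SPACES) * 0x10000
    print(f"total reachable: {total // 1024}K")
    print(hex(physical_address("direct page", 0x0080)))
```

The same 16-bit address would land in a different 64K bank depending on what kind of access the CPU was making, for 256K in all.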

Erk. Lost in my daydreams again. No wonder it takes me so long to get things done.

Okay, moving on to the 6809 examples, or skipping ahead to getting numeric output in binary.

[JMR202412050835 daydream addendum end.]

