Sunday, November 24, 2024

ALPP 02-29 -- Putting That Wrong Island in the Rear-view Mirror -- Single-stack No Frame Example: 6800

So we've got most of those rubber bricks off the bottom of the pool but there's some treasure down there, too.

Putting That Wrong Island in the Rear-view Mirror -- Single-stack No Frame Example: 6800

(Title Page/Index)

 

As I have said, even though we’ve just looked at an example of how split-stack stack frames can be done on the 6800 and we've even seen a parallel example of single-stack stack frames on the same, I do not recommend stack frames. 

But I think I have made it clear that, if you have to do stack frames, I recommend split-stack over single-stack.

In this chapter we are going to look at the same functional example of three kinds of addition using a single stack without a stack frame.

Single-stack no frame, if you are allowed to do it and learn how to do it right, will produce cleaner, more optimal code than single-stack with stack frames.

But I'm going to repeat myself. I cannot recommend this. You have to track what is on that stack, and the return address just gets in the way of your calculations and your memory. It's a bit (16 bits on the 6800) of distracting data that isn't relevant to the calculations the function is doing, and every time you look for something on the stack, it either sticks out like a sore thumb, distracting you, or you forget it's there and miss what you are aiming at. And walk on it. Or try to get it from where it isn't and end up executing data or garbage instead of instructions.

What we have to acknowledge is that, without the frame pointer(s), we end up having to track how much of what we have on the stack at any particular point in the code.

But we have to keep track of that anyway, really, even though a frame pointer can help. If we don't know what's there, we don't know where we've put things, and that's a terrible state for a program (and a programmer) to be in -- and that's one reason people avoid reading the assembly language output of compilers.
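
A minimal sketch of that bookkeeping, assuming a hypothetical routine EXAMPL that takes one 16-bit parameter on the stack (EXAMPL is only an illustration, not part of the listing below). It shows how the same parameter changes offset as soon as anything is pushed, with the return address sitting in between:

```asm
* Hypothetical example: one 16-bit parameter, no frame pointer.
* At entry, after TSX:
*   0,X 1,X  return address
*   2,X 3,X  parameter
EXAMPL	TSX		; X = S+1, points at the return address
	LDAA	2,X	; parameter high byte
	LDAB	3,X	; parameter low byte
	PSHB		; save a 16-bit temporary copy
	PSHA
	TSX		; S moved, so X has to be refreshed
* Now:
*   0,X 1,X  temporary
*   2,X 3,X  return address (where the parameter used to be!)
*   4,X 5,X  parameter
	LDAA	4,X	; parameter high byte is at 4,X now, not 2,X
	LDAB	5,X
	INS		; drop the temporary so RTS finds the return address
	INS
	RTS
```

The caller still has to drop the parameter after the return, the way MAIN does with INS in the listing below.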

Just looking at the code below, you may not see how much we've ripped out -- that's because we've been hiding what we could in subroutines. But tracing through the code should feel rather different, because you can hide code from the programmer, but you can't hide it from the processor.

You'll really want to compare the code with the stack frame version, and re-read the code and the comments. Take time to trace through both, watching the source as you do.

* 16-bit addition as example of single-stack discipline sans stack frame on 6800,
* with test code
* Joel Matthew Rees, October, November 2024
*
	OPT	6800
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS says this is a good place for user stuff.
*
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
	NOP		; bumper
	NOP		; 6 bytes to this point.
SSAVE	RMB	2	; a place to keep S so we can return clean
	RMB	4	; bumper
* All of the pseudo-registers must be saved and restored on context switch,
* cannot be accessed during interrupt service.
XWORK	RMB	2	; For saving an index register temporarily in leaf functions only
DWORK	RMB	2	; For saving D temporarily in leaf functions only
RETVHI	RMB	2	; high half of 32-bit return values (because we can't push X easily)
RETVLO	RMB	2	; 16-bit return values and low half (because loading and saving is redundant)
LB_BASE	RMB	2	; For process local variables
HPPTR	RMB	2	; heap pointer (not yet managed)
HPALL	RMB	2	; heap allocation pointer
HPLIM	RMB	2	; heap limit
* End of pseudo-registers
	RMB	4	; bumper
GAP1	RMB	2	; Mark the bottom of the gap
*
*
*
	ORG	$2000	; Give the DP room.
LB_ADDR	RMB	4	; a little bumper space
FINAL	RMB	4	; 32-bit Final result in DP variable (to show we can)
FINALX	EQU	4
	RMB	4	; buffer
STKLIM	RMB	192	; roughly 16 to 20 levels of call
STKLIMX	EQU	FINALX+8
STKBAS	RMB	8	; for canary return
STKSZ	EQU	192	; for EXORsim assembler limits
STKBASX	EQU	STKLIMX+192	; must be STKLIMX+STKSZ -- assembler won't take symbol
STKFAK	RMB	2	; fake frame pointer, self-link
STKFAKX	EQU	STKBASX+8	; 6801 is post-dec (post-store-decrement) push
STKBMP	RMB	4	; a little bumper space
STKBMPX	EQU	STKFAKX+2	; But we are going to init S through X
*
* My assembler limits RMBs to $100 long, so we'll use a different way.
HBASE	RMB	1	; $1024 or something	; Not using or managing heap yet.
HBASEX	EQU	STKBMPX+4
*HLIM	RMB	4	; bumper
*HLIMX	EQU	HBASEX+$100	; 1024
*
*
	ORG	$3000
CDBASE	JMP	ERROR		; more bumpers
	NOP
*STKBASM	FDB	STKBASX	; Doesn't work within EXORsim assembler limits after all
*HBASEXM	FDB	HBASEX	; by avoiding splitting large constants up at assemble time
*
INISTK	LDX	#LB_ADDR	; set up process local space
	STX	LB_BASE		; local space functional
	LDAA	LB_BASE		; bootstrap own stack
	LDAB	LB_BASE+1
*	ADDB	STKBASM+1
*	ADCA	STKBASM
	LDX	#STKBASX	; Instead of FDB
	STX	XWORK 
	ADDB	XWORK+1
	ADCA	XWORK
*
	STAB	XWORK+1		; initial stack pointer
	STAA	XWORK
*
	LDX	#STKUNDR	; for fake return address
	STX	DWORK		; save it for a moment
*	
	PULA		; pop real return address
	PULB
	LDX	XWORK	; ready own stack pointer
	STS	SSAVE	; save stack pointer from monitor ROM
	TXS		; move to our own stack (let TXS convert it)
	PSHB		; put return address on own stack
	PSHA		; stack now ready for interrupts, utility routines
*
	LDAA	DWORK	; error handler for fake return
	LDAB	DWORK+1
	STAA	0,X	; in the cell beyond empty stack pointer
	STAB	1,X
	STAA	2,X	; and the next cell, for good measure
	STAB	3,X
*
	LDAA	LB_BASE	
	LDAB	LB_BASE+1
	PSHB
	PSHA
*	JSR	PSH16I 
*	FDB	HBASEX	; EXORsim's interactive assembler doesn't like FDBs.
	LDX	#HBASEX
	JSR	SPSHX
*
	JSR	UADD16
	STAA	HPPTR		; as if we were ready to use heap
	STAB	HPPTR+1
	STAA	HPALL
	STAB	HPALL+1
*	JSR	PSH16I	; FDBs
*	FDB	CDBASE
*	JSR	PSH16I
*	FDB	(-4)		; extra bumper
*	JSR	UADD16
	LDX	#CDBASE
	STX	XWORK
	LDAA	XWORK
	LDAB	XWORK+1
	SUBB	#4
	SBCA	#0
*
	STAA	HPLIM
	STAB	HPLIM+1
	RTS		; finally done, now can return
*
***
* Since negative index offsets are so expensive,
* we want to reach everything on the stack with only positive offsets.
* With no frame pointer, nothing extra is pushed after the call
* on entry to the local context,
* and there is no saved frame pointer to link or restore.
* Every offset is counted from wherever S is at the moment,
* so every push or pop shifts all of them.
*
* Cross-section of general frame structure in called routine:
* [{LOCVAR}] for calling routine
* [{TEMP}  ] for calling routine
* [PARAM   ] from calling routine
* [RETADR  ] to calling routine
* [LOCVAR  ] for called -- current -- routine
* [TEMP    ] for called -- current -- routine
* [(PARAM) ] to be passed to a further call
*
* Broader cross-section, showing chaining for routine 3, in-flight:
* [RETADR1 ] 
* [LOCVAR2 ]
* [TEMP2   ]
* [PARAM3  ]
* [RETADR2 ]
* [LOCVAR3 ]
* [TEMP3   ]
* [(PARAM4)] <= SP (return stack pointer (6800 S is byte below))
***
*
***
* Utility routines
*
* Push low half of return value
* (Didn't use it there, don't use it here.)
PSHLH	TSX
	LDAA	0,X		; return address
	LDAB	1,X
	PSHB
	PSHA
	LDAA	RETVLO
	LDAB	RETVLO+1
	STAA	0,X
	STAB	1,X
	RTS
*
* Avoid the math to split 16-bit constants into two 8-bit loads,
* and push them while we are here.
* The constant follows the call in the instruction stream.
* Leaves constant in A:B, as well.
PSH16I	TSX		; point to top of return address stack
	LDX	0,X	; point into the instruction stream
	LDAA	0,X	; high byte from instruction stream
	LDAB	1,X	; low byte from instruction stream	
	INS		; drop the return address we almost have in X
	INS
	PSHB		; replace it with the constant
	PSHA
	JMP	2,X	; return to the byte after the constant.
*
* 8 bytes for the meat of this vs. 3 for the call.
* We end up using it a lot since EXORsim's interactive assembler doesn't do FDBs.
SPSHX	STX	XWORK
	DES
	DES
	TSX
	LDAA	2,X
	LDAB	3,X
	STAA	0,X
	STAB	1,X
	LDAA	XWORK
	LDAB	XWORK+1
	STAA	2,X
	STAB	3,X
	RTS
*
* 6 bytes for the meat of this vs. 3 for the call, instead of FDB
* (Didn't use it there, don't use it here.)
TXD	STX	XWORK
	LDAA	XWORK
	LDAB	XWORK+1
	RTS
*
* Utility 16-bit add, leave result in A:B
UADD16	TSX		; no frame
	LDAB	5,X	; left
	ADDB	3,X	; right		; because we can
	LDAA	4,X	; left
	ADCA	2,X	; right
	LDX	0,X
UADROP	INS		; drop return address and parameters
	INS
	INS
	INS
	INS
	INS
	JMP	0,X	; return via X
*
* Utility 16-bit sub, leave result in A:B
* (Didn't use it there, don't use it here.)
USUB16	TSX		; no frame
	LDAB	5,X	; left
	SUBB	3,X	; right		; because we can
	LDAA	4,X	; left
	SBCA	2,X	; right
	LDX	0,X
	BRA	UADROP	; drop return address and parameters
*
*
* We really don't want to put S in a temp if we can avoid it
ALOCS8	PULA
	PULB
ALOS8I	DES
	DES
ALOS6I	DES
	DES
ALOS4I	DES
	DES
ALOS2I	DES
	DES
	PSHB
	PSHA
	RTS
*
ALOCS6	PULA
	PULB
	BRA	ALOS6I
*
ALOCS4	PULA
	PULB
	BRA	ALOS4I
*
ALOCS2	PULA
	PULB
	BRA	ALOS2I
*
INI0_8	CLRA
	CLRB
* call with initialization value in A:B
INIS8	TSX
INIT8	STAA	8,X
	STAB	9,X
INIT6	STAA	6,X
	STAB	7,X
INIT4	STAA	4,X
	STAB	5,X
INIT2	STAA	2,X
	STAB	3,X
	RTS		; 0,X is return address!
*
INI0_6	CLRA
	CLRB
* call with initialization value in A:B
INIS6	TSX
	BRA	INIT6
*
INI0_4	CLRA
	CLRB
* call with initialization value in A:B
INIS4	TSX
	BRA	INIT4
*
INI0_2	CLRA
	CLRB
* call with initialization value in A:B
INIS2	TSX
	BRA	INIT2
*
DROP8	PULA
	PULB
	INS
	INS
DROP6I	INS
	INS
	INS
	INS
	INS
	INS
	PSHB
	PSHA
	RTS
*
DROP6	PULA
	PULB
	BRA	DROP6I
*
*
* Stack at entry
* when functions are called by MAIN
* with two parameters
* We will return results in RETVHI:RETVLO in direct page
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ] 
* [32:VAR1_1]
* [32:VAR1_2] <= VARBP1
* [PARAM2_1]
* [PARAM2_2]
* [RETADR1 ] 
*
* Signed 16 bit add to 32 bit result
* Handle sign overflow without losing precision.
* input parameters:
*   16-bit left 1st pushed, right 2nd
* output parameter:
*   17-bit sum in 32-bit in RETVHI:RETVLO
* Does not alter the parameters.
ADD16S	TSX		; no local variables
	LDAA	#(-1)	; prepare for sign extension
	TST	4,X	; the left-hand operand sign bit
	BMI	ADD16SR
	CLRA		; zero extend
ADD16SR	PSHA		; push left extension (only need one byte, though, really)
	PSHA		; left sign cell below X now
	LDAA	#(-1)	; reload
	TST	2,X	; the right-hand operand sign bit
	BMI	ADD16SL
	CLRA		; zero extend
ADD16SL	PSHA		; push right extension
	PSHA
	TSX		; point to sign extensions
	LDAA	8,X	; left-hand low cell
	LDAB	9,X
	ADDB	7,X	; right-hand low cell
	ADCA	6,X
	STAA	RETVLO	; save low half of result
	STAB	RETVLO+1
	LDAA	2,X	; left-hand extension
	LDAB	3,X
	ADCB	1,X	; right-hand extension
	ADCA	0,X
	STAA	RETVHI	; Save high half of result
	STAB	RETVHI+1
	INS		; drop sign extension temporaries
	INS		; 4 INS is one byte more than JSR DROP4
	INS
	INS
	RTS		; result is in RETVHI:RETVLO
*
* Unsigned 16 bit add to 32 bit result
* input parameters:
*   16-bit left, right
* output parameter:
*   17-bit sum in 32-bit RETVHI:RETVLO
ADD16U	TSX		; no local allocations
	LDAA	4,X	; left
	LDAB	5,X
	ADDB	3,X	; right
	ADCA	2,X
	STAA	RETVLO	; save low half
	STAB	RETVLO+1
	LDAB	#0
	ADCB	#0
	STAB	RETVHI+1	; save carry bit in high half
	CLR	RETVHI		; will never carry beyond bit 17
	RTS
*
* Etc.
*
***
*
* Stack at entry when functions are called by MAIN
* with a pointer parameter and an addend
* (no frame, no local variables)
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ] 
* [32:VAR1_1]
* [32:VAR1_2] <= PARAM2_1
* [PARAM2_1] (pointer)
* [PARAM2_2] (addend)
* [RETADR1 ] 
*
* Instead of walking the stack, pass in a pointer --
* Add 16-bit signed parameter
* to the caller's 2nd 32-bit internal variable.
* input parameters:
*   16-bit pointer to 32-bit integer
*   16-bit addend
* no output parameter:
ADD16SI	TSX		; no own local variables 
	LDAA	#(-1)
	TST	2,X	; high byte of addend parameter
	BMI	ADD16SIP
	CLRA
ADD16SIP	PSHA	; save the sign extension half
	PSHA
	LDX	4,X	; get pointer to target
	LDAA	2,X	; target low
	LDAB	3,X
	TSX		; SP[ sign, retadr, addend, long ptr ]
	ADDB	5,X	; addend parameter (stack is two lower, now)
	ADCA	4,X
	LDX	6,X	; target pointer
	STAA	2,X	; save result low half away
	STAB	3,X
	LDAA	0,X	; target high half
	LDAB	1,X
	TSX
	ADCB	1,X	; sign extension half
	ADCA	0,X
	LDX	6,X	; target
	STAA	0,X	; save result high half away
	STAB	1,X
	INS		; three bytes for INS and RTS vs. two bytes for branch
	INS
	RTS		; no result to load
*
*
***
* Stack after variable allocation
* [STKUNDR ]
* [STKUNDR ]STKBAS
* [RETADR0 ] 
* [32:VAR1_1]
* [32:VAR1_2] <= SP
*
MAIN	JSR	ALOCS8	; 2 calls, 6 bytes vs. 1 clr + 8 pushes , 9 bytes
	JSR	INI0_8
	TSX
*
	JSR	PSH16I
*	FDB	$1234	; parameters
	FCB	$12
	FCB	$34
	JSR	PSH16I
*	FDB	$CDEF
	FCB	$CD
	FCB	$EF
	JSR	ADD16U	; result in RETVHI:RETVLO should be $E023
	INS		; drop one parameter, reuse other
	INS
	TSX
	LDAA	RETVLO	; four extra bytes compared to calling PSHLH
	LDAB	RETVLO+1
	STAA	0,X
	STAB	1,X	
	JSR	PSH16I
*	FDB	$8765
	FCB	$87
	FCB	$65
	JSR	ADD16S	; result in RETVHI:RETVLO should be $FFFF6788
	TSX		; reuse both parameters
	LDAA	RETVHI
	LDAB	RETVHI+1
	STAA	4,X		; 2nd local variable high half
	STAB	5,X
	LDAA	RETVLO
	LDAB	RETVLO+1
	STAA	6,X
	STAB	7,X
	STX	XWORK	; calculate address of second variable
	LDAB	XWORK+1
	ADDB	#4
	STAB	3,X
	LDAA	XWORK
	ADCA	#0	; don't lose the carry
	STAA	2,X
	LDAB	#$A5
	STAB	0,X	; $A5
	STAB	1,X	; $A5A5
	JSR	ADD16SI		; result in 2nd variable should be $FFFF0D2D
	INS			; drop the parameters
	INS
	INS
	INS
	TSX
	LDAA	2,X		; low half
	LDAB	3,X
	LDX	LB_BASE		; store it in FINAL, in process local space
	STAA	FINALX+2,X
	STAB	FINALX+3,X
	TSX
	LDAA	0,X		; high half
	LDAB	1,X
	LDX	LB_BASE
	STAA	FINALX,X
	STAB	FINALX+1,X
	JSR	DROP8
	RTS
*
*
***
* Stack at START:
* (what BIOS/OS gave us) <= SP
***
*
* Stack after initialization:
* [STKUNDR ]
* [STKUNDR ]STKBAS <= SP
*
***
*
START	NOP
	JSR	INISTK
	NOP
*
	JSR	MAIN
*
DONE	NOP
ERROR	NOP	; define error labels as something not DONE, anyway
STKUNDR	NOP
	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad to set breakpoint at
	NOP
	NOP
	LDX	$FFFE	; alternatively, jmp through reset vector
	JMP	0,X
*
* Anyway, if running in EXORsim, after RESETting,
* Ctrl-C should bring you back to EXORsim monitor, 
* but not necessarily to your program in a runnable state.

I probably spent a good six hours or more figuring out all the places I had messed up the offsets and lost track of what was on the stack. Sure, that was because I'd quit using the frame pointers for reference. It was also because I was running low on sleep. But it was more so because of that distracting presence of the return addresses right in the middle of the data.
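
To make that concrete, here is a map of the offsets in ADD16S, read off the listing above and written as assembler comments (nothing here assembles to code). Note where the return address lands once the sign extensions have been pushed:

```asm
* ADD16S, after the first TSX (at entry):
*   0,X 1,X  return address
*   2,X 3,X  right-hand operand
*   4,X 5,X  left-hand operand
*
* ADD16S, after four PSHA and the second TSX:
*   0,X 1,X  right-hand sign extension
*   2,X 3,X  left-hand sign extension
*   4,X 5,X  return address
*   6,X 7,X  right-hand operand
*   8,X 9,X  left-hand operand
```

The operands have moved from 2..5 to 6..9, and the return address now sits between the temporaries and the data they extend.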

If you haven't traced through this code, do so. Otherwise, you won't really believe me.
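
As a checklist for that trace, these are the values the three calls in MAIN should produce, worked by hand from the constants in the listing and written as assembler comments. The decimal figures are just a cross-check of the hexadecimal ones:

```asm
* Expected values while tracing MAIN (comments only, nothing to assemble)
*
* ADD16U:  $1234 + $CDEF = $E023, no carry out
*          RETVHI:RETVLO = $0000E023
*
* ADD16S:  left $E023 (-8157), right $8765 (-30875), both sign-extended
*          $FFFFE023 + $FFFF8765 = $FFFF6788 (-39032)
*          RETVHI:RETVLO = $FFFF6788, copied to the 2nd local variable
*
* ADD16SI: addend $A5A5 (-23131), sign-extended to $FFFFA5A5
*          $FFFF6788 + $FFFFA5A5 = $FFFF0D2D (-62163)
*          2nd local variable = $FFFF0D2D, then copied to FINAL
```

If a trace shows something else in FINAL, the first place to look is which cell each offset actually landed on.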

And then go take a look at the split-stack version of this.

[JMR202411260841 addendum:]

Speaking of the split-stack version, while working through that, I realized I could have used a load effective address routine here for calculating the address of the second local variable in MAIN, something like

* Add D to S and load to X as a pointer
LEADSX	TSX	; make it a pointer
	INX	; adjust for return address the cheap way
	INX
	STX	XWORK
	ADDB	XWORK+1
	STAB	XWORK+1
	ADCA	XWORK
	STAA	XWORK
	LDX	XWORK
	RTS

[JMR202411260841 addendum end.]
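
A rough sketch of how that might be used, assuming LEADSX exactly as written above: the seven lines in MAIN from STX XWORK through STAA 2,X could become something like the following. It relies on LEADSX leaving the computed address in A:B as well as in X, which is how the routine above happens to work. Treat it as an untested suggestion rather than part of the listing.

```asm
* Possible replacement for the address calculation in MAIN (untested sketch)
	CLRA		; offset of the 2nd local variable: 4 bytes above
	LDAB	#4	; the two parameter cells at the top of the stack
	JSR	LEADSX	; X and A:B both hold the variable's address now
	TSX		; point back at the parameter cells
	STAA	2,X	; pointer parameter, high byte
	STAB	3,X	; pointer parameter, low byte
```

LEADSX's two INX instructions step over its own return address, so the offset in A:B is counted from the same top of stack that MAIN sees with TSX.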

(Note that, this time, I'm not suggesting you move ahead if you are getting tired. You've come this far, it's only a little farther along this path until you can decide whether I'm a fool for thinking split stack with no stack frames is so great -- or maybe see what I see.)
