Wednesday, October 30, 2024

ALPP 02-21 -- Some Address Math for the 6801

Some Address Math for the 6801

(Title Page/Index)

I had thought I would not need to show this for the 6801. But the difference between address math on the 6800 and on the 6801 is dramatic enough that I guess I should. That difference comes from being able to add and subtract with the double accumulator D, and from being able to push and pop X.

This chapter, then, will be an extension of the handwaving and conceptualizing in the unsteady footing chapter.

Even if you aren't interested in stack frames, this discussion of addressing math should be useful, although I'm adding it a bit earlier than I had planned.

In the 6801, as I keep noting, we have ABX to help us with address math, but no corresponding SBX.

But the D register math is wide enough to do addresses. The big problem is moving addresses between D and X. Two pushes and a pop, or a push and two pops, is not bad. Going through a pseudo-register in the direct page is quicker, though it takes more bytes of object code. And sometimes you don't want to use the whole D accumulator.
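Byte order matters when moving between D and X through the stack. PSHB has to come before PSHA for PULX to see A as the high byte. As a sanity check, here is a small Python model of the stack instructions. It is not 6801 code. It assumes the 6801's behavior: S points at the next free byte, a push stores and then decrements, and PSHX pushes the low byte first. The class and function names are mine, for illustration only.

```python
# Python model of the 6801 stack (not 6801 code); names are illustrative.
class Stack6801:
    """S points at the next free byte: push stores then decrements,
    pull increments then loads."""

    def __init__(self, s=0x01FF):
        self.mem = bytearray(0x10000)
        self.s = s

    def push(self, byte):  # PSHA / PSHB
        self.mem[self.s] = byte & 0xFF
        self.s = (self.s - 1) & 0xFFFF

    def pull(self):  # PULA / PULB
        self.s = (self.s + 1) & 0xFFFF
        return self.mem[self.s]

    def pshx(self, x):  # PSHX: low byte first, so X sits high-byte-first in memory
        self.push(x & 0xFF)
        self.push(x >> 8)

    def pulx(self):  # PULX: the high byte comes off first
        hi = self.pull()
        lo = self.pull()
        return (hi << 8) | lo


def d_to_x(stack, a, b):
    """PSHB / PSHA / PULX: two pushes and a pop."""
    stack.push(b)
    stack.push(a)
    return stack.pulx()


def x_to_d(stack, x):
    """PSHX / PULA / PULB: a push and two pops."""
    stack.pshx(x)
    a = stack.pull()
    b = stack.pull()
    return a, b


stk = Stack6801()
print(hex(d_to_x(stk, 0x12, 0x34)))           # 0x1234
print([hex(v) for v in x_to_d(stk, 0xBEEF)])  # ['0xbe', '0xef']
print(hex(stk.s))                             # 0x1ff, S is back where it started
```

If you swap the two pushes (PSHA, then PSHB), X comes out as B:A instead of A:B.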

Now that I think of it, a sign-extend B into A instruction like the 6809's sign-extend instruction, SEX, might have been helpful in a few places. (cough.) Still, just using D is not an onerous burden.
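Without SEX, a usual 6801 stand-in is a short sequence: CLRA, TSTB, a BPL that skips a COMA, then the COMA. CLRA has to come first because it clears N. Here is a Python model of what that sequence leaves in D. It is not 6801 code, and the function names are mine.

```python
# Python model (not 6801 code) of sign-extending B into A without SEX.
def sex_b(b):
    """CLRA / TSTB / BPL over COMA: sign-extend B into A, return D."""
    a = 0x00              # CLRA (also clears N, so it must come before TSTB)
    if b & 0x80:          # TSTB sets N from bit 7; BPL skips COMA when N is clear
        a = ~a & 0xFF     # COMA: A becomes $FF
    return (a << 8) | (b & 0xFF)


def as_signed16(d):
    """Interpret a 16-bit pattern as a two's complement value."""
    return d - 0x10000 if d & 0x8000 else d


print(hex(sex_b(0x7F)), as_signed16(sex_b(0x7F)))  # 0x7f 127
print(hex(sex_b(0x80)), as_signed16(sex_b(0x80)))  # 0xff80 -128
```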

We still have to use a pseudo-register for many/most of the calculations.

-- Which causes issues at interrupt time, unless we have separate pseudo-registers used by separate routines for interrupt-time, and copy the user tasks' pseudo-registers out and in on context switch.

Here are those NEGate D snippets, modified for the 6801:

* For reference -- NEGate a 16-bit value in D (same as 6800) --
NEGD	COMA	; 2's complement NEGate is bit COMplement + 1
	NEGB
	BNE	NEGDX	; or BCS. B negates to 0 only if it was 0, the one case that carries into A
	INCA
NEGDX	RTS
*
* Another way (use Double accumulator subtract):
NEGDS	PSHB
	PSHA
	CLRB	; 0 - D
	CLRA
	TSX
	SUBD	0,X
	INS
	INS
	RTS	
*
* Same thing using Double accumulator and a temporary
* somewhere in DP:
	...
SCRCHD	RMB	2
	...
* somewhere else
NEGDV	STD	SCRCHD
	LDD	#0	; 0 - D
	SUBD	SCRCHD
	RTS
	...
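You can check that the COMA / NEGB / BNE / INCA sequence really is a 16-bit two's complement negate by testing it exhaustively against 0 - D. This is a Python model of the register effects, not 6801 code, and the function names are mine.

```python
# Python model (not 6801 code): two ways to NEGate D, compared exhaustively.
def negd_com_neg(a, b):
    """COMA / NEGB / BNE NEGDX / INCA, returning the new (A, B)."""
    a = ~a & 0xFF            # COMA: one's complement of the high byte
    b = (-b) & 0xFF          # NEGB: two's complement of the low byte
    if b == 0:               # BNE falls through only when the result is 0,
        a = (a + 1) & 0xFF   # i.e. B was 0 and ~B + 1 carries into A: INCA
    return a, b


def negd_subd(a, b):
    """CLRA / CLRB / SUBD (or LDD #0 / SUBD): D = 0 - D, modulo 2**16."""
    r = (0 - ((a << 8) | b)) & 0xFFFF
    return r >> 8, r & 0xFF


mismatches = [d for d in range(0x10000)
              if negd_com_neg(d >> 8, d & 0xFF) != negd_subd(d >> 8, d & 0xFF)]
print(len(mismatches))  # 0
```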

Remember to read the code and the comments in the code, and open up a separate browser window to compare side-by-side with the 6800. Read through my transliterations from the 6800, but don't jump to conclusions before you get to the very end.

Again, assume you have these declarations for the pseudo-registers:

	ORG	$80
	...
XOFFA	RMB	1
XOFFB	RMB	1
XOFFSV	RMB	2
	...

Using D is so much faster than either 8-bit accumulator that it really doesn't make much sense to provide anything but D-offset entry points. I've kept the 8-bit and subtract-by-negating entry points for reference. Without a NEGD instruction, subtracting by negating costs extra instructions. And since the D offset is 16-bit anyway, it's quicker to just load a negative offset into D and call ADDDX than to bother with the SUBDX entry point.

ADDBX	CLRA
ADDDX	STX	XOFFSV
	ADDD	XOFFSV
	STD	XOFFSV
	LDX	XOFFSV
	RTS
SUBBX	CLRA	; B is unsigned
SUBDX	COMA
	NEGB
	BNE	ADDDX	; or BCS. B negates to 0 only if it was 0, the one case that carries into A
	INCA
	BRA	ADDDX
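The point about loading a negative offset into D can be checked with a quick Python model (not 6801 code). ADDD wraps modulo 2^16, so adding $FFF0 to X is the same as subtracting 16. The negation in SUBDX therefore buys nothing you couldn't have done when you chose the constant. The function names follow the labels above, but the models themselves are mine.

```python
# Python model (not 6801 code) of ADDDX, SUBDX and SUBBX.
def adddx(x, d):
    """STX XOFFSV / ADDD XOFFSV / STD XOFFSV / LDX XOFFSV: X + D, wrapping at 16 bits."""
    return (x + d) & 0xFFFF


def subdx(x, a, b):
    """COMA / NEGB / BNE / INCA to negate D, then fall into ADDDX."""
    a = ~a & 0xFF
    b = (-b) & 0xFF
    if b == 0:
        a = (a + 1) & 0xFF
    return adddx(x, (a << 8) | b)


def subbx(x, b):
    """CLRA first, so B is treated as unsigned (0 to 255)."""
    return subdx(x, 0x00, b)


print(hex(subdx(0x1234, 0x00, 0x10)))  # 0x1224
print(hex(adddx(0x1234, 0xFFF0)))      # 0x1224, same result with a negative constant
print(hex(subbx(0x0005, 0x10)))        # 0xfff5, wraps below zero
```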

If you want a SUBDX entry point for some reason, it may be worth keeping the logic separate and moving the operands. The Double accumulator math speeds this up significantly.

* Alternative, don't use ADDDX, use XOFFA and XOFFB instead 
SUBBX	CLRA	; B is unsigned
SUBDX	STD	XOFFA	; subtraction does not commute.
	STX	XOFFSV	; Handle operand order.
	LDD	XOFFSV
	SUBD	XOFFA
	STD	XOFFSV
	LDX	XOFFSV
	RTS
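This alternative depends on XOFFA and XOFFB being adjacent, high byte first, so that one STD fills both. The small Python model below (again a model, not 6801 code) makes the byte placement in the direct page visible. The helper names are mine, and the addresses follow the ORG $80 declarations above.

```python
# Python model (not 6801 code) of the direct-page operands for the alternative SUBDX.
XOFFA, XOFFB, XOFFSV = 0x80, 0x81, 0x82   # ORG $80, RMB 1, RMB 1, RMB 2


def std(mem, addr, d):
    """STD / STX: high byte at addr, low byte at addr + 1."""
    mem[addr] = (d >> 8) & 0xFF
    mem[addr + 1] = d & 0xFF


def ldd(mem, addr):
    """LDD / LDX: read the two bytes back, high byte first."""
    return (mem[addr] << 8) | mem[addr + 1]


def subdx_alt(mem, x, d):
    std(mem, XOFFA, d)                                      # STD XOFFA fills XOFFA and XOFFB
    std(mem, XOFFSV, x)                                     # STX XOFFSV
    result = (ldd(mem, XOFFSV) - ldd(mem, XOFFA)) & 0xFFFF  # LDD XOFFSV / SUBD XOFFA
    std(mem, XOFFSV, result)                                # STD XOFFSV
    return ldd(mem, XOFFSV)                                 # LDX XOFFSV


dp = bytearray(0x100)  # the direct page
print(hex(subdx_alt(dp, 0x1234, 0x0010)))  # 0x1224
print(hex(dp[XOFFA]), hex(dp[XOFFB]))      # 0x0 0x10
```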

Just so I don't gloss over ABX, here's ADDBX as a subroutine. The 8-bit offset SUBBX remains as it was for the 6800, except that using ABX for the add means there's no code sharing:

* Working in byte offsets just takes that much more code than D,
* so these are all superfluous.
* Well, the ABX instruction can be useful in-line.
* Alternative unsigned byte only,
* range 0 to 255.
* Subtract needs to be checked again.
ADDBX	ABX
	STX	XOFFSV
	RTS
* No improvements here without just using D.
SUBBX	STX	XOFFSV
	NEGB		; low byte of -B
	BEQ	SUBBXX	; B was 0: X is unchanged
	DEC	XOFFSV	; high byte of -B is $FF, so add -1 to the high byte
	ADDB	XOFFSV+1	; add the low bytes
	BCC	SUBBXL
	INC	XOFFSV	; still need to bring the carry in
SUBBXL	STAB	XOFFSV+1
	LDX	XOFFSV
SUBBXX	RTS
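Subtracting an unsigned byte by adding its negation only works if you account for the high byte of the negated offset. For any nonzero B, -B is $FF:(NEG B), and for B = 0 it is plain 0. The carry out of the low byte then goes into the high byte. Here is a Python model of that byte-at-a-time approach, checked against ordinary 16-bit subtraction. It is not 6801 code, and the function name is mine.

```python
# Python model (not 6801 code): subtract unsigned B from X by adding -B bytewise.
def subbx_bytewise(x, b):
    hi, lo = x >> 8, x & 0xFF        # STX XOFFSV
    b = (-b) & 0xFF                  # NEGB: low byte of -B
    if b == 0:                       # B was 0: nothing to subtract
        return x
    hi = (hi - 1) & 0xFF             # DEC XOFFSV: high byte of -B is $FF, i.e. add -1
    total = lo + b                   # ADDB XOFFSV+1
    if total > 0xFF:                 # carry out of the low byte
        hi = (hi + 1) & 0xFF         # INC XOFFSV
    lo = total & 0xFF                # STAB XOFFSV+1
    return (hi << 8) | lo            # LDX XOFFSV


bad = [(x, b) for x in range(0, 0x10000, 97) for b in range(0x100)
       if subbx_bytewise(x, b) != (x - b) & 0xFFFF]
print(len(bad))  # 0
```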

Using ABX for the positive half of the signed 8-bit routines also emphasizes the lack of SBX in the 6801:

* ABX partially improves the positive half of things here,
* but you really don't want to do this.
* Needs to be checked again.
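ADDSBX	STX	XOFFSV
	TSTB		; sign of the offset
*	BEQ	ADSBXD	; use only if we really want to optimize 0
	BPL	ADSBXU
	DEC	XOFFSV	; negative: high byte of sign-extended B is $FF, so add -1
	ADDB	XOFFSV+1	; add B itself (not its negation) to the low byte
	STAB	XOFFSV+1	; STAB leaves the carry alone
	BCC	ADSBXL
	INC	XOFFSV	; bring the carry into the high byte
ADSBXL	LDX	XOFFSV
ADSBXD	RTS
ADSBXU	ABX		; non-negative: the unsigned add is fine
	STX	XOFFSV
	RTS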
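In the negative half, the offset to add is B sign-extended, $FF:B. So the low byte gets B itself, and the high byte gets $FF (a DEC) plus any carry. Here is a Python model of that signed add, checked against ordinary arithmetic. It is not 6801 code, and the names are mine.

```python
# Python model (not 6801 code) of adding a signed byte in B to X.
def addsbx(x, b):
    if b < 0x80:                     # TSTB / BPL: non-negative
        return (x + b) & 0xFFFF      # ABX treats B as unsigned, which is fine here
    hi, lo = x >> 8, x & 0xFF        # STX XOFFSV
    hi = (hi - 1) & 0xFF             # DEC XOFFSV: sign-extended B has $FF on top
    total = lo + b                   # ADDB XOFFSV+1 (B itself, not its negation)
    lo = total & 0xFF                # STAB XOFFSV+1
    if total > 0xFF:                 # carry out of the low byte
        hi = (hi + 1) & 0xFF         # INC XOFFSV
    return (hi << 8) | lo            # LDX XOFFSV


def signed8(b):
    return b - 0x100 if b & 0x80 else b


bad = [(x, b) for x in range(0, 0x10000, 97) for b in range(0x100)
       if addsbx(x, b) != (x + signed8(b)) & 0xFFFF]
print(len(bad))  # 0
```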

Return stack pointer math with byte offsets loses its meaning on the 6801. You really want the speed when doing math on S, so you're just going to use D.

PSHX and PULX help with handling the return address.

Again, recognize that the call itself writes the return address into the space being allocated. If you've stored anything there before allocating, the return address will walk on what you stored.
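Here is a Python model of calling an allocating routine that starts with PULX, like the ones below. It is not 6801 code, and the function name and addresses are made up for illustration. JSR pushes the return address into the two bytes just under the caller's S. PULX reads it back and restores S. The allocation then claims those same two bytes.

```python
# Python model (not 6801 code); the function name and addresses are illustrative.
def call_allocating_routine(mem, s, return_address, size):
    """Model JSR to a routine that does PULX, moves S down by size, then JMP 0,X."""
    mem[s] = return_address & 0xFF          # JSR pushes the low byte ...
    mem[s - 1] = return_address >> 8        # ... then the high byte
    s -= 2
    x = (mem[s + 1] << 8) | mem[s + 2]      # PULX: X gets the return address
    s += 2                                  # S is back at the caller's value
    s -= size                               # allocate: the new space is s+1 .. old s
    return s, x                             # JMP 0,X returns with the new S


mem = bytearray(0x10000)
s0 = 0x05FF
mem[s0 - 1], mem[s0] = 0xAA, 0x55           # stored "before allocation"
s, x = call_allocating_routine(mem, s0, 0x1234, 8)
print(hex(s), hex(x))                       # 0x5f7 0x1234
print(hex(mem[s0 - 1]), hex(mem[s0]))       # 0x12 0x34: the stored bytes are gone
```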

The declarations (note that we are adding SOFFA for the double accumulator):

* For S stack
* Even though we really don't want to be bumping the return stack that far,
* Using D is just faster on the 6801
	ORG	$90
	...
SOFFA	RMB	1
SOFFB	RMB	1
SOFFSV	RMB	2

And the code; watch how the return address is handled:

* Here's what we can use the 6801 extensions for when doing unsigned byte offsets,
* but, really, use D instead:
	ORG	SOMETHING
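ADDBS	PULX	; get return address, restore stack address
	STS	SOFFSV
	ADDB	SOFFSV+1	; can't use ABX because we need X for return
	BCC	ADDBSL
	INC	SOFFSV	; carry into the high byte
ADDBSL	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return through X
SUBBS	PULX	; get return address, restore stack address
	STS	SOFFSV
	NEGB		; low byte of -B
	BEQ	SUBBSX	; B was 0: S is unchanged
	DEC	SOFFSV	; high byte of -B is $FF
	ADDB	SOFFSV+1
	BCC	SUBBSN
	INC	SOFFSV	; carry into the high byte
SUBBSN	STAB	SOFFSV+1
	LDS	SOFFSV
SUBBSX	JMP	0,X	; return through X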

Doing it with D instead, but using negative offsets rather than the SUBDS entry point:

* Do it with D, instead, but use negative offsets instead of SUBDS:
ADDDS	PULX	; get return address, restore stack address
	STS	SOFFSV
	ADDD	SOFFSV	; whole pointer at once; X keeps the return address
ADDDSL	STD	SOFFSV
	LDS	SOFFSV
	JMP	0,X	; return through X
SUBDS	COMA
	NEGB
	BNE	ADDDS	; or BCS. B negates to 0 only if it was 0, the one case that carries into A
	INCA
	BRA	ADDDS

If we think we must subtract positive offsets instead of adding negative ones, moving the operands around improves things a lot. Again, just use D and call SUBDS instead of trying to optimize with the 8-bit B accumulator:

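* use SOFFB instead of ADDBS
* B is treated as signed here, range -128 to 127;
* for unsigned, range 0 to 255, drop the BPL and the INC.
* Need to check again
SUBBS	PULX	; get return address, restore stack pointer
	STS	SOFFSV
	STAB	SOFFB	; also sets N from the sign of B
	BPL	SUBBSM
	INC	SOFFSV	; high byte of sign-extended B is $FF: subtracting $FF adds 1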
SUBBSM	LDAB	SOFFSV+1
	SUBB	SOFFB
	BCC	SUBBSL
	DEC	SOFFSV	; subtract the borrow
SUBBSL	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return through X
* Do it with D, instead
* use SOFFA instead of ADDDS
SUBDS	PULX	; get return address, restore stack pointer
	STS	SOFFSV
	STD	SOFFA
	LDD	SOFFSV
	SUBD	SOFFA
	STD	SOFFSV
	LDS	SOFFSV
	JMP	0,X	; return through X

At this point, I think it is obvious that long trains of INX are meaningless on the 6801: Two to four, in-line, sure. More, no.

Long trains of DES or INS on S also become questionable, but PULX can make an appearance. That is interesting, though useful only as something to think about:
* For small decrements, 8 <= decrement <= 16
* Uses X for return
* Note that this writes the return address in the upper portion of the allocation,
* which may not be desirable.
SUB14S	PULX
	BRA	ISB14S
SUB12S	PULX
	BRA	ISB12S
SUB10S	PULX
	BRA	ISB10S
SUB8S	PULX
	BRA	ISB8S
SUB16S	PULX
ISB16S	DES
	DES
ISB14S	DES
	DES
ISB12S	DES
	DES
ISB10S	DES
	DES
ISB8S	DES
	DES
	DES	; SUB7S and less are shorter in-line
	DES
	DES	
	DES
	DES
	DES
	JMP	0,X
* For small increments, 6 <= increment <= 16
* Uses X for return
ADD14S	PULX
	BRA	IAD14S
ADD12S	PULX
	BRA	IAD12S
ADD10S	PULX
	BRA	IAD10S
ADD8S	PULX
	BRA	IAD8S
ADD6S	PULX
	BRA	IAD6S
ADD16S	PULX
IAD16S	INS
	INS
IAD14S	INS
	INS
IAD12S	INS
	INS
IAD10S	INS
	INS
IAD8S	INS
	INS
IAD6S	INS	; ADD5S and less are shorter in-line
	INS
	INS	
	INS
	INS
	INS
	JMP	0,X
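The entry points work by falling through a shared run of DES (or INS). Each ISBnS label therefore has to sit exactly n instructions above the JMP. The Python model below counts that for the ISB train as listed. It is not 6801 code, and the names are mine.

```python
# Python model (not 6801 code) of the ISB16S..ISB8S fall-through train, in source order.
DES_TRAIN = (
    [("ISB16S", "DES"), (None, "DES"),
     ("ISB14S", "DES"), (None, "DES"),
     ("ISB12S", "DES"), (None, "DES"),
     ("ISB10S", "DES"), (None, "DES"),
     ("ISB8S", "DES")]
    + [(None, "DES")] * 7
    + [(None, "JMP")]
)


def des_executed(entry):
    """Count the DES instructions run from a label down to JMP 0,X."""
    start = next(i for i, (label, _) in enumerate(DES_TRAIN) if label == entry)
    count = 0
    for _, opcode in DES_TRAIN[start:]:
        if opcode == "JMP":
            break
        count += 1
    return count


for n in (8, 10, 12, 14, 16):
    print(f"ISB{n}S", des_executed(f"ISB{n}S"))  # ISB8S 8, ISB10S 10, ... ISB16S 16
```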

I guess, since I'm being noisy about SBX not being implemented on the 6801, I should also be noisy about ABS (add B to S) and SBS (subtract B from S) being missing.

But so much of the above really becomes irrelevant if we just liberate ourselves from the stack frame mentality/paradigm. Stack frames really ought to be classed among Monty Python's silly walks. 

Stacks allocated entirely within a single page

Concerning the optimization of allocating stacks entirely within a page and only doing math on the low byte, the 6801 offers no improvement there. It only makes the optimization less meaningful. I'll repeat the byte versions here, with the full address math next to them to make that clear.

Oh, but working directly on the parameter stack pointer becomes more interesting.

* And stacks restricted within page boundaries no longer make as much sense on the 6801.
* Pseudo-registers somewhere in DP:
PSP	RMB	2
XOFFSV	RMB	2
XOFFA	RMB	1
XOFFB	RMB	1
SOFFA	RMB	1
SOFFB	RMB	1
SOFFSV	RMB	2
	...
	ORG	$500	; or something
	RMB	4	; buffer zone
PSTKLIM	RMB	64
PSTKBAS	RMB	4	; buffer zone
SSTKLIM	RMB	32
SSTKBAS	RMB	4	; buffer zone
	...

* B for parameter stack:
ADBPSX	STX	PSP
ADBPSP	ADDB	PSP+1	; Stack allocated completely within page, never carries.
	STAB	PSP+1
	LDX	PSP
	RTS
*
* D for parameter stack:
ADDPSX	STX	PSP
ADDPSP	ADDD	PSP
	STD	PSP	; does the whole pointer, negatives, too
	LDX	PSP
	RTS
*
* B for parameter stack:
SBBPSX	STX	PSP
SBBPSP	STAB	XOFFB
	LDAB	PSP+1
	SUBB	XOFFB	; Stack allocated completely within page, never borrows.
	STAB	PSP+1
	LDX	PSP
	RTS
*
* D for parameter stack:
SBDPSX	STX	PSP
SBDPSP	STD	XOFFA
	LDD	PSP
	SUBD	XOFFA	; does the whole pointer
	STD	PSP
	LDX	PSP
	RTS

* B for return stack:
ADBSP	PULX	; return address
	STS	SOFFSV
	ADDB	SOFFSV+1	; Stack allocated completely within page, never carries.
	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return via X
*
* D for return stack (but we saw this above):
ADDSP	PULX	; return address
	STS	SOFFSV
	ADDD	SOFFSV	; does the whole pointer, negatives, too
	STD	SOFFSV
	LDS	SOFFSV
	JMP	0,X	; return via X

* B for return stack:
SBBSP	PULX	; return address
	STS	SOFFSV
	STAB	SOFFB
	LDAB	SOFFSV+1
	SUBB	SOFFB		; Stack allocated completely within page, never borrows
	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return via X
*
* D for return stack (but we saw this above):
SBDSP	PULX	; return address
	STS	SOFFSV
	STD	SOFFA
	LDD	SOFFSV
	SUBD	SOFFA		; does the whole pointer
	STD	SOFFSV
	LDS	SOFFSV
	JMP	0,X	; return via X
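The within-page shortcut gives the same answer as the full 16-bit math only as long as the pointer never leaves its page. The Python model below shows both the agreement and what happens on a page crossing. It is not 6801 code. The names mirror the labels above, and the addresses follow from ORG $500 with the RMBs as declared.

```python
# Python model (not 6801 code) of low-byte-only vs. full 16-bit pointer math.
PSTKLIM = 0x0504   # ORG $500, then the 4-byte buffer zone
PSTKBAS = 0x0544   # 64 bytes later


def adbpsp(psp, b):
    """ADDB PSP+1 / STAB PSP+1: only the low byte of the pointer changes."""
    return (psp & 0xFF00) | ((psp + b) & 0xFF)


def addpsp(psp, d):
    """ADDD PSP / STD PSP: the whole pointer, negative offsets included."""
    return (psp + d) & 0xFFFF


# Every positive offset that keeps the pointer between PSTKLIM and PSTKBAS agrees.
agree = all(adbpsp(p, b) == addpsp(p, b)
            for p in range(PSTKLIM, PSTKBAS + 1)
            for b in range(PSTKBAS - p + 1))
print(agree)                                                  # True
print(hex(adbpsp(0x05F0, 0x20)), hex(addpsp(0x05F0, 0x20)))  # 0x510 0x610
```

The second line shows the failure mode. Once the pointer crosses a page boundary, the byte version silently wraps within the page.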

As with the last chapter, I have not tested the code. I do think it should run, modulo typos.

[JMR202411021012 addendum:]

This is not related to stack frames, but it is address math. I discussed it in the 6800 address math chapter, and I want to show the 6801 version of the code.

This is for accessing per-process global variables that don't need such high-speed access that it is worth slowing down process switches to provide it. That covers almost all per-process variables, except when the hardware application has only a few very limited processes. See the discussion before the 6800 snippets.

* In the DP somewhere, we have a base pointer for the per-process
* variable space, and a working register, something like
	...
LOCBAS	RMB	2
LBXPTR	RMB	2
	...
*
* And, to get the address of variables in the per-process variable space,
* something like these functions --
ADDLBB	CLRA			; entry point for the byte offset in B
ADDLBD	ADDD	LOCBAS		; entry point for larger offsets in A:B
	STD	LBXPTR
	RTS
*
ADDLBX	BSR	ADDLBB	; and load X
	LDX	LBXPTR
	RTS
*
ADDLDX	BSR	ADDLBD	; and load X
	LDX	LBXPTR
	RTS
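With LOCBAS, a process switch only has to change that one pointer, and every per-process variable is then reached by offset. Here is a Python model of ADDLBB and ADDLBD. It is not 6801 code, and the base addresses and offset are hypothetical, made up for illustration.

```python
# Python model (not 6801 code); LOCBAS values and offsets below are hypothetical.
def addlbd(locbas, d):
    """ADDD LOCBAS / STD LBXPTR: base plus a 16-bit offset in A:B."""
    return (locbas + d) & 0xFFFF


def addlbb(locbas, b):
    """CLRA, then ADDLBD: B is a byte offset, 0 to 255."""
    return addlbd(locbas, b & 0xFF)


LOCBAS_PROC1 = 0x2000    # hypothetical per-process area for one process
LOCBAS_PROC2 = 0x2400    # hypothetical per-process area for another
COUNT_OFFSET = 0x12      # hypothetical offset of some per-process variable

print(hex(addlbb(LOCBAS_PROC1, COUNT_OFFSET)))   # 0x2012
print(hex(addlbb(LOCBAS_PROC2, COUNT_OFFSET)))   # 0x2412
print(hex(addlbd(LOCBAS_PROC1, 0x0300)))         # 0x2300
```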

[JMR202411021012 addendum end.]

And with this in mind, too, while thinking about how the 6801's enhanced instruction set can make some of the above code much less intransigent, let's remind ourselves why the 6809 and 68000 don't need routines like these before we take a look at a concrete example of stack frames on the 6801.

Or you can jump ahead to getting numeric output in binary.

