Some Address Math for the 6801
I had thought I would not need to show this for the 6801, but being able to add and subtract with the double accumulator and to push and pop X makes addressing math on the 6801 different enough from the 6800 that I guess I should.
This chapter, then, will be an extension of the handwaving and conceptualizing in the unsteady footing chapter.
Even if you aren't interested in stack frames, this discussion of addressing
math should be useful, although I'm adding it a bit earlier than I had
planned.
In the 6801, as I keep noting, we have ABX to help us with address math, but
no corresponding SBX.
But the D register math is wide enough to do addresses. The big problem is moving addresses between D and X. Two pushes and a pop, or two pops and a push, is not bad. Going through a pseudo-register in the direct page works quicker, but takes more bytes of object code. And sometimes you don't want to use the whole D accumulator.
Now that I think of it, a sign-extend B into A instruction like the 6809's
sign-extend instruction, SEX, might have been helpful in a few places.
(cough.) Still, just using D is not an onerous burden.
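For comparison, a minimal sketch of sign-extending B into A by hand on the 6801. The labels SEXBA and SEXBAX are made up for this illustration, and the code is untested:

```asm
* Sign-extend B into A, so D holds the signed value of B.
* SEXBA and SEXBAX are hypothetical labels.
SEXBA	CLRA		; assume positive: high byte is $00
	TSTB		; set N from bit 7 of B
	BPL	SEXBAX	; positive, done
	COMA		; negative: high byte is $FF
SEXBAX	RTS
```

In-line, that is four instructions where the 6809 needs one.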
We still have to use a pseudo-register for many/most of the calculations.
-- Which causes issues at interrupt time, unless we have separate pseudo-registers used by separate routines for interrupt-time, and copy the user tasks' pseudo-registers out and in on context switch.
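The two ways of moving an address between D and X mentioned above can be sketched as follows. This is untested, and it borrows the XOFFSV pseudo-register declared later in this chapter, assuming it sits in the direct page so the assembler picks the two-byte direct mode:

```asm
* D to X through the stack: 3 bytes of object code
	PSHB		; low byte first, so it lands at the higher address
	PSHA
	PULX		; X = A:B
* X to D through the stack: 3 bytes
	PSHX
	PULA		; high byte
	PULB		; low byte
* D to X through a direct-page pseudo-register: 4 bytes
	STD	XOFFSV
	LDX	XOFFSV
* X to D through the pseudo-register: 4 bytes
	STX	XOFFSV
	LDD	XOFFSV
```

The stack versions are re-entrant as they stand. The pseudo-register versions are the ones that need care at interrupt time.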
Here are those NEGate D snippets, modified for 6801:
* For reference -- NEGate a 16-bit value in D (same as 6800) --
NEGD COMA ; 2's complement NEGate is bit COMplement + 1
NEGB
BNE NEGDX ; or BCS. but BNE works -- extends 0
INCA
NEGDX RTS
*
* Another way (use Double accumulator subtract):
NEGDS PSHB
PSHA
CLRB ; 0 - D
CLRA
TSX
SUBD 0,X
INS
INS
RTS
*
* Same thing using Double accumulator and a temporary
* somewhere in DP:
...
SCRCHD RMB 2
...
* somewhere else
NEGDV STD SCRCHD ; the temporary declared above
LDD #0 ; 0 - D
SUBD SCRCHD
RTS
...
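A hand trace of the first NEGD routine on two values, written as calls with the register contents in the comments. The trace was worked on paper, not run:

```asm
	LDD	#$0100	; A=$01 B=$00
	JSR	NEGD	; COMA: A=$FE. NEGB: B=$00, Z set.
*			; BNE falls through, INCA: A=$FF. D=$FF00 (-256)
	LDD	#$0001	; A=$00 B=$01
	JSR	NEGD	; COMA: A=$FF. NEGB: B=$FF, Z clear.
*			; BNE skips the INCA. D=$FFFF (-1)
```

The INCA only happens when the low byte is zero, which is the one case where adding 1 to the complemented low byte carries into the high byte.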
Remember to read the code and the comments in the code, and open up a separate
browser window to compare side-by-side with the 6800. Read through my
transliterations from the 6800, but don't jump to conclusions before you get
to the very end.
Again, assume you have these declarations for the pseudo-registers:
ORG $80
...
XOFFA RMB 1
XOFFB RMB 1
XOFFSV RMB 2
...
Using D is so much faster than working through either 8-bit accumulator that it
really doesn't make much sense to provide anything but D-offset, but I've kept
the 8-bit and subtract-by-negating entry points for reference. With no negate D
instruction, subtracting by negating de-optimizes subtraction. Since the D
offset is 16-bit, it's quicker to just load a negative offset in D and call
ADDDX instead of bothering with the SUBDX entry point.
ADDBX CLRA
ADDDX STX XOFFSV
ADDD XOFFSV
STD XOFFSV
LDX XOFFSV
RTS
SUBBX CLRA ; B is unsigned
SUBDX COMA
NEGB
BNE ADDDX ; or BCS. but BNE works -- extends
INCA
BRA ADDDX
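As a usage sketch for the routines above, stepping X back by six bytes with a negative offset might look like this. The starting address $1234 is only an example value, and the code is untested:

```asm
	LDX	#$1234	; example address
	LDD	#-6	; same bit pattern as #$FFFA
	JSR	ADDDX	; X = $122E, and XOFFSV holds the same value
```

Loading the negative constant costs nothing extra at run time, since the assembler does the negation.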
If you want a SUBDX entry point for some reason, it may be worth keeping the
logic separate and moving the operands. The Double accumulator math speeds
this up significantly.
* Alternative, don't use ADDDX, use XOFFA and XOFFB instead
SUBBX CLRA ; B is unsigned
SUBDX STD XOFFA ; subtraction does not commute.
STX XOFFSV ; Handle operand order.
LDD XOFFSV
SUBD XOFFA
STD XOFFSV
LDX XOFFSV
RTS
Just so I don't gloss over ABX, here's ADDBX as a subroutine. 8-bit offset
SUBBX remains as it was for the 6800, except using ABX for the add means
there's no code sharing:
* Working in byte offsets just takes that much more code than D,
* these are all superfluous.
* Well, the ABX instruction can be useful in-line.
* Alternative unsigned byte only
* subtract needs to be checked again
* range 0 to 255
ADDBX ABX
STX XOFFSV
RTS
* No improvements here without just using D.
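SUBBX STX XOFFSV ; put X where we can do byte math on it
NEGB ; low byte of -B
BEQ SUBDXL ; B was 0: the high byte of -B is 0, skip the DEC
DEC XOFFSV ; otherwise the high byte of -B is -1
SUBDXL ADDB XOFFSV+1 ; add the low bytes
BCC SUBBXL ; still need to bring the carry in
INC XOFFSV ; carry goes into the high byte
SUBBXL STAB XOFFSV+1
LDX XOFFSV
RTS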
Using ABX for the positive half of the signed 8-bit routines also emphasizes the lack of SBX in the 6801:
* ABX partially improves the positive half of things here,
* but you really don't want to do this.
* Needs to be checked again.
ADDSBX STX XOFFSV
TSTB ; sign extend B
* BEQ ADSBXD ; use only if we really want to optimize 0
BPL ADSBXU
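ADDB XOFFSV+1 ; add the low bytes; the high byte of the offset is -1
STAB XOFFSV+1 ; STAB leaves the carry alone
BCS ADSBXN ; a carry out cancels the -1
DEC XOFFSV ; no carry: add -1 to the high byte
ADSBXN LDX XOFFSV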
ADSBXD RTS
ADSBXU ABX
STX XOFFSV
RTS
Return stack pointer math with byte offsets loses its meaning on the 6801.
You really want the speed when doing math on S, so you're just going to use D.
Again, you should recognize that the call writes the return address into the
allocated space on allocation, so if you've stored before allocation, you'll
be walking on what you stored.
The declarations. Note that we are adding SOFFA for the double
accumulator:
* For S stack
* Even though we really don't want to be bumping the return stack that far,
* Using D is just faster on the 6801
ORG $90
...
SOFFA RMB 1
SOFFB RMB 1
SOFFSV RMB 2
And the code. Watch how the return address is handled:
* Here's what we can use the 6801 extensions for when doing unsigned byte offsets,
* but, really, use D instead:
ORG SOMETHING
ADDBS PULX ; get return address, restore stack address
STS SOFFSV
ADDB SOFFSV+1 ; can't use ABX because we need X for return
BCC ADDBSL
INC SOFFSV
ADDBSL STAB SOFFSV+1 ; the sum goes back in the low byte
LDS SOFFSV
JMP 0,X ; return through X
SUBBS CLRA ; B is unsigned, so extend it into D
COMA ; then negate D
NEGB
BNE ADDDS ; or BCS. but BNE works -- extend
INCA
BRA ADDDS ; and use the 16-bit add, below
Doing it with D instead, but use negative offsets instead of the SUBDS entry
point:
* Do it with D, instead, but use negative offsets instead of SUBDS:
ADDDS PULX ; get return address, restore stack address
STS SOFFSV
ADDD SOFFSV ; does the whole pointer, negatives, too
ADDDSL STD SOFFSV
LDS SOFFSV
JMP 0,X ; return through X
SUBDS COMA
NEGB
BNE ADDDS ; or BCS. but BNE works -- extend
INCA
BRA ADDDS
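To make the point about the return address concrete, here is a sketch of allocating and releasing eight bytes on the return stack with ADDDS. The size is an arbitrary example, and the code is untested:

```asm
	LDD	#-8	; ask for 8 bytes
	JSR	ADDDS	; S is now 8 lower than before the call
	TSX		; X = S+1, the lowest byte of the new space
	CLR	0,X	; 0,X through 5,X were not touched by the call
* 6,X and 7,X still hold the return address that the JSR pushed,
* so anything stored there before the call has been walked on.
* ...
	LDD	#8
	JSR	ADDDS	; release the 8 bytes
```

On release, the JSR writes its return address just below the allocation, which is free space, so nothing of ours is disturbed.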
Moving the operands around, if we think we must subtract positive offsets instead of adding negative offsets, gets a lot of improvement. Again, just use D and call SUBDS instead of trying to optimize with the 8-bit B accumulator:
* use SOFFB instead of ADDBS
* range 0 to 255
* Need to check again
SUBBS PULX ; get return address, restore stack pointer
STS SOFFSV
STAB SOFFB
LDAB SOFFSV+1 ; B is unsigned (0 to 255), so no sign extension
SUBB SOFFB
BCC SUBBSL
DEC SOFFSV ; subtract the borrow
SUBBSL STAB SOFFSV+1
LDS SOFFSV
JMP 0,X ; return through X
* Do it with D, instead
* use SOFFA instead of ADDDS
SUBDS PULX ; get return address, restore stack pointer
STS SOFFSV
STD SOFFA
LDD SOFFSV
SUBD SOFFA
STD SOFFSV
LDS SOFFSV
JMP 0,X ; return through X
At this point, I think it is obvious that long trains of INX are meaningless on the 6801: Two to four, in-line, sure. More, no.
Long trains for S also become questionable, but PULX can make an appearance, which is interesting, though useful for little more than something to think about:
* For small decrements, 8 <= decrement <= 16
* Uses X for return
* Note that this writes the return address in the upper portion of the allocation,
* which may not be desirable.
SUB14S PULX
BRA ISB14S
SUB12S PULX
BRA ISB12S
SUB10S PULX
BRA ISB10S
SUB8S PULX
BRA ISB8S
SUB16S PULX
ISB16S DES
DES
ISB14S DES
DES
ISB12S DES
DES
ISB10S DES
DES
ISB8S DES
DES
DES ; SUB7S and less are shorter in-line
DES
DES
DES
DES
DES
JMP 0,X
* For small increments, 6 <= increment <= 16
* Uses X for return
ADD14S PULX
BRA IAD14S
ADD12S PULX
BRA IAD12S
ADD10S PULX
BRA IAD10S
ADD8S PULX
BRA IAD8S
ADD6S PULX
BRA IAD6S
ADD16S PULX
IAD16S INS
INS
IAD14S INS
INS
IAD12S INS
INS
IAD10S INS
INS
IAD8S INS
INS
IAD6S INS ; ADD5S and less are shorter in-line
INS
INS
INS
INS
INS
JMP 0,X
I guess, since I'm being noisy about SBX not being implemented on the 6801, I should also be noisy about ABS (add B to S) and SBS (subtract B from S) being missing.
But so much of the above really becomes irrelevant if we just liberate ourselves from the stack frame mentality/paradigm. Stack frames really ought to be classed among Monty Python's silly walks.
Stacks allocated entirely within a single page
Concerning the optimization of allocating stacks entirely within a page and only doing math on the low byte, the 6801 offers no improvement. It only makes the optimization less meaningful. I'll repeat the code, with the full address math alongside, to make it clear.
Oh, but working directly on the parameter stack pointer becomes more interesting.
* And stacks restricted within page boundaries no longer make as much sense on the 6801.
* Pseudo-registers somewhere in DP:
PSP RMB 2
XOFFSV RMB 2
XOFFA RMB 1
XOFFB RMB 1
SOFFA RMB 1
SOFFB RMB 1
SOFFSV RMB 2
...
ORG $500 ; or something
RMB 4 ; buffer zone
PSTKLIM RMB 64
PSTKBAS RMB 4 ; buffer zone
SSTKLIM RMB 32
SSTKBAS RMB 4 ; buffer zone
...
* B for parameter stack:
ADBPSX STX PSP
ADBPSP ADDB PSP+1 ; Stack allocated completely within page, never carries.
STAB PSP+1
LDX PSP
RTS
*
* D for parameter stack:
ADDPSX STX PSP
ADDPSP ADDD PSP
STD PSP ; does the whole pointer, negatives, too
LDX PSP
RTS
*
* B for parameter stack:
SBBPSX STX PSP
SBBPSP STAB XOFFB
LDAB PSP+1
SUBB XOFFB ; Stack allocated completely within page, never carries.
STAB PSP+1
LDX PSP
RTS
*
* D for parameter stack:
SBDPSX STX PSP
SBDPSP STD XOFFA
LDD PSP
SUBD XOFFA ; does the whole pointer
STD PSP
LDX PSP
RTS
* B for return stack:
ADBSP PULX ; return address
STS SOFFSV
ADDB SOFFSV+1 ; Stack allocated completely within page, never carries.
STAB SOFFSV+1
LDS SOFFSV
JMP 0,X ; return via X
*
* D for return stack (but we saw this above):
ADDSP PULX ; return address
STS SOFFSV
ADDD SOFFSV ; does the whole pointer, negatives, too
STD SOFFSV
LDS SOFFSV
JMP 0,X ; return via X
* B for return stack:
SBBSP PULX ; return address
STS SOFFSV
STAB SOFFB
LDAB SOFFSV+1
SUBB SOFFB ; Stack allocated completely within page, never carries
STAB SOFFSV+1
LDS SOFFSV
JMP 0,X ; return via X
*
* D for return stack (but we saw this above):
SBDSP PULX ; return address
STS SOFFSV
STD SOFFA
LDD SOFFSV
SUBD SOFFA ; does the whole pointer
STD SOFFSV
LDS SOFFSV
JMP 0,X ; return via X
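A usage sketch for the D-offset parameter stack routine above. It assumes the parameter stack grows downward from PSTKBAS and that PSP points at the top cell, which the declarations suggest but do not state. The stored values are arbitrary, and the code is untested:

```asm
	LDX	#PSTKBAS	; assumption: empty stack points just past the stack area
	STX	PSP
	LDD	#-4		; make room for two 16-bit cells
	JSR	ADDPSP		; PSP = PSP - 4, and X = new PSP
	LDD	#$1234		; example values
	STD	0,X		; top cell
	LDD	#$5678
	STD	2,X		; cell beneath it
* ...
	LDD	#4
	JSR	ADDPSP		; drop both cells
```

With D carrying the offset, the same entry point allocates and drops, and nothing depends on the stack staying inside one page.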
As with the last chapter, I have not tested the code. I do think it
should run, modulo typos.
[JMR202411021012 addendum:]
Not stack frame related, but address math. I discussed it in the 6800 address math chapter, and I want to show the 6801 version of the code.
This is for accessing per-process global variables that don't need such high-speed access that they would be worth slowing process switches down for. That is almost all per-process variables, except when the hardware application only has a few very limited processes. See the discussion before the 6800 snippets.
* In the DP somewhere, we have a base pointer for the per-process
* variable space, and a working register, something like
...
LOCBAS RMB 2
LBXPTR RMB 2
...
*
* And, to get the address of variables in the per-process variable space,
* something like these functions --
ADDLBB CLRA ; entry point for the byte offset in B
ADDLBD ADDD LOCBAS ; entry point for larger offsets in A:B
STD LBXPTR
RTS
*
ADDLBX BSR ADDLBB ; and load X
LDX LBXPTR
RTS
*
ADDLDX BSR ADDLBD ; and load X
LDX LBXPTR
RTS
[JMR202411021012 addendum end.]
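As a usage sketch for the functions in the addendum, incrementing a 16-bit per-process counter might look like the following. PVCNT and its offset of 12 are hypothetical, invented only for this illustration, and the code is untested:

```asm
* Hypothetical offset of a 16-bit counter in the per-process space
PVCNT	EQU	12
* ...
	LDAB	#PVCNT
	JSR	ADDLBX	; X = LOCBAS + 12
	LDD	0,X
	ADDD	#1	; does not disturb X
	STD	0,X
```

For offsets above 255, load the offset in D and call ADDLDX instead.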
And with this in mind, too, while thinking about how the 6801's enhanced
instruction set can make some of the above code much less intransigent, let's
remind ourselves why the 6809 and 68000 don't need routines like these before
we take a look at a concrete example of stack frames on the 6801.