Some Address Math for the 6800
Perhaps I would not have gotten so tangled up in
the discussion of stack frames
if I had simply written this chapter immediately after the
demonstration of 16- and 32-bit arithmetic on the 68000. But sometimes you just need to see a reason for doing something before you
see someone doing it, or it blows your mind.
What is the difference between address math and other math?
Not a lot. You still have to pay attention to signs and stuff, and watch what happens when you wrap around the limits of your registers. Rings are fun, but you have to get used to them.
Ah, yes, right. One thing about general address math is that you need to be aware of the limits of your registers. You often don't know in advance where in memory the address you're working on is going to be.
Not to say you don't have to be aware of limits in non-address math -- rather,
where the limits hit and how they hit can be different, so you have to watch a
different way.
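For instance -- a trivial illustration, not part of any routine below -- the 6800's 16-bit index register is a ring, and it wraps around silently:
LDX #$FFFF ; top of the 16-bit address space
INX ; X is now $0000 -- no trap, no warning, just wraparound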
One other difference is that, for general math, you want your call and result parameters in places where they can be easily carried from one stage in calculations to the next. That's why I have been demonstrating the use of the parameter stack versus global variables (versus registers).
For address math, if possible, you absolutely want your parameters and the result in registers, specifically the result in a particular register that can be used in addressing.
In the earliest CPUs, the math itself was hard enough (unknown enough) that
addressing seemed to be an afterthought -- or even outside the plans. You
can't plan well without knowing what you're planning for -- and what you're
planning.
We really didn't know what we were doing.
Intel, for instance, almost killed themselves in the mid-1970s working on a CPU design that was supposed to be the be-all-and-end-all of CPUs, the iAPX 432. But there was too much theory without experience, and it was slow and fought with itself to get work done. When they saw deadlines pass without end in sight, especially when rumors of what Motorola was doing hit the backyard fence, they scrambled and used part of what they had learned and produced the 8086, and the 8086 was definitely an improvement on the 8080 -- and saved their bacon when the 432 that was delivered didn't live up to promise. And the 8086 was a small enough step forward that it was easy for customers to adopt -- setting the stage for Intel to lead by adopting small improvements in steps that could be handled. But the 8086 also was, and its descendants still are, more than a little baroque.
Motorola, for their part, had figured out they needed to do something radical to stay competitive, and had started examining source code for the 6800 that they had access to, looking for ways to relieve computational bottlenecks. They used that research in the original design of the 68000, and there was a parallel team that had access to the research and put it to use in the design of the 6809.
And they hit a home run on the 6809 -- almost. Brought three runners in and
left the DP register stranded on 3rd, so to speak. If you think of DP as the
pinch runner or something. Okay, the metaphor doesn't quite work, unless you
think of the DP register as the pinch runner for a wider address space, which
it almost was.
The 68000 was another home run -- out of season and some overkill. And it has
some warts, too.
Every real CPU is going to have warts. It's a mathematical requirement.
I'm not kidding. There is an axiom in systems science:
Every model is insufficient to reality.
And that has some consequences:
- Every system has vulnerabilities, and
- every system contains the seeds of its own undoing, and
- every market window is a sandpit.
Translated into general science, we know in advance that every theory and every law will eventually fail.
But that kind of cold water just is not popular in the sales department, so, instead of emblazoning it on the halls of all higher learning and in the chambers of legislatures, we hide it away.
(Mostly -- there is some recognition at times --
POSIWID: the purpose of a system is what it does.)
All of that to warn you:
Ugly code in here.
I did some handwaving and conceptualizing for the 6801 in
the unsteady footing chapter. I'm continuing with more handwaving and untested code in this chapter, but
for the 6800.
First, in the 6800, we have nothing that adds a constant to the index register with anything but an ephemeral result. That's great for some things like constant offsets (thus, the 6800's indexed addressing mode), but not so great for some other things. And the offset is always a positive constant, which makes some stack-related uses hard.
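A quick illustration of what "ephemeral" means here (PTR is just a stand-in for any pointer variable, not one of the pseudo-registers declared below): the constant offset in indexed mode is folded into the effective address, and X itself is untouched --
LDX PTR ; X points at something
LDAA 3,X ; read the byte at PTR+3 -- the sum is computed and thrown away
STAA 5,X ; write it at PTR+5 -- X itself never changes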
In the 6801, we have ABX to add a small offset -- unsigned, less than 256 --
but no SBX to subtract an offset, and no signed ASBX or whatever.
The way the instruction set is constructed, we end up having to use a variable in memory to do the math, and because we have to use X to index the stack(s), passing the offset in as a dynamically allocated parameter is a case of trying to resolve a cyclic dependency.
Thus, we simply have to use a pseudo-register -- preferably in the direct page.
-- Which causes issues at interrupt time, unless we have separate
pseudo-registers used by separate routines for interrupt-time, and copy the
user tasks' pseudo-registers out and in on context switch.
We'll be using 16-bit negation rather frequently, so keep a couple or three snippets in mind:
* For reference -- NEGate a 16-bit value in A:B --
NEGAB COMA ; 2's complement NEGate is bit COMplement + 1
NEGB
BNE NEGABX ; skip the INCA unless B was 0 (BCS works too)
INCA
NEGABX RTS
*
* Another way, using stack for temporary:
NEGABS PSHB
PSHA
CLRB ; 0 - A:B
CLRA
TSX
SUBB 1,X
SBCA 0,X
INS
INS
RTS
*
* Same thing using a temporary
* somewhere in DP:
...
SCRCHA RMB 1
SCRCHB RMB 1
...
* somewhere else
NEGABV STAA SCRCHA
STAB SCRCHB
CLRA ; 0 - A:B
CLRB
SUBB SCRCHB
SBCA SCRCHA
RTS
...
I'm going to assume that you'll be reading the code and the comments closely
enough to tell when you should doubt me.
Assume you have these declarations for the pseudo-registers:
ORG $80
...
XOFFA RMB 1
XOFFB RMB 1
XOFFSV RMB 2
...
These entry points should add and subtract offsets in A:B. Note that the code negates A:B and adds, rather than subtracting, to sidestep operand-order (commutation) issues. (Note carefully the INCA. I think I have it right for handling the NEGB when B is zero.)
ADDBX CLRA ; entry point -- unsigned byte offset in B, pointer in X
ADDDX STX XOFFSV ; entry point -- 16-bit offset in A:B, pointer in X
ADDB XOFFSV+1 ; add the low bytes
ADCA XOFFSV ; add the high bytes, with carry
STAB XOFFSV+1
STAA XOFFSV
LDX XOFFSV ; adjusted pointer back in X
RTS
SUBBX CLRA ; B is unsigned
SUBDX COMA
NEGB
BNE ADDDX ; skip the INCA unless B was 0 (BCS works too)
INCA
BRA ADDDX
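A hypothetical call site, just to show the calling convention (PSP here is only a stand-in for wherever you keep the pointer you're adjusting):
LDX PSP
LDAB #6
JSR SUBBX ; X <- X - 6
STX PSP
* or, with a full 16-bit offset,
LDAA #1
LDAB #$2C
JSR ADDDX ; X <- X + $012C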
As an alternative, we could move the operands around using more
pseudo-registers (and remembering the consequences). This code may be a little
easier to believe in, but it does mean two more bytes to save away and restore
on context switch.
* Alternative, don't use ADDDX, use XOFFA and XOFFB instead
SUBBX CLRA ; B is unsigned
SUBDX STAA XOFFA ; subtraction does not commute.
STAB XOFFB ; Handle operand order.
STX XOFFSV
LDAA XOFFSV
LDAB XOFFSV+1
SUBB XOFFB
SBCA XOFFA
STAA XOFFSV
STAB XOFFSV+1
LDX XOFFSV
RTS
You can optimize the above a bit if you limit offsets to 0 to 255, which is a
completely reasonable restriction for many applications. I won't show those. I
don't want to wear you out with too much untested code.
Signed byte offset (-128 to 127) is also completely reasonable for many
applications, and may offer some aesthetic satisfaction:
* this is faster than SUBDX and almost as fast as ADDDX,
* Range is -128 to 127, which should be enough for many purposes.
* But unsigned byte-only can be faster.
* Needs to be checked again.
ADDSBX STX XOFFSV
TSTB ; test the sign of B
* BEQ ADSBXD ; use only if we really want to optimize 0
BPL ADSBXU
ADDB XOFFSV+1 ; negative: the byte already carries an implied -256
BCS ADSBXL ; a carry out cancels the implied borrow
DEC XOFFSV ; no carry -- propagate the -1 into the high byte
BRA ADSBXL
ADSBXU ADDB XOFFSV+1
BCC ADSBXL
INC XOFFSV ; propagate the carry into the high byte
ADSBXL STAB XOFFSV+1 ; store the new low byte
LDX XOFFSV
ADSBXD RTS
And we can do similar things with the return stack, S. S, in particular, should never need offsets larger than 255 on the 6800, so we'll focus on the unsigned byte options.
The stack has the additional constraints of requiring some means of handling the return address.
One more thing: you should recognize that the call itself writes the return address into the top two bytes of the space being allocated. If there was something important there, it's toast.
The declarations:
* For S stack
* unsigned byte only,
* because we really don't want to be bumping the return stack that much
ORG $90
...
SOFFB RMB 1
SOFFSV RMB 2
And the code -- watch how the return address is handled:
ADDBS TSX
LDX 0,X ; get return address
INS
INS ; drop it -- S is back where the caller left it
STS SOFFSV
ADDB SOFFSV+1
BCC ADDBSL
INC SOFFSV ; propagate the carry into the high byte
ADDBSL STAB SOFFSV+1 ; store the new low byte
ADDBSX LDS SOFFSV
JMP 0,X ; return through X
SUBBS TSX
LDX 0,X ; get return address
INS
INS ; drop it
STS SOFFSV
NEGB ; B <- 256 - offset
BEQ ADDBSX ; offset was zero -- nothing to change
DEC SOFFSV ; assume a borrow into the high byte
ADDB SOFFSV+1
BCC ADDBSL ; no carry -- the borrow stands
INC SOFFSV ; the carry cancels the assumed borrow
BRA ADDBSL ; store the low byte, reload S, return
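A hypothetical use -- allocating ten bytes of local workspace on the return stack and freeing it again (remember the warning above about the return address landing in the top two bytes):
LDAB #10
JSR SUBBS ; S <- S - 10
TSX ; X points at the bottom of the new space
* 0,X through 9,X are now scratch space (the top two bytes
* hold the abandoned return address until you overwrite them)
...
LDAB #10
JSR ADDBS ; S <- S + 10, freeing the space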
Again, the subtraction can alternatively move the operands into the right order, at the cost of using another pseudo-register:
* use SOFFB instead of ADDBS
* range 0 to 255
* Need to check again
SUBBS TSX
LDX 0,X ; get return address
INS
INS ; restore stack pointer
STS SOFFSV
STAB SOFFB ; unsigned offset, 0 to 255
LDAB SOFFSV+1
SUBB SOFFB
BCC SUBBSL ; no borrow?
DEC SOFFSV ; subtract the borrow
SUBBSL STAB SOFFSV+1
LDS SOFFSV
JMP 0,X ; return through X
You'll remember I made reference to long trains of INX and DEX as a substitute for direct math on X:
* For small increments <= 16
ADD16X INX
INX
ADD14X INX
INX
ADD12X INX
INX
ADD10X INX
INX
ADD8X INX
INX
ADD6X INX
INX
INX ; ADD4X and less shorter in-line
INX
INX
INX
RTS
* For small decrements <= 16
SUB16X DEX
DEX
SUB14X DEX
DEX
SUB12X DEX
DEX
SUB10X DEX
DEX
SUB8X DEX
DEX
SUB6X DEX
DEX
DEX ; SUB4X and less shorter in-line
DEX
DEX
DEX
RTS
Just call (JSR or BSR) the label for the offset you need to add or subtract.
I know it looks ... ugly. But it works, and it avoids the use of pseudo-registers, and it's fast, and it actually doesn't use up more code space than the general routines we've looked at. These are worth considering.
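So, for instance (hypothetical call sites):
INX ; small adjustments in-line --
INX ; two or three INX are cheaper than a call
...
JSR ADD10X ; X <- X + 10, no pseudo-registers touched
JSR SUB6X ; X <- X - 6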
And you're thinking, well, that's not going to work for the return stack?
Hah!
* For small decrements, 8 <= decrement <= 16
* Uses X for return
* Note that this writes the return address in the upper portion of the allocation,
* which may not be desirable.
SUB14S TSX
LDX 0,X
BRA ISB14S
SUB12S TSX
LDX 0,X
BRA ISB12S
SUB10S TSX
LDX 0,X
BRA ISB10S
SUB8S TSX
LDX 0,X
BRA ISB8S
SUB16S TSX
LDX 0,X
ISB16S DES
DES
ISB14S DES
DES
ISB12S DES
DES
ISB10S DES
DES
ISB8S DES
DES
DES ; SUB7S and less are shorter in-line
DES
DES
DES ; two less because of the return address
JMP 0,X
* For small increments, 8 <= increment <= 16
* Uses X for return
ADD14S TSX
LDX 0,X
BRA IAD14S
ADD12S TSX
LDX 0,X
BRA IAD12S
ADD10S TSX
LDX 0,X
BRA IAD10S
ADD8S TSX
LDX 0,X
BRA IAD8S
ADD16S TSX
LDX 0,X
IAD16S INS
INS
IAD14S INS
INS
IAD12S INS
INS
IAD10S INS
INS
IAD8S INS
INS
INS ; ADD7S and less are shorter in-line
INS
INS
INS
INS
INS
INS ; two more to cover the return address
INS
JMP 0,X
What's that? Do I hear complaints about the smell?
It's ugly, but it could be useful.
Stacks allocated entirely within a 256-byte page
Finally, if we are talking about stacks (and other largish things in memory), it may be possible to arrange them in memory so that each stack lies completely within a single 256-byte page, so that the high byte of the address never changes. This trick was used to great effect on the 6502 and the 6805.
We can use it on the 6800 in some cases, if we can be absolutely sure that
everybody who ever touches the code is aware of the requirement to keep each
stack entirely within a single page.
* Stacks within page boundaries:
* Pseudo-registers somewhere in DP:
PSP RMB 2
XOFFSV RMB 2
XOFFB RMB 1
SOFFB RMB 1
SOFFSV RMB 2
...
ORG $500 ; or something
RMB 4 ; buffer zone
PSTKLIM RMB 64
PSTKBAS RMB 4 ; buffer zone
SSTKLIM RMB 32
SSTKBAS RMB 4 ; buffer zone
...
* For parameter stack:
ADBPSX STX PSP
ADBPSP ADDB PSP+1 ; Stack allocated completely within page, never carries.
STAB PSP+1
LDX PSP
RTS
*
SBBPSX STX PSP
SBBPSP STAB XOFFB
LDAB PSP+1
SUBB XOFFB ; Stack allocated completely within page, never carries.
STAB PSP+1
LDX PSP
RTS
* For return stack:
ADBSP TSX
LDX 0,X ; return address
ADDB #2 ; count the return address we never pop -- faster than two INS, same byte count
STS SOFFSV
ADDB SOFFSV+1 ; Stack allocated completely within page, never carries.
STAB SOFFSV+1
LDS SOFFSV
JMP 0,X ; return via X
SBBSP TSX
LDX 0,X ; return address
STS SOFFSV
STAB SOFFB
LDAB SOFFSV+1
SUBB SOFFB ; Stack allocated completely within page, never borrows.
ADDB #2 ; add back the 2 for the return address we never popped
STAB SOFFSV+1
LDS SOFFSV
JMP 0,X ; return via X
Again, I have not tested the code. It should run. I think.
As a reminder, we've already seen what code looks like without stack frames. The only reason I'm showing you this stuff is so that you understand why stack frames may not be preferred for many applications (and, if you can understand that, maybe you can sometime see it for all applications).
Well, no, not the only reason. Maybe the only reason I'm showing it to you now rather than later.
[JMR202411020931 addendum:]
This is not stack frame related, but it's address math related, and I think it would be good to discuss it here, lest I forget --
There are two approaches to per-process variables:
- Pseudo-registers like PSP, XWORK, XOFFSV, SOFFSV, etc. will either be saved and restored on process switch or will have separate versions for each task, if there are not too many.
- Most per-process variables with global allocation should be in a per-process address space.
You'll usually use both: a few pseudo-registers for variables that need quick access -- and it needs to be just a few, to keep the management overhead on task/process switch to a minimum. Every pseudo-register must be saved and restored on process switch (a rough sketch of the save half follows the list below) --
except for a couple of special cases:
- It's useful to keep system pseudo-registers separate from non-system pseudo-registers, complete with separate routines to manage them.
- If there are just a few non-system processes in a small hardware application, it may be useful to give each process its own pseudo-registers, along with the routines to manage them.
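Here is a minimal sketch of the save half of that management. The task control block, its TCBPSP and TCBXSV offsets, and the SAVPRG label are all assumptions for illustration -- nothing above defines a TCB layout:
* Save the pseudo-registers of the outgoing task.
* Assumes X points at the outgoing task's control block,
* and TCBPSP and TCBXSV are EQU'd offsets within it.
SAVPRG LDAA PSP
LDAB PSP+1
STAA TCBPSP,X ; stash the parameter stack pointer
STAB TCBPSP+1,X
LDAA XOFFSV
LDAB XOFFSV+1
STAA TCBXSV,X ; stash the address-math temporary
STAB TCBXSV+1,X
RTS
* Restoring for the incoming task is the mirror image --
* LDAA/LDAB from the TCB, STAA/STAB back to the pseudo-registers.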
What kinds of things need to be pseudo-registers?
XWORK and other such temporaries, including SOFFSV and such above.
And PSP, as well. (Note that, if the system functions use a parameter stack,
it should be a separate SPSP or something, which would have to have its own
support routines.)
If there are a lot of per-process variables, you would need, separate from
pseudo-registers, a process-local space. And you would need a pointer to that
space, with routines to access the variables there:
* In the DP somewhere, we have a base pointer for the per-process
* variable space, and a working register, something like
...
LOCBAS RMB 2
LBXPTR RMB 2
...
*
* And, to get the address of variables in the per-process variable space,
* something like these --
ADDLBB CLRA ; entry point for the byte offset in B
ADDLBD ADDB LOCBAS+1 ; entry point for larger offsets in A:B
ADCA LOCBAS
STAA LBXPTR
STAB LBXPTR+1 ; let other code load X
RTS
*
ADDLBX BSR ADDLBB ; and load X
LDX LBXPTR
RTS
*
ADDLDX BSR ADDLBD ; and load X
LDX LBXPTR
RTS
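A hypothetical use -- fetching a process-local variable at, say, offset 6 in the per-process space (the offset and the variable are made up for illustration):
LDAB #6 ; offset of the (hypothetical) variable
JSR ADDLBX ; X <- LOCBAS + 6
LDAA 0,X ; fetch the variable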
[JMR202411020931 addendum end.]
With all this in mind, look at
how the 6801's enhanced instruction set can make some of the above code
much less intransigent
before we take a look at a concrete example of stack frames on the 6800.