Wednesday, October 30, 2024

ALPP 02-20 -- Some Address Math for the 6800

  Some Address Math
for the
6800

(Title Page/Index)

Perhaps I would not have gotten so tangled up in the discussion of stack frames if I had simply written this chapter immediately after the demonstration of 16- and 32-bit arithmetic on the 68000. But sometimes you just need to see a reason for doing something before you see someone doing it, or it blows your mind.

What is the difference between address math and other math?

Not a lot. You still have to pay attention to signs and stuff, and watch what happens when you wrap around the limits of your registers. Rings are fun, but you have to get used to them. 

Ah, yes, right. One thing about general address math is that you need to be aware of the limits of your registers. You often don't know in advance where in memory the address you're working on is going to be.

Not to say you don't have to be aware of limits in non-address math -- rather, where the limits hit and how they hit can be different, so you have to watch a different way.
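
If you want to play with the ring behavior before reading on, a few lines of Python (obviously not 6800 code -- just a sketch of the idea) model 16-bit address math as arithmetic modulo 65536:

```python
# Model 16-bit address arithmetic as a ring: everything mod 65536.
MASK16 = 0xFFFF

def add16(addr, offset):
    """Add a (possibly negative) offset to a 16-bit address, wrapping."""
    return (addr + offset) & MASK16

# Wrapping past the top of memory:
print(hex(add16(0xFFFE, 4)))    # 0x2
# Subtraction is addition of the two's complement:
print(hex(add16(0x0001, -2)))   # 0xffff
```

The second call is the whole story of this chapter in miniature: on the 6800 we get the negative offset by building the two's complement ourselves.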

One other difference is that, for general math, you want your call and result parameters in places where they can be easily carried from one stage in calculations to the next. That's why I have been demonstrating the use of the parameter stack versus global variables (versus registers).

For address math, if possible, you absolutely want your parameters and the result in registers, specifically the result in a particular register that can be used in addressing.

In the earliest CPUs, the math itself was hard enough (unknown enough) that addressing seemed to be an afterthought -- or even outside the plans. You can't plan well without knowing what you're planning for -- and what you're planning.

We really didn't know what we were doing. 

Intel, for instance, almost killed themselves in the mid-1970s working on a CPU design that was supposed to be the be-all-and-end-all of CPUs, the iAPX 432. But there was too much theory without experience, and it was slow and fought with itself to get work done. When they saw deadlines pass without end in sight, especially when rumors of what Motorola was doing hit the backyard fence, they scrambled, used part of what they had learned, and produced the 8086, which was definitely an improvement on the 8080 -- and which saved their bacon when the 432 that was finally delivered didn't live up to its promise. The 8086 was also a small enough step forward that it was easy for customers to adopt -- setting the stage for Intel to lead by adopting small improvements in steps that could be handled. But the 8086 also was, and its descendants still are, more than a little baroque.

Motorola, for their part, had figured out they needed to do something radical to stay competitive, and had started examining source code for the 6800 that they had access to, looking for ways to relieve computational bottlenecks. They used that research in the original design of the 68000, and there was a parallel team that had access to the research and put it to use in the design of the 6809.

And they hit a home run on the 6809 -- almost. Brought three runners in and left the DP register stranded on 3rd, so to speak. If you think of DP as the pinch runner or something. Okay, the metaphor doesn't quite work, unless you think of the DP register as the pinch runner for a wider address space, which it almost was.

The 68000 was another home run -- out of season and some overkill. And it has some warts, too.

Every real CPU is going to have warts. It's a mathematical requirement. 

I'm not kidding. There is an axiom in systems science:

Every model is insufficient to reality.

And that has some consequences:

  • Every system has vulnerabilities, and
  • every system contains the seeds of its own undoing, and
  • every market window is a sandpit.

Translated into general science, we know in advance that every theory and every law will eventually fail.

But that kind of cold water just is not popular in the sales department, so, instead of emblazoning it on the halls of all higher learning and in the chambers of legislatures, we hide it away. 

(Mostly -- there is some recognition at times -- POSIWID.) 

All of that to warn you:

Ugly code in here. 

I did some handwaving and conceptualizing for the 6801 in the unsteady footing chapter. I'm continuing with more handwaving and untested code in this chapter, but for the 6800.

First, the 6800 gives us no way to add a constant to the index register except ephemerally, in the effective-address calculation. That's great for some things like constant offsets (thus, the 6800's indexed mode), but not so great for some other things. And it's always a positive constant, which makes some stack-related uses hard.

In the 6801, we have ABX to add a small offset -- unsigned, less than 256 -- but no SBX to subtract an offset, and no signed ASBX or whatever.

The way the instruction set is constructed, we end up having to use a variable in memory to do the math, and because we have to use X to index the stack(s), passing the offset in as a dynamically allocated parameter is a case of trying to resolve a cyclic dependency.

Thus, we simply have to use a pseudo-register -- preferably in the direct page. 

-- Which causes issues at interrupt time, unless we have separate pseudo-registers used by separate routines for interrupt-time, and copy the user tasks' pseudo-registers out and in on context switch.

We'll be using 16-bit negation rather frequently, so keep a couple or three snippets in mind:

* For reference -- NEGate a 16-bit value in A:B --
NEGAB	COMA	; 2's complement NEGate is bit COMplement + 1
	NEGB
	BNE	NEGABX	; or BCS, but BNE works -- INCA only when B was 0
	INCA
NEGABX	RTS
*
* Another way, using stack for temporary:
NEGABS	PSHB
	PSHA
	CLRB	; 0 - A:B
	CLRA
	TSX
	SUBB	1,X
	SBCA	0,X
	INS
	INS
	RTS	
*
* Same thing using a temporary
* somewhere in DP:
	...
SCRCHA	RMB	1
SCRCHB	RMB	1
	...
* somewhere else
NEGABV	STAA	SCRCHA
	STAB	SCRCHB
	CLRA		; 0 - A:B
	CLRB
	SUBB	SCRCHB
	SBCA	SCRCHA
	RTS
	...
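
If you'd like to convince yourself that complement-plus-conditional-increment really matches 16-bit negation, here's a model in Python (mine, not 6800 code) of NEGAB's logic, checked over the whole 16-bit range:

```python
def negab(a, b):
    """Mirror NEGAB: complement A, negate B, bump A only when B wraps to 0."""
    a = ~a & 0xFF           # COMA
    b = -b & 0xFF           # NEGB
    if b == 0:              # BNE skips the INCA when B is nonzero
        a = (a + 1) & 0xFF  # INCA
    return a, b

# Check against true 16-bit two's-complement negation, all 65536 values:
for d in range(0x10000):
    a, b = negab(d >> 8, d & 0xFF)
    assert (a << 8) | b == -d & 0xFFFF
```

The exhaustive loop is the luxury we get in a simulator that we don't get on the bench.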

I'm going to assume that you'll be reading the code and the comments closely enough to tell when you should doubt me.

Assume you have these declarations for the pseudo-registers:

	ORG	$80
	...
XOFFA	RMB	1
XOFFB	RMB	1
XOFFSV	RMB	2
	...

These entry points should add and subtract offsets in A:B. Note that the code inverts A:B to do the subtraction, to avoid commutation issues. (Note carefully the INCA. I think I have this right for handling the NEGB when B is zero.)

ADDBX	CLRA
ADDDX	STX	XOFFSV
	ADDB	XOFFSV+1
	ADCA	XOFFSV
	STAB	XOFFSV+1
	STAA	XOFFSV
	LDX	XOFFSV
	RTS
SUBBX	CLRA	; B is unsigned
SUBDX	COMA
	NEGB
	BNE	ADDDX	; or BCS, but BNE works -- INCA only when B was 0
	INCA
	BRA	ADDDX
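
If you want to check the negate-then-add arithmetic without burning EPROMs, here's a quick Python model (mine, not part of the 6800 code, of course) of what ADDDX and SUBDX are meant to compute:

```python
def adddx(x, a, b):
    """ADDDX: X + A:B, mod 65536."""
    return (x + ((a << 8) | b)) & 0xFFFF

def subdx(x, a, b):
    """SUBDX: complement A, negate B, INCA only when B wraps to 0,
    then fall into the add."""
    a = ~a & 0xFF           # COMA
    b = -b & 0xFF           # NEGB
    if b == 0:              # the BNE/INCA pair
        a = (a + 1) & 0xFF
    return adddx(x, a, b)

assert subdx(0x2000, 0x00, 0x10) == 0x1FF0   # X - 16
assert subdx(0x0000, 0x00, 0x01) == 0xFFFF   # wraps below zero
```

Subtraction really is just addition of the negation, and the ring takes care of the rest.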

As an alternative, we could move the operands around using more pseudo-registers (and remembering the consequences). This code may be a little easier to believe in, but it does mean two more bytes to save away and restore on context switch.

* Alternative, don't use ADDDX, use XOFFA and XOFFB instead 
SUBBX	CLRA	; B is unsigned
SUBDX	STAA	XOFFA	; subtraction does not commute.
	STAB	XOFFB	; Handle operand order.
	STX	XOFFSV
	LDAA	XOFFSV
	LDAB	XOFFSV+1
	SUBB	XOFFB
	SBCA	XOFFA
	STAA	XOFFSV
	STAB	XOFFSV+1
	LDX	XOFFSV
	RTS

You can optimize the above a bit if you limit offsets to 0 to 255, which is a completely reasonable restriction for many applications. I won't show those. I don't want to wear you out with too much untested code.

Signed byte offset (-128 to 127) is also completely reasonable for many applications, and may offer some aesthetic satisfaction:

* this is faster than SUBDX and almost as fast as ADDDX.
* Range is -128 to 127, which should be enough for many purposes.
* But unsigned byte-only can be faster.
* Needs to be checked again.
ADDSBX	STX	XOFFSV
	TSTB	; test the sign of B
*	BEQ	ADSBXD	; use only if we really want to optimize 0
	BPL	ADSBXU
	ADDB	XOFFSV+1	; sign extension makes the high byte -1
	BCS	ADSBXL	; the carry cancels the -1
	DEC	XOFFSV	; no carry -- the high byte takes the -1
	BRA	ADSBXL
ADSBXU	ADDB	XOFFSV+1
	BCC	ADSBXL
	INC	XOFFSV	; propagate the carry
ADSBXL	STAB	XOFFSV+1	; put the low byte back before loading X
	LDX	XOFFSV
ADSBXD	RTS
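
The two carry paths are the tricky part: a negative offset sign-extends to $FF in the high byte, so the high byte loses one unless the low-byte add carries, while a positive offset gains one only when it carries. Here's a Python model (my own sketch of the intended computation, not the 6800 code) checked exhaustively:

```python
def addsbx(x, b):
    """Add signed byte b (given as 8-bit two's complement) to 16-bit x."""
    hi, lo = x >> 8, x & 0xFF
    lo_sum = lo + b                  # the low-byte ADDB
    carry = lo_sum > 0xFF
    lo = lo_sum & 0xFF
    if b >= 0x80:                    # negative: high byte sees $FF + carry
        if not carry:
            hi = (hi - 1) & 0xFF     # the DEC path
    else:                            # positive: propagate the carry
        if carry:
            hi = (hi + 1) & 0xFF     # the INC path
    return (hi << 8) | lo

# Exhaustive check against signed 16-bit math:
for b in range(256):
    signed = b - 256 if b >= 0x80 else b
    assert addsbx(0x1234, b) == (0x1234 + signed) & 0xFFFF
```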

And we can do similar things with the return stack, S. S, in particular, should never need offsets larger than 255 on the 6800, so we'll focus on the unsigned byte options. 

The stack has the additional constraint of requiring some means of handling the return address.

One more thing: you should recognize that the call itself writes the return address into the allocated space on allocation. If there is something important there, it's toast.

The declarations:

* For S stack
* unsigned byte only,
* because we really don't want to be bumping the return stack that much
	ORG	$90
	...
SOFFB	RMB	1
SOFFSV	RMB	2

And the code -- watch the return address handling:

ADDBS	TSX
	LDX	0,X	; get return address
	INS
	INS		; drop the return address
	STS	SOFFSV
	ADDB	SOFFSV+1
	BCC	ADDBSL
	INC	SOFFSV	; propagate the carry
ADDBSL	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return through X
SUBBS	TSTB
	BEQ	SUBBSX	; zero offset -- nothing to do
	TSX
	LDX	0,X	; get return address
	INS
	INS		; drop the return address
	STS	SOFFSV
	NEGB		; adding 256-B --
	ADDB	SOFFSV+1
	BCS	SUBBSL	; -- so the carry cancels the implied borrow
	DEC	SOFFSV	; no carry -- take the borrow out of the high byte
SUBBSL	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return through X
SUBBSX	RTS

Again, the subtraction can alternatively move the operands into the right order, at the cost of using another pseudo-register:

* use SOFFB instead of the NEGB trick
* range 0 to 255
* Need to check again
SUBBS	TSX
	LDX	0,X	; get return address
	INS
	INS		; restore stack pointer
	STS	SOFFSV
	STAB	SOFFB
	LDAB	SOFFSV+1
	SUBB	SOFFB
	BCC	SUBBSL
	DEC	SOFFSV	; subtract the borrow
SUBBSL	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return through X
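
The borrow propagation going this direction is simple enough to model in a few lines of Python (mine, for checking the idea, not 6800 code):

```python
def sub_byte16(s, b):
    """Subtract unsigned byte b from 16-bit s:
    low-byte SUBB, DEC the high byte on borrow."""
    hi, lo = s >> 8, s & 0xFF
    if lo < b:                  # SUBB would set carry (borrow); BCC skips the DEC
        hi = (hi - 1) & 0xFF
    lo = (lo - b) & 0xFF
    return (hi << 8) | lo

# Check against plain 16-bit subtraction for every byte offset:
for b in range(256):
    assert sub_byte16(0x0180, b) == (0x0180 - b) & 0xFFFF
```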

You'll remember I made reference to long trains of INX and DEX as a substitute for direct math on X:

* For small increments <= 16
ADD16X	INX
	INX
ADD14X	INX
	INX
ADD12X	INX
	INX
ADD10X	INX
	INX
ADD8X	INX
	INX
ADD6X	INX
	INX
	INX	; ADD4X and less are shorter in-line
	INX
	INX	
	INX
	RTS

* For small decrements <= 16
SUB16X	DEX
	DEX
SUB14X	DEX
	DEX
SUB12X	DEX
	DEX
SUB10X	DEX
	DEX
SUB8X	DEX
	DEX
SUB6X	DEX
	DEX
	DEX	; SUB4X and less are shorter in-line
	DEX
	DEX	
	DEX
	RTS

Just jump to the label for the offset you need to add or subtract.

I know it looks ... ugly. But it works, and it avoids the use of pseudo-registers, and it's fast, and it actually doesn't use up more code space than the general routines we've looked at. These are worth considering.

And you're thinking, well, that's not going to work for the return stack? 

Hah!

* For small decrements, 8 <= decrement <= 16
* Uses X for return
* Note that this writes the return address in the upper portion of the allocation,
* which may not be desirable.
SUB14S	TSX
	LDX	0,X
	BRA	ISB14S
SUB12S	TSX
	LDX	0,X
	BRA	ISB12S
SUB10S	TSX
	LDX	0,X
	BRA	ISB10S
SUB8S	TSX
	LDX	0,X
	BRA	ISB8S
SUB16S	TSX
	LDX	0,X
ISB16S	DES
	DES
ISB14S	DES
	DES
ISB12S	DES
	DES
ISB10S	DES
	DES
ISB8S	DES
	DES
	DES	; SUB7S and less are shorter in-line
	DES
	DES	
	DES	; two less because of the return address
	JMP	0,X

* For small increments, 8 <= increment <= 16
* Uses X for return
ADD14S	TSX
	LDX	0,X
	BRA	IAD14S
ADD12S	TSX
	LDX	0,X
	BRA	IAD12S
ADD10S	TSX
	LDX	0,X
	BRA	IAD10S
ADD8S	TSX
	LDX	0,X
	BRA	IAD8S
ADD16S	TSX
	LDX	0,X
IAD16S	INS
	INS
IAD14S	INS
	INS
IAD12S	INS
	INS
IAD10S	INS
	INS
IAD8S	INS
	INS
	INS	; ADD7S and less are shorter in-line
	INS
	INS	
	INS
	INS
	INS
	INS	; two more to cover the return address
	INS
	JMP	0,X

What's that? Do I hear complaints about the smell?

It's ugly, but it could be useful.

Stacks allocated entirely within a 256-byte page

Finally, if we are talking about stacks (and other largish things in memory), it may be possible to arrange them in memory so that each stack lies completely within a single 256-byte page, such that the high byte of the address never changes. This trick was used to great effect on the 6502 and 6805, in particular. 

We can use it on the 6800 in some cases, if we can be absolutely sure that everybody who ever touches the code is aware of the requirement to keep each stack entirely within a single page.
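
What we buy with that discipline is that pointer adjustments collapse to carry-free, single-byte operations. A little Python (my sketch, obviously not 6800 code) to make the invariant concrete:

```python
PAGE = 0x0500  # high byte fixed; the whole stack lives in this page

def bump(ptr, delta):
    """Adjust a page-confined pointer; only the low byte ever changes."""
    assert ptr & 0xFF00 == PAGE
    new = ptr + delta
    assert new & 0xFF00 == PAGE, "stack escaped its page -- the deal is off"
    return new

p = 0x0540
p = bump(p, -4)   # allocate 4 bytes (stack grows down)
p = bump(p, 2)    # drop 2
assert p == 0x053E
```

The asserts play the role of the discipline we demand of everybody who ever touches the code: the moment a stack crosses a page boundary, all the single-byte shortcuts silently break.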

* Stacks within page boundaries:
* Pseudo-registers somewhere in DP:
PSP	RMB	2
XOFFSV	RMB	2
XOFFB	RMB	1
SOFFB	RMB	1
SOFFSV	RMB	2
	...
	ORG	$500	; or something
	RMB	4	; buffer zone
PSTKLIM	RMB	64
PSTKBAS	RMB	4	; buffer zone
SSTKLIM	RMB	32
SSTKBAS	RMB	4	; buffer zone
	...

* For parameter stack:
ADBPSX	STX	PSP
ADBPSP	ADDB	PSP+1	; Stack allocated completely within page, never carries.
	STAB	PSP+1
	LDX	PSP
	RTS
*
SBBPSX	STX	PSP
SBBPSP	STAB	XOFFB
	LDAB	PSP+1
	SUBB	XOFFB	; Stack allocated completely within page, never carries.
	STAB	PSP+1
	LDX	PSP
	RTS

* For return stack:
ADBSP	TSX
	LDX	0,X	; return address
	ADDB	#2		; faster, same byte count
	STS	SOFFSV
	ADDB	SOFFSV+1	; Stack allocated completely within page, never carries.
	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return via X

SBBSP	TSX
	LDX	0,X	; return address
	SUBB	#2		; compensate for the return address -- offset must be at least 2
	STS	SOFFSV
	STAB	SOFFB
	LDAB	SOFFSV+1
	SUBB	SOFFB		; Stack allocated completely within page, never carries
	STAB	SOFFSV+1
	LDS	SOFFSV
	JMP	0,X	; return via X

Again, I have not tested the code. It should run. I think.

As a reminder, we've already seen what code looks like without stack frames. The only reason I'm showing you this stuff is so that you understand why stack frames may not be preferred for many applications (and, if you can understand that, maybe you can sometime see it for all applications).

Well, no, not the only reason. Maybe the only reason I'm showing it to you now rather than later.

[JMR202411020931 addendum:]

This is not stack frame related, but it's address math related, and I think it would be good to discuss it here, lest I forget --

There are two approaches to per-process variables. 
  • Pseudo-registers like PSP, XWORK, XOFFSV, SOFFSV, etc. will either be saved and restored on process switch or will have separate versions for each task, if there are not too many.
  • Most per-process variables with global allocation should be in a per-process address space. 

You'll usually use both: a few pseudo-registers for variables that need quick access -- and just a few, to keep the management overhead on task/process switch to a minimum. Every pseudo-register must be saved and restored on process switch --

Except for a couple of special cases: 

  • It's useful to keep system pseudo-registers separate from non-system pseudo-registers, complete with separate routines to manage them.
  • If there are just a few non-system processes in a small hardware application, it may be useful to give each process its own pseudo-registers, along with the routines to manage them.

What kinds of things need to be pseudo-registers? 

XWORK and other such temporaries, including SOFFSV and such above.

And PSP, as well. (Note that, if the system functions use a parameter stack, it should be a separate SPSP or something, which would have to have its own support routines.)

If there are a lot of per-process variables, you would need, separate from pseudo-registers, a process-local space. And you would need a pointer to that space, with routines to access the variables there:

* In the DP somewhere, we have a base pointer for the per-process
* variable space, and a working register, something like
	...
LOCBAS	RMB	2
LBXPTR	RMB	2
	...
*
* And, to get the address of variables in the per-process variable space,
* something like these --
ADDLBB	CLRA			; entry point for the byte offset in B
ADDLBD	ADDB	LOCBAS+1	; entry point for larger offsets in A:B
	ADCA	LOCBAS
	STAA	LBXPTR
	STAB	LBXPTR+1	; let other code load X
	RTS
*
ADDLBX	BSR	ADDLBB	; and load X
	LDX	LBXPTR
	RTS
*
ADDLDX	BSR	ADDLBD	; and load X
	LDX	LBXPTR
	RTS
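
ADDLBD is really just base-plus-displacement addressing done by hand. A Python sketch (mine) of the idea -- two processes sharing one variable layout through different base pointers:

```python
def local_addr(locbas, offset):
    """ADDLBD in miniature: 16-bit base plus offset, wrapping mod 65536."""
    return (locbas + offset) & 0xFFFF

# Two processes, same variable layout, different base pointers
# (addresses here are made up for illustration):
layout = {"count": 0, "flags": 1, "buffer": 2}
base_a, base_b = 0x0600, 0x0700
assert local_addr(base_a, layout["flags"]) == 0x0601
assert local_addr(base_b, layout["flags"]) == 0x0701
```

Switching processes means changing LOCBAS, and every variable in the layout moves with it.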

[JMR202411020931 addendum end.]

With all this in mind, look at how the 6801's enhanced instruction set can make some of the above code much less intransigent before we take a look at a concrete example of stack frames on the 6800.

Or you can jump ahead to getting numeric output in binary.

