Sunday, November 3, 2024

ALPP 02-24 -- Some Address Math for the 68000

  Some Address Math
for the
68000

(Title Page/Index)

After a break for multi-byte negation, and because address math is so important, I think I should show you explicit 68000 corollaries for what I've shown you for the 6809, as well as for the routines for the 6801 and for the 6800.

When instructions become more general, they often take more bytes to encode. This is especially clear for the 68000. And when you generalize an operation, it often takes more instructions to implement -- even with a more powerful instruction set CPU. And the more you repeat those multiple instructions, the more opportunity you have to make mistakes. 

More than speed and byte count, this is why we define utility routines like we just looked at for the 6800 and 6801. We don't want to give ourselves too many opportunities for mistakes. (Macros can help with this, but we won't talk about that just yet.)

Between the 6809 and the 68000, it can be kind of a wash -- when you're working on 16-bit numbers and small applications that fit in a 64K memory space. When you start working with 32-bit numbers, it's advantage 68000, ... except then you also tend to work with 32-bit addresses, and the addresses can make byte count swell. 

I transliterated the fig implementation of Forth from 6800 to 68000, and the object image size increased by about 80% (real rough estimate). This is because I didn't want to restrict it to operating in the lower 32K of memory, minus the interrupt vector table, so the virtual machine i-codes (function addresses, really) swelled from 16-bit to 32-bit. And since the Forth is mostly a clot of i-codes, the overall image size swells. 

I started a conversion to direct call, which I got lost in (partly motivating this tutorial), and the code size does seem to improve a bit, but not completely to the size of the 6809 image.

Do look at assembly listings when you try to compare code sizes for stuff. In particular, the 68000 will often seem to take about twice the code bytes that the 6809 takes in these snippets. But when we move to concrete code where pieces come together, the code size comes down closer to the 6809 code size.

And I'll note again, being able to use single instructions instead of utility routines is nice, but it's actually more important that the 68000 has something of an optimal number of registers, so we don't have to worry about pseudo-registers in memory when switching processes.

As always, read the code and the comments in the code, and open up separate browser windows and compare side-by-side.

I'm showing the entire 68000 code in a single block because the abstract operations don't quite map the same, but I'm keeping the order roughly the same to keep it easy to find what to compare. 

[JMR202411070913 addendum:]

You may have missed the mention of the "here pointer" in the 6809 address math chapter:

LOCBAS	EQU	*

In Motorola assemblers, an asterisk where the assembler could parse an address means the location of the current instruction or directive, thus, "here". I'll need to explain more about it later, I'm sure.

[JMR202411070913 addendum end.]

How the registers map when moving from the 6809 to the 68000 --

  • I'm mapping the 6809's S to A7, of course;
  • U to A6;
  • DP will map to A5;
  • X mostly to A0;
  • Y to whatever.
  • B is sort-of mapped to D7;
  • A is sort-of mapped to D6 or the top bytes of D7 or D5 or something, depending on what I need it to do.

(And please don't just copy-and-paste code without thinking.)

* 68000 pointer math

	ORG	SOMETHING
* All of these work fine in-line, rather than called as subroutines.
* In fact, unless specifically specified otherwise, you should in-line.
* You can substitute any data register unless specified otherwise.
*
* Likewise, you can substitute any address register,
* except that anything operating on A7 should always be in-lined --
* except for those routines which specifically handle the return address,
* but those routines are not really intended to be used anyway.
* Calling a subroutine and playing with the return stack 
* without handling the return address
* just is not a good way to keep control of your program.
*
* And then there is alignment. 68000 needs 16- and 32-bit accesses 
* to be 16-bit aligned, and will throw address errors if they are not.
* (Later CPUs are not so restricted.)
*
* Negate Dn in 8, 16, or 32 bits:
NEGLD7	NEG.L	D7	; .L => 32 bits, .W => 16 bits, .B => 8 bits
	RTS
* On the 6800/6801/6809, you can negate (2's complement) a byte 
* using a 1-byte instruction.
* On the 68000, it takes a 2-byte instruction.
* It takes 5 bytes of instruction to negate 16 bits on 6800/1/9,
* and 13 bytes to negate 32 bits.
* But on the 68000, it takes just two,
* the above 16-bit op-code with a couple of bits changed.
* This is a common pattern with 68000 instructions.
*
* And, for all the time I spend explaining NEG, 
* since the 68000 can subtract registers in either order, 
* we really don't need NEG here.

* Unsigned byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADDBX	AND.W	#$FF,D7	; zero extend it
	ADD.W	D7,A0	; 16-bit source sign extended to 32 bits
	RTS
* Alternative
ADDBXalt
	AND.W	#$FF,D7
	LEA	(A0,D7.W),A0	; takes more bytes
	RTS
*
* Signed byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADSBX	EXT.W	D7
	ADD.W	D7,A0	; 16-bit source sign extended to 32 bit An
	RTS
* Alternative
ADSBXalt
	EXT.W	D7
	LEA	(A0,D7.W),A0	; takes more bytes
	RTS
*
* Unsigned byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SUBBX	AND.W	#$FF,D7	; zero extend it
	SUB.W	D7,A0	; 16-bit source sign extended to 32 bits
	RTS

* Signed byte offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SBSBX	EXT.W	D7
	SUB.W	D7,A0	; 16-bit source sign extended to 32 bit An
	RTS
*
* Unsigned 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADDWX	AND.L	#$FFFF,D7	; zero extend it
	ADD.L	D7,A0	
	RTS
* Alternative
ADDWXalt
	AND.L	#$FFFF,D7
	LEA	(A0,D7.L),A0	; takes more bytes
	RTS
*
* Signed 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADSWX	ADD.W	D7,A0	
	RTS
* Alternative
ADSWXalt
	LEA	(A0,D7.W),A0	; takes more bytes
	RTS
* Alternative
ADSWXalt2
	LEA	(A0,A1.W),A0	; takes more bytes
	RTS
*
* Unsigned 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SUBWX	AND.L	#$FFFF,D7	; zero extend it
	SUB.L	D7,A0	
	RTS

* Signed 16-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SBSWX	SUB.W	D7,A0	
	RTS
*
* 32-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
ADDLX	ADD.L	D7,A0	
	RTS
* Alternative
ADDLXalt
	LEA	(A0,D7.L),A0	; takes more bytes
	RTS
* Alternative
ADDLXalt2
	LEA	(A0,A1.L),A0	; takes more bytes
	RTS
*
* 32-bit offset
* Should in-line. Any data register, any address register.
* A7 must in-line (see below).
SUBLX	SUB.L	D7,A0	
	RTS
*


*************
* For the return stack
* As explained above, just in-line the LEA.
* These are provided as a solution to a puzzle,
* not as useful code.
*
* Signed byte offset
* Just in-line the EXT.W and ADD.W
ADSBS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	EXT.W	D7	; sign extend it
	ADD.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
* See above about LEA instead of ADD.
*
* Unsigned byte offset
* Just in-line the AND.W and ADD.W
ADDBS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	AND.W	#$FF,D7	; zero extend it
	ADD.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
* See above about LEA instead of ADD.

* Signed 16-bit offset
* Just in-line the ADD.W
ADSWS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	ADD.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
* See above about LEA instead of ADD.
*
* Unsigned 16-bit offset
* Just in-line the AND.L and ADD.L
ADDWS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	AND.L	#$FFFF,D7	; zero extend it
	ADD.L	D7,A7	; 32 bits
	JMP	(A0)	; return via A0
* See above about LEA instead of ADD.

* 32-bit offset
* Just in-line the ADD.L
ADDLS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	ADD.L	D7,A7	; 32 bits
	JMP	(A0)	; return via A0
*
* Unsigned byte offset
* Just in-line the AND.W and SUB.W
SUBBS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	AND.W	#$FF,D7	; zero extend it
	SUB.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
*
* Unsigned 16-bit offset
* Just in-line the AND.L and SUB.L
SUBWS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	AND.L	#$FFFF,D7	; zero extend it
	SUB.L	D7,A7	; 32 bits
	JMP	(A0)	; return via A0
*
* Signed byte offset
* Just in-line the EXT.W and SUB.W
SBSBS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	EXT.W	D7	; sign extend it
	SUB.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0
*
* Signed 16-bit offset
* Just in-line the SUB.W
SBSWS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	SUB.W	D7,A7	; 16-bit source sign extended to 32 bits
	JMP	(A0)	; return via A0

* 32-bit offset
* Just in-line the SUB.L
SUBLS	MOVE.L	(A7)+,A0	; get return address, restore stack address
	SUB.L	D7,A7	; 32 bits
	JMP	(A0)	; return via A0
*

* INX and DEX trains and INS and DES trains are meaningless.
* HOWEVER, just to remind ourselves:
* (And all of these work for any other address register, too, but IN-LINE them!!)
* (They work for A7 if in-lined, as well.)
ADD16X	LEA 	16(A0),A0
	RTS
ADD14X	LEA	14(A0),A0
	RTS
SUB16X	LEA	-16(A0),A0
	RTS
* Etc. In-line these.
INX	LEA	1(A0),A0	; Sigh. In-line it. Do not make trains with it. Please.
	RTS
DEX	LEA	-1(A0),A0	; See INX. In-line it. Do not make trains with it. PLEASE.
	RTS
* Note that we can also use ADDQ and SUBQ for offset less than 9
*
* More solutions to puzzles.
* If you called these, you would have to juggle the return address as shown.
* You don't want to do that.
* Just in-line the LEA instructions.
* Then there's no return address to juggle, no messing with A0.
* DO NOT USE THIS CODE other than for examples of silly walks.
ADD16S	MOVE.L	(A7)+,A0
	LEA	16(A7),A7
	JMP	(A0)
* etc.
* Could all be replaced with just LEA	16(A7),A7 in-line!
* That's actually cheaper than just the instruction JSR!!!


* Synthetic stacks restricted within page boundaries make no sense at all
* on the 68000. Except, I suppose they could, sort-of.
*
* In the first place,
* we should be able to use an extra address register to make a third stack.
* If we do, addressing has already been covered, above.
*
* But if we want a software stack maintained by pointers in memory,
* for some reason,
* Given a pseudo-register somewhere in process local variable space
* accessed via A5:
	ORG	SOMEWHERE
	...
QSP	DS.L	1	; a synthetic stack pointer Q
* QSP-LOCBAS has to be within +/-32K on 68000, 2-byte op-code, 2-byte offset, syntax: QSP-LOCBAS(A5)
* 68020 and above allows 32-bit range, 4-byte op-code, 4-byte offset, syntax: (QSP-LOCBAS,A5)
	...
	DS.L	2	; buffer zone
QSTKLIM	DS.L	32
QSTKBAS	DS.L	2	; buffer zone
	...

* 32-bit Dn for synthetic stack (could/should be in-line):
ADDQSP	ADD.L	D7,QSP-LOCBAS(A5)	; 4 bytes in op-code (+/-32K)
	RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 256-byte page boundary
* so that carries cannot be generated:
* unsigned byte D7
ADDQSPS	ADD.B	D7,QSP+3-LOCBAS(A5)	; 4 bytes in op-code (+/-32K)
	RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 65536-byte page boundary
* so that carries cannot be generated:
* unsigned 16-bit D7
ADDQSPW	ADD.W	D7,QSP+2-LOCBAS(A5)	; low word is at QSP+2; 4 bytes in op-code (+/-32K)
	RTS
*
* 32-bit Dn for synthetic stack (could/should be in-line):
SUBQSP	SUB.L	D7,QSP-LOCBAS(A5)	; 4 bytes in op-code (+/-32K)
	RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 256-byte page boundary
* so that carries cannot be generated:
* unsigned byte
SUBQSPS	SUB.B	D7,QSP+3-LOCBAS(A5)	; 4 bytes in op-code (+/-32K)
	RTS
* If QSTKLIM-8 to QSTKBAS+7 are within an even 65536-byte page boundary
* so that carries cannot be generated:
* unsigned 16-bit D7
SUBQSPW	SUB.W	D7,QSP+2-LOCBAS(A5)	; low word is at QSP+2; 4 bytes in op-code (+/-32K)
	RTS

* 68000 has no memory indirection
QPSHD7L	MOVE.L	QSP-LOCBAS(A5),A4	; 4 bytes in op-code
	MOVE.L	D7,-(A4)		; 2 bytes in op-code
	MOVE.L	A4,QSP-LOCBAS(A5)	; 4 bytes in op-code
	RTS
*
* 68020+ have memory indirection
QPSHD7LI
	SUBQ.L	#4,QSP-LOCBAS(A5)	; 4 bytes in op-code (SUBQ.W would be faster for medium stack)
	MOVE.L	D7,([QSP-LOCBAS,A5])	; indirect through QSP itself; 6 bytes in op-code
	RTS
*
QPOPD7L	MOVE.L	QSP-LOCBAS(A5),A4	; 4 bytes in op-code
	MOVE.L	(A4)+,D7		; 2 bytes in op-code
	MOVE.L	A4,QSP-LOCBAS(A5)	; 4 bytes in op-code
	RTS
*
* 68020+ have memory indirection
QPOPD7LI
	MOVE.L	([QSP-LOCBAS,A5]),D7	; indirect through QSP itself; 6 bytes in op-code
	ADDQ.L	#4,QSP-LOCBAS(A5)	; 4 bytes in op-code (ADDQ.W would be faster for medium stack)
	RTS


* Register offsets from A7 were dealt with above.

* Lest I forget --
* On the 6800 or 6801, this would be referenced by a process-local
* LOCALBASE or similar pseudo-register, which I almost forgot to talk about.
* On the 6809, it could be done by pseudo-register or (with some glue) by DP.
* On the 68000, we are going to use a spare address register,
* and I am going to pick A5.
* All the address math has been shown above,
* the only issue is being explicit about the assembly language idiom.
*
* Given 
	ORG	Whatever
LOCBAS	EQU	*
*	...
VAR	DS.B	m	; or .W or .L, etc.
*
* With A5 known to be set to LOCBAS,
	LEA	LOCBAS(PC),A5
* or
	MOVEA.L	#LOCBAS,A5
*
* In-line snippets --
* For variable VAR within 256 bytes of LOCBAS:
	...
	LEA	VAR-LOCBAS(A5),A0	; that's all! (4-byte op-code)
	...
*
* When VAR is 256 bytes or more away from LOCBAS, but less than 32768
* (or, even, below LOCBAS but within -32768), in other words, signed 16-bit offset:
	...
	LEA	VAR-LOCBAS(A5),A0	; same thing!
	...
*
* It's a little messier when the signed offset doesn't fit in 16 bits, 
* less than -32768 below, or 32768 or greater above --
	...
	MOVE.L	#VAR-LOCBAS,D7		; Any Dn. An will also work, if it's not in use. 6 bytes.
	LEA	(A5,D7.L),A0		; 4 bytes. total 10 bytes. 
	...
*
* From the 68020 on, 32-bit offsets are allowed, but the op-code is also 32-bits plus displacement:
	...
	LEA	(VAR-LOCBAS,A5),A0	; 8 byte total op-code
	...
* 
* Do I really need to show this as subroutines?
* signed 16-bit offset in D7:
LEALBWX	LEA	(A5,D7.W),A0	; PLEASE just do this in-line!
	RTS
*
* 32-bit offset in D7:
LEALBLX	LEA	(A5,D7.L),A0	; PLEASE just do this in-line!
	RTS
*			;-/
* 
* I assume you're not going to be wanting to keep LOCBAS
* in a pseudo-register called LB_BASE.
* But you might want to maintain a separate allocation area
* with a pointer in AL_BASE, like this:
LOCBAS	EQU	*
	...
AL_BASE	DS.L	1
	...
* for signed 16-bit offsets in D7: 
ADDLBW	MOVE.L	AL_BASE-LOCBAS(A5),A0
	ADD.W	D7,A0	; or LEA (A0,D7.W),A0
	RTS
* for unsigned 16-bit offsets:
ADDLBU	AND.L	#$0000FFFF,D7	; unsigned offset
* for 32-bit offsets
ADDLBL	MOVE.L	AL_BASE-LOCBAS(A5),A0
	ADD.L	D7,A0	; or LEA (A0,D7.L),A0
	RTS
*
* 68020 and above allow you to do weird things like this --
	...
	LEA	([AL_BASE-LOCBAS,A5],D7.L),A0
*	...					;  8-o
* ... quite literally letting you index directly off that pseudo-register
* out there in memory.
*
* As near as I can tell,
* memory indirect modes all require an address register,
* or the PC. 
* But that's not so bad, other than some of the modes being overkill.
*
* And, in spite of my mugging, maybe this has been a good way
* to expand your grasp of the power of the 68000 addressing modes.

* Sorry about the mugging. Sort-of. ;-/

As you can see, the 68000 just basically does almost all the address math you need without subroutines.

Including, to some extent, arrays, but let's not go there yet.
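
To see why the listing uses ADD.W for signed 16-bit offsets but AND.L plus ADD.L for unsigned ones, here is a minimal sketch in Python rather than assembly. It assumes only what the comments in the listing state: a word-sized source is sign-extended to 32 bits before being added to an address register. The function names (sign_extend16, adda_w, adda_l, add_unsigned_word) are made up for this illustration and are not part of any assembler or library.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed 16-bit number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def adda_w(an, dn):
    """Model of ADD.W Dn,An: low word of Dn is sign-extended, all of An changes."""
    return (an + sign_extend16(dn)) & MASK32


def adda_l(an, dn):
    """Model of ADD.L Dn,An: all 32 bits of Dn are added."""
    return (an + dn) & MASK32


def add_unsigned_word(an, dn):
    """Model of AND.L #$FFFF,Dn followed by ADD.L Dn,An."""
    return adda_l(an, dn & 0xFFFF)


if __name__ == "__main__":
    base = 0x00012000
    print(hex(adda_w(base, 0xFFFE)))             # 0x11ffe: steps back 2
    print(hex(add_unsigned_word(base, 0xFFFE)))  # 0x21ffe: steps forward 65534
```

The same bit pattern, $FFFE, moves the pointer in opposite directions depending on which sequence is used, which is the whole reason the signed and unsigned routines differ.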

As with the previous three chapters, I have not tested the code. It should run, modulo typos.
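
The page-restricted synthetic stack routines touch only part of the 32-bit pointer stored at QSP. Since the 68000 is big-endian, the least significant byte of that long word sits at QSP+3 and the least significant word at QSP+2 (an even address, so a word access stays aligned). The following Python sketch models this with a byte array. The helper names and the tiny memory layout are hypothetical, chosen only for illustration.

```python
import struct


def read_pointer(memory, qsp):
    """Read the 32-bit big-endian pointer stored at offset qsp."""
    return struct.unpack_from(">L", memory, qsp)[0]


def add_low_byte(memory, qsp, offset):
    """Model of ADD.B Dn,QSP+3: only the low byte changes, the carry is dropped."""
    memory[qsp + 3] = (memory[qsp + 3] + offset) & 0xFF


def add_low_word(memory, qsp, offset):
    """Model of ADD.W Dn,QSP+2: only the low word changes, the carry is dropped."""
    low = struct.unpack_from(">H", memory, qsp + 2)[0]
    struct.pack_into(">H", memory, qsp + 2, (low + offset) & 0xFFFF)


if __name__ == "__main__":
    memory = bytearray(8)
    QSP = 4  # hypothetical location of the pseudo-register
    struct.pack_into(">L", memory, QSP, 0x00012380)
    add_low_byte(memory, QSP, 0x10)
    print(hex(read_pointer(memory, QSP)))  # 0x12390
    add_low_word(memory, QSP, 0x0100)
    print(hex(read_pointer(memory, QSP)))  # 0x12490
    struct.pack_into(">L", memory, QSP, 0x000123F8)
    add_low_byte(memory, QSP, 0x10)
    print(hex(read_pointer(memory, QSP)))  # 0x12308: carry out of the low byte is lost
```

The last case shows the carry being dropped, which is why the whole stack, buffer zones included, has to sit inside one page for these shortcuts to be safe.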

The 68000 is hard to wrap your head around. I know. If the above doesn't make sense yet, it's okay. I'll point you back here from time to time when we are working with more concrete examples of using the above.

Look at how I've been avoiding this. I think it's time to build a concrete example of stack frames on the 6801.

Or you can jump ahead to getting numeric output in binary.


Saturday, November 2, 2024

ALPP 02-23 -- Synthesizing Multibyte NEGate on 6809 (Applies to 6800 and 6801)

  Synthesizing Multibyte NEG
on 6809
(Applies to 6800 and 6801)

(and 6805, with modifications, but we won't talk about that)

(Title Page/Index)

The fact that the NEGD routine effectively does not change from the 6800 to the 6809 had me looking at the 68000's NEGX and NOT instructions, scratching my head as to why there was no NOTX instruction and what the reasons were for the rules for generating the X bit for the NEG, NEGX, and NOT instructions. And I started losing confidence in the NEGation sequence I have been using in my 6800/6801 and 6809 work:

NEGAB	COMA	; 2's complement NEGate is bit COMplement + 1
	NEGB
	BNE	NEGABX	; or BCS. but BNE works -- extends 0
	INCA
NEGABX	RTS

The theory is pretty straightforward. Radix complement negation is to subtract the number from the radix (or radix raised to the power of the number of columns) and remember your borrow (carry). 

Bit complement is 1's complement, or reduced radix complement.

So 2's complement is 1's complement plus 1. 

And then you just carry any borrow from the first column as far as it carries. 

No, there's something wrong with that description, which is what had me going.

I guess it's more accurate to say the lack of carry carries until you get a non-zero column? As soon as you get a carry, everything to the left becomes 1's complement. Because the carry means borrow. And 1 - 1 is zero, but 0 - 1 is 1. Or something.

[JMR202411050852 clarification:]

The carry generated on the NEG instruction in Motorola CPUs is the borrow from the (virtual) subtraction from zero. This is the inverse of the carry from adding one.

When you subtract from zero, there's going to be a borrow for any non-zero operand.

When you add 1 to the 1's complement (bit inverse), the carry is only going to generate when the result of the add is zero -- which is exactly the same as when the argument (operand) is zero.

So there is a carry from the add exactly when there is no borrow from the NEG, and you can stop when there is no carry from the add, which, for me anyway, confirms the reasoning behind stopping when you no longer get 0 as a result.

[JMR202411050852 clarification end.]
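
The relation between the borrow from NEG and the carry from complement-plus-one can be checked by brute force over all 256 byte values. This is a Python sketch of the arithmetic, not a cycle-accurate model of any Motorola CPU. The function names are invented for the illustration, and the only assumption is the one stated above: NEG's carry is the borrow from subtracting the operand from zero.

```python
def neg_via_subtract(x):
    """0 - x in 8 bits; borrow is 1 when the true difference is negative."""
    diff = 0 - x
    return diff & 0xFF, 1 if diff < 0 else 0


def neg_via_complement(x):
    """(bit complement of x) + 1 in 8 bits; carry is the bit that falls off the top."""
    total = (~x & 0xFF) + 1
    return total & 0xFF, total >> 8


def check_all_bytes():
    """True if both methods agree on the result and borrow == NOT carry for every byte."""
    for x in range(256):
        result_sub, borrow = neg_via_subtract(x)
        result_com, carry = neg_via_complement(x)
        if result_sub != result_com or borrow != 1 - carry:
            return False
    return True


if __name__ == "__main__":
    print(check_all_bytes())
```

Only the operand 0 produces a carry from the add, and only the operand 0 produces no borrow from the subtraction, so the two flags are inverses of each other across the whole byte range.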

Anyway, I was getting lost in something, so I came up with a little routine to test every possible result against straight-out subtracting from 0:

	org $2000
start	ldu #$2100
	ldd #$5a5a
	std ,--u
	ldd #0
	std ,--U
	std ,--u
testl	ldd 2,u
	std ,u
	ldd #0
	subd 2,u
	com ,u
	neg 1,u
	bne testni
	inc ,u
testni	cmpd ,u
	bne teste
	ldd #1
	addd 2,u
	std 2,u
	bne testl
	std 4,u
	nop
	nop
teste	ldd 4,u
	nop
	leau 6,u
	nop

Set a breakpoint at teste, step through the loop a couple of times, and let it rip, and if the value at the top of the U stack is cleared when you're done, every value worked correctly.

Now, I keep saying something about BCS, so I can test that, as well. Just change the

	bne	testni

to

	bcs	testni

and let'r rip again.

And it works.
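
For readers without a 6809 simulator at hand, the same exhaustive check can be sketched in Python. This is a model of the arithmetic only, not a port that was run against real hardware. The names negate16_bytewise and first_failure are hypothetical, and the carry rule for NEG (carry set for any non-zero operand) is taken from the clarification above.

```python
def negate16_bytewise(value, use_carry=False):
    """Model of COMA / NEGB / BNE (or BCS) / INCA applied to a 16-bit value."""
    high = (value >> 8) & 0xFF
    low = value & 0xFF
    high = ~high & 0xFF             # COMA
    borrow = 1 if low != 0 else 0   # NEGB: carry means borrow from 0 - low
    low = (0 - low) & 0xFF          # NEGB
    # BCS skips the INCA on carry set; BNE skips it on a non-zero result.
    skip_inc = (borrow == 1) if use_carry else (low != 0)
    if not skip_inc:
        high = (high + 1) & 0xFF    # INCA
    return (high << 8) | low


def first_failure(use_carry=False):
    """Return the first 16-bit value that negates wrongly, or None if all pass."""
    for value in range(0x10000):
        if negate16_bytewise(value, use_carry) != (0 - value) & 0xFFFF:
            return value
    return None


if __name__ == "__main__":
    print(first_failure(use_carry=False))  # BNE variant
    print(first_failure(use_carry=True))   # BCS variant
```

Both calls returning None corresponds to the flag word on the U stack being cleared at the end of the assembly test loop.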

And som'eres in there, it hit me like the proverbial ton of bricks going down again: bit complement (1's complement in a binary field) does not carry. So the NOT instruction is its own extending form. And, yes, you prime the NEGX loop on the 68000 with a straight NEG.

... yeah, and I guess I'm not having a smart brain day today or something ...

Well, so, here's a 4-byte negate on 6809:

* negate the 32-bit number on top of stack:
NEG32	COM	,U
	COM	1,U
	COM	2,U
	NEG	3,U
	BNE	NEG32X
	INC	2,U
	BNE	NEG32X
	INC	1,U
	BNE	NEG32X
	INC	,U
NEG32X	RTS

It ought to work. 
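
One way to gain some confidence in NEG32 without a simulator is to model it byte by byte and compare against ordinary 32-bit negation. The sketch below is Python, with a made-up function name, and assumes the most significant byte is the one at ,U (big-endian order, as on the 6809).

```python
def neg32_bytewise(value):
    """Model of NEG32: COM the three high bytes, NEG the low byte, ripple INC upward."""
    data = list(value.to_bytes(4, "big"))  # data[0] corresponds to ,U
    for i in range(3):
        data[i] = ~data[i] & 0xFF          # COM ,U / COM 1,U / COM 2,U
    data[3] = (0 - data[3]) & 0xFF         # NEG 3,U
    i = 3
    while i > 0 and data[i] == 0:          # each BNE NEG32X falls through only on zero
        i -= 1
        data[i] = (data[i] + 1) & 0xFF     # INC 2,U / INC 1,U / INC ,U
    return int.from_bytes(bytes(data), "big")


def check_samples():
    """Compare against (-v) mod 2**32 for edge cases and a coarse sweep."""
    edge_cases = [0, 1, 0xFF, 0x100, 0x10000, 0x1000000,
                  0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x12345600]
    sweep = range(0, 1 << 32, 99991)
    for v in list(edge_cases) + list(sweep):
        if neg32_bytewise(v) != (-v) & 0xFFFFFFFF:
            return False
    return True


if __name__ == "__main__":
    print(check_samples())
```

This is a spot check rather than a proof: it covers the edge cases where the increment has to ripple through one, two, or three zero bytes, plus a coarse sweep of the rest of the range.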

Back to the regularly scheduled programming, as soon as I finish figuring out what instruction and addressing combinations on the 68000 are relevant to what I'm demonstrating.


Friday, November 1, 2024

ALPP 02-22 -- Some Address Math for the 6809

  Some Address Math
for the
6809

(Title Page/Index)

Maybe it feels like going around in circles, but address math is so important that I think I should show you explicit 6809 corollaries for the utility address math routines I've shown you for the 6801 and for the 6800.

When instructions become more general, they often take more bytes to encode.  And when you generalize an operation, it often takes more instructions to implement -- even with a more powerful instruction set CPU. And the more you repeat those multiple instructions, the more opportunity you have to make mistakes. 

This is why we define utility routines like we just looked at for the 6800 and 6801.

But in practice with the 6809, this is not usually the case. 

To get a sense of how the size is affected in real code, you will want to compare these examples I give to the concrete examples I have given -- and give later -- for the other processors.

As much as having actual instructions to do the work for you improves things, the more important improvement is eliminating almost all need for pseudo-registers that have to be managed when switching processes.

Remember to read the code and the comments in the code, and open up separate browser windows to compare side-by-side with the 6800 and 6801. Reading code is important.

Let me say it again: 

No need for pseudo-registers on either the 6809 or 68000!

Unless you really want to synthesize a third stack or something on the 6809. 

Almost -- That's modulo per-process global variables, depending on how you handle them. And modulo some use of stack as temporaries instead of pseudo-registers, because stack is just a better place for temporaries, and is so easily accessed on the 6809.

Let's look at the 6809 code. 

You'll (hopefully) notice that mapping the abstract operations to the 6809 works out somewhat different than for the 6800 and 6801. So I'm showing the 6809 code in a single block and relying more on comments in the code. The order of presentation is roughly the same, so it should be easy enough to find what to compare with what. 

One of the reasons I demonstrate an alternate way to NEGate the Double accumulator is to demonstrate a very useful way to use the stack to avoid using temporary variables in memory. (I guess I need to go back and make this explicit in the 6801 and 6800 address math chapters.)

Do not miss the fact that the 6809 has four indexable registers, and all the address math instructions work for all four indexable registers -- where the routines may not! Where I say in-line, that means just use the instructions rather than calling the routines.  

[JMR202411070913 addendum:]

I don't think I've explained the "here pointer" symbol and idiom yet:

ESPHIB	EQU	*

In Motorola assemblers, an asterisk where the assembler could parse an address means the location of the current instruction or directive, thus, "here". I will have to explain it further later.

[JMR202411070913 addendum end.]

(If you're wondering, fix the mnemonics for the required register -- LEAX for X, LEAY for Y, LEAU for U, LEAS for S, etc. And don't forget the addressing mode index registers. And, no, don't include the RTS at the end when you're inserting the code in-line. 8-/ I know you caught all that, but some people just copy-and-paste without thinking.)

* 6809 pointer math
*	ORG	$80
*	...
*XOFFA	RMB	1 ; don't need these at all
*XOFFB	RMB	1
*XOFFSV	RMB	2
*	...

	ORG	SOMETHING
* All of these work fine in-line, rather than called as subroutines
*
* Two ways to negate D on the 6809:
NEGD	COMA		; 6800 version -- still no NEGD
	NEGB            ; and sign extending doesn't help.
	BNE	NEGDX	; or BCS. but BNE works -- extends 0
	INCA
NEGDX	RTS
*
NEGDS	PSHS	D	; slightly slower, uses stack
	LDD	#0
	SUBD	,S++
	RTS
*
* Unsigned byte offset
* Absolutely should in-line. X only.
ADDBX	ABX	; X only
	RTS
*
* For unsigned byte offset other than X, zero extend B into A
* Destroys A.
* Should in-line for Y or U. Should use ABX for X. Must in-line for S.
ADDBY	CLRA	; for Y/U/S, zero extend B for unsigned offset
	LEAY	D,Y
	RTS
*
* Signed byte offset
* Should in-line for X, Y or U. Must in-line for S.
ADSBX	LEAX	B,X	; sign extended B, Y/U/S also
	RTS
*
* Signed byte offset
* Should in-line for X, Y or U. Must in-line for S.
SBSBX	NEGB		; signed subtract B, Y/U/S also
	LEAX	B,X
	RTS
*
* Unsigned byte offset, zero extend A
* Destroys A
* Could in-line for X, Y or U. Must in-line for S.
SUBBX	CLRA	; B is unsigned, therefore positive
* 16-bit offset, must in-line for S.
SUBDX	COMA		; no NEGD
	NEGB
	BNE	ADDDX	; or BCS. but BNE works -- extends
	INCA
* 16-bit offset, must in-line for S
ADDDX	LEAX	D,X	; Y/U/S also
	RTS

* Alternatively, use D for explicit subtraction
* Here as an example of math that can be done,
* probably not as a useful subroutine.
SUBBXS	CLRA	; B is unsigned, destroys A
SUBDXS	PSHS	D	; for subtraction
	EXG	X,D	; X to subtract, save D
	SUBD	,S++	; do the subtraction
	EXG	X,D	; Offset result to X, restore D
	RTS

* No particular reason to try to use ABX in signed byte offset.
* This is a solution to a puzzle, not useful code.
* You don't really want to do this.
ADDSBX	TSTB
	BPL	ADDSBXA
	LEAX	B,X	; Absolutely no reason not to use this in the first place.
	RTS
ADDSBXA	ABX
	RTS

*************
* For S stack
* As mentioned above, just in-line the LEAS.
* These are also provided as a solution to a puzzle,
* not as useful code.
*
* Signed byte offset
ADSBS	PULS	X	; get return address, restore stack address
	LEAS	B,S	; you really could just in-line this.
	JMP	,X	; return via X
*
* Unsigned byte offset, zero extend A, destroys A, X
ADDBS	CLRA		; just in-line the CLRA and the LEAS D,S
* 16-bit offset
ADDDS	PULS	X	; get return address, restore stack address
	LEAS	D,S
	JMP	,X	; return
*
* Do you really want to do this?
* Unsigned byte offset, zero extend into A, destroys A
SUBBS	CLRA
SUBDS	COMA
	NEGB
	BNE	ADDDS	; or BCS. but BNE works -- extend
	INCA
	BRA	ADDDS	; let ADDDS handle the return address and the math
* Do the math in D for explicit subtraction
* No more useful than the rest of this for X.
* Here just as an example of math that can be done.
SUBBSS	CLRA	; B is unsigned, destroys A
SUBDSS	LDX	,S	; get return address
	STD	,S	; save D
	TFR	S,D	; get S without endangering the stack
	ADDD	#2	; adjust for having D on the stack
	SUBD	,S	; finally subtract the offset
* Alternative 1, leaves D destroyed
	TFR	D,S	; update stack pointer
	JMP	,X	; return via X
* Alternative 2, restores offset in D
	PSHS	D	; working realllllly hard not to destroy D.
	LDD	2,S	; got the offset
	LDS	,S	; update S
	JMP	,X

* INX and DEX trains and INS and DES trains are meaningless.
* HOWEVER, just to remind ourselves:
* (And all of these work for Y and U, too but IN-LINE them!!)
* (They work for S if in-lined, as well.)
ADD16X	LEAX	16,X
	RTS
ADD14X	LEAX	14,X
	RTS
SUB16X	LEAX	-16,X
	RTS
* Etc. In-line these.
INX	LEAX	1,X	; Sigh. In-line it. Do not make trains with it. Please.
	RTS
DEX	LEAX	-1,X	; See INX. In-line it. Do not make trains with it. PLEASE.
	RTS
*
* More solutions to puzzles.
* If you called these, you would have to juggle the return address as shown.
* You don't want to do that.
* Just in-line the LEAS instructions.
* Then there's no return address to juggle, no messing with X.
* DO NOT USE THIS CODE other than examples of silly walks.
ADD16S	PULS	X
	LEAS	16,S
	JMP	,X
* etc.
* Could all be replaced with just LEAS	16,S; in-line!
* That's actually cheaper than just the instruction JSR!!!


* And stacks restricted within page boundaries make no sense at all on the 6809.
* Pseudo-register somewhere in DP:
QSP	RMB	2	; a synthetic stack pointer Q
	...
	ORG	SOMETHING
	RMB	4	; buffer zone
QSTKLIM	RMB	64
QSTKBAS	RMB	4	; buffer zone
SSTKLIM	RMB	32
SSTKBAS	RMB	4	; buffer zone
	...

* signed B for synthetic stack:
ADBQSP	LDX	QSP
ADBQSX	LEAX	B,X	; does the whole pointer, negatives, too
	STX	QSP
	RTS
*
* unsigned B and D for synthetic stack:
ADUQSP	CLRA		; unsigned B entry point
ADDQSP	LDX	QSP
ADDQSX	LEAX	D,X	; does the whole pointer, negatives, too
	STX	QSP
	RTS
*
* Choose whether you want to negate D or move it around, and see above.
* Or just decide you can add a negative instead of subtracting
*
* Destroys A
SBSQSP	SEX	; sign extend B into A (Yes, that's the mnemonic.)
	BRA	SBDQSP
SBUQSP	CLRA	; B is unsigned, therefore positive
* 16-bit offset
SBDQSP	COMA		; no NEGD
	NEGB
	BNE	ADDQSP	; or BCS. but BNE works -- extends
	INCA
	BRA	ADDQSP	; ADDQSP loads X from QSP before adding

* Alternatively, use D for explicit subtraction
SBSQSPS	SEX	; sign extend B into A (Yes, that's the mnemonic.)
	BRA	SBDQSPS
SBUQSPS	CLRA	; B is unsigned, destroys A
SBDQSPS	PSHS	D	; for subtraction
	LDD	QSP	; Get things in the right place
	SUBD	,S++	; do the subtraction
	STD	QSP	; update
	RTS

* More stuff that there is no reason to do.
* Just in-line the LEAS B,S
ADBSP	PULS	X	; return address
	LEAS	B,S	; signed B, but full 16-bit address math.
	JMP	,X
*
* Just in-line the LEAS D,S
* D for return stack (but we saw this above):
ADDSP	PULS	X	; return address
	LEAS	D,S
	JMP	,X
*
* Just in-line the NEGB and LEAS B,S. Still cheaper than the call.
* signed B for return stack:
SBBSP	PULS	X	; return address
	NEGB
	LEAS	B,S	; full 16-bit address math
	JMP	,X	; return via X (return address was pulled above)
*
* This one might be worth a routine for,
* if you actually have to do it.
* D for return stack (but we saw this above):
SBDSP	PULS	X	; return address
	COMA
	NEGB
	BNE	SBDSPM
	INCA
SBDSPM	LEAS	D,S
	JMP	,X
* or
SBDSPS	LDX	,S	; return address	
	STD	,S	; offset
	TFR	S,D
	ADDD	#2	; adjust it
	SUBD	,S
	TFR	D,S
	JMP	,X

As you can see, the 6809 just basically does almost all the address math you need without subroutines.

Uhm, until we get to arrays, but let's not do that yet.

[JMR202411031752 correction:]

In the comments to the code, I suggested (or asserted?) that there would be no reason on the 6809 to allocate a stack entirely within a single page so that the stack pointer math would never overflow, and the increment and decrement could be handled with the INC and DEC instructions only, ignoring overflow.

On my way to bed last night, I realized that would not entirely be true.

Pointer variables in the direct page cannot be indirected without loading the variable into an index register. So if your top of stack pointer is process local, there would be no point in not using the auto-inc/dec modes and LEA instructions to do the index updates.

But if the synthesized stack or queue is global to all processes (such as a system resource allocation stack or queue), it may be reasonable to use absolute (extended mode) addressing, in which case memory indirection is available. In that case, it may be completely sensible to use the optimization of no-overflow INC or DEC in a stack or queue allocated entirely within a single page:

* A synthetic stack contained entirely in a page,
* using absolute (extended mode) addressing:
	ORG	$400	; anywhere that ESPLOB to ESPHIB-1 are all within a page
ESPLOB	RMB	4	; bumper, lowest related address
ESPLIM	RMB	64	; 32 2-byte items possible on stack
ESPBAS	RMB	4	; bumper
ESPHIB	EQU	*	; highest related address (plus 1)
	...
ESP	RMB	2	; only the low byte will change
	...
EPSHD	DEC	ESP+1	; stack all within a page!
	DEC	ESP+1	; no carry
	STD	[ESP]	; indirection
	RTS
*
EPOPD	LDD	[ESP]	; indirection
	INC	ESP+1	; stack all within a page!
	INC	ESP+1	; no carry
	RTS
*
ADDBESP	ADDB	ESP+1	; signed
	STB	ESP+1	
	RTS
*
SUBBESP	PSHS	B	; unsigned
	LDB	ESP+1	; low byte of the pointer
	SUBB	,S+	; subtract the offset
	STB	ESP+1
	RTS

Hopefully, I can devote a chapter or three to giving this proper treatment somewhere down the road.

[JMR202411031752 correction end.]
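
The no-carry shortcut in the correction above can be checked against ordinary 16-bit arithmetic. The sketch below is Python, with made-up helper names, and assumes the layout from the listing: ORG $400, which puts ESPLOB at $400, ESPLIM at $404, ESPBAS at $444, and ESPHIB at $448.

```python
def dec_low_byte(pointer, count=1):
    """Model of DEC ESP+1 done count times: only the low byte of the pointer changes."""
    return (pointer & 0xFF00) | ((pointer - count) & 0xFF)


def inc_low_byte(pointer, count=1):
    """Model of INC ESP+1 done count times: only the low byte of the pointer changes."""
    return (pointer & 0xFF00) | ((pointer + count) & 0xFF)


def within_one_page(lowest, highest_plus_one):
    """True when every address from lowest to highest_plus_one - 1 shares a high byte."""
    return (lowest >> 8) == ((highest_plus_one - 1) >> 8)


if __name__ == "__main__":
    ESPLOB, ESPLIM, ESPBAS, ESPHIB = 0x400, 0x404, 0x444, 0x448
    print(within_one_page(ESPLOB, ESPHIB))
    # Every push from a legal pointer value matches a full 16-bit decrement.
    print(all(dec_low_byte(p, 2) == p - 2 for p in range(ESPLIM + 2, ESPBAS + 1, 2)))
    # At a page boundary the shortcut wraps inside the page instead of borrowing.
    print(hex(dec_low_byte(0x500, 2)))  # 0x5fe, not 0x4fe
```

The last line is the failure the page restriction exists to prevent: without a borrow into the high byte, a pointer at the bottom of a page wraps to the top of the same page.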

Oh, and I have mentioned, I think, the DP register, and how it isn't as fully supported as I'd have liked.

The DP can be used as a base for per-process global variables (in other words, variables local to the process, but globally/statically allocated within the process). I discussed this to a certain extent in the 6800 addressing math chapter.

* On the 6800 or 6801, this would be referenced by a process-local
* LOCALBASE or similar pseudo-register, which I almost forgot to talk about.
* How to get the effective address of a variable in DP:
* Instead of 
*	LEAX	<VAR
* or
*	LEAX	VAR,DP
* or even 
*	LEAX	VAR-DPBAS,DP
* which we do not have in the 6809,
* we can do this --
*
* Given 
	ORG	$nn00		; even 256-byte page address
	SETDP	$nn
DPBAS	EQU	*
*	...
VAR	RMB	m
*
* In-line snippets --
* For variable VAR within 256 bytes of DPBAS:
	...
	LDB	#VAR-DPBAS	; put the offset in DP in B (unsigned)
	TFR	DP,A		; pull the base address high byte into A
	TFR	D,X		; move it to X
	...
*
* Using DP when VAR is 256 bytes or more away from DPBAS:
	...
	TFR	DP,A		; pull the base address high byte into A
	CLRB			; make the full base address
	ADDD	#VAR-DPBAS	; add the offset
	TFR	D,X		; move it to X
	...
*
* Or, if the assembler lets us split the offset up with advanced math:
	...
	TFR	DP,A
	LDB	#(VAR-DPBAS)&$FF	; bit-and mask -- no carry!
	ADDA	#(VAR-DPBAS)/$100	; add the high byte
	TFR	D,X
	...
* 
* As subroutines --
* unsigned offset in B:
LEADPUX	TFR	DP,A		; pull the base address high byte into A
	TFR	D,X		; move it to X
	RTS
*
* unsigned offset in D:
LEADPDX	TFR	D,X		; park the offset in X (D is destroyed below)
	TFR	DP,A		; pull the base address high byte into A
	CLRB			; make the full base address
	LEAX	D,X		; add the base to the offset in X
	RTS
*
* Because DP is not in the index post-byte,
* in some applications, it may be better to keep 
* LOCBAS as a pseudo-register,
* in which case it would look like this --
* for small offsets < 128: 
ADDLBB	LDX	<LOCBAS	; but do this in-line!
	LEAX	B,X
	RTS
* for 127 < offset < 256, maybe, maybe not:
ADDLBU	CLRA		; unsigned offset
* for larger offsets
ADDLBD	LDX	<LOCBAS	; and definitely do this in-line, too!
	LEAX	D,X
	RTS	

As with the previous two chapters, I have not tested the code. It should run, modulo typos.
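
The three in-line ways of turning DP plus an offset into a full address can be compared with a small model. This is a Python sketch with hypothetical function names. It assumes DP holds the high byte of a page-aligned base (so the low byte of the base is zero) and that the offset VAR-DPBAS is non-negative.

```python
def ea_full_add(dp, offset):
    """Model of TFR DP,A / CLRB / ADDD #offset: a full 16-bit add onto DP*256."""
    return ((dp << 8) + offset) & 0xFFFF


def ea_small_offset(dp, offset):
    """Model of LDB #offset / TFR DP,A: only valid for offsets 0..255."""
    if not 0 <= offset <= 0xFF:
        raise ValueError("offset must fit in one unsigned byte")
    return (dp << 8) | offset


def ea_split_offset(dp, offset):
    """Model of TFR DP,A / LDB #offset&$FF / ADDA #offset/$100."""
    low = offset & 0xFF                  # the base's low byte is zero, so no carry
    high = (dp + (offset >> 8)) & 0xFF   # high byte of the offset added to DP
    return (high << 8) | low


if __name__ == "__main__":
    print(hex(ea_small_offset(0x20, 0x34)))   # 0x2034
    print(hex(ea_full_add(0x20, 0x0134)))     # 0x2134
    print(hex(ea_split_offset(0x20, 0x0134))) # 0x2134
```

The split version agrees with the full add because the low byte of a page-aligned base is zero, so loading the low byte of the offset can never generate a carry into the high byte.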

Even though I keep saying things like "in-line this", and "you don't need that", it may be hard to visualize the impact that the 6809 addressing modes have on addressing math until we compare the stack frame code for the 6800 and 6801 to the stack frame code for the 6809.

Likewise the 68000. But let's get an overview of addressing math on the 68000 before we take a look at a concrete example of stack frames on the 6801. And on our way to addressing math on the 68000, let's take a detour for multi-byte negation on the 6809.

Or you can jump ahead to getting numeric output in binary.

