joel's programming fun: September 2024

Sunday, September 29, 2024

ALPP 02-08 -- On the Beach with Parameters -- 16-bit Arithmetic on the 6800

On the Beach with Parameters --
16-bit Arithmetic
on the 6800

So we pretty much snuck the meat of the 16-bit arithmetic in already, didn't we? We were passing byte parameters in and widening them in the last note and the previous two chapters, but we were doing 16-bit math.

Parameters. Oh, yeah. Those.

I wrote a couple of walls of text about parameters, and then decided I should show you code instead, or at least first. (Yes. Again.)

Let's define some library-style functions to add and subtract on the 6800, using the split stack parameter passing paradigm I keep talking about. Then I can philosophize a wall of text and maybe not put everyone to sleep.

We'll borrow this code from the improved Hello World examples, to declare the stack pointers and set the stacks up, and to push and pop both accumulators:


    	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	TSX		; point to return address
	LDX	0,X	; return address in X
	INS		; drop the return pointer on stack
	INS
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
PPOPD	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	INX
	INX
	STX	PSP
	RTS
*
PPUSHD	LDX	PSP
	DEX
	DEX
	STX	PSP
	STAA	0,X
	STAB	1,X
	RTS

We want to be able to load immediate values to the A:B pair. Some assemblers would allow us to load them something like this:


    VALUE	EQU	$1234
	...
	LDAA	#VALUE/256
	LDAB	#VALUE-(VALUE/256)*256

Some will even allow loading an address like this:


    BUFFER	RMB	80	; text buffer
	...
	LDAA	#BUFFER/256
	LDAB	#BUFFER-(BUFFER/256)*256

But the one we are presently using in EXORsim will not do either, at least not at this time.

Even assemblers that allow the former may not allow the latter, under the assumption that addresses should never be divided or multiplied in legitimate code. Treating addresses like integers has traditionally been considered evidence of operator error on the programmer's part, and many assemblers will complain if you do.
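If you want to check a high/low split by hand before trusting an assembler expression, a few lines of Python will do the same integer arithmetic. This is only a sketch for checking numbers, not 6800 code, and the function name split16 is my own invention:

```python
def split16(value):
    """Split a 16-bit value into (high byte, low byte) using only
    integer division, multiplication, and subtraction."""
    value &= 0xFFFF                 # keep it to 16 bits
    high = value // 256             # what #VALUE/256 should produce
    low = value - high * 256        # what is left after removing the high byte
    return high, low


high, low = split16(0x1234)
print(f"${high:02X} ${low:02X}")
```

For $1234, the high byte comes out as $12 and the low byte as $34. Note that the low byte needs the high byte multiplied back up by 256 before the subtraction.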

We could go looking for an assembler that will do what we want, but for now we want a workaround. (And some people think the following run-time "syntactic sugar" makes code more "readable", anyway.)


    *
* Load a constant from the instruction stream into A:B, 
* continue execution after the constant.
* This is not self-modifying code, even though it feels like a trick
* and is playing with the return stack and instruction stream 
* in ways we wouldn't think we wanted to think we should.
* Call it a "necessary" bit of run-time syntactic sugar.
*
* Use it like this:
*	JSR	LD16I	; load A:B immediate
*	FDB	$1234	; "immediate" 16-bit value to load
*	JSR	SOMEWHERE ; or some other executable code.
*
LD16I	TSX		; point to top of return address stack
	LDX	0,X	; point into the instruction stream
	LDAA	0,X	; high byte from instruction stream
	LDAB	1,X	; low byte from instruction stream
	INS		; drop the return address we don't need
	INS
	JMP	2,X	; return to the byte after the constant.

What are you looking at me like that for? Yeah, this little bit of code to enable some syntactic sugar looks really strange when the concept of a return address is still fuzzy in your mind. And it seems so unnecessary. It takes space to define, and each call (a 3-byte JSR plus the 2-byte constant) takes a byte more space in code than the pair of 2-byte immediate loads it replaces. WHY????

Well, if you study virtual machines like, for instance, the fig Forth run-time (the code for LIT), or Steve Wozniak's Sweet 16 VM that supplied 16-bit routines for some Apple II software, you recognize what it's doing. If you have a VM, it can be a way to save some bytes of object code, but what we're really trading is management time for runtime.
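If the return-address trick is still fuzzy, it may help to model it outside the CPU. The following Python sketch is a toy under stated assumptions, not an emulator: memory is a dictionary, the return stack is a list, and the addresses ($2100 and so on) are made up for illustration. It only shows the bookkeeping that LD16I does.

```python
def ld16i(memory, return_stack):
    """Model of LD16I: fetch the 16-bit constant that follows the JSR,
    then resume execution just past it."""
    ret = return_stack.pop()        # the address JSR pushed (dropped by INS INS)
    a = memory[ret]                 # LDAA 0,X -- high byte of the constant
    b = memory[ret + 1]             # LDAB 1,X -- low byte of the constant
    resume_at = ret + 2             # JMP 2,X  -- skip over the constant
    return a, b, resume_at


# Hypothetical layout: JSR LD16I at $2100 (3 bytes), FDB $1234 at $2103.
memory = {0x2103: 0x12, 0x2104: 0x34}
return_stack = [0x2103]             # what JSR would have pushed

a, b, resume_at = ld16i(memory, return_stack)
print(f"A=${a:02X} B=${b:02X} resume at ${resume_at:04X}")
```

The "return address" is really just a pointer to whatever follows the JSR. Here that happens to be data, so we read it and then return two bytes later.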

At a cost of a few cycles of (your) runtime, I can avoid the trouble of chasing down the bug in EXORsim, getting Joe H. Allen's attention, potentially discussing whether addresses should be allowed to have division done on them, etc., or, in the alternative, fixing it myself and forking the code like I did for the odd-ball EXORsim6801.

And you thought optimization was simple code size vs. speed. :)

Now we will define our addition and subtraction subroutines:


    * input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	LDX	PSP
	LDAB	3,X	; left low
	LDAA	2,X	; left high
	ADDB	1,X	; right low
	ADCA	0,X	; right high, with carry
	STAB	3,X	; sum low
	STAA	2,X	; sum high
	INX		; adjust parameter stack
	INX
	STX	PSP
	RTS
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	LDX	PSP
	LDAB	3,X	; left low
	LDAA	2,X	; left high
	SUBB	1,X	; right low
	SBCA	0,X	; right high, with borrow
	STAB	3,X	; difference low
	STAA	2,X	; difference high
	INX		; adjust parameter stack
	INX
	STX	PSP
	RTS

If you're wondering whether the processor flags are correct after all that, only the Carry flag makes it through the stack pointer update unscathed. Moreover, if you're watching, you should notice that the Zero flag does not show whether the entire 16 bits of the result are zero, only one byte at a time, the high byte last here.
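To see the Zero flag problem with actual numbers, here is a rough Python model of the byte-at-a-time subtraction. It is a sketch for checking results by hand, assuming unsigned 16-bit inputs. The function name and the two "z" return values are mine, not anything the 6800 provides.

```python
def sub16_bytewise(left, right):
    """Subtract right from left one byte at a time, the way
    SUBB/SBCA do, and report what the flags can and cannot tell us."""
    left &= 0xFFFF
    right &= 0xFFFF
    low = (left & 0xFF) - (right & 0xFF)        # SUBB
    borrow = 1 if low < 0 else 0
    low &= 0xFF
    high = (left >> 8) - (right >> 8) - borrow  # SBCA
    carry = 1 if high < 0 else 0                # final Carry (borrow out)
    high &= 0xFF
    z_high_only = (high == 0)                   # what Z says after storing the high byte
    z_all_16 = ((high | low) == 0)              # what we usually want Z to say
    return (high << 8) | low, carry, z_high_only, z_all_16


result, carry, z_high_only, z_all_16 = sub16_bytewise(0x0134, 0x0100)
print(f"${result:04X} C={carry} Z(high byte)={z_high_only} Z(16 bits)={z_all_16}")
```

With $0134 - $0100 the difference is $0034, which is not zero, but the high byte is, so a Z flag taken from the high byte alone would claim a zero result.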

We can sort of fix the flags, something like this (untested):


    SUB16F	LDX	PSP
	LDAB	3,X	; left low
	LDAA	2,X	; left high
	SUBB	1,X	; right low
	SBCA	0,X	; right high, with borrow
	STAB	3,X	; difference low
	STAA	2,X	; difference high
* In this version, we will set the flags almost as if it were SUBD:
	TPA		; save the flags
	ANDA	#$FB	; clear the Z flag in the copy
	STAA	0,X	; re-use this byte to save the copied flags
	ORAB	2,X	; OR low byte with high to set the correct Z flag
	TPA
	ANDA	#$04	; clear all but Z
	ORAA	0,X	; combine corrected Z with copied flags
	PSHA		; which is worse? return stack or DP?
	INX		; adjust parameter stack before restoring the flags
	INX
	STX	PSP
	PULA		; get the flags back
	TAP		; replace the flags
	RTS

Wow, that's a lot of code! And we would want to test it thoroughly before using it for anything important. (It should work, but ...)

Pay particular attention to the order things are done:

1. We save the best copy of the flags.
2. Before we update the stack pointer, we borrow some of the stack space that is no longer in use to calculate what the flags should have been.
3. Then we save the corrected flags to the safest place we can think of.
4. Before updating the CPU flags, we update the stack pointer, so that updating the stack pointer will not thrash the flags we just calculated.
5. Then we get the flags back and restore them in the CPU.
6. RTS does not affect the flags. (This is a deliberate design decision by the CPU architects.)

So you can see how it could be done -- but we usually don't need all the flags corrected. (And, in fact, we didn't clear the Half-carry flag!)
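The masking in the middle of SUB16F is easier to follow with real bit patterns. This Python sketch models only the two ANDA instructions and the ORAA, assuming the 6800 condition code layout (Z in bit 2, C in bit 0, top two bits always read as 1 by TPA). The function name is hypothetical.

```python
Z_FLAG = 0x04       # Zero is bit 2 of the 6800 condition code register


def merge_flags(ccr_after_stores, ccr_after_or):
    """Combine the flags saved after the stores with the Z flag
    produced by OR-ing the two result bytes together."""
    saved = ccr_after_stores & 0xFB     # ANDA #$FB -- drop the misleading Z
    true_z = ccr_after_or & Z_FLAG      # ANDA #$04 -- keep only the corrected Z
    return saved | true_z               # ORAA 0,X  -- this is what TAP restores


# Result $0034: the high byte is zero, so Z is (wrongly) set after STAA.
# $C4 = top two bits plus Z; $C0 = the same with Z clear after the OR.
merged = merge_flags(0xC4, 0xC0)
print(f"${merged:02X}")
```

Everything except Z comes from the first copy of the flags, and Z alone comes from the second copy.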

I'll show some alternate approaches later.

Let's put this all together with some test code. You'll want to pay close attention to what happens in the CPU when it executes each of the new routines, but especially LD16I.


    * 16-bit addition and subtraction for 6800 on parameter stack, with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
PSP	RMB	2	; parameter stack pointer
SSAVE	RMB	2	; a place to keep S so we can return clean
*
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	JMP	START
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
PSTKLIM	RMB	64	; 16 levels of call at two parameters per call
PSTKBAS	RMB	2	; bumper space -- parameter stack is pre-dec
*
*
INISTKS	LDX	#PSTKBAS	; Set up the parameter stack
	STX	PSP
	TSX		; point to return address
	LDX	0,X	; return address in X
	INS		; drop the return pointer on stack
	INS
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
PPOP16	LDX	PSP
	LDAA	0,X
	LDAB	1,X
	INX
	INX
	STX	PSP
	RTS
*
PPSH16	LDX	PSP
	DEX
	DEX
	STX	PSP
	STAA	0,X
	STAB	1,X
	RTS
*
* Load a constant from the instruction stream into A:B, 
* continue execution after the constant.
* This is not self-modifying code, even though it feels like a trick
* and is playing with the return stack and instruction stream 
* in ways we wouldn't think we wanted to think we should.
* Call it a "necessary" bit of run-time syntactic sugar.
*
* Use it like this:
*	JSR	LD16I	; load D immediate
*	FDB	$1234	; "immediate" 16-bit value to load
*	JSR	SOMEWHERE ; or some other executable code.
*
LD16I	TSX		; point to top of return address stack
	LDX	0,X	; point into the instruction stream
	LDAA	0,X	; high byte from instruction stream
	LDAB	1,X	; low byte from instruction stream
	INS		; drop the return address we don't need
	INS
	JMP	2,X	; return to the byte after the constant.
*
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	LDX	PSP
	LDAB	3,X	; left low
	LDAA	2,X	; left high
	ADDB	1,X	; right low
	ADCA	0,X	; right high, with carry
	STAB	3,X	; sum low
	STAA	2,X	; sum high
	INX		; adjust parameter stack
	INX
	STX	PSP
*
	RTS
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	LDX	PSP
	LDAB	3,X	; left low
	LDAA	2,X	; left high
	SUBB	1,X	; right low
	SBCA	0,X	; right high, with borrow
	STAB	3,X	; difference low
	STAA	2,X	; difference high
	INX		; adjust parameter stack
	INX
	STX	PSP
	RTS
*
*
START	JSR	INISTKS
*
	JSR	LD16I
	FDB	$1234	; (FDB seems to want a comment.)
	JSR	PPSH16
	JSR	LD16I
	FDB	$CDEF	; (FDB seems to want a comment.)
	JSR	PPSH16
	JSR	ADD16	; result should be $E023
	JSR	LD16I
	FDB	$8765	; (FDB seems to want a comment.)
	JSR	PPSH16
	JSR	SUB16	; result should be $58BE
	LDX	PSP
	LDAB	1,X	; load the result into A:B
	LDAA	0,X
*
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

If something doesn't work, go back and make sure you've copied everything correctly.

Once you've stepped through it, you might want to try other constants.
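When you try other constants, you'll want to know what results to expect. Here is a minimal Python model of the test frame, assuming nothing more than 16-bit wraparound arithmetic. The Python list grows upward where the real parameter stack grows downward, and the function names only loosely mirror the assembly routines.

```python
def ppsh16(stack, value):
    """Push a 16-bit value on the (modeled) parameter stack."""
    stack.append(value & 0xFFFF)


def add16(stack):
    """Replace left and right on the stack with their 16-bit sum."""
    right = stack.pop()
    left = stack.pop()
    stack.append((left + right) & 0xFFFF)


def sub16(stack):
    """Replace left and right on the stack with left minus right."""
    right = stack.pop()
    left = stack.pop()
    stack.append((left - right) & 0xFFFF)


def run_test_frame(first, second, third):
    """Same sequence as START: push, push, add, push, subtract."""
    stack = []
    ppsh16(stack, first)
    ppsh16(stack, second)
    add16(stack)
    total = stack[-1]
    ppsh16(stack, third)
    sub16(stack)
    return total, stack[-1]


total, difference = run_test_frame(0x1234, 0xCDEF, 0x8765)
print(f"sum ${total:04X}, difference ${difference:04X}")
```

With the constants from the listing, this gives $E023 for the sum and $58BE for the difference, matching the comments in the test code.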

Before we move on to equivalent code for the 6801, let's compare how it would look with an interleaved (combined) parameter and return stack -- you know, the single stack discipline I keep disparaging. Here's a comparable test frame for the single stack:


    * 16-bit addition and subtraction for 6800 on return stack,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
SSAVE	RMB	2	; a place to keep S so we can return clean
*
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	JMP	START
	NOP		; bump to aligned
	RMB	2	; a little bumper space
SSTKLIM	RMB	95	; (64+31) roughly 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
*
*
INISTKS	TSX		; point to return address
	LDX	0,X	; return address in X
	INS		; drop the return pointer on stack
	INS
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
*
* Don't need PPOP and PPSH
*
* Load a constant from the instruction stream into A:B, 
* continue execution after the constant.
* This is not self-modifying code, even though it feels like a trick
* and is playing with the return stack and instruction stream 
* in ways we wouldn't think we wanted to think we should.
* Call it a "necessary" bit of run-time syntactic sugar.
*
* Use it like this:
*	JSR	LD16I	; load D immediate
*	FDB	$1234	; "immediate" 16-bit value to load
*	JSR	SOMEWHERE ; or some other executable code.
*
LD16I	TSX		; point to top of return address stack
	LDX	0,X	; point into the instruction stream
	LDAA	0,X	; high byte from instruction stream
	LDAB	1,X	; low byte from instruction stream
	INS		; drop the return address we don't need
	INS
	JMP	2,X	; return to the byte after the constant.
*
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	TSX
	LDAB	5,X	; left low
	LDAA	4,X	; left high
	ADDB	3,X	; right low
	ADCA	2,X	; right high, with carry
ADD16S	STAB	5,X	; sum low
	STAA	4,X	; sum high
	LDX	0,X	; before we deallocate it
	INS		; drop return address
	INS
	INS		; drop right-hand addend
	INS
	JMP	0,X	; return
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	TSX
	LDAB	5,X	; left low
	LDAA	4,X	; left high
	SUBB	3,X	; right low
	SBCA	2,X	; right high, with borrow
	BRA	ADD16S	; Steal code.
* Could steal code this way in the parameter stack example, as well.
*
*
START	JSR	INISTKS
*
	JSR	LD16I
	FDB	$1234	; (FDB seems to want a comment.)
	PSHB		; push in correct order
	PSHA
	JSR	LD16I
	FDB	$CDEF	; (FDB seems to want a comment.)
	PSHB
	PSHA
	JSR	ADD16	; result should be $E023
	JSR	LD16I
	FDB	$8765	; (FDB seems to want a comment.)
	PSHB
	PSHA
	JSR	SUB16	; result should be $58BE
	PULA
	PULB
*
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

On casual inspection and quick step-through, it looks simpler. It definitely runs faster.

Being able to use the processor's native PSHA/PSHB and PULA/PULB instructions instead of the PPSH16/PPOP16 routines definitely seems to be a plus.

But deeper inspection reveals some tricky games dodging the return address, games that, if you get them wrong, crash the program in amusing ways just when you really didn't want to be amused.

I know you don't believe me, but hold on to your doubts for a moment.
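To make the dodging concrete, here is a Python sketch of what the single stack looks like when ADD16 is entered. It assumes the 6800's push behavior (store, then decrement S) and that JSR pushes the low byte of the return address first. The starting stack pointer and the return address are made-up values.

```python
def push_byte(memory, sp, value):
    """6800-style push: store at S, then decrement S."""
    memory[sp] = value & 0xFF
    return sp - 1


def stack_at_entry(left, right, return_address, sp=0x2061):
    """Build the stack as ADD16 sees it and return (memory, X after TSX)."""
    memory = {}
    for word in (left, right):
        sp = push_byte(memory, sp, word)        # PSHB: low byte first
        sp = push_byte(memory, sp, word >> 8)   # PSHA: high byte ends up lower
    sp = push_byte(memory, sp, return_address)       # JSR pushes PC low ...
    sp = push_byte(memory, sp, return_address >> 8)  # ... then PC high
    return memory, sp + 1                       # TSX gives X = S + 1


memory, x = stack_at_entry(0x1234, 0xCDEF, 0x20C8)
for offset in range(6):
    print(f"{offset},X = ${memory[x + offset]:02X}")
```

The return address sits at 0,X and 1,X, on top of the parameters, which is why the single-stack ADD16 reads its operands at offsets 2 through 5 and has to fetch the return address into X before it can drop anything.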

For further reference, here's a comparable set of routines and test code that uses a scratch area in the DP to pass values in and out. You could call this using direct page static globals as pseudo-registers:


    * 16-bit addition and subtraction for 6800 via scratch pad,
* with test code
* Joel Matthew Rees, October 2024
*
NATWID	EQU	2	; 2 bytes in the CPU's natural integer
*
*
* Blank line will end assembly.
	ORG	$80	; MDOS and EXbug docs say it should be okay here.
ENTRY	JMP	START
	NOP		; Just want even addressed pointers for no reason.
SSAVE	RMB	2	; a place to keep S so we can return clean
* parameter/scratch area for leaf functions only:
NLFT	RMB	2	; binary operator left side parameter
NRT	RMB	2	; binary operator right side parameter
NRES	RMB	2	; unary/binary operator result
NTEMP	RMB	2	; general scratch register for 
NPAR	EQU	NLFT	; unary operator parameter
NSCRAT	EQU	NLFT	; 
*
*
	ORG	$2000	; MDOS says this is a good place for usr stuff
NOENTRY	JMP	START
	NOP		; bump to aligned
	RMB	2	; a little bumper space
SSTKLIM	RMB	31	; roughly 16 levels of call, max
SSTKBAS	RMB	1	; 6800 is post-dec (post-store-decrement) push
	RMB	2	; a little bumper space
*
*
INISTKS	TSX		; point to return address
	LDX	0,X	; return address in X
	INS		; drop the return pointer on stack
	INS
	STS	SSAVE	; Save what the monitor gave us.
	LDS	#SSTKBAS	; Move to our own stack
	JMP	0,X	; return via X
*
*
* Don't need PPOP and PPSH
*
* Load a constant from the instruction stream into A:B, 
* continue execution after the constant.
* This is not self-modifying code, even though it feels like a trick
* and is playing with the return stack and instruction stream 
* in ways we wouldn't think we wanted to think we should.
* Call it a "necessary" bit of run-time syntactic sugar.
*
* Use it like this:
*	JSR	LD16I	; load D immediate
*	FDB	$1234	; "immediate" 16-bit value to load
*	JSR	SOMEWHERE ; or some other executable code.
*
LD16I	TSX		; point to top of return address stack
	LDX	0,X	; point into the instruction stream
	LDAA	0,X	; high byte from instruction stream
	LDAB	1,X	; low byte from instruction stream
	INS		; drop the return address we don't need
	INS
	JMP	2,X	; return to the byte after the constant.
*
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit sum
ADD16	LDAB	NLFT+1	; low
	LDAA	NLFT	; high
	ADDB	NRT+1	; low
	ADCA	NRT	; high, with carry
ADD16S	STAB	NRES+1	; sum low
	STAA	NRES	; sum high
	RTS
*
* input parameters:
*   16-bit left, right
* output parameter:
*   16-bit difference
SUB16	LDAB	NLFT+1	; low
	LDAA	NLFT	; high
	SUBB	NRT+1	; low
	SBCA	NRT	; high, with borrow
	BRA	ADD16S	; Steal code (5 bytes for 2)
* Could steal code this way in the parameter stack example, as well.
*
*
START	JSR	INISTKS
*
	JSR	LD16I
	FDB	$1234	; (FDB seems to want a comment.)
	STAB	NLFT+1
	STAA	NLFT
	JSR	LD16I
	FDB	$CDEF	; (FDB seems to want a comment.)
	STAB	NRT+1
	STAA	NRT
	JSR	ADD16	; result should be $E023
	LDAB	NRES+1
	LDAA	NRES
	STAB	NLFT+1
	STAA	NLFT
	JSR	LD16I
	FDB	$8765	; (FDB seems to want a comment.)
	STAB	NRT+1
	STAA	NRT
	JSR	SUB16	; result should be $58BE
	LDAB	NRES+1
	LDAA	NRES
	NOP
	NOP
* Repeat, without all the pushing and popping and jumping around:
	LDAB	#$34
	LDAA	#$12
	ADDB	#$EF
	ADCA	#$CD
	SUBB	#$65
	SBCA	#$87
*
DONE	LDS	SSAVE	; restore the monitor stack pointer
	NOP
	NOP		; landing pad

This probably looks even simpler.

It's not easy to see how complicated this becomes, how quickly, with a small test program like this, but it will become plain shortly (for some definition of shortly and some definition of plain).

And you should be looking at those six lines of code right before the DONE label and scratching your head.

All of that? Just to write the equivalent of the following?


    	LDAB	#$34
	LDAA	#$12
	ADDB	#$EF
	ADCA	#$CD
	SUBB	#$65
	SBCA	#$87

That is essentially what the test frame does. But, of course, we didn't write all of that just to write the test frame. We wrote it to allow us to do things well beyond what the test frame does.

And, from the cynical point of view, to waste some of the application's run-time cycles to reduce the design-time burden.

But, no, not just that. There are things you cannot reduce to constants at design- or compile-time.

Things will become a bit clearer after we get a look at this for the 6801, 6809, and 68000, after we get a look at reading keys from the keyboard, and maybe a bit of other prep so we can take up a simple project to prove that we can make something directly useful from assembly language.

In the meantime, let's get a look at how this looks on the 6801.

(Title Page/Index)

ALPP 02-XX -- Missed the Beach with Parameters -- 16-bit Arithmetic on the 6800, 6801, and 6809

This attempt at a chapter hit a wall of text and went somewhere south. Keeping it for records.
You probably want to go here, instead: https://joels-programming-fun.blogspot.com/2024/09/alpp-02-08-on-beach-with-parameters-16-bit-arithmetic-6800.html

On the Beach with Parameters --
16-bit Arithmetic
on the 6800, 6801, and 6809

So we snuck the 16-bit arithmetic in already, didn't we? We were passing byte parameters in and widening them in the last note and the previous two chapters.

Parameters.

(Wall of text warning!)

Those were the numbers we were feeding in, via variables such as the ones labeled NLFT and NRT.

In software engineering, the parameters become, not just the numbers/values passed in to an automaton, but the variables that are the means of passing them in to the defined software function that implements an automaton.

A mathematical function is an abstract object that defines an abstract automaton's behavior, often invoking an abstract algorithm.

But a software function is a concrete implementation of a mathematical function, with the various sorts of limits that implementation implies. The implementation is via a procedural implementation of the algorithm, again with limits implied.

A mathematical parameter is a value which affects the operation of an automaton, including numeric or abstract values which are "input" into the automaton function.

A software parameter is also the device by which a particular parameter value is input, and we often include, since the output devices have a lot in common with the input devices, output devices as parameters.

In other words, result variables are similar to parameter variables in so many ways, it can be hard to distinguish them except by knowing that one is input and one is output, and it can be useful to consider them of the same class of object, objects used for moving data between parts of the code.

Scratch, or temporary, variables are also of the same class of object.

Hopefully, as we work through some of the descriptions of the mechanisms, I can make it clear which I'm talking about.

So far, except for the Hello World chapters, we've been allocating our parameter variables statically and globally. They are "static" because they persist from beginning to end, and before and after, really. And they are "global" because they are visible and known in all parts of the code.

When it's just a few functions, it's not hard to keep track of statically allocated globals, but, when it's hundreds or thousands of functions (or hundreds of thousands), tracking all those variables and their names can get a bit confusing. And the bytes of program space they use can get prohibitive.

And you can often have way too much fun finding out that you've given two (or more) variables the same name, so that when you think you're changing the parameters to one function, you're actually coincidentally changing the parameters to another at the same time. And either function or both misbehaves.

And, because all those variables consume all that RAM space, you tend to try to share variables between functions that you know shouldn't be in-flight at the same time, and then you forget, and fail to keep them from being in-process at the same time, and things get even more confusing.

So it's actually an important tool to have a means of defining parameter and variable names that don't need to be global as "local" to a function -- that is, you make their names invisible outside the function in which they are defined and used.

Unfortunately, although some assemblers provide means for making definitions in some sense local, the syntax and, too often, the semantics of those tools are not shared between the standard assemblers for each CPU.

If we keep our functions and our projects small, we don't actually need to have parameter names that the assembler sees at all. Sure, they can help when the definition mechanism doesn't get in the way, but we can work around not having them. Comments can suffice.

(With large projects where we can justify the heavy use of optimizers and mechanical correctness analysis, we will want named and characterized parameters, but we will not be tackling such large projects in this tutorial.)

Gack. I did not want to do one of these walls of text here. But we need it. One more point and then we can get back to code.

It's even more important to have a means of physically (so to speak) allocating and accessing parameters and variables that don't need to persist between invocations of a function, such that they don't even exist when the function is not in-process.

These parameters have traditionally been called "dynamic", since that seems to indicate their dynamic mode of existence. ("Ephemeral" just sounds a little too ghostly, I guess.)

And that is what we are going to talk about in this chapter.

In the Hello World chapters, we saw three ways to allocate and access parameters that are not static in persistence. Well, two and a half, maybe.

One is by passing them via CPU registers, and, if we need to keep them safe, we push them before we call other function routines, and pop them back when we're done. (This is the half-method.)

A second is by putting them directly on the same stack as the return addresses, being careful to keep the pushes and the pops balanced, and being just as careful not to overwrite the return addresses, or, at least, try to be careful.

Often, in our efforts to be careful, we construct something called a "stack frame", which is time-consuming and, well, clunky.

Clunky is not necessarily bad, but it can be, and often is.

A third is by passing them via a separate parameter stack.

Separate parameter stacks inherently avoid the conflicts with the return addresses, and you can even construct stack frames which are not clunky when you have a separate parameter stack.

The separate parameter stack requires additional maintenance, and many engineers eschew the additional maintenance as if the return stack didn't require maintenance.

Assuming the return address stack doesn't require maintenance is a very dangerous practice.

Conversely, maintaining the return address stack properly is about half-way to maintaining the separate parameter stack properly. It's a matter of keeping track of how many calls deep your code can go where and when, and how many bytes are needed at each level. It sounds intractable, but it isn't really.

Heh. The final point was another wall of text, and you're thinking I've forgotten about 16-bit arithmetic. But, since we've already done the arithmetic itself, I'm going to use doing it properly as an opportunity to show how to pass parameters.

Ack. One more low wall-of-text. Sorry.

When a programmer wants to call a subroutine, he or she usually doesn't want to think too deeply about how the subroutine does its job. Just pass the parameters in the appropriate places, call it, and use the results.

On the other hand, the optimizer (whether mechanical or human) wants to know what's going on inside, make some efficiency judgements based on some given criteria, and decide whether to actually call the code or pull the non-parameter-handling code into the caller routine, in-line.

In high-level languages, this pulling code in is often called in-lining, but at the assembly language level, it's often called macro-expansion, because of something many assemblers have called macro definition (which we probably need to look at sometime).

In this chapter, I'm going to show how to define functions that implement 16-bit addition and subtraction, passing the parameters on the separate parameter stack. When we've looked at that, we can take quick detours to look at the other two approaches, which we will do in other chapters.

While we're here, subtraction on the 6800 is a bit more complicated, as I alluded to above, because we do, in fact, usually want to have the Z flag tell the whole story.

To see what I mean, let's convert the straight 16-bit addition source for the 6800 from above to subtraction:


    NLFT	FDB	132	; 132 in two bytes, high byte zero
NRT	FDB	188	; 188 in two bytes, high byte zero
RES	RMB	2	; 2-byte result
	...
	LDAB	NLFT+1	; Get the left low byte.
	LDAA	NLFT	; Get the left high byte.
	SUBB	NRT+1	; Subtract the right low byte.
	SBCA	NRT	; Subtract the right high byte.
	STAB	RES+1	; Store the result low byte.
	STAA	RES	; Store the result high byte.

The carry/borrow flag is unaffected by storing the result, so it's okay. (The H Half-carry and V oVerflow flags, we haven't talked about, so we won't at this point.) Since we stored the less significant byte and then the more significant byte, in that order, the N sign bit flag reflects the high bit of the A accumulator, or the more significant byte, which is correct.

So branch on carry set or clear, and branch on plus/minus branches will both work after the second store.

But the Z Zero flag only represents the final STAA instruction, so it only tells us whether the high byte is zero or not, which is almost never what we want to know.

Subtraction is often used to compare two numbers. If the result is positive, the left side is greater. If the result is negative, the right side is greater. And if the result is zero, both sides were equal.

With what we have, the Carry flag being set tells us that the right side was greater. But the Carry flag being clear tells us that either the left side was greater or the two numbers were equal.
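Here is that three-way decision as a small Python sketch, treating both numbers as unsigned 16-bit values. It models the Carry and a whole-result Zero test, not the 6800 itself, and the function name is my own.

```python
def compare16(left, right):
    """Classify two unsigned 16-bit numbers the way a subtract
    followed by branches on Carry and Zero would."""
    left &= 0xFFFF
    right &= 0xFFFF
    difference = (left - right) & 0xFFFF
    carry = left < right            # borrow out of the high byte
    zero = difference == 0          # needs BOTH result bytes to be zero
    if carry:
        return "right is greater"
    if zero:
        return "equal"
    return "left is greater"


print(compare16(132, 188))
```

Without a Zero test that covers both bytes, the last two cases cannot be told apart.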

As with adding, we can fix that by OR-ing the two bytes together. In this case, since the result is both in memory and in the accumulators, that's straightforward: OR one byte, still in its accumulator, with the other byte in memory (ORAA with the low byte at RES+1, or ORAB with the high byte at RES), and then you can branch on zero and know that both bytes are represented in the Z flag. (We'll talk more about this later.) So, for example,


    NLFT	FDB	132	; 132 in two bytes, high byte zero
NRT	FDB	188	; 188 in two bytes, high byte zero
RES	RMB	2	; 2-byte result
FLAGS	RMB	1	; temporary for the flags
	...
	LDAB	NLFT+1	; Get the left low byte.
	LDAA	NLFT	; Get the left high byte.
	SUBB	NRT+1	; Subtract the right low byte.
	SBCA	NRT	; Subtract the right high byte.
	STAB	RES+1	; Store the result low byte.
	STAA	RES	; Store the result high byte.
* Usually, you don't want to go to all this trouble!
	TPA		; get the flags in A
	ANDA	#$FB	; clear the Z flag
	STAA	FLAGS
	ORAB	RES	; Set the correct Z flag
	TPA
	ANDA	#$04	; clear all but Z
	ORAA	FLAGS	; combine the flags
	TAP		; replace the flags
* Now C, N, and Z are right for branching. (V was already cleared by the stores.)

At some point, I need to talk about where the scratch RAM should be and why, but this should be enough for this note, I guess.

I don't recommend using the return stack, but we'll look at it, here on the 6809:


    NL1	FCB	34	; just an arbitrary small number
NR1	FCB	66	; another arbitrary small number
RES1	RMB	2	; 2-byte result
C1	EQU	RES1	; To look at the carry from the sum.
R1	EQU	RES1+1	; To look at the eight bit sum.
	...
	LDB	NR1	; Get the addend (right side).
	CLRA		; Clear storage for high byte.
	PSHS	A,B	; pushed in right order
	LDB	NL1	; Get the augend. A still clear.
	ADDD	,S++	; add the right side and pop it.
	STD	RES1	; Save the 9 bit result in 16 bits.

If we have the parameter stack set up, we can use that, instead:


    NL1	FCB	34	; just an arbitrary small number
NR1	FCB	66	; another arbitrary small number
RES1	RMB	2	; 2-byte result
C1	EQU	RES1	; To look at the carry from the sum.
R1	EQU	RES1+1	; To look at the eight bit sum.
	...
	LDB	NR1	; Get the addend (right side).
	CLRA		; Clear storage for high byte.
	PSHU	A,B	; pushed in right order
	LDB	NL1	; Get the augend. A still clear.
	ADDD	,U++	; add the right side and pop it.
	STD	RES1	; Save the 9 bit result in 16 bits.

The subtraction version would simply replace ADDD with SUBD, and it would be done.

Do you see the meta-similarities between the 6809 and 68000?

Did I mention before that the 6809 is not the predecessor to the 68000, nor is it 68000-lite? They were developed in parallel, taking the 6800 as a springboard, referring to a study Motorola made of code for the 6800, looking for ways to relieve bottlenecks in code and improve efficiency, with the two projects heading slightly different directions.

Oh, and the 6801 project actually began while they were getting silicon on the 6809, so the 6801 is actually a third direction, which Motorola followed up on with the very well-received 68HC11. Ah, the things that could have been.

I know, I've mentioned this before. I'm sure I have. I harp on it too much.

Saturday, September 28, 2024

ALPP 02-07 -- A Note on Byte Widening and Scratch Registers on the 6800, 6801, and 6809

A Note on Byte Widening and Scratch Registers
on the 6800, 6801, and 6809

I briefly mentioned, in the chapter introducing byte arithmetic on the 6800, 6801, and 6809, the necessity of expanding byte data to 16 bits to use the 6801 and 6809's ADDD and SUBD instructions. I gave a more concrete example of byte widening in the 68000's chapter on byte arithmetic that we just finished.

In the 68000, the data registers are plenty wide enough. All we need to do, in the case of unsigned byte widening, is to make sure that, before we load the byte, enough of the register is cleared out to hold the target width.

In the case of signed byte widening for the 68000, we can use the sign-EXTend instruction after loading the byte value, which we will take a look at later.

(Other 16-bit and wider CPUs have load instructions that automatically sign-extend or zero-extend registers on load, instead.)

Let's look at how we can widen a byte on the 6800.

Hmm? What? The 6800 doesn't have a 16-bit add, you say?

Well, yeah, but, (mutter, mumble, ...) actually, when we cleared A out before we used ADCA #0 and SBCA #0, we were effectively performing a byte widening. We just used an immediate zero byte as the high byte of the widened source.

Why are you looking at me like that?

Sigh. Okay, remember, add with carry (ADC) allows us to expand the width of the ADD instruction. So we can synthesize a sixteen-bit add quite easily:


NLFT	FDB	132	; 132 in two bytes, high byte zero
NRT	FDB	188	; 188 in two bytes, high byte zero
RES	RMB	2	; 2-byte result
	...
	LDAB	NLFT+1	; Get the left low byte.
	LDAA	NLFT	; Get the left high byte.
	ADDB	NRT+1	; Add the right low byte.
	ADCA	NRT	; Add the right high byte.
	STAB	RES+1	; Store the result low byte.
	STAA	RES	; Store the result high byte.

And there's your 16-bit add. Just takes 6 instructions. >:-)
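
For anyone who wants to poke at the carry chaining away from the simulator, this is a small Python sketch of the same idea. The names add8 and add16_bytewise are hypothetical, chosen just for this illustration. The first call plays the part of ADDB and the second the part of ADCA:

```python
def add8(a, b, carry_in=0):
    """Model ADDB (carry_in=0) or ADCA (carry_in=C): returns (sum byte, carry out)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8

def add16_bytewise(left, right):
    """Add two 16-bit values one byte at a time, low byte first."""
    low, carry = add8(left & 0xFF, right & 0xFF)        # ADDB: low bytes
    high, carry = add8(left >> 8, right >> 8, carry)    # ADCA: high bytes plus carry
    return (high << 8) | low, carry

result, carry = add16_bytewise(132, 188)
print(f"{result:04X} C={carry}")   # 0140 C=0
```

The carry out of the low bytes is the only thing connecting the two halves, which is the whole point of ADC.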

Comparing that to the 6801 and 6809, same declarations as above:


NLFT	FDB	132	; 132 in two bytes, high byte zero
NRT	FDB	188	; 188 in two bytes, high byte zero
RES	RMB	2	; 2-byte result
	...
	LDD	NLFT	; Get the left 16-bit word.
	ADDD	NRT	; Add the right 16-bit word.
	STD	RES	; Store the result 16-bit word.

3 instructions.

Okay, it takes a few more than 6 on the 6800 if we need to get the flags perfectly right, but we don't usually need to get the flags perfect for addition. Hardly ever, really. The Carry is correct, and so is the Negative. The Zero only reflects the high byte, which is wrong, but we have ways to handle that. So we're good for most purposes on the 6800 -- for addition.

If we need to test for zero results, we can check that Carry is clear first, then OR the result low and high bytes together. If carry was set, we know the result was $10000 or greater, so, non-zero. ORing the low and high bytes together will leave non-zero if either was non-zero, so that completes the test. (I can make it sound simple, right? It's simple here because we have two copies of the result, one in memory and one in the accumulators. There is more to this, but I don't want to get too distracted. We'll come back to it.)
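
As a sketch only, here is that zero test written out in Python, treating the Carry as a seventeenth bit the way the paragraph above does. The function name is made up for the illustration:

```python
def is_zero_after_add(carry, high, low):
    """Zero test as described: Carry must be clear, and high OR low must be zero."""
    if carry:                      # the sum was $10000 or more, so not zero
        return False
    return (high | low) == 0       # non-zero if either byte is non-zero

print(is_zero_after_add(0, 0x00, 0x00))  # True
print(is_zero_after_add(0, 0x01, 0x40))  # False
print(is_zero_after_add(1, 0x00, 0x00))  # False
```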

And in another point of fact, because loads and stores don't affect the carry flag, we could do all that math in 6 instructions just using a single accumulator instead of using both.

So, let's take that as a jumping-off point, declare the values we're adding as bytes again, and expand before adding. We'll start by declaring a byte of scratch RAM, preferably in the direct page:


SCRTCH	RMB	1	; preferably in DP
	...
NLFT	FCB	132	; 132 in one byte
NRT	FCB	188	; 188 in one byte
RES	RMB	2	; 2-byte result
	...
	CLRA		; A high byte of zero
	STAA	SCRTCH	; Another high byte of zero
	LDAB	NLFT	; Get the left byte.
	ADDB	NRT	; Add the right byte.
	ADCA	SCRTCH	; Add the widening bytes.
	STAB	RES+1	; Store the result low byte.
	STAA	RES	; Store the result high byte.

So, do you believe me now?

Let's look at that on the 6801 (and 6809, by translating LDAB to LDB):


SCRTCH	RMB	2	; preferably in DP
	...
NLFT	FCB	132	; 132 in one byte
NRT	FCB	188	; 188 in one byte
RES	RMB	2	; 2-byte result
	...
	CLRA		; A high byte of zero
	LDAB	NRT
	STD	SCRTCH	; widened 16-bit word
	LDAB	NLFT	; Get the left byte, A still clear.
	ADDD	SCRTCH	; Add the widened 16-bit word.
	STD	RES	; Store the 16-bit result.

Because the D register operators need 16-bit operands, widening for the 6801 and 6809 requires two bytes of scratch RAM instead of one -- unless you're doing the math one byte at a time, in which case you can still use the immediate zero argument to ADC as the source.

The above sources are slightly incomplete, so you might need to edit them a little to check what I am saying. Not much, but a little.

Subtraction is pretty similar, although we do care more about the flags after subtraction.

By widening before we add or subtract, the addition and subtraction sequences can be made quite parallel. You might want to simply substitute SUBtract for ADD and SuBtract with Carry for ADd with Carry in the above code to see what happens.
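
If you would like something to compare your results against, here is a rough Python model of the substituted sequence. The helper names are hypothetical, and the model only tracks the borrow and the two result bytes, not the other flags:

```python
def sub8(a, b, borrow_in=0):
    """Model SUBB (borrow_in=0) or SBCA (borrow_in=C): returns (difference byte, borrow out)."""
    diff = (a & 0xFF) - (b & 0xFF) - borrow_in
    return diff & 0xFF, 1 if diff < 0 else 0

def sub_widened(left_byte, right_byte):
    """Widen two unsigned bytes with zero high bytes, then subtract low byte first."""
    low, borrow = sub8(left_byte, right_byte)   # SUBB: low bytes
    high, borrow = sub8(0, 0, borrow)           # SBCA: zero high bytes minus borrow
    return high, low

for left, right in ((66, 34), (132, 188), (34, 66)):
    high, low = sub_widened(left, right)
    print(f"{left} - {right} -> {high:02X} {low:02X}")
```

With these inputs the three lines should come out as 00 20, FF C8, and FF E0, the same borrow/sign and difference bytes that show up in the 68000 chapter's memory dumps.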

I keep trying to go a bit further on the flags and such, but that is a distraction.

One thing that I need to make clear before we go: where the 68000, with lots of accumulator-equivalent data registers, shows how natural byte widening is, having only one 16-bit accumulator means byte widening on the 6801 and 6809 needs scratch registers.

And I'm thinking we can now, after you've played with the subtraction code, move forward to doing proper 16-bit arithmetic.

>:->

(Title Page/Index)

Sunday, September 22, 2024

ALPP 02-06 -- Introduction to Byte Arithmetic on the 68000

Introduction to Byte Arithmetic
on the 68000

(Title Page/Index)

If you haven't already worked through the introduction to byte arithmetic on the 6800, 6801, and 6809, you should. I explained the byte math there, so I'm going to focus on 68000 code here, and on the differences between the 68000 code and the 6800/6801/6809 code.

I'm going to show three different versions for the 68000. The first version follows, as closely as possible, the 6800/6801/6809 code.

But to reduce the back-and-forth between the editor, the assembler, and the debugger, and to reduce the number of files to edit, I'm putting the addition and subtraction code together into a single file:


	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	MACHINE MC68000	; because there are a lot the assembler can do.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************
* Adding and subtracting two 8-bit numbers in memory
* and storing them in memory,
* transliterating the 6800 code --
*
	EVEN
ENTRY	JMP	ADDM8


* The constants and result storage for the additions:

NL1	DC.B	34	; just an arbitrary small number
NR1	DC.B	66	; another arbitrary small number
C1	DS.B	1	; To hold one bit of carry from the sum.
R1	DS.B	1	; To hold the eight bit sum.

NL2	DC.B	132	; Somewhat larger arbitrary number
NR2	DC.B	188	; And another
C2	DS.B	1	; carry from 2nd sum
R2	DS.B	1	; 2nd sum


* I could have coordinated these better,
* But I'll rename the constants for the subtractions 
* to ML1, ML2, and ML3:

ML1	DC.B	66	; just an arbitrary small number
MR1	DC.B	34	; another arbitrary small number
RH1	DS.B	1	; To hold borrow/sign byte from the difference.
RL1	DS.B	1	; To hold the eight bit difference.

ML2	DC.B	132	; Somewhat larger arbitrary number
MR2	DC.B	188	; And another
RH2	DS.B	1	; borrow/sign from 2nd difference
RL2	DS.B	1	; 2nd difference

ML3	DC.B	34	; just an arbitrary small number
MR3	DC.B	66	; another arbitrary small number
RH3	DS.B	1	; To hold borrow/sign byte from the difference.
RL3	DS.B	1	; To hold the eight bit difference.



	EVEN
ADDM8	CLR.B	D6	; clear a register for the carry bit
	MOVE.B	NL1,D7	; Get the augend (left side addend).
	ADD.B	NR1,D7	; Add the addend (right side addend).
* 8-bit result is safely in D7.
	ROXL.B	#1,D6	; Recover the eXtend carry bit.
	MOVE.B	D6,C1	; Save it away.
	MOVE.B	D7,R1	; Save the sum itself.
*
	CLR.B	D6	; 2nd carry bit
	MOVE.B	NL2,D7	; 2nd augend (left side addend)
	ADD.B	NR2,D7	; 2nd addend (right side addend)
	MOVE.B	D7,R2	; 2nd sum, 8 bits
* MOVE does not alter X bit, carry eXtend still safe.
	ROXL.B	#1,D6	; 2nd carry bit
	MOVE.B	D6,C2	; Save it away.
* Could also use the ADDX.B with D5 pre-cleared method
* used in the subtraction examples below.
*
	NOP
	NOP


SUBM8	CLR.B	D5	; Need a constant zero. (No SUBX #0,Dn.)
	CLR.B	D6	; clear register D6 for the borrow/sign extension
	MOVE.B	ML1,D7	; Get the minuend (left side).
	SUB.B	MR1,D7	; Subtract the subtrahend (right side).
* Result is safely in D7.
	SUBX.B	D5,D6	; Recover the eXtend borrow bit, sign extended.
	MOVE.B	D6,RH1	; Save high byte away.
	MOVE.B	D7,RL1	; Save the difference low byte.
*
	CLR.B	D5	; should still be clear, actually.
	CLR.B	D6	; 2nd sign extension
	MOVE.B	ML2,RL2	; 2nd minuend to memory, just because we can
	MOVE.B	MR2,D7	; 2nd subtrahend in register
	SUB.B	D7,RL2	; 8-bit result safely saved away
	SUBX.B	D5,D6	; 2nd borrow, sign extended
	MOVE.B	D6,RH2	; Save high byte away.
*
	CLR.B	D5	; Again, should still be clear.
	CLR.B	D6	; 3rd borrow bit
	MOVE.B	ML3,D7	; 3rd minuend (left side)
	SUB.B	MR3,D7	; 3rd subtrahend (right side)
	MOVE.B	D7,RL3	; Save 8-bit result away
* MOVE does not alter X bit, borrow eXtend still safe.
	SUBX.B	D5,D6	; 3rd borrow, sign extended.
	MOVE.B	D6,RH3	; Save it away.
*
	NOP
	NOP

The comments pretty much explain what I've done with the variable names, also pretty much what is going on in the code.

Go ahead and open up a second browser window with the 6800/6801/6809 code and compare.

You can see that we can use memory and registers in much the same way as on the 8-bit processors, with some odd exceptions.

You'll note that having lots of data registers gives us more options in how to organize our code. In some places, I deliberately altered some steps to show that. But, in other places, I was forced to make some changes because of those odd exceptions.

Other than having lots of registers, specific things to pay attention to:

(1) Rotating bits into memory is limited to 16-bit wide memory targets. We can't rotate only 8 bit targets in memory. But we can rotate 8-bit targets in a data register, so we do that.

Speaking of rotating, rotates through carry on the 68000 do not rotate through the Carry flag!

They rotate through the eXtended carry flag, thus, we have the mnemonic, ROXL.

Why not ROL?

Lemme 'splain!

Rotating through carry is actually a 9-bit rotate.

That's 8 bits in the target plus the carry (on the 6800, etc.) or the eXtended carry (on the 68000).

So you don't really have an 8-bit rotate on the 6800/6801/6809.

Fortunately, 9-bit rotates are what is most commonly wanted, but, if 9 bits is too many, you have to short-circuit the carry with some odd logic that I'm not going to show yet.

The 68000 gives you true 8-bit rotates -- and 16-bit and 32-bit -- that do not go through the (eXtended) carry. That is what ROL and ROR mean on the 68000: ROtate Left or Right within 8, 16, or 32 bits.

Erk. Lots of explanation we didn't think we wanted just here, but keep it in mind. It'll come in useful later. Or, perhaps, it's a waste of micro-instructions in the 68000 CPU, but that's something to think about later.

Anyway, that's part of what's behind the differences in the bit rotation code.
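
To make the 9-bit versus 8-bit distinction concrete, here is a small Python model of a single-bit rotate left, once through the carry and once without it. The function names are mine, and the model deliberately leaves out what the real instructions do to the other flags:

```python
def rol_through_carry(value, carry):
    """9-bit rotate left: 8 data bits plus the carry (6800 ROL, 68000 ROXL.B #1)."""
    wide = ((value & 0xFF) << 1) | (carry & 1)
    return wide & 0xFF, wide >> 8          # (new byte, new carry)

def rol8(value):
    """True 8-bit rotate left: bit 7 wraps straight around to bit 0 (68000 ROL.B #1)."""
    value &= 0xFF
    return ((value << 1) | (value >> 7)) & 0xFF

print(rol_through_carry(0x00, 1))   # (1, 0)
print(rol_through_carry(0x80, 0))   # (0, 1)
print(f"{rol8(0x81):02X}")          # 03
```

The first call is the trick the addition code uses: rotating a cleared byte left through the carry leaves the carry sitting in bit 0.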

(Why split the carry function into Carry and eXtended carry?

SHHHH!

You're not supposed to ask that.

Heh. Yet.)

(2) The Carry flag is cleared by MOVEs. The eXtended carry flag is not. This allows some variation in execution order with the X flag, which can come in handy sometimes. It also forces execution order sometimes.

(3) Instead of ADd with Carry (ADC) and SuBtract with Carry (SBC), the 68000 has ADD with eXtend and SUBtract with eXtend, ADDX and SUBX.

... aaaaand ...

Where other instructions on the 68000 are really flexible as to where the source or target are, the ADDX and SUBX instructions are not.

You can ADDX or SUBX two data registers. Not memory-to-register, not register-to-memory. Or you can ADDX or SUBX memory-to-memory in auto-decrement (pre-decrement) mode. (There is no CMPX. The nearest relative is CMPM, which compares memory-to-memory in auto-increment mode and does not involve the X flag.)

No immediate mode operands. No ADDX #0.

ARRRGGGGGGHHHHHHH!!!!!! WHY?!?!?!?!?!?

Sigh. There is reason to this madness, sort-of. Maybe just madness. We'll come back to this.

For now, it can be made to work by using another register to hold the zero.

Yes, that RISC trick.
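
Here is a rough Python model of what the zero register buys us, under the assumption that only the byte result and the X flag matter. The function names are hypothetical. With both operands zero, ADDX hands back the X flag as a count, and SUBX hands it back sign extended:

```python
def addx_b(src, dst, x):
    """Model ADDX.B src,dst: dst + src + X, byte wide. Returns (result, new X)."""
    total = (dst & 0xFF) + (src & 0xFF) + (x & 1)
    return total & 0xFF, total >> 8

def subx_b(src, dst, x):
    """Model SUBX.B src,dst: dst - src - X, byte wide. Returns (result, new X)."""
    diff = (dst & 0xFF) - (src & 0xFF) - (x & 1)
    return diff & 0xFF, 1 if diff < 0 else 0

zero = 0                                  # the register that just holds zero (D5)
for x in (0, 1):
    carry_byte, _ = addx_b(zero, 0, x)    # 00 or 01: the carry as a count
    sign_byte, _ = subx_b(zero, 0, x)     # 00 or FF: the borrow, sign extended
    print(f"X={x}: ADDX -> {carry_byte:02X}, SUBX -> {sign_byte:02X}")
```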

Well, a partial response to the madness is due here. The 68000 design project began about the time certain departments within IBM were just digging into the 801 processor. Patterson would not go on sabbatical to DEC for a couple of years yet. Not just Motorola, everybody was exploring unknown territory, going boldly where no man (in history) had gone before. Motorola made some mistakes. We all made mistakes -- and are now living with the consequences.

RISC isn't the Holy Grail, anyway.

Guys like me will daydream about what might have been, but daydreaming has limits.

We can work with less-than-optimal. With some care, we can get pretty close to what should have been, and that's why I'm writing this tutorial.

Let me step off the soap box, and we'll take a look at some nifty gadgetry in the 68000 that provides an alternate way of doing this. But first, copy the code to a text file, save it, maybe as arithm8.s or something, and assemble it with something like

$ vasmm68k_mot -Ftos -no-opt -o ARITHM8.PRG -L arithm8.lst arithm8.s

Get Hatari running, go through the

CTRL-Z, CD to the target emulated directory, ALT-BREAK, set the "b pc = TEXT :once" breakpoint, (c)ontinue

protocol and use the debugger to (s)tep through the code, showing (r)egister and (m)emory contents as appropriate.

The final results in memory should be


> m TEXT 20
00013D10: 4e f9 00 01 3d 2a 22 42 00 64 84 bc 01 40 42 22   N...=*"B.d...@B"
00013D20: 00 20 84 bc ff c8 22 42 ff e0 42 06 1e 39 00 01   . ...."B..B..9..

Well, the actual address of TEXT should be (may be?) different, but the contents should be there. Showing the results you want set off in brackets (each after the two numbers we added or subtracted):

4e f9 00 01 3d 2a 22 42 [00 64] 84 bc [01 40] 42 22
[00 20] 84 bc [ff c8] 22 42 [ff e0] 42 06 1e 39 00 01
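
If picking through hex by eye gets tiring, a few lines of Python can do the checking. This is only a sketch: check_dump is a name I made up, and it assumes the layout from the listing, a 6-byte JMP followed by five 4-byte records of left operand, right operand, high byte, low byte, with the first two records being additions and the rest subtractions:

```python
def check_dump(dump):
    """Return (stored, expected) 16-bit results for each 4-byte record after the JMP."""
    data = dump[6:]                                  # skip the 6-byte JMP at ENTRY
    rows = []
    for index in range(0, len(data) // 4):
        left, right, high, low = data[index * 4:index * 4 + 4]
        stored = (high << 8) | low
        if index < 2:                                # first two records are sums
            expected = (left + right) & 0xFFFF
        else:                                        # the rest are differences
            expected = (left - right) & 0xFFFF
        rows.append((stored, expected))
    return rows

dump = bytes.fromhex(
    "4e f9 00 01 3d 2a 22 42 00 64 84 bc 01 40 42 22 "
    "00 20 84 bc ff c8 22 42 ff e0"
)
for stored, expected in check_dump(dump):
    print(f"stored {stored:04X} expected {expected:04X}")
```

Each line should show the same value twice if the program ran correctly.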

And you'll notice that I'm so focused I didn't include exit code. That's okay, just (q)uit when you're done.

Okay, that nifty gadgetry I promised:


	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	MACHINE MC68000	; because there are a lot the assembler can do.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************
* Adding and subtracting two 8-bit numbers in memory
* and storing them in memory,
* Using the Set conditional instruction to capture the Carry/borrow bit
* instead of the eXtended carry/borrow bit 
*
	EVEN
ENTRY	JMP	ADDM8


* The constants and result storage for the additions:

NL1	DC.B	34	; just an arbitrary small number
NR1	DC.B	66	; another arbitrary small number
C1	DS.B	1	; To hold one bit of carry from the sum.
R1	DS.B	1	; To hold the eight bit sum.

NL2	DC.B	132	; Somewhat larger arbitrary number
NR2	DC.B	188	; And another
C2	DS.B	1	; carry from 2nd sum
R2	DS.B	1	; 2nd sum


* I could have coordinated these better,
* But I'll rename the constants for the subtractions 
* to ML1, ML2, and ML3:

ML1	DC.B	66	; just an arbitrary small number
MR1	DC.B	34	; another arbitrary small number
RH1	DS.B	1	; To hold one bit of borrow from the difference.
RL1	DS.B	1	; To hold the eight bit difference.

ML2	DC.B	132	; Somewhat larger arbitrary number
MR2	DC.B	188	; And another
RH2	DS.B	1	; borrow from 2nd difference
RL2	DS.B	1	; 2nd difference

ML3	DC.B	34	; just an arbitrary small number
MR3	DC.B	66	; another arbitrary small number
RH3	DS.B	1	; To hold one bit of borrow from the difference.
RL3	DS.B	1	; To hold the eight bit difference.



	EVEN
ADDM8	CLR.B	C1	; clear memory for the carry bit/sign byte
	MOVE.B	NL1,D7	; Get the augend (left side addend).
	ADD.B	NR1,D7	; Add the addend (right side addend).
* Result is safely in D7, carry in C,
* But a MOVE will destroy the Carry in C.
	SCS	C1	; Recover the Carry bit.
	AND.B	#1,C1	; Mask off the unnecessary bits.
	MOVE.B	D7,R1	; Save the 8-bit sum itself.
*
	CLR.B	C2	; memory for 2nd carry bit/sign byte
	MOVE.B	NL2,R2	; 2nd augend to result memory
	MOVE.B	NR2,D7	; 2nd addend to a register
	ADD.B	D7,R2	; 8-bit result safely stored
* A MOVE will destroy the Carry in C.
	SCS	C2	; 2nd carry bit
	AND.B	#1,C2	; Mask off the unnecessary bits.
*
	NOP
	NOP


SUBM8	CLR.B	RH1	; memory for 1st borrow/sign byte
	MOVE.B	ML1,D7	; Get the 1st minuend (left side).
	SUB.B	MR1,D7	; Subtract the 1st subtrahend (right side).
* Result is safely in D7, borrow in C,
* But a MOVE will destroy the borrow in C.
	SCS	RH1	; 1st sign byte/borrow, DO NOT MASK!
	MOVE.B	D7,RL1	; Save the result low byte.
*
	CLR.B	RH2	; 2nd borrow/sign extension
	MOVE.B	ML2,RL2	; 2nd minuend (left side) to memory
	MOVE.B	MR2,D7	; 2nd subtrahend (right side)
	SUB.B	D7,RL2	; 2nd result safely stored.
* A MOVE will destroy the borrow in C.
	SCS	RH2	; 2nd borrow, sign extended
*
	CLR.B	D6	; 3rd borrow/sign byte in register
	MOVE.B	ML3,D7	; 3rd minuend (left side)
	SUB.B	MR3,D7	; 3rd subtrahend (right side)
	SCS	D6	; Get sign/borrow in register
* A MOVE will destroy the borrow in C.
	MOVE.B	D7,RL3	; Save 3rd difference, low byte.
	MOVE.B	D6,RH3	; Save sign/borrow away.
*
	NOP
	NOP

(1) Set conditionally is an interesting instruction. You can set a byte to all 1s on the specified condition. It does not test eXtended carry, but it does test Carry. So you can use Set Carry Set (SCS) immediately after an ADD or SUBtract or such, but if you do a MOVE, the Carry is gone.

(2) After an ADD, since we only want the borrow/carry from the high bit of the byte, all 1s from a SCS is too many 1s. So we can mask all but the bottom 1 out with AND immediate 1.

(3) AND is one of several instructions that can operate directly on memory, so if the target of SCS is in memory, we can mask the target directly.

Note that it takes more time to mask something in memory, because the processor has to read it out of memory into a hidden latch to work on it, then write it back. This is true of any operator that works directly on memory, including the Set conditionally operators. So there are trade-offs.

(4) After a SUBtract, SCS setting all ones on borrow is actually what we want, since all 1s is how we know it's negative. So there's no need to mask after a subtract.

(5) ADD and SUBtract are also operators that can work directly on memory. Again, they take more time operating on memory than they do operating on registers.
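
The mask-after-ADD, don't-mask-after-SUBtract business can be sketched in Python like this. The function names are hypothetical, and the model only tracks the byte result and the Carry:

```python
def scs(carry):
    """Model SCS: the target byte becomes all 1s if Carry is set, else all 0s."""
    return 0xFF if carry else 0x00

def add_b(a, b):
    """Model ADD.B: returns (sum byte, Carry)."""
    total = (a & 0xFF) + (b & 0xFF)
    return total & 0xFF, total >> 8

def sub_b(a, b):
    """Model SUB.B: returns (difference byte, Carry acting as borrow)."""
    diff = (a & 0xFF) - (b & 0xFF)
    return diff & 0xFF, 1 if diff < 0 else 0

total, carry = add_b(132, 188)
print(f"{scs(carry) & 1:02X} {total:02X}")     # 01 40  (masked after ADD)
diff, borrow = sub_b(132, 188)
print(f"{scs(borrow):02X} {diff:02X}")         # FF C8  (unmasked after SUB)
```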

Oddly enough, for as nifty as this Set conditionally and operate-directly-on-memory gadgetry is, it turns out not to be very exciting. But that is not necessarily bad.

After stepping through and watching the registers and memory, make one final check to see that the values stored are correct, as in the first example. Compare below:


00013D10: 4e f9 00 01 3d 2a 22 42 00 64 84 bc 01 40 42 22   N...=*"B.d...@B"
00013D20: 00 20 84 bc ff c8 22 42 ff e0 42 39 00 01 3d 18   . ...."B..B9..=.

Note that the instructions following the results are different from the first example, as they should be. Showing the results again, set off in brackets:

4e f9 00 01 3d 2a 22 42 [00 64] 84 bc [01 40] 42 22
[00 20] 84 bc [ff c8] 22 42 [ff e0] 42 39 00 01 3d 18

Finally, thinking about the 6801 and 6809's double accumulator instructions, the 68000 has the ability to do both 16-bit and 32-bit math and loads and stores.

But it has no easy way to concatenate the least significant bytes of two registers similar to the 6801/6809 double accumulator instructions. Well, okay, it's not that hard, I guess, but it's harder than just being able to STD.

And there are restrictions, such as 16-bit and 32-bit operands in memory have to be 16-bit aligned. (Some of the later processors in the family relax this requirement.)

To see how that will play out on the 68000, here's some code:


	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	MACHINE MC68000	; because there are a lot the assembler can do.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************
* Adding and subtracting two 8-bit numbers in memory
* and storing them in memory,
* Using 68000's 16-bit arithmetic --
*
	EVEN
ENTRY	JMP	ADDM8
* Instructions are always on even boundaries, 
* and use an even number of bytes.

* Remember that the 68000 requires 16-bit and 32-bit accesses 
* to be on 2-byte even boundaries.

* The constants for the additions:

NL1	DC.B	34	; just an arbitrary small number
NR1	DC.B	66	; another arbitrary small number
	EVEN
NSUM1	DS.W	1	; 2 bytes to hold carry/sign and 8-bit sum
NDH1	EQU	NSUM1	; carry/sign byte
NDL1	EQU	NSUM1+1	; 8-bit sum

NL2	DC.B	132	; Somewhat larger arbitrary number
NR2	DC.B	188	; And another
	EVEN
NSUM2	DS.W	1	; carry/sign and 8-bit sum
NDH2	EQU	NSUM2	; carry/sign byte
NDL2	EQU	NSUM2+1	; 8-bit sum


* I could have coordinated these better,
* But I'll rename the constants for the subtractions 
* to ML1, ML2, and ML3:

ML1	DC.B	66	; just an arbitrary small number
MR1	DC.B	34	; another arbitrary small number
	EVEN
MDIFF1	DS.W	1	; borrow/sign and 8-bit difference
MDH1	EQU	MDIFF1	; To hold borrow/sign from the difference.
MDL1	EQU	MDIFF1+1	; To hold the eight bit difference.

ML2	DC.B	132	; Somewhat larger arbitrary number
MR2	DC.B	188	; And another
	EVEN
MDIFF2	DS.W	1	; borrow/sign and 8-bit difference
MDH2	EQU	MDIFF2	; borrow/sign from 2nd difference
MDL2	EQU	MDIFF2+1	; 2nd difference

ML3	DC.B	34	; just an arbitrary small number
MR3	DC.B	66	; another arbitrary small number
	EVEN
MDIFF3	DS.W	1	; borrow/sign and 8-bit difference
MDH3	EQU	MDIFF3	; To hold borrow/sign from 3rd difference.
MDL3	EQU	MDIFF3+1	; 3rd eight bit difference.



	EVEN
ADDM8	CLR.W	D6	; Prepare left side for 16-bit math.
	MOVE.B	NL1,D6	; Get the augend (left side addend).
	CLR.W	D7	; Prepare right side for 16-bit math.
	MOVE.B	NR1,D7	; Get the addend (right side addend).
	ADD.W	D7,D6	; Add right into left.
* Result is safely in D6, including carry.
	MOVE.W	D6,NSUM1	; Save the 8-bit sum along with the carry/sign.
*
	CLR.W	D6	; Prepare left side for 16-bit math.
	CLR.W	D7	; Prepare right side for 16-bit math.
	MOVE.B	NL2,D6	; Get the augend (left side addend).
	MOVE.B	NR2,D7	; Get the addend (right side addend).
	ADD.W	D7,D6	; Add right into left.
* Result is safely in D6, including carry.
	MOVE.W	D6,NSUM2	; Save the 8-bit sum along with the carry/sign.
*
	NOP
	NOP

SUBM8	CLR.W	D6	; Prepare left side for 16-bit math.
	MOVE.B	ML1,D6	; Get the minuend (left side).
	CLR.W	D7	; Prepare right side for 16-bit math.
	MOVE.B	MR1,D7	; Get the subtrahend (right side).
	SUB.W	D7,D6	; Subtract right from left.
* Result is safely in D6, including borrow/sign.
	MOVE.W	D6,MDIFF1	; Save the 8-bit difference along with the borrow/sign.
*
	CLR.W	D6	; Prepare left side for 16-bit math.
	CLR.W	D7	; Prepare right side for 16-bit math.
	MOVE.B	ML2,D6	; Get the minuend (left side).
	MOVE.B	MR2,D7	; Get the subtrahend (right side).
	SUB.W	D7,D6	; Subtract right from left.
* Result is safely in D6, including borrow/sign.
	MOVE.W	D6,MDIFF2	; Save the 8-bit difference along with the borrow/sign.
*
	CLR.W	D6	; Prepare left side for 16-bit math.
	MOVE.B	ML3,D6	; Get the minuend (left side).
	CLR.W	D7	; Prepare right side for 16-bit math.
	MOVE.B	MR3,D7	; Get the subtrahend (right side).
	SUB.W	D7,D6	; Subtract right from left.
* Result is safely in D6, including borrow/sign.
	MOVE.W	D6,MDIFF3	; Save the 8-bit difference along with the borrow/sign.
*
	NOP
	NOP

Check the results:


00013D10: 4e f9 00 01 3d 2a 22 42 00 64 84 bc 01 40 42 22   N...=*"B.d...@B"
00013D20: 00 20 84 bc ff c8 22 42 ff e0 42 46 1c 39 00 01   . ...."B..BF.9..

The sums and differences, set off in brackets:

4e f9 00 01 3d 2a 22 42 [00 64] 84 bc [01 40] 42 22
[00 20] 84 bc [ff c8] 22 42 [ff e0] 42 46 1c 39 00 01

The comments do tell a lot about what is happening, but they are a little terse.

Knowing what the code looks like, I can say that all the EVEN directives in there are unnecessary. But I put them in for emphasis, and just in case I do something strange with the code later when I've forgotten a lot about it. And for anyone else who needs to read the code.

I hope the data declarations and allocations are clear enough. Unlike high-level languages, assembler can't enforce the intended access widths and boundaries. The best we can do is provide labels where access is intended, with comments about the intended width. Trailing equates can help when you intend alternate access points and widths.
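
As a loose illustration of what the trailing equates are doing, here is the same idea in Python using the standard struct module. The variable names just echo the labels in the listing. The 68000 stores the high byte of a word at the lower address, so the byte at NSUM1 is the carry/sign byte and the byte at NSUM1+1 is the 8-bit sum:

```python
import struct

nsum1 = struct.pack(">H", 0x0140)   # a 16-bit word, high byte first, as the 68000 stores it
ndh1 = nsum1[0]                     # byte at NSUM1:   high byte (carry/sign)
ndl1 = nsum1[1]                     # byte at NSUM1+1: low byte (8-bit sum)
print(f"{ndh1:02X} {ndl1:02X}")     # 01 40
```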

Why am I using 16-bit addition and subtraction here, when the 6800, etc. code did not?

If we use byte addition and subtraction, we end up with the carry/borrow in the flags, but not in the register. That means we have to have instructions to bring it in, which we have already seen. We're trying to avoid that.

Well, there's an approach I haven't mentioned, using branches, but I am deliberately avoiding that. It looks like this:


	...
ADDM8	MOVE.B	NL1,D7	; Get the augend (left side addend).
	ADD.B	NR1,D7	; Add the addend (right side addend).
	BCC.S	ADDM8NC
	OR.W	#$0100,D7	; set bit 8
	BRA.S	ADDM8M
ADDM8NC	AND.W	#$FEFF,D7	; clear bit 8
ADDM8M	AND.W	#$01FF,D7	; clear the rest
	MOVE.W	D7,NSUM1	; Save the 8-bit sum along with the carry/sign.
	...
SUBM8	MOVE.B	ML1,D7	; Get the minuend (left side).
	SUB.B	MR1,D7	; Subtract the subtrahend (right side).
	BCC.S	SUBM8NB
	OR.W	#$FF00,D7	; set the borrow/sign
	BRA.S	SUBM8M
SUBM8NB	AND.W	#$00FF,D7	; clear the borrow/sign
SUBM8M	MOVE.W	D7,MDIFF1	; Save the 8-bit difference along with the borrow/sign.
	...

For the addition code, the 68000 Bit Set (BSET) and Bit Clear (BCLR) instructions could also be used, but you still have to mask off the remaining bits with the final AND.W, because you don't know what they were when you started.

And I am deliberately avoiding mentioning that, you understand.

:-/

If you're wondering, it's good to avoid branches if it doesn't cost too much to do so.

Back to the file we were working on -- because we are going to use word arithmetic to do byte math, we need to clear the lower 16 bits of the registers we are using. Otherwise, we don't know what's up there in the higher order byte, and it's likely to be stuff that trashes our results.

We can clear both words and then load both words, or we can clear and load and clear and load. Either way.

We clear them because the bytes are unsigned. If they were signed, we could use the EXT.W instruction to sign-extend them after loading, instead.

This is a pattern on the 68000 -- for extending unsigned data that will be widened, you clear the width you need before loading the data. For extending signed data, you load and then sign EXTend the data.
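
The two widening patterns can be modeled in Python like so. This is a sketch of the results only, with made-up function names, and it says nothing about the flags:

```python
def zero_extend_byte(byte):
    """Model CLR.W Dn then MOVE.B: high byte cleared, low byte loaded."""
    return byte & 0xFF

def sign_extend_byte(byte):
    """Model MOVE.B then EXT.W: bit 7 copied through bits 8 to 15."""
    byte &= 0xFF
    return byte | 0xFF00 if byte & 0x80 else byte

print(f"{zero_extend_byte(188):04X}")   # 00BC
print(f"{sign_extend_byte(188):04X}")   # FFBC
print(f"{sign_extend_byte(34):04X}")    # 0022
```

The same bit pattern, $BC, widens to 188 when treated as unsigned and to -68 when treated as signed, which is why the choice of pattern has to match what the data means.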

After that, I hope it's all straightforward. Do the math in 16-bit width, store the result in 16-bit width. Subtracting at the wider width takes care of the high byte for you.

Does this feel like the approach we should have been looking at from the start? I hope so.

By the way, RISC CPUs use this approach pretty much exclusively. It's part of the concept of RISC.

While we're talking about it, how would we go about expanding the 8-bit operands to use 16-bit math on the 6809? Maybe we should take a quick look at that next.

(Title Page/Index)

Friday, September 20, 2024

ALPP 02-05 -- Introduction to Byte Arithmetic on the 6800, 6801, and 6809

Introduction to Byte Arithmetic
on the 6800, 6801, and 6809

(Title Page/Index)

In the last four chapters, we reused the Hello World example to introduce and demonstrate the use of a separate parameter stack, first, synthesizing the parameter stack in software on the 6800 and 6801, then using the advanced addressing capabilities of the 6809 and 68000 to directly implement the parameter stack using the CPU's own instructions.

It may have been a little deep.

It's okay. It just gets deeper. But in this chapter, we're going to relax a bit and get a better feel for some of the fundamentals of arithmetic on digital logic processors.

8-bit CPUs generally work on data in 8-bit bytes. (Other size bytes have existed.)

Just as a reminder, the word "byte" is a pun. This is bite-sized data, so to speak. But you knew that. You also knew that there are other byte sizes than 8 bits, right?

The 6800, 6801, and 6809 all have 8-bit bytes. So do the members of the 68000 family, although their natural integer width is 16 or 32 bits. They address memory in 8-bit bytes.

Eight bits (binary digits) in a latch yield 256 discrete values. These values can represent, among many other things, up to 256 states in a control system, up to 256 characters (as in ASCII or EBCDIC text) in a small character set, or small integers. One common thing to have them represent is the unsigned small integers 0 to 255_ten.

Eight bits could also represent signed small integers, such as -128_ten to +127_ten, or -127_ten to -0 and +0 to +127_ten, but we'll set signed integers aside for a moment. (Sort-of. Just a moment.)

Let's take a closer look at adding small unsigned integers.

One question that arises when you know you can only represent numbers from 0 to 255 is what happens when you go over 255?

What happens when you go over nine in a column when adding decimal (base ten) numbers by hand? Let's see if we can remember:


 34
+66
---
???

First column adds up to ten. Write down your zero, carry your one to the next column:


 1  (carries)
 34
+66
---
__0

Nine and the carried one makes ten in the second column, write down the zero, carry the one:


11  (carries)
 34
+66
---
_00

And so forth.


11  (carries)
 34
+66
---
100

Let's replay that in hexadecimal:


 22
+42
---
???

Two and two is four, no carry:


 0  (carries)
 22
+42
---
__4

Two and four is six, no carry:


00  (carries)
 22
+42
---
_64

Interesting. 64_sixteen is 100_ten. I wonder what 64_ten is in hexadecimal. :)

How would this sum look in binary?


 00100010
+01000010
---------
?????????

       0  (carries)
 00100010
+01000010
---------
________0


      10  (carries)
 00100010
+01000010
---------
_______00


     010  (carries)
 00100010
+01000010
---------
______100


    0010  (carries)
 00100010
+01000010
---------
_____0100

... and a few more uninteresting columns without carries until we get:


00000010  (carries)
 00100010
+01000010
---------
_01100100

Bit serial CPUs actually do that one bit at a time. But none of the processors we are working with are bit serial. The 8-bit processors all do eight bits in parallel. And the 68000 does sixteen bits in parallel.
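
Just to show the one-bit-at-a-time process in motion, here is a small Python sketch that adds column by column, right to left, the same way the worked example above does. The function name is invented for the illustration, and it is a model of the arithmetic, not of any particular bit serial CPU:

```python
def add_bit_serial(a, b, width=8):
    """Add one bit column at a time, right to left, carrying into the next column."""
    carry = 0
    result = 0
    for bit in range(width):
        column = ((a >> bit) & 1) + ((b >> bit) & 1) + carry
        result |= (column & 1) << bit    # write down the low bit of the column
        carry = column >> 1              # carry the rest (never more than 1)
    return result, carry

total, carry = add_bit_serial(0b00100010, 0b01000010)
print(f"{total:08b} carry={carry}")      # 01100100 carry=0
```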

So, what happens when you get a carry from the highest bit in a byte?

Consider a byte to be a column in base 256, and it is plain. You get a carry to the next byte.

Just like carries from column to column in base ten never exceed 1, and carries from column to column in hexadecimal never exceed 1, carries from column to column in binary never exceed 1, and carries in base 256 never exceed 1.

So, when you go over 255 in unsigned byte additions in a CPU, you have a carry, of course. And the CPU, if it's a 6800, 6801, or 6809, records it in the (C)arry flag.
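
If the base 256 column idea feels slippery, this Python sketch treats each byte as one column and carries from column to column, just like the decimal example. The function name is hypothetical, and the operands are lists of bytes with the most significant byte first:

```python
def add_base256(left, right):
    """Add two equal-length lists of bytes, most significant byte first."""
    carry = 0
    digits = []
    for a, b in zip(reversed(left), reversed(right)):
        column = a + b + carry
        digits.append(column % 256)      # write down this column's byte
        carry = column // 256            # carry into the next column (0 or 1)
    return carry, digits[::-1]

print(add_base256([34], [66]))       # (0, [100])
print(add_base256([132], [188]))     # (1, [64])
```

The carry that falls off the top is what the CPU leaves in the Carry flag. With more than one byte per operand, the same loop is what chaining adds with carry accomplishes.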

There are other ways to do this, but most 8-bit CPUs do it this way.

So let's look at some code.

We've done 8-bit additions before, using immediate constants. This time, we'll separate the numbers from the code and put them in global constants and variables (even though I've said global constants and variables are not what we generally want to do). Also, I'll be showing several different approaches, rather than just showing you some "optimal" subroutine. And we'll do something so we don't lose the Carry flag:


* Add two 8-bit numbers in memory and store them in memory:
*
ENTRY	JMP	ADDM8
*
NL1	FCB	34	; just an arbitrary small number
NR1	FCB	66	; another arbitrary small number
C1	RMB	1	; To hold one bit of carry from the sum.
R1	RMB	1	; To hold the eight bit sum.
*
NL2	FCB	132	; Somewhat larger arbitrary number
NR2	FCB	188	; And another
C2	RMB	1	; carry from 2nd sum
R2	RMB	1	; 2nd sum
*
*
ADDM8	CLR	C1	; clear storage for the carry bit
	LDAB	NL1	; Get the augend (left side addend).
	ADDB	NR1	; Add the addend (right side addend).
* Result is safely in B.
	ROL	C1	; Save the carry bit away.
	STAB	R1	; Save the sum itself.
*
	CLR	C2	; 2nd carry bit
	LDAB	NL2	; 2nd augend (left side addend)
	ADDB	NR2	; 2nd addend (right side addend)
	STAB	R2	; 2nd sum, 8 bits
* ST and LD do not alter C, carry still safe.
	ROL	C2	; 2nd carry bit
	NOP
	NOP

Assemble that and step through it on EXORsim running a 6800 instance:


$ ./exor --mon
Load facts file 'facts'
'exbug.bin' loaded.
  EXBUG-1.1 detected
'mdos.dsk' opened for drive 0 (double sided)

OSLOAD...

Hit Ctrl-C for simulator command line.  Starting simulation...

>         0 A=00 B=00 X=0000 SP=00FF ------          0020: B6 E8 00 LDA E800                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% a 2000
2000: * Add two 8-bit numbers in memory and store them in memory:
2000: *
2000: ENTRY	JMP	ADDM8
2003: *
2003: NL1	FCB	34	; just an arbitrary small number
2004: NR1	FCB	66	; another arbitrary small number
2005: C1	RMB	1	; To hold one bit of carry from the sum.
2006: R1	RMB	1	; To hold the eight bit sum.
2007: *
2007: NL2	FCB	132	; Somewhat larger arbitrary number
2008: NR2	FCB	188	; And another
2009: C2	RMB	1	; carry from 2nd sum
200a: R2	RMB	1	; 2nd sum
200b: *
200b: *
200b: ADDM8	CLR	C1	; clear storage for the carry bit
Address at 2001 set to 200B
200e: 	LDAB	NL1	; Get the augend (left side addend).
2011: 	ADDB	NR1	; Add the addend (right side addend).
2014: * Result is safely in B.
2014: 	ROL	C1	; Save the carry bit away.
2017: 	STAB	R1	; Save the sum itself.
201a: *
201a: 	CLR	C2	; 2nd carry bit
201d: 	LDAB	NL2	; 2nd augend (left side addend)
2020: 	ADDB	NR2	; 2nd addend (right side addend)
2023: 	STAB	R2	; 2nd sum, 8 bits
2026: * ST and LD do not alter C, carry still safe.
2026: 	ROL	C2	; 2nd carry bit
2029: 	NOP
202a: 	NOP
202b: 
% u 2000
2000: 7E 20 0B            JMP $200B
2003: 22 42               BHI $2047
2005: 00                  ???
2006: 00                  ???
2007: 84 BC               ANDA #$BC
2009: 00                  ???
200A: 00                  ???
200B: 7F 20 05            CLR $2005
200E: F6 20 03            LDB $2003
2011: FB 20 04            ADDB $2004
2014: 79 20 05            ROL $2005
2017: F7 20 06            STB $2006
201A: 7F 20 09            CLR $2009
201D: F6 20 07            LDB $2007
2020: FB 20 08            ADDB $2008
2023: F7 20 0A            STB $200A
2026: 79 20 09            ROL $2009
2029: 01                  NOP
202A: 01                  NOP
202B: 00                  ???
202C: 00                  ???
202D: 00                  ???
% s 2000

          0 A=00 B=00 X=0000 SP=00FF ------ ENTRY    2000: 7E 20 0B JMP 200B  EA=200B(ADDM8) 
>         1 A=00 B=00 X=0000 SP=00FF ------ ADDM8    200B: 7F 20 05 CLR 2005                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          1 A=00 B=00 X=0000 SP=00FF ------ ADDM8    200B: 7F 20 05 CLR 2005  EA=2005(C1)    
>         2 A=00 B=00 X=0000 SP=00FF ---Z--          200E: F6 20 03 LDB 2003                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          2 A=00 B=00 X=0000 SP=00FF ---Z--          200E: F6 20 03 LDB 2003  EA=2003(NL1) D=22 
>         3 A=00 B=22 X=0000 SP=00FF ------          2011: FB 20 04 ADDB 2004                

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          3 A=00 B=22 X=0000 SP=00FF ------          2011: FB 20 04 ADDB 2004 EA=2004(NR1) D=42 
>         4 A=00 B=64 X=0000 SP=00FF ------          2014: 79 20 05 ROL 2005                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          4 A=00 B=64 X=0000 SP=00FF ------          2014: 79 20 05 ROL 2005  EA=2005(C1)    
>         5 A=00 B=64 X=0000 SP=00FF ---Z--          2017: F7 20 06 STB 2006                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          5 A=00 B=64 X=0000 SP=00FF ---Z--          2017: F7 20 06 STB 2006  EA=2006(R1) D=64 
>         6 A=00 B=64 X=0000 SP=00FF ------          201A: 7F 20 09 CLR 2009                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          6 A=00 B=64 X=0000 SP=00FF ------          201A: 7F 20 09 CLR 2009  EA=2009(C2)    
>         7 A=00 B=64 X=0000 SP=00FF ---Z--          201D: F6 20 07 LDB 2007                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          7 A=00 B=64 X=0000 SP=00FF ---Z--          201D: F6 20 07 LDB 2007  EA=2007(NL2) D=84 
>         8 A=00 B=84 X=0000 SP=00FF --N---          2020: FB 20 08 ADDB 2008                

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          8 A=00 B=84 X=0000 SP=00FF --N---          2020: FB 20 08 ADDB 2008 EA=2008(NR2) D=BC 
>         9 A=00 B=40 X=0000 SP=00FF H---VC          2023: F7 20 0A STB 200A                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

          9 A=00 B=40 X=0000 SP=00FF H---VC          2023: F7 20 0A STB 200A  EA=200A(R2) D=40 
>        10 A=00 B=40 X=0000 SP=00FF H----C          2026: 79 20 09 ROL 2009                 

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% s

         10 A=00 B=40 X=0000 SP=00FF H----C          2026: 79 20 09 ROL 2009  EA=2009(C2)    
>        11 A=00 B=40 X=0000 SP=00FF H-----          2029: 01       NOP                      

6800 Monitor: Ctrl-C to exit, 'c' to continue, or type 'help'
% d 2000 10
2000: 7E 20 0B 22 42 00 64 84  BC 01 40 7F 20 05 F6 20 ~ ."B.d...@. .. 
%

In the first case, we didn't have a carry from the byte addition. Or, rather, the carry was 0, and the ROL instruction recovered the 0 carry.

We see here that the ROL instruction on the 6800, 6801, and 6809 can be used to move (rotate) the carry bit into a register or a byte of memory.

If you want, do it again on EXORsim6801; it should do essentially the same thing. Since you're only watching the values of registers while stepping through the code, there is no need to add code to get the terminal emulator's attention or to pause at the end on the 6801. Same code. Same execution.

Another way to capture the carry is to use the ADd with Carry instruction, adding the carry and an immediate zero to the zero in a cleared accumulator like this:


	CLRA		; 2nd carry bit
	LDAB	NL2	; 2nd augend (left side addend)
	ADDB	NR2	; 2nd addend (right side addend)
	STAB	R2	; 2nd sum, 8 bits
* ST and LD do not alter C, carry still safe.
	ADCA	#0	; 0 in A plus carry bit
	STAA	C2	; 2nd carry bit

I'm going to emphasize the comments in the above code -- LoaDs and STores do not affect the Carry flag on 680X CPUs. We can store the 8-bit result first and capture the carry after if we want.
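
A third way to capture the carry, for what it's worth, is to test it with a conditional branch instead of moving it. This is only a sketch, reusing the NL2, NR2, R2, and C2 labels from above; the NOCARRY label is made up for the illustration:

```asm
	CLR	C2	; clear the carry byte first (CLR also clears C)
	LDAB	NL2	; 2nd augend (left side addend)
	ADDB	NR2	; 2nd addend, sets C on a carry out of bit 7
	STAB	R2	; the store does not disturb C
	BCC	NOCARRY	; Branch if Carry Clear: leave C2 at 0
	INC	C2	; carry was set, so C2 becomes 1
NOCARRY	NOP
```

Note that the CLR has to come before the addition here, since CLR itself clears the Carry flag. ROL and ADC are usually shorter, but the branch version shows that the flag is just something you can test.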

Just for completeness, the 6801 and the 6809 will allow you to treat the carry byte and the 8-bit sum byte as a single 16-bit integer result, as well:


	...
NL1	FCB	34	; just an arbitrary small number
NR1	FCB	66	; another arbitrary small number
RES1	RMB	2	; 2-byte result
C1	EQU	RES1	; To look at the carry from the sum.
R1	EQU	RES1+1	; To look at the eight bit sum.
	...
	CLRA	; clear storage for the carry bit
	LDAB	NL1	; Get the augend (left side addend).
	ADDB	NR1	; Add the addend (right side addend).
	ADCA	#0	; Save the carry bit away.
	STD	RES1	; Save the 9 bit result in 16 bits.

Note the STore Double accumulator at the end, instead of storing each accumulator separately.

I intentionally put the carry byte below the sum byte in memory to allow them to be treated together as a single 16-bit integer. This is the byte order the 6800, 6801, and 6809 use, most significant byte first. (68000, too.)

Using three labels, RES1 for the 16-bit result, and C1 and R1 for the separate halves of the result, helps make it clear which half of the integer you're looking at even if you forget which order they are in.

Now, if you want, you can try the above two variations. (I recommend it.) And when you're done playing with that on the 6800, edit the LDAB and LDAA instructions to LDB and LDA, and the STAB and STAA instructions to STB and STA, and assemble the code on a 6809 instance and step through it there as well. It will execute this code exactly the same except for timing (which you won't notice in the emulator), but it will give you a chance to practice and notice more about what goes on.

(You may be wondering whether it wouldn't be possible to do the math on the 6801 and 6809 using the 16-bit ADD Double and SUB Double accumulator instructions. It can be done, but you need more bytes of memory, because the ADDD and SUBD instructions want a 16-bit addend as well as the 16-bit augend.)
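
As a rough sketch of what that would look like on the 6801 (or on the 6809, with the same mnemonics here), assuming we are willing to spend the extra bytes and store both operands already widened to 16 bits. The labels NL1W and NR1W are made up for this illustration:

```asm
	...
NL1W	FDB	34	; augend, widened to 16 bits (hypothetical label)
NR1W	FDB	66	; addend, widened to 16 bits (hypothetical label)
RES1	RMB	2	; 16-bit result
	...
	LDD	NL1W	; load the 16-bit augend into A:B
	ADDD	NR1W	; 16-bit add; any carry out of the low byte lands in A
	STD	RES1	; high byte first, low byte second
```

The operands grow from one byte each to two, which is the memory cost mentioned above. In exchange, there is no separate step to capture the carry.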

If you get really ambitious, you might use the (m)emory change command to change the constants that are getting added and step through again with different values, without editing the source code and re-assembling.

(Are the op-codes output for the source code on this also all the same on the 6809? You might want to check. 8-*)

It will run the same on all three.

So, what about subtraction?

Let's review how subtraction works when we do it on paper:


 66
-34
---
???

First column, difference is 2, no borrow.


 0     (borrow)
 66
-34
---
__2

Second column, difference is 3, no borrow.


00     (borrow)
 66
-34
---
_32

And we're done. That wasn't interesting -- at least, not obviously so. Let's look at something that generates a borrow.


 62
-34
---
???

First column, 2 minus 4 is minus 2. So we'll borrow a 1 from the next column over, subtracting 1 from that and adding 10 to this:


  -1       (borrow)
   6 12
  -3  4
  --  -
  __  8

Take the borrow from 6, that's 5.


 0-1     (borrow)
  65 12
  -3  4
  --  -
  _2  8

5 minus 3 is 2, no borrow generated, difference is 28.

What we did was use ten's complement math in the column that needed a borrow. We borrowed from the next column over because a 1 in that column is a 10 in this column, and we remembered to take the borrowed 1 away when we moved to the next column over.

We can do the same thing in hexadecimal (sixteen's complement), but that would be a distraction.

We can also do the same thing in binary. This is called two's complement, and is very useful.

Of the two above differences, 66 minus 34 is the more interesting in binary:


 01000010  (66_ten)
-00100010  (34_ten)
---------
 ????????

Okay, the first (bit) column on the right, 0 minus 0 is 0. Not particularly interesting:


       0   (borrow)
 01000010  (66_ten)
-00100010  (34_ten)
---------
 _______0

But things don't get really interesting until the 6th bit:


   00000   (borrow)
 01000010
-00100010
---------
 ___00000

Can't take 1 from 0, so we borrow from the next column over:


 -1    00000   (borrow)
 01 10 00010  (10_two == 2_ten)
-00  1 00010
---  - -----
 __  1 00000

1 from 2 is 1. And,


   -1    00000   (borrow)
 0  1 10 00010  (10_two == 2_ten)
-0  0  1 00010
--  -  - -----
 _  0  1 00000

In the next column, take the borrow from 1 and that's 0, and 0 - 0 is 0. And the next column, 0 - 0 is 0. Which gives us


   -1    00000   (borrow)
 0  1 10 00010  (10_two == 2_ten)
-0  0  1 00010
--  -  - -----
 0  0  1 00000

00100000_two, which is 32_ten. If you haven't already, you might want to get paper and pencil out and make up some binary problems and work them out by hand, to prove to yourself that this works, and is essentially the same thing you're familiar with in base-ten math, but in base two.

What happens if we take a larger number from a smaller? Let's flip 66 - 34 over and see what we can see:


 34
-66
---
???

Because we've already done this and just flipped it over, we know that the answer is -32. That's the way we usually work by hand when we have a large number to take from a small number. And things have actually been done this way in some computers in the past.

But we're deliberately going to see what happens when the borrow runs off the left edge of the numbers.

You definitely want to be following along with pencil and paper on this one. Leave some room to the left when you write it down:

First column, 4 minus 6 is ...


-1        (borrow)
 3 14
-6  6
--  -
__  8

Gotta borrow. 14 - 6 is 8.

Second column, (-1) + 3 - 6 is ...


-1 -1     (borrow)
  312 14
-   6  6
--  -  -
__  6  8

Remember to scratch out the used borrow. Gotta borrow again. 12 - 6 is 6.

But now what? We've got a borrow hanging off the left, and nothing to borrow it from.

Let's invent an infinity of 0s out there. Well, not an infinity, but one or two:


 -1  -1     (borrow)
 00 312 14
-00   6  6
---   -  -
___   6  8

Apply the borrows and ...


-1  -1  -1  -1     (borrow)
 0 010 010 312 14
-0   0   0   6  6
--   -   -   -  -
__   9   9   6  8

Again, ten's complement.

Subtract 9968 from 10000, and we get 32. And, somehow, we just know that the leftmost 9 in the result really means we have to subtract from an appropriate power of ten and call the result negative.

Yeah, it's a stretch, but we do crazy things like this in computers because it makes the mechanical adders easier to build. (Early digital computers actually worked in decimal, or in binary coded decimal, or in bi-quinary, because it was easier for many of those involved to intuit at the time.)

Two's complement for binary is especially convenient. Try it on the above problem. Fill the result out to eight bits by continually borrowing a 1 from the next left column and you should get

11100000

Or you could fill it out to sixteen bits and get


1111111111100000

And maybe you can intuit that the 1 on the left could be seen like a minus sign.

But how do we know that these numbers are not 224_ten and 65504_ten?

Unfortunately, unless we remember where the numbers came from, we don't.

In other words, when we are working with integers in computers that use two's complement signed integers, we have to remember (and remind ourselves) whether the particular number we are working with is supposed to be a two's complement signed number or an unsigned number. Yes, it's a bother, but it's necessary.

(That's for signed and unsigned integers. We're not looking at fixed-point or floating-point numbers here, but you have to remember what you're looking at with those, as well.)
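
One place where the remembering shows up in 6800 code is in which conditional branch you pick after a compare. Here is a small sketch of that, using the same E0_sixteen bit pattern as both 224 and -32. The labels UBIG and SBIG are made up for the illustration:

```asm
	LDAB	#$E0	; 224 if unsigned, -32 if signed
	CMPB	#$20	; compare against 32
	BHI	UBIG	; unsigned test: 224 > 32, so this branch is taken
	NOP		; (not reached)
UBIG	CMPB	#$20	; same compare again, B is unchanged
	BGT	SBIG	; signed test: -32 > 32 is false, so not taken
	NOP		; execution falls through to here
SBIG	NOP
```

The CPU does the same subtraction both times and sets the same flags. BHI reads the flags as an unsigned comparison, and BGT reads them as a signed comparison. Which one is correct depends on what we meant the number to be.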

Okay, having seen how this works, let's look at some code.

After the subtraction, we could use the ROtate Left instruction to just capture the carry/borrow as we did with addition, but the SuBtract with ~~Borrow~~ Carry instruction can do a little more for us. So we'll use that, instead. (It's the same Carry flag that we used in the addition routines, but it records the borrow after a subtraction. Yes, it works.)

You'll be pleasantly surprised at how subtracting zero and a borrow from zero plays out:


* Subtract two 8-bit numbers in memory and store the differences in memory:
*
ENTRY	JMP	SUBM8
*
NL1	FCB	66	; just an arbitrary small number
NR1	FCB	34	; another arbitrary small number
RH1	RMB	1	; To hold the borrow, sign extended (high byte).
RL1	RMB	1	; To hold the eight bit difference (low byte).
*
NL2	FCB	132	; Somewhat larger arbitrary number
NR2	FCB	188	; And another
RH2	RMB	1	; borrow from 2nd difference, sign extended
RL2	RMB	1	; 2nd difference
*
NL3	FCB	34	; just an arbitrary small number
NR3	FCB	66	; a larger arbitrary small number
RH3	RMB	1	; To hold the borrow, sign extended (high byte).
RL3	RMB	1	; To hold the eight bit difference (low byte).
*
*
SUBM8	CLRA		; clear accumulator A for the carry/sign extension
	LDAB	NL1	; Get the minuend (left side).
	SUBB	NR1	; Subtract the subtrahend (right side).
* Result is safely in B.
	SBCA	#0	; Recover the carry bit, sign extended.
	STAA	RH1	; Save high byte away. 
	STAB	RL1	; Save the difference low byte.
*
	CLRA		; 2nd sign extension
	LDAB	NL2	; 2nd minuend (left side)
	SUBB	NR2	; 2nd subtrahend (right side)
	SBCA	#0	; 2nd carry, sign extended
	STAB	RL2	; Save 2nd difference, low byte.
	STAA	RH2	; Save high byte away.
*
	CLRA		; 3rd carry bit
	LDAB	NL3	; 3rd minuend (left side)
	SUBB	NR3	; 3rd subtrahend (right side)
	STAB	RL3	; Save 3rd difference, low byte.
* ST and LD do not alter C, carry still safe.
	SBCA	#0	; 3rd carry, sign extended.
	STAA	RH3	; Save it away.
	NOP
	NOP

Assemble that and step through it on the 6800, and on the 6801 if you want. Then edit the LD and ST instructions for the 6809 and assemble it and step through it there, as well. Here's the result at the end, (d)umped out:


% d 2000 10
2000: 7E 20 0F 42 22 00 20 84  BC FF C8 22 42 FF E0 4F ~ .B". ...."B..O

Repeating that line in mark-up-able HTML so I can mark the results in red without depending on your browser to read <span> tags buried in pre-formatted text:

2000: 7E 20 0F 42 22 00 20 84 BC FF C8 22 42 FF E0 4F

We now expect the result for the first subtraction:

42_sixteen - 22_sixteen == 20_sixteen

(That's

66_ten - 34_ten == 32_ten

right?)

The results for the second and third have a borrow from the most significant bit, and by subtracting the borrow instead of rotating it in, we end up with minus one in the stored high byte.

0 - 1 = -1

As we've seen, in signed two's complement arithmetic, a byte (or other size integer) full of 1s is -1. Subtracting 0 and the borrow from 0 does this for us quite nicely!
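
For contrast, here is a sketch of what the third subtraction would look like if we rotated the borrow in instead, the way we captured the carry after addition. BRW3 is a made-up label for this illustration; NL3 and NR3 are the constants from the listing above:

```asm
BRW3	RMB	1	; hypothetical byte to hold the borrow as a plain bit
	...
	CLR	BRW3	; clear first (CLR also clears C)
	LDAB	NL3	; 34
	SUBB	NR3	; minus 66: B = $E0, C set (borrow)
	ROL	BRW3	; BRW3 = $01, the borrow as a bit
```

That leaves $01 and $E0, which tells us a borrow happened but does not read as a 16-bit number. The SBCA #0 version leaves $FF and $E0, and $FFE0 is -32 as a 16-bit two's complement integer.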

If you're wondering whether it's possible to use the ADD Double (ADDD) and SUBtract Double (SUBD) instructions on the 6801 and 6809, yes, sort-of. But it requires an extra couple of bytes of memory because both sides, the minuend and the subtrahend, want a 16-bit operand, so we have to expand both operands.

Keep this in mind for the 68000, though.

Let's look at how that works one more time from a bit more of a high-level view:

First, let's do the math in base ten:

132_ten - 188_ten == -56_ten
34_ten - 66_ten == -32_ten

One more than the largest number in 8 bits is 256. If we add those results (adding negative numbers) to 256, we get

256_ten + (-56_ten) == 200_ten => C8_sixteen
256_ten + (-32_ten) == 224_ten => E0_sixteen

And the byte full of 1s is FF_sixteen, which explains the results stored away, $FFC8 and $FFE0.
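
If you'd like the CPU to produce those bit patterns directly, one possible sketch uses the 6800's NEGate instruction, which subtracts the accumulator from zero. This is just an illustration, not part of the routines above:

```asm
	CLRA		; high byte starts at 0
	LDAB	#56	; 38_sixteen
	NEGB		; 0 - 56: B = C8_sixteen, C set as a borrow
	SBCA	#0	; 0 - 0 - borrow: A = FF_sixteen
```

Stepping through that should leave A=FF and B=C8, the same $FFC8 we saw stored for the second subtraction. Change the 56 to 32 and you should see $FFE0.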

Okay, break out the hexadecimal calculator again. Enter those numbers and do the math in hexadecimal and decimal. Maybe even binary.

If you're not seeing it, switch to hexadecimal base and enter -1. Then switch to binary. You should have a readout full of 1s somewhere in there.

Now switch to decimal and enter -56, then switch to hexadecimal and then binary and watch the bit patterns. Remember, using the four bits at a time conversion to hexadecimal,

1100 1000_two == C8_sixteen

When you've figured out what's being displayed where, and are satisfied that the bit patterns match what I'm telling you, clear the display, switch to decimal, and enter -32, then, again, switch to hexadecimal and binary and remember that

1110 0000_two == E0_sixteen

We'll get more satisfaction about this once we have enough assembly language to build a calculator that does integer math in various bases, but I hope we've gotten enough satisfaction to move on.

Go ahead and play with it a bit more, and then let's look at this in 68000 assembler.

(Title Page/Index)

Wednesday, September 11, 2024

ALPP 02-XX -- Introduction to Address Math on the 6800, 6801, 6809, and 68000

This attempt at a chapter went sideways. Keeping it for the records.
Go here, instead: https://joels-programming-fun.blogspot.com/2024/09/alpp-02-05-introduction-to-byte-arithmetic-6800-6801-6809.html

Introduction to Address Math
on the 6800, 6801, 6809, and 68000


In the last chapter, we saw how the 68000 provides similar advanced addressing features to the 6809's.

Now we want to take an introductory look at general address math on all four processors.

Why do you need to do address math?

We've seen one of the use cases, allocating and deallocating space on the stacks. It's also necessary when allocating and deallocating space for variables and other stuff that needs to stick around between calls to various routines.

It's also useful when accessing distinct fields within structured variables and other objects, and when linking such objects together.

The 6800 doesn't have much in the way of address math. INX and DEX are about it. So you have to use the more general integer math, which only comes in byte size on the 6800.

So you have to synthesize the general math, along with some specific useful cases of general math. That's what we'll look at first.

You've noticed by now that there is a carry bit in the 6800's processor state. It's the least significant bit of the state flags, which is convenient when trying to find out its state. But we don't need to know that for combining two 8-bit adds to make a 16-bit add, because we have the ADd with Carry (ADC) instruction, which adds in the carry bit left over from the last operation along with its specified operands.

ADD and ADC are binary operators. Binary operators on 680X and 680XX family CPUs specify one accumulator or register as both a source and a target, and one memory operand as the second source -- thus, in the case of ADD and ADC, add something in memory to a register.

So you can use the Carry bit to chain the addition of bytes together, pretty much the way you use the carry bit in ordinary arithmetic to chain columns together.

(If you want, you can consider the addition to be of numbers in base 256, and think of each byte as a column. If that does anything for you. If not, forget I mentioned it.)

Here's how to add two 16-bit numbers on the 6800 when both are in memory, either in the direct page or in the extended absolute addressing area:


* Add two 16-bit numbers in memory and store the sum in memory:
*
ENTRY	JMP	ADDM16
*
N1	FDB	57738	; just an arbitrary number
N2	FDB	17715	; another arbitrary number
RES	RMB	3	; Enough room to hold a carry from the high bytes.
*
ADDM16	LDAB	N1+1	; do the low byte (right columns) first.
	ADDB	N2+1	; Sets the Carry flag if the result is too big to fit.
	STAB	RES+2	; LoaD and STore do not affect the Carry flag.
	LDAA	N1	; Now do the high bytes (left columns).
	ADCA	N2	; ADd in the Carry from the low bytes as well.
	STAA	RES+1	; Save the high byte away.
	LDAB	#0	; CLR would clear the carry.
	ROLB		; Move the carry in
	STAB	RES	; Save the carry in the low bit of the highest byte.
	NOP
	NOP

This is not the way we usually do this, but it is useful for stepping through and watching the columns get added together.

There is no real reason for using A for the high byte. B could be used instead and, other than that, the code would not change. Using B only would have the potential advantage of preserving A.
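
A sketch of that B-only variation, assuming the same N1, N2, and RES definitions as in the listing above:

```asm
ADDM16	LDAB	N1+1	; do the low byte (right columns) first.
	ADDB	N2+1	; Sets the Carry flag if the result is too big to fit.
	STAB	RES+2	; LoaD and STore do not affect the Carry flag.
	LDAB	N1	; Now the high bytes, still in B.
	ADCB	N2	; ADd in the Carry from the low bytes as well.
	STAB	RES+1	; Save the high byte away.
	LDAB	#0	; CLR would clear the carry.
	ROLB		; Move the carry in
	STAB	RES	; Save the carry in the low bit of the highest byte.
```

This works because the low byte is already safely stored before B gets reloaded, and the reload leaves the Carry flag alone. A is never touched.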

This code would work as is on the 6801 and 6809, with the proviso that you would use LDA for LDAA and STA for STAA on the 6809. But, of course, it could be done 16 bits at once on both:


* Add two 16-bit numbers in memory and store the sum in memory:
*
ENTRY	JMP	ADDM16
*
N1	FDB	57738	; just an arbitrary number
N2	FDB	17715	; another arbitrary number
RES	RMB	3	; Enough room to hold a carry from the high bytes.
*
ADDM16	LDD	N1	; Both bytes at once
	ADDD	N2	; Sets the Carry flag if the result is too big to fit.
	STD	RES+1	; LoaD and STore do not affect the Carry flag.
	LDAB	#0	; CLR would clear the carry.
	ROLB		; Move the carry in
	STAB	RES	; Save the carry in the low bit of the highest byte.
	NOP
	NOP

... using LDB and STB instead of LDAB and STAB at the end for the 6809 code.

As a reader challenge, work the 6809 code out. (Hardly worth calling a challenge, but do it anyway.)

And, especially if you don't feel comfortable that you know what's happening, take the time to play with the code.

On the 68000, of course we can do it a byte at a time or 16 bits at once, or ...

Wait a minute.

If we do it a byte at a time or even 16 bits at once, things get a little weird with the 68000. It's intended as a 16-bit/32-bit processor, and it kind of shows in places, particularly when using 24-bit results and other odd sizes.

In the 68000, the Carry function is split up between the X flag, used for eXtending multiple precision additions and subtractions, and the C flag, used for testing and branching. Instead of ADC and SBC, there are ADDX and SUBX, which operate only on registers or in predecrement mode on numeric strings in memory. (Interestingly, the 68000 adds a CMPM instruction, to operate only in postincrement mode on numeric strings in memory.)

And then there are the even-address requirements of the 68000. If you store or load 16 or 32 bits at a time, you have to access memory on an even (divisible by 2) boundary.

Sounds wonky?

Yes and no. Let's look at what that means for the 16+1 bit math from above:


* Add two 16-bit numbers in memory and store the sum in memory:
* Both of these are messy because we are doing odd addresses.
* Fix these***
*
ENTRY	JMP	ADDM16W
*
	EVEN
N1	DC.W	57738	; just an arbitrary number
N2	DC.W	17715	; another arbitrary number
RES	DS.B	3	; Enough room to hold a carry from the high bytes.
*
	EVEN
ADDM16W	CLR.B	D6	; pre-clear the carry byte
	MOVE.W	N1,D7	; One 16 bit word at a time
	ADD.W	N2,D7	; Sets the Carry and eXtend flags on carry.
	ROXL.B	#1,D6	; Rotate the eXtend flag (the carry) into D6 first
	MOVE.B	D6,RES	; and move it into place (rotate on memory is word only).
	MOVE.B	D7,RES+2	; Save result low byte. Does not affect the X flag.
	ASR.W	#8,D7	; bring the high byte down
	MOVE.B	D7,RES+1	; because we can't save a 16-bit word to an odd address
	NOP
	NOP
	CLR.W	RES	; clear the result so we can do it again.
	CLR.B	RES+2
ADDM16	LEA	RES+3,A3	; point one beyond RES, same as MOVEA.L #RES+3,A3
	LEA	N2+2,A2		; point one beyond N2, same as MOVEA.L #N2+2,A2
	MOVE.B	-(A2),-(A3)	; copy low byte
	MOVE.B	-(A2),-(A3)	; copy high byte
	ADDQ.L	#2,A3		; back up on RES (ADDQ to An leaves the flags alone)
	ANDI	#$EF,CCR	; so clear X explicitly before chaining ADDX
	LEA	N1+2,A2		; Actually, it was already there
	CLR.B	D7		; pre-clear the carry byte
	ADDX.B	-(A2),-(A3)	; add low bytes directly in memory
	ADDX.B	-(A2),-(A3)	; and high bytes directly in memory
	ROXL.B	#1,D7		; Save the eXtended carry in X
	MOVE.B	D7,-(A3)	; and move it into place (rotate on memory is word only).
	NOP
	NOP

So it takes more instructions than we want to think it would take -- because of the odd alignment issues.

(FWIW, certain later members of the 680X0 family relax the alignment requirement for memory access, but the instruction set does not get dressed-out to fill in certain gaps for byte-level operands. So with those processors, we could improve the number of instructions a little, but not completely to the degree the 6809 allows. I won't look at those CPUs at this point, we have more important things to do.)

But, wait! (There's more ...)

Addresses on a 68000 are not 16 bit.

Well, unless your runtime is only using a 64 K page within the total memory space or something like that. But that's a story for another time.

And, addresses don't use the carry bit. It's modular math. They just wrap around.

(Wraparound is useful when adding negative offsets, but requires some care in pointer comparison.)
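
To make the wraparound concrete, here is a sketch of a 16-bit address add on the 6800 that simply ignores the carry out of the high byte. BASE, OFFSET, and PTR are made-up labels, and the values are arbitrary ones picked for the illustration:

```asm
BASE	FDB	$2000	; hypothetical base address
OFFSET	FDB	$FFFE	; -2 as a 16-bit two's complement offset
PTR	RMB	2	; the resulting pointer
	...
	LDAB	BASE+1	; low bytes first
	ADDB	OFFSET+1
	STAB	PTR+1
	LDAA	BASE	; then high bytes, with the carry from the low bytes
	ADCA	OFFSET
	STAA	PTR	; the carry out of the high byte is simply ignored
	LDX	PTR	; X = $1FFE
```

$2000 plus $FFFE is $11FFE, and dropping the seventeenth bit leaves $1FFE, which is two below the base. That is the negative offset doing its job through wraparound.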

Hmm. The 6800 and 6801 code is easy enough to read through and remove the part extracting and saving the carry, but the 68000 code needs another look.


* Add two 16-bit numbers in memory and store the sum in memory:
* Much less messy with even addresses.
*
ENTRY	JMP	ADDM16W
*
	EVEN
N1	DC.W	57738	; just an arbitrary number
N2	DC.W	17715	; another arbitrary number
RES	DS.B	2	; No carry, keep it aligned
*
	EVEN
ADDM16W	CLR.B	D6	; pre-clear the carry byte
	MOVE.W	N1,D7	; One 16 bit word at a time
	ADD.W	N2,D7	; Add it.
	MOVE.W	D7,RES	; and store the result non-destructively.
	NOP
	NOP
	CLR.W	RES	; clear the result so we can do it again.
ADDM16B	LEA	RES+2,A3	; point one beyond RES, same as MOVEA.L #RES+2,A3
	LEA	N2+2,A2		; point one beyond N2, same as MOVEA.L #N2+2,A2
	MOVE.W	-(A2),-(A3)	; copy all at once
	ADDQ.L	#2,A3		; back up on RES (ADDQ to An leaves the flags alone)
	ANDI	#$EF,CCR	; X is still set from the ADD.W above, so clear it
	LEA	N1+2,A2		; Actually, it was already there
* All this setup just so we can do this,
* one byte at a time to show how to chain adds on the 68000:
	ADDX.B	-(A2),-(A3)	; add low bytes directly in memory
	ADDX.B	-(A2),-(A3)	; and high bytes directly in memory
* Yes, ADDX can be done .W and .L widths, as well.
	NOP
	NOP

