Saturday, July 27, 2024

ALPP 01-09 -- Sequentially Accessing a Simple List on the 6800, 6801, 6809, and 68000


(Title Page/Index)

 

Now that we've learned to treat that list of small integers as something like a list, let's access that list sequentially. Why? Because sometimes you want to work through a list in order.

We've done most of this before, so it should go quickly. (Right?)

Again, we are going to re-use source from the exercise we just finished, changing it slightly:

ENTRY	JMP	START
*
BYTTBL	FCB	8
	FCB	5
	FCB	2
	FCB	7
	FCB	4
*
START	LDX	#BYTTBL
	LDAB	0,X
	INX
	ADDB	0,X
	INX
	ADDB	0,X
	INX
	ADDB	0,X
	INX
	ADDB	0,X
	NOP
DONE	NOP

Don't you think that's a minor change? 

Heh. Well, I have a reason for thinking so. 

Can you guess what it's going to do? Can you guess what INX means?

INX is a mnemonic for INcrement X. Now you know. And now you may be guessing what it does. 

Assemble and run it according to the patterns we've been using, referring to the previous chapters if you can't remember details:

  • Get a session of the 6800 version of EXORsim running with the --mon switch;
  • start an (a)ssembly at $1000, or maybe $2000 this time;
  • copy this source and paste it in to assemble it;
  • (d)isassemble it from appropriate addresses to see what assembled;
  • set a breakpoint at the landing pad NOP (the DONE label);
  • turn (t)racing on;
  • use the (c)ontinue command to run it from the ENTRY point. Or you can (s)tep through each instruction, instead, if you want.

Confirm that the summation occurs correctly as before, and watch the value of X as the CPU works through the code.

Now, you may notice that inserting the INX instructions uses more code and more processor cycles. But it maintains a pointer in X to the currently interesting item in the list, which is what we are trying to do this time. Because we can. And because there will be times when that's what we want to do, and we will need to know how.

This is called the post-increment mode of addressing, since we INcrement X after each access. 

Each access? Well, we don't have to increment after the last access, unless our algorithm requires the pointer treatment to be uniform. Right now, we just want to look at the difference, more than use these in some explicit algorithm.
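If you think more easily in a high-level language, here is a minimal Python sketch of what the 6800 sequence above does. It is only a model, not anything EXORsim runs. The base address 0x1003, the dictionary memory, and the function name sum_with_post_increment are all made up for illustration (the real address depends on where you assemble), and the loop stands in for the unrolled INX / ADDB pairs.

```python
# Model of the 6800 routine: B accumulates the sum, X walks the list.
# The base address is hypothetical; it only matters that X advances by 1.
BYTTBL = 0x1003
memory = {BYTTBL + i: v for i, v in enumerate([8, 5, 2, 7, 4])}

def sum_with_post_increment(memory, base, count):
    x = base                        # LDX  #BYTTBL
    b = memory[x]                   # LDAB 0,X
    for _ in range(count - 1):
        x = (x + 1) & 0xFFFF        # INX  (X is a 16-bit register)
        b = (b + memory[x]) & 0xFF  # ADDB 0,X (B is an 8-bit accumulator)
    return b, x

total, final_x = sum_with_post_increment(memory, BYTTBL, 5)
print(total, hex(final_x))          # 26 0x1007
```

Note that X ends up pointing at the last item, not past it, because the sketch (like the assembly) skips the increment after the final access.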

Why is it called post-increment access mode instead of post-access increment mode? I'm not sure. Post-access incrementing is what we are doing. But mathematicians can use strange grammar sometimes. (And strange grammars, as well, but that's a whole 'nuther story.)

A partial explanation can be found in the terms "post-inc" and "pre-dec", which are common jargon in discussions of low-level computer programming. In both cases, the redundant word "access" has been deleted:

  • post-access increment => post-access inc => post-inc
  • pre-access decrement => pre-access dec => pre-dec

And, yes, the technical literature then re-uses the coinage, including in the less-used pre-inc and post-dec modes that we will visit later on.
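To make the word order concrete, here is a small Python sketch of the two common cases. The helper names post_inc and pre_dec are made up for illustration, and a plain list stands in for memory. The only point is the order of the two steps.

```python
# Hypothetical helpers: each returns (value_accessed, new_pointer).
def post_inc(memory, ptr):
    value = memory[ptr]       # access first ...
    return value, ptr + 1     # ... then increment

def pre_dec(memory, ptr):
    ptr -= 1                  # decrement first ...
    return memory[ptr], ptr   # ... then access

memory = [8, 5, 2, 7, 4]
value, ptr = post_inc(memory, 0)    # reads memory[0], pointer moves to 1
print(value, ptr)                   # 8 1
value, ptr = pre_dec(memory, ptr)   # pointer moves back to 0, reads memory[0]
print(value, ptr)                   # 8 0
```

A post-inc followed by a pre-dec lands on the same item again, which is worth keeping in the back of your mind for later.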

We can deal with this, I think?

The 6801 is again the same for this, both the source code and the object. It will be a bit quicker -- slightly reduced cycle counts for the 6801 on both the indexed mode operators and the INX instruction. It's a bit of a rinse-and-repeat, but go ahead and check the simulation on the 6801, to keep the processes fresh in your head.

INX on the 6809

You might think it surprising and counter-intuitive, but the 6809 has no INX instruction.

WHAT?!? No INX?!?!?!? 

So, of course, it has something better. So to speak.

Oh, it really is better, but if you start with a familiarity with the incrementing instructions, it might be easy to miss. 

You've seen the LEA instruction already, in relation to position independent addressing. Now see what you think of this:

	LEAX	1,X

Load into index register X the address 1 past the current address in X.

(If you define this as a macro called INX, then you have an INX instruction for the 6809, but I do not want to talk about macros yet. So just forget I mentioned it, okay? ;-P)

Why? you ask, would a person want to invoke all the complexities of LEA just to increment X?

Why not? That's basically what the address calculation unit is for, isn't it? -- calculating addresses?
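If the difference between loading an address and loading what is at the address is still fuzzy, this Python sketch may help. The function names leax and ldb just mirror the mnemonics, and the addresses and the memory dictionary are made up for illustration. Both functions calculate the same address; only one of them goes on to read memory.

```python
# Hypothetical model: X is a 16-bit register, memory maps address -> byte.
def leax(offset, x):
    """LEAX offset,X: compute the address only; memory is never read."""
    return (x + offset) & 0xFFFF

def ldb(offset, x, memory):
    """LDB offset,X: compute the same address, then read the byte there."""
    return memory[(x + offset) & 0xFFFF]

memory = {0x2003: 8, 0x2004: 5}
x = 0x2003
print(hex(leax(1, x)))      # 0x2004 (an address, goes into X)
print(ldb(1, x, memory))    # 5 (the byte stored at that address, goes into B)
```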

Here's a version for the 6809:

ENTRY	JMP	START
*
BYTTBL	FCB	8
	FCB	5
	FCB	2
	FCB	7
	FCB	4
*
START	LEAX	BYTTBL,PCR
	LDB	0,X
	LEAX	1,X	; INX equivalent
	ADDB	0,X
	LEAX	1,X
	ADDB	0,X
	LEAX	1,X
	ADDB	0,X
	LEAX	1,X
	ADDB	0,X
	NOP
DONE	NOP

Give it a try.

Admittedly, LEAX 1,X on the 6809 is a 2-byte instruction, where INX on the 6800 and 6801 is a 1-byte instruction. And LEAX 1,X takes 5 clock cycles, where INX only takes 4 on the 6800 and 3 on the 6801.

So, ... Why?

Adding fancy instructions does take room in the op-code map. The index mode post-byte that the 6809 indexing modes require takes another cycle to process, too. And Motorola developed the 6801 after the 6809, so it benefits from some improvements in circuitry design and layout that Motorola picked up in-between.

It's a definite question why Motorola never improved the 6809, but what I heard from probably reliable sources was that Motorola didn't want the 6809 eating into the 68000's market. 

And it would have. Initially. 

At similar memory cycle timing, for primarily 8-bit stuff, the 6809 can be faster than the 68000 -- if there is very little use of division and only light use of multiplication of integers larger than 8 bits. And re-using 6809 source code on the 68000 can be kind of tricky, if you aren't using a discipline like I will be demonstrating here once we get the basics down. So upgrading from the 6809 to the 68000 is not a given.

Well, so then Motorola would have been stuck with customers wanting upgraded 6809s, like the customers that asked for upgraded 6800s that didn't require conversion like the 6809 did, and management was worried about losing focus, if I understand correctly.

It's a legitimate worry when you don't have a lot of talented engineers in the pool available to hire from. But ... what about the 6805 with its weird 8-bit index register? And that even weirder bit-serial 6804 that you probably will never hear about? Why then did Motorola do those?

Incidentally, the series that started with the 6805, from my inexpert survey, was probably as profitable for Motorola as any other series, and was longer lived than most, including the 68HC11. Both are still hiding in a lot of the electronics you use every day. I want to do chapters for both of them at some point, but I need to find decent open source/libre emulators and support software. Or build my own.

(And there is a non-Motorola side-path I'm deliberately ignoring here. Maybe sometime later.)

Oh, there I go, getting distracted again.

So, yeah, LEAX 1,X does not seem to be a win. 
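Here are the byte and cycle figures quoted above, side by side, as a small Python sketch. The table names and the function advance_cost are made up for illustration. The numbers are the ones given in the text, and the 6809 entry assumes the step is small enough to fit the short offset form of LEAX, which is why the sketch refuses anything outside 1 to 15.

```python
# (bytes, cycles) for one pointer-advancing instruction, per the figures above.
INX_COST = {"6800": (1, 4), "6801": (1, 3)}
LEAX_COST = (2, 5)   # 6809 LEAX n,X, assuming n fits the short offset form

def advance_cost(cpu, step=1):
    """Return (bytes, cycles) to advance X by `step` using INX or LEAX."""
    assert 1 <= step <= 15, "sketch only covers small forward steps"
    if cpu == "6809":
        return LEAX_COST                 # one LEAX, whatever the step
    size, cycles = INX_COST[cpu]
    return size * step, cycles * step    # one INX per unit of step

for cpu in ("6800", "6801", "6809"):
    print(cpu, advance_cost(cpu))
# 6800 (1, 4)
# 6801 (1, 3)
# 6809 (2, 5)
```

Try calling it with a step other than 1 and see how the picture changes.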

But LEAX 2,X instead of two INX instructions? Two bytes, either way. One instruction and just 5 cycles on the 6809, vs. two instructions and 6 cycles total for the 6801, 8 cycles total for the 6800. That's a win. And the 6809 has more magic in that index post-byte, to cover for the more common not-a-win case of incrementing by 1:

ENTRY	JMP	START
*
BYTTBL	FCB	8
	FCB	5
	FCB	2
	FCB	7
	FCB	4
*
START	LEAX	BYTTBL,PCR
	LDB	,X+
	ADDB	,X+	; INX implied
	ADDB	,X+
	ADDB	,X+
	ADDB	,X	; Following the 6800 code, no final increment
	NOP
DONE	NOP

Of course you're going to give this one a try and make sure it works as advertised.

Is it a win? 

In this source code, you take care of the post-access increment in the same instruction as the load or add. The assembler source makes it clear that the access is post-inc, whereas INX instructions are not always for post-inc accesses. That's a win for clarity, and it also saves a few cycles on each access, vs. the ADDB 0,X ; INX pair.

But converting 6800 source code to the 6809 becomes, through this kind of magic, shall we say, not straightforward. It will require allocating time for an engineer to work through the source, possibly adding bugs that will then need to be fixed. And much of the existing 6800 source code was definitely put together without any real discipline, and is thus hard to understand, and ...

In my way of thinking, that added engineering time could be considered an investment. More than one engineer will end up understanding the code. And it will be an opportunity to preemptively attack undiscovered bugs in the original code. 

But even today (especially today?), management (and accounting and boards of directors) do not seem to understand that kind of investment in intangible capital.

If you're interested in comparing the op-codes and cycle counts of the processors, do a web search for the programming manuals. Look for something like

Motorola 6801 programming manual

You should find the PDFs on Bitsavers and the Internet Archive, and in the other usual places. The 6809 even has an HTML manual on-line. (Thank you, Maddes and the docs team.) 

No, actually, even if you don't think you are all that interested in details like byte and cycle counts, go get the manuals anyway. You need them. Don't forget the 68000 manual.

INX on the 68000

Let's go back and get the 68000 source from the most recent example:

	EVEN	
ENTRY	JMP	START
*
BYTTBL	DC.B	8	; byte data doesn't have to be aligned.
	DC.B	5
	DC.B	2
	DC.B	7
	DC.B	4
*
	EVEN		; But 68K code does have to be even aligned.
START	MOVE.L	#BYTTBL,A0
	MOVE.B	0(A0),D1
	ADD.B	1(A0),D1
	ADD.B	2(A0),D1
	ADD.B	3(A0),D1
	ADD.B	4(A0),D1
	NOP
DONE	NOP
* One way to return to the OS or other calling program
	clr.w	-(sp)	; there should be enough room on the caller's stack
	trap	#1		;	quick exit

Let's try that with a 68000 equivalent of INX:

	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	MACHINE MC68000	; because there are a lot the assembler can do.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************
	EVEN	
ENTRY	JMP	START
*
BYTTBL	DC.B	8	; byte data doesn't have to be aligned.
	DC.B	5
	DC.B	2
	DC.B	7
	DC.B	4
*
	EVEN		; But 68K code does have to be even aligned.
START	LEA	BYTTBL(PC),A0
	MOVE.B	(A0),D1
	ADDQ.L	#1,A0	; INX equivalent
	ADD.B	(A0),D1
	ADDQ.L	#1,A0
	ADD.B	(A0),D1
	ADDQ.L	#1,A0
	ADD.B	(A0),D1
	ADDQ.L	#1,A0
	ADD.B	(A0),D1	; No trailing INX.
	NOP
DONE	NOP
* One way to return to the OS or other calling program
	clr.w	-(sp)	; there should be enough room on the caller's stack
	trap	#1		;	quick exit

So, do you see that? The 68000 actually has a direct equivalent of the INX instruction.

What is INX, really? It's an instruction to add 1 to the index register in the 6800/6801. If we had an ADDX instruction for the 6809, to ADD some source to the index register, INX would be the approximate equivalent of

	ADDX	#1	; => INX (theoretical)

We don't have an ADDX on the 6809, but we do have something like it on the 68000. (We actually have an ADDX on the 68000, but that's something rather different, which we will discuss later.) 

On the 68000, any address register could be our index register (equivalent of the 6809/6800/6801 X). Since we used A0 in the previous chapter, we'll use it now. (It agrees with stack order, by the way.) Here's an approximate equivalent of the 6800/6801 INX instruction:

	ADD.L	#1,A0	; INX equivalent

ADD.L immediate, long add. Why long? Because addresses in the 68000 are long, unless you specifically know they are not. And we don't know that. But that means we have to waste bytes in the code for a 32-bit version of the small integer 1. In hexadecimal, the combined op-code and immediate argument would be 48 bits:

$D1FC
$0000
$0001

In binary,

1101000111111100
0000000000000000
0000000000000001

For the curious, the various fields of the instruction are

1101: add address
000: A0 is destination
111: size long (32 bit) operation
111 100: immediate source
0000000000000000: high 16 bits
0000000000000001: low 16 bits

That's a lot of zero bits, just to increment by 1. 
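If you want to check that field breakdown yourself, here is a small Python sketch that pulls the fields out of the first word by shifting and masking. The function name split_adda_word and the dictionary keys are made up for illustration; the field boundaries are the ones listed above.

```python
def split_adda_word(opword):
    """Split a 16-bit 68000 add-address op-code word into the fields above."""
    return {
        "operation": (opword >> 12) & 0b1111,  # 1101: add address
        "register":  (opword >> 9) & 0b111,    # destination address register
        "opmode":    (opword >> 6) & 0b111,    # 111: long (32 bit)
        "ea_mode":   (opword >> 3) & 0b111,    # 111 with ea_reg 100: immediate
        "ea_reg":    opword & 0b111,
    }

fields = split_adda_word(0xD1FC)
print({name: format(value, "b") for name, value in fields.items()})
# {'operation': '1101', 'register': '0', 'opmode': '111', 'ea_mode': '111', 'ea_reg': '100'}
```

The two extension words, $0000 and $0001, are just the 32-bit immediate value and need no decoding.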

So Motorola defined the ADDQ instruction that I use above. ADDQ increments by any immediate value from 1 to 8, with the value specified internal to the single 16-bit instruction. In hexadecimal, the complete instruction is

$5288

In binary,

0101001010001000

That's 16 bits instead of 48. Again, for the curious, the various fields are

0101: add quick
001: data (number to add, 0 means 8)
0: addq (1 is subq)
10: size long (32 bit) operation
001: target is address register
000: A0

SUBQ is the corresponding decrement.  And, by the way, either can target any register. (Memory, too.)
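Going the other direction, this Python sketch builds the op-code word from the fields. The helper name encode_addq_long_an is made up for illustration, and it only covers the one case we care about here: a long-sized ADDQ or SUBQ with an address register as the target.

```python
def encode_addq_long_an(data, areg, subtract=False):
    """Build the op-code word for ADDQ.L/SUBQ.L #data,An (hypothetical helper)."""
    assert 1 <= data <= 8 and 0 <= areg <= 7
    word = 0b0101 << 12          # add/subtract quick
    word |= (data & 0b111) << 9  # 8 wraps around to 000
    word |= int(subtract) << 8   # 0: ADDQ, 1: SUBQ
    word |= 0b10 << 6            # size: long
    word |= 0b001 << 3           # target is an address register
    word |= areg                 # which address register
    return word

print(hex(encode_addq_long_an(1, 0)))                 # 0x5288  ADDQ.L #1,A0
print(hex(encode_addq_long_an(1, 0, subtract=True)))  # 0x5388  SUBQ.L #1,A0
print(hex(encode_addq_long_an(8, 0)))                 # 0x5088  ADDQ.L #8,A0
```

Compare the first result with what the disassembler shows you for the ADDQ.L #1,A0 lines.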

Refer back to the last chapter and:

  • Copy the source to a text file in the host OS;
  • Save it in your working directory with a name you'll maybe recognize, 8 characters or less;
  • Open a terminal command-line shell and change to your working directory;
  • Assemble it with vasm, don't forget the
    -Ftos TOS output and
    -no-opt optimization switches;
  • Start a Hatari session;
  • In Hatari:
    • Hit Ctrl-C to drop out of GEM into the EmuTOS CPM shell;
    • Change to the working directory in EmuTOS;
    • Hit Alt-PAUSE to invoke the debugger;
    • Move with the mouse to the debugger in the command-line terminal and
      • Set a breakpoint (b) on TEXT segment entry;
      • Continue (c) execution in the EmuTOS shell;
    • Run the executable you created with vasm;
    • Use the mouse to go back to the debugger;
      • Disassemble the code to make sure it's all where it should be;
      • Find the address of the START label and disassemble from there, too;
      • Show the registers (r); and
      • Step (s) through the code, watching the index and sum change;
    • Continue (c) back to EmuTOS, or quit (q) back to the host OS.

Convinced?

We could also use LEA instead of ADDQ, like we did on the 6809. The instruction would look like

	LEA	1(A0),A0

It takes an extra word (two bytes) of code, as compared with ADDQ, and it takes more cycles to complete, but it can be done. 

We are more interested in the post-inc mode:

	ADD.B	(A0)+,D1

Let's do that now:

	OPT LIST,SYMTAB	; Options we want for the stand-alone assembler.
	MACHINE MC68000	; because there are a lot the assembler can do.
	OPT DEBUG	; We want labels for debugging.
	OUTPUT
***********************************************************************
	EVEN	
ENTRY	JMP	START
*
BYTTBL	DC.B	8	; byte data doesn't have to be aligned.
	DC.B	5
	DC.B	2
	DC.B	7
	DC.B	4
*
	EVEN		; But 68K code does have to be even aligned.
START	LEA	BYTTBL(PC),A0
	MOVE.B	(A0)+,D1	; post-access inc implied
	ADD.B	(A0)+,D1
	ADD.B	(A0)+,D1
	ADD.B	(A0)+,D1
	ADD.B	(A0),D1	; Leaving out the trailing post-inc.
	NOP
DONE	NOP
* One way to return to the OS or other calling program
	clr.w	-(sp)	; there should be enough room on the caller's stack
	trap	#1		;	quick exit

By the way, do you see that push instruction before the trap back to EmuTOS? 

:)

I'll explain later, maybe in the next example.

For now, go ahead and copy this source into a text file, assemble it, and run it in the debugger, and then we will be done with this list of small constants -- for a while, at least.

Once you have that tested, and are satisfied that it works as advertised, you might want to clean up your working directory a bit. Make a subdirectory called step1 or basicadr or s1_adr or something, and move all the 68000 code files we have made to it.

Next, let's see if we can put some of this together to put out a message, and hopefully this all will begin to make some sense.

