joel's programming fun: Software Differences between the 6800, the 6801/6803, and the 6809

Following on from my previous post about the software differences between the 6800 and the 6801/6803, I'm going to try to set out the software differences between those and the 6809. You may want to read that post first for background.

In terms of history, the 6801 came after the 6809, not, as some seem to expect, before.

Certain of Motorola's customers looked at the 6809 and said they wouldn't know what to do with all of that, and couldn't they have a chip that was just the 6800 with some ROM and RAM built-in instead?

Motorola already had the 6802/6808 out, which combined the 6800 and 128 bytes of RAM. The 6802 also had most of the CPU clock generation circuitry built in. The 6808 (not directly related to the 68HC08 and later derivatives of the 6805) was an odd chip, basically the 6802 with the RAM disabled (presumably because the RAM failed Q/A tests).

The 6802 was intended to be paired with custom ROM versions of the 6846, which was similar to MOS Technologies' RIOT chip, but definitely not the same. The 6846 had 2K of ROM, a parallel port, and a hardware timer device. (I had a Micro Chroma 68, which had the 6808 or 6802 paired with a 6846 with a monitor program in the ROM -- Fun little board for prototyping, and I should have done more than I did with it.)

As I said in the post linked above, Motorola borrowed a few ideas from the 6809, but designed the 6801 to be strictly compatible with the 6800 -- except for the new op-codes and the condition codes for the CPX compare X instruction. And they improved the instruction timing. But they failed to put in the missing direct-page versions of the unary operators like increment, decrement, and such.

And one of Motorola's big customers said the 6801 was still too much CPU, couldn't they strip it down more and make it cheaper?

And Motorola complied, producing the 6805, which is a true 8-bit CPU with only one accumulator and only an 8 bit wide index. But it has bit manipulation instructions for the direct page, and it has direct-page versions of all instructions. For unary instructions, it has 16-bit offset indexed versions of the instructions. On the 6805, instead of index+constant-offset, you (often) wanted to think constant-base+index. (Oh. And the initial 6805s had no push or pop, just branch and jump to subroutine. You had to do software stacks if you wanted them.) This was all in fewer transistors than the original 6800.

Motorola would evolve the 6805 in HCMOS versions, with more instructions and such. Some years later, Motorola brought out the 68HC08, which added a high byte of index to the 68HC05. (And then there was the 68HCS08 with push and pop, and the 'R08 and ....)

And later still, Motorola brought out the 68HC11, which added a Y index, bit manipulation instructions, and a little more to the 68HC01. The 68HC11 was strictly upward-compatible with the 6801. And we are roughly a decade ahead of the story now. But it's relevant.

As I say, the 6809 preceded the 6801 and 6805. The latter two benefited from lessons learned in the 6809, which the 6809 was never evolved to benefit from. (Nor was the 68000, unfortunately, until the CPU32, much later.)

Remember, the 6801 is almost (99.9% or so) perfectly object-code compatible with the 6800, but with faster instruction timings on a lot of instructions.

The 6805 is not object-code compatible, but inherits most of the overall instruction set and run-time model of the 6800 series. It also has better instruction times than the 6800, more effective use of memory space, and more flexible indexing.

The 6809 is not object-code compatible. Instructions are laid out a little different, and index encoding is significantly different. It inherits most of the instruction set of the 6800, and it extends the register and run-time model of the 6800 series, but it does not have all the improved instruction timings. The instructions it is missing can be synthesized, but care has to be exercised in the process in many cases.

And it departs from the 6800 in an important, non-trivial way --

Pushes on the 6800 and 6801 are post-decrement. Pops (PULs) are pre-increment. The stack pointer S is always pointing to the next available byte on the stack. That's a bit of a departure from most stack implementations, but the 6800 makes up for it in the TSX (transfer S to X) instruction by adding 1 before the transfer. And it makes up for it in the TXS instruction (transfer X to S) by subtracting 1 before the transfer. So, after a TXS, X is pointing to the last item pushed.

But if you store S in RAM and load it into X with the 6800 or 6801, this adjustment does not happen. You have to remember to use INX (increment X) after the LDX (load X), or adjust the offset to X. In other words, on the 6800,


      PSHA
    TSX
    ORAA   0,X

but


      PSHA
    STS	   TEMP
    LDX	   TEMP
    ORAA   1,X	; adjusting the offset instead of INXing.

On the 6809, pushes are pre-decrement, and pops (PULs) are post-increment. So the stack register is always pointing to the last item pushed, without adjustment. If you play games with the stack in 6800 or 6801 code, you have to be careful with this when moving the code to the 6809. (But the games usually played with the stack on the 6800 or 6801 are pretty much beside-the-point on the 6809. There are much better ways. More below.)

Using the above example, on the 6809,


      PSHS   A
    TFR	   S,X
    ORA    ,X

and


      PSHS   A
    STS	   TEMP
    LDX    TEMP
    ORA	   ,X	; no adjusting the offset or index.

or, better yet,


      PSHS   A
    ORA	   ,S+  ; but this is getting a little ahead of the story.

That's a long preamble, but I hope it will help you know what to look for as I compare the 6809 with the other two processors.

In broad overview, the differences in the 6809 are as follows, kind of in the order they tend to stand out:

Pairing A:B as double accumulator D, as in the 6801 (but not including 16-bit shifts).
Five indexable registers (X, Y, U, S, and PC) in the 6809, two of which (U and S) can be stack registers, as opposed to the 6800/6801 single X index register and single S stack register.
Movable direct page, with associated DP (upper 8 bits) direct page register. (This was not implemented as well as we might have hoped, more below.)
Extended interrupt model. (More registers means more time required to save them all, so an interrupt service that used only one or two registers could be assigned to the FIRQ fast interrupt request, and save time by only saving what it used.)
Two extra software interrupts (which allows the one-byte SWI to be used as a software breakpoint and the two-byte SWI2 and SWI3 to be used as system calls, if you want to do system calls with SWIs).
Stack model that always points directly at the top of stack, as mentioned above.
Compactly encoded extended indexing model:
- no offset,
- small (5 bit) signed constant offset,
- medium (8 bit) signed constant offset,
- long (16 bit) signed constant offset,
- A, B or D accumulator signed offset,
- post-increment and pre-decrement indexing (for efficient data moving -- goodbye stack blasting!),
- using the PC as an index register (as noted above),
- final memory indirect with any of the above,
- plus memory indirect with extended mode (but not, unfortunately, with direct page mode),
- and, for some reason, no way to use a load effective address (LEA) instruction to directly calculate the address of a variable in the direct page (further explained below).
Unary instructions all have direct-page mode, but are slower in direct-page mode than you might hope.
Effective address calculation instructions (LEA) to put the target addresses calculated in each of the indexed modes above into one of X, Y, U, or S. (JMP instruction allows indexed modes, so PC as a target is not needed.)
Note that LEAS and LEAU do not affect the condition codes, where LEAX and LEAY affect the Z bit.
No INX/DEX or INS/DES. Use LEAX 1,X/LEAX -1,X and LEA 1,S/LEA -1,S instead (in other words, for single increment/decrement. See below).
Full compare instructions for the four proper index registers, compatible with the 6801 CPX but not the 6800 CPX.
Multiple register push and pop for both S and U stacks.
- (And, hello, again, stack blasting. Hah.
  Be very careful if you do. Even with interrupts masked, this does not do the trick:
  PULU A,B,X,Y PSHS A,B,X,Y ; look at what happens when you repeat! Probably the best you should want to do is
  PULU A,B,X STD ,Y++ STX ,Y++ unless you like wee-hours-of-the-morning debugging sessions.)
Register-to-register transfer (TFR) instruction with source and register encoded in post-byte replaces the Txx transfer instructions in the 6800.
Register-with-register exchange (EXG) instruction that uses the same post-bytes as TFR.
Has ABX as in the 6801 (but not 6800), not affecting the condition codes.
LEAX B,X would be almost equivalent, except ABX treats B as unsigned and sets no flags by the result, where LEAX B,X sets the Z flag.
No ABA (add B to A) or SBA (subtract B from A). Instead, you can do it something like this: PSHS B ADDA ,S+ ; for ABA which is slower, but essentially equivalent if your S stack pointer is pointing to valid memory.
Instead of SEC, CLC, etc., for working with the condition codes, the 6809 has
- ORCC #FLAGBITS ; for setting flags
- ANDCC #~FLAGBITS ; for clearing flags
Neither of the 16-bit shifts that the 6801 has.
For LSRD use
LSRA RORB For LSLD use
LSLB ROLA
SYNC instruction can be used for fast synchronization to external input or to wait for an interrupt (slightly different from 6800/6801 WAI).
6809 has long branches, which aid in writing position-independent code. Conveniently, LBRA (long branch always) and LBSR are allocated single-byte op-codes to help motivate their use.
And the 6809 has BRN/LBRN as in the 6801.

I think that pretty much covers it, except the devil that is in the details.

In other words, you can map most single 6800 and 6801 instructions to single 6809 instructions. Those that don't map to single 6809 instructions either map to pairs of instructions or, especially in the case of the interrupt instructions, have to be fixed to account for the larger register set being pushed by the interrupt and for some technical differences implied by hardware in whatever design you're using.

But if you do that kind of unintelligent transliteration, you'll want to do at least a second pass. For instance,


      INX
    INX
    INX
    INX

which maps mechanically to,

    LEAX   1,X
    LEAX   1,X
    LEAX   1,X
    LEAX   1,X

really wants to be replaced with the single instruction


      LEAX    4,X

And


      LDAA    0,X
    LDAB    1,X
    INX
    INX

which maps mechanically to


      LDA    0,X
    LDB    1,X
    LEAX   1,X
    LEAX   1,X 
  really wants to be replaced with the single instruction


      LDD     ,X++

Unfortunately, the above kinds of optimizations tend to be hidden by the ways programmers tend to try to optimize 6800 code.

I think I've covered stack games pretty well. If you understand how to build a stack frame on the 6801 and understand what I've mentioned about the stacks and the TFR instruction above, you should probably be able to extrapolate how to do stack frames on the 6809.

Oh, parameter handling is a separate topic, but I guess I should at least give it a mention.

On the 6800 or 6801, if you are passing parameters on the call stack (which I don't prefer, myself), your calling routine on the 6800 might do something like this:


      PSHA    ; parameter in A
    JSR     SOMEFUN
    INS
    ...

and your called routine might do something like this:


  SOMEFUN
    ...
    TSX
    LDAB    2,X   ; skip over return address to parameter
    ...
    RTS

On the 6809 it would like like this:


      PSHS    A    ; parameter in A
    LBSR    SOMEFUN
    LEAS    1,S
    ...
    
    ...
SOMEFUN
    ...
    LDAB    2,S   ; skip over return address to parameter
    ...
    RTS

And since I've mentioned that I prefer splitting the stacks, on the 6800, a separate software parameter stack would look something like


  PMSTKP      ; preferably in the direct page
    RMB     2
    ...
PUSHA       ; Looks like a lot of bother, I know.
    LDX     PMSTKP
    DEX
    STX     PMSTKP
    STAA    0,X
    RTS
* (We need this, too:)
INCPS1      ; Looks like more bother, yes.
    LDX    PMSTKP
    INX
    STX    PMSTKP
    RTS
    ...

    ...
*** Woops! not this:    JMP     PUSHA   ; parameter in A
    JSR    PUSHA   ; parameter in A (this)
    JSR    SOMEFUN
*** Facepalm! not this:    INS
    JSR    INCPS1  ; increment parameter stack (this)
    ...
    
    ...
SOMEFUN
    ...
    LDX     PMSTKP
    LDAB    0,X   ; no return address to worry about
    ...
    RTS

On the 6809 the separate software stack pointer is in U:


      PSHU    A    ; parameter in A
    LBSR    SOMEFUN
    LEAU    1,S
    ...
    
    ...
SOMEFUN
    ...
    LDAB    ,U   ; no return address to worry about
    ...
    RTS

Which I think is very clean.

I need to talk a little about the direct page register.

The DP register allows a potential trap. If interrupt handler routines use variables in the direct page, they should load and set their own DP, or they should use the full extended address.

For example, if you have a timer variable at $00E2 that is incremented in the IRQ service routine, you might do this:


      SETDP   0
    ...
IRQ 
    ...
    INC   $00E2
    ...
    RTI

and the assembler would assemble that as a direct page reference.

If the user code that gets interrupted has DP set to $02, the address that will get incremented at interrupt time will be $02E2, not $00E2, which is not what you want at all.

So what you should do is


      ...
  SETDP   0
IRQ 
    LDA   #0
    TFR   A,DP
    ...
    INC   $00E2  ; now the address is right
    ...
    RTI   ; restores the interrupted DP

Or you might do


      ...
IRQ 
    LDA   #0
    TFR   A,DP
    ...
    INC   >$00E2  ; force extended mode
    ...
    RTI

(I think I'm not getting that backwards, > to force extended and < to force direct page.) If your assembler doesn't allow both of the above, it is not a decent 6809 assembler. Get another one.

~~Here's how to~~

I think I need to show how to calculate the effective address of a direct-page variable:


  DPBASE	EQU n*$100
    ORG   DPBASE
    ...
DPVAR RMB 1
    ...
    ORG   CODE
    SETDP   n
    LDA   #n
    TFR   A,DP
    ...
    LDB   #DPVAR-DPBASE
    TFR   DP,A
    TFR   D,X   ; X now contains the address of DPVAR
    ...

And I need to mention the PC relative addressing mode, even though you won't be considering it when moving 6800 or 6801 code to the 6809. This is just so you can get a real idea of why the 6809 is so interesting when it doesn't automatically run 6800 code faster.

Let's look at a simple 7 constant table buried in code that needs to be position independent:


  *
MFCONST:
    FCB   CONST0
    FCB   CONST1
    FCB   CONST2
    FCB   CONST3
    FCB   CONST4
    FCB   CONST5
    FCB   CONST6
*
MOREFUN
    ... 
    CMPB  #7
    BHS   MFERROR
*** ERK! Not this:    LDX   MFCONST,PCR  ; the assembler should calculate the offset for you.
    LEAX  MFCONST,PCR  ; the assembler should calculate the offset for you. (this)
    LDA   B,X
    ...

Without the PCR addressing, you'd need a loader to patch the address of MFCONST used in MOREFUN to be able to run MOREFUN at arbitrary addresses. With the PCR addressing, MOREFUN and the constant table can be in ROM.

[JMR202210011737: add]

As an example of source code that is being kept very parallel, I now have my work on VTL-2 up in a private repository:

https://osdn.net/users/reiisi/pf/nsvtl/wiki/FrontPage

You can peruse the source tree and compare files:

https://osdn.net/users/reiisi/pf/nsvtl/scm/tree/master/

[JMR202210011737: add end]

joel's programming fun

Monday, September 19, 2022

Software Differences between the 6800, the 6801/6803, and the 6809

No comments:

Post a Comment