Sunday, December 20, 2020

64-bit Addition on four retro CPUs -- 6800, 6801, 6809, and 68000

The key to doing 64-bit addition and subtraction on a CPU without a native 64-bit add is finding some means of catching the carry/borrow from the less significant parts and feeding it into the more significant parts. Once you have that figured out, you're home free.

Well, except for the problem of where you get the numbers from and where you store the result. 

We can relieve ourselves of a lot of copying and other maintenance by using a stack for the parameters.

* Loop mode 64-bit add for 6800
* Parameters pointed to by software stack pointer
* in (probably direct page) variable DSP.
*
* Enter:
*   left side parameter at DSP[8]
*   right side at DSP[0]
*
* Exit:
*   result at DSP[0]
*
add64bit
  LDX    DSP
  LDAA    #8
  CLC
addloop
  LDAB    15,X
  ADCB    7,X
  STAB    15,X
  DEX
  DECA
  BNE    addloop
  LDAB    DSP+1    ; deallocate right
  LDAA    DSP
  ADDB    #8
  ADCA    #0
  STAB    DSP+1
  STAA    DSP
  RTS
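
To check the carry handling in the loop above against something runnable on a modern machine, here is a minimal C sketch of the same byte-at-a-time algorithm. It is only a model, not part of the original listing: the `ParamStack` type, its `mem` and `dsp` fields, and the function name `add64_bytes` are hypothetical stand-ins for the 6800's memory and the DSP variable, and the operands are assumed to be stored most significant byte first, as the listing's offsets imply.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of the parameter stack. dsp plays the role of DSP
   and indexes the top (lowest-addressed) byte of the stack. */
typedef struct {
    uint8_t mem[64];
    size_t  dsp;
} ParamStack;

/* Add the two 8-byte big-endian values on top of the stack.
   Right operand at dsp[0..7], left operand at dsp[8..15].
   The sum overwrites the left operand and the right operand is dropped. */
static void add64_bytes(ParamStack *s)
{
    uint8_t *p = &s->mem[s->dsp];
    unsigned carry = 0;                          /* CLC */
    for (int i = 7; i >= 0; i--) {               /* least significant byte first */
        unsigned sum = p[8 + i] + p[i] + carry;  /* LDAB 15,X / ADCB 7,X */
        p[8 + i] = (uint8_t)sum;                 /* STAB 15,X */
        carry = sum >> 8;                        /* carry out feeds the next byte */
    }
    s->dsp += 8;                                 /* deallocate right operand */
}
```

With 0x00000000FFFFFFFF as the left operand and 1 as the right, `add64_bytes` should leave 0x0000000100000000 on top of the stack, the carry having rippled up through four bytes.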

You can flatten, or unroll, the loop, but the eight load/add/store groups alone consume a total of 48 bytes:

* Unrolled 64-bit add for 6800
* Parameters pointed to by software stack pointer
* in (probably direct page) variable DSP.
*
* Enter:
*   left side parameter at DSP[8]
*   right side at DSP[0]
*
* Exit:
*   result at DSP[0]
*
add64bitflat
  LDX    DSP
  LDAB    15,X
  ADDB    7,X    ; First ADD makes valid carry.
  STAB    15,X
  LDAB    14,X    ; Leave A untouched.
  ADCB    6,X    ; Use carry from here out.
  STAB    14,X
  LDAB    13,X
  ADCB    5,X
  STAB    13,X
  LDAB    12,X
  ADCB    4,X
  STAB    12,X
  LDAB    11,X
  ADCB    3,X
  STAB    11,X
  LDAB    10,X
  ADCB    2,X
  STAB    10,X
  LDAB    9,X
  ADCB    1,X
  STAB    9,X
  LDAB    8,X
  ADCB    0,X
  STAB    8,X
  LDAB    DSP+1    ; deallocate right
  ADDB    #8    ; Leave A untouched here, too.
  STAB    DSP+1
  LDAB    DSP
  ADCB    #0
  STAB    DSP
  RTS

The 6801 gives us the D double accumulator to work with, but using it requires some analysis. We'll start by simplifying the deallocation code with the ABX instruction:

* Loop mode 64-bit add for 6801
* Parameters pointed to by software stack pointer
* in (probably direct page) variable DSP.
*
* Enter:
*   left side parameter at DSP[8]
*   right side at DSP[0]
*
* Exit:
*   result at DSP[0]
*
add64bit
  LDX    DSP
  LDAA    #8
  CLC
addloop
  LDAB    15,X
  ADCB    7,X
  STAB    15,X
  DEX
  DECA
  BNE    addloop
  LDAB    #16    ; adjust (undo 8 DEX) & deallocate right
  ABX
  STX    DSP
  RTS

Now we'll show a version that makes some use of the D accumulator:

* Partially unrolled loop mode 64-bit add for 6801
* Parameters pointed to by software stack pointer
* in (probably direct page) variable DSP.
* Uses counter variable (also probably in direct page).
*
* Enter:
*   left side parameter at DSP[8]
*   right side at DSP[0]
*
* Exit:
*   result at DSP[0]
*
add64bithalfunrolled
  LDX    DSP
  LDAA    #3
  STAA  COUNTER
  LDD    14,X
  ADDD    6,X
  STD    14,X
addloophalfunrolled
  LDD    12,X
  ADCB    5,X
  ADCA    4,X
  STD    12,X
  DEX
  DEX
  DEC    COUNTER
  BNE    addloophalfunrolled
  LDAB    #14    ; adjust & deallocate right
  ABX
  STX    DSP
  RTS

Comparing the above with a fully unrolled add, half unrolling doesn't really buy as much as it might seem:

* Fully unrolled 64-bit add for 6801
* Parameters pointed to by software stack pointer
* in (probably direct page) variable DSP.
*
* Enter:
*   left side parameter at DSP[8]
*   right side at DSP[0]
*
* Exit:
*   result at DSP[0]
*
add64bitunrolled
  LDX    DSP
  LDD    14,X
  ADDD    6,X
  STD    14,X
  LDD    12,X
  ADCB    5,X
  ADCA    4,X
  STD    12,X
  LDD    10,X
  ADCB    3,X
  ADCA    2,X
  STD    10,X
  LDD    8,X
  ADCB    1,X
  ADCA    0,X
  STD    8,X
  LDAB    #8    ; deallocate right (X was not moved)
  ABX
  STX    DSP
  RTS
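
The analysis the D accumulator needs comes down to this: the 6801 has a 16-bit add (ADDD) but no 16-bit add-with-carry, so only the lowest pair of bytes can be added as a unit, and every later pair has to fall back on ADCB followed by ADCA. A rough C model of that pattern might look like the following. The helper names (`load16`, `store16`, `add64_6801_model`) are hypothetical, and plain byte arrays stand in for the two operands on the stack.

```c
#include <stdint.h>

/* Read and write a big-endian 16-bit value, as LDD and STD do. */
static uint16_t load16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static void store16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

/* left and right each point at 8 big-endian bytes; the sum replaces left. */
static void add64_6801_model(uint8_t *left, const uint8_t *right)
{
    /* Lowest 16 bits: a single 16-bit add (LDD / ADDD / STD). */
    uint32_t sum16 = (uint32_t)load16(left + 6) + load16(right + 6);
    store16(left + 6, (uint16_t)sum16);
    unsigned carry = (unsigned)(sum16 >> 16);

    /* Remaining pairs: low byte, then high byte, each with carry
       (LDD / ADCB / ADCA / STD). */
    for (int i = 4; i >= 0; i -= 2) {
        unsigned lo = left[i + 1] + right[i + 1] + carry;  /* ADCB */
        unsigned hi = left[i] + right[i] + (lo >> 8);      /* ADCA */
        left[i + 1] = (uint8_t)lo;
        left[i]     = (uint8_t)hi;
        carry = hi >> 8;
    }
}
```

Each pass of the loop reads its own pair of bytes from both operands, which is the thing to check when comparing offsets in the unrolled listing.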

The 6809 version shows some interesting advantages in the setup and allocation, and gives an opportunity for creativity with the accumulator indexing:

* Loop mode 64-bit add for 6809
* Parameters pointed to by stack pointer U.
*
* Enter:
*   left side parameter at 8,U
*   right side at ,U
*
* Exit:
*   result at ,U
*
add64bit
  LEAX    8,U    ; Point to left operand.
  LDA    #7    ; count/offset
  ANDCC #$FE    ; clear carry
addloop
  LDB    A,X
  ADCB    A,U
  STB    A,X
  DECA
  BPL    addloop    ; DECA leaves carry alone, so test the sign of the count
  LEAU ,X    ; deallocate right
  RTS

Half-unrolled on the 6809 doesn't really make a lot of sense either:

* Half unrolled loop mode 64-bit add for 6809
* Parameters pointed to by stack pointer U.
*
* Enter:
*   left side parameter at 8,U
*   right side at ,U
*
* Exit:
*   result at ,U
*
add64bithalfunrolled
  LDA    #3
  STA    -1,U    ; counter just below the stack, U itself stays put
  LEAX    -1,U    ; X points to the counter
  LDD    14,U
  ADDD    6,U
  STD    14,U
addloophalfunrolled
  LDD    12,U
  ADCB    5,U
  ADCA    4,U
  STD    12,U
  LEAU -2,U    ; artifact false allocation
  DEC    ,X    ; slow variable on 6809!
  BNE    addloophalfunrolled
  LEAU    14,U    ; adjust stack pointer and deallocate
  RTS

Compare that with the fully unrolled version (which the 6801 gets fairly close to, as well):

* Fully unrolled 64-bit add for 6809
* Parameters pointed to by stack pointer U.
*
* Enter:
*   left side parameter at 8,U
*   right side at ,U
*
* Exit:
*   result at ,U
*
add64bitunrolled
  LDD    14,U
  ADDD    6,U
  STD    14,U
  LDD    12,U
  ADCB    5,U
  ADCA    4,U
  STD    12,U
  LDD    10,U
  ADCB    3,U
  ADCA    2,U
  STD    10,U
  LDD    8,U
  ADCB    1,U
  ADCA    ,U
  STD    8,U
  LEAU    8,U    ; adjust stack pointer and deallocate
  RTS

And, now, for the 68000 byte-for-byte looped version (untested):

* Loop mode 64-bit byte-by-byte add for 68000
* Parameters pointed to by stack pointer A6.
*
* Enter:
*   left side parameter at 8(A6)
*   right side at (A6)
*
* Exit:
*   result at (A6)
*
add64bit
  MOVE.L    #7,D7    ; Because I like D7
  LEA    8(A6),A0
  LEA    16(A6),A1
  ANDI    #$EF,CCR    ; clear X (extend), the carry that ADDX uses
addloop
  ADDX.B    -(A0),-(A1)    ; byte size must be explicit, default is word
  DBF    D7,addloop
  LEA    8(A6),A6    ; deallocate right
  RTS

We are not going to even look at half-unrolled on the 68000, although it's kind of interesting. Fully unrolled:

add64bitunrolled
  MOVEM.L    (A6)+,D4/D5/D6/D7    ; pop up front
  ADD.L     D5,D7
  ADDX.L    D4,D6
  MOVEM.L    D6/D7,-(A6)    ; push back on
  RTS
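
As a sanity check on the register pairing, the same two-step add can be modeled in C. This is a sketch under assumptions, not a translation of the listing: the `Pair64` struct and `add64_halves` are hypothetical names, with `hi` and `lo` standing in for D6 and D7 (left) or D4 and D5 (right), and the X flag is recovered from unsigned wraparound.

```c
#include <stdint.h>

/* Hypothetical stand-in for a register pair holding one 64-bit value. */
typedef struct {
    uint32_t hi;   /* D6 or D4 */
    uint32_t lo;   /* D7 or D5 */
} Pair64;

static Pair64 add64_halves(Pair64 left, Pair64 right)
{
    Pair64 r;
    r.lo = left.lo + right.lo;        /* ADD.L  D5,D7 */
    uint32_t x = (r.lo < left.lo);    /* X flag: 1 if the low half wrapped */
    r.hi = left.hi + right.hi + x;    /* ADDX.L D4,D6 */
    return r;
}
```

Adding 1 to the pair 0x7FFFFFFF:0xFFFFFFFF gives 0x80000000:0x00000000. The bit pattern is the same whether the operands are read as signed or unsigned, which is why one routine serves both.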

And there you have it. Several different ways to add integers between 

9,223,372,036,854,775,807

and 

-9,223,372,036,854,775,808

on microprocessors.

