The key to doing 64-bit addition and subtraction on a CPU without a native 64-bit add is finding some way to catch the carry/borrow from the less significant parts and feed it into the more significant ones. Once you have that figured out, you're home free.
Well, except for the problem of where you get the numbers from and where you store the result.
We can relieve ourselves of a lot of copying and other maintenance by using a stack for the parameters.
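Before the assembly, here is a minimal C sketch of both ideas: the carry chain and the parameter stack. It is a model under stated assumptions, not a translation of any one listing. The names `pstack`, `dsp`, `push64`, `top64`, `add64_bytewise`, and `add64_example` are hypothetical, and the operands are assumed to be stored most significant byte first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the parameter stack: a byte array plus an index
   that plays the role of DSP. Operands are stored big-endian. */
static uint8_t pstack[64];
static unsigned dsp = sizeof pstack;        /* the stack grows downward */

/* Push a 64-bit value, most significant byte at the lowest address. */
void push64(uint64_t v)
{
    assert(dsp >= 8);                       /* room for one more entry */
    dsp -= 8;
    for (int i = 7; i >= 0; i--) {
        pstack[dsp + i] = (uint8_t)v;
        v >>= 8;
    }
}

/* Read the 64-bit value on top of the stack without popping it. */
uint64_t top64(void)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | pstack[dsp + i];
    return v;
}

/* Add the top two entries one byte at a time, feeding each byte's
   carry into the next more significant byte. */
void add64_bytewise(void)
{
    unsigned carry = 0;                     /* start with carry clear */
    for (int i = 7; i >= 0; i--) {          /* least significant byte first */
        unsigned sum = pstack[dsp + 8 + i] + pstack[dsp + i] + carry;
        pstack[dsp + 8 + i] = (uint8_t)sum; /* result replaces left side */
        carry = sum >> 8;                   /* carry out of this byte */
    }
    dsp += 8;                               /* deallocate right side */
}

/* Example: the carry ripples up through four bytes. */
uint64_t add64_example(void)
{
    push64(0x00000000FFFFFFFFull);          /* left side */
    push64(0x0000000000000001ull);          /* right side */
    add64_bytewise();
    return top64();                         /* 0x0000000100000000 */
}
```

Because the result overwrites the left operand and the right operand is dropped, the caller never copies anything: the sum is simply the new top of the stack.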
* Loop mode 64-bit add for 6800
* Parameters pointed to by software stack pointer
* in (probably direct page) variable DSP.
*
* Enter:
* left side parameter at DSP[8]
* right side at DSP[0]
*
* Exit:
* result at DSP[0]
*
add64bit
LDX DSP
LDAA #8
CLC
addloop
LDAB 15,X
ADCB 7,X
STAB 15,X
DEX
DECA
BNE addloop
LDAB DSP+1 ; deallocate right
LDAA DSP
ADDB #8
ADCA #0
STAB DSP+1
STAA DSP
RTS
You can flatten, or unroll, the loop, but the eight load/add/store groups alone take 48 bytes:
* Unrolled 64-bit add for 6800
* Parameters pointed to by software stack pointer
* in (probably direct page) variable DSP.
*
* Enter:
* left side parameter at DSP[8]
* right side at DSP[0]
*
* Exit:
* result at DSP[0]
*
add64bitflat
LDX DSP
LDAB 15,X
ADDB 7,X ; First ADD makes valid carry.
STAB 15,X
LDAB 14,X ; Leave A untouched.
ADCB 6,X ; Use carry from here out.
STAB 14,X
LDAB 13,X
ADCB 5,X
STAB 13,X
LDAB 12,X
ADCB 4,X
STAB 12,X
LDAB 11,X
ADCB 3,X
STAB 11,X
LDAB 10,X
ADCB 2,X
STAB 10,X
LDAB 9,X
ADCB 1,X
STAB 9,X
LDAB 8,X
ADCB 0,X
STAB 8,X
LDAB DSP+1 ; deallocate right
ADDB #8 ; Leave A untouched here, too.
STAB DSP+1
LDAB DSP
ADCB #0
STAB DSP
RTS
The 6801 gives us the D double accumulator to work with, but using it requires some analysis. We'll start by simplifying the deallocation code with the ABX instruction:
* Loop mode 64-bit add for 6801
* Parameters pointed to by software stack pointer
* in (probably direct page) variable DSP.
*
* Enter:
* left side parameter at DSP[8]
* right side at DSP[0]
*
* Exit:
* result at DSP[0]
*
add64bit
LDX DSP
LDAA #8
CLC
addloop
LDAB 15,X
ADCB 7,X
STAB 15,X
DEX
DECA
BNE addloop
LDAB #16 ; deallocate right
ABX
STX DSP
RTS
Now we'll show a version that makes some use of the D accumulator:
* Partially unrolled loop mode 64-bit add for 6801
* Parameters pointed to by software stack pointer
* in (probably direct page) variable DSP.
* Uses counter variable (also probably in direct page).
*
* Enter:
* left side parameter at DSP[8]
* right side at DSP[0]
*
* Exit:
* result at DSP[0]
*
add64bithalfunrolled
LDX DSP
LDAA #3
STAA COUNTER
LDD 14,X
ADDD 6,X
STD 14,X
addloophalfunrolled
LDD 12,X
ADCB 5,X
ADCA 4,X
STD 12,X
DEX
DEX
DEC COUNTER
BNE addloophalfunrolled
LDAB #14 ; adjust & deallocate right
ABX
STX DSP
RTS
Comparing the above with a fully unrolled add, half unrolled doesn't really buy as much as it seems:
* Fully unrolled 64-bit add for 6801
* Parameters pointed to by software stack pointer
* in (probably direct page) variable DSP.
*
* Enter:
* left side parameter at DSP[8]
* right side at DSP[0]
*
* Exit:
* result at DSP[0]
*
add64bitunrolled
LDX DSP
LDD 14,X
ADDD 6,X
STD 14,X
LDD 12,X
ADCB 5,X
ADCA 4,X
STD 12,X
LDD 10,X
ADCB 3,X
ADCA 2,X
STD 10,X
LDD 8,X
ADCB 1,X
ADCA 0,X
STD 8,X
LDAB #8 ; deallocate right (X was never moved, so no adjustment)
ABX
STX DSP
RTS
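As a rough illustration of why the D accumulator only helps with the first word, here is a C sketch of the same sequence. It assumes, as the listings suggest, that the 6801 has a 16-bit add (ADDD) but no 16-bit add-with-carry, so every word after the first is done as two byte adds with carry. The function name `add64_words` and the array-based interface are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Add two big-endian 8-byte operands: one 16-bit add for the low word,
   then byte pairs with carry. left[] receives the result; the return
   value is the carry out of the most significant byte. */
unsigned add64_words(uint8_t left[8], const uint8_t right[8])
{
    /* Low word: a full 16-bit add, as with LDD / ADDD / STD. */
    uint32_t l = (uint32_t)left[6] << 8 | left[7];
    uint32_t r = (uint32_t)right[6] << 8 | right[7];
    uint32_t sum = l + r;
    left[6] = (uint8_t)(sum >> 8);
    left[7] = (uint8_t)sum;
    unsigned carry = (unsigned)(sum >> 16);

    /* Remaining words: low byte first, then high byte, each taking
       the carry from the byte before it (ADCB, then ADCA). */
    for (int w = 4; w >= 0; w -= 2) {
        unsigned lo = left[w + 1] + right[w + 1] + carry;
        unsigned hi = left[w] + right[w] + (lo >> 8);
        left[w + 1] = (uint8_t)lo;
        left[w] = (uint8_t)hi;
        carry = hi >> 8;
    }
    return carry;
}
```

The loads and stores still move 16 bits at a time, which is where the savings over the byte-wise version come from.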
The 6809 version shows some interesting advantages in the setup and allocation, and gives an opportunity for creativity with the accumulator indexing:
* Loop mode 64-bit add for 6809
* Parameters pointed to by stack pointer U.
*
* Enter:
* left side parameter at 8,U
* right side at ,U
*
* Exit:
* result at ,U
*
add64bit
LEAX 8,U ; Point to left operand.
LDA #7 ; count/offset
ANDCC #$FE ; clear carry
addloop
LDB A,X
ADCB A,U
STB A,X
DECA
BPL addloop ; DECA leaves carry alone, so test the sign instead
LEAU ,X ; deallocate right
RTS
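The 6809 listing uses the A accumulator as both loop counter and index offset. A rough C equivalent, with the hypothetical name `add64_offset_loop`, uses a signed counter that runs from 7 down to 0 and stops once the decrement takes it below zero. The `carry` variable stands in for the C flag, which the decrement does not touch.

```c
#include <assert.h>
#include <stdint.h>

/* One signed counter serves as both loop count and byte offset.
   left[] receives the result; the return value is the final carry. */
unsigned add64_offset_loop(uint8_t left[8], const uint8_t right[8])
{
    unsigned carry = 0;                    /* carry starts clear */
    for (int8_t a = 7; a >= 0; a--) {      /* ends when a goes negative */
        unsigned sum = left[a] + right[a] + carry;
        left[a] = (uint8_t)sum;
        carry = sum >> 8;
    }
    return carry;
}
```

Counting down to zero inclusive is what lets the counter double as the offset: the last pass handles offset 0, the most significant byte.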
Half-unrolled on the 6809 doesn't really make a lot of sense either:
* Half unrolled loop mode 64-bit add for 6809
* Parameters pointed to by stack pointer U.
*
* Enter:
* left side parameter at 8,U
* right side at ,U
*
* Exit:
* result at ,U
*
add64bithalfunrolled
LDA #3
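STA ,-U ; loop counter just below the right side
LEAX ,U ; X points to the counter
LDD 15,U ; offsets are one greater because of the counter
ADDD 7,U
STD 15,U
addloophalfunrolled
LDD 13,U
ADCB 6,U
ADCA 5,U
STD 13,U
LEAU -2,U ; artifact false allocation
DEC ,X ; slow variable on 6809!
BNE addloophalfunrolled
LEAU 15,U ; adjust stack pointer, drop counter, and deallocate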
RTS
Compare that with the fully unrolled (which the 6801 gets fairly close to, as well):
* Fully unrolled 64-bit add for 6809
* Parameters pointed to by stack pointer U.
*
* Enter:
* left side parameter at 8,U
* right side at ,U
*
* Exit:
* result at ,U
*
add64bitunrolled
LDD 14,U
ADDD 6,U
STD 14,U
LDD 12,U
ADCB 5,U
ADCA 4,U
STD 12,U
LDD 10,U
ADCB 3,U
ADCA 2,U
STD 10,U
LDD 8,U
ADCB 1,U
ADCA ,U
STD 8,U
LEAU 8,U ; adjust stack pointer and deallocate
RTS
And now, for the 68000, the byte-by-byte looped version (untested):
* Loop mode 64-bit byte-by-byte add for 68000
* Parameters pointed to by stack pointer A6.
*
* Enter:
* left side parameter at 8(A6)
* right side at (A6)
*
* Exit:
* result at (A6)
*
add64bit
MOVE.L #7,D7 ; Because I like D7
LEA 8(A6),A0
LEA 16(A6),A1
ANDI #$EF,CCR ; clear X (extend), which ADDX uses as its carry in
addloop
ADDX.B -(A0),-(A1) ; byte size must be explicit; the default is word
DBF D7,addloop
LEA 8(A6),A6 ; deallocate right
RTS
We are not going to even look at half-unrolled on the 68000, although it's kind of interesting. Fully unrolled:
add64bitunrolled
MOVEM.L (A6)+,D4/D5/D6/D7 ; pop up front
ADD.L D5,D7
ADDX.L D4,D6
MOVEM.L D6/D7,-(A6) ; push back on
RTS
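With 32-bit registers the carry chain shrinks to a single link. A minimal C sketch of the two-step add, assuming each operand is held as a high and a low 32-bit half (the `pair32` struct and `add64_halves` function are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value as two 32-bit halves, the way the listing holds
   each operand in a pair of data registers. */
struct pair32 { uint32_t hi, lo; };

struct pair32 add64_halves(struct pair32 left, struct pair32 right)
{
    struct pair32 r;
    r.lo = left.lo + right.lo;             /* low halves, wraps modulo 2^32 */
    uint32_t carry = r.lo < right.lo;      /* a wrapped sum means carry out */
    r.hi = left.hi + right.hi + carry;     /* high halves plus the carry */
    return r;
}
```

Adding 1 to 0x7FFFFFFF:FFFFFFFF this way gives 0x80000000:00000000, which is the signed wraparound from the largest 64-bit value to the smallest.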
And there you have it. Several different ways to add integers between 9,223,372,036,854,775,807 and -9,223,372,036,854,775,808 on microprocessors.