Monday, January 13, 2025

ALPP 03-15 -- Converting Numbers for Output and Input with Multiplication and Division (Theory)

Converting Numbers for Output and Input
with Multiplication and Division

(Title Page/Index)

 

Now that we've debugged getting an input key from the ST's keyboard and outputting its ASCII code value in hexadecimal and binary on the 68000, a natural next step would be to learn how to parse numbers from the input. 

But that will require multiplying and dividing by ten, because we usually interact with numbers in decimal base -- radix base ten. 

(Yeah, I'm not all that comfortable trying to remember the digits of π in hexadecimal or binary, either. And I'm not going to go out of my way to memorize those, particularly when I know how to get a computer to calculate them any time I need them, as in bc, using obase to set binary and hexadecimal output radix base:

$ bc -l
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'. 
obase=2
4*a(1)
11.00100100001111110110101010001000100001011010001100001000110100101\
01
obase=16
4*a(1)
3.243F6A8885A308D2A

arctangent(1) is, of course, π/4. Yeah, if you're looking at the final digits above, the last byte that bc calculates in the scale you specify will be somewhat incorrect. The scale above is the default of 20 decimal digits when starting bc with the  -l option.)

When you're working in binary, getting numbers in and out in decimal requires converting between binary and decimal. 

We call it decimal ("dec" is ten) because it is radix base ten, and we have become used to writing our numbers in (radix) columns using that base. Something to do, as we suppose, with the number of fingers we have.

Converting between binary (radix base two) and decimal (radix base ten) requires multiplying and dividing by two and ten. 

Output, Working Left-to-right

One approach to display each digit of a numeric value in decimal is to proceed from left (most significant in our traditional column order) to right (least significant).

We start by finding the largest power of ten smaller than the number and divide the number by that value. The quotient will be the first digit on the left. 

Then we repeat with the remainder, until there is no more remainder. 

The advantage of this approach is that we can start writing digits down where we are.

The disadvantage is that we have to find the largest power less than the number before we can start.

One way to make it easier to find the largest power of ten smaller is to have a pre-calculated array of powers of ten to compare the number to.

Output, Working Right-to-left

Another approach is to guess or calculate how much space we need and start from the right (least significant column) and proceed to the left (towards the most significant). Divide the number by ten itself. The remainder is the right-most digit. 

Then we repeat with the quotient until the quotient goes to zero.

The advantage of this approach is that we will always be dividing by ten. There is no need to find the largest power of ten smaller to start with.

The disadvantage is that we have to guess how much space to leave -- or calculate it. 

But we can avoid either guessing or calculating the amount of space by doing our initial work in a temporary buffer somewhere, then copying the buffer to output.

Efficiency

Which is more efficient depends on a lot of things, but, in many cases, the code for the former can be organized so that it is as if only one actual complete division is performed, the iteration for each column producing one digit. By comparison, the latter approach requires a division for each column, and the division is by a small number, which is the sort of division that takes the most processor cycles.

But if we are just trying to get output going, we may find it easier to allocate the conversion buffer as a process global variable and use the latter method.

Input, Working Left-to-Right

To input a decimal value from the keyboard, we can get each digit in order from left to right, multiplying the accumulated value by ten before adding the digit we got, repeating until there are no more digits entered (or perhaps until the accumulated value overflows).

Input, Working Right-to left

Or we can read all the digits first into a global conversion buffer, count the number of digits, and multiply each digit by the appropriate power of ten as we go. And that also requires multiplication.

Multiplication, either way.

Efficiency

Again, efficiency appears to be more on the side of the left-to-right approach. But, again, we my find it easier (more efficient use of programmers' time) to declare the buffer, copy input into the buffer until we get a non-numeric input,  and parse/convert the number in the buffer instead of directly from the keyboard.

But thinking about efficiency too early in the planning stages is a mistake, unless you are actually not thinking about efficiency so much as trying to understand the problem.

Approaching Implementation

I don't know about you, but I find multiplication to be easier than division. 

Why? 

Memorizing the multiplication tables is fairly easy, and once we have the table memorized, you can look at each pair of digits from the multiplier and multiplicand and directly produce a digit in the product, with possible carry. 

It's a straight-forward input-driven process.

Trying to memorize division tables means memorizing lots of possible products and the factors used to produce them, and there are so many possibilities we don't usually get motivated to do that. (There are certain patterns that we can memorize that help, though.) And then we use what we remember to look at the quotient and guess which product of which pair of factors is applicable. 

Even when we do that for each digit in the dividend, we often have to guess and then we have to check our guesses and, if the our guess is correct, only then can we reduce the dividend, and count and record each digit of the quotient.

Essentially, we look at the divisor and the dividend and go searching for the quotient.

Aaaaaaannnddd ---

Checking whether we have found the correct digit of the quotient at each step requires multiplication. 

Erk. Does it feel like we're being corralled into understanding multiplication?

Let's look at multiplication.


(Title Page/Index)


Monday, January 6, 2025

ALPP 03-XX (17) -- Demonstrating Left Shift -- 6800

I'm putting this here for reference. Eventually, I plan to do a chapter on shifts, and most of this will be demonstrated there. I've only tested part of the code.

Demonstrating Left Shift --
6800

(Title Page/Index)

 

I've shown you some theoretical background on bit shifting left and multiplying by powers of 2, and I want to move ahead because we can't print out the results in decimal yet. 

But I talked it over with God. Oh, some people will understand, some people won't. I could call it a hunch -- a strong hunch, strong enough to keep me from proceeding -- if you prefer. 

And the result? There's a lot of code in these. Read through the 6809 and 68000 versions, scan the other two, and test one or more if you are inclined. Come back for reference if things get murky when we talk about synthesizing the multiplication and division routines.

I didn't want to consume four posts for this, to show the rigging and the test code for each processor, but it's going to be four posts even without the rigging framework. 

But you've already seen everything in the rigging framework anyway, in the single character input chapters. I'm going to let you move the new stuff into the rigging framework yourself this time.

Starting with the 6800 code for character input, open the file rt_rig03_6800.asm up in a text editor and save it as rt_rig04_6800.asm. Keep it open and open inkey_6800.asm (or whatever you saved it as) up and save it as shftst_6800.asm (or whatever). Change the inclusion (EXP) line to include rt_rig04_6800.asm instead of rt_rig03_6800.asm.

Now cut INCHAR and INCHNE out of shftst_6800.asm , from the comments on AECHO to the hook to INCHV, and move them into someplace appropriate in rt_rig04_6800.asm. You might want to move them in two or three separate pieces, or you might want to move those lines altogether at once to the same place. Your choice.

Also grab PDUP for the 6800 and 6801.

Save the two files and make sure they still assemble and run as in chapter 3-10.

Now, if you want to do this part yourself, cut the test code out of multest_6800.asm and replace it with appropriate 6800 code from the last chapter. Yes, that does mean you'll need to convert the 6809 code to 6800 code. It's not hard, just tedious, and instructive.

If you don't want to do the conversion yourself, or if you want to see how I'd do it, the following is some demonstration code I produced. A little less than two thirds of the way down, I realized I was heading the wrong direction and quit testing it. Some of the remaining code is known not to do what I intended, and the rest of the remaining code is not tested, but I'm leaving it here for reference.

 For the 6800, assuming PSP in X and the bytes to test on the parameter stack:
* test shifts and multiplies for 6800 (EXORsim)
* using parameter stack,
* with test frame
* Copyright Joel Matthew Rees, December 2024
*
	EXP	rt_rig04_6800.asm
****************
* Program code:
*
*
INX8	INX
INX7	INX
INX6	INX
	INX
INX4	INX
	INX
	INX
	INX
	RTS
*
DEX8	DEX
	DEX
DEX6	DEX
	DEX
DEX4	DEX
	DEX
	DEX
	DEX
	RTS
*
* Unrolled 64-bit integer shift left 1 bit:
LSL64	LSL	7,X	; least significant byte (byte 7)
	ROL	6,X	; next less significant byte (byte 6)
	ROL	5,X	; next more significant byte (byte 5)
	ROL	4,X	; next more significant byte (byte 4)
	ROL	3,X	; next more significant byte (byte 3)
	ROL	2,X	; next more significant byte (byte 2)
	ROL	1,X	; next more significant byte (byte 1)
	ROL	0,X	; most significant byte on (byte 0)
	RTS
*
* 64-bit integer shift left 1 bit in a loop:
LSL64LP	LDX	PSP	; but do not update!
	JSR	INX7	; point to last byte
	LDAA	#7	; bytes to ROL
	LSL	0,X	; least significant byte starts with LSL
SHL64L	DEX		; carry not affected
	ROL	0,X	; next more significant byte
	DECA		; count down, carry not affected
	BNE	SHL64L	; do next
	RTS
* Ends with X pointing to most significant byte
*
PGSTRT	LDAB	#$5A	; a common bit pattern to watch move
	ROLB		; $B4	9-bit rotate left
	ROLB		; $68
	ROLB		; $D1
	ROLB		; $A2
	ROLB		; $45
	ROLB		; $8B
	ROLB		; $16
	ROLB		; $2D
	ROLB		; $5A
	NOP		; Pause for a look.
*
	ROLB
	ADCB	#0	; $B4	8-bit rotate left
	ROLB
	ADCB	#0	; $69
	ROLB
	ADCB	#0	; $D2
	ROLB
	ADCB	#0	; $A5
	TBA		; is another common bit pattern to watch move
	ROLB
	ADCB	#0	; $4B
	ROLB
	ADCB	#0	; $96
	ROLB
	ADCB	#0	; $2D
	ROLB
	ADCB	#0	; $5A
	NOP
*
	LSLB		; 1st: $A55A	16-bit shift left
	ROLA		; $4AB4
	LSLB		; 2nd
	ROLA		; $9568
	LSLB		; 3rd 
	ROLA		; $2AD0
	LSLB		; 4th 
	ROLA		; $55A0
	NOP
	LSLB		; 5th 
	ROLA		; $AB40
	LSLB		; 6th 
	ROLA		; $5680
	LSLB		; 7th 
	ROLA		; $AD00
	LSLB		; 8th 
	ROLA		; $5A00
	NOP
*
	LDX	PSP	; 32-bit shift left mixed stack/register
	DEX		; allocate two bytes
	DEX
	STX	PSP
	LDAA	#$87
	LDAB	#$65
	STAB		1,X
	STAA		0,X	; $8765 on stack
	LDAA	#$43
	LDAB	#$21	; $4321 in D
	LSLB		; least significant byte
	ROLA		; next less significant byte
	ROL	1,X	; next more significant byte on stack
	ROL	0,X	; most significant byte on stack
	LSLB		; 2nd time
	ROLA
	ROL	1,X
	ROL	0,X
	LSLB		; 3rd time
	ROLA
	ROL	1,X
	ROL	0,X
	LSLB		; 4th time
	ROLA
	ROL	1,X
	ROL	0,X	; result -- 7654:3210
	NOP
*
	LDAB	#$10	; set up test data
	STAB	1,X	; X still has PSP
	ADDB	#$22
	LDAA	#7
T64LP	STX	PSP	; allocate before store
	STAB	0,X
	ADDB	#$22
	DEX
	DECA
	BNE	T64LP
	NOP
* Check the contents of the parameter stack when done.
	LDX	PSP	; sync X and PSP
	JSR	LSL64	; unrolled loop
	JSR	LSL64LP	; loop
	JSR	LSL64
	JSR	LSL64LP
	NOP
* Should be shifted left one hexadecimal digit.
	JSR	LSL64
	JSR	LSL64LP
	JSR	LSL64
	JSR	LSL64LP
	NOP
* Should be shifted left another hexadecimal digit,
* which is a full byte!
* But that's a hard way to shift left 8 bits.
* Let's try an easier, quicker way:
	LDAA	#7
LS8BITL	LDAB	1,X
	STAB	0,X
	INX
	DECA
	BNE	LS8BITL
* Wasn't that fast?
	NOP
	LDX	PSP
	JSR	INX8	; drop them all
	STX	PSP	
	NOP
* Multiply 8 bits by 4
	LDAB	#65	; $41
	LSLB	; multiply by 2
	LSLB	; ignore carry and multiply by 2
	NOP
* Multiply 16 bits by 4
	LDAB	#65
	CLRA
	LSLB	; multiply by 2
	ROLA	; catch the carry
	LSLB	; and again
	ROLA	; catch the carry again
	NOP
* Multiply 16 bits by 16
	LDAB	#65
	CLRA
	LSLB	; multiply by 2
	ROLA	; catch the carry
	LSLB	; and again
	ROLA	; catch the carry again
	LSLB	; and again
	ROLA	; catch the carry again
	LSLB	; and again
	ROLA	; catch the carry again
	NOP
* Multiply 16 bits by 16, using loop
	LDAB	#65
	CLRA
	JSR	PPSHD	; PSP in X on return
	LDAB	#4
	JSR	PPSHD
	LDAA	2,X	; operand on parameter stack
	LDAB	3,X
MUL16WL	LSLB	; multiply by 2
	ROLA	; catch the carry
	DEC	1,X
	BNE	MUL16WL
	NOP
	INX		; drop count
	INX
	STX	PSP
	STAA	0,X	; save result
	STAB	1,X
	NOP		; re-use 2 bytes of allocation
* Other powers of 2: 2^7 == 128
	LDAB	#83	; $53 X 128
	CLRA		; for the high bits
	LSLB		; 1st
	ROLA
	LSLB		; 2nd
	ROLA
	LSLB		; 3rd
	ROLA
	LSLB		; 4th
	ROLA
	LSLB		; 5th
	ROLA
	LSLB		; 6th
	ROLA
	LSLB		; 7th
	ROLA
	STAA	0,X	; $2980 == 10624
	STAB	1,X
	NOP
* compare going the other direction,
* ends with high byte in B, low byte in A:
	LDAB	#83	; $53 X 128
	CLRA	; for result
	LSRB	; bit 0 to carry, B becomes high byte
	RORA	; bit 0 of B now in bit 7 of A
	NOP
* Saturation math:
	LDAB	#83	; $53 X 128
	LSRB		; bit 0 to carry
	RORB		; now to bit 7
	ANDB	#$80	; chop off the lost, double-shifted high bits
	NOP
* Extend the saturation math with recovery (de-optimization):
	LDAB	#83	; $53 X 128 (9 bits => 7 to left is 2 to right)
	LSRB		; bit 0 to carry
	RORB		; now to bit 7
	TBA
	ROLA		; bring the high bits back into position
	ANDB	#$80	; chop off the high bits
	ANDA	#$7F	; chop off the low bits
	NOP
* and again, more efficiently, but not most efficiently
	LDAB	#83	; $53 X 128 (8 bits => 7 to left is 1 to right)
	TBA		; make two halves
	LSRA		; bit 0 to carry, high bits
	RORB		; now to bit 7 (bit 0 to carry)
	ANDB	#$80	; chop off the high bits
	NOP
* 2^6 == 64
	LDAB	#83	; $53 X 64
	CLRA		; for the high bits
	LSLB		; 1st
	ROLA
	LSLB		; 2nd
	ROLA
	LSLB		; 3rd
	ROLA
	LSLB		; 4th
	ROLA
	LSLB		; 5th
	ROLA
	LSLB		; 6th
	ROLA
	STAA	0,X
	STAB	1,X
	NOP
* compare going the other direction,
* ends with high byte in B, low byte in A:
	LDAB	#83	; $53 X 64
	CLRA	; for result
	LSRB	; bit 0 to carry
	RORA	; old bit 0 of B now in bit 7 of A
	LSRB	; old bit 1 of B to carry
	RORA	; old bit 1,0 of B now in bit 7,6 of A
	NOP
* Saturation math:
	LDAB	#83	; $53 X 64
    	LSRB		; bit 0 to carry
    	RORB		; now to bit 7, old bit 1 to carry
    	RORB		; now to bit 7,6 in order
    	ANDB	#$C0	; chop off the remainder
	NOP
* Extend the saturation math with recovery (de-optimization):
	LDAB	#83	; $53 X 64 (9 bits => 6 to left is 3 to right)
    	LSRB		; bit 0 to carry
    	RORB		; now to bit 7, old bit 1 to carry
    	RORB		; now to bit 7,6 in order
	TBA
	ROLA		; recover high bits including last carry
    	ANDB	#$C0	; chop off the high bits
	ANDA	#$3F	; chop off the low bits
	NOP
* and again, copying first
	LDAB	#83	; $53 X 64 (8 bits => 6 to left is 2 to right)
	TBA		; make two halves
	LSRA		; bit 0 to carry, high bits
	RORB		; now to bit 7 (bit 0 to carry)
	LSRA		; bring the high bits into place (bit 1 to C)
    	RORB		; now to bit 7,6 in order
	ANDB	#$C0	; chop off the high bits
	NOP
* 2^5 == 32
	LDAB	#83	; $53 X 32
	CLRA		; for the high bits
	LSLB		; 1st
	ROLA
	LSLB		; 2nd
	ROLA
	LSLB		; 3rd
	ROLA
	LSLB		; 4th
	ROLA
	LSLB		; 5th
	ROLA
	STAA	0,X
	STAB	1,X
	NOP
* compare going the other direction,
* ends with high byte in B, low byte in A:
	LDAB	#83	; $53 X 32
	CLRA		; for the high bits
	LSRB	; bit 0 to carry
	RORA	; old bit 0 of B now in bit 7 of A
	LSRB	; old bit 1 of B to carry
	RORA	; old bit 1,0 of B now in bit 7,6 of A
	LSRB	; old bit 2 of B to carry
	RORA	; old bit 2,1,0 of B now in bit 7,6,5 of A
	NOP
* Saturation math:
	LDAB	#83	; $53 X 32
    	LSRB		; bit 0 to carry
    	RORB		; now to bit 7, old bit 1 to carry
    	RORB		; now to bit 7,6 in order, old bit 2 to carry
    	RORB		; now to bit 7,6,5 in order
    	ANDB	#$E0	; chop off the high bits
	NOP
* Extend the saturation math with recovery:
	LDAB	#83	; $53 X 32 (9 bits => 5 to left is 4 to right)
    	LSRB		; bit 0 to carry
    	RORB		; now to bit 7, old bit 1 to carry
    	RORB		; now to bit 7,6 in order, old bit 2 to carry
    	RORB		; now to bit 7,6,5 in order
	TBA
	ROLA		; recover last carry into high bits
    	ANDB	#$E0	; chop off the remainder
	ANDA	#$1F	; chop off the low bits
	NOP
* and again, copying first
	LDAB	#83	; $53 X 32 (8 bits => 5 to left is 3 to right)
	TBA		; make two halves
	LSRA		; bit 0 to carry, high bits
	RORB		; now to bit 7 (bit 0 to carry)
	LSRA		; bring the high bits into place (bit 1 to C)
    	RORB		; now to bit 7,6 in order
	LSRA		; once more (old bit 2 to C)
    	RORB		; now to bit 7,6,5 in order
    	ANDB	#$E0	; chop off the high bits
	NOP
* balance the stack
	INX
	INX
	STX	PSP	; clear stack
	NOP
*
* From here down either hasn't been tested
* or doesn't function as I intended.
*
* shift left by 13 by right rotation => multiply by 8192, lose high bits
	LDAA	#$41	; $4153 == 16723
	LDAB	#$53
    	LSRB		; bit 0 to carry
    	RORA		; now to bit 15, bit 8 to carry
    	RORB		; bit 8 to bit 7, old bit 1 to carry
    	RORA		; old bit 1 to bit 15 in order
    	RORB		; old bit 9 to bit 7, old bit 2 to carry
    	RORA		; old bit 2 to bit 15 in order
    	RORB		; old bit 10 to bit 7 (old bit 3 to carry)
    	ANDA	#$E0	; chop off the top bytes, ignore carry
	CLRB
	NOP
* shift left by 13 => multiply by 8192, capture all bits:
	LDAA	#$41	; $4153 == 16723
	LDAB	#$53
	JSR	DEX4	; pre-allocate
	STX	PSP
    	LSRB		; bit 0 to carry
    	RORA		; now to bit 15, bit 8 to carry
    	RORB		; bit 8 to bit 7, old bit 1 to carry
    	RORA		; old bit 1 to bit 15 in order
    	RORB		; old bit 9 to bit 7, old bit 2 to carry
    	RORA		; old bit 2 to bit 15 in order
    	RORB		; old bit 10 to bit 7 (old bit 3 to carry)
	CLR	3,X	; least significant, ignore carry
	CLR	2,X	; save a place for lower middle byte
	STAB	1,X	; save next more significant byte
	TAB		; copy to split out high and low
    	ANDB	#$E0	; chop off the high bits
	STAB	2,X	; save the lower middle byte
	ANDA	#$1F	; chop off low bits
	STAA	0,X	; save high bits
	NOP
* Check the results before continuing.
	NOP
* shift left directly by 13 => multiply by 8192, capture all bits:
	DEX		; 2 placeholders, for middle upper byte	
	DEX		; and high byte
	STX	PSP
	LDAA	#$41	; $4153 == 16723
	LDAB	#$53
	LSLB
	ROLA
	ROL	1,X	; catch 1
	LSLB
	ROLA
	ROL	1,X	; catch 2
	LSLB
	ROLA
	ROL	1,X	; catch 3
	LSLB
	ROLA
	ROL	1,X	; catch 4
	LSLB
	ROLA
	ROL	1,X	; catch 5
	LSLB
	ROLA
	ROL	1,X	; catch 6
	LSLB
	ROLA
	ROL	1,X	; catch 7
	LSLB
	ROLA
	ROL	1,X	; catch 8
	DEX		; for high byte final resting place
	CLR	0,X	; not completely filled
	LSLB
	ROLA
	ROL	1,X	; catch 9
	ROL	0,X	; catch 1
	LSLB
	ROLA
	ROL	1,X	; catch 10
	ROL	0,X	; catch 2
	LSLB
	ROLA
	ROL	1,X	; catch 11
	ROL	0,X	; catch 3
	LSLB
	ROLA
	ROL	1,X	; catch 12
	ROL	0,X	; catch 4
	LSLB
	ROLA
	ROL	1,X	; catch 13
	ROL	0,X	; catch 5
	NOP
* stop to check
	NOP
	JSR	INX6	; drop all the above
	STX	PSP
	NOP
* 8-bit-wide rotation
* accumulator-wide ROL by 3 / ROR by 5 using the stack:
	LDX	PSP	; just to by sure, and remind ourselves
	DEX		; temp
	STX	PSP
	LDAA	#83	; $53
	STAA	0,X	; copy to stack
	LSL	0,X	; shift left by 3
	LSL	0,X
	LSL	0,X
	LSRA		; shift right by 5
	LSRA		; (more shifts, use the faster shift accumulator)
	LSRA
	LSRA
	LSRA
	ORAA	0,X	; put results together	
	NOP
* accumulator-wide ROL by 3 / ROR by 5 using ABA:
	LDAB	#83	; $53
	TBA	; copy
	LSLA	; shift left by 3
	LSLA
	LSLA
	LSRB	; shift right by 5
	LSRB
	LSRB
	LSRB
	LSRB
	ABA	; put the results together
	NOP
* accumulator-wide ROL by 3 / ROR by 5 using ADC #0 trick:
	LDAB	#83	; $53
	LSLB
	ADCB	#0
	LSLB
	ADCB	#0
	LSLB
	ADCB	#0
	NOP
* ugly accumulator-wide ROR by 5 / ROL by 3 using branch and set:
	LDAB	#83	; $53
	LSRB		; clears bit 7
	BCC	RR8BN1
	ORAB	#$80	; set it for the carry
RR8BN1	LSRB
	BCC	RR8BN2
	ORAB	#$80
RR8BN2	LSRB
	BCC	RR8BN3
	ORAB	#$80
RR8BN3	LSRB
	BCC	RR8BN4
	ORAB	#$80
RR8BN4	LSRB
	BCC	RR8BN5
	ORAB	#$80
RR8BN5	NOP		; next instruction
* Not as ugly accumulator-wide ROR by 5 / ROL by 3,
* but uses both accumulators to avoid branches:
	LDAB	#83	; $53
	TBA
	LSRA	; get lowest bit in carry first
	RORB
	LSRA	; get 2nd bit in carry first
	RORB
	LSRA	; get 3rd bit in carry first
	RORB
	LSRA	; get 4th bit in carry first
	RORB
	LSRA	; get 5th bit in carry first
	RORB
	NOP
* Compare result before dropping
	INX		; drop temp
	STX	PSP
	NOP
* 16-bit integer rotate left 3 / right 13  on 6800:
	LDAA	#$41	; $4153 == 16723
	LDAB	#$53
	LSLB		; clear bottom bit on shifting left
	ROLA
	ADCB	#0	; push the top carry in (16-bit rotation)	
	LSLB
	ROLA
	ADCB	#0	; push the top carry in (16-bit rotation)	
	LSLB
	ROLA
	ADCB	#0	; push the top carry in (16-bit rotation)
	NOP
* 16-bit integer rotate right 3 / left 13  on 6800:
	DEX		; temp to grab bit with
	STX	PSP
	STAB	0,X	; copy
	LSR	0,X	; get bottom bit
	RORA		; rotate it into top byte
	RORB		; 1 bit complete
	LSR	0,X	; get next bottom bit
	RORA
	RORB		; 2nd bit complete
	LSR	0,X	; get next bottom bit
	RORA
	RORB		; 3rd bit complete
	NOP		; Should be back to $4153
	INX
	STX	PSP
*
	RTS
*
	END	ENTRY
Now let's look at multiplying by some small constants that aren't powers of two.

 

(Title Page/Index)

 

 

 

 

Saturday, January 4, 2025

ALPP 03-XX (18) -- Multiplying by Small Constants (Shift Left and Add)

I'm putting this chapter here for reference. I discovered I was heading too deep, too early, but I want to keep this chapter handy. I haven't checked the code or the explanations carefully, be careful if you read or try to use what I have here.

Multiplying by Small Constants
(Shift Left and Add)

(Title Page/Index)

 

We've worked out multiplying by some constant powers of 2, hopefully enough to have some confidence in using bit shifts for multiplying.  

(Note that we are not using Trachtenberg methods.

So, I said that multiplying by small constants that are not powers of ten would still be easy. Considering I said that multiplying by powers of two would be easy, you may be doubting me.

Fair enough.

But let's look at multiplying by constant 3. 

We can look at it several different ways:

3X == X + X + X

Multiplying something by three is adding it to itself three times. That's what the algebra says, and when I say it in somewhat ordinary English, it makes sense. Sort of. But if we are very careful about how we say this, we have to say adding the number to zero three times. 

Adding it to itself three times could actually mean something else, something we just talked about in the last chapter, and are still talking about here.

But multiplying something by two -- doubling -- is adding it to zero twice, or adding it to itself (once). Anyway,

3X == 2X + X

Multiplying something by three is multiplying by two and then adding it again -- or left-shifting in binary once and then adding it to the product again.

For a byte on the 6809, catching carries:

	CLRA	; for carries
	LDB	,U	; get the byte
	LSLB		; 2X
	ROLA		; grab any high bit
	ADDB	,U	; 2X+X==3X
	ADCA	#0	; get any carry

Let's do that to a 16-bit integer, as a subroutine. And capture the carries.

MUL3	LDD	,U
	CLR	,-U	; for carries
	LSLB		; 2X
	ROLA
	ROL	,U	; get carries
	ADDD	1,U	; + X makes 3X
	BCC	MUL3NC
	INC	,U	; get carries
MUL3NC	STD	1,U
	RTS

If we weren't capturing carries and setting it up as a subroutine, it would be really short. 

Speaking of capturing carries, you will note that this subroutine accepts a 16-bit integer as input, but returns a 24-bit integer with the carries in the added most significant byte. 

Let's not tell anyone, but , rather than using shifts, we might have simply loaded and then added twice, also:

MUL3ADD	LDD	,U
	CLR	,U	; for carries
	LDD	1,U		; 1X
	ADDD	1,U	; 2X
	ROL	,U	; get first carry
	ADDD	1,U	; 3X
	BCC	MUL3NC
	INC	,U	; get 2nd carry
MUL3ANC	STD	1,U
	RTS

The first load is the same as adding it to zero, so that's really just multiplying by 3 the hard way, adding it up three times. 

Adding a number to itself is another way to shift it left by one bit, by the way.

How about another way that works on the 6809 and 6801? Both have the MUL A × B instruction that leaves the product in D:

MUL3MUL	LDD	,U
	PSHU	A	; save high byte
	LDA	#3
	MUL		; multiply low byte in B
	STD	1,U	; save 16-bit result
	LDB	,U	; get high byte back
	CLR	,U	; for result addition
	LDA	#3
	MUL		; multiply high byte in B
	ADD	,U
	STD	,U
	RTS

MUL takes 10 cycles on the 6801 and 11 on the 6809, so it would take a little longer than the shift method, but not much, and with a similar byte count.

But this routine using MUL could be generalized into an 8-bit by 16-bit routine that could be fairly quick for both small variables and small constants:

* 16 by 8 to 24-bit multiply
* 16-bit multiplicand in 2nd and 3rd bytes,
* 8-bit multiplier in 8 bits on top:
MUL16X8	LDB	2,U	; low byte
	LDA	,U	; multiplier
	MUL
	STB	2,U	; store result low byte
	PSHU	A	; save result middle byte temp
	LDD	1,U	; multiplier in A, high byte in B
	MUL
	ADDB	,U+	; add result middle byte, pop it
	ADCA	#0	; add the carry into the high byte
	STD	,U	; save middle and high bytes of result
	RTS
* 
MUL3CAL	LDB	#3
MUL3ROB	PSHU	B
	BRA	MUL16X8	; rob the RTS
MUL5CAL	LDB	#5
	BRA	MUL3ROB	; rob the PSHU
MUL10CAL	LDB	#10
	BRA	MUL3ROB	; rob the PSHU

Cool enough?

How about multiplying by 5 via shift-and-add, just to compare? Five is four plus one:

MUL5	LDD	,U
	CLR	,-U	; for carries
	LSLB		; 2X
	ROLA
	ROL	,U	; carries
	LSLB		; 2X again makes 4X
	ROLA
	ROL	,U	; carries
	ADDD	,U	; + X makes 5X
	BCC	MUL5NC
	INC	,U	; get carries
MUL5NC	STD	1,U
	RTS

Similar length to the MUL method, but just a little faster.

How about 10? 

Ten is five times 2.

Shifting left and then calling MUL5 would lose us a carry. But we can call MUL5 and then shift once more left. Or use the shift-and-add MUL5 code and hang another shift on the end:

* Multiply a 16-bit integer by 10,
* return as a 24 bit integer, to keep the carries.
* For the 6809
MUL10	LDD	,U
	CLR	,-U	; push a zero for overflow
	LSLB		; 2X
	ROLA
	ROL	,U	; catch the overflow
	LSLB		; 2X again makes 4X
	ROLA
	ROL	,U	; catch the overflow
	ADDD	1,U	; + X makes 5X
	BCC	MUL10NC
	INC	,U	; carry in the addition
MUL10NC		LSLB		; 2X again makes 10X
	ROLA
	ROL	,U	; catch the overflow
	STD	1,U
	RTS

But if we have the MUL16X8 routine, it's probably going to be about as fast to call that.

That's the 6809. How about actual code on the other processors?

The conversion is straightforward. I recommend trying it. 

For the 6801, you will be loading X from PSP and updating PSP appropriately. Also, you will be able to replace LSLB ; ROLA pairs with LSLD.

For the 6800, in addition to loading and updating PSP, you will need to split the double accumulator operators into individually working on A and B.

As a rough guess, given how fast the MUL is on 6801 and 6809, it would be reasonable to just use MUL and ADD on both. Shift-and-add will be useful on the 6800, however.

For the 68000, don't try to be too literal. Give 32-bit results instead of 24.

Oh, okay, let's look at code for the 68000:

MUL3_16:	; guessing it'll be about 48 processor cycles
	CLR.W	-(A6)	; for carries
	MOVE.L	(A6),D7
	LSL.L	#1,D7
	ADD.L	(A6)
	MOVE.L	D7,(A6)	; return in 32 bits
	RTS

MUL3_16_ADD:
	CLR.W	-(A6)	; for carries
	MOVE.L	(A6),D7
	ADD.L	(A6)		; Or maybe do it in-register?
	ADD.L	(A6)
	MOVE.L	D7,(A6)
	RTS

* No 8-bit MUL in 68000
* MULU takes 38 + 2 per bit processor clocks.
MUL3_16_MUL:	; MULU by #3 takes about 46 processor cycles, about 12 memory clocks.
	MOVE.W	(A6)+,D7	; pop it
	MULU.W	#3,D7	; ignore source high, 32-bit result in D7
	MOVE.L	D7,-(A6)
	RTS

* 16 by 16 to 32 bit multiply
* 16-bit multiplicand in 3rd and 4th bytes,
* 16-bit multiplier in 16 bits on top:
MUL16X16:
	MOVE.W	2(A6),D7
	MULU.W	(A6),D7	; ignore source high, 32-bit result in D7
	MOVE.L	D7,(A6)
	RTS
* 
MUL3CAL_16:
	MOVE.W	#3,-(A6)
	BRA.S	MUL16X16	; rob the code
MUL5CAL_16:
	MOVE.W	#5,-(A6)
	BRA.S	MUL16X16	; rob the code
MUL10CAL_16:
	MOVE.W	#10,-(A6)
	BRA.S	MUL16X16	; rob the code

MUL5_16:
	CLR.W	-(A6)	; for carries
	MOVE.L	(A6),D7
	LSL.L	#2,D7	; 4X
	ADD.L	(A6)	; 4X+X
	MOVE.L	D7,(A6)	; return in 32 bits
	RTS

MUL10_16:
	CLR.W	-(A6)	; for carries
	MOVE.L	(A6),D7
	LSL.L	#2,D7	; 4X
	ADD.L	(A6)	; 4X+X
	LSL.L	#1,D7	; 2(4X+X)
	MOVE.L	D7,(A6)	; return in 32 bits
	RTS

And 16 bit integers ought to be big enough for anyone, right? I mean, who would ever do anything with an integer bigger than 30,000?

(Which is part of where the old "Who needs more than 64K of memory?" excuse for corner-cutting in the early-to-mid 1980s  came from.)

However, we do often want to do this with 32-bit operands, and may want to catch the carries, from that, even. So let's look at giving 32-bit results with carries in 40, I mean, 48 bits. (We shouldn't want to do math in odd numbers of bytes on the 68000.):

Draw your own conclusions about optimization.

 

Recap

 

And now we can do part of what is necessary for decimal input and output. That was quick. 

 

 

 

 

 

Dividing from the top.

 

 

multiply byte by ten, remainder from divide by 100?


***********

 


 


(Title Page/Index)