Friday, March 1, 2024

ALPP 01-01 -- Accumulating Integer Results in the M6809, 6800, 6801, and M68000

Accumulating Integer Results
on the M6809, 6800, 6801, and M68000

(Title Page/Index)

What is an accumulator?

When you're working through a problem of some kind, you generally need some place to keep a record of your current results. As an example, think of adding up a total:

8 + 5 + 2 + 7 + 4 = 26 (if I didn't mess up the math)

If you can do that in your head, you probably don't even think about it, but you probably do keep a running total:

8
+ 5 = 13
+ 2 = 15
+ 7 = 22
+ 4 = 26 (Okay, I didn't mess up the math)
If you do that on a piece of paper the way I just did it here, your "accumulator area" is the space to the right of the = sign next to the current number you're adding at each step. 

But if you do it in your head, unless you've trained yourself (or have a natural photographic memory or something), you only keep the running total (the accumulated value) and the number you are currently adding. That's your effective accumulator in your head. 

It's the natural current focus point of calculations.
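If it helps to see the idea as code, here's the same running total in a few lines of Python. (Python is just a stand-in here; the variable named total plays the role of the accumulator.)

```python
total = 0
for n in [8, 5, 2, 7, 4]:
    total = total + n   # only the running total and the current addend matter
print(total)            # prints 26
```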

In the 6800/6801 and the 6809, you have two small dedicated accumulators, usually called A and B. (In the 6801 and 6809, you can concatenate them into a single, medium-sized double accumulator D for some operations; more on that later.)
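The way A and B concatenate into D can be pictured in Python. The shifts and masks below are just one way to picture the relationship, not a claim about how the hardware does it internally:

```python
A, B = 0x12, 0x34           # two 8-bit accumulator values
D = (A << 8) | B            # D is A in the high byte, B in the low byte
assert D == 0x1234
A2, B2 = (D >> 8) & 0xFF, D & 0xFF   # splitting D back apart
assert (A2, B2) == (A, B)
```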

In the 68000, you have eight large data registers which can be used as accumulators, usually called D0 to D7. (Usually. Some assemblers use different conventions.)

As a totally contrived sequence of operations, let's do the above in 6800 assembly language:

* Adding a sequence of constant values:
	LDAA	#8	; Gotta start somewhere.
	ADDA	#5
	ADDA	#2
	ADDA	#7
	ADDA	#4
	NOP		; landing pad for breakpoint 

(6801 is the same, or can be.) 

  • LDAA is the mnemonic for the "LoaD Accumulator A" instruction (operation code).
  • The # hash mark indicates that the numeric value is immediate.
  • ADDA is for "ADD to accumulator A". 
  • NOP is a no-operation op code, in this case, as noted, for a landing pad where I can set a breakpoint.
  • And the notes after the semicolon are comments for human readers (mostly), and the computer (mostly) ignores them.

Now, this entire sequence is known in advance, and should usually be optimized to a single instruction:

	LDAA	#26	; optimized version

But if we optimize away the operations now, we can't see what they are doing. So we won't do that.  Not yet.
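Before tracing it, here's a rough Python model of what the sequence asks an 8-bit accumulator to do. Masking to 8 bits is my way of imitating the register width; it makes no difference for these small numbers, but it's what the hardware effectively does:

```python
a = 8 & 0xFF                # LDAA #8
for n in (5, 2, 7, 4):      # the four ADDA instructions
    a = (a + n) & 0xFF      # an 8-bit register wraps at 256
assert a == 26 == 0x1A
```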

If you are wondering where the result is supposed to show up, good. 

Let's break out a debugger and trace it through:

% t on
% b 100a
Breakpoint set at 100A
% c 1000

          0 A=00 B=00 X=0000 SP=00FF ------          1000: 86 08    LDA #08   EA=1001 D=08
          1 A=08 B=00 X=0000 SP=00FF ------          1002: 8B 05    ADDA #05  EA=1003 D=05
          2 A=0D B=00 X=0000 SP=00FF ------          1004: 8B 02    ADDA #02  EA=1005 D=02
          3 A=0F B=00 X=0000 SP=00FF ------          1006: 8B 07    ADDA #07  EA=1007 D=07
          4 A=16 B=00 X=0000 SP=00FF H-----          1008: 8B 04    ADDA #04  EA=1009 D=04

Breakpoint!
>         5 A=1A B=00 X=0000 SP=FF8A ------          100A: 01       NOP                      

Looks good, right? Nothing happening in B, X, and SP. The H half-carry flag gets set and then goes away, we see the PC counting through the code, and the machine code is there with the assembler mnemonics. 

(Above, EA is something we call the effective address of the operand, and D is the data operand referenced at the effective address, more on that later.)  

But, wait, ... since when does 8 + 5 = what? D? And we don't see 26 in there at the end, or anywhere?

What went wrong? Computers are never wrong! (cough.) 

Must be something we did. 

Let's try a different sequence of numbers. Here's a well-known one:

1 + 2 + 3 + 4 + 5 + 6 = 21

Yep. ( 7 × 6 ) ÷ 2 is 21. Code:

	LDAA	#0	; Clear A out.
	ADDA	#1	; Add 1 to A.
	ADDA	#2	; Add 2.
	ADDA	#3	; Etc.
	ADDA	#4
	ADDA	#5
	ADDA	#6
	NOP		; landing pad
That ought to work. Trace it in the debugger:
% b 100e
Breakpoint set at 100E
% t on
% c 1000
          0 A=00 B=00 X=0000 SP=FF8A ------          1000: 86 00    LDA #00   EA=1001 D=00
          1 A=00 B=00 X=0000 SP=FF8A ---Z--          1002: 8B 01    ADDA #01  EA=1003 D=01
          2 A=01 B=00 X=0000 SP=FF8A ------          1004: 8B 02    ADDA #02  EA=1005 D=02
          3 A=03 B=00 X=0000 SP=FF8A ------          1006: 8B 03    ADDA #03  EA=1007 D=03
          4 A=06 B=00 X=0000 SP=FF8A ------          1008: 8B 04    ADDA #04  EA=1009 D=04
          5 A=0A B=00 X=0000 SP=FF8A ------          100A: 8B 05    ADDA #05  EA=100B D=05
          6 A=0F B=00 X=0000 SP=FF8A ------          100C: 8B 06    ADDA #06  EA=100D D=06

Breakpoint!
>         7 A=15 B=00 X=0000 SP=FF8A H-----          100E: 01       NOP                      

Much better. Load in a 0 and the result in A is 0. Add 1 to get 1, add 2 to get 3, add 3 to get 6, add 4 to get A ...

Erk. Giving 8 + 5 = D a D grade is understandable. What grade do we give to 6 + 4 = A?

Heh. The computer is not (this time) wrong. 

So let's talk about

Hexadecimal!

Simple debuggers like the one I'm using here display many results in hexadecimal instead of decimal, to save time, space on the screen, writing code, and just to make things more meaningful. (You doubt that last one? Trust me on this.)

For the time being, and maybe a quick refresher, here's a quick hexadecimal-to-decimal conversion chart, up to 31ten:

decimal:      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
hexadecimal:  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

decimal:     16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
hexadecimal: 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
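If you want to double-check the chart, Python's built-in base conversions agree with it. This is just a sanity check, not part of the assembly examples:

```python
for decimal in range(32):
    hexadecimal = format(decimal, 'X')      # e.g. 26 -> '1A'
    assert int(hexadecimal, 16) == decimal  # and converting back agrees
assert format(13, 'X') == 'D'
assert format(31, 'X') == '1F'
```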

So, 8 + 5 does, in fact, equal Dsixteen (D base sixteen) in the usual (modern) way of writing hexadecimal.

And 6 + 4 does equal Asixteen (A base sixteen).

Yes, it's inconvenient when you've spent so much of your life getting used to decimal, but we'll try to take time to make sure we don't get lost in the hexadecimal. 

So, from the conversion chart, 22ten is 16sixteen. Maybe that's confusing, but let's look at why.

Remember that each column in base ten is a power of ten, increasing to the left. (That's most-significant column first.) 

22ten is 2 times ten plus 2 times one: 

twenty plus two, or 20ten + 2ten.

And each column in base sixteen is a power of sixteen, increasing to the left.

16sixteen is 1 times sixteen plus 6 times one: 

sixteen plus six, or 10sixteen + 6sixteen.

(To help us not get confused, we can read that as "one-zero base sixteen plus six base sixteen".)

So, working in hexadecimal (and checking in decimal),

8sixteen
+ 5sixteen = Dsixteen (13ten)
+ 2sixteen = Fsixteen (15ten)
+ 7sixteen = 16sixteen (16ten + 6ten = 22ten)
+ 4sixteen = 1Asixteen (16ten + 10ten = 26ten)

There's our 26!
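The same running total can be checked in Python, which happily accepts the 0x prefix for hexadecimal:

```python
total = 0x8
for n in (0x5, 0x2, 0x7, 0x4):
    total += n
assert total == 0x1A == 26   # 1A in hexadecimal is our 26
```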

And, 

0sixteen
+ 1sixteen = 1sixteen (1ten)
+ 2sixteen = 3sixteen (3ten)
+ 3sixteen = 6sixteen (6ten)
+ 4sixteen = Asixteen (10ten)
+ 5sixteen = Fsixteen (15ten)
+ 6sixteen = 15sixteen (16ten + 5ten = 21ten)

And that's enough of non-decimal numeric bases for the moment.  

For comparison, the 6809 version of the first sequence is almost the same in assembler:

	LDA	#8	; 6809 mnemonic: LoaD accumulator A
	ADDA	#5	; Mnemonic shared with 6800/6801.
	ADDA	#2
	ADDA	#7
	ADDA	#4 

Looking at that in a 6809 simulator:

% b 100a
Breakpoint set at 100A
% t on
% c 1000

          0 A=00 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            1000: 86 08        LDA #$08
          1 A=08 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            1002: 8B 05        ADDA #$05
          2 A=0D B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            1004: 8B 02        ADDA #$02
          3 A=0F B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            1006: 8B 07        ADDA #$07
          4 A=16 B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --H-----            1008: 8B 04        ADDA #$04

Breakpoint!
>         5 A=1A B=00 X=0000 Y=0000 U=0000 S=00FF P=00 --------            100A: 12           NOP                        

That looks a lot like the 6800 emulation session, which is no real surprise. There's an extra index register (Y), an extra stack pointer (U), and the direct page register (shown as P). But the line is long already, so the effective address and data are left out. And the accumulated sums are the same, as they should be.

And the 68000 version looks like this, using default register width instructions (which we will need to talk about later):

	MOVEQ	#8,D0	; MOVEQ lets us ignore register width.
	ADDQ	#5,D0	; Default register width is okay.
	ADDQ	#2,D0	; D0 is equivalent to 6800 A? Not quite?
	ADDQ	#7,D0
	ADDQ	#4,D0

MOVEQ is for "MOVE Quick"; ADDQ is for "ADD Quick". Both are somewhat optimized versions of the move and add instructions. MOVEQ affects the entire register, so any register size for the following ADD instructions is okay for the range of numbers we are using here.  (MOVEQ operands can be from -128 to 127; ADDQ operands can be from 1 to 8. Again, more later.)

With all the registers in the 68000, doing a one-line complete register dump becomes rather awkward. You'd need a screen able to show something like 300 characters per line, and then you'd wear your eyes out looking for what changed on each line. So you don't do that.

Here's what the top part of the screen looks like in the devpak debugger on the Hatari Atari ST emulator screen:

d0 = 00000000  ....    a0 = 00000000 602E 0206 00E0 0030 0001  ..........
d1 = 00000000  ....    a1 = 00000000 602E 0206 00E0 0030 0001  ..........
d2 = 00000000  ....    a2 = 00000000 602E 0206 00E0 0030 0001  ..........
d3 = 00000000  ....    a3 = 00000000 602E 0206 00E0 0030 0001  ..........
d4 = 00000000  ....    a4 = 00000000 602E 0206 00E0 0030 0001  ..........
d5 = 00000000  ....    a5 = 00000000 602E 0206 00E0 0030 0001  ..........
d6 = 00000000  ....    a6 = 00000000 602E 0206 00E0 0030 0001  ..........
d7 = 00000000  ....    a7 = 00000000 602E 0206 00E0 0030 0001  ..........
SR:0000    U          ssp = 00000000 602E 0206 00E0 0030 0001  ..........
PC:00E00030  bra $E0004E

After the data register on each line is the ASCII interpretation of its contents. Then, after the address register on each line, come ten bytes of memory in hexadecimal, followed by the same ten bytes shown as if they were ASCII characters. SR is the status register, ssp is the system stack pointer, PC is the program counter. And that's a lot to take in.

Even without the extra interpretations and memory content, you can see the problem, right?

If the debugger would just show you the registers that changed, that would be great. But I don't have one of those kinds of debuggers for the 68000. 

I can pretend to be one, and I guess that will have to do. Here's what it might look like:

% t on
% b 1800a
Breakpoint set at 0001800A
% c 18000

  0 D0=00000000 A7=00017FF8 ----- 00018000: 7008 MOVEQ   #8,D0
  1 D0=00000008 A7=00017FF8 ----- 00018002: 5A40 ADDQ    #5,D0
  2 D0=0000000D A7=00017FF8 ----- 00018004: 5440 ADDQ    #2,D0
  3 D0=0000000F A7=00017FF8 ----- 00018006: 5E40 ADDQ    #7,D0
  4 D0=00000016 A7=00017FF8 ----- 00018008: 5840 ADDQ    #4,D0

Breakpoint!
> 5 D0=0000001A A7=00017FF8 ----- 0001800A: 4E71 NOP

Now, you're probably wanting to try some of this fun stuff yourself.

I plan to show how to build a hexadecimal calculator as part of the tutorial, but that's a bit down the road, yet.

Until then, hexadecimal math is supported by many desktop calculators on personal and workstation computers. Look for such things as programmer mode, and you'll probably find binary and octal modes as well within the programmer mode. 

Or, if you have a Unix or Linux OS workstation, you probably also have the Unix command-line arbitrary precision basic calculator utility, bc. That allows you to set the input and output conversion bases to arbitrary positive bases (greater than one), and can be fun and instructive to play with:

obase=16
ibase=16
8 + 8
10
8 + 5 + 2 + 7 + 4
1A

Or, the programming language Forth has a similar conversion base, just one for both input and output:

16 BASE !  ok
8 5 + 2 + 7 + 4 + . 1A  ok

(You might find post-fix math confusing, I suppose.) 

Many versions of Forth are available, I tend to use gforth, because it is available in the Debian and Ubuntu repositories.
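For the curious, the heart of that postfix arithmetic is small enough to sketch in Python. This toy evaluator only knows numbers and +, and reading the numbers as base sixteen is my nod to 16 BASE ! -- it is not how any real Forth is implemented:

```python
def rpn(tokens):
    """Tiny postfix evaluator: numbers push, '+' pops two and pushes the sum."""
    stack = []
    for t in tokens:
        if t == '+':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            stack.append(int(t, 16))   # read numbers in hexadecimal
    return stack[-1]

assert rpn('8 5 + 2 + 7 + 4 +'.split()) == 0x1A
```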

Playing with hexadecimal math is fun, but what about using a debugger like the ones I have shown above to learn assembler? 

There are, of course, many ways to get hardware and/or software to practice on. 

I don't have room for hardware, so I am using Joe H. Allen's emulator EXORsim for the 6800, 6801, and 6809, and the Atari ST emulator Hatari for the 68000, which I will discuss in the next chapter.

What about other CPUs, like the 8080, Z-80, 6502, 8086, ARM, RISC-V, PowerPC, ... ? 

This is a fairly time-intensive project. Doing just the four I've already described above is plenty to fill my plate. When I get to a good pause point, I will come back to demonstrate code for some of those processors.

(Title Page/Index)


ALPP -- Assembly Language Programming Primer -- Title Page

Assembly Language Primer

Joel Matthew Rees

Amagasaki, Japan

Copyright 2024 Joel Matthew Rees


Preface


Basics

  1. Accumulating Integer Results in the M6809, 6800, 6801, and M68000
  2.  














Sunday, February 25, 2024

Optimizing Direct-call Forth in 6809, 68000, and 6800/6801

(This is a little doodling inspired by a post suggesting stackless Forth in the Minimalist Computing Facebook group. Doing this for 6809, 68000, and 6800.) 

Start with a bit of fig Forth to practice optimizing. All words called by COUNT are leaf words, making them easy to in-line. 

The fig source code I am working with, borrowed from my transcriptions of the 6800 model, with a few comments on what I am doing:

* From Dave Lion, et. al, fig Forth model for 6800:
* Converting to direct call, in-lining, and optimizing.
* This is low-hanging fruit, but easy to see how it could work.
* Should be possible to do this all mechanically, I think.
* I'm going to pretend I know what the optimizer that
* does not yet exist will do for 6809, 68000, 6801 and 6800.
figCOUNT:
	FDB	DOCOL,DUP,ONEP,SWAP,CAT
	FDB	SEMIS
* 
figCOUNTlinearsource:
	FDB	DOCOL
	FDB	DUP
	FDB	ONEP
	FDB	SWAP
	FDB	CAT
	FDB	SEMIS
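As a reference point for the optimization passes below, here's a toy Python model of what COUNT (DUP 1+ SWAP C@) does to the parameter stack. The list-with-top-at-the-end layout and the mem dictionary are my stand-ins, not fig Forth's actual representation:

```python
# Parameter-stack model: a Python list, top of stack at the end.
def dup(s):  s.append(s[-1])
def onep(s): s.append(s.pop() + 1)
def swap(s): s[-1], s[-2] = s[-2], s[-1]
def cat(s, mem): s.append(mem[s.pop()])   # C@: fetch the byte at TOS

mem = {0x1000: 5}          # a counted string of length 5 at 0x1000
stack = [0x1000]
dup(stack); onep(stack); swap(stack); cat(stack, mem)
assert stack == [0x1001, 5]   # address past the count byte, then the count
```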

I'll start with the 6809 because it should be easy and straightforward. It probably isn't important, but, to emphasize that I'm doing away with the VM overhead and just working within a run-time that projects the VM onto the 6809 ISA, I'm showing the direct calls in the 6809 assembler, along with the functions called.

Below the in-line source, I'm showing the results of each theoretical optimization pass, starting with removing the easy extraneous data movement on the stack and in registers, and proceeding with moving things around and combining operations. Then I show the final source, along with an optional version that may not be easy to reach along mechanical paths:

******************************
* First, 6809
6809COUNTcall:
*	LBSR	DOCOL	; direct call
	LBSR	DUP
	LBSR	ONEP
	LBSR	SWAP
	LBSR	CAT
*	LBSR	SEMIS
	RTS

* Leaf routines defined as
6809DUP
	LDD	,U
	STD	,--U
	RTS

6809ONEP
*	INC	1,U	; harder to optimize, keep it simple.
*	BNE	ONEPNC
*	INC	,U
*ONEPNC	RTS
	LDD	,U
	ADDD	#1
	STD	,U
	RTS

6809SWAP
*	PULU	D,X	; harder to optimize, keep it simple.
*	EXG	D,X	; could also do separate stores, etc.
*	PSHU	D,X
*	RTS
*	
	LDD	,U
	LDX	2,U
	STD	2,U
	STX	,U
	RTS

6809CAT
	CLRA
	LDB	[,U]
	STD	,U
	RTS

* bringing the calls in-line:
6809COUNTinline:
	LDD	,U	; DUP
	STD	,--U
*
	LDD	,U	; 1+
	ADDD	#1
	STD	,U
*
	LDD	,U	; SWAP
	LDX	2,U
	STD	2,U
	STX	,U
*
	CLRA		; C@
	LDB	[,U]
	STD	,U
*
	RTS

* Vacuum out the data motion on the stack:
6809COUNTpass1
	LDD	,U	; DUP
*	STD	,--U
	LEAU	-2,U
*
*	LDD	,U	; 1+
	ADDD	#1
*	STD	,U
*
*	LDD	,U	; SWAP
	LDX	2,U
	STD	2,U
	STX	,U
*
	CLRA		; C@
	LDB	[,U]
	STD	,U
	RTS

* Combine and simplify:
6809COUNTpass2
	LDD	,U	; DUP
	LEAU	-2,U
*
	ADDD	#1	; 1+
*
	LDX	2,U	; SWAP
	STD	2,U	; Misordering possible.
*	STX	,U
*
	CLRA		; C@
*	LDB	[,U]
	LDB	,X
	STD	,U
	RTS

* Postpone stack operations:
6809COUNTpass3
	LDD	,U	; DUP
*	LEAU	-2,U
*
	ADDD	#1	; 1+
*
*	LDX	2,U	; SWAP
	LDX	,U	; SWAP
*	STD	2,U
	STD	,U
*
	CLRA		; C@
	LDB	,X
*	STD	,U
	STD	,--U
*
	RTS

6809COUNTrearrange
*	LDD	,U	; DUP
	LDX	,U
*
*	ADDD	#1	; 1+
*
*	LDX	,U	; SWAP
*	STD	,U
*
	CLRA		; C@
*	LDB	,X
	LDB	,X+
	STX	,U
*
	STD	,--U
	RTS
*
6809COUNTfinal
	LDX	,U
	CLRA		; C@
	LDB	,X+
	STX	,U
	STD	,--U
	RTS
*

* compare (Could this be done mechanically, too?):
6809COUNTmaybe:
	LDX	,U
	CLRA
	LDB	,X+
	STD	,--U
	STX	2,U
	RTS
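A quick way to convince yourself the final version still behaves like COUNT is to model both in Python and compare. The stack and memory representations here are my own simplifications, and the comments map loosely to the 6809 instructions:

```python
# Cross-check: the fused "final" version should leave the same stack
# as running DUP 1+ SWAP C@ one word at a time.
def count_stepwise(stack, mem):
    stack.append(stack[-1])                       # DUP
    stack.append(stack.pop() + 1)                 # 1+
    stack[-1], stack[-2] = stack[-2], stack[-1]   # SWAP
    stack.append(mem[stack.pop()])                # C@

def count_fused(stack, mem):
    addr = stack.pop()          # LDX ,U
    byte = mem[addr]            # LDB ,X+  (fetch, then bump the address)
    stack.append(addr + 1)      # STX ,U
    stack.append(byte)          # STD ,--U

mem = {0x2000: 7}
s1, s2 = [0x2000], [0x2000]
count_stepwise(s1, mem)
count_fused(s2, mem)
assert s1 == s2 == [0x2001, 7]
```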

The 68000 code follows the 6809 code's optimization paths rather closely, since they both support high-level run-time models quite well, and in similar ways.

******************************
* Now 68000:
68KCOUNTcall:
*	BSR.W	DOCOL	; direct call
	BSR.W	DUP
	BSR.W	ONEP
	BSR.W	SWAP
	BSR.W	CAT
*	BSR.W	SEMIS
	RTS

* Leaf routines defined as
68KDUP
*	MOVE.L	(A6),-(A6)	; Harder to optimize, keep it simple.
*	RTS
	MOVE.L	(A6),D0
	MOVE.L	D0,-(A6)
	RTS

68KONEP
*	ADD.L	#1,(A6)	; Harder to optimize, keep it simple.
*	RTS
	MOVE.L	(A6),D0	; Keep it simple
	ADD.L	#1,D0
	MOVE.L	D0,(A6)
	RTS

68KSWAP
*	MOVEM.L	(A6),D0/D1	; Harder to optimize, keep it simple
*	EXG	D0,D1
*	MOVEM.L	D0/D1,(A6)
*	RTS
	MOVE.L	(A6),D0
	MOVE.L	4(A6),D1
	MOVE.L	D0,4(A6)
	MOVE.L	D1,(A6)
	RTS

68KCAT
	CLR.L	D0	; zero-extend
	MOVE.L	(A6),A0
	MOVE.B	(A0),D0
	MOVE.L	D0,(A6)
	RTS

* in-line:
68KCOUNTinline:
	MOVE.L	(A6),D0	; DUP
	MOVE.L	D0,-(A6)
*
	MOVE.L	(A6),D0	; 1+
	ADD.L	#1,D0
	MOVE.L	D0,(A6)
*
	MOVE.L	(A6),D0	; SWAP
	MOVE.L	4(A6),D1
	MOVE.L	D0,4(A6)
	MOVE.L	D1,(A6)
*
	CLR.L	D0	; C@
	MOVE.L	(A6),A0
	MOVE.B	(A0),D0
	MOVE.L	D0,(A6)
*
	RTS

* Vacuum out the data motion on the stack:
68KCOUNTpass1
	MOVE.L	(A6),D0	; DUP
*	MOVE.L	D0,-(A6)
	LEA	-4(A6),A6
*
*	MOVE.L	(A6),D0	; 1+
	ADD.L	#1,D0
*	MOVE.L	D0,(A6)
*
*	MOVE.L	(A6),D0	; SWAP
	MOVE.L	4(A6),D1	; Misordering possible.
	MOVE.L	D0,4(A6)
*	MOVE.L	D1,(A6)
*
	CLR.L	D0	; C@
*	MOVE.L	(A6),A0
	MOVE.L	D1,A0
	MOVE.B	(A0),D0
	MOVE.L	D0,(A6)
*
	RTS

* Combine and simplify:
68KCOUNTpass2
	MOVE.L	(A6),D0	; DUP
	LEA	-4(A6),A6
*
	ADD.L	#1,D0	; 1+
*
*	MOVE.L	4(A6),D1	; SWAP
	MOVE.L	4(A6),A0	; SWAP
	MOVE.L	D0,4(A6)
*
	CLR.L	D0	; C@
*	MOVE.L	D1,A0
	MOVE.B	(A0),D0
	MOVE.L	D0,(A6)
*
	RTS

* Postpone stack operations:
68KCOUNTpass3
	MOVE.L	(A6),D0	; DUP
*	LEA	-4(A6),A6
*
	ADD.L	#1,D0	; 1+
*
*	MOVE.L	4(A6),A0	; SWAP
	MOVE.L	(A6),A0	; SWAP
*	MOVE.L	D0,4(A6)
	MOVE.L	D0,(A6)
*
	CLR.L	D0	; C@
	MOVE.B	(A0),D0
*	MOVE.L	D0,(A6)
	MOVE.L	D0,-(A6)
*
	RTS

68KCOUNTrearrange
*	MOVE.L	(A6),D0	; DUP
	MOVE.L	(A6),A0
*
*	ADD.L	#1,D0	; 1+
*
*	MOVE.L	(A6),A0	; SWAP
*	MOVE.L	D0,(A6)
*
	CLR.L	D0	; C@
*	MOVE.B	(A0),D0
	MOVE.B	(A0)+,D0
	MOVE.L	A0,(A6)
	MOVE.L	D0,-(A6)
*
	RTS

68KCOUNTfinal
	MOVE.L	(A6),A0
	CLR.L	D0	; C@
	MOVE.B	(A0)+,D0
	MOVE.L	A0,(A6)
	MOVE.L	D0,-(A6)
	RTS


* compare (Could this be done, too?):
68KCOUNTmaybe:
	MOVE.L	(A6),A0
	CLR.L	D0
	MOVE.B	(A0)+,D0
	MOVE.L	D0,-(A6)
	MOVE.L	A0,4(A6)
	RTS

The 6801's 16-bit support, with more primitive resources, induces a different path:

******************************
* Next, 6801

* Somewhere, preferably in the direct page,
* Must NOT be used by interrupt-time routines!
* -- Either save it or have interrupts use another PSP
PSP	RMB	2	; parameter stack pointer
* DTEMPA	RMB	2	; temp for SWAP, ...

6801COUNTcall:
*	JSR	DOCOL	; direct call
	JSR	DUP
	JSR	ONEP
	JSR	SWAP
	JSR	CAT
*	JSR	SEMIS
	RTS

* Leaf routines defined as
6801DUP
	LDX	PSP
	LDD	0,X
	DEX
	DEX
	STD	0,X
	STX	PSP
	RTS

6801ONEP
*	LDX	PSP
*	INC	1,X	; harder to optimize, keep it simple.
*	BNE	ONEPNC
*	INC	0,X
*ONEPNC	STX	PSP
*	RTS
	LDX	PSP
	LDD	0,X
	ADDD	#1
	STD	0,X
	RTS

6801SWAP
*	LDX	PSP	; this uses no static local variable,
*	LDAA	0,X	; but it will be harder to optimize
*	LDAB	2,X
*	STAA	2,X
*	STAB	0,X
*	LDAA	1,X
*	LDAB	3,X
*	STAA	3,X
*	STAB	1,X
*	RTS
	LDX	PSP
	LDD	0,X
*	STD	DTEMPA	; Faster, but uses statically allocated variable
	PSHB		; avoid opportunities to make interrupt-time issues
	PSHA
	LDD	2,X
	STD	0,X
*	LDD	DTEMPA
	PULB
	PULA
	STD	2,X
	RTS
 
6801CAT
	LDX	PSP
	LDX	0,X
	CLRA
	LDAB	0,X
	LDX	PSP
	STD	0,X
	RTS

* in-line:
6801COUNTinline:
	LDX	PSP	; DUP
	LDD	0,X
	DEX
	DEX
	STD	0,X
	STX	PSP
*
	LDX	PSP	; 1+
	LDD	0,X
	ADDD	#1
	STD	0,X
*
	LDX	PSP	; SWAP
	LDD	0,X
*	STD	DTEMPA	; Faster, but uses statically allocated variable
	PSHB		; avoid opportunities to make interrupt-time issues
	PSHA
	LDD	2,X
	STD	0,X
*	LDD	DTEMPA
	PULB
	PULA
	STD	2,X
*
	LDX	PSP	; C@
	LDX	0,X
	CLRA
	LDAB	0,X
	LDX	PSP
	STD	0,X
*
	RTS

* Vacuum out the data motion on the stack:
6801COUNTpass1
	LDX	PSP	; DUP
	LDD	0,X
	DEX
	DEX
*	STD	0,X
	STX	PSP
*
*	LDX	PSP	; 1+
*	LDD	0,X
	ADDD	#1
*	STD	0,X
*
*	LDX	PSP	; SWAP
*	LDD	0,X
*	STD	DTEMPA	; Faster, but uses statically allocated variable
	PSHB		; avoid opportunities to make interrupt-time issues
	PSHA
	LDD	2,X
	STD	0,X
*	LDD	DTEMPA
	PULB
	PULA
	STD	2,X
*
*	LDX	PSP	; C@
	LDX	0,X
	CLRA
	LDAB	0,X
	LDX	PSP
	STD	0,X
*
	RTS

* Combine and simplify is stuck at this point,
6801COUNTrearrange
	LDX	PSP	; DUP
*	LDD	0,X
	DEX
	DEX
	STX	PSP
*
*	ADDD	#1	; 1+
*
**	STD	DTEMPA	; SWAP
*	PULB
*	PULA
*	LDD	2,X
*	STD	0,X
**	LDD	DTEMPA
*	PULB
*	PULA
*	STD	2,X
	LDD	2,X	; SWAP
	STD	0,X
*
	LDX	0,X	; C@
	CLRA
	LDAB	0,X
*
	LDX	PSP
	STD	0,X
*
	LDD	2,X
	ADDD	#1	; 1+
	STD	2,X
*
	RTS

* Postponing stack operations is already done:
6801COUNTfinal
	LDX	PSP
	DEX
	DEX
	STX	PSP
	LDD	2,X	; DUP {SWAP}
	STD	0,X
	LDX	0,X	; C@
	CLRA
	LDAB	0,X
	LDX	PSP
	STD	0,X
	LDD	2,X	; 1+
	ADDD	#1
	STD	2,X
	RTS

*6801COUNTmaybe	; No obvious alternate paths.

The 6800's lack of 16-bit support induces yet different paths, which are (surprisingly?) similar to the 6809's and 68000's paths:

******************************
* And, 6800

* Somewhere, preferably in the direct page,
* Must NOT be used by interrupt-time routines!
* -- Either save it or have interrupts use another PSP
PSP	RMB	2	; parameter stack pointer
DTEMPA	RMB	2	; temp for SWAP, ...

6800COUNTcall:
*	JSR	DOCOL	; direct call
	JSR	DUP
	JSR	ONEP
	JSR	SWAP
	JSR	CAT
*	JSR	SEMIS
	RTS

* Leaf routines defined as
6800DUP
	LDX	PSP	; will doing this one byte at a time
	LDAA	0,X	; be easier to optimize or harder?
	LDAB	1,X
	DEX
	DEX
	STAA	0,X
	STAB	1,X
	STX	PSP
	RTS

6800ONEP
	LDX	PSP
	INC	1,X	; Have to add byte at a time anyway.
	BNE	ONEPNC
	INC	0,X
ONEPNC	STX	PSP
	RTS
*	LDX	PSP
*	LDAA	0,X
*	LDAB	1,X
*	ADDB	#1
*	ADCA	#0
*	STAA	0,X
*	STAB	1,X
*	RTS
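The INC 1,X / BNE / INC 0,X trick in 6800ONEP can be modeled in Python to see the carry handling: bump the low byte, and only if it wrapped to zero, bump the high byte. The tuple-of-bytes representation is mine:

```python
def inc16(hi, lo):
    """Increment a 16-bit value held as two bytes, 6800-ONEP style."""
    lo = (lo + 1) & 0xFF       # INC 1,X
    if lo == 0:                # BNE skips the next step unless lo wrapped
        hi = (hi + 1) & 0xFF   # INC 0,X
    return hi, lo

assert inc16(0x00, 0xFF) == (0x01, 0x00)   # 0x00FF + 1 = 0x0100, carry taken
assert inc16(0x12, 0x34) == (0x12, 0x35)   # no carry: high byte untouched
```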

6800SWAP
*	LDX	PSP	; Use accumulators for intermediates
*	LDAA	0,X	; Requires special case to recognize what's where.
*	LDAB	2,X
*	STAA	2,X
*	STAB	0,X
*	LDAA	1,X
*	LDAB	3,X
*	STAA	3,X
*	STAB	1,X
*	RTS
	LDX	PSP	; Should be easier to optimize.
	LDAA	0,X	; SWAP should almost always optimize out.
	LDAB	1,X
	PSHB		; avoid opportunities to make interrupt-time issues
	PSHA
	LDAA	2,X
	LDAB	3,X
	STAA	0,X
	STAB	1,X
	PULB
	PULA
	STAA	2,X
	STAB	3,X
	RTS
 
6800CAT
	LDX	PSP
	LDX	0,X
	CLRA
	LDAB	0,X
	LDX	PSP
	STAA	0,X
	STAB	1,X
	RTS

* in-line:
6800COUNTinline:
	LDX	PSP	; DUP
	LDAA	0,X
	LDAB	1,X
	DEX
	DEX
	STAA	0,X
	STAB	1,X
	STX	PSP
*
	LDX	PSP	; 1+
	INC	1,X
	BNE	ONEPNC
	INC	0,X
ONEPNC
	STX	PSP
*
	LDX	PSP	; SWAP
	LDAA	0,X	; SWAP should almost always optimize out.
	LDAB	1,X
	PSHB		; avoid opportunities to make interrupt-time issues
	PSHA
	LDAA	2,X
	LDAB	3,X
	STAA	0,X
	STAB	1,X
	PULB
	PULA
	STAA	2,X
	STAB	3,X
* 
	LDX	PSP	; C@
	LDX	0,X
	CLRA
	LDAB	0,X
	LDX	PSP
	STAA	0,X
	STAB	1,X
*
	RTS

* Vacuum out the easy data motion:
6800COUNTpass1
	LDX	PSP	; DUP
	LDAA	0,X
	LDAB	1,X
	DEX
	DEX
	STAA	0,X
	STAB	1,X
	STX	PSP	; Make the push permanent.
*
*	LDX	PSP	; 1+
	INC	1,X
	BNE	ONEPNC
	INC	0,X
ONEPNC
*	STX	PSP
*
*	LDX	PSP	; SWAP
	LDAA	0,X	; SWAP should almost always optimize out.
	LDAB	1,X
	PSHB		; avoid opportunities to make interrupt-time issues
	PSHA
	LDAA	2,X
	LDAB	3,X
	STAA	0,X
	STAB	1,X
	PULB
	PULA
	STAA	2,X
	STAB	3,X
* 
*	LDX	PSP	; C@
	LDX	0,X
	CLRA
	LDAB	0,X
	LDX	PSP
	STAA	0,X
	STAB	1,X
*
	RTS

* Some easy combinations and data movement tracking:
6800COUNTpass1_1
	LDX	PSP	; DUP
	LDAA	0,X
	LDAB	1,X
	DEX
	DEX
	STAA	0,X
	STAB	1,X
	STX	PSP	; Make the push permanent.
*
*	INC	1,X	; 1+
	INCB		; 1+
	BNE	ONEPNC
*	INC	0,X
	INCA
ONEPNC
*
*	LDAA	0,X	; SWAP should almost always optimize out.
*	LDAB	1,X
*	PSHB		; avoid opportunities to make interrupt-time issues
*	PSHA
*	LDAA	2,X
*	LDAB	3,X
*	STAA	0,X
*	STAB	1,X
*	PULB
*	PULA
	STAA	2,X	; SWAP
	STAB	3,X
* 
	LDX	0,X	; C@
	CLRA
	LDAB	0,X
	LDX	PSP
	STAA	0,X
	STAB	1,X
*
	RTS

* Combine one more,
6800COUNTrearrange
	LDX	PSP	; DUP
	LDAA	0,X
	LDAB	1,X
	DEX
	DEX
	STAA	0,X
	STAB	1,X
	STX	PSP	; Make the push permanent.
*
	INCB		; 1+
	BNE	ONEPNC
	INCA
ONEPNC
	STAA	2,X	; SWAP
	STAB	3,X
* 
	LDX	0,X	; C@
*	CLRA
	LDAB	0,X
	LDX	PSP
*	STAA	0,X
	CLR	0,X
	STAB	1,X
*
	RTS

* Final
6800COUNTfinal
	LDX	PSP	; DUP
	LDAA	0,X
	LDAB	1,X
	DEX
	DEX
	STAA	0,X
	STAB	1,X
	STX	PSP	; Make the push permanent.
	INCB		; 1+
	BNE	ONEPNC
	INCA
ONEPNC
	STAA	2,X	; SWAP
	STAB	3,X
	LDX	0,X	; C@
	LDAB	0,X
	LDX	PSP
	CLR	0,X
	STAB	1,X
	RTS

*6800COUNTmaybe	; No obvious alternate paths.

JFTR, this code has not been particularly tested.

Sunday, February 11, 2024

Assembly Language Programming Primer -- Preface

Preface

Lance Leventhal wrote a series of books on assembly language programming that many regard as the Assembly Language Programming Bible for their favorite microprocessor.

I theoretically own, somewhere, if the rain hasn't destroyed them yet, a copy of two of his books, the one on the 6809 and the one on the 68000. Thirty years or so ago, I kept trying to use them to help me with a certain project that I have given way too much of my life to, until I realized two things:

The first was that I already understood everything in both of those books, just from studying Motorola's programmers' manuals. Motorola had some really amazing documents at the time. Not perfect, but amazing.

The second was that the approach he had taken was leading me away from my goals in that project in subtle ways. He wasn't teaching assembly language programming, he was teaching a specific discipline that was being taken up by many in the industry as intellectual meta-infrastructure.

Not incidentally, I was a fan of an internal combustion engine that I read about in Popular Science that a guy named Turner designed, that a friend once looked at and said, "That's a U-joint turned into an engine." I don't remember if it was Turner or the author of the Popular Science article on the engine who called it a rotary V engine -- as in rotary V-8, which makes it sound like a contradiction in terms. 

Another friend doubted that the torsional stresses could be properly taken care of. I asked him about camshafts, and he said, "Because we've already done that with camshafts."

Maybe I'm a hopeless fan of underdogs.

Turning that engine into a product for ordinary consumers would have required creating a support infrastructure -- mechanics and repair shops trained in subtle technical differences that would not just know how to work on it, but how it works. In addition, there would be parts suppliers, and sales networks and such. But the ordinary internal combustion engine support infrastructure is different, and somehow gets in the way of building support structures for odd-ball engines like that Turner rotary V, and the Wankel engine that Mazda finally gave up on, among many others. 

Infrastructure for electric vehicles has been significantly helped by a certain billionaire's involvement. (And rotary engines seem to be coming along too, making a comeback as range extenders.)

Established infrastructure gets in the way of many interesting things in our society.

Somebody famous has said, 

The perfect is the enemy of the good.

Many somebodies have parroted this truistic aphorism. 

We can logically invert this. 

The mediocre good is the enemy of perfection, and therefore the enemy of progress.

Now, mind you, human ideals can never be realized. Ideal is not real. God's ways are not our ways, and his thoughts are not our thoughts (Isaiah 55:8-9). But that is not the same thing as saying we should prioritize the mediocre good because perfection is hard to achieve. 

Nor is it saying that we can get along without ideals. Ideals are necessary, just not the ultimate end. Ideals keep us from becoming mediocre.

I'm rambling.

What does this side ramble have to do with assembly language primers?

I'm trying to find a step forward on making something real of at least parts of that project that has eaten too much of my life. (Am I a glutton for punishment?) 

Some FB friends who like Leventhal's book on the 6809 asked why I said it's not my favorite, and have challenged me to write a better primer. 

So I'm thinking maybe such a set of tutorials would be a workable step forward on that project.

That's kind of what I have in mind for this series of blog posts.

But I need to lay out a warning up front that my approach to assembly language is subtly incompatible with most of the existing programming tools. It looks really good until you realize that, if you use my approach, you will sooner or later find yourself trying to, as it were, walk on air, unless you take the time to understand the differences and are willing to develop some of your own tools and share them.

I don't have the means to develop all the missing support tools by myself.

The CPUs aren't quite there, either.

My favorite existing processor, Motorola's 6809, isn't really quite up to my approach, because it's missing an indexing mode that would allow using the Load Effective Address instructions to take the address of a variable in the run-time direct page. We can work around this, but it consumes instructions and processor cycles.

The 68000 isn't quite up to it, either. Not because the 68020 is too complex and the original 68000 doesn't have 32-bit constant offset index modes and it's hard to get Motorola's CPU32 without a lot of SOC stuff that you don't want to mess with initializing, etc. The problem is that the 68000 has all these index registers, but using them as segment registers gets in the way of using them as index registers. LoL. Not really a problem. We can work around this, as well.

Speaking of segment registers, Intel's x86 could have been a bit more workable -- if still clunky, but BP and SP are stuck in the same segment unless you use segment override. And you really don't want to do that because it gets in the way -- BP designed as frame base pointer instead of parameter stack pointer is definitely one of the things we would be fighting against on x86. There are workarounds, such as ignoring that the two stacks are in the same segment. (The Extra Segment doesn't work for this, last time I tried it. That is, you really don't want to be losing access to the parameter stack whenever you want to move strings.) I think I'll let someone else play with that. At least, I'm not interested in doing so again right now. I've seen what happens when that siren song takes its toll.

Existing CPUs that implement the primitives of Forth-like languages or LISP kernels don't quite do the job either. At least, I have never found one such that included support for such things as virtualizing address spaces. Real processors are all designed to work within the existing meta-infrastructure.

Motorola's 6800 and 6801 actually get closer than you might expect (ignoring the lack of direct-page opcodes for the unary operators), if you define the use of a few direct page variables as virtual registers -- software parameter stack pointer and such -- and define support routines for the virtual registers.

RISC processors can be similarly supported, but need macros instead of routines because of the cost of branches in deeply pipelined architectures. 

Of course, that's also true of any processor that doesn't include the necessary registers and operations as part of the native CPU architecture, but it consumes cycles and memory resources. And there is always that siren song of false optimizations that lures you away, into the existing meta-infrastructure.

I keep hearing from people rediscovering "stackless". That's one of those false optimizations. Why it's a false optimization takes a lot of explanation, though. I expect I'll at least partly cover that in the process here.

As we go, I'll demonstrate the necessary virtual registers and support routines for the 6800 and 6801. Once we see what we can do with them and what walls we run into without, we can talk about why and how to apply it other CPUs.

But tutorials need to start simply. Explaining why one needs a parameter stack while you are teaching how to build a software stack while you are explaining what LDAA 0,X means gets kind of unwieldy.

And if you are looking for a primer, all this talk of esoteric problems isn't getting you started writing assembly language programs.

Anyway, be warned:

If you proceed with this series, I'm going to teach you things that will make you dissatisfied with the tools you have available, and I don't have the means of creating the missing tools by myself. I hardly have the time to write the tutorial chapters.

So, what does the example I just gave of the 6800/6801 indexed mode load accumulator A

    LDAA 0,X

mean? And the 6809 equivalent, 

    LDA    ,X

And the 68000 near-equivalent, 

    MOVE.B (A0),D0

and the difference between that and, say,

    MOVE.L (A0),D0

on the 68000?

Ick. Jumping in too deep already, perhaps. Definitely needs context and underlying concepts. Maybe start with accumulators and data registers.

(Title Page/Index)

Monday, March 20, 2023

Liar's Paradox Revisited by a Programmer

I'm not sure I've blogged about the liar's paradox before, but that's not what I mean by revisited. 

A common version of this paradox is that someone looks you in the eye and says, 

I'm telling you a lie.

Well, this isn't really hard in the real world. If it's a good friend, you punch him in the arm and say,

Yah, yer just messin' with my mind.

If it's someone you don't know very well who is trying to sell you real estate, you cough politely and say, 

Yeah. Right.

And get out of there as fast as is permissible. 

But logicians love to have their minds messed with, so they say that is all too unsatisfactory. We must either find a way to say it's true or it's false, or find a good general reason why we don't need to. 

No, we don't really need to do this. But since I took a little time to listen to a YouTube video on the subject today, I'm going to give you several good, general reasons why.

First, let's set the problem up sort-of formally:

This sentence is false.

The above sentence has the same problem. We can resolve it by noting that some sentences are neither true nor false. 

But then we have logician A say,

This sentence is not true.

And logician B, who has binary categorization compulsion, says, "But ...

  • If it's true it's not true, and that doesn't work.
  • If it's false it's true, and that doesn't work.
  • And if it's neither true nor false, it's true, and that doesn't work.

Which ignores the possibility of a sentence being both true and false, but we'll wink at that for a moment. 

Logician B then continues with, "How about this one? ...

  1. The sentence below is not true.
  2. The sentence above is true.

If you're willing to walk along with this logician, you can see the problem he is facing.

Now, if you know much about programming, you might recognize that, whether you split this up or say it in one sentence, you are dealing with a recursive definition -- a definition that tries to define itself in terms of itself -- or a set of definitions that mutually and recursively define each other.

If you write a program this way, you tend to either blow your program stacks or, if the recursive part can be optimized to tail recursion, put the program into an endless loop.
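The stack-blowing behavior is easy to demonstrate. Here is a minimal Python sketch (my own illustration, not anything from formal logic) of a definition that refers only to itself, with no terminating condition:

```python
import sys

# A 'definition' that refers only to itself, like the liar's sentence:
# its value depends on its own value, with no base case to bottom out on.
def this_sentence_is_false():
    return not this_sentence_is_false()  # no terminating condition

sys.setrecursionlimit(1000)  # keep the inevitable failure quick
try:
    this_sentence_is_false()
    result = "resolved"        # never reached
except RecursionError:
    result = "blew the stack"  # the recursion never bottoms out
```

(If the self-call were in tail position and the compiler optimized it, you would get the endless loop instead of the stack overflow.)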

Recursive definitions need a resolving definition path just like recursive programs need a terminating condition. And, just like recursive programs without a terminating condition are not valid programs, recursive definitions without a way to resolve the recursion are not valid definitions -- in terms of formal logic.

Just like reasoning from a false hypothesis is known to be an error in formal logic, trying to resolve a definition that is known not to be valid is known to be an exercise in futility. 

(And there is also a corollary in division by zero, really.)

Now, the real world is not binary. We can talk about Planck's constant and about quantum mechanics, but the real world is not really composed of discrete values, discrete conditions, discretely resolving logical states, etc. Not for our purposes in this thermodynamic world.

A statement can, in fact, be both true and false. Let's look at the statement I started with:

I am telling you a lie.

Depending on the context, 

  • this may be a signal for the listener to look deeper at what the speaker is saying, in which case, we can say it is true for our purposes. 
  • Or it may be an attempt to give the listener a false sense of security, in which case we can say it is false for our purposes.
  • Or it may be a riddle designed to entertain, in which case it is both true and false. (See?)
  • Or the speaker may just be talking randomly, in which case it might be something we can neither call true nor false.

There are, indeed, four possible results when analyzing any such assertion. We only need to restrict our set of acceptable results when we choose to restrict our set of acceptable results, and we simply should not choose to do so when dealing with assertions like these.

Similar reasoning holds for 

I am not telling the truth. 

One more, consider the statement 

I am shading the truth.

This is a statement that inherently disallows binary interpretation, so even compulsively binary logicians can't complain. It's not a good thing to shade the truth, but a person can say he is without invoking the liar's paradox.

Thursday, October 13, 2022

Notes on building m6809-gdb for use with XRoar

Ciaran has a page up on his local build of an older version of gdb for working with 6809 and XRoar at

https://www.6809.org.uk/dragon/m6809-gdb.shtml

I am running Ubuntu, I think it's 18 (need to upgrade that sometime, I suppose, but no rush.)

I'm building XRoar from Ciaran's repository and installing it in my user-local executables directory. So the dependencies for XRoar were already installed when I started. 

The first time I tried to clone Ciaran's repository for m6809-gdb, network traffic just kept git from being able to load anything. Had to wait and try later, meaning today.

It's about 300 or 400 MB, and it's not a terribly time-consuming build, in the range of thirty minutes with a reasonable workstation configuration and broadband connection.

Following his instructions, I did the configure with the recommended parameters, and it got stuck part-way through. It couldn't find my Ada or flex, and then other peculiarities of my OS configuration got it stuck without those. (I thought I had both installed, but a quick check showed they weren't.) So, ...

I logged out of my working user and logged in to my admin user and got the most recent updates with Ubuntu's apt tool. (Where it's apt-get on Debian, it's just apt on Ubuntu, but pretty much similar. apt update, then apt upgrade.)

Then I looked around for Ada and flex. They are a little hard to find, because apt search and man -k both throw a lot of false-positives up there.

Ada -- the package on Ubuntu is gnat. Gnat-8 was available for my OS, but I loaded the default gnat, which turned out to be gnat-7:

sudo apt install gnat

Flex -- no, this is not that Flex, the old Flex operating system for 6800 and 6809. It is also not Digital Research's FlexOS. (Weren't there trademark issues with that?) Nor was it ...

Flex in this case is the fast lexical analyser generator, a re-implementation of the lex low-level parser (lexer) generator that was used with Yet Another Compiler Compiler (yacc) in putting together the early C compilers during the birth of Unix and C. The package name in Ubuntu is flex, so

sudo apt install flex

I remembered there were also messages about bison, so while I was at it I got that, too:

sudo apt install bison

(Yeah, bison is a re-implementation of yacc. Puns all around.)

When I logged back into my workstation user, the build process got stuck with cached configuration that said I didn't have bison. 

Make clean and several of the other usual methods didn't clear that, and I couldn't find a config.cache to delete. So I just deleted the whole directory and started over with the git clone step.

This time, it successfully built.

Knowing that it built okay, I did a make clean and re-ran the configure step with a prefix option:

 --prefix=/home/myuser/local/mybin

(Not my real file system structure.) My login name in the above would be 

myuser

and my user-local place for keeping binaries in the above would be 

~/local/mybin

Running the make and make install steps after this configuration option is set allows me to run m6809-gdb when I'm logged in as myuser, but keeps it out of the general system executables. (I also do this with my own asm68c assembler, FWIW.)

Notes on running gdb:

  • b *0x3971 to set breakpoint at 0x3971 without symbols
  • ni (next instruction) to single-step
  • info registers to show CPU registers
  • c to continue


Saturday, September 24, 2022

A Short Description of VTL-2 Programming (Very Tiny Language pt 2)

(Continuing from VTL-2 Expressions, because some people want to know:)

As I understand it, Very Tiny Language essentially owes its existence to the paucity of support from MITS for the Altair 680. RAM boards from the manufacturer were not just expensive, but they were really hard to come by. And even if you had the RAM, putting anything in the RAM was painful. No floppies, no cassette tape, only the serial port, so you had to load BASIC or the assembler or other tools from paper tape, and that took a long, noisy several hours according to what I have heard. And without tools, it's hard to develop tools in the first place.

So Gary Shannon and Frank McCoy at the famous Computer Store made this very tiny programming language that fit into three ROM sockets on a single ROM board. 760 bytes of fairly tight code. (When I say fairly tight, don't get me wrong. I might kibbitz their work, but they did a pretty good job.) 

They took a bunch of short-cuts in the design of the language to keep it tight, which is something to keep in mind while we walk through using the language to write programs. 

And it turned out to be useful enough that owners of the 8800 also wanted it, so they wrote a version for that. And then 3rd parties wrote versions for the 6502 and other processors.

Why?

Because it can turn your microcomputer into a programmable (integer) calculator even with very little ROM and RAM. (And it's also an example of a useful, minimal ad-hoc programming language.)

My post on expressions talked about using it without really programming in it. This post will be about programming in it. 

Allocation variables:

Before we can start putting VTL programs in memory, we have to make sure VTL knows where it can store the lines of program we feed it. 

(Fasten your seat-belt and hang on, this detour-is-not-a-detour is a bit of a rough ride.)

We need to check the two allocation variables, the current allocation base -- & (ampersand) -- and the top of allocatable memory -- * (asterisk). (Yes, ampersand and asterisk are variables.)

?=& 

will print the current allocation base. And

?=*

will print the top of allocatable memory -- except that they have to be correctly set first.

Yes, these are variables, and yes, that's what they are named. 

If you want to know the total amount of memory available for programming (and the array, more later), type

?=*-&

-- if they happen to be correctly set. Which is what I should explain now.

To understand the problem, consider that Gary Shannon and Frank McCoy at one time were making a little money off the VTL ROMs they sold. Not much, though. They really didn't want to have lots of different versions of the ROM in their inventory. 

Other than the CPU decision, they just wanted the user to be able to plug the ROMs into a ROM board, set the board's address, and turn on power. So they didn't want to program a RAM limit into the ROMs that would not apply to a large number of their customers, even if the customer could fix the limit. Better to make customers aware of what those variables are and why, and ask the customers to set them themselves.

Maybe.

Anyway, if you have the original ROMs or the original source (including the version I started with in bringing VTL-2 up on EXORsim), these variables are not set when VTL-2 starts up. 

And, until they are set, you can't start programming.

If you are running VTL-2 inside an emulator, the emulator may do you the convenience of clearing memory before loading the interpreter -- in which case it will be obvious that they are not set. 

Or it may not. 

And if you are running real hardware, the variables will probably have garbage in them on boot-up, which may be obvious, or which may look valid to the interpreter (and may cause crashes if you try to edit or look at program lines).

On the other hand, if you are working with one of the executable images made from source in which I moved the VTL variables out of the direct page (all the rest of the versions I have posted), I went ahead and put start-up code in the interpreter that sets them -- specifically because I moved those variables and the initial allocation base is no longer a simple question of 264 on a 68XX CPU. 

Similarly, C being what C is, VTL-C sets them for you. And others have done likewise, for similar reasons.

So, you need to take a look at them and figure out if they need to be set or not. 

Top of allocatable RAM:

The top of allocatable RAM is fairly straightforward. Check it now:

?=*

It should be either equal to the amount of RAM you have (or have given it), if VTL resides (say, in a ROM) in an upper address range, or the amount of RAM minus 1K to 2K -- so that VTL and maybe some system stuff, like the stack, can reside in RAM above the allocatable area.

Confusing? Maybe I should draw a map:

0000: Possible system stuff
(low address): VTL variables
also, input buffer and usually VTL stack
(slightly higher address): & (allocation base is here.)
VTL allocatable space -- for
program (and maybe an array)
* (top of allocatable area is here.)
(still higher address): possible system stuff
(including, possibly, the host stack)
(even higher address,
may be top half of memory):
VTL interpreter code image
(above that): more system stuff
(including, possibly, the host stack)

Hard numbers? 

The top of the allocatable area should be something around 30000 or more for 32K RAM systems, 14000 or more for 16K, 6000 or more for 8K, etc. If the interpreter is up in the upper 32K of address space, where ROM often is, top of RAM may be exact powers of two. 

However, in some systems, the system stuff at the bottom of RAM is huge, and general purpose RAM itself may not start until something like 16384. And general purpose RAM may not come in simple powers of 2. So you may need to figure out the total RAM size and add it to the physical beginning of general purpose RAM.

(For the MC-10, for example, you probably want to play around with it with my unedited sources in the XRoar emulator, then read the source and figure out how to adapt it to your hardware, to use your RAM configuration effectively. Your likely configurations are 4K, 8K, 20K, and ??, so top of allocatable space is likely to be 1 or 2 K less than one of those.) 

(To be strictly accurate, in the MC-10 images and the CoCo images I've produced, I'm letting the VTL interpreter piggyback its stack on the stack the BASIC host uses. This is because the MC-10 interrupt routines, in particular, don't like the stack being somewhere else -- which means you really could eliminate the stack allocation in the source code, but I didn't want to pull too many rugs out from under things. 

Without the necessity of maintaining BASIC's stuff, you would need less than 900 bytes at the top of RAM for the interpreter, not 2K. Top of RAM would be above 31000, 15000, 7000, etc.) 

Allocation base:

If you are running an original ROM or an image from the original source, the initial value of the allocation base depends on the CPU:

  • For the 6800, the allocation base at startup is 264, eight bytes above the stack area at the top of the direct page.
  • For the 8080, they moved the variables up to make space for the CPU interrupt shims at the bottom of memory, and the initial allocation base is 320.

If you are running an image from my modified sources and have not altered the label ZERO or the ORG before where ZERO is declared, the allocation base depends on which source you are using. You can check the assembler listing for the value of the label PRGM. 

(Cough. Let me correct myself.)

If you are assembling it yourself from whatever sources, you should check the assembler listing. 

Go ahead and find the listing, or re-run the assembler and get it, and look for the label PRGM. If the interpreter is setting things for you, a little bit after the label COLD, or a little before START, you'll see something like

    LDX	#PRGM
    STX	AMPR

That's the allocation base. And you'll see the code setting the STAR variable shortly before the START label. That's the theoretical top of memory.

(If you're running a C language version, it will most likely be 264 on startup. Or, if it's a 32-bit version, 528 or something. Maybe. Look for clues in the C source. Or assume it's good unless it crashes.)

At the time I write this post,

  • For my 6800 version with just the variables moved, the initial allocation base is hexadecimal $308, which is 776 in base ten.
  • For my general 6801 enabled, slightly optimized version with the variables moved, it is the same.
  • For the MC-10 using the general 6801 image, because there is a huge gap where nothing exists below video RAM and because VTL and its variables have to reside above video RAM and memory in use by interrupt routines and such, the initial allocation base is $4400, or 17408 in base ten.
  • For the Color Computer using the transliteration image, while we don't have that gap of wasted memory map like on the MC-10, we do have memory used  by BASIC and interrupt routines, and by the video RAM, which have to reside below the variables. The initial allocation base I'm using is $1708, or 5896 in base ten.

Whew. End of detour-is-not-a-detour.

If the values of the allocation variables don't look reasonable on startup, figure out what they should be and set them before you continue. Otherwise, you will be typing in code and VTL won't be remembering it.

One more point before we return to our originally scheduled programming:

New command:

Now that you have noted and remembered the initial value of the allocation base, consider what happens if you have a program in memory and you type 

&=264

or whatever the initial allocation base label was on your system (say, 5896 on the Color Computer or 17408 on the MC-10).

Presto. 

VTL has forgotten your program and is ready for a new one.

Programming a counted loop in VTL-2:

With the allocation variables properly set, we can now type program lines in and list them.

In my bring-up process, I used several variations of a simple counting program because it was simple. We'll start by looking at that. If you've been through my expressions post, much of this will be familiar, and you'll be able to see some differences from BASIC. 

10 A=0
20 A=A+1
30 ?="A=";
40 ?=A
50 ?=""
60 #=(A<10)*20
70 ?="BYE-BYE NOW" 

Lines 10 and 20 look just like BASIC.

If you make a mistake, _ (the underscore key) allows you to back up one space, but the terminal doesn't back up or erase it from the screen. If you have trouble keeping track of what's happening this way, type @ (the at-sign key) and start the line over from scratch.

Line 30 has that print command that reminds one of the BASIC shortcut key, but isn't the same. It will just print the string

A=
on the output device. And the semicolon will stop the VTL interpreter from putting a new-line after it.

Line 40 prints the value of A.

Line 50 doesn't have a trailing ; (semicolon), so printing the empty string puts a new line on the output device.

Line 60:

You may remember from the expressions post that # is the current line number. And I mentioned that setting it forces a jump. But what is that weird business about multiplying a comparison by 20?

There is no line 0 in VTL. Trying to jump to line 0 is a no-op. 

The value of the comparison is 0 (false) or 1 (true). Multiply that by 20 and you get either 0 or 20. If it is 0, there is no jump. If it is 20, setting the line number to 20 causes a jump to line 20.

Say what?

This is VTL's version of BASIC's IF -- GOTO statement. 

VTL re-uses the assignment syntax -- and the semantics -- not just for jumps, but for conditional jumps, too. 

Just for the record, the left-to-right parsing means that we don't really need the parentheses on the condition. I put them in there to make them more obvious. Whether you do or not is up to you. Remember that more complex conditions will likely need the parentheses.
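If the multiply-by-a-comparison trick still feels odd, here is a little Python sketch of the same arithmetic (my own illustration; VTL itself is not Python, but Python bools conveniently multiply as 1 and 0 the same way):

```python
# VTL-2's comparison yields 1 (true) or 0 (false), and jumping to line 0
# is a no-op, so (A<10)*20 is either 20 (jump to line 20) or 0 (fall
# through) -- the equivalent of BASIC's IF A<10 THEN GOTO 20.
def jump_target(a):
    return (a < 10) * 20  # True*20 == 20, False*20 == 0
```

So while the loop counter is below 10, the "line number" computed is 20 and control jumps back; once it reaches 10, the result is 0 and execution falls through.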

So, that's how this counted loop works. If you run it.

Can you guess how to run it?

That's right. Type

#=1 

on a line by itself. 

OK
#=1
A=1
A=2
A=3
A=4
A=5
A=6
A=7
A=8
A=9
A=10
BYE-BYE NOW

Huh? Why not #=10?

#=10 would also work, but VTL does you the convenience of looking for the next higher existing line number and starting there, so #=1 will run any program.
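That next-higher-line lookup is simple to model. A hypothetical Python sketch (function and variable names are mine):

```python
def find_start(line_numbers, target):
    # VTL starts execution at the first existing line number >= target,
    # so #=1 runs any program, whatever its first line happens to be.
    later = [n for n in sorted(line_numbers) if n >= target]
    return later[0] if later else None  # None: ran off the end
```

With lines 10 through 70 stored, both find_start with a target of 1 and a target of 10 land on line 10.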

What about line 0? Try it if you haven't yet.

Nothing? Remember, VTL forbids line 0.

That means that, when you type 0 by itself on a line, it does not try to edit, save, or delete a line 0. And the authors took advantage of that. 

OK
0
10 A=0
20 A=A+1
30 ?="A=";
40 ?=A
50 ?=""
60 #=(A<10)*20
70 ?="BYE-BYE NOW"

Typing a 0 by itself on a line gives you a listing, instead -- if the allocation variables define memory to remember a program in, and if there is a program in there.

Some versions of VTL use an L instead of a 0 as a listing command, just for the record. 

Programming Factorials in VTL-2

This one is from the PDF manual, but I've modified it just a little bit for no particular reason.

Since VTL wants us to be frugal, we'll take the cumulative approach.

Line 10, N will be the index number.

Line 20, F will be the cumulative factorial.

Line 30, we want to print N,

then, in line 40 print some text to show what it is,

and, in line 50, print F.

In line 60, we'll terminate the output line.

In line 70, we'll remember F in G.

In line 80, we'll increment N,

and in line 90, we'll use the new value of N and the old value of the factorial to obtain the factorial for the new N. 

In line 100, we'll see if the new factorial is so large that it wraps around. This only works once, and we only know it works because we know it works. We might as well check to see if N is less than 9. 

And, in fact, the first time through, it doesn't work. We don't have a less than or equal comparison, so we'll add a line for the equality at 95.

10 N=0
20 F=1
30 ?=N
40 ?="! = ";
50 ?=F
60 ?=""
70 G=F
80 N=N+1
90 F=F*N
95 #=(G=F)*30
100 #=(G<F)*30

and that prints out the factorials of 0 through 8.
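To see why the comparisons in lines 95 and 100 stop the loop at 8!, here is a Python sketch of the same loop (my own model, not VTL code) with VTL-2's 16-bit wraparound arithmetic made explicit:

```python
def vtl_factorials():
    MASK = 0xFFFF                # VTL-2 arithmetic is 16-bit; products wrap
    n, f = 0, 1                  # lines 10 and 20
    results = []
    while True:
        results.append((n, f))   # lines 30-60: print N "! = " F
        g = f                    # line 70: remember the old factorial
        n += 1                   # line 80
        f = (f * n) & MASK       # line 90, with 16-bit wraparound
        if not (g == f or g < f):  # lines 95/100: loop while old <= new
            return results

# 9! = 362880 wraps to 35200, which is less than 8! = 40320,
# so the old-value-vs-new-value test finally fails and the loop ends.
```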

Subroutines in VTL:

Print the value of the (system) variable ! (exclamation mark).

?=!

If you just ran the program above, it should give you 101. 

VTL does a little more than just set the line number when it sets the line number. (I told a little lie back there about the semantics. Or, at least, not the whole truth. It's doing a little more than re-using the semantics of assignment.)

Before VTL sets the # line number variable, it saves the current line number plus 1 in the ! (exclamation mark) variable. 

So you get one level of call and return. Exclamation mark is the current return line number.

Don't get too excited -- it's just one level. If you want to call subroutines any deeper, you first have to save the current return value yourself, somewhere.
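A hypothetical Python sketch (my own model, variable names mine) of what assigning to # actually does, and of why you only get the one level:

```python
# Before VTL-2 sets the '#' line-number variable, it saves the current
# line number plus 1 in the '!' variable.
def goto(state, target):
    state['!'] = state['#'] + 1  # save the return line in '!'
    state['#'] = target          # the jump itself

state = {'#': 100, '!': 0}
goto(state, 500)         # "call" a subroutine at line 500 from line 100
goto(state, state['!'])  # "#=!" -- return; lands on line 101
# Note that the return jump itself overwrites '!' (with 501 here),
# which is why there is only one level of call and return for free.
```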

There are other system variables, but I will stop here.

This should be enough to help you decide whether or not you want to go looking for the PDF manual with the rest of the system variables listed and some example programs to try.

One place the PDF manual for the Altair 680 can be found (with some other retro stuff) is at http://www.altair680kit.com/ .

The manual includes more sample code.

One downside of VTL-2 on the MC-10 or Color Computer is that we don't have a way to load the programs in other than typing them. If we had a true monitor, we could save and load the program area. Or someone could extend VTL-2 on these two, to create cassette tape save and load functions, perhaps -- someone besides me. I think I need to get back to some other projects.