joel's programming fun: August 2022

Sunday, August 28, 2022

Tandy/Radio Shack MC-10 Assembly Language, pt 1 -- Getting VTL-2 (Very Tiny Language) Running

Tandy/Radio Shack MC-10 Microcomputer
photo by Simon South, from Wikimedia,
licensed under GNU Free Documentation License

(This is part 4 of the VTL series.)

The MC-10 is a very stripped-down computer based on the 6801 microprocessor and the 6847 video display generator. Tandy/Radio Shack produced it in 1983, a couple of years too late and priced too high to compete in its target market against the Commodore VIC-20 and the Timex/Sinclair ZX81.

Yes, if Radio Shack management had recognized the market, they could have released essentially the same computer in 1981, maybe with a non-Microsoft BASIC. I think it would have been very competitive in 1981.

Part of the reason I find it interesting is that it is similar to Motorola's Micro Chroma 68 prototyping kit, which was my first computer back in 1981.

Anyway, I have been working over the last several weeks on adding functionality to my assembler for the 6800/6801, asm68c, so that it can produce the .c10 format file that the MC-10 can read. That is in itself worth a post, but not now. My purpose in doing so was to make it possible to install VTL-2 (Very Tiny Language) and fig-Forth on the MC-10.

In order to get either running on the MC-10, I need to arrange to get input from the keyboard and output to the display. In fact, pretty much anything you want to do beyond graphics is going to have you using the keyboard and the display.

Output to the raw display in assembler is actually not all that hard, and input from the raw keyboard is not particularly impossible, but the keyboard especially would require writing and debugging a lot of code. If the BASIC ROM provides routines I can use, I might as well use them at this level of work.

Writing a simple set of BIOS class routines can be a project for a hypothetical 'nother day. Others have done something along this line, providing a more capable BASIC interpreter that doesn't get in the way of high resolution (for the 6847 controller) graphics and such.

I remembered that someone had posted information on using the BASIC ROM routines in the MC-10 Facebook group, so I went hunting and found a disassembly file of the BASIC ROM that Simon Jonassen had posted, mc10.txt. And then Greg Dionne told me about a full assembly listing that is also available, MC10_ROM.txt.

In these files, I found a list of useful hooks into the ROM just under the interrupt vectors, starting at address $FFDC. The first two hooks are


FFDC | F8 83    POLCAT  fdb  KEYIN   ; read keyboard
FFDE | F9 C6    CHROUT  fdb  PUTCHR  ; console out

But that's not enough information to be confident I can use them.

You can go to the labels KEYIN and PUTCHR in the listing and find more complete descriptions. For PUTCHR, you find this (plus a bit more):

* Send character in ACCA to the current output device.

Likewise, for the KEYIN routine, you find this:

* Poll the keyboard for a key-down transition.
* Return ASCII/control code in ACCA or 0 if no key-down transitions.

It looks straightforward, but I have found there are often a lot of hidden assumptions behind such routines. It's usually wisest to test them with simple code before you go trying to wrap whole parsers and interpreters around them.

And I was, for some reason, a little suspicious of the POLCAT routine -- too suspicious, maybe.

And that's the subject of this post.

My first attempt was pretty simple. Scan the keyboard, output the result, wait a while so I have time to see the results, repeat. Except that I put in code that adapted it to my source for VTL-2, which wants to use the B accumulator to get the characters in and out instead of the A accumulator:


*	INPUT ONE CHAR INTO B ACCUMULATOR
INCH	PSHA
	PSHX
	LDX	INCHV
	JSR	0,X
	PULX
	TAB
	PULA
	RTS
*
*	OUTPUT ONE CHAR 
OUTCH	PSHA
	PSHX
	LDX	OUTCHV
	TBA
	JSR	0,X
	PULX
	PULA
	RTS

Then I ran out of time. (I'm working on this in the evenings and on the weekends.)

And when I came back I had forgotten that the B accumulator shims were in there.

Wasted a whole evening "discovering" that the BASIC ROM was actually using the B accumulator, in contradiction to all claims and appearances! And started planning this post to explain how I discovered this pseudo-fact. (Bleaugh!)

I suppose I could retrace my steps in those "discoveries". Outside of the fact that I was ignoring the shims that were right in front of my nose, I was actually using useful debugging techniques. But since the whole process was based on false assumptions, that would be confusing. And somebody might end up thinking those pseudo-facts were facts. So I won't.

Instead, in this post, I'll walk through the tests as I should have done them. Or at least a few of the steps.

(And, again, I ran out of time. Too many distractions. I hope I can remember what I was doing next time I pick this up.)

Okay, so I tried this, somewhere along the line:


	OPT	6801
	ORG	$4C00

* MC-10 BASIC ROM vectors
POLCATV	EQU	$FFDC	; Scan keyboard	
CHROUTV	EQU	$FFDE	; Write char to screen


TEST	LDX	POLCATV
	JSR	0,X	; scan
	TSTA		; 0 means no key pressed
	BEQ	TEST	; That's going to play Hobb with CTL-@.
	LDX	CHROUTV
	JSR	0,X	; print to screen and advance
*	CLRB
*	CLRA
*LOOOP	SUBD	#1	; wait a sizeable fraction of a second
*	BNE	LOOOP
	BRA	TEST	; get more
*
	ORG TEST

You'll notice that I also tried a wait loop, which is now commented out. Without the wait loop, it sits there and waits for me to hit a key and then outputs it at the current location on the screen.

With a little more code and testing, I was able to convince myself that it does what the programmer who wrote the comments considered polling (although it isn't what I consider polling) and I figured out how to work the interface:


* MC-10 BASIC ROM vectors
INCHV	EQU	$FFDC	; Scan keyboard	
OUTCHV	EQU	$FFDE	; Write char to screen
*
*	RECEIVER POLLING
POLCAT	PSHA
	PSHX
	LDX	INCHV	; at any rate, don't wait.
	JSR	0,X	; 
	TAB		; MC-10 ROM says NUL is not input.
	SEC
	BNE	POLCATR	; Don't wait.
	CLC
POLCATR	PULX
	PULA
	RTS
*POLCAT	LDAB	ACIACS
*	ASRB
*	RTS
*
*	INPUT ONE CHAR INTO B ACCUMULATOR
INCH	BSR	POLCAT
	INC	$400F	; DBG
	BCC	INCH	; Wait here.
	INC	$4010	; DBG
	STAB	$4011	; DBG
	RTS
*
*	OUTPUT ONE CHAR 
OUTCH	PSHA
	PSHX
	LDX	OUTCHV
	TBA
	JSR	0,X
	PULX
	PULA
	RTS

But when I tried to load VTL-2 with this (CLOADM), BASIC gave me I/O errors. And I noticed that the output object was relatively huge.

And I remembered that the ORG and the RMBs that set up the direct page variables will cause the .c10 file to include object where there is no memory on the MC-10 -- in the range from $0100 to $4000. Trying to CLOADM binary code where there is no memory will cause I/O errors from BASIC.
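To put rough numbers on "relatively huge", here is a small Python model of the arithmetic. It rests on an assumption that has not been checked against the asm68c source: that the .c10 output is one contiguous block running from the lowest assembled address to the highest. The addresses are the ones that appear in this post, and `image_span` is a made-up helper name.

```python
# Rough model: size of a single contiguous load image.
# Assumption: everything between the lowest and highest
# assembled address ends up in the file, gaps included.

def image_span(lowest, highest):
    """Bytes covered from lowest to highest address, inclusive."""
    return highest - lowest + 1

CODE_START = 0x4C00   # ORG of the code in this post
CODE_END = 0x4F67     # last byte assembled (from a later listing)
DP_START = 0x00C0     # direct page variables declared with RMB

with_rmbs = image_span(DP_START, CODE_END)
code_only = image_span(CODE_START, CODE_END)

print(f"with direct page RMBs: {with_rmbs} bytes")
print(f"code only:             {code_only} bytes")
```

Under that assumption, the direct page RMBs drag the image down through the empty region below $4000, which is the part BASIC cannot load.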

So I decided, as a quick fix, to change the direct page declarations to EQUs instead of RMBs:


* 	ORG	$C0	; Move this according to your environment's needs.
DPBASE	EQU	$C0	; Change this to move the registers.
* PARSET	RMB	2	; Instead of SAVE0 in TERM/NXTRM
PARSET	EQU	DPBASE+2
* CVTSUM	RMB	2	; Instead of SAVE1 in CBLOOP
CVTSUM	EQU	PARSET+2
* MLDVCT	EQU	CVTSUM	; Instead of SAVE1 in mul/div (1 byte only)
MLDVCT	EQU	CVTSUM
* DIVQUO	RMB	2	; Instead of SAVE2 in DIV
DIVQUO	EQU	MLDVCT+2
* MPLIER	EQU	DIVQUO	; Instead of SAVE2 in MULTIP
MPLIER	EQU	DIVQUO
* EVALPT	RMB	2	; Instead of SAVE3
EVALPT	EQU	MPLIER+2
* CNVPTR	RMB	2	; Instead of SAVE4
CNVPTR	EQU	EVALPT+2
* VARADR	RMB	2	; Instead of SAVE6
VARADR	EQU	CNVPTR+2
* OPRLIN	RMB	2	; Instead of SAVE7
OPRLIN	EQU	VARADR+2
* EDTLIN	RMB	2	; Instead of SAVE8
EDTLIN	EQU	OPRLIN+2
* INSPTR	RMB	2	; Instead of SAVE10 (maybe? Will some VTL programs want it back?)
INSPTR	EQU	EDTLIN+2
* SAVLIN	RMB	2	; Instead of SAVE11
SAVLIN	EQU	INSPTR+2
* SRC	RMB	2	; For copy routine
SRC	EQU	SAVLIN+2
* DST	RMB	2	; ditto
DST	EQU	SRC+2
STKMRK	EQU	DST+2	; to restore the stack on each pass.
DPALLOC	EQU	STKMRK+2	; total storage declared in the direct page

That's a little tiresome work, adding the previous label plus two in place of all the RMBs, but it does the job.

(I think I'm going to regret doing it this way, later.)
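For checking the chain by hand, the following Python sketch resolves the same labels the way the EQUs above do: each one is the previous label plus two, except the two aliases (MLDVCT and MPLIER), which share the previous address. The `CHAIN` table and `resolve` helper are illustrative names, not part of the VTL-2 source.

```python
# Model of the EQU chain: each label is "previous label + 2",
# except the two aliases, which share the previous address.
DPBASE = 0xC0

# (label, offset from the previous label) in source order
CHAIN = [
    ("PARSET", 2), ("CVTSUM", 2), ("MLDVCT", 0), ("DIVQUO", 2),
    ("MPLIER", 0), ("EVALPT", 2), ("CNVPTR", 2), ("VARADR", 2),
    ("OPRLIN", 2), ("EDTLIN", 2), ("INSPTR", 2), ("SAVLIN", 2),
    ("SRC", 2), ("DST", 2), ("STKMRK", 2), ("DPALLOC", 2),
]

def resolve(base, chain):
    """Walk the chain and return {label: address}."""
    addresses = {}
    current = base
    for label, step in chain:
        current += step
        addresses[label] = current
    return addresses

ADDRS = resolve(DPBASE, CHAIN)
for label, addr in ADDRS.items():
    print(f"{label:8s} ${addr:04X}")
```

With DPBASE at $C0, this puts PARSET at $C2 (the chain as written starts at DPBASE+2), STKMRK at $DA, and DPALLOC at $DC. Changing DPBASE moves the whole set.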

That loaded, but then when I tried to EXEC it, it just went away and didn't come back.

After some thought, I decided to use the BASIC stack instead of making a private stack for VTL-2. I'll need to check that it really is enough stack, but I thought that was the first thing to try, and that's what the second-to-last line there defining STKMRK is for. (Okay, that's two steps in one I'm showing.)

The interpreter wants to reset the stack each time through, so it needs something to reset the stack to, and that is why I'll save the BASIC stack position to STKMRK on entry from the COLD label.

And that worked well enough to give me a cursor. But what I typed at the keyboard did not show on the screen.

So I added some debugging assembly language equivalent of POKEs to the screen. With the right debugging output to screen (see the source of a later step at https://osdn.net/users/reiisi/pastebin/8573, the lines with DBG in the comments), I was able to see that the commands were actually going in, and that they were being recognized, because I was able to get good output from typing

?=*

even though the command itself did not show on the screen. And that quickly led to recognizing that the MC-10 does not echo what you type until after BASIC parses it. So I needed to decide where to add code to echo the input. The EXORsim code simply assumes that INCH will echo for me, so I decided to follow that approach and just add an output JSR in the INCH routine. Should work.
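The shape of that change can be sketched in Python. This is only a model of the control flow, not the ROM interface: `poll` stands in for the keyboard scan (0 means no new key), `outch` stands in for the character output routine, and `make_poll` is a made-up fake keyboard for trying it out.

```python
# Model of a blocking INCH built on a non-blocking poll.
# poll() stands in for the ROM keyboard scan: it returns 0
# when there is no new key, otherwise the character code.

def make_poll(events):
    """Fake keyboard: yields the queued codes, then 0 forever."""
    it = iter(events)
    return lambda: next(it, 0)

def inch(poll, outch):
    """Spin until poll() returns nonzero, echo it, return it."""
    while True:
        code = poll()
        if code != 0:      # the BCC INCH loop in the assembly
            outch(code)    # the added echo
            return code

screen = []
poll = make_poll([0, 0, ord("?"), 0, ord("=")])
first = inch(poll, screen.append)
second = inch(poll, screen.append)
print(chr(first), chr(second), bytes(screen))
```

The echo sits inside the input routine, so every caller of INCH gets it without any change to the interpreter itself.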

Let's publish this rant-in-progress to my blog while I see if it works.

And, it did.

Okay, now the echo works. And I added what I thought would store the stack pointer in the Z variable so I could examine it, but I did it like this:


START
*	LDS	#STACK	; re-initialize at beginning of each evaluate
	LDS	STKMRK	; from mark instead of constant
	STS	VARS+25 ; DBG so we can get a look at the stack pointer BASIC gives us.

That should have been VARS+25*2. So, instead of in Z, it shows up spread across L and M, which is an eye-crosser to read. That's easy to fix. And I should remove the other DBG lines before I forget.

Now I can see where the stack is when BASIC passes control to VTL-2.

Why should I worry?

The last byte assembled by the above source is at $4F67. The end of memory for a 4K MC-10 is $4FFF. I think that should be enough space between the code and what BASIC is using at the end of RAM, but putting that stack pointer somewhere I can see it helps me figure if it really is enough.

But checking the memory bounds variables shows that my memory probe code is not working right. & is set right, but * ends up with the probe having not run at all. Still, if I overwrite the * variable with 19456 ($4C00), I can enter a simple program, list it, and run it.

Hmm. It looks like I have the branch upside down in the probe somehow. How did I have that working before? Did I?

[Sort of. But, yes, the test is upside down. I am embarrassed. More below when I get it fixed. It's lousy, still having to work a day job instead of devoting full time to playing games like this.]

Somewhat of a success, but I think that's all I have time for tonight. (What day was this?)

Picking this back up on the 3rd of September, I'll give you the summary version of what happened with the probe test that was upside down. Yes, this is embarrassing.

You know how I got all enthusiastic about the improved CPX in my explanation of the software differences between the 6800 and the 6801? Well, when I was adapting Joe H. Allen's simulator, I forgot to make sure the carry flag got set in the CPX emulation routine when running as a 6801. Probably also failed to set the other flags right. So, in my initial successful code for EXORsim running a 6801 core, I had the test inverted.

So I had to go back and fix that in EXORsim6801, and test it a little, and move things around a bit in my blogs and OSDN stuff, and that's where I've been for three weeks or however long it took.

I should describe how to get this stuff running on the Xroar MC-10 emulator at this point, I think. Get the source from the pastebuffer in my OSDN pages:

https://osdn.net/users/reiisi/pastebin/8607 (But see notes for 20220904 and 20220911 below.)

Assemble with the switch to output the .c10 file. The command line

asm68c -l2 -c10 VTL_6801_mc10.asm

should give you the listing to the screen so you can check for errors. Or you can redirect the listing to a file

asm68c -l2 -c10 VTL_6801_mc10.asm > VTL_6801_mc10.list

and pull the listing file up in a text editor.

Run Xroar with the command line

xroar -machine mc10

You may need to add something to the configuration file to get that to work, I don't remember. (I put an ampersand on the end to tell the bash shell to run it concurrently with the terminal session, so I don't have to open another terminal.)

Selecting the tool menu will allow you to select keyboard translation, which I need (with my Japanese keyboard), and, more importantly, open up the tape control.

Click the insert button in the tape control dialogue and select the file VTL_6801_mc10.c10 (assuming you've given the assembler file the same name as I show above).

Now click the emulator window to make it active again, and type in the CLOADM command.

Go back to the tape control dialogue and hit the play button, and the MC-10 emulator should load the virtual tape for something like twenty virtual seconds (virtual seconds! Goes by very quickly, less than a real second.) and then tell you it got it. Below the file name, type in the EXEC command with no start address.

I'm not sure whether the CLOADM command in the stock MC-10 BASIC allows specifying a load address. Nor am I sure whether the EXEC command allows specifying an execute address. But the lowest ORG directive in the assembler file sets the load address for the C10 file. And the ORG directive at the end of the assembler file sets the starting address in the S-record output, with the C10 output setting the same starting address. So you don't need to tell BASIC where to load or where to execute.

Anyway, entering the EXEC command should give you an OK prompt ~~and a cursor~~ without a cursor, and you may think nothing happened until you try typing something.

[202209111413 add:]

No cursor in the source I have up. Adding a cursor is pretty simple if you look through the listing of BASIC. A flashing cursor takes a little more thought and effort. I'll leave that as an exercise for the reader, since I want to move ahead to the 6809 transliteration.

[202209111413 add end.]

VTL doesn't give many error messages; it mostly just doesn't do anything it doesn't understand. So if you type "ZZ" and hit enter, rather than a cryptic BASIC error message, you just get another OK and more cursor.

Test it by typing in a math expression, like

?= 1+1*2

The ?= is like BASIC's PRINT command. But you'll quickly notice that VTL's calculator parsing is not the algebraic infix you learned in school and get from BASIC. When you need to alter the left-to-right parsing, use parentheses:

?= 1+(1*2)
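To make the difference concrete, here is a minimal Python sketch of strict left-to-right evaluation with parentheses. It is not the VTL-2 parser: `vtl_eval` is a made-up name, it handles only decimal numbers and the four arithmetic operators, and it assumes results wrap to 16 bits unsigned.

```python
import re

def vtl_eval(text):
    """Evaluate + - * / strictly left to right, honoring parentheses."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def term():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expr()
            pos += 1          # skip the closing ")"
            return value
        return int(tok)

    def expr():
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] != ")":
            op = tokens[pos]
            pos += 1
            rhs = term()
            if op == "+":
                value += rhs
            elif op == "-":
                value -= rhs
            elif op == "*":
                value *= rhs
            else:
                value //= rhs
            value &= 0xFFFF   # assumed 16-bit unsigned wraparound
        return value

    return expr()

print(vtl_eval("1+1*2"))     # left to right: (1+1)*2
print(vtl_eval("1+(1*2)"))   # parentheses force the multiply first
```

There is no precedence table anywhere in the sketch, which is the point: each operator is applied to the running value as soon as its right-hand term is known.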

The ampersand variable & tells you where in memory your program will get stored. It's kind of the bottom of currently available memory. (Kind of.)

The asterisk variable * tells you what the highest available address for VTL programs is. (The way I've set it up, it's the byte before the VTL interpreter itself starts.)

Both the & and the * are supposed to be set by hand in the original VTL-2, but I had to move so much around that it just made more sense to have the interpreter probe and set them for you.

I did check whether it would run in a 4K MC-10.

xroar -machine mc10 -ram 4K &

With that, VTL-2 returns these values for the beginning and end of the programming area, and I ended up putting the BASIC stack pointer marker in the semicolon variable:

?=&
17416
?=*
19455
?=;
20375

That's

$4408 beginning of VTL programming area
$4BFF end of VTL programming area
$4F97 BASIC stack mark

The last address assembled in the code as it stands now is $4F67. So you have $4F98-$4F68 or $30 (decimal 48) bytes of stack space before the stack overwrites the return from OUTCH, and VTL crashes. That should be enough, especially for programs that fit in the very small programming area.
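The conversions and the subtraction are easy to double-check with a few lines of Python. The numbers are the ones printed above; the variable names are only for this sketch.

```python
# Values VTL-2 printed on the 4K configuration
program_start = 17416    # ?=&
program_end = 19455      # ?=*
stack_mark = 20375       # ?=;
last_code_byte = 0x4F67  # from the assembly listing

for name, value in [("&", program_start), ("*", program_end),
                    (";", stack_mark)]:
    print(f"{name} = {value} = ${value:04X}")

# The stack grows downward from the mark; it can use every byte
# above the last code byte, up to and including the mark itself.
headroom = (stack_mark + 1) - (last_code_byte + 1)
program_area = program_end - program_start + 1

print(f"stack headroom: {headroom} bytes (${headroom:X})")
print(f"program area:   {program_area} bytes")
```

The same subtraction also shows how small the programming area is in the 4K configuration: a little under 2K.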

It's plenty to run a short test program:

10 A=0
20 A=A+1
30 ?=A
40 ?=""
50 #=(A<20)*20
60 ?="DONE"
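For readers coming from other languages, the control flow of that program can be rendered in Python roughly as follows. The sketch assumes the usual VTL-2 behavior that assigning a nonzero value to # jumps to that line, assigning zero falls through to the next line, and a comparison yields 1 or 0. The `run` function is just an illustration.

```python
# Python rendering of the six-line VTL-2 program above.
# Assumption: "#=n" jumps to line n when n is nonzero and
# falls through to the next line when n is zero.

def run():
    out = []
    a = 0                                    # 10 A=0
    while True:
        a = a + 1                            # 20 A=A+1
        out.append(str(a))                   # 30 ?=A
        out.append("\n")                     # 40 ?=""
        target = (1 if a < 20 else 0) * 20   # 50 #=(A<20)*20
        if target == 0:
            break                            # zero: fall through to 60
    out.append("DONE")                       # 60 ?="DONE"
    return "".join(out)

OUTPUT = run()
print(OUTPUT)
```

Line 50 is the whole loop construct: the comparison is multiplied by the line number to jump back to, so the jump target becomes zero once A reaches 20.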

If you want to give VTL-2 more room to work, you can set the ORG before COLD to the highest address your memory configuration allows, about one or two kilobytes less than your end of memory. And that should be the only thing you have to change, if I did my job right in the source.

Uh, yeah. The object image for VTL is less than a kilobyte. Just so you know. Just so you're prepared for the limits.

This is not thoroughly tested, and has some known problems -- like random number generation is not quite functional.

[202209041331: add (more embarrassment!)]

I made a bad optimization to the random function in the above, and in the EXORciser versions. Don't know how I missed this one, either. Blame it on my age?

At the label AR2 in the source code, I falsely corrected, without thinking, the addition of the low byte to the high, and vice-versa. My false correction looks like this:


AR2	STD	0,X	; STORE NEW VALUE
	ADDD	QUITE	; RANDOMIZER
	STD	QUITE
	RTS

Looks reasonable, right? As long as you ignore the comment about randomizer?

Well, here's what it looked like in the original 6800 code:


AR2	STAA	0,X	; STORE NEW VALUE
	STAB	1,X
	ADDB	QUITE	; RANDOMIZER
	ADCA	QUITE+1
	STAA	QUITE
	STAB	QUITE+1
	RTS

See what was going on?

Well, I probably should fix the pastebuffer, but OSDN won't let me today. Maybe I have too many. But I don't want to publish this code in a repository without some explicit permission from the authors. Anyway, the fix I recommend is below, with a couple of lines of code to detect lack of initialization and semi-auto initialize it:


AR2	STD	0,X	; STORE NEW VALUE
	BNE	AR2RND	; Initialize/don't get stuck on zero.
	INCB		; Keep it known cheap.
*	ADDD	QUITE	; RANDOMIZER	; NO! Don't do this.
AR2RND	ADDB	QUITE	; RANDOMIZER	; Adding the low byte to the high byte
	ADCA	QUITE+1	;		; is cheap but intentional.
	STD	QUITE
	RTS

Just search the code for "RANDOMIZER", cut the bad code out, and paste in the fix. Or, better yet, edit it with your own, improved random function.

Initialization -- yeah, this is another place where the original code left initialization out to keep the code tiny.
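The three variants are easier to compare side by side in a small Python model. This is a sketch of the byte arithmetic only, with the 16-bit seed (QUITE) and the value in D passed as plain integers; the function names are made up for the illustration.

```python
def buggy_step(seed, value):
    """The ADDD QUITE version: a plain 16-bit add."""
    return (value + seed) & 0xFFFF

def original_step(seed, value):
    """The 6800 version: low byte + seed high byte, then
    high byte + seed low byte + carry."""
    a, b = value >> 8, value & 0xFF          # D = A:B
    seed_hi, seed_lo = seed >> 8, seed & 0xFF
    total = b + seed_hi                      # ADDB QUITE
    b, carry = total & 0xFF, total >> 8
    a = (a + seed_lo + carry) & 0xFF         # ADCA QUITE+1
    return (a << 8) | b                      # STAA QUITE / STAB QUITE+1

def fixed_step(seed, value):
    """The recommended fix: same crossed add, but a zero value
    is bumped to 1 first (BNE AR2RND / INCB)."""
    if value == 0:
        value = 1
    return original_step(seed, value)

seed = 0x1234
value = 0x0001
print(hex(buggy_step(seed, value)))
print(hex(original_step(seed, value)))
print(hex(buggy_step(0, 0)), hex(fixed_step(0, 0)))
```

The last line shows the reason for the added INCB: with an uninitialized (zero) seed and a zero value, the plain add stays at zero, while the fixed version moves off it.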

[202209041331: add end]

[2022091112291333: add]

While transliterating for the 6809, I discovered that I missed some more opportunities to optimize for the 6801.

If you look for the label

SUBTR SUBD 0,X

you'll notice that, in the 6800 source, it was a very short routine to subtract whatever X pointed to from D. Every call to SUBTR can be replaced with the actual body of the subroutine in the 6801 source, with no increase of code size. You can search for BSR SUBTR and JSR SUBTR and replace each with SUBD 0,X. (I don't think there will be any JSR SUBTR, but if there are, you can replace that, too, for a byte of code saved.)
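The byte accounting behind that can be sketched in Python. The instruction sizes are the standard ones for these addressing modes; the call counts in the example are hypothetical, and the sketch assumes the SUBTR routine itself (SUBD 0,X followed by RTS) can be deleted once nothing calls it.

```python
# Instruction sizes in bytes (6801)
SIZE = {"BSR": 2, "JSR_EXT": 3, "SUBD_IDX": 2, "RTS": 1}

def bytes_saved(bsr_calls, jsr_calls):
    """Net bytes saved by inlining SUBD 0,X at every call site
    and deleting the SUBTR subroutine itself."""
    per_bsr = SIZE["BSR"] - SIZE["SUBD_IDX"]        # 0
    per_jsr = SIZE["JSR_EXT"] - SIZE["SUBD_IDX"]    # 1
    subroutine = SIZE["SUBD_IDX"] + SIZE["RTS"]     # 3
    return bsr_calls * per_bsr + jsr_calls * per_jsr + subroutine

# Hypothetical call counts, just to exercise the arithmetic
print(bytes_saved(bsr_calls=4, jsr_calls=0))
print(bytes_saved(bsr_calls=4, jsr_calls=1))
```

Each inlined call also skips the call and return overhead at run time, so the change is a speed gain even where it is byte-neutral.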

Really wish I had more time after work to focus on this stuff.

[2022091112291333: add end]

[JMR202209231810: add]

I have written up a post on VTL expressions, here: https://joels-programming-fun.blogspot.com/2022/09/short-description-vtl-2-expressions-very-tiny-language-p1.html , which will help in testing and otherwise making use of the language. I should also shortly have a post up introducing programming in VTL-2.

[JMR202209231810: add end]

[JMR202210011737: add]

I now have my work on VTL-2 up in a private repository:

https://osdn.net/users/reiisi/pf/nsvtl/wiki/FrontPage

There is a downloadable version of VTL-2 for the Tandy MC-10 (6801) in the stock 4K RAM configuration in there, with source, executable as a .c10 file, and assembly listing for reference. Look for it in the directory mc10:

https://osdn.net/users/reiisi/pf/nsvtl/files/

[JMR202210011737: add end]

Thursday, August 25, 2022

Software Differences between the 6800 and the 6801/6803

For those not familiar with the 6801, the issues when trying to run 6800 code on the 6801 or 6803 would fall under the following categories:

(1) Differences in the condition code calculations for CPX;

(2) Differences in function of any op-codes undefined in the 6800 which may be used;

(3) Timing differences for software timing loops;

(4) Differences in the memory map, which depend on the mode the MPU is operating in.

(Remember that the 6803 is a 6801 with the ROM missing or disabled.)

Revisiting this a week later, I realize that I've left out something important by not mentioning the rest of the 6801's extensions to the 6800:

A and B accumulators are concatenated to form the D double accumulator for certain new instructions. Condition codes fully reflect the results in all 16 bits, which is more important than just getting things done in fewer instructions.
The new instructions that work with the D accumulator are
- LDD and STD, 16-bit load and store of the double accumulator;
- ADDD and SUBD, 16-bit add to and subtract from the double accumulator; and
- LSRD and ASLD/LSLD, 16-bit logical shifts right and left of the double accumulator.
The new MUL instruction also works with the double accumulator, sort-of. It's an 8-bit by 8-bit unsigned multiply of the contents of A and B, leaving the results in D.
You can synthesize a 16-bit multiply using four of these and appropriate ADD instructions, which uses a few more bytes than shifting and adding, but is about seven times as fast.
Improvements in the condition code calculations for CPX, the compare with X instruction which I already mentioned above.
The PSHX and PULX instructions push X to and pop it from the return address stack.
The ABX instruction adds B to X, which is very helpful in accessing record fields and such.
The JSR instruction has a new direct page addressing mode, which can be used to speed calling a few carefully selected short, heavily used routines.
There is now an official branch never instruction, BRN, effectively a two-byte NOP. This can be useful in debugging and in simplifying code generation in some high-level languages.
There were changes in Motorola's assembler and the manual itself which I kind of shrugged my shoulders at, but may be of interest:
- LSL is added as an alias of ASL. (Note that LSLD is an alias of ASLD.)
- BHS and BLO are added as aliases of BCC and BCS, respectively.
  (Branch High or Same == Branch Carry Clear)
  (Branch Low == Branch Carry Set)
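The four-multiply synthesis mentioned in the list above can be illustrated with a short Python model. `mul8` stands in for what MUL computes (an unsigned 8-bit by 8-bit product), and `mul16` combines four of them into a 32-bit result; both names are made up for this sketch, and it models the arithmetic, not the instruction sequence.

```python
def mul8(a, b):
    """What the 6801 MUL does: unsigned 8x8 -> 16-bit product."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return a * b

def mul16(x, y):
    """Unsigned 16x16 -> 32-bit product from four 8x8 multiplies."""
    xh, xl = x >> 8, x & 0xFF
    yh, yl = y >> 8, y & 0xFF
    low = mul8(xl, yl)                    # weight 2**0
    mid = mul8(xh, yl) + mul8(xl, yh)     # weight 2**8
    high = mul8(xh, yh)                   # weight 2**16
    return (high << 16) + (mid << 8) + low

print(hex(mul16(0x1234, 0x5678)))
```

If only the low 16 bits of the product are needed, the high-by-high partial product drops out entirely and only the low byte of each middle product matters, which trims the work further.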

Details of Code Porting Issues:

(I'm summarizing information from the MC6801RM (AD2) MC6801 8-bit Single-chip Microcomputer Reference Manual.)

(1) On the 6800, the CPX instruction is not recommended for anything other than comparing X for equality -- in other words, you would generally want to follow CPX on the 6800 with either BEQ or BNE, but not with other branches.

On the 6801, CPX affects all the condition codes correctly for the 16-bit comparison, and it can be used in the full range of signed and unsigned comparisons. If the 6800 code confines itself to the recommended use of CPX, there should be no problem.
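A Python model makes the difference visible. The 6801 half follows the description above (all flags describe the full 16-bit subtraction). The 6800 half is a simplification based on the usual description of that instruction, and should be read as an assumption: Z covers all 16 bits, N and V are taken from the high bytes only, and C is left untouched. The function names are illustrative.

```python
def cpx_6801(x, m, carry_in):
    """Full 16-bit compare: N, Z, V, C all describe x - m."""
    diff = (x - m) & 0xFFFF
    n = diff >> 15
    z = int(diff == 0)
    v = ((x ^ m) & (x ^ diff)) >> 15    # signed overflow of x - m
    c = int(x < m)                      # borrow
    return {"N": n, "Z": z, "V": v, "C": c}

def cpx_6800(x, m, carry_in):
    """Assumed 6800 behavior: Z covers all 16 bits; N and V come
    from the high bytes only; C is left as it was."""
    xh, mh = x >> 8, m >> 8
    diff_h = (xh - mh) & 0xFF
    n = diff_h >> 7
    z = int(x == m)
    v = ((xh ^ mh) & (xh ^ diff_h)) >> 7
    return {"N": n, "Z": z, "V": v, "C": carry_in}

# X below the comparison value, carry clear beforehand
print(cpx_6801(0x4000, 0x4C00, carry_in=0))
print(cpx_6800(0x4000, 0x4C00, carry_in=0))
```

In the example, only the 6801 version reports the borrow in C, so an unsigned branch such as BLO after CPX is meaningful there and not on the 6800.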

But much of the existing 6800 code does do tricky things with CPX. In particular, it is often used as a NO-OP on the assumption that CPX will not affect the carry flag. It is also often used to hide another op-code in the operand field. For example, on the 6800, with

SKIPR CPX #$8601

executing through SKIPR would only see a probably meaningless comparison of X with hex 8601. But branching to SKIPR+1 would see, instead, LDAA #1. This is a fragile optimization, of course, and the effort to use it usually costs more than whatever it was supposed to gain.
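The overlap is easy to see by decoding the same three bytes from two entry points. The following Python sketch is a two-opcode toy decoder, nothing more: $8C is CPX immediate and $86 is LDAA immediate, and `decode` is a made-up helper.

```python
# Opcode -> (mnemonic, total instruction length in bytes)
OPCODES = {
    0x8C: ("CPX #", 3),   # compare X, 16-bit immediate
    0x86: ("LDAA #", 2),  # load A, 8-bit immediate
}

def decode(memory, offset):
    """Decode the single instruction starting at offset."""
    mnemonic, length = OPCODES[memory[offset]]
    operand = memory[offset + 1:offset + length]
    return f"{mnemonic}${operand.hex().upper()}", length

SKIPR = bytes([0x8C, 0x86, 0x01])   # CPX #$8601

print(decode(SKIPR, 0))   # entering at SKIPR
print(decode(SKIPR, 1))   # entering at SKIPR+1
```

Both decodings end at the same address, which is what lets the two paths rejoin after the hidden instruction.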

What to do with this? It only costs 1 more byte and actually uses 1 clock cycle less to spell it out properly:


SKIPR  BRA NOLOAD
LOAD1  LDAA #1
NOLOAD

If that 1 byte is fatal, you can probably find something nearby to clean up slightly and save a byte, perhaps using one of the 6801 extensions mentioned above. You really don't want such fragile optimizations, anyway.

(2) is similar to (1), in that the incompatibilities are the result of tricks engineers really should avoid. (You don't want to find yourself faced with an undefined op-code changing its behavior in future mask sets, not to mention possible architecture extensions like the 68HC11, among other things.) Again, you can usually find places in nearby code to clean up, or to take advantage of the new 6801 instructions in a way that allows avoiding undefined op-code abuse, even if using the undefined op-code actually did save a byte.

(3) is the downside of the improved timings for the 6801 instructions. There's nothing to do here but recalculate the timing constants, which should be enough. (But note the exception for CPX timings.)

Maybe I should try to give a summary of the improved timings:

Branches take 1 cycle less (3 vs. 4).
Branch to subroutine, BSR, takes 2 cycles less (6 vs. 8).
Indexed mode binary operand byte instructions (ADDA/B n,X; etc.) take 1 cycle less (4 vs. 5).
Indexed mode unary byte instructions (ASL n,X; etc.) take 1 cycle less (6 vs. 7).
Indexed mode 16-bit load and store instructions (LDS n,X; etc.) take 1 cycle less (5 vs. 6).
CPX takes one cycle more except in indexed mode, I guess to get the flags right doing it 8 bits at a time:
- immediate is 4 for 6801 vs. 3 for 6800,
- direct page is 5 vs. 4,
- indexed is 6 for both processors,
- extended/absolute address is 6 vs. 5.
Inherent mode 16-bit instructions (DES, TSX, etc.) take 1 cycle less (3 vs. 4).
JMP in indexed mode takes 1 less (3 vs. 4).
JSR gets some nice improvements:
- 5 cycles vs. not available on the 6800 in direct page mode, as mentioned above,
- 6 vs. 8 in indexed mode,
- 6 vs. 9 in extended/absolute address mode.
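As an illustration of recalculating a timing constant, here is a Python sketch for the simplest kind of delay loop, a DEX followed by BNE. It uses the cycle counts from the summary above, treats DEX as one of the inherent 16-bit instructions (an assumption, since the list names only DES and TSX), and assumes both processors run at the same clock rate. The names are made up for the sketch.

```python
# Cycles per instruction, taken from the summary above
CYCLES = {
    "6800": {"DEX": 4, "BNE": 4},
    "6801": {"DEX": 3, "BNE": 3},
}

def loop_cycles(cpu, count):
    """Cycles spent in:  LOOP DEX / BNE LOOP  run count times."""
    per_pass = CYCLES[cpu]["DEX"] + CYCLES[cpu]["BNE"]
    return per_pass * count

def rescale(count_6800):
    """Loop count for the 6801 giving at least the same cycle total."""
    target = loop_cycles("6800", count_6800)
    per_pass = CYCLES["6801"]["DEX"] + CYCLES["6801"]["BNE"]
    return -(-target // per_pass)     # ceiling division

old_count = 1000
new_count = rescale(old_count)
print(old_count, loop_cycles("6800", old_count))
print(new_count, loop_cycles("6801", new_count))
```

For this loop the constant grows by a factor of 8/6. Loops that contain CPX need their own arithmetic, since that instruction got slower, not faster.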

(4) The differences in the memory map show up at the bottom, top, and middle of memory:

At the top of memory, the 6801/6803 define new interrupt vectors for the built-in peripheral devices. 6800 code probably puts something at those addresses that will need to be moved, and/or the 6801/3 code will need to include instructions that mask out the associated device interrupts.

In the middle of memory, you may find the internal ROM on the 6801, but not on the 6803.

The 6801 in several of its operating modes will have ROM somewhere in the middle of memory -- usually from $F800 to $FFFF, but not always. You can switch the ROM out of memory by the operating mode.

In particular, the 6803, which has no functional ROM, should only be operated in either mode 2 or mode 3, which are the (mostly) external bus modes. Mode 2 keeps the internal RAM (addresses $0080 to $00FF) in the memory map, and Mode 3 switches it out. These two modes can be used to avoid conflicts when existing 6800 code wants to use the addresses where the ROM would be.

At the bottom of memory, you have the devices themselves and the built-in RAM. The built-in RAM is less likely to get in the way, but you can use operation mode 3 to switch it out of the memory map.

The built-in peripheral interface registers cannot all be switched out. There will always be devices at address $0000 to $0003 and $0008 to $001F.
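A quick way to check whether a block of 6800 variables will land on those registers is a range-overlap test. The Python sketch below uses only the two address ranges given above; `conflicts` is a made-up helper and the example blocks are hypothetical.

```python
# Ranges (inclusive) the text above says are always occupied
# by on-chip peripheral registers on the 6801/6803.
REGISTERS = [(0x0000, 0x0003), (0x0008, 0x001F)]

def conflicts(start, length):
    """Return the register ranges a block of memory overlaps."""
    end = start + length - 1
    return [(lo, hi) for lo, hi in REGISTERS
            if start <= hi and end >= lo]

print(conflicts(0x0000, 0x20))   # variables at the bottom of page 0
print(conflicts(0x0004, 4))      # the gap between the two ranges
print(conflicts(0x00C0, 0x1C))   # moved well up in the direct page
```

An empty list means the block is clear of the registers. It says nothing about the internal RAM or about whatever else the target system puts in the direct page.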

Since the direct page is special, 6800 code should really not be written to conflict with the addresses at the bottom of memory in a way that doesn't allow just moving some of the direct page variables up a bit, but quite a lot of code is written in a way that does conflict.

One example is Very Tiny Language. (VTL-2 is what you will probably find if you go hunting for it.) VTL-2 basically allocates the language's variables A, B, ... Z to addresses defined by their ASCII values, which are in the direct page. I have found a way to mostly work around this for VTL (see my recent posts on adapting VTL-2 to hardware other than the MITS/ALTAIR 680), but I'm not perfectly confident it's a perfect work-around.

Anyway, that probably covers the differences.

If you want another example of how these changes actually affect code, I have adapted the fig-Forth interpreter model for the 6800 to the 6801, optimizing it with the new instructions. (I may have missed some possible optimizations.) That source may be interesting to examine. You can find it either in my adaptation of Joe H. Allen's EXORsim:

https://osdn.net/users/reiisi/pf/exorsim6801/scm/tree/master/

or in the source tree of my 6800/6801 assembler:

https://sourceforge.net/p/asm68c/code/ci/master/tree/fig-forth/

Saturday, August 13, 2022

VTL-2 part 3, Optimizing for the 6801

The 6801 is one of my favorite hobby horses -- or one of my favorite axes to grind. Motorola kind of missed out on some opportunities with it. I've ranted about that elsewhere; here I'll just make use of it for what it is.

If you wonder why I would do a hand-optimization of VTL-2 for the 6801, John Linville got me digging into getting the 6800 version running, and I enjoy working with the 6801 almost as much as I enjoy working with the 6809. Anyway, it's a kind of recreational activity for me.

Two notes before I dig in:

One is a CPX trick someone used in the 6800 source that won't work on the 6801. There's an FCB $9C at line 541 of the source I posted with the variables moved out of the direct page, in the middle of the DIVide routine. (It's from the original that I'm working with.) That's the op-code for CPX, and CPX doesn't affect carry or overflow on the 6800. It is used as an effective no-op to skip the SEC instruction, instead of using a branch around the set carry instruction. It saves one precious byte (and a couple of processor cycles).

I've replaced it with a branch around in the optimizations for the 6801, because the 6801 fixes the CPX instruction to fully implement all flags for the sixteen-bit comparison, which means it ain't gonna work on the 6801. And it won't work for the 6809, either.

(That's the thing about such tricks. They fall apart under progress.)

The other is that the 6800 uses a post-decrement push, so that the stack pointer is always pointing to the next available byte, not the last byte pushed. That's important in the insert-line-by-stack-blast routine, which I'm going to replace just as a matter of principle before I'm done, but the first version for the 6801 keeps it.

For anyone getting ahead of me and cribbing from my conversions to work on the 6809 transliteration, remember that, and position Y or U correctly before you copy down or up, as you may choose, in your line insertion routine.

At the present point, I have it running on my fake 6801 simulator that I added to Joe H. Allen's EXORsim. This first version does easy stuff, like converting two-byte load and store sequences using the accumulators to single double-accumulator sequences.

It also moves the temporary/local variables out of where they were hiding among the VTL pre-declared variables (and saving RAM space). Two of the temporaries were easily replaced with PSHX and PULX sequences (since the 6801 can do that). The others have usage patterns that don't fit so easily to shoving them off on the stack, so I moved them to the direct page.

Why, you ask? Why use extra RAM in the direct page when I just moved things out of the direct page?

Well, twenty bytes is a lot different from the entire 256 bytes of the direct page. Twenty bytes can be moved around with an appropriate ORG to fit your hardware. And moving those temporaries back into the direct page gives us back some of our code byte count savings that came from having it all in the direct page in the first place.

But the big reason is that it is an excuse to give them more meaningful names than SAVE0, SAVE1, ... etc. labels, which will help when moving the code to the 6809.

See the code for the rest of the story:

~~https://osdn.net/users/reiisi/pastebin/8475~~ (replaced with following:)

https://osdn.net/users/reiisi/pastebin/8605 (See notes for 20220904 and 20220911 below.)

[202208140155: add (yeah, up way too early on a Sunday morning)]

Here is source that clears out the stack blasts, for clarity and for playing nice with interrupts:

~~https://osdn.net/users/reiisi/pastebin/8476~~ (replaced with following:)

https://osdn.net/users/reiisi/pastebin/8606 (See notes for 20220904 and 20220911 below.)

[202208140155: add end]

I'll add at least one more when I get it running on the MC10 emulator, then I should be ready to tackle the 6809 transliteration -- if no one beats me to it.

[202209022213: add (Oh! how embarrassing!)]

I have discovered, on the road to getting the MC10 version up, that my emulation was faulty. For all the discussion on the difference between CPX on the 6800 and on the 6801, I forgot to set the carry flag in my emulation, and the RAM check routine I added to the 6801 version has its end test inverted. Eventually, I'll delete the pastebins above and replace them with pastebins that have the BHI on line 150/152 fixed. Until then, go in and edit that line in whichever source, change the BHI to BLO:


PROBET	CPX	#COLD
	BLO	PROBE	; CPX on 6801 works right.

And get the most recent revision to my 6801 version of EXORsim. Hopefully I'll get the fixed simulator code up sometime tomorrow (3 Sept.).

[202209031747: Done. Corrected pastebuffers added and linked, buggy pastebuffers deleted.]

[202209022213: add end]

[202209041331: add (more embarrassment!)]

I made a bad optimization to the random function in both of the above, and in the MC-10 version. Don't know how I missed this one, either. Blame it on my age?

At the label AR2 in both the above, I falsely corrected, without thinking, the addition of the low byte to the high, and vice-versa. My false correction looks like this:


AR2	STD	0,X	; STORE NEW VALUE
	ADDD	QUITE	; RANDOMIZER
	STD	QUITE
	RTS

Looks reasonable, right? As long as you ignore the comment about randomizer?

Well, here's what it looked like in the original 6800 code:


AR2	STAA	0,X	; STORE NEW VALUE
	STAB	1,X
	ADDB	QUITE	; RANDOMIZER
	ADCA	QUITE+1
	STAA	QUITE
	STAB	QUITE+1
	RTS

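See what was going on? The 6800 stores sixteen-bit values high byte first, so QUITE is the high byte and QUITE+1 is the low byte. The original adds the high byte of QUITE to B (the low byte of the new value) and the low byte of QUITE, with carry, to A (the high byte). Crossed over, on purpose. Here is the corrected 6801 code: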


AR2	STD	0,X	; STORE NEW VALUE
	BNE	AR2RND	; Initialize/don't get stuck on zero.
	INCB		; Keep it known cheap.
*	ADDD	QUITE	; RANDOMIZER	; NO! Don't do this.
AR2RND	ADDB	QUITE	; RANDOMIZER	; Adding the low byte to the high byte
	ADCA	QUITE+1	;		; is cheap but intentional.
	STD	QUITE
	RTS
*

Just search the code for "RANDOMIZER", cut the bad code out, and paste in the fix. Or, better yet, edit it with your own, improved random function.
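If it helps to see the difference in numbers, here is a small Python model of the two additions. It is only a sketch of my reading of the listings above: the function names are made up for illustration, `d` stands for the double accumulator (A high, B low), and `quite` stands for the sixteen-bit value stored at QUITE (high byte) and QUITE+1 (low byte).

```python
def plain_add(d, quite):
    """The mistaken 'optimization': ADDD QUITE, a straight 16-bit add."""
    return (d + quite) & 0xFFFF


def crossed_add(d, quite):
    """ADDB QUITE / ADCA QUITE+1: the bytes are deliberately crossed."""
    a, b = d >> 8, d & 0xFF
    q_hi, q_lo = quite >> 8, quite & 0xFF
    total_b = b + q_hi              # ADDB QUITE   (QUITE holds the high byte)
    carry = total_b >> 8
    total_a = a + q_lo + carry      # ADCA QUITE+1 (QUITE+1 holds the low byte)
    return ((total_a & 0xFF) << 8) | (total_b & 0xFF)


def fixed_randomizer(d, quite):
    """The corrected 6801 sequence, including the don't-stick-on-zero step."""
    if d == 0:                      # BNE AR2RND falls through only on zero
        d = 1                       # INCB
    return crossed_add(d, quite)
```

With `d = 0x0102` and `quite = 0x0304`, the straight add gives `0x0406` and the crossed add gives `0x0505`, so the two really are different functions and not just different spellings of the same one.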

Initialization -- yeah, this is another place where the original code left initialization out to keep the code tiny.

[202209041331: add end]

[202209111229: add]

While transliterating for the 6809, I discovered that I missed some more opportunities to optimize for the 6801.

If you look for the label

SUBTR SUBD 0,X

you'll notice that, in the 6800 source, it was a very short routine to subtract whatever X pointed to from D. Every call to SUBTR in the source for the 6801 can be replaced with the actual body of the subroutine, with no increase of code size. You can search for BSR SUBTR and JSR SUBTR and replace each with SUBD 0,X. (I don't think there will be any JSR SUBTR, but if there are, you can.)

[202209111229: add end]

[JMR202209231810: add]

[JMR202209231810: add end]

[JMR202210011737: add]

I now have my work on VTL-2 up in a private repository:

https://osdn.net/users/reiisi/pf/nsvtl/wiki/FrontPage

https://osdn.net/users/reiisi/pf/nsvtl/files/

[JMR202210011737: add end]

Friday, August 12, 2022

VTL-2 part 2, Moving the Variables Out of the Direct Page

Swtpc6800 en:User:Swtpc6800 Michael Holley, Public domain, via Wikimedia Commons

In my previous post on VTL-2, I described how I got Very Tiny Language running on Joe H. Allen's EXORsim simulator. I noted in that post that the use it makes of the entire direct page address space would likely cause conflicts with many run-time operating environments.

I'm not sure how much of a problem on 6800 systems this will be, but I know it will be a problem on any 6801 system that doesn't map the built-in peripherals completely out of the memory map.

So, preparatory to working through the code to optimize it to the 6801, and in an effort to understand some things that will be necessary in transliterating the code for the 6809, I worked out how to move the variables out of the direct page. It was actually quite a bit easier than it could have been.

To make it easier to take a diff to see what I have done, I'll include links to the previous two paste buffers:

Adding semicolons to make it easier to assemble accurately:
https://osdn.net/users/reiisi/pastebin/8440
Modifications for EXORsim (and EXORciser):
https://osdn.net/users/reiisi/pastebin/8441

And this is the version with the variables moved out of the direct page:

https://osdn.net/users/reiisi/pastebin/8474

[Work in progress here, left live to help integrate with paste buffer.]

A brief rundown of the changes (if not how I figured them out) --

The original uses a number of magic numbers -- 72, $87, and several implicit zeroes (NULL vectors). I added definitions to the code that clarified what those magic numbers mean and made it easier to redefine them.

The most important of those was ZERO, which, in this version, is no longer $0000.

The next most important was BUFOFF, which is $88. This allows the code to be explicit about what it is doing with the index register at several important points, particularly where ZERO was implicitly $0000.

One of the problems I had getting the original version running was the problem of initializing two variables that are necessary for the program to know where it can store VTL program code for editing and running:

& (AMPR in the assembler) to 264 and
* (STAR in the assembler) to something like the end of your RAM.

Running the original, the user was required to set those by hand because the assumption was that you would get a ROM and plug it into your single-board computer or system, and the ROM code would not be able to know how much RAM it could use and where it would end.

The assumptions change here. Now we assume that you will assemble this to run on whatever you want to run it on. So the source code can set AMPR for you, and can probe to set STAR for you.

Comment that code out, or simply jump to START instead of COLD if you don't want to do that. And remember to set those yourself.
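For anyone who wants the idea of the probe without reading assembler, here is a rough Python model. It is not a transcription of the routine in the paste buffer. It assumes a simple write-and-read-back test, and `FakeBus`, `probe_ram_end`, and the addresses in the usage note are all made up for illustration.

```python
class FakeBus:
    """Hypothetical memory map: RAM from 0 through ram_top, nothing above."""

    def __init__(self, ram_top):
        self.ram_top = ram_top
        self.cells = {}

    def read(self, addr):
        if addr <= self.ram_top:
            return self.cells.get(addr, 0x00)
        return 0xFF                      # unmapped space reads back as $FF here

    def write(self, addr, value):
        if addr <= self.ram_top:
            self.cells[addr] = value & 0xFF


def probe_ram_end(bus, start, limit):
    """Return the last address in [start, limit) that keeps what is written."""
    last_good = None
    addr = start
    while addr < limit:                  # stop before running into the code
        saved = bus.read(addr)
        for pattern in (0x55, 0xAA):
            bus.write(addr, pattern)
            if bus.read(addr) != pattern:
                return last_good         # this byte does not behave like RAM
        bus.write(addr, saved)           # put back what was there
        last_good = addr
        addr += 1
    return last_good
```

With `FakeBus(0x1FFF)`, `probe_ram_end(bus, 0x0400, 0x7800)` returns `0x1FFF`. With RAM all the way up, it stops at `0x77FF`, one byte short of the limit, which is the kind of value STAR wants.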

More details:

At line 33, set the source code to start at address $200 instead of address 0. This address can be moved, but be aware that, in order to keep the code simple, the code assumes that the lower byte of this address is 0. In other words, wherever you move it, it must be to an address evenly divisible by $100.

(In case you are unfamiliar with Motorola assemblers, a $ prefix marks a number as hexadecimal: $100 is 100_sixteen, or 256_ten.)

At line 81, SAVOFF was something I decided I didn't want to do. It's commented out, ignore it.

From line 84, I mentioned the magic numbers for LINBUF and BUFOFF above; read the comments for details. Note that BUFOFF is actually $88, but it is used everywhere as BUFOFF-1. No biggy.

From line 88, I moved the stack. No big deal. Remember that the bottom of the allocation area is the limit of the stack, and the stack pointer gets initialized to the top.

In particular, and this is important, note that on the 6800 and 6801 the stack pointer is always pointing to the next available byte, not to the last byte pushed. In other words, the CPU stores a byte where S is pointing, then decrements, ready for the next byte. And it increments before popping (PULing) the most recently pushed byte.

Note especially that this is different from the 6809, where the stack pointer always points to the last byte pushed, or, when the stack is empty, to one byte beyond the stack.
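A toy model of the two conventions may make the difference easier to see. This is only a sketch in Python, with memory as a plain bytearray and hypothetical function names. It models nothing but the order of the store and the pointer adjustment.

```python
def push_6800(mem, sp, value):
    """6800/6801: store at S, then decrement S."""
    mem[sp] = value & 0xFF
    return (sp - 1) & 0xFFFF


def pull_6800(mem, sp):
    """6800/6801: increment S, then load from S."""
    sp = (sp + 1) & 0xFFFF
    return mem[sp], sp


def push_6809(mem, sp, value):
    """6809: decrement S, then store at S."""
    sp = (sp - 1) & 0xFFFF
    mem[sp] = value & 0xFF
    return sp


def pull_6809(mem, sp):
    """6809: load from S, then increment S."""
    value = mem[sp]
    return value, (sp + 1) & 0xFFFF
```

Starting both at $80 and pushing one byte, both end up with the pointer at $7F, but the 6800 byte is sitting at $80 and the 6809 byte is sitting at $7F. Any routine that borrows the stack pointer as a data pointer has to account for that one-byte difference.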

(POP vs. PUL. Motorola used a different jargon from the most visible literature. In the most visible literature, PULL is associated with queues -- first in, first out. POP is for stacks, last-in/first-out. In Motorola 8-bit assemblers, however, POP is PUL. Think about it. The most common metaphor was a stack of trays at the lunchroom. You don't POP a tray off the stack, you PULL it off, whether at the top or the bottom. Confused? Study stacks and queues. It will all become clear -- eventually. Someday, I'm going to write a runtime library that will help clarify this. Sometime before I die, God willing.)

Line 91 sets up the beginning of the area where VTL stores lines of code (and leaves its single array allocated, if you use that).

Line 99 is where the code starts, I mentioned COLD vs. START above.

Line 232 is just a place where the extra code size required because the variables are no longer in the direct page moved a branch target out of range. On the 6800/6801 there is no long branch, so we invert the test and follow with a JMP to the target, instead. Not a meaningful change.
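To make "out of range" concrete: a 6800 relative branch is two bytes, and its offset byte is a signed eight-bit displacement counted from the address of the instruction after the branch. The Python sketch below (the function name is made up) checks whether a target is reachable. The addresses in the usage note are arbitrary examples, not addresses from the VTL source.

```python
def branch_offset(branch_addr, target_addr):
    """Offset byte for a 2-byte relative branch, or None if out of range."""
    # The displacement is measured from the instruction after the branch.
    displacement = target_addr - (branch_addr + 2)
    if -128 <= displacement <= 127:
        return displacement & 0xFF
    return None
```

A branch at $0200 can reach forward as far as $0281 and backward as far as $0182. One byte further either way and `branch_offset` returns `None`, which is when you fall back on the inverted test followed by a three-byte JMP, at a cost of five bytes instead of two.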

Note its proximity to the machine language stuff, which did not require changes, somewhat to my surprise.

Well, basically, if you put some machine language stuff in there, you're going to need the monitor to support you through the way it handles SWI. My changes should not affect that.

Line 439 is more branch target out of range, but there's no condition, so it's just changing a BSR to a JSR.

From line 452 is the key change, very dependent on making ZERO explicit. (See also lines 591, 595, and 597.) Read the comments in the code, note that I commented out the most obvious code and used instead the math mentioned above that depends on ZERO being declared at an even 256-byte boundary. (I'm trying to avoid pushing the capabilities of your assembler of choice too far. Also procrastinating about making my assembler more conformant to current assemblers.)
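Here is a sketch of why the boundary matters, in Python. I am assuming, for illustration only, that an address is built by taking the high byte of ZERO and dropping an eight-bit offset into the low byte, with no carry between them. The comments in the source are the authority on what the code really does, and the function names here are made up.

```python
def address_by_bytes(zero, offset):
    """High byte from ZERO, low byte from the offset, no carry between them."""
    return (zero & 0xFF00) | (offset & 0xFF)


def address_by_add(zero, offset):
    """The 'obvious' way: a full sixteen-bit addition."""
    return (zero + (offset & 0xFF)) & 0xFFFF


def agree_everywhere(zero):
    """True if the byte-wise shortcut matches the addition for every offset."""
    return all(address_by_bytes(zero, off) == address_by_add(zero, off)
               for off in range(256))
```

`agree_everywhere(0x0200)` is true, but `agree_everywhere(0x0280)` is not: with ZERO at $0280 and an offset of $88, the addition gives $0308 while the byte-wise shortcut gives $0288. Keeping the low byte of ZERO at 0 is what makes the cheap version safe.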

The comments on 591, 595, and 597 explain again why I got rid of the magic numbers and used explicit labels.

Note the puzzle at line 541. I'm not the one who said, What? I'll explain more carefully in my post of the optimization for the 6801.

And that should cover this step in the project.

Well, actually, it's tempting to back up here and post another version, bringing in the progress from the 6801 version: in other words,

without the CPX trick at line 541, using an explicit branch around instead,
using a non-stack-blast method of inserting and deleting lines,
and giving the temporary SAVEnn variables meaningful names.

But I'll postpone that, with a few comments:

If you want to do an automatic (or mechanical manual) conversion of 6800 source code to 6809 source code, at a minimum, the CPX trick must be replaced with an explicit branch around, as I have done in the 6801 optimizations. Moving the variables out of the direct page, as I've done here, should help, since the implicit linkage becomes explicit. I'm not sure whether the stack blast will survive the conversion, so you might want to bring that in from the 6801 optimizations.

[I may be able to come back to finish this up, or I may not. I do think the 6800 version needs more work.]

[JMR202209231810: add]

[JMR202209231810: add end]

[JMR202210011737: add]

I now have my work on VTL-2 up in a private repository:

https://osdn.net/users/reiisi/pf/nsvtl/wiki/FrontPage

https://osdn.net/users/reiisi/pf/nsvtl/files/

[JMR202210011737: add end]

Sunday, August 7, 2022

Adventures Getting VTL-2 (Very Tiny Language) Running on EXORsim

I don't think I really had this much spare time, but there was a question in the 6809/6309, 6800 programming language Facebook group about getting or building a version of VTL-2 (Very Tiny Language) for the 6809.

There were a couple of guys working on this. You can check David Wiens's post and John W. Linville's post for the progress they've made so far.

This kind of thing piques my curiosity (... killed the cat, as they say).

So, for the past several weeks, part of what little spare time I have has disappeared down this rabbit hole, and I'm now bringing back some results. Nothing for the 6809, just yet, but some results.

First thing I did, since the assembly language sources they have for the 6809 came from someone working with a very perverse syntax assembler, and since I don't like the way other people write 6809 code, etc., was go looking for a base line source, something that runs with known results.

A little history:

As near as I can tell (from copyright dates on manuals and in source code, etc., and from the manuals themselves), the original VTL and VTL-2 came to life on the 6800-based MITS ALTAIR 680, motivated (as I understand it) in no small part by the lack of manufacturer options for the 680.

VTL was a minimal programmable calculator-style language that would run from a very small ROM on a 680 with minimal memory expansion.

After making VTL-2 available for the 680, responding to interest from the larger, more established market for the 8080-based ALTAIR 8800, the authors, Gary Shannon and Frank McCoy from The Computer Store, re-implemented it for the 8800.

Thus, there are two base versions of the source code and manual for VTL-2, one for the 680 and the other for the 8800.

After that, versions for the 6502 and some other CPUs, and, ultimately, versions in C, were produced by others. If you go looking for source code now, the easiest to find is for the 6502.

I suppose I should leave links to everything I found, but I was tired after work and not keeping records. I think deramp.com and altairclone.com were among the places I visited.

But where I finally started getting traction was T. Nakagawa's page on VTL. Down the page a ways is a link to a zip file containing a C language implementation of VTL that is straightforward to compile on *nix (and some other platforms). (Yes, his pages are mostly Japanese. Use Google Translate if you need to.)

Now I had something running to test my understanding of the manuals I had found elsewhere. (There is a searchable PDF version of the manual for the ALTAIR 680 down towards the bottom of his page, too. Easier to read and more complete than what I had been reading for the 680.)

He also has a table of memory usage for the VTL variables in his implementation, which is useful (though not definitive) in understanding how the microprocessor versions work. And the C source code is also useful in decrypting things under the hood in VTL.

And he has a link to Jun Mizutani's Return of Very Tiny Language page, which has a link to his very useful history and comparison of major versions table, where I finally rediscovered the command to list out the program you're typing in. On the 6800 versions, it's generally a zero typed by itself on the command line. Yes.

0

LOL.

And somehow (I don't remember how), I found the sbc6800 page on switch-science.com. Inside the software for the sbc6800, there is clean source for VTL-2 that almost works with my tools.

Almost. My assembler allows whitespace in operand expressions, which means you really need a leader character for comments to make sure that lines like

VAR CMPB #'$ OR STRING

don't end up trying to use the OR of the character $ and the label STRING as the operand.

I need to fix that sometime, put in a switch to shut off whitespace in operand expressions. Another project for the back burners.

Switch Science has a sbc6809, as well, but there is no VTL in the software for that. Maybe we can fix that.

Okay, so I used semicolon for the comment leaders and inserted them by hand. Good thing VTL is really tiny. Took me less than a half hour, I don't remember how much less. (Paste buffer on my OSDN pages.)

But I don't have an sbc6800. I should probably get one, but not yet. I do have Joe H. Allen's EXORsim simulator for Motorola's EXORciser running, however. And the fun begins.

First I had to move the code for the VTL interpreter down from $FC00, where it was set up to assemble in the source code, to some place that doesn't conflict with the EXORciser monitor and system object code. I moved it to $7800.

Then I patched the I/O calls and had to figure out why I wasn't getting any output from EXORsim.

For some reason, it turns out that I have to hit the output port about thirty times before things start showing up. I should ask Joe about that sometime. I added code to do that in an initialization routine. (Paste buffer.) Not a lot of changes yet, use diff if you're interested in seeing what I added. Oh. I think there were also a couple of branches that were no longer in range, which I changed to appropriate jumps.

I was expecting there were going to be problems with using all of the direct page for VTL's variables and stack. There will be conflicts if I try to assemble a version that works under a disk operating system.

But with the I/O patched and the code moved down, it worked as a simple calculator. Trying to enter a program, however, did not produce happy results. Crashing, freezing.

After spinning my wheels for a couple more days, I became fairly sure that it was just something about VTL itself that I was missing. Specifically, looking in the code, there are two system variables that I didn't see getting set any place in the code -- the program base address & and the end address *. So I went back and worked my way through the manual with both VTL-2 on EXORsim and VTL-C, and there it was, towards the end of the manual --

The microprocessor versions of VTL require one more step of initialization after you get the interpreter running and before you start typing in programs.

The variables in question are & and *, the program space base and the end.

264 should work for the base, per the manual. I used 300 to be safe. Look for the PRGM label in the source code to see what's going on. (Really, there should be no problem setting that automatically in the code, but I guess the authors were saving every byte they could for the user program space.)

The end depends on how much RAM you have installed and where you assemble the interpreter. It ought to work as large as 30719 ($77FF), but I used 8192 to be sure. (This could be probed and set, but it would take probably ten to twenty bytes to do a simple probe. Or you could just hard code it so you don't forget, and remember to reassemble if your memory layout changes.)

So, to get it running in EXORsim, copy the code from the paste buffer, save it, assemble it, and open the S1/S9 object code in the source.x file in a text editor. Something like

asm68c -l1 vtl_6800_exorciser.asm > vtl_6800_exorciser.list
gedit vtl_6800_exorciser.list vtl_6800_exorciser.x

Run exorsim in a terminal session, something like

./exor --mon

You should have a starting message and a reminder you can type help, and the % prompt, at which you give the load command:

Hit Ctrl-C for simulator command line. Starting simulation...

> 0 A=00 B=00 X=0000 SP=FF8A ------ OSLOAD E800: 8E FF 8A LDS #$FF8A Load OS

Type 'help'
% l

Don't forget to hit return after the load command. (The load command is the lone "l" you type in after the % prompt on the last line I just showed.)

It will wait for you to feed it S-record object code, so go to the object file in the text editor and select and copy the whole of the S1/S9 object as text:

S1090000000000000000F6
S113000600000000000000000000000000000000E6
S113001600000000000000000000000000000000D6
S113002600000000000000000000000000000000C6
S11100360000000000000000000000000000B8
S113004400000000000000000000000000000000A8
S11300540000000000000000000000000000000098
S11300640000000000000000000000000000000088
...
S10C7AF8C60D8D02C60A7E7B1B3B
S1087B010D0A4F4B00CA
S10C7B06C628BDF0215A26FA3903
S1087B0FF6FCF45739F7
S10A7B1436BDF012163239F0
S10A7B1B3617BDF0183239E2
S903780084
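If a paste goes wrong, it can be handy to check the records before blaming the simulator. In an S-record, the two hex digits after the type are a byte count, and the last byte is a checksum chosen so that the count, address, data, and checksum bytes add up to $FF in the low eight bits. The Python sketch below checks one line at a time. The function names are made up, and it only handles the S1 and S9 record types shown above, which use two-byte addresses.

```python
def srec_checksum_ok(record):
    """True if one S-record line has a consistent byte count and checksum."""
    record = record.strip()
    if len(record) < 10 or record[0] != "S":
        return False
    try:
        body = bytes.fromhex(record[2:])   # everything after 'S' and the type
    except ValueError:
        return False
    if body[0] != len(body) - 1:           # count covers address, data, checksum
        return False
    return sum(body) & 0xFF == 0xFF


def srec_address(record):
    """The two-byte address field of an S1 or S9 record."""
    return int(record.strip()[4:8], 16)
```

For the last line of the listing, `srec_checksum_ok("S903780084")` is true and `srec_address("S903780084")` is `0x7800`, the start address carried by the S9 record.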

Paste it into the exorsim session. It'll just paste in like the above. Hit the return key once at the end and it should say

PC set to 7800

Now hit Ctrl-C to get out of load mode, and at the % prompt type

% c 7800

or just "c" (continue), since it should have set the PC to 7800 for you anyway. And hit return. It should give you some empty lines and then the OK prompt.

OK

at which you need to set the base and end. Type

~~&=300~~
*=8192

at the OK prompt. (Woops, not 300, see edit below.) Or you should be able to get away with

OK
&=264

OK
*=16*1024

OK

And it should run and allow you to type in programs.

[JMR202208092108: add]

After some study, it appears that the program area base variable needs to equal the assembler PRGM label for listing and program editing to work. In other words, the 6800 versions need to write

&=264

[JMR202208092108: add end]

[JMR202209231810: add]

[JMR202209231810: add end]

My next step is probably to move the interpreter variables out of the direct page ...

[JMR202208131113: add]

I've got this done, see part 2 here: https://joels-programming-fun.blogspot.com/2022/08/vtl-2-part-2-moving-variables-out-of-direct-page.html

[JMR202208131113: add end]

... so I can optimize it for the 6801 (which usually has I/O at the bottom of the memory map)

[JMR202208140202: add]

And this is now done, see part 3 here: https://joels-programming-fun.blogspot.com/2022/08/vtl-2-part-3-optimizing-for-6801.html

[JMR202208140202: add end]

... and then assemble it to run on the MC-10.

After that, if Dave and John haven't made any more progress, I'll see if I can make a quick transliteration to 6809.

[JMR202210011737: add]

I now have my work on VTL-2 up in a private repository:

https://osdn.net/users/reiisi/pf/nsvtl/wiki/FrontPage

https://osdn.net/users/reiisi/pf/nsvtl/files/

[JMR202210011737: add end]