Maybe compress wordlist #14

hackerb9 · 2022-08-08T03:53:32Z

From #6:

Since you know that every word is exactly 5 characters, you could omit [CRLF] and have a file that's 28% smaller.
I think I looked at that and disregarded it for some reason. I'll have to check my notes. Maybe time to scan 5 chr. chunks was too long to be practical?

My RNDACC program (issue #15) shows that one could remove CRLF and have fast access. There is still the downside that the file isn't as easily editable since all the words run together, but - at least on the Tandy 200 - the text editor can still handle it.
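The fast access without CRLF comes down to simple arithmetic: entry number n starts at byte 5 × n. Here is a minimal Python sketch of that idea for a modern computer. The function name and the three-word sample are made up for illustration, and this is not the RNDACC program itself.

```python
WORD_LEN = 5  # every entry is exactly five letters, with no CRLF in between


def word_at(data: bytes, index: int) -> str:
    """Return the zero-based index-th word from a run-together wordlist."""
    start = index * WORD_LEN
    if index < 0 or start + WORD_LEN > len(data):
        raise IndexError("word index out of range")
    return data[start:start + WORD_LEN].decode("ascii")


# Hypothetical three-word list, as it would look with the CRLFs removed.
sample = b"REBUSBOOSTTRUSS"
print(word_at(sample, 1))
```

No scanning is needed, which is why lookup stays fast even though the words run together.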

What the text editor wouldn't be able to handle would be compressing each word from 5 bytes of ASCII to 4 bytes of binary.

Still, sticking with plain text may be the best idea unless the space is absolutely necessary for a spellchecker (#12).

hackerb9 · 2022-08-08T05:30:06Z

Ignore, use binary scheme for compression to 3 bytes below

~~Binary scheme for compression to 4 bytes.~~

For a word made up of five characters: ABCDE, each with five bits of data A₄…A₀, the bits can be arranged into four bytes like so:

A₄A₃A₂A₁A₀B₄B₃B₂	1B₁B₀C₄C₃C₂C₁C₀	010D₄D₃D₂D₁D₀	010E₄E₃E₂E₁E₀

Note that the last two characters are unchanged from the capitalized ASCII letters which D and E represent.

[EDIT: Change first bit of second byte to be 1 so that there is no way this method could ever generate ^Z (ASCII 26), which is the Tandy End-of-File character.]
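For anyone who wants to poke at the (now superseded) four-byte layout, here is a rough Python sketch of it. It assumes letters are numbered A=1 to Z=26 (the low five bits of the ASCII code), which is my reading of the note that D and E stay as plain capital letters. The names `pack4` and `unpack4` are made up.

```python
def pack4(word: str) -> bytes:
    """Pack five letters (A=1 .. Z=26, five bits each) into four bytes."""
    a, b, c, d, e = (ord(ch) & 31 for ch in word.upper())
    byte0 = (a << 3) | (b >> 2)          # A4..A0 B4 B3 B2
    byte1 = 0x80 | ((b & 3) << 5) | c    # 1 B1 B0 C4..C0
    byte2 = 0x40 | d                     # 010 D4..D0, identical to ASCII
    byte3 = 0x40 | e                     # 010 E4..E0, identical to ASCII
    return bytes([byte0, byte1, byte2, byte3])


def unpack4(packed: bytes) -> str:
    """Reverse pack4 and return the five capital letters."""
    byte0, byte1, byte2, byte3 = packed
    a = byte0 >> 3
    b = ((byte0 & 7) << 2) | ((byte1 >> 5) & 3)
    c = byte1 & 31
    return "".join(chr(0x40 | n) for n in (a, b, c, byte2 & 31, byte3 & 31))


print(list(pack4("REBUS")), unpack4(pack4("REBUS")))
```

If the layout is read this way, the forced 1 bit keeps the second byte away from 26, but the first byte can still come out as 26 for a word that starts with C followed by H through K (for example `pack4("CHAIR")[0]`). That is one more reason to prefer the three-byte scheme below.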

hackerb9 · 2022-08-08T18:58:48Z

It may actually be possible to compress a five letter word into three bytes. This is because 26 letters doesn't actually require five bits:

$$ \begin{eqnarray*} \log_2(26) =& 4.70043971 &\text{ bits per char} \\ 5 \times \log_2(26) =& 23.50219859 &\text{ bits per word} \\ 5 \times \log_2(26)\div{8} =& 2.93777482 &\text{ bytes} \\ <& 3 \text{ bytes} \\ \end{eqnarray*} $$
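The same arithmetic can be checked in a couple of lines of Python. This is only a sanity check of the numbers above, with variable names of my own choosing.

```python
import math

bits_per_char = math.log2(26)        # information in one letter A-Z
bits_per_word = 5 * bits_per_char    # five independent letters
bytes_per_word = bits_per_word / 8

print(f"{bits_per_char:.8f} bits per char")
print(f"{bits_per_word:.8f} bits per word")
print(f"{bytes_per_word:.8f} bytes per word")

# Integer form of the same claim: all 26^5 words fit in 24 bits.
fits_in_three_bytes = 26 ** 5 <= 2 ** 24
print(fits_in_three_bytes)
```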

bgri · 2022-08-08T22:06:36Z

I like the idea of smaller wordlist files: dropping CRLF and finding the proper word using your code. Heh, I couldn't quite follow the magic to reduce five bytes down to four, and the math showing it's possible to go down to three is totally beyond me.

But, getting almost a 50% reduction in word size is cool. Doing something like this would mean writing and distributing the compression script in a somewhat agnostic codebase (Javascript? Bash?). Or going really old school and figuring out how to have the m100/NEC create the binary output file from the ASCII input file.

More fun!

hackerb9 · 2022-08-09T03:49:00Z

Or going really old school and figuring out how to have the m100/NEC create the binary output file from the ASCII input file.

I wrote a proof of concept program today in BASIC just to encode one word into three bytes and then back. It works! The basic notion is to see the word as representing a single number (in base 26) and use the binary representation to chop it into three bytes.

Let A, B, C, D, and E be numbers from 0 to 25 representing the five letters of the word. Then the magic number they represent in base 26 (little endian) could be generated as:

$$ \begin{eqnarray*} \text{magic} =& A \times 26^0 + B \times 26^1 + C \times 26^2 + D \times 26^3 + E \times 26^4 \\ =& A + B \times 26 + C \times 26^2 + D \times 26^3 + E \times 26^4 \end{eqnarray*} $$

The largest number that could be generated is $26^5-1=11881375$. Note that $11,881,375<16,777,216=256^3$, therefore, the magic number will always fit in three bytes.

To recover a five character word from a magic number, you repeatedly divide it by 26 and use the remainder as each character number. By the way, it may seem more natural to multiply $A$, rather than $E$, by $26^4$. I used little-endian ordering so that during decoding the first remainder is the first character and the last remainder is the last.
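For readers on a modern computer, the encode and decode steps can be sketched in Python as follows. This is an illustration of the description above rather than the BASIC proof of concept. The function names are made up, and storing the low byte first is an assumption that matches the order in which the BASIC listings later in this thread produce their bytes.

```python
def encode_word(word: str) -> bytes:
    """Pack a five-letter word into three bytes via a base-26 number."""
    if len(word) != 5 or not (word.isascii() and word.isalpha()):
        raise ValueError("expected exactly five letters A-Z")
    magic = 0
    # Work from the last letter back so the first letter ends up as the
    # lowest base-26 digit (the little-endian ordering described above).
    for ch in reversed(word.upper()):
        magic = magic * 26 + (ord(ch) - ord("A"))
    return magic.to_bytes(3, "little")


def decode_word(packed: bytes) -> str:
    """Recover the word by repeatedly dividing by 26 and keeping remainders."""
    magic = int.from_bytes(packed, "little")
    letters = []
    for _ in range(5):
        magic, digit = divmod(magic, 26)
        letters.append(chr(ord("A") + digit))
    return "".join(letters)


print(list(encode_word("REBUS")), decode_word(encode_word("REBUS")))
```

Because the first letter is the lowest digit, the first remainder that comes out during decoding is the first letter, so no reversal is needed.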

hackerb9 · 2022-08-09T03:52:50Z

Compression benefits

| Method | File | Bytes | Lines | Bytes per entry |
|---|---|---|---|---|
| None | WL2022.DO | 2555 | 365 | 7 |
| Remove CRLF | WL2022.DA | 1825 | 1 | 5 |
| Base 26 binary | WL2022.DB | 1095 | 1 | 3 |

bgri · 2022-08-09T04:30:26Z

So, 57% compression...very nice!
With that compression, it almost seems possible to have the entire wordlist onboard. Which would enable testing guesses against the wordlist for validity (rather than accepting any five letters as it currently does), if the test is fast. That's a lot of words to test against...

hackerb9 · 2022-08-09T05:05:48Z

So, 57% compression...very nice! With that compression, it almost seems possible to have the entire wordlist onboard. Which would enable testing guesses against the wordlist for validity (rather than accepting any five letters as it currently does)

That's a good idea, but there are fewer than six years' worth of words (2119 words). There are a heck of a lot more five-letter words which should be accepted. I am guessing ≅7000 words is a minimum, but the official Wordle uses a 13,000 word spelling dictionary <https://github.com/tabatkins/wordle-list>.

if the test is fast. That's a lot of words to test against...

Yup, that's the question. I believe the words would need to be sorted alphabetically so they could be quickly accessed with a binary search. (Or some similar algorithm.) But sorting them would make it very slow to find the correct word for the day. Also, while decoding is fast enough to do once at game initialization, I do not know if it would be fast enough to perform $\log_{2}7000 \approx 13$ times on each guess.
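One possible variation, sketched here in Python under my own assumptions, is to sort the list by the packed number instead of alphabetically. The guess is then encoded once and compared as a number, so none of the roughly 13 probes needs to decode anything. The function names are hypothetical, and this does not solve the other problem mentioned above: a sorted list cannot double as the word-of-the-day list.

```python
from bisect import bisect_left


def magic_number(word: str) -> int:
    """Base-26 value of a five-letter word, first letter as lowest digit."""
    n = 0
    for ch in reversed(word.upper()):
        n = n * 26 + (ord(ch) - ord("A"))
    return n


def build_index(words):
    """Sorted list of packed values; any consistent ordering works."""
    return sorted(magic_number(w) for w in words)


def is_valid(index, guess: str) -> bool:
    """Binary search: about log2(len(index)) integer comparisons."""
    target = magic_number(guess)
    pos = bisect_left(index, target)
    return pos < len(index) and index[pos] == target


index = build_index(["REBUS", "TANDY", "CHAIR"])  # made-up sample dictionary
print(is_valid(index, "tandy"), is_valid(index, "XXXXX"))
```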

In issue #12 I was contemplating a hash table (which would be fast) then throwing away the actual list of words (which hash tables usually keep) and just storing a single bit (to save on RAM) if the word should be accepted. I'm not convinced that's a good idea as it would still take an awful lot of RAM for a reasonably sized spelling dictionary — $\approx$ 21KB if using the three-byte binary compression — but at least by making the wordlists smaller, there's a chance it would be able to fit.
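To make the single-bit idea concrete, here is a small Python sketch. The hash used (the base-26 number modulo the table size) is purely a placeholder of mine, not something proposed in issue #12, and the class name is invented.

```python
def magic_number(word: str) -> int:
    """Base-26 value of a five-letter word, first letter as lowest digit."""
    n = 0
    for ch in reversed(word.upper()):
        n = n * 26 + (ord(ch) - ord("A"))
    return n


class BitTable:
    """One bit per hash slot; the words themselves are thrown away."""

    def __init__(self, nbits: int):
        self.nbits = nbits
        self.bits = bytearray((nbits + 7) // 8)

    def _slot(self, word: str) -> int:
        return magic_number(word) % self.nbits  # placeholder hash

    def add(self, word: str) -> None:
        slot = self._slot(word)
        self.bits[slot >> 3] |= 1 << (slot & 7)

    def maybe_contains(self, word: str) -> bool:
        slot = self._slot(word)
        return bool(self.bits[slot >> 3] & (1 << (slot & 7)))


table = BitTable(8192)  # 8192 bits = 1024 bytes of RAM
table.add("REBUS")
print(table.maybe_contains("REBUS"), table.maybe_contains("TANDY"))
```

The catch is that two words can land on the same bit, so a table like this will sometimes accept a non-word. How often depends on the table size relative to the number of words stored.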

hackerb9 · 2022-08-09T05:21:49Z

From #12:

So maybe using the .DA format on modern computers would work, but as far as the NEC is concerned, it may introduce user confusion...

Yes, definitely keep the original .DO file for sanity.

Too bad about the NEC/Backpack getting confused by .DA, but no big deal. Maybe just change the filename, for example WL2022.DO would stay uncompressed, WD2022.DO if M100LE uses data files which drop CRLF, and WB2022.DO might represent the three-byte binary compression.

bgri · 2022-08-09T14:00:23Z

Right. I forgot about the official spelling dictionary. I'm a bit conflicted. While it would be cool to try and fit the spelling dictionary in, if it's missing, does it actually detract from the game? As it is, yes, we'll accept 'XXXXX' as a guess if you choose to use that as a strategy to solve the word. Was it fun? Was it more fun being told 'XXXXX' isn't a word? And is it worth the tradeoff that to use the spelling dictionary, we'll eat a large portion of your free RAM? I guess we could make it an optional play mode if they have the RAM available -- a light version for folk without the available space, and the full experience for someone with a REX, or a second 32k RAM bank just for the game, etc.

-- -- Brad Grier ----------

bgri · 2022-08-09T14:05:06Z

Too bad about the NEC/Backpack getting confused by .DA, but no big deal. Maybe just change the filename, for example WL2022.DO would stay uncompressed, WD2022.DO if M100LE uses data files which drop CRLF, and WB2022.DO might represent the three-byte binary compression.

It may be that TS-DOS was confused too. Might be that NEC just takes all file extensions starting with 'D' and assumes from there. I like the suggestion about changing the file name (WL/WD/WB). Lets users choose how they want to work with the files...

hackerb9 · 2022-08-13T10:03:22Z

Turns out Tandy BASIC doesn't like to open .DA files. I'm able to access them using my direct RAM storage method so I hadn't noticed that OPEN gave the same errors as on your NEC.

hackerb9 · 2022-08-13T10:05:41Z

Here's a program for compressing a WordList in plaintext format into a 3-byte binary format.

[UPDATE: This does not work because BASIC cannot write files that contain CHR$(0), CHR$(26), or CHR$(127). ]

0 REM CMPRSS.BA
5 REM Read five letter words from one file and output three byte representations to another file
9 ON ERROR GOTO 950
10 IN$="COM:98N1ENN"
30 PRINT "File to read ["IN$"]";
39 'Enter will not change IN$
40 INPUT IN$
55 A$=IN$
60 GOSUB 800: REM Sanity check filename
65 IN$=A$
70 IF A$="" THEN GOTO 10 ' Not sane
110 OT$="WB20XX.DO"
130 PRINT "File to write ["OT$"]";
139 'Enter will not change OT$
140 INPUT OT$
155 A$=OT$
160 GOSUB 800: REM Sanity check filename
165 OT$=A$
170 IF A$="" THEN GOTO 110 'Not sane
200 REM OPEN FILES
210 ON ERROR GOTO 910
220 OPEN IN$ FOR INPUT AS #1
230 ON ERROR GOTO 920
240 OPEN OT$ FOR OUTPUT AS #2
250 ON ERROR GOTO 950
300 REM Loop over reading, encoding, writing
305 DY=0
310 IF EOF(1) THEN GOTO 400
320 LINE INPUT #1, A$
330 DY=DY+1
340 PRINT INT(100*DY/366)"%",;
350 GOSUB 500
360 PRINT #2, CHR$(B1)CHR$(B2)CHR$(B3);
370 PRINT B1 ", " B2 ", " B3 "     " CHR$(13);
380 GOTO 310
400 REM Finished compressing
410 PRINT "100 %"
420 PRINT"Finished compressing."
490 CLOSE #1: CLOSE #2
499 END
500 REM Encode string in A$ to B1,B2,B3
501 REM modifies X,Y,T,A
510 X=0
520 FOR T=5 TO 1 STEP -1
530 A=ASC(MID$(A$,T,1))
540 A=A AND 31
550 A=A-1
560 X=X*26+A
580 NEXT
590 'Continue to 600
600 REM CONVERT NUMBER IN X TO B1,B2,B3
601 'Modifies x,y,b1,b2,b3
610 Y=INT(X/256):B1=(X-Y*256):X=Y
620 Y=INT(X/256):B2=(X-Y*256):X=Y
630 Y=INT(X/256):B3=(X-Y*256)
690 RETURN
800 REM Sanity check filename in A$.            Returns A$="" if invalid.
809 ' Skip "COM:98N1ENN" 
810 IF INSTR(1,A$,":") THEN RETURN: 
819 ' Filename and extension if no dot
820 DT=INSTR(1,A$,".") ' Find the dot
830 IF DT>0 THEN FN$=MID$(A$,1,DT-1): EX$=MID$(A$,DT+1,2): ELSE FN$=A$
840 IF LEN(FN$) > 6 THEN PRINT"Filename must be six characters or less": A$="": RETURN
850 IF LEN(FN$) = 0 THEN PRINT"Please enter a filename.": A$="": RETURN
860 IF LEN(EX$) > 2 THEN PRINT"Extension too long. Maybe try .DO.": A$="": RETURN
869 'Default extension is DO
870 IF LEN(EX$) = 0 THEN EX$="DO"
880 A$=FN$+"."+EX$
890 RETURN
900 REM Error handling
910 PRINT"Could not open "IN$" for reading": END
920 PRINT"Could not open "OT$" for writing": END
950 REM Generic error handler
960 PRINT"Got error" ERR "in line" ERL

hackerb9 · 2022-08-15T08:24:30Z

Here is a program that tests binary output using BASIC's OPEN and PRINT # commands.

10 open "foo.do" for output as #1 
20 for t=0 to 255
30 print #1, chr$(t);
40 next t
50 close #1 
60 open "foo.do" for input as #1 
70 x=0
100 if eof(1) then 200
110 c = asc(input$(1,1))
120 if c<>x then print x
130 x=c+1
140 goto 100
200 close #1 
210 end

The output on my Tandy 200 is

0
26
127

Which means NUL (0), ^Z (26), and DEL (^?, 127) cannot be written to .DO files using this method.

bgri · 2022-08-16T22:07:02Z

Ok, finally caught up on this thread. Darn. It looked promising too. I ran the compress program on my 8201 and was pleased that it generated a file WL20XX.DO :) Though I have now discovered a dead pixel line on the 8201a... Yay! Another unit to service :)

Interesting that when those byte combinations come up (^Z, etc) the system doesn't catch that bad write attempt and fail.

I had a thought... could we just confine the binary writes to everything between 128 and 255?

Basically Y=INT(X/128):B1=(X-Y*128):X=Y:B1 = B1 + 128

I was trying to think of ways to work within the spaces between 0, 26, and 127, then got bogged down and wondered if just cutting our working options in half and shoving the values into a valid range would still make sense, or not.

I modified your code slightly (below) and have a result. Not sure if it's a good result or not, but don't have a secret decoder ring yet.

It seemed to work ok on WL2022.DO, STUF.DO is the compressed.. to the right of it is WL2022.DO, and WL20XX.DO is the 256 compressed file.

Not really understanding the math, would that be a large enough range? Or would we benefit by adding another byte (B4)? Or do I really not get this and just had an interesting afternoon thinking about things :)

0 REM CMPRS2.BA
5 REM Read five letter words from one file and output three byte representations to another file
9 ON ERROR GOTO 950
10 IN$="COM:98N1ENN"
30 PRINT "File to read ["IN$"]";
39 'Enter will not change IN$
40 INPUT IN$
55 A$=IN$
60 GOSUB 800: REM Sanity check filename
65 IN$=A$
70 IF A$="" THEN GOTO 10 ' Not sane
110 OT$="WB20XX.DO"
130 PRINT "File to write ["OT$"]";
139 'Enter will not change OT$
140 INPUT OT$
155 A$=OT$
160 GOSUB 800: REM Sanity check filename
165 OT$=A$
170 IF A$="" THEN GOTO 110 'Not sane
200 REM OPEN FILES
210 ON ERROR GOTO 910
220 OPEN IN$ FOR INPUT AS #1
230 ON ERROR GOTO 920
240 OPEN OT$ FOR OUTPUT AS #2
250 ON ERROR GOTO 950
300 REM Loop over reading, encoding, writing
305 DY=0
310 IF EOF(1) THEN GOTO 400
320 LINE INPUT #1, A$
330 DY=DY+1
340 PRINT INT(100*DY/366)"%",;
350 GOSUB 500
360 PRINT #2, CHR$(B1)CHR$(B2)CHR$(B3);
370 PRINT B1 ", " B2 ", " B3 "     " CHR$(13);
380 GOTO 310
400 REM Finished compressing
410 PRINT "100 %"
420 PRINT"Finished compressing."
490 CLOSE #1: CLOSE #2
499 END
500 REM Encode string in A$ to B1,B2,B3
501 REM modifies X,Y,T,A
510 X=0
520 FOR T=5 TO 1 STEP -1
530 A=ASC(MID$(A$,T,1))
540 A=A AND 31
550 A=A-1
560 X=X*26+A
580 NEXT
590 'Continue to 600
600 REM CONVERT NUMBER IN X TO B1,B2,B3
601 'Modifies x,y,b1,b2,b3
610 Y=INT(X/128):B1=(X-Y*128):X=Y:B1=B1+128
620 Y=INT(X/128):B2=(X-Y*128):X=Y:B2=B2+128
630 Y=INT(X/128):B3=(X-Y*128):B3=B3+128
690 RETURN
800 REM Sanity check filename in A$.            Returns A$="" if invalid.
809 ' Skip "COM:98N1ENN" 
810 IF INSTR(1,A$,":") THEN RETURN: 
819 ' Filename and extension if no dot
820 DT=INSTR(1,A$,".") ' Find the dot
830 IF DT>0 THEN FN$=MID$(A$,1,DT-1): EX$=MID$(A$,DT+1,2): ELSE FN$=A$
840 IF LEN(FN$) > 6 THEN PRINT"Filename must be six characters or less": A$="": RETURN
850 IF LEN(FN$) = 0 THEN PRINT"Please enter a filename.": A$="": RETURN
860 IF LEN(EX$) > 2 THEN PRINT"Extension too long. Maybe try .DO.": A$="": RETURN
869 'Default extension is DO
870 IF LEN(EX$) = 0 THEN EX$="DO"
880 A$=FN$+"."+EX$
890 RETURN
900 REM Error handling
910 PRINT"Could not open "IN$" for reading": END
920 PRINT"Could not open "OT$" for writing": END
950 REM Generic error handler
960 PRINT"Got error" ERR "in line" ERL

hackerb9 · 2022-08-17T00:43:12Z

Nice work. You are correct that reducing the maximum value would help. We could then substitute the values 0, 26, and 127 with 255, 254, and 253 when compressing and swap them back when using the secret decoder ring in M100LE. However, as you guessed, 128 wouldn't give a large enough range because

$$ 128^3 = 2,097,152 < 11,881,376 = 26^5 $$

In other words, three numbers from 0 to 127 can represent about 2 million possibilities, but five numbers from 0 to 25 can represent almost 12 million.

However, there is nothing stopping us from choosing an intermediate number. For example, 253 gives us

$$ 253^3 = 16,194,277 > 11,881,376 = 26^5 $$

and that leaves the values 253, 254, and 255 available for substitution.
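A rough Python sketch of that base-253 idea with the substitution is shown below. It was never adopted in this thread, so treat it as a thought experiment. The function names and the exact mapping (0→255, 26→254, 127→253) are taken from the suggestion above or invented for illustration.

```python
BASE = 253
FORBIDDEN = {0: 255, 26: 254, 127: 253}         # bytes BASIC cannot write
RESTORE = {v: k for k, v in FORBIDDEN.items()}  # stand-in -> original digit

# Range check from the text: 128 is too small a base, 253 is large enough.
assert 128 ** 3 < 26 ** 5 <= BASE ** 3


def encode_safe(word: str) -> bytes:
    """Three base-253 digits, with 0, 26 and 127 swapped for 255, 254, 253."""
    n = 0
    for ch in reversed(word.upper()):
        n = n * 26 + (ord(ch) - ord("A"))
    out = []
    for _ in range(3):
        n, digit = divmod(n, BASE)
        out.append(FORBIDDEN.get(digit, digit))
    return bytes(out)


def decode_safe(packed: bytes) -> str:
    """Undo the substitution, rebuild the number, then peel off letters."""
    n = 0
    for byte in reversed(packed):
        n = n * BASE + RESTORE.get(byte, byte)
    letters = []
    for _ in range(5):
        n, digit = divmod(n, 26)
        letters.append(chr(ord("A") + digit))
    return "".join(letters)


print(list(encode_safe("AAAAA")), decode_safe(encode_safe("REBUS")))
```

Since every digit is below 253, the three stand-in values can never be confused with real digits.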

However, it turns out such trickery might not be necessary. I asked on the Bitchin' 100 mailing list how to write binary data and got a hint that one would have to use a .CO file. BASIC doesn't provide a nice interface for writing .CO files; one only has the option of using SAVEM (or BSAVE on the NECs) to store a region of RAM to a file. I figured out a sneaky way to create the .CO file without having to clear extra space for a duplicate copy of the data. It may be too clever by half, but my proof of concept seems to be working.

hackerb9 · 2022-08-17T01:00:46Z

Here's a version of CMPRSS which handles writing the data to a .CO file. I believe it should work on the NEC as well as Tandy, despite the NEC using BSAVE and Tandy wanting SAVEM.

.CO version of CMPRSS:

0 CLEAR
1 REM CMPRSS By hackerb9, 2022
5 REM Read five letter words from one file and output three byte representations to a new .CO file.
6 REM Uses Ram Directory to POKE the data directly into the .CO file.
7 REM Note The .CO file is currently hardcoded to fit 366 three-byte words.
9 ON ERROR GOTO 950
10 IN$="COM:98N1ENN"
30 PRINT "File to read ["IN$"]";
39 'Enter will not change IN$
40 INPUT IN$
55 A$=IN$: DX$="DO"
60 GOSUB 800: REM Sanity check filename
65 IN$=A$
70 IF A$="" THEN GOTO 10 ' Not sane
110 WL$="WL20XX.CO"
130 PRINT "File to write ["WL$"]";
139 'Enter will not change WL$
140 INPUT WL$
155 A$=WL$: DX$="CO" 'default extension
160 GOSUB 800:REM Sanity check filename
165 WL$=A$
170 IF A$="" THEN GOTO 110 'Not sane
200 REM OPEN FILES
205 ON ERROR GOTO 910
210 OPEN IN$ FOR INPUT AS #1
215 ON ERROR GOTO 920
220 GOSUB 7000 ' Create .CO file
225 ON ERROR GOTO 930
230 GOSUB 8000 ' WA=addr of .CO file
240 IF WA=0 THEN GOTO 930
250 ON ERROR GOTO 950
255 B(0)=223:B(1)=73:B(2)=168 ' Entry 366 defaults to "TANDY"
256 FOR I=0 TO 2: POKE WA+6+365*3+I, B(I): NEXT I
300 REM LOOP: Read, Encode, Write
305 DY=0
310 IF EOF(1) THEN GOTO 400
315 IF DY>=366 THEN PRINT "Stopped reading at 366 words": GOTO 400
320 LINE INPUT #1, A$
330 GOSUB 500 'compress to 3 bytes
340 PRINT INT(100*DY/366)"%",
345 PRINT B(0) ", " B(1) ", " B(2) "     " CHR$(13);
350 FOR I=0 TO 2
355 POKE WA+6+DY*3+I, B(I)
360 NEXTI
370 DY=DY+1
399 GOTO 310
400 REM DONE
410 PRINT "100 %"
420 PRINT"Finished compressing."
490 CLOSE #1
499 END
500 REM Encode string in A$ to B(0..2)
501 REM modifies X,Y,T,A
510 X=0
515 IF LEN(A$) <> 5 THEN PRINT"Length of "A$" is not 5.": STOP
520 FOR T=5 TO 1 STEP -1
530 A=ASC(MID$(A$,T,1))
540 A=A AND 31
550 A=A-1
560 X=X*26+A
580 NEXT
590 'Continue to 600
600 REM CONVERT NUMBER IN X TO B(0),B(1),B(2)
601 'Modifies x,y,i,b(0),b(1),b(2)
610 FOR I=0 TO 2
620 Y=INT(X/256):B(I)=(X-Y*256):X=Y
630 NEXT I
670 RETURN
800 REM Sanity check filename in A$.            Returns A$="" if invalid.
801 REM Set DX$ to default two character extension before calling this subroutine.
802 IF DX$="" THEN DX$="CO"
805 EX$=""
809 ' Skip "COM:98N1ENN" 
810 IF INSTR(1,A$,":") THEN RETURN: 
819 ' Filename and extension if no dot
820 DT=INSTR(1,A$,".") ' Find the dot
830 IF DT>0 THEN FN$=MID$(A$,1,DT-1): EX$=MID$(A$,DT+1,2): ELSE FN$=A$
840 IF LEN(FN$) > 6 THEN PRINT"Filename must be six characters or less": A$="": RETURN
850 IF LEN(FN$) = 0 THEN PRINT"Please enter a filename.": A$="": RETURN
860 IF LEN(EX$) > 2 THEN PRINT"Extension too long. Maybe try ."DX$: A$="": RETURN
869 'Default extension is in DX$
870 IF LEN(EX$) = 0 THEN EX$=DX$
880 A$=FN$+"."+EX$
890 RETURN
900 REM Error handling
910 PRINT"Could not open "IN$" for reading": END
920 PRINT "Could not allocate" SZ "bytes for " WL$: END
930 PRINT"Error finding address of "WL$: GOTO 950
950 REM Generic error handler
960 PRINT"Got error" ERR "in line" ERL
999 END
7000 REM Create .co file to hold compressed wordlist
7010 ID=PEEK(1)
7019 ' Allocate space for 366 words
7020 SZ=366 * 3 - 1
7030 IF (ID=148) THEN BSAVE WL$,0, SZ: ELSE SAVEM WL$, 0, SZ
7040 RETURN
7999 END
8000 REM RNDACC subroutine
8001 REM Input: WL$ is file to locate.
8002 REM Output: WA is address in RAM.
8003 REM Temp: ID, RD, FL, FN$, T1, T2
8004 REM
8005 REM Warning: Run CLEAR at start of program or this will return an invalid address.
8006 REM
8007 ' Set WL$ to 8 chars, no dot
8008 GOSUB 8100
8009 ' HW ID. 51=M100, 171=T200, 148=NEC,      35=M10, 225=K85
8010 ID=PEEK(1)
8014 ' Ram Directory address. (Anderson's "Programming Tips" gives RD=63842 for M100 and 62034 for T200.)
8015 ' (Gary Weber's NEC.MAP gives RD=63567, but we can skip the system files by starting at 63633.)
8016 RD=-( 63842*(ID=51) + 62034*(ID=171) + 63633*(ID=148) )
8019 ' Search directory for WL$
8020 FOR T1 = RD TO 65535 STEP 11
8029 ' Attribute flag: See Oppedahl's "Inside the TRS-80 Model 100" for details.
8030 FL=PEEK(T1)
8039 ' Stop at end of directory (255)
8040 IF FL=255 THEN GOTO 8080
8044 ' Skip invalid files
8045 IF (FL AND 128)=0 THEN NEXT T1
8049 ' WA is file address in memory
8050 WA=PEEK(T1+1)+256*PEEK(T1+2)
8059 ' Filename matches WL$?
8060 FOR T2=1 TO 8: IF ASC(MID$(WL$,T2, 1)) <> PEEK(T1+2+T2) THEN NEXT T1: ELSE NEXT T2
8070 IF T2=9 THEN PRINT "Writing "WL$" at" WA: RETURN
8080 REM File not found
8085 PRINT "Did not find " WL$
8090 WA=0: RETURN
8100 REM Normalize filename to 8 chars
8101 REM E.g. "FOO.DO" -> "FOO   DO"
8102 REM INPUT & OUTPUT: WL$
8103 REM Temp: T1, T2, FN$, EX$
8110 T1=INSTR(1,WL$,".")
8115 FN$=WL$:EX$=""
8120 IF T1>0 THEN FN$=MID$(WL$,1,T1-1): EX$=MID$(WL$,T1+1,2)
8130 IF LEN(FN$)>6 THEN PRINT "filename too long": STOP
8140 IF LEN(FN$)<6 THEN FN$=FN$+" ": GOTO 8140
8150 IF LEN(EX$)<2 THEN EX$=EX$+" ": GOTO 8150
8160 FN$=FN$+EX$: WL$=""
8170 FOR T1=1 TO 8
8172 T2=ASC(MID$(FN$,T1,1)): IF (T2>=ASC("a")) AND (T2<=ASC("z")) THEN T2=T2-32
8173 WL$=WL$+CHR$(T2)
8175 NEXT T1
8180 RETURN

And here is a secret decoder ring to test it out:

.CO version of RNDACC:

0 REM RNDACC by hackerb9 2022
1 REM Random access to files in RAM.
2 ' This program can read directly
3 ' from a compressed .CO word list
4 ' created by CMPRSS.BA.
5 '
7 ' Files change their location in RAM,     moving aside as other files grow. 
8 ' Note: EDIT modifies a hidden file,      but not the directory pointers!
9 ' CLEAR refreshes the pointers.
10 CLEAR
12 ' HW ID. 51=M100, 171=T200, 148=NEC,      35=M10, 225=K85
13 ID=PEEK(1)
14 ' Ram Directory address. (Anderson's "Programming Tips" gives RD=63842 for M100 and 62034 for T200.)
15 ' (Gary Weber's NEC.MAP gives RD=63567, but we can skip the system files by starting at 63633.)
16 RD=-( 63842*(ID=51) + 62034*(ID=171) + 63633*(ID=148) )
17 ' WL20xx.CO is the wordle wordlist        for each day in 20xx, compressed.
18 IF ID=148 THEN WL$="WL20"+LEFT$(DATE$, 2)+".CO": ELSE WL$="WL20"+RIGHT$(DATE$, 2)+".CO"
19 ' Search directory for "WL20xx.CO" 
20 FOR A = RD TO 65535 STEP 11
29 ' Attribute flag: See Oppedahl's "Inside the TRS-80 Model 100" for details.
30 FL=PEEK(A) 
39 ' Stop at end of directory (255)
40 IF FL=255 THEN 300
43 IF (FL AND 128)=0 THEN NEXT 'VALID?
46 IF (FL AND 16)<>0 THEN NEXT 'In ROM
47 IF (FL AND 8) <>0 THEN NEXT 'Hidden
49 ' X is file address in memory
50 X=PEEK(A+1)+256*PEEK(A+2)
59 ' Add filename all at once for speed
60 FN$=CHR$(PEEK(A+3)) + CHR$(PEEK(A+4)) + CHR$(PEEK(A+5)) + CHR$(PEEK(A+6)) + CHR$(PEEK(A+7)) + CHR$(PEEK(A+8)) + "." + CHR$(PEEK(A+9)) + CHR$(PEEK(A+10))
69 ' Got filename in FN$
70 PRINT FN$, X
80 IF FN$=WL$ THEN 200
90 NEXT A
99 GOTO 300
200 REM Found WL20xx.CO. Now access it.
210 INPUT "Enter an ordinal date (1 to 366)"; DY
220 DY=DY-1
228 ' X is WL20XX.CO's address in RAM
229 ' SZ is # of bytes of data
230 SZ=PEEK(X+2)+256*PEEK(X+3)
235 IF SZ < DY*3+3 THEN PRINT "Out of Data error": END
236 IF DY<0 THEN PRINT "Negative index error": END
238 ' Skip .CO header and index by DY
239 ' Each 5 letter word is 3 bytes.
240 X = X+6 + DY*3
250 A=PEEK(X)+256*PEEK(X+1)+256*256*PEEK(X+2)
260 FOR T=1 TO 5
270 B=INT(A/26)
280 PRINTCHR$(A-B*26+ASC("A"));
290 A=B: NEXT T
295 PRINT
299 END
300 REM File not found
310 PRINT "Error: File ";WL$;" not found."
320 END

As you can see, I'm first creating the .CO file just by copying the ROM from address 0 to 3 × 366. Once I have a .CO file of the right size, I can use the Ram Directory trick to find its memory address and POKE the correct binary data into it. I've never seen this method documented anywhere, so there may be something fundamentally wrong with the idea, but it does seem to work and hasn't blown up my Tandy 200 (yet).
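For checking a generated file on a modern computer, the layout RNDACC relies on can be sketched in Python. The six-byte header is inferred from lines 230 to 240 of RNDACC, which read a 16-bit length at offset 2 and skip 6 bytes. Treating the first and last header fields as load and entry addresses is an assumption on my part, and `build_co` and `word_from_co` are made-up names.

```python
import struct


def build_co(payload: bytes, load_addr: int = 0, entry: int = 0) -> bytes:
    """Assumed .CO image: three little-endian 16-bit fields, then the data."""
    return struct.pack("<HHH", load_addr, len(payload), entry) + payload


def word_from_co(image: bytes, day: int) -> str:
    """Decode the word for a 1-based ordinal day, as RNDACC does."""
    _load, size, _entry = struct.unpack_from("<HHH", image, 0)
    offset = (day - 1) * 3               # each word occupies three bytes
    if day < 1 or offset + 3 > size:
        raise IndexError("day is outside the stored data")
    start = 6 + offset                   # skip the six-byte header
    magic = int.from_bytes(image[start:start + 3], "little")
    letters = []
    for _ in range(5):
        magic, digit = divmod(magic, 26)
        letters.append(chr(ord("A") + digit))
    return "".join(letters)


# Two packed entries: REBUS, then the TANDY filler that CMPRSS pokes in.
image = build_co(bytes([93, 227, 130, 223, 73, 168]))
print(word_from_co(image, 1), word_from_co(image, 2))
```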

bgri · 2022-08-18T17:15:54Z

Interesting! That's some very tricky manipulation of the 'live' data :) Nice!

I loaded the files on my 8201a and a m102. Hadn't tried the 8201 or the 8200 yet (doing this on breaks at work).

The 8201a ran both without error, though the word it displayed was incorrect. Word one in WL2022.DO should be REBUS.

And just to see what happens, I ran it on the 102... RNDACC runs, but can't find the file because 'unusual' data is discovered, likely because the RAM directory address set in line 16 is wrong: PEEK(1) on the Model 102 returns 167, which isn't one of the IDs tested.
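A quick Python imitation of line 16 shows why an untested ID sends the search astray. It relies on BASIC comparisons evaluating to -1 when true and 0 when false. The function name is made up, and the sketch deliberately does not guess the correct address for the Model 102.

```python
def ram_directory(hw_id: int) -> int:
    """Mimic RD=-(63842*(ID=51) + 62034*(ID=171) + 63633*(ID=148))."""

    def eq(a: int, b: int) -> int:
        return -1 if a == b else 0  # BASIC truth values

    return -(63842 * eq(hw_id, 51)
             + 62034 * eq(hw_id, 171)
             + 63633 * eq(hw_id, 148))


for hw_id in (51, 171, 148, 167):
    print(hw_id, ram_directory(hw_id))
```

For 167 every comparison is false, so RD comes out as 0 and the directory scan starts at address 0, where it reads whatever bytes happen to be there instead of real directory entries.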

bgri · 2022-10-11T08:50:43Z

Ok, neat. I get it!! Between your cool description of the *magic*, and this quick refresher of *Endianness <https://en.wikipedia.org/wiki/Endianness>*, I see what you've got going. and I like it! Elegant! Base 26 math magic encodes it to a number, then encode that number (which takes less space). If I understand what's going on, I like it! And very cool that you proved it in BASIC! Possibilities...

hackerb9 · 2022-10-13T00:45:39Z

I see what you've got going. and I like it!

Isn't it neat when math and magic collide?

This was referenced Aug 17, 2022:

Add 2021 wordle data #17 (Closed)

Read wordlists using Random Access to RAM files #20 (Merged)

bgri closed this as completed in #20 Aug 18, 2022

hackerb9 mentioned this issue Aug 18, 2022:

Fix RAM Directory for 8201 and T102 #21 (Closed)
