Other C64-related stuff -> http://www.cs.tut.fi/~albert/Dev/

PuZip

PuZip version 1.13 (20.12.2004) - ZIP compressor for C64. Sources: puzip.asm.gz, puzip.c

See also: gunzip, GZ/ZIP decompressor.
See also: Burst -- C64 burst modification.


Features


Introduction

Some discussion in comp.sys.cbm or the cbm-hackers mailing list in late 1998 or early 1999 pondered the problem of having a ton of different archivers for Commodore systems and for the systems used to manipulate the files. More and more people store and maintain their software collection somewhere other than on their original 5.25" disks.

A number of solutions exist for transferring data from e.g. a PC to a 1541 disk, so that path is no longer a problem. GunZip.c64 and UnZip64 can even handle the decompression of GZIP and ZIP files on the C64 or C128 if necessary. But for some people it has still been a real pain to transfer files from Commodore systems to the outside world. Creating lynx files on the C64, transferring them through one machine to another to dearchive the packet, and rearchiving it as ZIP is unnecessarily troublesome.

The discussion, if I remember anything about it correctly, tried to find a common format for all commodore files. Someone contemplated writing a ZIP archiver (not necessarily compressing) for C64 to help eliminate redundant and proprietary formats. Unfortunately nothing ever came of it. When the idea was mentioned again, I decided to take a look into it.

I started on December 11th 1999 by writing a C-language skeleton and after a couple of nights got it working so that all the offset fields were calculated correctly. Without the correct offsets, unzip programs on other systems may refuse to decompress the files, although GunZip.c64 and UnZip64 may decompress them without a hitch.

First I created a simple archiver where the files were not compressed. One of the problems with the ZIP store method is that it requires that the stored file size is written to the ZIP file first. The file size can only be calculated by reading the file, which would mean two passes over the input files and a lot of waiting. This made the store method unusable.

Fortunately the Deflate method includes the possibility to save blocks of data instead of the whole file, AND you can select from three block types: stored, LZ77 with fixed Huffman, and LZ77 with dynamic Huffman. With the Deflate stored method I can read a part of the file and store it, repeating this until the input file ends.
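
The chunked store described above can be sketched in C as follows. This is only an illustration of the stored-block layout from RFC 1951 (one header byte, then LEN and its one's complement NLEN, then the raw data), not PuZip's actual code. The function name deflate_store and the chunk parameter are made up for this example, and the sketch assumes that only stored blocks are written, so every block starts on a byte boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Deflate block types (the 2-bit BTYPE field in RFC 1951). */
enum { BTYPE_STORED = 0, BTYPE_FIXED = 1, BTYPE_DYNAMIC = 2 };

/*
 * Write len bytes from src as a sequence of Deflate stored blocks of at
 * most chunk bytes each (chunk is a hypothetical buffer size, 1..65535).
 * Returns the number of bytes written to dst, or 0 if chunk is invalid
 * or dst is too small.
 */
size_t deflate_store(const uint8_t *src, size_t len, size_t chunk,
                     uint8_t *dst, size_t dst_cap)
{
    size_t out = 0;

    if (chunk == 0 || chunk > 65535)
        return 0;
    do {
        size_t n = len < chunk ? len : chunk;
        unsigned final = (n == len);      /* is this the last block? */

        if (dst_cap - out < n + 5)
            return 0;
        /* Header byte: BFINAL in bit 0, BTYPE in bits 1-2, rest padding. */
        dst[out++] = (uint8_t)(final | (BTYPE_STORED << 1));
        dst[out++] = (uint8_t)(n & 0xff);           /* LEN, low byte   */
        dst[out++] = (uint8_t)(n >> 8);             /* LEN, high byte  */
        dst[out++] = (uint8_t)(~n & 0xff);          /* NLEN = ~LEN     */
        dst[out++] = (uint8_t)((~n >> 8) & 0xff);
        if (n > 0)
            memcpy(dst + out, src, n);
        out += n;
        src += n;
        len -= n;
    } while (len > 0);
    return out;
}
```

Because each block carries its own length, the total file size never has to be known before the first byte is written, which is what makes a single pass over the input possible.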

When I got this working on the C64 I continued working on the C model and added the LZ77 code to it. The main idea at this time was to keep the C64 architecture in mind so that the algorithm could be easily and efficiently converted to 6510 assembly language. When I got the C code into the appropriate condition I started the conversion.

A few modifications were made during this time to make the mapping to 6510 easier and more efficient, and the final product was quite good. Compressing the test files (two D64 images) on my 25MHz 68k Amiga took 55 seconds, so performing the same with C64 in 26 minutes is excellent, and with C128 in 15 minutes is really great. Note that the nine-minute difference is due entirely to file I/O. The only difference between the versions is that the C128 version uses burst read while the C64 version doesn't. C128 in FAST mode only wastes 8 minutes, slightly more for C128 in 40-column mode.

It is possible to get a little more compression with the 2kB window size PuZip is currently using and I'll probably do it. Increasing the window size to 4kB does not help much, and larger window sizes are neither possible nor practical with a 64kB machine.

See the history section for newer additions to the program and other talk about this and that. Features will be added as they are needed or I'll think of something crazy enough. If you need a feature, just tell me and we'll see if and how it can be implemented.

Usage

First the program asks for an input drive. Press 8 or 9 for drives 8 and 9, respectively. Use 0-7 for drives 10-17.
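
As a small illustration, the key-to-drive mapping described above could be written like this in C. The function name drive_from_key and the -1 return value for other keys are assumptions made for this sketch, not taken from PuZip.

```c
/*
 * Map a key press to a device number:
 * '8' and '9' select drives 8 and 9, '0'..'7' select drives 10..17.
 * Returns -1 for any other key (an assumption of this sketch).
 */
int drive_from_key(char key)
{
    if (key == '8' || key == '9')
        return key - '0';
    if (key >= '0' && key <= '7')
        return key - '0' + 10;
    return -1;
}
```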

Then select the files to be archived into the ZIP file by pressing either Y for LZ77 compression or S for store (no compression). You can also use 1-3 to use faster but less efficient compression, 3 being the fastest. You can also press D to select disk image compression. The image type is automatically detected and selected between D64/D71/D81. If you don't want a file included in the archive, press N. When you have selected all files you want included, you can press Q to skip the rest of the files.

The output drive is asked next. Drives 8-17 can be used.

Then the archive name is asked and you can also give a zipfile comment. If you do not want any comment, leave the string empty. In the "zip file?" prompt you can also send a command to the destination drive by "@command". If you give just "@", the drive error string is read.

The border is flashed when reading or writing data. The screen is blanked whenever it can be to speed up compression. The 2MHz mode is also used when possible. The screen is turned back on for 3 block times every 32 blocks, if an error is encountered, and of course at the end of compression.

Speed

The compression speed is fairly adequate now. In my test I compressed two disk images (a full one and a half-filled) from my 1581 drive to the same drive. The VIC20 version took 24 minutes, the C64 version 26 minutes, and the C128 version 8 minutes.

These figures are not very helpful because I can't release the images I used and you can't compare your results to mine. That's why I reran the tests with this disk image t.zip for some hardware combinations. If you try compressing this image file with some exotic hardware, I would be very interested in knowing your results and will add them to this table. Decompress it first to the drive you are testing (either as a file or as a disk image) and then use PuZip to recompress it.

Machine                 InDrive/Type    OutDrive    Time
VIC20                   1581/File       1581        13:25
C64                     1581/File       1581        13:30
C64 +Burst              1581/File       1581         8:01
C64 +Burst (NTSC)       1581/File       1581         8:01
C128 40 Columns         1581/File       1581         4:50
C128 80 Columns         1581/File       1581         4:39
C128 C64 Mode           1581/File       1581        11:18
C128 C64 Mode +Burst    1581/File       1581         5:57
C64 +Burst              1570/Disk       1581         9:19
C64 (+Burst)            1541/Disk       1581        15:58
C64 (+Burst)            1541/Disk       1570        18:31
C64 +Burst              1570/Disk       1541        12:00
C64                     IDE64/File      IDE64        6:12
C64                     1541-II/Disk    IDE64       14:04
C128 C64 mode           IDE64/File      IDE64        4:01
C128 C64 mode           1541-II/Disk    IDE64        ----

History

11.12.1999
Started working on a C version.

19.12.1999
The first C version with Deflate compression with fixed huffman. The first C64 version with Deflate store.

24.12.1999
The first C64 version with Deflate compression with fixed huffman.

25.12.1999
The first public release.

26.12.1999
The output drive setting was not used. C128 version added (uses burst read).

27.12.1999
PuZip had problems with files smaller than 255 bytes. It treated them always as 255-byte files. And because of a reversed condition the C128 version only handled the first file correctly if the source device wasn't burst-capable.

Also, ",s" is now appended to the filename for SEQ files (",u" for USR and ",r" for REL if you want to be silly). The maximum number of files is now 64 and the limit is also checked. You can't accidentally select more than 64 files anymore. [Note: Currently the limit is 254.]

This time I tested PuZip with C64 and C128, 1541->1581, 1570->1581, and 1581->1581. All 6 combinations were okay.

28.12.1999
Now displays original size, compressed size and compression ratio (compressed/original). Smaller is better and over 100% means that the file was expanded. The only annoyance is that the 32-bit decimal number output routine and the 32-bit division routine take 450 bytes of precious memory. Fortunately I can still decrease the output buffer size. Sometime in the future I should use BANK1 in C128 mode for output buffer or something else.
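
For comparison, the same ratio is almost free in C, where a 64-bit intermediate avoids overflow in the multiplication. This is a sketch only; the function name ratio_percent, the truncating division, and the 0% result for an empty file are assumptions of this example, not PuZip's behavior.

```c
#include <stdint.h>

/*
 * Compression ratio as a percentage (compressed/original), truncated
 * toward zero. Values over 100 mean the file was expanded.
 * Returns 0 for an empty original (an assumption of this sketch).
 */
uint64_t ratio_percent(uint32_t original, uint32_t compressed)
{
    if (original == 0)
        return 0;
    /* Widen before multiplying so compressed * 100 cannot overflow. */
    return ((uint64_t)compressed * 100u) / original;
}
```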

"Lowercase" PETSCII characters in filenames are now converted to uppercase ASCII instead of becoming ISO-Latin-1 accented characters.

Error detection and reporting is now better in both burst and non-burst modes. The created archive will contain as much of the file as possible. Now you at least get to know if a file read fails.

You can now give a 36-character zipfile comment.

Also, partial D64/D71/D81 support has been added. Partial, because it is only available in the C128 version and you need the source drive to be burst-capable (i.e. 1570/1571/1581). Select the drive normally and press "D" in the file selector to select disk image compression. This isn't possible for empty disks though. I'll have to figure out something, but you seldom compress disk images with no files in them anyway. Note also that if a disk is full of files, you get better compression for a disk image than individual files. If there is a read error, the actual data is still transferred even if header or data CRC is wrong. In disk image mode errors do not abort the compression.

You can enter disk commands intended for the output drive in the ZIP FILE prompt by using @. A single @ just reads the error channel.

30.12.1999
Disks can now be compressed as images with non-burst drives and also with PuZip.c64. Just press "D" in the input file selector. But it will take a while: one D64 compression test with C64 and from 1541 to 1581 took about 14 minutes, another 16.5 minutes. The latter test with PuZip.128 and from 1570 to 1581 took less than nine minutes. Have I ever mentioned that burst transfers rule?:-)

Note that the dostype and the double-sided flag are used for automatic detection of D64/D71/D81. 1581 disks have dostype "D", 1541 and 1571 disks have "A". Double-sided 1571 disks also have the double-sided flag set in the first directory block.
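
The detection rule above can be sketched as a small C function. It takes the two values as arguments and leaves out how they are read from the disk. The function name image_extension and the NULL return for an unknown dostype are assumptions made for this example.

```c
#include <stddef.h>

/*
 * Choose the disk image type from the dostype character and the
 * double-sided flag:
 *   dostype 'D'                      -> 1581 disk        (D81)
 *   dostype 'A', double-sided set    -> 1571 disk        (D71)
 *   dostype 'A', double-sided clear  -> 1541-format disk (D64)
 * Returns NULL for any other dostype (an assumption of this sketch).
 */
const char *image_extension(char dostype, int double_sided)
{
    if (dostype == 'D')
        return ".D81";
    if (dostype == 'A')
        return double_sided ? ".D71" : ".D64";
    return NULL;
}
```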

2.1.2000
Booted up my PET and 8050 drive to check the directory file format. The fuse in my 3032 was blown, but MMF9000 worked. I added support for 8050 (disk version C) into the directory selector. 8250 directories and 8050/8250 disk images are not yet supported.

4.1.2000 (1:25AM)
Got the burst detection and transfer working on a C64 (a machine with the burst modification -- two wires added). I'll still have to test it a bit before releasing it and maybe perform some cleanup also. File and D64 read and compression at least seem to work okay when tested with 1570 and 1581.

8.1.2000
The burst detection seems to work better now. Both C64 and C128 versions now use burst read if possible. The C64 burst modification is also detected automatically. See http://www.cs.tut.fi/~albert/Dev/burst/ for details. Note that you can not have the cassette drive connected while using burst-capable devices with the C64 burst modification.

IDE64 does not support the raw directory format that PuZip uses to detect the disk type for disk image creation and the file start track and sector for burst read. The directory menu now detects this kind of directory and adapts accordingly. Also, IDE64 currently does not set ST when the error channel has been fully read, so now PuZip ends reading at CR ($0d).

Thanks to Tuomas Rantala for IDE64 raw-directory dump, although IDE64 does not actually support it yet.

10.1.2000
The deflate store does not expand the files by 2% anymore because files are now saved in larger blocks (about 30kB/23kB in C64/C128). The expansion is 9 bytes per block, which is generally less than a percent.

If opening the directory file with the secondary address 2 fails (reads directory as a raw data file), the directory is opened with the secondary address 0 (reads directory converted into a program). This is for future IDE64 compatibility (if I only got it working now).

You can now press RUN/STOP (no RESTORE) in "OUTPUT DRIVE" selection to go back to select a new input drive.

14.1.2000
It turned out to be "I0" which froze the machine when IDE64 was used as a source drive. With IDE64 this command performs full HD init. I would've chosen "U:" or "U9", as the initialize command is very often used. Even the 1581 manual suggests using it to properly close files when your basic program crashes. Maybe I can convince Josef to change the command. Anyway, now you can use IDE64 as both source and destination drive.

Also, I made a simple 80-column cable to test PuZip in 80-column mode. It seems that using backspace is veeeeryyyy ssloooow with C128 and even slower with the 80-column screen. Now I use cursor_left + space + cursor_left instead. I also tested the compression in FAST mode and it was almost twice as fast as in SLOW mode.

15.1.2000
Fooled by the docs again. Appnote.txt omitted the information that the data descriptor also has a header: $50,$4b,$07,$08. PuZip now adds the header, and GunZip allows a data descriptor with or without the header, but this of course means that gunzip has a 0.0000000232% chance of detecting it incorrectly. Unzip programs in other systems should decompress the files correctly regardless, because they use the CRC32 and length values found in the central directory instead.
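
A sketch of the data descriptor in C, for illustration only. The field order (CRC32, compressed size, uncompressed size, each 32 bits little-endian) follows the ZIP format; the struct and function names are made up for this example. The reader accepts both forms, which is where the small chance of a false detection comes from: a descriptor without the header whose CRC32 happens to be $08074b50 looks exactly like the header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct data_descriptor { uint32_t crc32, csize, usize; };

static const uint8_t DD_SIG[4] = { 0x50, 0x4b, 0x07, 0x08 };

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a descriptor with the 4-byte header; always 16 bytes. */
size_t write_descriptor(uint8_t out[16], const struct data_descriptor *d)
{
    memcpy(out, DD_SIG, 4);
    put_le32(out + 4, d->crc32);
    put_le32(out + 8, d->csize);
    put_le32(out + 12, d->usize);
    return 16;
}

/*
 * Read a descriptor with or without the header. Returns the number of
 * bytes consumed (16 or 12), or 0 if fewer bytes than that are available.
 */
size_t read_descriptor(const uint8_t *in, size_t avail,
                       struct data_descriptor *d)
{
    size_t skip = (avail >= 4 && memcmp(in, DD_SIG, 4) == 0) ? 4 : 0;

    if (avail < skip + 12)
        return 0;
    d->crc32 = get_le32(in + skip);
    d->csize = get_le32(in + skip + 4);
    d->usize = get_le32(in + skip + 8);
    return skip + 12;
}
```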

Btw, the only non-seekable device I could use to check what the Amiga zip program generated was my c1581-handler. (Seeking is possible if a file is opened for reading, but not if it is opened for writing. I could've used the AmigaDOS shell and pipes; my own shell uses files to simulate pipes.) If the file is not seekable, the zip program cannot back up in the file and fix the CRC32 and length values in the file header, so it has to use a data descriptor instead.

16.1.2000
2MHz mode is used in C128 80-column mode, and in 40-column mode during compression except for 2 out of every 32 blocks, so that you can look at the progress report. 2MHz and screen blanking are used in the C64 mode except during the slow serial bus traffic. The screen is turned on if an error is encountered.

19.1.2000
Support for CMD Ramlink and Hard Disk directories added. Thanks to Geoff Sullivan for CMD Ramlink & HD raw-directory dumps and Colin J Thomson for testing the versions while Geoff was away from home.

20.1.2000
The VIC20 version was added. The compression ratio will not be quite equal to the C64 and C128 versions because I had to make the history buffer and some tables smaller. Saving is also performed more frequently because the output buffer is smaller. Compressing the two D64 images 1581->1581 took 23:45, which is two minutes less than the C64 version without burst transfers. Now you must believe that VIC20 is the faster machine: the difference between 1.1MHz and 980kHz.

21.1.2000
Now allows 254 files for C64 and C128 versions and 181 files for VIC20 version. The file info storage shares memory with the output buffer, so if you have more files, the output buffer will be smaller and saving is performed more often.

10.9.2000
Started the C16 and +4 support. The C16 version seems to work quite fine now, although the output buffer is a little small, and you can select a maximum of 26 files for a ZIP (which reduces the output buffer size to only 256 bytes). This version has the same LZ77 window size as the VIC20 version, i.e. 1kB.

I also found a bug in the store algorithm. It didn't work right for files bigger than the current output buffer size.

I also cleaned up the code to make GEOS adaptation easier.

11.9.2000
Added ROM switching code for the +4 version and tested it with C16. Also tested the store algorithm of the +4 version on my C16, and apart from wrong CRCs the archive creation was successful. The +4 version uses a 4kB history buffer while the C64 and C128 versions only use 2kB. This should increase the compression ratio a little.

13.9.2000
Increasing the C64 version window size from 2kB to 4kB saved 4700 bytes on the test D64. Using dynamic Huffman would save about 8600 bytes, but it would require reading the input file at least partially twice -- first to create statistics and then to really perform the compression and save the data. Maybe a statistics sidefile generator for PuZip plus a Zip repacker would be the way to go, so that the hard work of string matching is done first, and there is less data to read in for the second pass.

15.9.2000
PuZip now uses GETKEY instead of directly fiddling with the keyboard buffer stuff. This should fix the lockup problem with the Plus/4 version.

Also fixed a 2MHz problem when PuZip was run on C128 in C64 mode and a disk image was compressed from a non-burst device, such as 1541. Thanks to Tuomas Rantala for the report, although it took me all too long to provide a fix for the problem.

16.9.2000
The C64 and Plus/4 versions now use an 8k window, which increases the compression ratio a little, although it also makes the compression a tiny bit slower.

23.2.2001
IDE64 block read and write added. Thanks to Kajtar Zsolt (Soci/Singular).

4.6.2001
File permissions fixed.

7.6.2001
File dates are now set to "7.6.2001 02:39".

12.4.2002
Now appends ".D64", ".D71", or ".D81" to the names of disk images in the ZIP file.

7.8.2002
Now uses lazy matching to gain another 1% of compression. 39min->47min
IDE64 detection did not behave well with Action Replay connected. Now tries to detect AR first.
C128 history buffer increased from 2kB to 4kB.

1.9.2002
Now supports multiple speeds. "Y" or "0" uses lazy matching with maximum compression, the whole buffer is searched. "1", "2", and "3" use greedy matching with half, 1/4, and 1/8 the search range.
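
To illustrate how the speed setting limits the search, here is a minimal C sketch. It uses a brute-force backward scan, which is not how PuZip finds matches; it only shows the effect of the search range. The function names are made up for this example, and the 3-byte minimum and 258-byte maximum match lengths are the Deflate limits.

```c
#include <stddef.h>

/* Search range for speed level 0..3: whole window, 1/2, 1/4, 1/8. */
size_t search_range(size_t window, int level)
{
    return window >> level;
}

/* Only level 0 ("Y" or "0") uses lazy matching; 1..3 are greedy. */
int uses_lazy_matching(int level)
{
    return level == 0;
}

/*
 * Find the longest match for buf[pos..end) that starts at most range
 * bytes back. Returns the match length (0 if shorter than 3 bytes) and
 * stores the distance in *dist when a match is found.
 */
size_t longest_match(const unsigned char *buf, size_t pos, size_t end,
                     size_t range, size_t *dist)
{
    size_t best = 0;
    size_t start = pos > range ? pos - range : 0;

    for (size_t cand = start; cand < pos; cand++) {
        size_t len = 0;

        while (pos + len < end && len < 258 &&
               buf[cand + len] == buf[pos + len])
            len++;
        /* >= prefers the closest candidate among equally long matches. */
        if (len >= 3 && len >= best) {
            best = len;
            *dist = pos - cand;
        }
    }
    return best;
}
```

With a shorter range, fewer candidates are examined per position, so compression is faster but matches further back in the history buffer are missed.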

9.11.2002
C128 version now uses the other 64k of memory for search structures. The history buffer is now 8k, i.e. the same size as in the C64 version.

22.11.2008
Ported the development environment from Amiga to FreeBSD and compiled with the IDE64 patches that Soci sent me (finally!).

TODO
Subdirectory support (record/enter subdirectories)?
Support individual file comments?
Dynamic Huffman Trees - or a postprocessor

To the homepage of albert@cs.tut.fi