DeltaPACKER Project: Research on Data Compression
However, several arithmetic coding methods have been heavily patented by IBM and other big companies. Well, that's no problem for me, because I'm using my own algorithm, and anyway such patents don't apply to non-commercial research and study.
The biggest part of my program is an accurate (but slow) bit-level predictor, which drives the arithmetic coder. That's the part I've been working on the most.
Currently, DeltaPACKER outperforms WinZip on most files, even with WinZip set to "Max Compression". In general, my program does wonders (compared to others) on highly redundant files. Yeah, but arithmetic coding and my predictive schemes slow down when there's little redundancy... Anyway, I demonstrated what I wanted to... And there's still room for improvement :-)
I've decided to build a new data compression algorithm from the ground up.
First, it encodes BITS, not BYTES. This can lead to size gains. For the bit encoding part, DeltaPACKER incorporates the state of the art: an arithmetic coder! One could say that Huffman coding (widely used by the best packers nowadays) is only a special case of arithmetic coding. The latter allows events to be coded using a fractional number of bits, thus freeing us from rough power-of-2 approximations and greatly improving the accuracy of the coding process!
Calculating The Factorial of 1,000,000
It's equal to: 1 * 2 * 3 * 4 * (...) * 999,998 * 999,999 * 1,000,000
The result is A MONSTER, it's a number of 5,565,709 digits !!!
It would have taken 2 months on a P5 300 MHz to calculate it, but I took it down to under 4 hours using my assembly routine.
Overflow is usually a nightmare to handle in C, but it's very easy to deal with in assembly, thanks to the carry flag.