[wellylug] Encrypted backup recovery
David Antliff
dave.antliff at paradise.net.nz
Tue Aug 24 18:00:29 NZST 2004
On Mon, 23 Aug 2004, Kevin Dorne wrote:
> On Tue, Aug 24, 2004 at 05:22:12PM +1200, David Antliff wrote:
> [snip]
> > It might be easier to think of it in this way - forget trying to find something tied up with encryption and just look for a way of injecting redundant information.
>
> Aha. I guess I was caught up trying to keep 100% of the benefits of
> compression. In that case, I can just create parity files (for example,
> using Parchive, http://parchive.sourceforge.net/) and use them for
> recovery.
Well, from an information point of view, compressing something
(losslessly) removes the redundant content: predictive modellers do their
best to strip out redundancy by exploiting patterns. The 'information
cost' of losing a bit from a compressed file is therefore greater than
losing a bit from the plaintext. This is all pretty obvious really, but
the upshot is that anything you compress is likely to be less resistant
to errors. In my experience, a single error in a gzipped file spells
doom, since everything after that point is unrecoverable. Block
compression methods can usually resynchronise at the next block - bzip2
does this, and there are patches for gzip that achieve the same result -
but you still lose data.
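The gzip fragility above is easy to demonstrate with Python's zlib module
(gzip wraps the same deflate stream in a different container): flip a
single bit in a compressed stream and, from that point on, decompression
either fails outright or no longer yields the original data.

```python
import zlib

# Compressible sample data, deflate-compressed.
data = b"the quick brown fox jumps over the lazy dog " * 50
compressed = zlib.compress(data)

# Flip a single bit in the middle of the compressed stream.
corrupted = bytearray(compressed)
corrupted[len(corrupted) // 2] ^= 0x01

try:
    out = zlib.decompress(bytes(corrupted))
    corrupted_ok = (out == data)   # it decoded, but is the data intact?
except zlib.error:
    corrupted_ok = False           # stream (or its checksum) is broken

# corrupted_ok is False: one flipped bit costs the rest of the stream.
```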
Encrypting a file can produce pretty much any kind of output. However,
it can't be completely random, since it still carries information. A
compression modeller that understands the patterns produced by the
encryption algorithm would do better than a general-purpose modeller,
but I suspect knowledge of the plaintext would be needed. If the
encrypted text looks 'too random' to the modeller, it won't compress
well at all. In fact, it can be shown that for any compression model
there is always a non-empty set of files that actually compress to a
larger size. It can also be shown that the 'arithmetic encoder' is the
theoretically optimal encoder, and it is already widely used. The
advances in compression are in the area of models now - the better you
can predict the next symbol, the fewer bits you need to encode it.
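The 'some files must get bigger' point follows from the pigeonhole
principle, and data that already looks random - as good ciphertext
should - shows it directly. A small sketch with Python's zlib, where
fixed-seed pseudo-random bytes stand in for encrypted output:

```python
import random
import zlib

# Highly redundant text compresses dramatically.
text = b"the quick brown fox jumps over the lazy dog " * 100
text_ratio = len(zlib.compress(text)) / len(text)

# Bytes that look uniformly random (standing in for ciphertext; a
# fixed seed keeps the run repeatable) typically come out slightly
# *larger*, since deflate falls back to stored blocks plus overhead.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(len(text)))
noise_ratio = len(zlib.compress(noise)) / len(noise)
```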
Anyway, I'm rambling now - thanks for the heads-up on parchive, that
one's new to me.
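Parchive itself uses Reed-Solomon codes, but the basic idea of injecting
redundant recovery information can be sketched with plain XOR parity (a
toy Python illustration, not Parchive's actual algorithm):

```python
# Equal-length data blocks; real tools pad blocks to a common size.
blocks = [b"data block 1", b"data block 2", b"data block 3"]

def xor_parity(parts):
    # XOR all parts together, byte by byte. Any single missing block
    # can be rebuilt by XOR-ing the parity with the surviving blocks.
    parity = bytearray(len(parts[0]))
    for part in parts:
        for i, byte in enumerate(part):
            parity[i] ^= byte
    return bytes(parity)

parity = xor_parity(blocks)

# Simulate losing the middle block, then rebuild it from the parity
# block plus the survivors.
recovered = xor_parity([parity, blocks[0], blocks[2]])
assert recovered == blocks[1]
```

Reed-Solomon generalises this: with k parity blocks you can recover any
k missing data blocks, not just one.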
--
David.