Senator Diane Feinstein Wants to know about the Benefits of P2P

Date: Mon, 30 Aug 2004 16:39:56 -0400
From: Mike Tancsa <mike@sentex.net>

>yep md5 made the news recently because it's been cracked:
>
>http://techrepublic.com.com/5100-22-5314533.html
>http://www.rtfm.com/movabletype/archives/2004_08.html#001055

That's a misleading oversimplification. A collision being found implies
something different from "it's cracked." A weakness that was theorized
some time ago has been demonstrated in practice. Finding collisions and
altering files in a useful way to produce a duplicate hash are different
things. There are FAR bigger security concerns than this one right now IMHO.

I recall even seeing posts about people claiming this meant original data
being reconstructed from the checksum! That would be truly amazing since I
could reconstruct a 680MB ISO from just 61d38fad42b4037970338636b5e72e5a. Wow!

Actually...

The "collision" problem discovered means that there might be MULTIPLE 680MB
files that give the same checksum.

Of course, the utility of most of these files would be an exercise left to the
'cracker' if you were looking for an OS patch but ended up with the contents of
an encyclopedia.

Regards,
Gregory Hicks

Gregory Hicks wrote:

The "collision" problem discovered means that there might be MULTIPLE 680MB files that give the same checksum.

There MUST BE multiple 680 MB files that give the same checksum. A
checksum is a many-to-one operation. If MD5 were a "perfect" checksum,
it would map each 128-bit string to a distinct 128-bit hash. For
129-bit strings, however, you have twice as many initial strings but
still just 128 bits of hash value. If the checksum is perfectly
distributed, each 128-bit hash would have to correspond to _two_ initial
strings.

If you think about it for a second, if you have an initial bit string
of length 'n' and a hash of length 'm,' the number of collisions for a
perfectly distributed checksum (the number of initial strings that
produce the same hash) is,

  2^(n - m)

Now, a 680 MB file is about a 5704253440-bit string. That would imply
there are about 2^5704253312 strings that correspond to that one hash.
Good luck generating _one_ of those. And extra good luck in finding
the ones of the 2^5704253312 that comply with ISO 9660. And a tad
more good luck in finding the ones in there that might make sense
being the particular ISO image you want.
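
To make the arithmetic above concrete, here is a small Python sketch. It
assumes 1 MB = 1024 * 1024 bytes (which is what the 5704253440 figure
implies) and a perfectly uniform 128-bit hash. The function and constant
names are made up for illustration.

```python
# Sketch: how many n-bit inputs share each value of an ideal m-bit hash.
# Assumes 1 MB = 1024 * 1024 bytes and a perfectly uniform hash.

def preimage_exponent(n_bits: int, m_bits: int) -> int:
    """Return k such that 2**k n-bit strings map to each m-bit hash value."""
    if n_bits < m_bits:
        raise ValueError("input shorter than hash; no collisions are forced")
    return n_bits - m_bits

ISO_BITS = 680 * 1024 * 1024 * 8   # a 680 MB file, in bits
MD5_BITS = 128

print(ISO_BITS)                                # 5704253440
print(preimage_exponent(129, MD5_BITS))        # 1, i.e. two strings per hash
print(preimage_exponent(ISO_BITS, MD5_BITS))   # 5704253312
```

Only the exponent is computed; the count 2^5704253312 itself is an integer
with well over a billion decimal digits.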

The issue with MD5 is that there may be techniques for an attacker to
generate collisions (and again, there MUST BE collisions) using many
fewer operations than a brute force approach. The brute force approach
has always existed. There is no perfectly secure crypto, only crypto
that is "secure enough for this application for now." MD5 is still
safe enough for most applications for now (hell, DES is safe enough
for most applications for now). However, it should be looked at as
deprecated and phased out, i.e. not used in new protocols and products.
The Death of the Internet has not been predicted. There will be no
film at eleven.
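
For a feel of what the generic approach costs, here is a toy sketch that
finds a collision in MD5 truncated to 24 bits by hashing counters until two
digests match. The truncation and the counter-based inputs are assumptions
made to keep it fast; this is the plain birthday search, not the published
attack on MD5. For a t-bit hash the search needs on the order of 2^(t/2)
tries, so against full 128-bit MD5 it would need about 2^64, and that is
the baseline the new techniques improve on.

```python
import hashlib

def truncated_md5(data: bytes, n_bytes: int = 3) -> bytes:
    """First n_bytes of the MD5 digest (a toy hash, 24 bits by default)."""
    return hashlib.md5(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 3):
    """Hash b"0", b"1", b"2", ... until two inputs share a truncated digest."""
    seen = {}  # truncated digest -> first input that produced it
    i = 0
    while True:
        msg = str(i).encode()
        tag = truncated_md5(msg, n_bytes)
        if tag in seen:
            # Two different inputs, same truncated digest.
            return seen[tag], msg, i + 1
        seen[tag] = msg
        i += 1

a, b, tries = find_collision()
print(a, b, tries)
```

The loop is guaranteed to stop: with only 2^24 possible truncated digests,
at most 2^24 + 1 distinct inputs can be hashed before one repeats.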

I recall even seeing posts about people claiming this meant original data
being reconstructed from the checksum! That would be truly amazing since I
could reconstruct a 680MB ISO from just 61d38fad42b4037970338636b5e72e5a. Wow!

Assuming that MD5 behaves like a PRF, a fraction of about 2^{-128} of
all files will have such a hash value. For files 680MB in size, that is
about 2^{680*1024*1024*8-128} files in total. If I had a list of all of
those files, it would be impossible for me to identify which of them was
the 'right' image.

First-preimage resistance means that it should be computationally
infeasible for anyone to create *any* file with that particular
hash. It was also believed to be computationally infeasible to find
*any* two files that had the same MD5 hash. The attack on MD5 showed
that it in fact is computationally feasible to find two files with the
same MD5 --- someone did it. This attack showed that MD5 no longer
meets some of its design requirements.

The "collision" problem discovered means that there might be
MULTIPLE 680MB files that give the same checksum.

Scott

Actually...

  None of the demonstrated collisions are in a file approaching
anything close to the size of a typical CD. They are only 1024 bit (128
byte) files, and the found collisions only differ from the original by
a few bytes.

  Finding a collision in a 200+ MB patch file is not terribly useful
unless you can actually make the patch do something it shouldn't, or not
do something it should. This is computationally expensive in the extreme.

  And even if you manage to do so, odds are that your file, even if it
is both detrimental and a collision in MD5, would not also be a collision
when hashed with SHA-1, SHA-256, SHA-512, and the like.

  I could quite easily avoid this problem by hashing the source file
using a few different algorithms and comparing all of those hashes to the
same hashes of the received file.
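
  A minimal sketch of that multi-hash check, using Python's standard
hashlib. The function names, the default algorithm list, and the sample
data are assumptions for illustration, not anything from a particular
distribution's tooling.

```python
import hashlib
import io

def multi_digest(stream, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Read a binary stream once and return {algorithm: hex digest}."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}

def verify(stream, published):
    """True only if every published digest matches the stream's contents."""
    if not published:
        return False  # nothing to compare against is not a pass
    actual = multi_digest(stream, algorithms=tuple(published))
    return all(actual[name] == value.lower() for name, value in published.items())

# Stand-in for the sender's file and the digests they would publish.
original = b"pretend this is the patch file"
published = multi_digest(io.BytesIO(original))

print(verify(io.BytesIO(original), published))          # True
print(verify(io.BytesIO(original + b"!"), published))   # False
```

  For a real download you would pass open(path, "rb") instead of the
BytesIO stand-in; the file is read in chunks, so a CD-sized image does not
have to fit in memory.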

  There have been some near collisions (on modified versions of MD5)
in existence for several years; the fact that MD5 is not a perfect hashing
algorithm is not a surprise. MD5 is weaker than previously thought, sure.
But is this really likely to be a problem for network operators soon? I
don't think so, although people should evaluate these risks for themselves.

  --msa