That MD5 has been deprecated for a while now is certainly also true, and people should definitely have moved on by now.
Then again, I just got yet another Debian DSA mail which has plaintext download links for new binaries. The integrity verification mechanism for said binaries is, you guessed it: PGP-signed md5sums.
We still have a long way to go. 
– S
I can assure you that you will continue to receive these messages for
a while (unless you unsubscribe from the relevant mailing lists).
Our rationale is that in order to carry out currently known attacks on
MD5, you need to create a pair of twin documents, one evil and one
harmless, and get the harmless one signed. In Debian's case, we prepare
the data we sign on our trusted infrastructure. If someone can sneak in
an evil twin due to a breach, more direct means of attack are available.
In practice, the download links themselves are the larger problem
because users might use them without checking anything. Eventually,
they will go away, together with the MD5 hashes. Newer versions of
APT also use the SHA-256 checksums embedded in the Release and
Packages files.
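The check that the bare download links skip boils down to comparing a file's hash with a value taken from a trusted, signed source. A minimal Python sketch, assuming the expected SHA-256 value has already been extracted from a signature-verified file; `sha256_file` and `verify_download` are hypothetical helpers for illustration, not APT code:

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with an expected hex digest.

    Assumption: expected_hex comes from an already-authenticated source
    (e.g. a signed index file). No signature checking happens here.
    """
    actual = sha256_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

The hash only helps if the expected value itself is authenticated; a checksum published next to an unauthenticated link can be swapped along with the file.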
For me, the MD5 hashes on file downloads are more valuable for ensuring that
the package is accurate to the byte than for verifying its authenticity or
integrity.
Wouldn't listing both SHA-1 and MD5 hashes for a file download give almost
complete confidence that the file is the original one? I don't think anyone
has been able to create a duplicate file that generates the same SHA-1 *and*
MD5 hashes as the original file.
Frank
I would not be too sure. MD5 only makes one pass over the data.
Suppose that I find two messages, M1 and M2, that have the same MD5 hash - there are methods out there to do that.
M1 is the good message, M2 is the bad message.
Let "||" be the concatenation operator.
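So, for any string S,
M1||S and M2||S have the same MD5 hash (provided M1 and M2 have the same
length and collide in MD5's internal state, which is what the known
collision methods produce).
So, if I can find an S such that the SHA-1 hashes of M1||S and M2||S
are also the same, the MD5 hashes of these messages will still be the same, and you
have your feared condition.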
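The suffix argument above relies on MD5's iterated (Merkle-Damgard) structure: once two equal-length messages reach the same internal state, any common suffix keeps them colliding. A toy Python sketch, assuming a deliberately tiny 16-bit chaining value so that the internal collision can be brute-forced; `compress`, `toy_hash`, the 8-byte block and the padding are illustrative assumptions, not MD5's real parameters:

```python
import hashlib
import itertools

BLOCK = 8           # toy block size in bytes (MD5 uses 64)
IV = b"\x00\x00"    # toy 16-bit initial chaining value


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated MD5 of (state || block).

    The 16-bit output makes internal-state collisions cheap to find."""
    return hashlib.md5(state + block).digest()[:2]


def pad(msg: bytes) -> bytes:
    """Merkle-Damgard style padding: 0x80, zeros, then an 8-byte length."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    return padded + len(msg).to_bytes(BLOCK, "big")


def toy_hash(msg: bytes) -> bytes:
    """One pass of the compression function over the padded message."""
    state = IV
    data = pad(msg)
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state


def find_block_collision() -> tuple[bytes, bytes]:
    """Birthday search for two distinct one-block messages M1, M2 that
    reach the same chaining value after the first block."""
    seen: dict[bytes, bytes] = {}
    for n in itertools.count():
        block = n.to_bytes(BLOCK, "big")
        cv = compress(IV, block)
        if cv in seen:
            return seen[cv], block
        seen[cv] = block


m1, m2 = find_block_collision()
for suffix in (b"", b"evil payload", b"x" * 100):
    # Same state after M1/M2, identical remaining blocks -> same hash.
    assert toy_hash(m1 + suffix) == toy_hash(m2 + suffix)
```

Real MD5 collisions come from differential cryptanalysis rather than brute force, but once found they propagate through any common suffix in the same way.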
My understanding is that one type of collision search involves using an S
and trying to find collisions of
M1 and M2||S by varying S. Modifying this to a common S does
not seem that different, and I would not want to bet a lot on it being fundamentally much
more difficult. (It might be, it might not be; I have no idea. The
question is: how much are you willing to bet on it?)
Regards
Marshall
* Frank Bulk:
For me, the MD5 hashes on file downloads are more valuable for ensuring that
the package is accurate to the byte than for verifying its authenticity or
integrity.
Indeed. I've experienced that first-hand: the hashes helped to
isolate a case of faulty router memory at the ISP I used at home.
(The TCP checksum is very weak: it cannot detect two bit errors that
are a multiple of 16 bits apart and flip the same bit position in
opposite directions, 0 to 1 in one word and 1 to 0 in the other. If the
error rate is so high that two such errors occur in a single segment,
they very likely cancel out, which was exactly what I observed in the
issue mentioned above.)
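The blind spot described above can be reproduced with the 16-bit ones'-complement sum that TCP uses (RFC 1071). A minimal Python sketch; it omits the TCP pseudo-header and uses a hypothetical 4-byte payload, with `internet_checksum` and `flip_bit` as illustrative helpers:

```python
import hashlib


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement sum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (bit 0 = MSB of byte 0)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(buf)


original = b"abcd"  # two 16-bit words: 0x6162, 0x6364
# Bit 14 is a 1 in word 0 (0x62 has 0x02 set); bit 30, 16 bits later,
# is a 0 in word 1 (0x64 lacks 0x02). Flipping both changes the sum by -2 and +2.
corrupted = flip_bit(flip_bit(original, 14), 30)  # b"a`cf"
single = flip_bit(original, 14)                   # one error on its own

assert internet_checksum(corrupted) == internet_checksum(original)  # missed
assert internet_checksum(single) != internet_checksum(original)     # caught
assert hashlib.md5(corrupted).digest() != hashlib.md5(original).digest()
```

Any cryptographic hash over the whole file, even MD5, catches this kind of accidental corruption, which is the use Frank describes.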
Wouldn't listing both SHA-1 and MD5 hashes for a file download give almost
complete confidence that the file is the original one? I don't think anyone
has been able to create a duplicate file that generates the same SHA-1 *and*
MD5 hashes as the original file.
For most applications, it's better to include a totally random string
at the beginning of the message, before signing it, and strip it upon
signature verification (or encode it in a way so that it is simply
ignored by the application). The convergent property of hash
functions (if, by chance, two people come up independently with the
same document, it hashes to the same value) is rarely needed. A
random string near the beginning means that the attacker doesn't know
the initial internal state of the hash function when the collision is
constructed, which should make attacks involving evil twins much, much
harder.
I expect that at a not too distant point in the future (say, within
the next ten years), strong hashes keyed in such a way will offer very
significant performance gains over non-keyed, but still strong, hashes,
so that most protocols which do not rely on the convergence property
will switch to them. Convergence might even turn out to be too
costly, not just in terms of performance, but in security. (And I
write this as a frequent Git user. 8-/)
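The random-prefix idea can be sketched as follows, assuming the signer picks the prefix, signs the resulting digest, and ships the prefix alongside the message. The signature step itself is left out, and `randomized_digest`, `check` and the 16-byte prefix length are hypothetical choices for illustration:

```python
import hashlib
import secrets

PREFIX_LEN = 16  # assumed prefix length in bytes


def randomized_digest(message: bytes) -> tuple[bytes, bytes]:
    """Prepend a fresh random prefix chosen by the signer, then hash.

    An attacker who prepared an evil twin in advance does not know this
    prefix, so the hash's internal state at the colliding blocks differs
    from the one the collision was built for."""
    prefix = secrets.token_bytes(PREFIX_LEN)
    return prefix, hashlib.sha256(prefix + message).digest()


def check(message: bytes, prefix: bytes, digest: bytes) -> bool:
    """Recompute the digest with the transmitted prefix; the application
    then discards the prefix and uses the message as-is."""
    return secrets.compare_digest(
        hashlib.sha256(prefix + message).digest(), digest)
```

Signing the same document twice yields different digests, which is exactly the convergence property given up in exchange.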
More to the point - there are known easy ways for an attacker to generate *two*
documents that have the same MD5 hash (the basis of this attack). However, the
attacker has no control over what the actual value of that MD5 hash is.
What is still *not* feasible is for an attacker to take Debian's data and the
already-generated MD5 hash, and create a second file that hashes to that
same already-known hash.
At that point, it's probably easier to just attack the trusted infrastructure
in an attempt to recover the GnuPG private key, and then just sign your
evil replacement package. There are two advantages to this attack:
1) It doesn't *matter* whether they PGP-sign a file of MD5 hashes or one
of SHA-1 or SHA-512 hashes; the signature will look fine.
2) It has been shown to be doable against at least one major distro in the past few months.
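The gap between generating *two* colliding documents and matching an already-known hash can be seen at toy scale. A Python sketch, assuming MD5 truncated to a hypothetical 24-bit width so the search finishes quickly; it uses a generic birthday search, not the cryptanalytic technique actually used against MD5, and `toy_digest` and `find_collision` are illustrative names:

```python
import hashlib
import itertools

BITS = 24  # toy hash width; real MD5 is 128 bits


def toy_digest(data: bytes) -> int:
    """First BITS bits of MD5, standing in for a full-width hash."""
    return int.from_bytes(hashlib.md5(data).digest(), "big") >> (128 - BITS)


def find_collision(prefix: bytes) -> tuple[bytes, bytes, int]:
    """Birthday search: roughly 2**(BITS/2) tries on average.

    The attacker ends up with *some* colliding pair, but has no say in
    which digest value the two documents share."""
    seen: dict[int, bytes] = {}
    for n in itertools.count():
        doc = prefix + str(n).encode()
        d = toy_digest(doc)
        if d in seen:
            return seen[d], doc, d
        seen[d] = doc


# Matching a digest fixed in advance (a second preimage) would instead
# need on the order of 2**BITS tries by brute force at this width; for
# full 128-bit MD5 no practical second-preimage attack is known.
```

At 24 bits the collision search needs a few thousand hashes, while a second preimage needs millions; at 128 bits the same asymmetry separates what is practical from what is not.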