As you may or may not know, our resource management relies heavily on SHA-1. Basically, every resource is identified by a SHA-1 hash over its contents. That is all pretty great, but given that SHA-1 collisions can be manufactured these days, we should think about a replacement rather quickly, IMHO. So either we use a better hashing algorithm with less collision potential, or we rethink the whole thing. I guess the former is the better option, but as this is a complicated and rather breaking change, I would like to discuss how you feel about it and what solutions you see.
Since the hashing is not security related, but only used to identify the resource, I think it’s not too big of a problem.
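For anyone not familiar with the idea, here is a minimal sketch of content-based identification. It is not the actual Neos/Flow code; the function name `resource_id` and the `algorithm` parameter are made up for illustration.

```python
import hashlib


def resource_id(content: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of the content; this digest acts as the resource's identity.

    Hypothetical helper: the algorithm is a parameter only to show
    what "use a better hashing algorithm" would change.
    """
    return hashlib.new(algorithm, content).hexdigest()


# Identical bytes always yield the same identifier, whatever the filename.
sha1_id = resource_id(b"some file contents")
sha256_id = resource_id(b"some file contents", "sha256")
print(len(sha1_id), len(sha256_id))  # 40 and 64 hex characters
```

Swapping the algorithm changes every identifier, which is why the replacement would be a breaking change.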
Concerning the probability of SHA-1 hash collision I found this answer:
So to run into the problem by accident is highly unlikely.
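To put a rough number on "highly unlikely", here is a back-of-the-envelope birthday-bound estimate. It assumes SHA-1 behaves like a uniformly random 160-bit function on non-adversarial input, so it says nothing about deliberately manufactured collisions. The function name is made up, and the 1.2 million figure is the peak resource count mentioned later in this thread.

```python
def accidental_collision_probability(n: int, bits: int = 160) -> float:
    """Approximate chance that any two of n random inputs share a hash.

    Uses the standard birthday approximation: (number of pairs) / (number of hash values).
    Only valid while the result is much smaller than 1.
    """
    pairs = n * (n - 1) / 2
    return pairs / 2**bits


p = accidental_collision_probability(1_200_000)
print(f"{p:.2e}")  # on the order of 5e-37
```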
Correct me if I’m wrong:
I guess the hypothetical attack vector would be an adversary who constructs a bad document that collides with the hash of a good document that has already been uploaded, then somehow uploads the bad document so that it gets delivered to all users who access the hash link of the good one.
So as long as the file of an existing hash isn’t overwritten, this is no issue, right?
Correct, the Neos asset management will not overwrite an existing resource with the same filename.
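As a toy illustration of why "never overwrite" defuses that scenario, here is an in-memory model. It is not how Neos stores resources; `ResourceStore` and `hash_fn` are hypothetical names, and the deliberately weak hash in the usage example only stands in for a manufactured SHA-1 collision.

```python
import hashlib
from typing import Callable, Dict


class ResourceStore:
    """First writer wins: content already stored under a hash is never replaced."""

    def __init__(self, hash_fn: Callable[[bytes], str] = lambda c: hashlib.sha1(c).hexdigest()):
        self._hash_fn = hash_fn
        self._by_hash: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = self._hash_fn(content)
        # setdefault keeps the existing entry if the key is already taken.
        self._by_hash.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self._by_hash[key]


# Simulate a collision with a weak stand-in hash (content length only).
store = ResourceStore(hash_fn=lambda c: str(len(c)))
good_key = store.put(b"good")
evil_key = store.put(b"evil")   # same length, so same key
print(good_key == evil_key, store.get(good_key))  # True b'good'
```

The colliding upload is silently ignored, so everyone following the existing link still gets the original document.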
If I wanted to use that attack vector, I could also go ahead and replace the file in question and check the option Keep the filename "thisIsNoDangerousFileAtAll.pdf". This way I would not even have to think about how to produce a SHA-1 collision.
It’s a similar issue when someone has access to your filesystem. The attacker could replace the underlying resource (again, this has nothing to do with the automated way resource names are generated).
So I don’t think that switching to a different hashing mechanism has any benefits security-wise.
This is correct, there is no direct attack vector for this at the moment. The only question is whether we want to pre-emptively move to a less collision-prone system.
On the one hand, I am pretty sure that this can be exploited somehow, just looking at other bugs that are exploited nowadays. On the other hand, I would be reluctant to alter the resource handling now just for that, especially since we have to adjust the asset handling anyway, and this might make further changes necessary.
For now, if we can just replace the hashing algorithm with a smooth migration (it might take some time during deployment), maybe we should do so.
So if we changed the hashing algorithm and ran a migration, every resource path on the website would change (which could well be A LOT of resources on some sites). For each resource we’d also need to generate a redirect (because there could be direct links to resources from external websites, for example). Some sites you might even have to take down for a little while during the migration to prevent broken pages.
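To make the scale of that concrete, here is a rough sketch of what such a migration would have to produce: one redirect per resource. The `/resources/<hash>/<filename>` layout, the choice of SHA-256 as the new algorithm, and the `build_redirects` name are all assumptions for illustration, not the real Neos paths or an agreed plan.

```python
import hashlib
from typing import Dict


def build_redirects(resources: Dict[str, bytes]) -> Dict[str, str]:
    """Map every old SHA-1 based path to its new SHA-256 based path.

    `resources` maps a filename to its content (hypothetical input shape).
    """
    redirects = {}
    for filename, content in resources.items():
        old_path = f"/resources/{hashlib.sha1(content).hexdigest()}/{filename}"
        new_path = f"/resources/{hashlib.sha256(content).hexdigest()}/{filename}"
        redirects[old_path] = new_path
    return redirects


redirects = build_redirects({"report.pdf": b"%PDF-1.4 ...", "logo.png": b"\x89PNG..."})
print(len(redirects))  # one redirect per resource
```

Every resource has to be read and re-hashed in full, so the cost grows with both the number and the size of the files.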
Forcing everyone to go through this process just does not sound worth it to me, since we all agree that there is no immediate cause for action.
Making it optional or mixing hashing algorithms (like just using another hash for newly uploaded resources) does not sound reasonable either because this could make further adaptations in this area more complicated for us.
If there’s no attack vector I’d also like to ask to avoid this.
We cleaned things up before moving to Google Cloud, but at peak times we already had about 1.2 million resources, and migrating that many is tricky, to say the least.
So if there’s a security risk at some point, I’d say go for it, and everyone just has to make sure the upgrade works. But if it’s just something that could theoretically happen, I’d avoid it, since it can potentially create a huge workload for some projects.