Scaling problems when calculating asset usage

While our Neos projects grow, I am identifying more and more parts of Neos which are difficult to scale with big numbers of nodes.

My current problem is the asset usage calculation done in AssetUsageInNodePropertiesStrategy - it does a like query searching for asset references in the property fields of all nodes. With about 500K nodes currently, this takes some time. The details view of an asset needs about 6 seconds to render, checking about 10K assets for unused ones about 14 hours.

My first question goes to the guys that are more deeply involved in the CR rewrite (@Nezaniel, @sebastian) - is this problem addressed in the new CR?

Are there any solutions out there to get this operation faster?

One solution I can think of, is to add another table besides the nodeData table with asset references to assist a faster lookup of asset usages.
The downside of course is the effort to keep the nodeData and this referencing table in sync which is always a hard task.
Another one is to use elastic for the lookups (having elastic in sync with the node data is already crucial in this projects)

Any thoughts on this would be very welcome.

Cheers,
Daniel

I think the rewritten CR can address the problem with a custom property projection. But it’s not in there yet.

I have had some success with using a mysql fulltext index on properties and then a custom Strategy using that. YMMV though.
Another option is ES which should work fine too for this with a custom strategy.

An additional table would be beneficial for assets as well as eg. urlPathSegments. It’s difficult to abstract though and obviously keeping it in sync will be tricky. But could be a way to go.

Hey @christianm,

thanks for your reply!

Two questions. First, if mySQL fulltext is a solution for that problem - wouldn’t that help also with other flowQuery operations? Would that be a problem with doctrine / database compatibility?

Is there a better way to exchange the Strategy than AOPing its methods?

The proper way in the new CR would be having assets as nodes in the CR and evaluating reference edges which are part of the default graph projection.

Since, as Christian pointed out, some time will pass until the new CR is ready, you could mimick the behaviour with a separate table. MySQL fulltext on properties sounds scary at half a million nodes but I might be wrong there

Thanks for the replies,
I guess I will go for the ES approach wich solved all my scaling problems till now :wink:

Let’s see that we get the magical new CR soon!

Cheers
Daniel

In the mean time, we can try to really use the JSON field type, in MySQL or at least in Postgres, and avoid those ugly like in the property fields, than can also help.

@daniellienert since we also had this problem I wonder if you came to a “good” solution.

For our case we “simply” disabled the usage counting in our environment by using AOP and overriding the functions that called it:

@Flow\Around("method(Neos\Neos\Domain\Strategy\AssetUsageInNodePropertiesStrategy->getUsageReferences())")
@Flow\Around("method(Neos\Neos\Service\ImageVariantGarbageCollector->removeUnusedImageVariant())")

… and live with the consequences (no “usage count” in the detail view and potential uncollected garbage of assets when an asset is replaced in some node).

… and the main problem: Assets may accidently be deleted by editor and a ./flow media:removeunused would be disastrous.

I fixed it using the Elasticsearch index: https://github.com/punktDe/elastic-assetusageinnodes

2 Likes