RFC: Asynchronous thumbnail generation in media browser

TL;DR

Support asynchronous thumbnail generation in the image view helpers to avoid long loading times in the media browser.

Problem

The media browser displays up to 30 thumbnails in a single view. They are generated on the fly, which blocks the view until all thumbnails have been created. Since generating a thumbnail for a high-resolution image can easily take a couple of seconds, load times can exceed 60 seconds, resulting in timeouts (blank screens) or long waits. This is a terrible user experience, so we need to improve it.

Once thumbnails have been generated, rendering is much faster, but the problem never goes away: as soon as the thumbnails have been cleared, it happens again.

Additional info

A client of mine with many images is having major issues with this problem and has offered to fund me to do the work.

Goal

  • Remove render blocking to make the pages load in 0-2 seconds by generating thumbnails asynchronously

Technical how

  • Add asynchronous flag to view helpers
  • Enable asynchronous flag for media browser views
  • Asset service checks if thumbnail exists and has a resource
  • If it exists and has a resource, the resource URI is returned
  • If it exists but does not have a resource, an asynchronous link is returned
  • If it doesn’t exist, a new asynchronous thumbnail domain model is created and saved in the database, but not generated (no resource linked), and an asynchronous link is returned
  • Asset service returns a URL to a thumbnailAction in ThumbnailController with the thumbnail
  • thumbnailAction controller accepts thumbnail identifier as argument, generates the thumbnail and redirects to the thumbnail URI
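
As a rough illustration of the flow listed above, here is a minimal sketch in Python (not the Flow/PHP implementation). The class and function names, the in-memory dictionary standing in for the database, and both URL patterns are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple
import uuid


@dataclass
class Thumbnail:
    identifier: str
    resource_uri: Optional[str] = None  # None means "not generated yet"


@dataclass
class AssetService:
    # Hypothetical stand-in for the database, keyed by (asset id, width, height).
    thumbnails: Dict[Tuple[str, int, int], Thumbnail] = field(default_factory=dict)

    def get_thumbnail_uri(self, asset_id: str, width: int, height: int,
                          asynchronous: bool = False) -> str:
        key = (asset_id, width, height)
        thumbnail = self.thumbnails.get(key)
        if thumbnail is None:
            # Persist a resource-less thumbnail so it can be generated later.
            thumbnail = Thumbnail(identifier=uuid.uuid4().hex)
            self.thumbnails[key] = thumbnail
        if thumbnail.resource_uri is None and not asynchronous:
            # Without the asynchronous flag, generation still blocks rendering.
            self.generate(thumbnail)
        if thumbnail.resource_uri is not None:
            return thumbnail.resource_uri
        # Asynchronous link: only the thumbnail identifier is exposed.
        return f"/media/thumbnail/{thumbnail.identifier}"

    def generate(self, thumbnail: Thumbnail) -> str:
        # Placeholder for the real image processing; the path is made up.
        thumbnail.resource_uri = f"/_Resources/thumbnails/{thumbnail.identifier}.jpg"
        return thumbnail.resource_uri


def thumbnail_action(service: AssetService, identifier: str):
    """Generate the thumbnail if needed and redirect to its resource URI."""
    for thumbnail in service.thumbnails.values():
        if thumbnail.identifier == identifier:
            if thumbnail.resource_uri is None:
                service.generate(thumbnail)
            return 302, thumbnail.resource_uri
    return 404, None
```

With this shape, a view that passes `asynchronous=True` gets a URI back immediately, and the expensive work happens only when the browser follows the link for that one image.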

Benefits

  • Generic solution can be used in other places than the media browser
  • Only asynchronous if thumbnail doesn’t already exist
  • Doesn’t disclose any information about the resource (only thumbnail identifier exposed)
  • Cannot be used for generating random thumbnails (only thumbnail identifier exposed)
  • Backwards compatible
  • No rewrite rules necessary

Challenges

  • Are asynchronous (not yet generated, resource-less) thumbnails allowed in the database? (data integrity)
  • Can they cause any problems in other areas?
  • Multiple requests will take longer and require more computing power
  • Should we limit the amount of concurrent thumbnail requests using JavaScript?

Alternative possibilities

  • Stream the image in the thumbnail controller instead of redirecting
  • Generate thumbnail in a subprocess immediately (faster response in controller requests)

Additional options

  • Load images asynchronously using JavaScript and display loading indicator (front-end performance)

@christianm, @dfeyer, @robert: Maybe you could provide some feedback? Thanks :smile:

Thanks for this RFC. I have some thoughts about this too, and an alternative approach: generating thumbnails during upload / resource creation based on presets. Let’s say:

  1. The editor uploads an image
  2. Neos has a preset with all dimensions required for backend usage (link editor, image editor, media browser, …); alternatively, the preset can be extended to generate the thumbnails required for frontend usage.
  3. The thumbnails are generated during the upload. The service can be implemented based on an interface, so that it can be backed by a job queue.

I think this way is “easier” to implement, but the performance impact during upload needs to be tested carefully. This solution is also more pragmatic and can work in a frontend context without requiring JavaScript. If the thumbnail is pregenerated it’s fast; if not, it’s generated on demand.

If we can avoid creating thumbnails during rendering, we can get a huge boost in performance for image-centric websites.

I also think that we need more flexibility in Fluid to be able to use an external thumbnail service, like https://github.com/thumbor/thumbor

Thanks for the feedback.

I did consider this, but would strongly advise against it. It moves the concern to a whole different place where it really doesn’t belong. It could be done so that new uploads fire a signal and the generation happens asynchronously; however, solutions like these create a lot of technical debt and interfere with all uploads to the media library. How do you make sure that if the templates are changed, the thumbnail generation is changed too? It also has the downside of being useless once the thumbnails are cleared.

Also, I think you misunderstand something: the proposed solution doesn’t require any JavaScript and can work in any context. So I don’t really think this alternative is easier to implement.

Regarding allowing the use of thumbnail services: that can make sense, but it is out of scope for this RFC.

OK, yes, I see, but my main issue with this is the number of requests to the backend.

We have a project that currently uses the new ThumbnailGenerator; 30 simultaneous requests to generate PDF previews will kill the server (same problem with subprocesses if we try to generate thumbnails in parallel). And without JS we don’t have a way to limit the number of HTTP requests the browser will trigger.

The JS loader for the media browser can work, but doesn’t solve the issue in the frontend.

Thumbnails can be tagged if they are generated based on a preset; if you change the preset, you can flush the tag to regenerate only those thumbnails.

Projects like thumbor use a random TTL for a generated image, e.g. setting the TTL during thumbnail generation to 90 days +/- 7 days. The thumbnail is deleted after the TTL. This can be a solution to the problem we currently face when thumbnail sizes are changed in templates. It’s out of scope for this RFC, but I think we need to discuss this topic at some point.
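
To make the randomized TTL idea concrete, here is a small Python sketch. It only illustrates the "90 days +/- 7 days" rule as described in this post; the function name and parameters are hypothetical and not taken from thumbor or Neos.

```python
import random
from datetime import datetime, timedelta


def expiry_with_jitter(generated_at, base_days=90, jitter_days=7, rng=random):
    """Return an expiry time of base_days +/- jitter_days after generation.

    Spreading the expiry dates avoids having every thumbnail that was
    generated together expire (and need regeneration) at the same moment.
    """
    offset = rng.uniform(-jitter_days, jitter_days)
    return generated_at + timedelta(days=base_days + offset)
```

A cleanup task could then delete every thumbnail whose stored expiry time lies in the past, and the next request would regenerate it on demand.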

Hey everybody,

I tend to like the solution proposal of @aertmann more, as I feel it makes the system more robust (e.g. when thumbnail sizes change etc), or when the thumbnails get deleted (for whatever reason). We recently had this case, and it was very helpful that we just needed to re-create the original assets and the thumbnails were autogenerated afterwards.

So, IMO, transient data like thumbnails must be regeneratable at all times, leading to a more resilient system than if the system would rely on the fact that thumbnails are always available.

Can’t we just combine both approaches: When uploading, try to proactively render a thumbnail after the upload, but have the other as fallback?

All the best,
Sebastian

Thanks for the feedback.

Good point, it is possible to do both. However, that’s also more work, and it still has the problem of being completely separated from the view and of delaying all image uploads, including in places where that’s not wanted, e.g. uploading images in the inspector.

Additionally we currently have at least four different thumbnail sizes used for images: the image view, the list view (mini and zoom – only fetched from server when needed) and the inspector. Generating all of those doesn’t make much sense, since they’re only potentially needed, thus the generation as needed solution is preferable.

Unless we decide to create some view-independent sizes, like two or three generic sizes, and use them in the views instead. That would mean less optimal image file sizes, but would streamline things. It could even be configurable in settings, like:

TYPO3:
  Media:
    autoGeneratedThumbnails:
      small:
        width: 50
        height: 50
      large:
        width: 200
        height: 200

And use that configuration in the views width="{settings.autoGeneratedThumbnails.small.width}"

Another approach is to create a command controller that looks for ungenerated thumbnails and generates them. This wouldn’t have the regeneration problem and would be opt-in. The challenge would be creating those ungenerated thumbnails, although this approach has the exact same problems except for not delaying uploads.

Btw. it’s also possible to put all this behind a feature flag, so it’s possible to opt-in or out depending on the default.

So actually, to summarize, a good solution could have four steps:

  1. Asynchronous image URL
  2. Create generic thumbnail sizes
  3. Create command controller for generating ungenerated thumbnails
  4. Put everything behind feature flags

WDYT?

I like that idea! Especially having explicitly defined thumbnail configurations which can be pre-created via a command. This would also make it possible to add further configurations, for example for use in frontend galleries.

I would further suggest making it possible to pass the complete set of settings to the view helper, like:

<media:image image="{image}" thumbnailConfiguration="{settings.autoGeneratedThumbnails.small}" />

It’s a good idea to make thumbnail generation at upload time configurable. I would like to have a quick upload and do all the generation in a command controller run triggered by a cron job.

I like the idea too, but maybe a better (shorter) syntax would be:

<typo3.media:image preset="typo3.media:small" .../>

Better because:

  • Shorter (imagine a preset with maximumWidth, allowCropping, …)
  • When you are not in the Media package you don’t have access to its settings, so with the preset name, people can add new presets in Settings.yaml and use them anywhere
  • When used for a complete project, it’s pretty easy to get an overview of the available presets
  • Uses a namespace to avoid collisions, since “small” can exist for multiple purposes (e.g. Media package & site package)
    TYPO3:
      Media:
        autoGeneratedThumbnails:
          'TYPO3.Media:small':
            width: 50
            height: 50
          'TYPO3.Media:large':
            width: 200
            height: 200
          'TYPO3.Media:small-square':
            width: 80
            height: 80
            allowCropping: TRUE

Generated thumbnails can be “tagged” with the preset identifier, and the CLI command to clear thumbnails can be updated to support preset-based clearing.

About the performance impact: the thumbnail generation can be asynchronous based on a job queue. It’s not that hard to provide an interface whose implementation can be replaced by something else; we could even provide a Flowpack package to support beanstalk, for example.

We even have a nice DTO for those presets: TYPO3\Media\Domain\Model\ThumbnailConfiguration
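
A minimal Python sketch of how namespaced presets and preset-based clearing could fit together. The dictionary mirrors the settings above; the helper names and the `preset` key on each thumbnail record are assumptions for illustration only, not existing Neos APIs.

```python
# Mirrors the autoGeneratedThumbnails settings shown above.
PRESETS = {
    "TYPO3.Media:small": {"width": 50, "height": 50},
    "TYPO3.Media:large": {"width": 200, "height": 200},
    "TYPO3.Media:small-square": {"width": 80, "height": 80, "allowCropping": True},
}


def resolve_preset(name, presets=PRESETS):
    """Look up a preset by its namespaced identifier and return a copy."""
    try:
        return dict(presets[name])
    except KeyError:
        raise ValueError(f"Unknown thumbnail preset: {name!r}") from None


def clear_by_preset(thumbnails, preset_name):
    """Return only the thumbnails that are NOT tagged with the given preset."""
    return [t for t in thumbnails if t.get("preset") != preset_name]
```

Because the lookup key includes the package namespace, a site package could add its own "small" preset without colliding with the Media package's one.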

Thanks for the feedback. +1 for the suggestion regarding the more flexible presets.

Also like the idea with the preset identifier, although it’s a bit tricky since, I think, it would have to be part of the thumbnail and not the configuration, to avoid creating duplicates and keep backwards compatibility.

Regarding job queues, that could be a next step, but it’s not something I’ll be interested in taking care of, since there’s not really any standard way to do that with Flow and it’s something very few projects use.

The preset can be added to the ThumbnailConfiguration as a transient property, so it is skipped during serialization. That way there is no duplicate between a preset-based thumbnail and the same thumbnail based on parameters. We just need some internal tag support for the thumbnail entity.

But the idea was to be able to delete all thumbnails of a certain preset, and for that you’d need to query the database for it. The thumbnail implementation is based on a configuration hash, and the thumbnail configuration isn’t an entity.
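
The trade-off discussed here can be sketched in Python as follows. This is not the actual TYPO3\Media\Domain\Model\ThumbnailConfiguration class; the field names and the choice of hash function are assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThumbnailConfiguration:
    width: Optional[int] = None
    height: Optional[int] = None
    allow_cropping: bool = False
    # Transient: deliberately left out of the hash below.
    preset: Optional[str] = None

    def configuration_hash(self) -> str:
        """Hash only the parameters that affect the rendered image."""
        payload = {
            "width": self.width,
            "height": self.height,
            "allowCropping": self.allow_cropping,
        }
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()
```

Since the preset name does not influence the hash, a preset-based request and an equivalent parameter-based request resolve to the same thumbnail. The flip side is the point raised above: the hash alone cannot tell which preset a thumbnail came from, so deleting by preset would need the identifier stored on the thumbnail itself.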

Thanks everyone for all your great feedback. It seems to me we can agree on the direction proposed in this thread, with the thumbnail configuration format suggested by @dfeyer, leaving out the queue and the addition of the thumbnail preset to the configuration for now. Both can be done separately afterwards as I see it.

I’d like to start implementing this, so in case anyone has anything further to add, please mention it. Thanks.

I like the presets idea, but would basically explore both ways as already suggested, and at least check whether we could easily create async thumbnails. IMHO it should be possible model-wise, and although creating 30 thumbs might be resource intensive, I think it could work. The only trouble I see is with high-traffic sites, where multiple requests could be made to the same async thumb, so some kind of locking would be needed again…

Job queues would definitely help here. I am again and again contemplating the idea of a daemon process that you can start and with which you can centrally queue thumb generation and stuff like that.

I agree that the default implementation should not force the use of a queue, but it should not disallow a queue-based solution either. So some kind of service based on an interface, if possible, that can be replaced by a queue-based one, especially for thumbnail generation during upload.

I know that currently in the PHP world we are not much in favor of queue-based systems, but when you look at the Ruby ecosystem, tools like GitLab would be much slower without their intensive use of queues. At some point we need to push more async support in Flow. Out of scope for this RFC, I agree.

Thanks for the feedback. I’m not sure exactly what you mean: when would creating 30 thumbs be resource intensive? There are quite a few different places where that could occur, depending on the approaches mentioned in the thread.

Regarding the lock, I’d really prefer to avoid going down that road since, as I see it, it’s an edge case that few would run into, and only very seldom. If two browsers request the same thumbnail simultaneously, they will both generate it unless the database has already been updated by the time they check whether the thumbnail has a resource. Not much harm done in that, unless you have lots of requests at the same time. In such cases one should use some sort of generation strategy (cronjob, queue) and it shouldn’t be a problem.

Could you explain how a solution would disallow the queue-based solution? And maybe give an example of what you have in mind with a service based on an interface? What kind of service and interface are you thinking of?

Asking since a queue could just use a signal to create a job without having to interfere with anything else. No need for interfaces and services. E.g. the async thumbnail could be created using a signal and then the command controller just looks for ungenerated thumbnails when run. So either you listen to a signal or you run the command controller using crontab.
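
To illustrate the command controller half of that idea, here is a minimal Python sketch. The record layout (`resource_uri` being `None` for ungenerated thumbnails), the function name, and the `limit` parameter are hypothetical; a real implementation would query the thumbnail repository instead of a list.

```python
def generate_pending(thumbnails, generate, limit=None):
    """Generate every thumbnail that has no resource yet.

    `thumbnails` is a list of dict records, `generate` is a callable that
    renders one record and returns its resource URI. Returns the number
    of thumbnails that were generated in this run.
    """
    pending = [t for t in thumbnails if t.get("resource_uri") is None]
    if limit is not None:
        # Cap the work per run so a cron-triggered job stays short.
        pending = pending[:limit]
    for thumbnail in pending:
        thumbnail["resource_uri"] = generate(thumbnail)
    return len(pending)
```

Run from crontab, something like this would pick up whatever the asynchronous links have not yet triggered, without any queue infrastructure.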

I’m also fine with the Signal based solution.

Pull request ready https://github.com/neos/neos-development-collection/pull/194 – feedback much appreciated :smile: Pretty satisfied with the result.