RFC: Media Package Search and Meta data

aertmann · June 21, 2015, 4:16pm

Hi David

Great to hear that you’re willing to work on improving this area. I recently did a lot of work on the usability side and created an epic in Jira for collecting improvement ideas (https://jira.typo3.org/browse/NEOS-1020). It would make sense to add the issues there once they are agreed upon.

Regarding search, there are currently two places where it is used: when linking, and in the recently added search field (2.0 only) in the media browser. I assume you’re referring to improving both of them. You can now at least filter by type and sort by name or last modified date.

Searching the caption field would probably be a one-line change and easy to do.

Also, collections currently aren’t searchable; that could make sense too, as with tags.

More advanced search could possibly also be done; however, there isn’t anything like that elsewhere currently, so it would be a custom implementation. We’ve long wanted a proper search solution for searching nodes, but that’s still on the to-do list. The question is whether it’s a good idea to implement a specific solution only for media searching. We’ve previously talked about integrating Elasticsearch as a search engine, with a fallback to the simple search solution, but that is quite some work.
A simple solution would be to support basic syntax such as +word (required), word (optional), -word (disallowed), “word sentence”, * for wildcard, () for grouping and OR. Additionally, specific fields could be searchable, like filename:image.jpg (filesize, last modified, type, title, caption, tag and collection could also work here). I tried looking for a library with some standard for parsing a search string, but didn’t find anything. It would be great if there were one, instead of yet another custom implementation.
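
To make the syntax idea a bit more concrete, here is a rough sketch in Python (purely for illustration, it is not Neos code). It only covers +word, -word, quoted phrases and field:value; wildcards, grouping and OR are left out. The names Term, parse_query and matches are made up for this example, and the asset is assumed to be a plain dict of field name to string.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One token: optional +/- prefix, optional "field:" prefix,
# then either a double-quoted phrase or a bare word.
TOKEN = re.compile(r'([+-]?)(?:(\w+):)?(?:"([^"]*)"|(\S+))')


@dataclass
class Term:
    text: str
    modifier: str = ""            # "+" required, "-" disallowed, "" optional
    field: Optional[str] = None   # None means "search all fields"


def parse_query(query: str) -> List[Term]:
    """Split a search string into terms (hypothetical helper)."""
    terms = []
    for modifier, field, phrase, word in TOKEN.findall(query):
        terms.append(Term(text=phrase or word, modifier=modifier, field=field or None))
    return terms


def matches(terms: List[Term], asset: dict) -> bool:
    """Check an asset (assumed: dict of field name -> string) against the terms."""
    def hit(term: Term) -> bool:
        values = [asset.get(term.field, "")] if term.field else asset.values()
        return any(term.text.lower() in str(value).lower() for value in values)

    # Any disallowed term that matches rejects the asset.
    if any(t.modifier == "-" and hit(t) for t in terms):
        return False
    required = [t for t in terms if t.modifier == "+"]
    optional = [t for t in terms if t.modifier == ""]
    # All required terms must match.
    if not all(hit(t) for t in required):
        return False
    # Without required terms, at least one optional term has to match.
    return bool(required) or not optional or any(hit(t) for t in optional)
```

Usage would look something like `matches(parse_query('+cat -dog filename:image.jpg'), asset)`. A real implementation would translate the terms into a database or search engine query rather than filtering in memory, and would need to deal with edge cases such as colons inside plain words.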

Using something like visual search could work; however, things get a little complicated when nodes have to be searched as well, as the link fields do.

Regarding metadata, that would be great to have. I’d suggest only reading the data on import, without being able to write it back to the files (IPTC). Using a JSON field for storing it would be pretty flexible and probably work fine. However, we have previously discussed using nodes for assets instead, which would allow for a lot of flexibility: assets could have arbitrary properties and be nested, tags and collections could be references and nested, and translations / variants of titles would be supported. Additionally, a common node search solution would be usable instead of a custom one for assets. This is, however, a large task, and we might want to do something intermediate before we do that.
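
As a rough illustration of the JSON field idea (again in Python, not Neos code), the import step could turn whatever the IPTC reader returns into a JSON string once and never write back to the file. The input shape used here, (record, dataset) tuple keys with byte values, is an assumption about what such a reader might return, and metadata_to_json is a hypothetical helper name. Only a few well-known IPTC datasets are mapped to readable names.

```python
import json

# A few IPTC IIM (record, dataset) pairs mapped to readable names.
IPTC_NAMES = {
    (2, 5): "title",      # Object Name
    (2, 25): "keywords",  # Keywords (repeatable)
    (2, 80): "byline",    # By-line
    (2, 120): "caption",  # Caption/Abstract
}


def _decode(value):
    """Turn bytes (or lists of bytes) into JSON-friendly strings."""
    if isinstance(value, list):
        return [_decode(item) for item in value]
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


def metadata_to_json(raw: dict) -> str:
    """Serialize raw IPTC data (assumed shape, see above) for a JSON field."""
    named = {}
    for key, value in raw.items():
        # Unknown datasets are kept under a generic name instead of being dropped.
        name = IPTC_NAMES.get(key, "iptc_{}_{}".format(*key))
        named[name] = _decode(value)
    return json.dumps(named, sort_keys=True)
```

Keeping unknown datasets under a generic key means nothing is lost on import, which fits the flexibility argument for a JSON field.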

Lastly, one thing you didn’t mention, which I think is important and fairly easy to improve, is using the title/caption from the assets when using the images in content. Currently that information isn’t used for anything out of the box besides search.

To answer your question: no, it shouldn’t be too difficult; however, it greatly depends on the level of ambition and where the trade-offs are made.