RFC: Media Package Search and Meta Data

Hi All,

We are under a lot of pressure on this topic right now and would like to collect ideas. The focus is to find a starting point to extend the current package and release milestones in small portions. There are a lot of great ideas, but I fear that they are too big to tackle problems in the near future, like having fields for copyright, publisher and so on.
This is an important topic to avoid legal problems.

#Search
IMO search is very limited and not very useful at the moment, because you can only search for an exact phrase (e.g. “Vienna Church”), and only in the title field. Tags are useful, but if you have many of them it could get ugly.

Ideas:

  • Explode the phrase to make it possible to search for “Vienna” and “Church” separately, like a search engine does
  • Search in the caption field as well
  • Utilize search servers (e.g. Elasticsearch)
  • Enhance the search field to support more complex search queries (example: http://documentcloud.github.io/visualsearch/)
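
To make the first two ideas concrete, here is a minimal sketch in Python (not the actual Neos implementation). The `Asset` class and the function names are hypothetical stand-ins. It assumes AND semantics, i.e. every term has to appear somewhere in the title or the caption.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    # Hypothetical stand-in for a media asset; only the fields discussed here.
    title: str
    caption: str = ""


def explode(phrase: str) -> list[str]:
    """Split a search phrase into lowercase terms on whitespace."""
    return phrase.lower().split()


def matches(asset: Asset, phrase: str) -> bool:
    """True if every term occurs in the title or the caption (AND semantics)."""
    haystack = f"{asset.title} {asset.caption}".lower()
    return all(term in haystack for term in explode(phrase))


def search(assets: list[Asset], phrase: str) -> list[Asset]:
    """Return the assets that match all terms of the phrase."""
    return [asset for asset in assets if matches(asset, phrase)]
```

With this, a search for “Vienna Church” would also find an asset titled “Church in Vienna”, or one that only mentions both words in its caption. An empty phrase matches everything in this sketch.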

#Meta Data
For more accurate search results, storing meta data (e.g. IPTC) would help a lot.
Ideas:

  • Store meta data in a JSON object, like node properties in the CR package.
  • Expose it in the UI somehow so that assets can be filtered by it.
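
As a rough illustration of the JSON idea, here is a Python sketch. The key names (`copyright`, `city`, `keywords`) are illustrative IPTC-like examples, not a fixed schema, and the function names are hypothetical. The meta data is serialized into a single JSON string and a filter is evaluated against it.

```python
import json


def serialize_metadata(metadata: dict) -> str:
    """Encode extracted meta data as a JSON string for a single storage field."""
    return json.dumps(metadata, sort_keys=True, ensure_ascii=False)


def metadata_matches(serialized: str, key: str, value: str) -> bool:
    """Case-insensitive match of `value` against a scalar or list entry under `key`."""
    stored = json.loads(serialized).get(key)
    if stored is None:
        return False
    candidates = stored if isinstance(stored, list) else [stored]
    return any(str(item).lower() == value.lower() for item in candidates)


# Hypothetical values, as they might come out of an IPTC block.
record = serialize_metadata(
    {"copyright": "Jane Doe", "city": "Vienna", "keywords": ["church", "gothic"]}
)
```

A UI filter such as “city is Vienna” would then boil down to `metadata_matches(record, "city", "vienna")`. In practice the filtering would more likely happen in the database or a search index than in application code.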

Are you looking for general feedback about the direction you want to take? Or do you need specific help with the planning or implementation?

I think in general it makes a lot of sense to improve search, like you wrote. And we’ll also need a way to store and maintain asset meta data (there has also been a bit of thinking about this, but I can’t remember right now who tackled it).

Hi Robert,

I’ll take whatever I can get :-). The thing is that I have to start working on it quite soon and I don’t want to rush into a dead end. I’d rather build something that everybody benefits from. My Neos learning curve is still pretty steep and I would need some help/guidance to start working on it. So I’m happy with anything you or the community can provide.

#Search
What’s your opinion on my ideas? I think achieving them should not be that difficult.

#Meta
Extracting the data from the image is easy, but the biggest question (from my point of view) is how the meta data is stored. I guess a JSON object is not a bad idea. Its structure could easily be transformed into something else when the package evolves (media central, taxonomies and so on).

Right now, I don’t see editing meta data as the main priority. It definitely needs more thought, because writing it back into the file would change the SHA-1 hash of the resource.
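
A small Python sketch of the hash concern: the bytes below are made-up stand-ins for an image file, and embedding meta data is simulated by simply appending to them. Any change to the file content yields a different SHA-1, whereas meta data kept next to the resource leaves the hash untouched.

```python
import hashlib


def sha1_of(data: bytes) -> str:
    """Hex SHA-1 digest of the given bytes."""
    return hashlib.sha1(data).hexdigest()


# Stand-in bytes; a real file would be read from disk.
original = b"fake image bytes"

# Writing meta data into the file changes its bytes (simulated here by appending).
edited = original + b"copyright=Jane Doe"

# Keeping meta data beside the resource leaves the bytes, and so the hash, alone.
stored = {"sha1": sha1_of(original), "metadata": {"copyright": "Jane Doe"}}
```

Here `sha1_of(edited)` differs from `sha1_of(original)`, while `stored["sha1"]` still identifies the unchanged file.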

I’ll try to attract more people to this topic. Maybe the person or team working on this will join the discussion. Once there is a plan/direction, achieving some goals should not be that hard, don’t you think?

Hi David

Great to hear that you’re willing to work on improving this area. I recently did a lot of work on the usability part and created an epic in Jira for collecting ideas for improvement (https://jira.typo3.org/browse/NEOS-1020). It would make sense to add the issues there once they are agreed upon.

Regarding search, there are currently two places where search is used: when linking, and in the recently added search field (2.0 only) in the media browser. I assume you’re referring to improving both of them. And now you can at least filter by type and sort by name or last modified date.

Searching the caption field would probably be a one-line change and easy to do.

Also, collections currently aren’t searchable; that could make sense too, as with tags.

More advanced search could possibly also be done; however, there isn’t anything like that elsewhere currently, so it would be a custom implementation. We’ve long wanted a proper search solution for searching nodes, but that’s still on the todo list. The question here is whether it’s a good idea to implement a specific solution only for media searching. We’ve previously talked about integrating Elasticsearch as a search engine with a fallback to the simple search solution, but that is quite a lot of work.
A simple solution would be to support basic operators such as +word (required), word (optional), -word (disallowed), “word sentence” (exact phrase), * for wildcards, () for grouping and OR. Additionally, specific fields could be searchable, like filename:image.jpg (filesize, last modified, type, title, caption, tag and collection could also work here). I tried looking for a library that implements some standard for parsing a search string, but didn’t find anything :confused: It would be great if there were one, rather than yet another custom implementation.
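
To give a feeling for the size of such a parser, here is a minimal Python sketch covering only part of that syntax: `+word`, `-word`, plain words, quoted phrases and `field:value`. Wildcards, grouping and OR are left out, only straight double quotes are recognized, and all names (`Query`, `parse`, `matches`) are hypothetical.

```python
import re
from dataclasses import dataclass, field

# One token: optional +/- sign, optional "field:" prefix, then a quoted phrase or a bare word.
TOKEN = re.compile(r'([+-]?)(?:(\w+):)?(?:"([^"]*)"|(\S+))')


@dataclass
class Query:
    required: list = field(default_factory=list)  # +word
    optional: list = field(default_factory=list)  # word
    excluded: list = field(default_factory=list)  # -word


def parse(text: str) -> Query:
    """Turn a search string into (field or None, lowercase value) terms."""
    query = Query()
    for sign, field_name, phrase, word in TOKEN.findall(text):
        value = (phrase or word).lower()
        if not value:
            continue  # skip empty quoted phrases
        bucket = {"+": query.required, "-": query.excluded}.get(sign, query.optional)
        bucket.append((field_name or None, value))
    return query


def matches(query: Query, record: dict) -> bool:
    """`record` maps field names to text, e.g. {"title": ..., "caption": ...}."""

    def hit(term) -> bool:
        name, value = term
        texts = [record.get(name, "")] if name else record.values()
        return any(value in str(text).lower() for text in texts)

    if any(hit(term) for term in query.excluded):
        return False
    if not all(hit(term) for term in query.required):
        return False
    # Optional terms only decide the result when nothing is required:
    # in that case at least one of them has to hit.
    return bool(query.required) or not query.optional or any(hit(t) for t in query.optional)
```

For example, `parse('+vienna -night "old town" filename:church.jpg')` yields one required term, one excluded term and two optional terms, the last of which is restricted to the `filename` field. How optional terms should influence ranking is left open here.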

Using something like VisualSearch could work; however, things get a little complicated when nodes are searched as well, as the link fields do.

Regarding meta data, that would be great to have. I’d suggest only reading the data on import and not writing it back to the files (IPTC). Using a JSON field for storing it would be pretty flexible and would probably work fine. However, we have previously discussed using nodes for assets instead, which would allow for a lot of flexibility: assets could have arbitrary properties and be nested, tags and collections could be references and nested, and translations / variants of titles would be supported. Additionally, a common node search solution could be used instead of a custom one for assets. This is a large task, however, and we might want to do something intermediary before we tackle it.

Lastly, one thing you didn’t mention, which I think is important and fairly easy to improve, is using the title/caption from the assets when the images are used in content. Currently the information isn’t used for anything out of the box besides search.

To answer your question: no, it shouldn’t be too difficult. However, it greatly depends on the level of ambition and where the trade-offs are made.

Hi Aske,

Thanks for your reply. I’m aware that there is a bigger picture with a lot of work and decisions to make. I would also love to work on those.

I think we agree that there are some improvements that can be done quite quickly and easily. If you agree, I would break the discussion down into the following tasks and add them to your story:

  • Extend search to support capture field and meta data
  • Integrate a search string parser
      • basic: explode string and use each item as a phrase
      • advanced: support more complex phrases (optional, mandatory, wildcard, file extension, meta field)
  • Extract file meta data and store it as a JSON object
      • property name: metadata
      • enhance importResourcesCommand to import file meta data
  • Utilize title, caption and meta data for improved image rendering

Did I miss something?

Yeah, seems like a good summary. One thing though: you only mention importing meta data using the command, but I’d argue it should also happen when uploading images, no? Also, you mean caption instead of capture, right?

Yes to both. Sometimes autocorrect and my dyslexia beat me like hell :smile:

Not directly related, but as this change requires some changes in the Media browser: