As far as I understand, it’s already possible to index the entities of the Media package via Flowpack.ElasticSearch. So the metadata of files can be indexed, right?
Is there also a ready-made solution for indexing the content of the referenced file resources (PDF, Word, …)?
Thank you. I am aware of that, and it is the same server component that our customer uses right now (a TYPO3/Solr/Tika solution is running). We are offering a Neos solution for the relaunch, and I need to estimate the effort to provide the same functionality.
I assume the way to go is to extend the code that indexes the media asset model so that it calls Tika and adds the extracted file contents to the index, too. I hope I am right about that?
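To make that idea concrete, here is a rough sketch of the flow in Python rather than in the PHP a real Flowpack extension would use. It assumes a Tika Server reachable on its default port 9998 and an Elasticsearch 7+ style `_doc` endpoint. The function names, the `content` field, and the shape of the metadata are made up for illustration and are not part of Flowpack.ElasticSearch or the Media package.

```python
import json
import urllib.request
from typing import Callable

# Assumption: Tika Server runs locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def extract_text_with_tika(data: bytes, tika_url: str = TIKA_URL) -> str:
    """PUT the raw file bytes to Tika Server and return the extracted plain text."""
    request = urllib.request.Request(
        tika_url,
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def build_asset_document(
    metadata: dict,
    data: bytes,
    extract: Callable[[bytes], str] = extract_text_with_tika,
) -> dict:
    """Combine the asset metadata with the extracted file content.

    'content' is a hypothetical field name; use whatever the index mapping defines.
    """
    document = dict(metadata)  # copy, so the caller's metadata is not modified
    document["content"] = extract(data).strip()
    return document


def index_document(es_url: str, index: str, doc_id: str, document: dict) -> dict:
    """Store the document in Elasticsearch via PUT /<index>/_doc/<id> (ES 7+ style URL)."""
    request = urllib.request.Request(
        f"{es_url}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

The extractor is passed in as a parameter so the document-building step can be tried out without a running Tika Server. In a real setup the same three steps (read the resource, extract the text, add it to the indexed document) would have to be hooked into wherever the asset entity gets indexed.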
Have you done more research on this topic in the meantime? This is a very common requirement, and I think a joint effort on one community package that extends the Flowpack Search stuff would be great.
@lorenzulrich, I did not do any further research because none of our customers has booked the feature yet. Meanwhile, though, I have already offered it in about 3 projects.
Once the first customer gives the go-ahead, I will get started on this.
@daniellienert, thank you for that hint. I had seen that already; it did not work on our first try, but maybe we need to try again. I cannot remember what went wrong. We tested the feature in a timeboxed session where we wanted to find out “what is possible within 2-3 hours”. We ended up with a working search (it searches over nodes) but without asset content search, suggestions (autocomplete in the search field), or facets (I think these are called aggregations in ES).
As said, I am just waiting for the first customer to buy it, or for more time to try it out (which will not happen this year).