RFC: Improve NodeSearchServiceInterface

TL;DR

Currently the NodeSearchServiceInterface allows searching by term (an SQL LIKE across all properties) and limiting the search by nodeType and context. This is quite limited and needs to be extended in a flexible way to allow advanced searches like the one in https://github.com/neos/neos-development-collection/pull/5

The current implementation does not enforce a limit, so some queries can return a huge number of nodes and hit the memory or time limit.

Goal

  • Allow more flexible search implementations (by a specific property, multiple properties, …)

Technical how

  • Based on JSON-API (http://jsonapi.org/format/#fetching-filtering), use a filter query parameter, with this implementation:
  • If filter is a string: same behavior as the current term query parameter (search in all properties, LIKE statement)
  • If filter is an array: the key is the property name, the value is the search query for this property. The value can itself be an array (this replaces the current $searchNodeTypes with filter[_nodeType][]=FirstNodeType&filter[_nodeType][]=SecondNodeType); multiple values are combined with an OR operator by default
  • If filter is an array, internal properties can be queried by prefixing the property name with _
  • Search query is a LIKE by default, e.g. `%term%`, see Query Vocabulary below
  • Add support for searching by sub property in the JSON, see Query Vocabulary below
  • Add a default limit (ex: 20), based on http://jsonapi.org/format/#fetching-pagination (support for both page[number] and page[size] + page[offset] and page[limit])
  • Add support for sorting, based on http://jsonapi.org/format/#fetching-sorting
  • Keep the context parameter
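
To make the two shapes of the filter parameter concrete, here is a minimal sketch of how a request could be normalized before it reaches the search service. The helper name normalizeFilter and the '*' key for "all properties" are assumptions made for illustration only; they are not part of Neos or of this RFC.

```php
<?php
// Hypothetical helper (not part of Neos): normalizes the raw "filter" query
// parameter into a map of property name => list of search values.
function normalizeFilter($filter): array
{
    if (is_string($filter)) {
        // A plain string behaves like the current "term" parameter and
        // searches all properties. '*' is only an illustrative placeholder key.
        return ['*' => [$filter]];
    }

    $normalized = [];
    foreach ($filter as $property => $value) {
        // A single value and a list of values are treated alike;
        // several values for one property are meant to be combined with OR.
        $normalized[$property] = array_values((array)$value);
    }

    return $normalized;
}

// parse_str() understands the bracket syntax used in the RFC.
parse_str(
    'filter[title]=neos&filter[_nodeType][]=FirstNodeType&filter[_nodeType][]=SecondNodeType',
    $query
);
$filters = normalizeFilter($query['filter']);
```

With this shape, an implementation only has to loop over properties and OR the values of each one, whether the client sent a single string or a list.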

Sorting

Currently only supported for some internal properties, like modification time; this can be improved in the future.
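
As an illustration, the JSON-API sort parameter is a comma-separated list of fields, where a leading minus sign means descending order. A rough sketch of parsing it could look like this; parseSort and the field name _lastModificationDateTime are assumptions for the example, not existing API.

```php
<?php
// Hypothetical helper: turns a JSON-API style "sort" value into
// a map of field name => direction ('ASC' or 'DESC').
function parseSort(?string $sort): array
{
    $orderings = [];
    if ($sort === null || $sort === '') {
        return $orderings;
    }

    foreach (explode(',', $sort) as $field) {
        $field = trim($field);
        if ($field === '' || $field === '-') {
            // Ignore empty entries such as a trailing comma.
            continue;
        }
        if ($field[0] === '-') {
            // A leading "-" requests descending order.
            $orderings[substr($field, 1)] = 'DESC';
        } else {
            $orderings[$field] = 'ASC';
        }
    }

    return $orderings;
}

// Field names are illustrative only.
$orderings = parseSort('-_lastModificationDateTime,title');
```

An implementation that only supports a few internal properties could then reject or ignore any key it does not know.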

Sparse Fieldsets

See: http://jsonapi.org/format/#fetching-sparse-fieldsets

Not supported currently, but reserved for a future implementation.

Query Vocabulary

As JSON-API doesn’t specify a query vocabulary, we have some freedom here :wink:

Required

  • Full text: filter[property]=term
  • Begin with: filter[property]=term%
  • End with: filter[property]=%term
  • Equal (exact match): filter[property]="term"

All queries must be case-insensitive.
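
A possible way to translate this vocabulary into a comparison is sketched below. It assumes that a value containing % is handed to LIKE unchanged, and that case-insensitivity is achieved by lowercasing the operand and comparing it against a lowercased column; both are assumptions of this sketch, and toComparison is a hypothetical name.

```php
<?php
// Hypothetical helper: maps a filter value to an [operator, operand] pair.
// The operand is lowercased so it can be compared with a lowercased column.
// Note: strtolower() only handles ASCII; mb_strtolower() would be needed
// for other alphabets.
function toComparison(string $value): array
{
    $lower = strtolower($value);

    // "term" in double quotes: exact match.
    if (strlen($lower) >= 2 && $lower[0] === '"' && substr($lower, -1) === '"') {
        return ['=', substr($lower, 1, -1)];
    }

    // An explicit % wildcard is passed to LIKE unchanged.
    if (strpos($lower, '%') !== false) {
        return ['LIKE', $lower];
    }

    // A plain term: match it anywhere in the value.
    return ['LIKE', '%' . $lower . '%'];
}

$fullText = toComparison('Term');     // plain term
$exact    = toComparison('"Term"');   // quoted term
$wildcard = toComparison('%term');    // explicit wildcard
```

The operand should still be bound as a query parameter rather than concatenated into the SQL string.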

Nice to have

May require custom implementations for MySQL and PostgreSQL to use the JSON capabilities of the DB.

  • Sub property query: filter[property.subproperty]=%term

This can be added in the next iteration without changing the interface.

New Interface

interface NodeSearchServiceInterface
{
    /**
     * @param string|array $filter
     * @param Context $context
     * @param integer $limit
     * @param integer $offset
     * @param string $sort
     * @return array<NodeInterface>
     */
    public function findByProperties($filter, Context $context, $limit = 20, $offset = null, $sort = null);
}
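
Since the interface takes $limit and $offset, the page query parameter has to be mapped onto them somewhere. A minimal sketch, assuming page numbers start at 1 and that page[limit] / page[offset] win over page[size] / page[number] when both are sent (resolvePage is a hypothetical helper, and both rules are assumptions, not decisions of this RFC):

```php
<?php
// Hypothetical helper: maps the "page" query parameter to [limit, offset].
function resolvePage(array $page, int $defaultLimit = 20): array
{
    // page[limit] and page[size] are treated as synonyms here.
    $limit = (int)($page['limit'] ?? $page['size'] ?? $defaultLimit);
    if ($limit < 1) {
        $limit = $defaultLimit;
    }

    if (isset($page['offset'])) {
        $offset = max(0, (int)$page['offset']);
    } else {
        // Assumption: page numbers are 1-based.
        $number = max(1, (int)($page['number'] ?? 1));
        $offset = ($number - 1) * $limit;
    }

    return [$limit, $offset];
}

[$limit, $offset] = resolvePage(['number' => '3', 'size' => '10']);
// The result could then be passed on, for example:
// $nodes = $searchService->findByProperties($filter, $context, $limit, $offset);
```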

Benefits

  • Simple and flexible interface
  • More powerful search queries can be built without changing the interface
  • Avoids high memory/resource usage (limit)

Challenges

  • Do we need the AND operator for a single property? (see the example with _nodeType)
  • The search vocabulary needs to be common to all implementations, so that the implementation can be switched at any point

I also have a pending pull request on github regarding the improvement of the NodeSearchInterface to search by values of a specific key: https://github.com/neos/neos-development-collection/pull/1

With your API, would it be possible to also select sub-properties in the nested JSON data?
Would it be an idea to define the query structure similarly to how it’s defined for querying JSON values in MySQL (https://dev.mysql.com/doc/refman/5.7/en/json.html)?

I thought about this RFC when I saw your PR :wink:

As for the query language, yes, we can improve it. From my POV, if it’s possible natively in the CR or in the search implementation, filter[property.subproperty]=%term should search in the sub property. I’m not sure it’s doable in plain SQL, but we can do that with specific drivers for MySQL and PostgreSQL.

Basically we should have 3 implementations:

I updated the RFC to be more JSON-API based (like reserving the include query parameter).

Now the last decision:

  1. Do we move to JSON API for the JSON payload? Basically we will move to JSON API at some point, so this can be a first move.

@aertmann, @christianm, @sebastian, @christopher I’m really interested in your input.

In the new proposed interface you have removed $startingPoint and changed $context to a Node. Is that on purpose? We’re currently trying to figure out a way to improve the context handling in another thread, which would likely influence this part as well.

Aside from that I like the general direction.

However not sure if it makes sense to add the include parameter when the return data is an array of nodes. Where would you put the “include” parts?

Also we should define how the page parameter should behave, since it’s agnostic. Personally prefer using offset and limit instead of page numbers.

This is not really a question?

Generally I like the ideas, will have another look with a bit more time and add more comments, but
OData has a fully defined filter syntax that can do anything you would like…

Hey @dfeyer,

to me, the idea sounds really nice – just two things I see as some open questions:

  1. I feel the query API is a little too restrictive, i.e. it does not give much room for “scaling up” and implementing additional query operators. As long as we know how we can add a more expressive query language later on, I don’t mind this point. I think a good example of a verbose, but extensible query language is the ElasticSearch Query Language.
  2. How do you plan to actually implement the node-search based on property? Have an extra index table, do some LIKE string magic, use JSON type, …? (just out of curiosity…)

All the best,
Sebastian

I think we don’t need the full power of the ES Query Language :wink: but if @christianm can try to summarize the OData filter language, that would be nice. My idea is that the first implementation should be slim and OK for 90% of use cases, but extensible.

Check the PR from @daniellienert (FEATURE: Support search by property & exact value in NodeDataRepository by daniellienert · Pull Request #1 · neos/neos-development-collection · GitHub); it can be a basis for the default implementation.

For a next iteration, I’m in favor of starting another RFC to discuss JSON support in the DB storage; it would be pretty nice to have this native JSON support. I don’t like the extra index table, it sounds like reimplementing ES inside Neos … If we choose to have a dedicated NodeDataRepository per DB storage, we need to improve our skills so we don’t break PostgreSQL too often. For the first iteration, the PR from @daniellienert should be enough. Let’s keep this RFC simple but extensible.

Forget this “question”, it was a bit stupid: the interface returns an array of NodeInterface and not a JSON feed :wink:

That is also a stupid parameter; let’s remove include, it makes no sense in this context. I’ll update the RFC in a few minutes.

:+1: For offset / limit in this context

I don’t see $startingPoint in the current implementation? And $context needs to stay a Context.

public function findByProperties($term, array $searchNodeTypes, Context $context);

Open Questions

From my POV, we need to find solutions for the following questions before writing a Project Proposal:

  • Filter Query Language
  • Context handling
  • Future support of native JSON DB storage (just to have a vision of how we want to implement this in the future, and maybe start another RFC to track the progress in this area)

If you see other open questions, feel free to comment