RFC: Optimized Validation of Entities

Introduction

At the moment, validation has several problems:

  • Validation is recursive and does never stop, loading all related Entities
  • Unmodified Entities loaded from DB are validated

Additionally, most of the users of Flow do not design Aggregates in a manner that would satisfy the rules of Aggregate design in DDD. This makes it impossible to only validate Aggregates. Instead, validation should take Aggregate Boundaries into account and also be flexible enough to expand this boundary over more than one Aggregate.

Furthermore, validation should try to avoid loading lazy relations. This can be achieved by stopping at a given relation, if the related Entity is not already loaded and is considered an Aggregate Root by the system.

Finally, a way must be found to identify Entities that have been

a) loaded from persistence and
b) are unmodified

and skip validation there.

Validation Boundaries

Figure 1 visualizes the Validation Boundaries when taking Aggregate Boundaries into account. There are three cases displayed.

In case one, the first Aggregate holds a reference to another Aggregate. Given the relation is not yet loaded and is not annotated with @CascadeValidation the validation would stop after validating the first Aggregate.

Case two describes what happens if an Entity is passed via action argument. In this case, validation can only validate the Entity out of context of any Aggregate. Any related Aggregates would be handled just like in case one.

In the final case, the relation to the second Aggregate has been annotated by @CascadeValidation and thus the Validation Boundary covers both Aggregates. This will result in both Aggregates being validated at all times. If the second Aggregate is not yet loaded, it will be loaded, then validated.

Figure 1: 3 cases of Validation Boundaries

Implementation

The validation tree itself is built for classes, not instances. This makes it impossible to fully build the validation tree before the actual validation subject has been given. To still being able to recognise Validation Boundaries, I suggest adding a new validator, the AggregateBoundaryValidator, that is automatically inserted at each relation to other Aggregate Roots. Just like the CollectionValidator, it would then either build the remaining validation tree if validation is required (e. g. the Aggregate is loaded), or behave like the RawValidator (no-op).

The validation tree that is initially created for a given class would then be created according to the Aggregate Boundary (see Figure 1, case 1).

Missing

  • A concept for detecting unchanged Entities that have been loaded from persistence is still missing. Alexander Berl might have an idea here.
3 Likes

This is also related https://jira.neos.io/browse/FLOW-351

1 Like

Great to see some movement here :slight_smile:

While in theory, I really think this is a good idea, I just came up with a more pragmatic way of optimising validation: https://review.typo3.org/#/c/41971

This is basically a combination of my Validation Boundary and the idea to not validate unmodified objects.

While your idea for the doctrine proxy validation is very neat and would probably already handle the majority of use cases, it is still not capturing some corner cases as well as when relations are not lazy loaded (for any application specific reason).
So the change is an easy first step, but the full concept of only validating up to given “concistency boundaries” (I still like the term) is not obselete thereby.

Regarding the detection of entity changes during property mapping, see my change here:
https://review.typo3.org/#/c/36731/
The logic for detecting if an entity was changed after reconsitution from persistence is here:
https://review.typo3.org/#/c/36731/2/Classes/TYPO3/Flow/Property/TypeConverter/PersistentObjectConverter.php

All that is missing, is a proper storage for the information of which object instances have changed. First idea was to use the PropertyMappingConfiguration for this, but that involves some non-trivial issues, because of the way the PropertyMappingConfiguration tree is built internally (wildcards, etc. - see my last comment on the change).
My preference currently is a dedicated ValidationContext, which is filled inside the property mapper, though that creates a coupling between the two.

1 Like

Would someone (maybe @aberl) takecare of moving those changes from Gerrit to GitHub, if that still makes sense?

Yes, I’m gonna take care.

Awesome. Just a reminder: the Flow/Neos stuff might be gone from review.typo3.org any time after December 31st…

Ok, I got those changes locally, will push them to github when I find time to clean them up/rebase, but they definitely won’t be lost.

My big attempt at “optimal” validation is now here:
https://github.com/neos/flow-development-collection/pull/335

Also, a simple alternative solution considering Aggregate Validation Boundaries is here:
https://github.com/neos/flow-development-collection/pull/334

Both are worth being discussed, the latter could theoretically land relatively soon, as it is
less breaky and the implementation is quite simple, but adds a dependency on doctrine proxy.

Ok, I think we face one main issue here, that lies actually somewhere else:

The aproach to only validate objects that were submitted with changed properties is theoretically perfect, but it runs under the assumption that validation does not depend on other unchanged objects. This is breaking of course, since this assumption is not always true.

However, thinking about it, the only type of validations that depend on multiple objects are business rules (any counter-examples?). OTOH simple user input validation (stringlength, valid e-mail, etc.) would fall save under the above assumption.
So the actual issue we face is, that Flow currently does not separate user input validation and enforcing business rules. The only thing that comes close is writing individual model validators, but it’s not easy to distinguish the two types of validators in order to skip the one type depending on context.

From a pure DDD viewpoint, the latter should actually happen inside the aggregates, ie. the aggregate is responsible for asserting it’s invariants. Now Flow currently doesn’t provide any nice flow (no pun intended) or best practice for doing such validations and notifying the user on the outcome, e.g. return validation messages (It could maybe be solved with Views.yaml configuration for a specific DomainInvariantException class).

So maybe, this should be tackled first and then we can optimize the user input validation as suggested.
Any thoughts? @bwaidelich @robert @sebastian @christianm @kdambekalns (sorry for the roundhouse-kick mention, but IMO this is a rather conceptual core-decision)

Ok, after @christianm triggered some more thought process in me, we could actually find a pretty nice solution to the validation performance problem once we have that separation in place:

For user input validation, we could avoid loading lazy relations as is currently done, because that just makes no sense at all any more - if it’s lazily loaded, it wasn’t touched by property mapping, so there is also no user input that needs validation. Taking extra-lazy relations into consideration, that could maybe even be dealt with nicely in huge relations. Doctrine will only load single entities inside an extra-lazy collection, if it is oneToMany and indexBy'd its (non-composite) identifier. A bit special use-case, but that can be documented and the extra-lazy behaviour will most likely improve further.

With this https://github.com/neos/flow-development-collection/pull/335 will not be needed any more, so unless anyone of you objects, I’d like to start a project for the Validation Separation next week.

Sweet! :slight_smile: