RFC: The Arboretum Manifesto - The CR as a content graph

Nezaniel · November 28, 2016, 1:09pm

Motivation

This document reflects the team members’ shared modeling effort for the Content Repository. The model described serves as a blueprint for implementation as well as incremental documentation for interested developers.

Target audience

Neos core developers

Changelog

2016-11-28 Initial reference
2016-11-29 Minor term and formatting cleanup
2016-12-25 Introduction of the fallback graph

The Fallback Graph

The content repository allows integrators to define a set of dimensions with intra-dimensional fallbacks. A dimension configuration could look as follows:

contentDimensions:
  'market':
    presets:
      'world':
        values: ['world'] 
      'eu':
        values: ['eu', 'world']
      'de':
        values: ['de', 'eu', 'world']
  'language':
    presets:
      'en':
        values: ['en']
      'de':
        values: ['de']

This defines two dimensions, market (priority 1, connected) and language (priority 2, disconnected).

In addition, the content repository provides workspaces (or editing sessions) that are relevant for fallback mechanisms and for that purpose can be treated like an additional dimension of minimal priority. For example, we use a live workspace and workspaces session1 and session2 falling back to live (priority 3, connected).

The intra-dimensional fallback graphs

Since the dimension values are not required to have a common fallback root, the fallback structure can in general be described as a possibly disconnected, directed graph per dimension:
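As a minimal sketch, the per-dimension graphs could be derived from the preset configuration like this. It assumes that each preset lists its own value first, followed by its fallback chain in order. The `PRESETS` dictionary, the `workspace` entry and the function name are illustrative and not part of the model.

```python
# Hypothetical in-memory form of the dimension configuration, with
# workspaces treated as a third dimension (an assumption for illustration).
PRESETS = {
    "market": {"world": ["world"], "eu": ["eu", "world"], "de": ["de", "eu", "world"]},
    "language": {"en": ["en"], "de": ["de"]},
    "workspace": {
        "live": ["live"],
        "session1": ["session1", "live"],
        "session2": ["session2", "live"],
    },
}


def intra_dimensional_edges(presets):
    """Map each value of one dimension to its direct fallback (or None).

    Assumes the second entry of a preset's value list is the direct fallback.
    """
    return {
        name: (values[1] if len(values) > 1 else None)
        for name, values in presets.items()
    }


FALLBACKS = {dim: intra_dimensional_edges(p) for dim, p in PRESETS.items()}
```

In this form, a dimension is disconnected when more than one of its values maps to `None`, as is the case for language.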

The inter-dimensional fallback graph

Fallbacks occur not only within dimension boundaries but between content trees identified by a combination of dimension values. The above example allows for up to 3 (markets) * 2 (languages) * 3 (workspaces/editing sessions) = 18 different dimension combinations with individual content structures, resulting in 18 trees. The directed, acyclic inter-dimensional fallback graph shows which fallback rules apply between trees. It consists of the trees as nodes and fallback rules as edges.
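The count of 18 trees can be reproduced by enumerating all combinations of dimension values. The sketch below uses the naming scheme `market_language@workspace` that appears in the examples of this document; the constant and function names are illustrative.

```python
from itertools import product

MARKETS = ["world", "eu", "de"]
LANGUAGES = ["en", "de"]
WORKSPACES = ["live", "session1", "session2"]


def tree_name(market, language, workspace):
    """Build a tree label such as 'de_en@session1'."""
    return f"{market}_{language}@{workspace}"


# One tree per combination of dimension values: 3 * 2 * 3 = 18.
TREES = [tree_name(*combination) for combination in product(MARKETS, LANGUAGES, WORKSPACES)]
```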

example inter-dimensional fallback graph without defined fallbacks

Assigning fallback vectors

Each pair of trees (A,B) that are part of a fallback rule can be connected via an edge which has a vector assigned describing the grade of fallback per dimension. The grades represent the number of edges traversed within the intra-dimensional fallback graph from A to B. If there is no directed path from A to B, the grade of fallback is undefined, leading to an invalid fallback vector, meaning B is not a fallback tree of A.

In our example, world_en@session1 can fall back to world_en@live with the vector [0,0,1], meaning 0 edges traversed along the market root line, 0 edges traversed along the language root line and 1 edge traversed along the session root line. The other way around, the connection from world_en@live to world_en@session1 cannot be represented by a fallback vector since there is no edge leading from live to session1 in the workspace graph.
A more complex fallback would be de_en@session1 to world_en@live with vector [2,0,1]. The connection from de_en@session1 to de_de@session2 again cannot be represented by a fallback vector since there is neither a directed path from en to de in the language graph nor from session1 to session2 in the workspace graph. These rules lead to the following unprioritized fallback graph:

Inter-dimensional fallback graph with fallback vectors
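The vector assignment described above can be sketched as follows. The sketch assumes that every dimension value has at most one direct fallback, which holds for the example configuration but is not required by the model in general. Trees are written as (market, language, workspace) tuples, and all names are illustrative.

```python
# Direct fallback per value, one mapping per dimension in priority order
# (market, language, workspace). None marks a value without a fallback.
FALLBACKS = [
    {"world": None, "eu": "world", "de": "eu"},
    {"en": None, "de": None},
    {"live": None, "session1": "live", "session2": "live"},
]


def grade(fallback_of, start, target):
    """Count edges from start to target in one dimension, or None if unreachable."""
    steps, current = 0, start
    while current is not None:
        if current == target:
            return steps
        current = fallback_of[current]
        steps += 1
    return None


def fallback_vector(a, b):
    """Return the fallback vector from tree a to tree b, or None if invalid.

    Note that a tree compared with itself yields the zero vector.
    """
    grades = [grade(f, x, y) for f, x, y in zip(FALLBACKS, a, b)]
    return None if None in grades else grades
```

For example, `fallback_vector(("de", "en", "session1"), ("world", "en", "live"))` yields `[2, 0, 1]`, while the reverse direction yields `None`.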

Prioritizing fallback vectors

The outgoing fallback vectors of a tree can be prioritized by minimizing the primary dimension’s fallback grade, then the secondary dimension’s and so on. For tree de_en@session1, the following fallback vectors are available:

2,0,0 to world_en@session1
1,0,0 to eu_en@session1
2,0,1 to world_en@live
1,0,1 to eu_en@live
0,0,1 to de_en@live

Sorting those vectors will lead to the following fallback priority:

de_en@live
eu_en@session1
eu_en@live
world_en@session1
world_en@live
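Minimizing the primary dimension first, then the secondary and so on amounts to a lexicographic sort of the vectors. A minimal sketch using the vectors listed for de_en@session1 (the variable names are illustrative):

```python
# Outgoing fallback vectors of de_en@session1 as (market, language, workspace).
OUTGOING = {
    "world_en@session1": (2, 0, 0),
    "eu_en@session1": (1, 0, 0),
    "world_en@live": (2, 0, 1),
    "eu_en@live": (1, 0, 1),
    "de_en@live": (0, 0, 1),
}

# Python compares tuples element by element, which is exactly the
# "primary dimension first" ordering described in the text.
PRIORITY = sorted(OUTGOING, key=OUTGOING.get)
```

The first entry of `PRIORITY` is the primary fallback of the tree.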

From another perspective, eu_en@live has incoming fallback vectors from eu_en@session1, eu_en@session2, de_en@live, de_en@session1 and de_en@session2, making them variants of it.
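The variants of a tree can be collected by looking for all trees whose fallback chains contain the target's value in every dimension. This is a sketch under the assumption that the fallback chains are available per value as in the preset configuration; the workspace chains and all names are illustrative.

```python
from itertools import product

# Fallback chain per value (own value first), one mapping per dimension
# in the order market, language, workspace.
FALLBACK_CHAINS = [
    {"world": ["world"], "eu": ["eu", "world"], "de": ["de", "eu", "world"]},
    {"en": ["en"], "de": ["de"]},
    {"live": ["live"], "session1": ["session1", "live"], "session2": ["session2", "live"]},
]


def variants_of(target):
    """Return all trees that can fall back to target, excluding target itself."""
    per_dimension = [
        [value for value, chain in chains.items() if wanted in chain]
        for chains, wanted in zip(FALLBACK_CHAINS, target)
    ]
    return [tree for tree in product(*per_dimension) if tree != target]
```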

Side note:
Fallback priorities mainly give integrators an overview of what to expect when a node is not translated to a variant tree. The implementation itself only cares about the primary fallback when creating a new variant tree, and about all variants of a tree when performing structural operations like creating or moving a node.

The Content Graph

The content repository is a directed graph with a single root node. The graph aggregates trees in a way that nodes can be shared among trees, enabling fallback mechanisms. This is achieved by connecting nodes via multiple edges that belong to one tree each.

Complete example graph, fallback tree components in solid blue, variant tree components in dashed black

Fallback tree within the graph

Variant tree within the graph

Building Blocks

Tree

A tree is a part of the content graph, starting at the graph root. It includes all edges assigned to the tree as well as all nodes connected to those edges, regardless of which tree the nodes were originally created in.
A tree is identified by a hash of arbitrary identity components.

Side note:
The most prominent examples of identity components would be workspace name and dimension values, but identity components are configurable beyond those concepts. In fact, trees can very well exist with all their functionalities without these concepts. Thus, those are not part of this model.
On the other hand, more domain-specific components could be introduced, enabling e.g. Neos to define site as another one to make dimensions site-specific without introducing models of its own.
In summary, trees replace the concept of content contexts.

A tree knows of its fallback as well as its variant trees to allow propagating edge modifications.

Trees must also prevent their part of the path from becoming disconnected or cyclical.

Node

A node is a graph element that aggregates properties based on its type. A node has an identifier that is unique to its tree but shared with its variants in other trees. A node is aware of its original tree, enabling fallback rules to create the necessary edges.
Additionally, per node there are

exactly one incoming edge per tree it is included in
an arbitrary number of outgoing edges per tree

Node Type

A node type is a semantic discriminator for nodes as well as an aggregation rule for both nodes and the graph. It defines

Which properties a node aggregates
Which child nodes and outgoing edges will be auto-generated together with their parent node
Which nodes can be connected via edges in the graph

Edge

An edge is a graph element connecting nodes within a given tree. As edges are the means of enacting the fallback rules, there has to be one edge per tree to connect two nodes. An edge is defined by

Its parent node
Its child node
The tree it connects the nodes for
A name unique among its siblings in a tree
A position to arrange it among its siblings in a tree
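The edge definition and the structural rules for nodes (one incoming edge per tree, names unique among siblings) could be sketched as below. This is only an illustration: nodes are referenced by opaque string keys, and the class and method names are assumptions rather than part of the model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    parent: str    # key of the parent node
    child: str     # key of the child node
    tree: str      # the tree this edge connects the nodes for
    name: str      # unique among its siblings within the tree
    position: int  # ordering among its siblings within the tree


@dataclass
class ContentGraph:
    edges: list = field(default_factory=list)

    def add_edge(self, edge):
        """Add an edge, enforcing the per-tree constraints described above."""
        for existing in self.edges:
            if existing.tree != edge.tree:
                continue
            if existing.child == edge.child:
                raise ValueError("node already has an incoming edge in this tree")
            if existing.parent == edge.parent and existing.name == edge.name:
                raise ValueError("edge name is not unique among its siblings")
        self.edges.append(edge)

    def children(self, parent, tree):
        """Return the child keys of parent within tree, ordered by position."""
        siblings = [e for e in self.edges if e.parent == parent and e.tree == tree]
        return [e.child for e in sorted(siblings, key=lambda e: e.position)]
```

Because the same node key may appear as a child in several trees, a node can be shared among trees while each tree keeps its own name and position for it.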

Side note:
Neos can very well use edge names as uriPathSegments for document nodes and thus use the tree’s logic to enforce their uniqueness on their level in the path.

Property

A property is a simple, arbitrary-typed key: value pair. There is no fallback mechanism planned on property level, otherwise properties would be modeled as nodes and their names as edges.

Events

The following events may occur within the graph:
(list to be completed)

Created a tree

When creating a new tree, it is registered in the graph. Also, if the tree has a fallback defined, it is registered there as a variant tree. Most significantly though, edges for the new tree are created alongside the fallback tree as copies of its edges.
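A minimal sketch of this event, assuming edges are stored as plain (parent, child, tree, name, position) tuples and variant registrations as a dictionary. Both representations and the function name are illustrative.

```python
def create_tree(edges, variants, new_tree, fallback_tree=None):
    """Register new_tree and copy the fallback tree's edges for it.

    edges is a list of (parent, child, tree, name, position) tuples;
    variants maps each tree to the list of its registered variant trees.
    Returns the edges created for the new tree.
    """
    variants.setdefault(new_tree, [])
    if fallback_tree is None:
        return []
    # Register the new tree as a variant of its fallback.
    variants.setdefault(fallback_tree, []).append(new_tree)
    # Copy every edge of the fallback tree, so the new tree initially
    # shares all nodes with its fallback.
    copies = [
        (parent, child, new_tree, name, position)
        for parent, child, tree, name, position in edges
        if tree == fallback_tree
    ]
    edges.extend(copies)
    return copies
```

Note that only edges are copied; the nodes themselves stay shared until a node variant is created.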

Created a fallback tree

Created a node in a tree

A node is created as a child of another node at a specific position, which is stored in the newly created edge. This edge is also copied to all variant trees the parent node is connected to.
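The propagation to variant trees might look like the following sketch. It assumes the same tuple representation of edges as (parent, child, tree, name, position), a root node keyed `"root"`, and a dictionary of variant trees per tree; these are illustrative choices.

```python
def create_node(edges, variants, tree, parent, child, name, position):
    """Create child below parent in tree and copy the edge to variant trees.

    The edge is only copied to those variant trees the parent node is
    connected to, i.e. where it has an incoming edge or is the root.
    Returns the created edges.
    """
    def contains(candidate_tree, node):
        return node == "root" or any(
            c == node and t == candidate_tree for _, c, t, _, _ in edges
        )

    targets = [tree] + [v for v in variants.get(tree, []) if contains(v, parent)]
    created = [(parent, child, t, name, position) for t in targets]
    edges.extend(created)
    return created
```

A variant tree in which the parent has been replaced by its own variant node does not receive the new edge in this sketch.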

Created nodes in a fallback tree

Created a node in a variant tree

Created a node variant

A node is copied from fallback to variant tree. The fallback node’s incoming edge in the variant tree is assigned to the variant.

Created a node variant

Set a node property

The changed property is set in the node; no fallback mechanisms apply.

Moved a node in a tree

The incoming edge for that tree is assigned a new parent. Its name has to be validated by the new parent, and its position may change as well. The same applies to the incoming edges of variant trees with the same parent.
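A sketch of the move operation, assuming edges are mutable dictionaries with the keys parent, child, tree, name and position. It validates the name below the new parent but deliberately leaves out the cycle check; all names are illustrative.

```python
def move_node(edges, variants, tree, node, new_parent, new_position):
    """Re-parent node in tree and in variant trees sharing the old parent.

    Raises ValueError if the new parent already has a child edge with the
    same name in one of the affected trees. Returns the affected trees.
    """
    incoming = {e["tree"]: e for e in edges if e["child"] == node}
    old_parent = incoming[tree]["parent"]
    affected = [tree] + [
        v for v in variants.get(tree, [])
        if v in incoming and incoming[v]["parent"] == old_parent
    ]
    # Validate all affected trees first so a failure changes nothing.
    for t in affected:
        name = incoming[t]["name"]
        for e in edges:
            if (e is not incoming[t] and e["tree"] == t
                    and e["parent"] == new_parent and e["name"] == name):
                raise ValueError(f"name {name!r} already taken in {t!r}")
    for t in affected:
        incoming[t]["parent"] = new_parent
        incoming[t]["position"] = new_position
    return affected
```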

Moved node in the fallback tree (middle) / the variant tree (bottom)

Tbd: will this also affect edges connecting variants with the same identifier? E.g. will moving node 4 affect node 4’ (see full graph)?

Merged a node to its tree’s fallback (“publish” the node)

Case 1: If a fallback node exists, its outgoing edges are merged into the published node and its incoming edge is linked to the published node. The published node is assigned to the fallback tree and the original node is removed.

Published, case 1

If no fallback node exists, the fallback tree will be assigned to the published node, making it a new fallback node.

Case 2: If the published node’s parent was in the fallback tree, the edge connecting them will be copied to the fallback tree.

Published, case 2

Case 3: If the published node’s parent was in the variant tree, a new edge is created in the fallback tree to the new node from the parent’s fallback node.

Published, case 3

Merged a node to one of its tree’s variant trees (“discard” the node)

Same as publishing, except that the fallback node isn’t assigned to another tree.

Removed a node in a fallback tree

The node is deregistered from its parents and the connecting edges are removed. The effect cascades down the fallback tree, removing all the nodes and edges affected.
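The cascade can be sketched as a traversal over the edges of a single tree. The sketch assumes edges are (parent, child, tree, name, position) tuples and only touches the given tree, since the effect on variant trees is still open; the function name is illustrative.

```python
def remove_node(edges, tree, node):
    """Remove node and all its descendants from tree.

    edges is modified in place; edges of other trees are left untouched.
    Returns the set of removed node keys.
    """
    removed, queue = set(), [node]
    while queue:
        current = queue.pop()
        if current in removed:
            continue
        removed.add(current)
        # Follow the outgoing edges of the current node within this tree.
        queue.extend(c for p, c, t, _, _ in edges if t == tree and p == current)
    # Drop the incoming edge of every removed node in this tree.
    edges[:] = [e for e in edges if not (e[2] == tree and e[1] in removed)]
    return removed
```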

Removed node in a fallback tree

Tbd: Does this also affect variant nodes?

Removed a node in a variant tree

If the node is part of the fallback tree, its outgoing edge is removed.

Removed a node from the variant tree, case 1

Tbd: Does this cascade down the variant tree?

If the node is part of the variant tree, see fallback tree

Removed a node from the variant tree, case 2

Side note:
Actually removing a node from a variant tree is a new feature provided by the graph model

Resulting aggregates

Graph

The obvious aggregate root is the graph, aggregating trees which aggregate edges and nodes, which aggregate properties in turn.
Technically, all events can be stored in the graph’s event stream, but replaying this for each and every event will be very expensive, as the number of events can be expected to be in the millions at least. So smaller aggregates are required.

Tree

The tree aggregate is required to keep track of all nodes being created or removed to ensure node identifiers are unique within the tree.
It also has to take care that edge operations do not result in detached or cyclical segments that would destroy the tree structure.
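One way the tree aggregate could guard against cyclical segments is to walk up from the prospective new parent before re-assigning an edge. This is a sketch only; the `parent_of` mapping (node to parent within one tree, root mapped to `None`) and the function name are assumptions.

```python
def would_create_cycle(parent_of, node, new_parent):
    """Return True if attaching node below new_parent would close a cycle.

    This is the case when new_parent is node itself or one of its descendants.
    """
    current = new_parent
    while current is not None:
        if current == node:
            return True
        current = parent_of.get(current)
    return False
```

Since an edge is re-assigned rather than removed during a move, rejecting such moves also keeps the moved segment attached to the root.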

Node

Node aggregates have to keep track of their property operations. At a structural level, they have to keep track of their connecting edges to enforce there is only one incoming edge per tree, and that outgoing edge names are unique per tree.

sebastian · November 28, 2016, 1:40pm

Hey Bernhard,

quite some tough stuff to digest – I am not sure I fully grasp the concept yet. For me it would help if we’d use more practical terms like “create new node at position X” or “publish a workspace” for describing the behavior, as for me it is quite hard to see whether the list above is exhaustive and is enough to model the CR properties we want.

Some detailed questions follow below; but I think we should schedule a hangout to discuss the concept further:

I am not sure what an “aggregation rule” is

It’s child nodes, right?

Actually, we need to implement the following:

When you move a Document node, it is moved across all variants.
When you move a Content node, it is NOT moved across all variants. It is currently not so well defined what happens when you move a content node in one dimension to a completely different document node parent.

All the best,
Sebastian
PS: I’m really curious where this concept will lead to

Nezaniel · November 29, 2016, 12:09am

Well, I guess I should add a transition section for mapping our current models and concepts to the new ones, that should clear a lot of stuff up. In general I am confident the model will be able to provide all necessary features, as graphs are a very powerful tool. The main task will be rather to restrict it to our use cases than to enhance it.

Concerning your questions:

That’s a term I came up with, perhaps there is a better one. Node types act as rules regarding aggregation in that they define which properties a node may aggregate and which child nodes may be linked via edges. In fact, I’m not 100% sure if the latter constraints may rather end up in the edges than in the nodes. We’ll see.

No, an edge connects a single parent node to a single child node. Edges may connect nodes from different trees, providing a real fallback mechanism in the data structure itself, and thus are the main difference to the old model. There, edges were merged into the nodes, which resulted in nodes having path names, sorting indexes etc. and imho severe impediments to the tree structure.

That could be easily arranged by the respective command handlers (ES speaking). The graph structure itself allows for arbitrary move operations in general.

So am I, hopefully to an extremely well performing aggregate that can easily be event sourced and projected to anything we like via tree traversal

christopher · March 29, 2017, 9:34am

Hi there,

some notes while reading again through this:

I don’t quite understand the concept of a variant tree here. Does it mean all the ancestor nodes from the inter-dimensional fallback tree? But we need to support the idea of the targetDimension values, which means that writing operations can propagate to other dimension values (having a connection in the inter-dimensional fallback graph) than the current reading dimension values. This makes sense from an editor’s point of view if, for example, you want to explicitly edit the world market content (and do not want to create new variants when updating existing nodes) but have to work in a specific dimension combination like eu_en because the world dimension might not have a preset itself.

Well, I think that could be something we want to have in the future. Sometimes it’s very interesting to just “translate” a single node property with text and keep images and other language invariant properties synchronized. But this means we have to include this concept in the UI (like we have to improve for variants on the node level).

Yes! It just depends on the setting aggregate: TRUE in the NodeType configuration. While working on the dimension concept, we identified that moving a node of an aggregated type (e.g. a Document node) should always be in sync with variants, while an unaggregated type (e.g. Content) could have independent positions in the tree for variants (e.g. have the “same” text element in different grid positions for different variants).

Nezaniel · March 29, 2017, 10:33am

christopher:

Nezaniel:

A tree knows of its fallback as well as its variant trees to allow propagating edge modifications.

I don’t quite understand the concept of a variant tree here. Does it mean all the ancestor nodes from the inter-dimensional fallback tree? But we need to support the idea of the targetDimension values, which means that writing operations can propagate to other dimension values (having a connection in the inter-dimensional fallback graph) than the current reading dimension values. This makes sense from an editor’s point of view if, for example, you want to explicitly edit the world market content (and do not want to create new variants when updating existing nodes) but have to work in a specific dimension combination like eu_en because the world dimension might not have a preset itself.

Variant trees are a tree’s descendants in the inter-dimensional fallback graph, while the ancestors are called fallbacks.
As for target dimension values, if you want to change something in a different tree than the one you are currently looking at, well… you should be able to do so. They are all there available and retrievable from the Inter-DFG (we need some sensible abbreviations for those…)

I agree, this should be discussed and rather soonish.
I’ve prepared an alternative graph model for this that also comes with property value search capabilities and native NodeInterface adaptation without the need of a content context or subgraph. The drawbacks are that property retrieval requires an additional join across property edges and that there are lots of those. Lots as in |hierarchy edges| * |weighted average of properties per node type|

I’d like to move that from node type config to dimension configuration, because of a recurring customer requirement. Has to be thought over, though.