RFC: Content Repository Building Blocks

Motivation
With the Content Repository rewrite started and its first sprint behind us, it’s time to give the building blocks more than just a little thought. Although much of the discussion will take place elsewhere, I think it’s a good idea to keep things organized in one place before they join the code base.

Changelog
2017-09-13 Initial Reference
2017-09-24 Integration of CR code sprint 09-2017 results
2017-02-08 Update after implementation

Contexts
The building blocks are divided into three contexts: Dimension, focusing on dimensions, their values and the fallback mechanisms within each dimension; DimensionSpace, focusing on combinations of dimension values and the fallback mechanisms between these combinations; and finally Content, focusing on nodes, their relations and how to find them.

Dimension

Content Dimension Value
Scope: Value Object

A value for distinguishing content in one specific dimension. It might be an ISO 3166 country code, an ISO 639 language code, a policy role identifier or anything else meaningful. It is currently not modeled as a reusable value object because its purpose is too generic.
A value also carries information about specialization depth, constraints regarding combination with values of other dimensions as well as arbitrary configuration, e.g. for UI or detection from an HTTP request.

Note: Content Dimension Values completely replace the concept of Presets from versions <= 3. Since Presets are no longer allowed to define variation rules or to use multiple values in any other way, values and presets are merged into one concept.

Content Dimension Value Specialization Depth
Scope: Value Object

Encapsulates a positive integer describing the depth of a Content Dimension Value in its Dimension’s variation tree.

Content Dimension Constraints
Scope: Value Object

Encapsulates information about which values of another dimension a Content Dimension Value may be combined with.
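
To make the idea concrete, here is a minimal Python sketch of such a constraint check. It assumes that a value's constraints towards one other dimension consist of a wildcard flag plus explicit per-value exceptions; the class name, field names and the CH/language example are hypothetical illustrations, not the actual API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ContentDimensionConstraints:
    """Hypothetical constraints a value places on ONE other dimension."""

    # Whether values that are not listed explicitly may be combined (wildcard case).
    is_wildcard_allowed: bool = True
    # Explicit exceptions: other dimension's value -> allowed (True) or forbidden (False).
    identifier_restrictions: Dict[str, bool] = field(default_factory=dict)

    def allows_combination_with(self, other_value: str) -> bool:
        # Explicit entries win; everything else falls back to the wildcard rule.
        return self.identifier_restrictions.get(other_value, self.is_wildcard_allowed)


# Hypothetical example: the market value 'CH' only combines with 'de', 'fr' and 'it'.
ch_language_constraints = ContentDimensionConstraints(
    is_wildcard_allowed=False,
    identifier_restrictions={"de": True, "fr": True, "it": True},
)
print(ch_language_constraints.allows_combination_with("fr"))  # True
print(ch_language_constraints.allows_combination_with("en"))  # False
```

The wildcard default keeps configuration short when a value combines with almost everything and only a few combinations need to be excluded.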

Content Dimension
Scope: Repository

Content Dimensions are groups of Content Dimension Values that belong together semantically. Examples would be language, market or target audience. Not only are the values registered there, the Content Dimension also knows about variation mechanisms. Consider the following variation graph for the dimension language:


There is a fallback defined from Swiss German (gsw) to German (de), while English (eng) is disconnected from this. We define three terms for this behavior:
Specialization
Connected variants further down the graph (against the direction of the edges) are called specializations. The number of edges traversed by this is called the specialization depth. In our example, Swiss German is a specialization of German with depth 1.
Generalization
Connected variants further up the graph (along the direction of the edges) are called generalizations. The number of edges traversed by this is called the generalization depth. In our example, German is a generalization of Swiss German with depth 1.
Lateral shift?
This was originally called a translation, which is totally fine from a technical point of view but collides with the same term describing language variation in general. Since language is a very common use case for content dimensions, we should avoid that term. The provisional term for this is lateral shift.
Disconnected variants can be accessed via lateral shift. In our case that means you need a lateral shift to switch from English to any other variant or back.
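
The three terms can be illustrated with a small Python sketch of the language example. It assumes each value has at most one generalization (a tree), stores the edges as a plain dictionary, and treats every pair that is neither a specialization nor a generalization as a lateral shift; the dictionary and both function names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical in-memory variation graph for the language example. Each edge points
# from a specialization to its generalization, i.e. along the fallback direction.
GENERALIZATION_OF: Dict[str, Optional[str]] = {"de": None, "gsw": "de", "eng": None}


def generalization_depth(source: str, target: str) -> Optional[int]:
    """Edges traversed from source up to target, or None if target is not above source."""
    depth, current = 0, source
    while current is not None:
        if current == target:
            return depth
        current = GENERALIZATION_OF[current]
        depth += 1
    return None


def classify(source: str, target: str) -> str:
    """Name the kind of move needed to get from source to target."""
    if source == target:
        return "same value"
    if generalization_depth(source, target) is not None:
        return "generalization"
    if generalization_depth(target, source) is not None:
        return "specialization"
    # Assumption: anything else (e.g. disconnected values) counts as a lateral shift.
    return "lateral shift"


print(generalization_depth("gsw", "de"))  # 1
print(classify("de", "gsw"))  # specialization
print(classify("eng", "de"))  # lateral shift
```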

Note: This replaces the variation part of the IntraDimensionalFallbackGraph.

Content Dimension Identifier
Scope: Value Object

Identifies a content dimension

Content Dimension Value Variation Edge
Scope: Value Object

Connects two Content Dimension Values as specialization and generalization within a Content Dimension

Content Dimension Source
Scope: Repository

The repository for Content Dimensions. Initializes them from a given backend (currently: Configuration).

Note: This replaces the old ContentDimensionRepository, the old ContentDimensionPresetSource and the retrieval part of the IntraDimensionalFallbackGraph.
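
A rough Python sketch of how a source could turn configuration into values and variation edges. The nested `specializations` layout is an assumed configuration format chosen for illustration, and so is giving root values depth 0; `load_dimension` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical configuration layout: each value may nest its "specializations".
CONFIGURATION: Dict[str, dict] = {
    "language": {
        "values": {
            "de": {"specializations": {"gsw": {}}},
            "eng": {},
        }
    }
}


def load_dimension(values: Dict[str, dict]) -> Tuple[Dict[str, int], List[Tuple[str, str]]]:
    """Flatten nested values into specialization depths and variation edges."""
    depths: Dict[str, int] = {}
    edges: List[Tuple[str, str]] = []  # (specialization, generalization)

    def visit(level: Dict[str, dict], parent: Optional[str], depth: int) -> None:
        for value, value_config in level.items():
            depths[value] = depth  # root values get depth 0 in this sketch
            if parent is not None:
                edges.append((value, parent))
            visit(value_config.get("specializations", {}), value, depth + 1)

    visit(values, None, 0)
    return depths, edges


depths, edges = load_dimension(CONFIGURATION["language"]["values"])
print(depths)  # {'de': 0, 'gsw': 1, 'eng': 0}
print(edges)  # [('gsw', 'de')]
```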

Content Dimension Zookeeper
Scope: Repository

Gives information about all available combinations of content dimension values.

Note: This replaces the ContentDimensionCombinator.
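
Conceptually, the Zookeeper's job is a cartesian product over all dimension values, minus the combinations that constraints forbid. A minimal Python sketch, assuming constraints are reduced to a hypothetical set of forbidden value pairs (the market values and the forbidden pair are invented for illustration):

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical input: dimension identifier -> its values.
DIMENSIONS: Dict[str, List[str]] = {"language": ["de", "gsw", "eng"], "market": ["DE", "CH"]}

# Hypothetical constraints: forbidden (dimension, value, other dimension, other value).
FORBIDDEN: Set[Tuple[str, str, str, str]] = {("market", "DE", "language", "gsw")}


def allowed_combinations(dimensions: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Cartesian product of all dimension values, minus combinations a constraint forbids."""
    names = list(dimensions)
    result = []
    for values in product(*(dimensions[name] for name in names)):
        point = dict(zip(names, values))
        forbidden = any(
            (a, point[a], b, point[b]) in FORBIDDEN
            for a in names
            for b in names
            if a != b
        )
        if not forbidden:
            result.append(point)
    return result


for point in allowed_combinations(DIMENSIONS):
    print(point)
```

The resulting list is what the Allowed Dimension Subspace would be populated from.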

Dimension Space

Dimension Space
Scope: Repository

Defines the space generated by the different Content Dimensions. Content Dimensions are interpreted as the dimensions of that space and thus may define the axes of a diagram.

This is a conceptual model and currently not to be implemented. See Allowed Dimension Subspace for the actual implementation.

Dimension Space Point
Scope: Value Object

Defines a single point in the Dimension Space by using Dimension Values as coordinates. A typical Dimension Space Point could look like {language: 'fra', market: 'CH'}.
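
Since a Dimension Space Point is a value object, two points with the same coordinates should be equal regardless of coordinate order. A minimal Python sketch, assuming a string hash over the sorted coordinates serves as identity (and could double as a persistence key); the `hash` attribute and the SHA-256 choice are assumptions:

```python
import hashlib
import json
from typing import Dict, Mapping


class DimensionSpacePoint:
    """A point in the dimension space: dimension identifier -> dimension value."""

    def __init__(self, coordinates: Mapping[str, str]) -> None:
        self._coordinates: Dict[str, str] = dict(coordinates)
        # Sorting the keys makes the hash independent of the coordinate order.
        encoded = json.dumps(self._coordinates, sort_keys=True)
        self.hash = hashlib.sha256(encoded.encode("utf-8")).hexdigest()

    @property
    def coordinates(self) -> Dict[str, str]:
        return dict(self._coordinates)  # a copy, so callers cannot alter the point

    def __eq__(self, other: object) -> bool:
        return isinstance(other, DimensionSpacePoint) and self.hash == other.hash

    def __hash__(self) -> int:
        return hash(self.hash)

    def __repr__(self) -> str:
        return f"DimensionSpacePoint({self._coordinates!r})"


a = DimensionSpacePoint({"language": "fra", "market": "CH"})
b = DimensionSpacePoint({"market": "CH", "language": "fra"})
print(a == b, len({a, b}))  # True 1
```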

Allowed Dimension Subspace
Scope: Repository

Not all dimension value combinations might be allowed, since some might be forbidden by Content Dimension Restrictions. We usually only care about Dimension Space Points that are not affected by those restrictions and define the Allowed Dimension Subspace as a subspace of the Dimension Space, holding all allowed Dimension Space Points.

The Allowed Dimension Subspace is populated using the Content Dimension Zookeeper.

ContentSubgraph
Scope: Value Object

A Content Subgraph is initialized with a set of dimension values. It is thereby identified by the corresponding Dimension Space Point and can in addition be weighted using its Dimension Values’ Specialization Depths.

Content Subgraph Variation Weight
Scope: Value Object

The absolute weight of a Content Subgraph. Can be normalized to an integer value when given a normalization base.
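
One way to read "normalized to an integer when given a normalization base" is a positional encoding. The Python sketch below assumes the weight is the specialization depth per dimension, that dimension order determines significance (first dimension = most significant digit), and that the base must exceed every depth so digits cannot overlap. All names are hypothetical.

```python
from typing import Dict


class ContentSubgraphVariationWeight:
    """Hypothetical weight: specialization depth per dimension, in dimension order."""

    def __init__(self, depths: Dict[str, int]) -> None:
        if any(depth < 0 for depth in depths.values()):
            raise ValueError("specialization depths must not be negative")
        self.depths = dict(depths)

    def normalize(self, base: int) -> int:
        """Positional encoding: the first dimension is the most significant digit."""
        if any(depth >= base for depth in self.depths.values()):
            raise ValueError("base must be larger than every specialization depth")
        normalized = 0
        for depth in self.depths.values():
            normalized = normalized * base + depth
        return normalized


# Hypothetical depths: language at depth 1, market at depth 0.
print(ContentSubgraphVariationWeight({"language": 1, "market": 0}).normalize(10))  # 10
```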

Inter Dimensional Variation Graph
Scope: Repository

The Inter Dimensional Variation Graph is responsible for keeping track of all generalizations and specializations between Content Subgraphs, including their priority, calculated as the difference between their normalized weights.
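
A self-contained Python sketch of how generalizations and their priority could be derived. It assumes a subgraph B generalizes A when every coordinate of B equals or generalizes the matching coordinate of A, and it reuses the positional normalization idea with a fixed base. The market tree, `BASE` and all function names are invented for illustration; a smaller difference means a closer fallback.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical per-dimension variation edges (specialization -> generalization).
GENERALIZATION_OF: Dict[str, Dict[str, Optional[str]]] = {
    "language": {"de": None, "gsw": "de"},
    "market": {"DE": None, "CH": "DE"},  # assumed market tree, for illustration only
}
BASE = 10  # normalization base, assumed larger than any specialization depth


def chain(dimension: str, value: str) -> List[str]:
    """The value followed by all its generalizations, nearest first."""
    result: List[str] = []
    current: Optional[str] = value
    while current is not None:
        result.append(current)
        current = GENERALIZATION_OF[dimension][current]
    return result


def normalized_weight(point: Dict[str, str]) -> int:
    weight = 0
    for dimension in GENERALIZATION_OF:  # dimension order = significance order
        weight = weight * BASE + len(chain(dimension, point[dimension])) - 1
    return weight


def generalizations(point: Dict[str, str]) -> List[Tuple[int, Dict[str, str]]]:
    """All generalizing points with their priority (weight difference), closest first."""
    dimensions = list(GENERALIZATION_OF)
    result = []
    for values in product(*(chain(d, point[d]) for d in dimensions)):
        candidate = dict(zip(dimensions, values))
        if candidate != point:
            priority = normalized_weight(point) - normalized_weight(candidate)
            result.append((priority, candidate))
    return sorted(result, key=lambda entry: entry[0])


for priority, candidate in generalizations({"language": "gsw", "market": "CH"}):
    print(priority, candidate)
```

With language listed first, keeping the language specialization (gsw in market DE, priority 1) wins over keeping the market specialization (de in market CH, priority 10).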

Content
The actual content part of the CR, designed for user interaction. This reference describes the query side, i.e. the given models are read-only. Write interactions are modeled as commands.

Node Interface
Scope: Entity

The content repository’s main entity. Holds data in a Property Collection, is related to other nodes, can be identified by a NodeIdentifier and is grouped with its dimension space variants into an aggregate sharing a common NodeAggregateIdentifier. Nodes originate from a defined Dimension Space Point but may be visible in a larger Dimension Space Point Set holding multiple points due to fallback mechanisms defined in the Inter Dimensional Variation Graph.

The NodeInterface will be a lot smaller than the old one as all write operations are moved to commands. Also some properties that mainly existed to support the legacy implementation like path or level might be removed or at least discouraged for use.

Node Aggregate
Scope: Aggregate

Node Aggregates currently are a purely technical concept without a concrete implementation. Node Aggregates group nodes with the same Node Aggregate Identifier. They describe a set of variants of the same entity, e.g. translations or similar variations of the same thing.

Hierarchy Relation
Scope: Entity

Hierarchy relations are currently a purely technical concept but may be subject to extensions. They currently describe which node is what node’s parent in which Content Stream, Dimension Space Point and order.
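
A minimal Python sketch of a hierarchy relation as a read-model row, assuming it carries the content stream, the dimension space point (by hash), parent, child and a position for ordering. All field names and the identifiers in the example are hypothetical. It also shows why the Content Stream Identifier is needed: the same parent/child pair can exist in two streams.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HierarchyRelation:
    """Hypothetical read-model row: parent -> child edge, scoped to stream and point."""

    content_stream_identifier: str
    dimension_space_point_hash: str
    parent_node_identifier: str
    child_node_identifier: str
    position: int  # sort key among siblings


def children_of(
    relations: List[HierarchyRelation], content_stream: str, point_hash: str, parent: str
) -> List[str]:
    """Child node identifiers of `parent` in one stream and one point, in order."""
    matching = [
        r
        for r in relations
        if r.content_stream_identifier == content_stream
        and r.dimension_space_point_hash == point_hash
        and r.parent_node_identifier == parent
    ]
    return [r.child_node_identifier for r in sorted(matching, key=lambda r: r.position)]


relations = [
    HierarchyRelation("live", "dsp-de", "root", "about", 200),
    HierarchyRelation("live", "dsp-de", "root", "home", 100),
    HierarchyRelation("user-stream", "dsp-de", "root", "home", 100),
]
print(children_of(relations, "live", "dsp-de", "root"))  # ['home', 'about']
```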

Property Collection
Scope: Value Object

Replaces the current property array to enable direct property access without initializing all properties at once. Helper functions like ${q(node).property('myProperty')} become at least partially obsolete because, regardless of the presentation component of choice (Fusion vs. Fluid), {node.properties.myProperty} (prefixed with $ in Fusion) can be called directly as long as the property name is not a variable.

The Value Object scope only holds if there are no entity properties. Note that references to other nodes may in the future be relations instead of properties.
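
The "direct access without initializing all properties at once" idea can be sketched in Python with lazy, per-property decoding. This assumes properties are stored as JSON strings; the class and its internals are hypothetical.

```python
import json
from typing import Any, Dict


class PropertyCollection:
    """Hypothetical lazy property collection: values are decoded on first access only."""

    def __init__(self, serialized: Dict[str, str]) -> None:
        self._serialized = dict(serialized)  # raw JSON strings as stored
        self._decoded: Dict[str, Any] = {}  # cache of already decoded values

    def __getattr__(self, name: str) -> Any:
        # Python calls this only for names not found the regular way, e.g. "title".
        serialized = self.__dict__.get("_serialized", {})
        if name not in serialized:
            raise AttributeError(name)
        decoded = self.__dict__["_decoded"]
        if name not in decoded:
            decoded[name] = json.loads(serialized[name])
        return decoded[name]


properties = PropertyCollection({"title": '"Home"', "hits": "42"})
print(properties.title)  # Home
```

Accessing `properties.title` decodes only that one value; `hits` stays untouched until it is requested.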

Node Identifier
Scope: Value Object

Node identifiers point to a specific node. They define a stronger identity than aggregate identifiers as they can only be used once across the dimension space. They are oblivious to Content Streams however, meaning that there may very well be multiple nodes with the same identifier in different workspaces.

Node Identifiers are currently implemented as UUIDs.

Node Aggregate Identifier
Scope: Value Object

This replaces node identifiers in previous versions, which were just plain strings being validated all over the place. The replacement implementation is a value object validating itself.
Purpose of this concept is to be able to group variants of the same thing to a Node Aggregate and make it externally referenceable (e.g. a link in a text, an external system holding a reference to a node aggregate). The preferred way of accessing a node should be by node aggregate identifier in a subgraph (content stream identifier + dimension space point).
Currently, the Node Aggregate Identifiers are modeled as UUIDs, which I’d like to open to discussion (again), as legacy node identifiers do not have to be UUIDs either.
What should be kept in mind is which internal validation constraints exist. The legacy validation checks against a deliberately chosen string pattern that only allows lowercase characters, numbers and dashes. This works fine for now, but I’d move on to a new, even less strict pattern. For that we need to know which constraints can actually be justified.
Therefore it has to be defined what identifiers are used for. As I see it, the current use case is persistence, since caching, for example, would rather use the Node Identifier. Persistence recommends a maximum length of 255 characters to support indexing in your average MySQL database in default configuration.
Anything I missed?
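
A minimal Python sketch of a self-validating Node Aggregate Identifier combining the two constraints discussed above: the legacy character set and the 255-character limit. The exact regular expression and the UUID-based factory method are assumptions of this sketch, not the actual implementation.

```python
import re
import uuid

# Legacy-style rule described above: lowercase characters, numbers and dashes only.
# The exact regular expression is an assumption of this sketch.
LEGACY_PATTERN = re.compile(r"[a-z0-9-]+")
MAX_LENGTH = 255  # keeps the column indexable in a default MySQL setup


class NodeAggregateIdentifier:
    """Self-validating value object instead of plain strings checked everywhere."""

    def __init__(self, value: str) -> None:
        if len(value) > MAX_LENGTH:
            raise ValueError(f"identifier longer than {MAX_LENGTH} characters")
        if not LEGACY_PATTERN.fullmatch(value):
            raise ValueError(f"invalid node aggregate identifier: {value!r}")
        self.value = value

    @classmethod
    def create(cls) -> "NodeAggregateIdentifier":
        # Hypothetical factory: random UUIDs (lowercase hex and dashes) pass the pattern.
        return cls(str(uuid.uuid4()))

    def __eq__(self, other: object) -> bool:
        return isinstance(other, NodeAggregateIdentifier) and self.value == other.value

    def __hash__(self) -> int:
        return hash(self.value)
```

Relaxing the pattern later would then be a change in exactly one place.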

Content Stream
Scope: Entity

Content streams define another level of variation on top of content dimensions. They describe the workflows of different users changing content independently. They are currently more of a concept than an actual implementation; see more at
https://sandstorm.de/de/blog/post/event-sourced-content-repository.html

Content Stream Identifier
Scope: Value Object

Uniquely defines a content stream, required to distinguish Hierarchy Relations connecting the same two nodes in the same Dimension Space Point.

Content Subgraph Interface
Scope: Repository

The main (read-only, since projection-based) repository to find nodes, resolve relations etc. Subgraphs are identified by a ContentStreamIdentifier and a DimensionSpacePoint. They are aware of all fallback mechanisms, which usually are already projected into them. Subgraphs are able to find parents, children, siblings and arbitrary nodes by identifier. They can also be easily traversed.

Note: This is the read-side content subgraph and not to be mistaken for its write-side inter-dimensional cousin.
Together with query filters, Content Subgraphs are designed to completely replace the old Content Context.
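
A minimal in-memory Python sketch of the preferred access path: a node aggregate identifier looked up within a subgraph, i.e. a content stream identifier plus a dimension space point. It assumes fallbacks are already projected into each node's `visible_in` set; the `Node` fields, the point encoding and the method name are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# A dimension space point is reduced to a sorted tuple of (dimension, value) pairs here.
Point = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Node:
    """Hypothetical read-only node as stored in a projection."""

    node_identifier: str
    node_aggregate_identifier: str
    content_stream_identifier: str
    origin: Point  # the point the node originates from
    visible_in: FrozenSet[Point]  # origin plus all points reached via fallback


class ContentSubgraph:
    """Read-only view on one content stream at one dimension space point."""

    def __init__(self, nodes: List[Node], content_stream_identifier: str, point: Point) -> None:
        self._nodes = nodes
        self.content_stream_identifier = content_stream_identifier
        self.point = point

    def find_node_by_aggregate_identifier(self, aggregate_identifier: str) -> Optional[Node]:
        for node in self._nodes:
            if (
                node.node_aggregate_identifier == aggregate_identifier
                and node.content_stream_identifier == self.content_stream_identifier
                and self.point in node.visible_in
            ):
                return node
        return None


DE: Point = (("language", "de"),)
GSW: Point = (("language", "gsw"),)
ENG: Point = (("language", "eng"),)
nodes = [Node("n1", "agg-1", "live", DE, frozenset({DE, GSW}))]
found = ContentSubgraph(nodes, "live", GSW).find_node_by_aggregate_identifier("agg-1")
print(found.node_identifier if found else None)  # n1
```

The German node is found from the Swiss German subgraph because the fallback was projected into `visible_in`; from an English subgraph the same lookup returns None.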

Content Query Filter? (TBD)
Scope: Value Object

Takes over the filter part of the old Content Context by defining a reference date, reference roles etc.

Content Graph Interface
Scope: Repository

A collection of all available subgraphs in the system. Has little to no own retrieval methods although it is aware of all nodes and hierarchy relations.



Hi!

After the latest sprint discussions we should bring this up to date. I think we tackled a lot of these concepts and came up with new names for them.

done

Should read NodeAggregateIdentifier.

… and make it externally referenceable (e.g. a link in a text, an external system holding a reference to a node aggregate). The preferred way of accessing a node should be by node aggregate identifier in a subgraph (content stream identifier + dimension space point).

Updated this to match the current implementation and @christopher’s notes.