RFC: Adding additional properties to media assets

theilm · October 24, 2015, 1:14pm

At the moment, the possibilities to configure assets in Neos are limited. Depending on the project, it can be necessary to store copyright information, author information or accessibility status for assets. While status information (like “is an accessible document” or “has to be reviewed”) could be handled via tagging, this is not possible for string properties.

As discussed in RFC: Media Package Search and Meta data, we could use nodes to store asset data, but this is a bigger project. I could imagine making additional properties (besides title / caption) configurable and storing them as a JSON object in the media model. The main drawback is that the values of these properties would not know about dimensions. But it could be a good intermediate solution until we come up with a better node based concept.

We could use something similar to node type definitions (without the complete functionality, though):

TYPO3:
  Media:
    asset:
      properties:
        copyright:
          type: string
          ui:
            label: 'Copyright'
    document:
      properties:
        isAccessible:
          type: boolean
          ui:
            label: 'Document is accessible'

If you think it makes sense, I’ll try to start implementing a prototype. Thanks for your feedback!

dfeyer · October 26, 2015, 8:01am

Thanks for creating this RFC, I think we need to stick to a common standard, like https://iptc.org/

We can apply specific parts of IPTC based on the document type, e.g. for images: https://iptc.org/standards/photo-metadata/

The first step will be to provide an interface to edit those properties. The next step will be to support automatic extraction of metadata and a way to map the extracted data to the IPTC standard.

As discussed earlier, it could be really nice to use nodes to store this data and link the entity to the node; that can be a first step towards moving all asset related things to nodes.

dfeyer · October 26, 2015, 8:08am

Not directly related, but as this change requires some changes in the Media browser:

christopher · October 26, 2015, 8:28am

Using (one or more) referenced nodes for metadata would be the way to go for me. We need features like workspaces and dimensions (at least localization) and this is already solved with nodes. We also get an extensible schema with node types.

I would think that this is easier than implementing a custom attribute?

theilm · October 26, 2015, 9:16am

Thanks for the comments.

Concerning IPTC: I suggest using configurable properties and providing a default configuration for IPTC properties (perhaps not all, but the common ones). There should probably be mapping options in the configuration to allow for automatic import.
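
Such a default configuration could reuse the settings format from the first post. This is only a sketch: the `TYPO3.Media.asset.properties` path is the hypothetical one proposed there (not an existing setting), the `image` key is an additional assumption, and the property names are simply taken from IPTC Core field names.

```yaml
# Hypothetical defaults, following the format proposed in the first post.
TYPO3:
  Media:
    asset:
      properties:
        creator:
          type: string
          ui:
            label: 'Creator'
        copyrightNotice:
          type: string
          ui:
            label: 'Copyright notice'
        creditLine:
          type: string
          ui:
            label: 'Credit line'
    # Assumed key for image-specific properties (not part of the original proposal).
    image:
      properties:
        headline:
          type: string
          ui:
            label: 'Headline'
```

A project could then override or remove single properties in its own settings instead of defining the whole set.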

Just referencing a node is probably indeed quite easy, although dimension handling might be a bit difficult (at least concerning the UI). We could start with using nodes as meta data storage and improve the functionality afterwards (use the node tree for a tree based asset navigation, for example).

But thinking of the tree based navigation: It could make sense to use the CR as the basis and reference the assets from the nodes. Adding a reference to a node in the asset model would introduce a dependency on TYPO3CR in the media package. That could be a problem for those who use it in a Flow application without Neos.

So I propose to follow the RFC of @dfeyer (merge media module in Neos) so we can use nodes as a basis and reference the media assets from the nodes without creating a dependency on TYPO3CR in the media package. Sounds quite complex, though, if you think about the implications (tags, asset collections…).

dfeyer · October 26, 2015, 9:59am

To avoid coupling the Media package with TYPO3CR, we should create a new package, like Media.MetaData; only this package should be coupled with the CR.

Basically we should not touch the Media models. Instead, Media.MetaData should provide a service, based on an interface, that returns the meta data for a given asset. That way we have a soft relation between the entity and the metadata, and moving to a full node based asset storage should be really easy. That offers great flexibility, like external storage for metadata, …

The package should provide another service for searching based on metadata, also based on an interface to allow custom implementations, like an ElasticSearch based one, …

Feature wise, we should start with:

Basic Metadata node storage based on IPTC Core (see https://www.iptc.org/std/photometadata/documentation/IPTC-CS5-FileInfo-UserGuide_6.pdf)
Add support for the IPTC Photo Metadata Standard for image documents
Merge the Media browser module in Neos (see RFC: Merging the Media module in Neos)
Build the Metadata UI
Add search based on Metadata (maybe with a connector for ElasticSearch)
Add support for localization (not now, but for a future version)

And please, stick to the standard; making it extendable is a nice-to-have from my POV. First we need solid support for the standard.

Customize the metadata schema

From my POV, the CoreMetaData node should support the full IPTC Core standard and the Media Browser should be configurable to show only the required fields (hiding the fields that are not required for the current project).

The implementation of the PhotoMetadata should be done in a second node type, …
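
As a rough sketch of that split, the two node types could look like this. The node type names, the choice to let the photo type extend the core type, and the selection of properties are all assumptions for illustration; the property names are borrowed from IPTC Core and the IPTC Extension schema.

```yaml
# NodeTypes.yaml (sketch only; node type names are hypothetical)
'Media.MetaData:CoreMetaData':
  properties:
    title:
      type: string
      ui:
        label: 'Title'
    creator:
      type: string
      ui:
        label: 'Creator'
    copyrightNotice:
      type: string
      ui:
        label: 'Copyright notice'

# Second node type for photo-specific fields (assumed to extend the core type).
'Media.MetaData:PhotoMetaData':
  superTypes:
    'Media.MetaData:CoreMetaData': true
  properties:
    personInImage:
      type: string
      ui:
        label: 'Person shown in the image'
    event:
      type: string
      ui:
        label: 'Event'
```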

Required level of flexibility

Currently I’m not sure if the Media.MetaData service should completely hide the storage (don’t return a node, but a simple DTO or just an array). Digital Asset Management requires flexibility, and I see some cases where we need to sync assets from an external digital asset management tool (Alfresco, …); in this case we can use the external tool to store the metadata. Not sure if it’s too complex …

What’s next?

Can one of you create an EPIC in Jira?

I’m really happy to help in this area.

daniellienert · October 27, 2015, 8:43pm

I’ve created an EPIC in Jira: https://jira.neos.io/browse/NEOS-1645 The stories could use further refinement.

Following the IPTC standard is a good idea. For automatic extraction, I would like to have a configurable mapping between the file’s meta data and the fields in our meta data node type. This would make it possible to also map EXIF / XMP data to these fields.
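
Such a mapping might be expressed in settings roughly like this. Everything about the structure is invented for illustration: the `TYPO3.Media.metaData.mapping` path, the target field names and the `SOURCE:tag` notation. Only the tag names themselves are standard EXIF, IPTC IIM and XMP (Dublin Core) identifiers.

```yaml
# Hypothetical mapping: target field => ordered list of source tags.
# The idea is that the first source with a non-empty value wins.
TYPO3:
  Media:
    metaData:
      mapping:
        creator:
          - 'XMP:dc:creator'
          - 'IPTC:By-line'
          - 'EXIF:Artist'
        copyrightNotice:
          - 'XMP:dc:rights'
          - 'IPTC:CopyrightNotice'
          - 'EXIF:Copyright'
        description:
          - 'XMP:dc:description'
          - 'IPTC:Caption-Abstract'
          - 'EXIF:ImageDescription'
```

Keeping the mapping in configuration would let a project change the priority of the sources without touching the extraction code.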

For the automatic meta data extraction, http://image.intervention.io/ could be used, which is an actively developed image handling package. It could also replace the currently used Imagine package completely.

I am also really looking forward to helping implement this feature!

theilm · October 27, 2015, 9:12pm

Thanks for the Epic! Another question / suggestion: @dfeyer started creating a Neos.MediaBrowser package (see https://github.com/neos/neos-development-collection/pull/159), which really makes sense. What do you think about combining the meta data functionality and the media browser functionality in one package (Neos.Media for example)? I know we make the meta data package less “optional” if we combine the packages. But as we probably want to use dimensions for title and caption anyway and will use the meta data or node structure for browsing later on, it could make sense.

We don’t have to decide it now, I’ll try to start with a basic prototype without UI (NEOS-1646 basically). But before implementing UI we should decide if we want to merge the packages, I guess, to avoid complexity.

dfeyer · October 28, 2015, 5:08pm

From my current POV it’s a no go; the meta data package should be handled like the current search packages:

One package that provides the interface and common services
Another package that provides the implementation

Official implementation based on the CR

The first and official implementation will be node based, but that should be fully abstracted by the package, so nodes don’t leave the “bounded context”; only DTOs provided by the common package are allowed to leave the bounded context.

This way we can have more than one implementation, based on a completely different backend (RethinkDB, Alfresco, Documentum, …), to allow maximum flexibility for projects that need advanced stuff.

I want to avoid putting “everything” in the official implementation package. Don’t misunderstand me, the official implementation should be solid, with a complete feature set. But we need the flexibility to replace this package with another implementation, to avoid having to implement exotic things in the official package.

Data extraction / Mapping

Data extraction can be provided by small dedicated packages that register a “DataExtractor” and use a common mapping configuration to store the extracted meta data with the current storage implementation.

For me, data extraction is step 2 of the project; first let’s build a flexible structure to handle the storage, and a flexible implementation.

Media Browser

The media browser should detect if a meta data storage package is available, and show the UI provided by the implementation. So different implementations can provide completely different UIs.

By default the Media Browser just supports the meta data provided by the Media package (like now).

theilm · October 28, 2015, 6:21pm

Thanks for your comment and explanation! Perhaps we are mixing up two different topics. I am absolutely in favor of creating separate packages for meta data extraction (sorry for my misleading last comment), and of handling everything interface based to allow for extension / replacement, like in the search package. But we have to decide on the basis for media handling (not only meta data storage) in Neos. Do we “just” want to add meta data to the assets, e.g. via the CR or any other data storage? Or do we want to use the CR as the basis for asset storage and asset handling in Neos? If I did not misunderstand his comment, @aertmann mentioned the second solution as a possibility in this post about media package and search / meta data.

Solution 1 is more or less what you think of I guess. One major problem I see is the context. You have to provide the service with the asset and the current context in order to get the correct node properties for the dimensions, workspace etc. But I think it does not make sense to make a service interface that requires a context (which is CR specific) if we want to generalize. For me it feels more “natural” to use a node that already knows about its context in the first place.
To me it absolutely makes sense to have DataExtractors or configurable meta data sources (Alfresco etc.). But I do not quite see the use case for different storages. I am not against it, I just don’t see the use case for now.

Solution 2 is probably more complex. We could use nodes also for tags etc. We could use the node tree for navigation or nested tags. And we already have search / elasticsearch for nodes for advanced search and filtering. As such it has nothing to do with meta data handling, although the nodes could be used for meta data storage easily. With this solution, the Neos.Media package would include the node based asset storage. So my last comment was misleading, sorry for that. It’s not about packing meta data functionality in a media browser package, it’s about providing a different (node based) kind of asset handling.

As I thought of solution 2 when talking about node based storage, I initially proposed the intermediary solution, as solution 2 seems quite complex to me.

Sorry, I guess my initial RFC has not very much to do with the things we are discussing now… I don’t know if this thread is the right place to discuss all this, I feel like spoiling the thread with all my questions. Happy to use other channels like Slack if you think that’s more appropriate.

daniellienert · October 29, 2015, 7:17am

To rephrase @dfeyer, a Package structure to start could look like this:

Neos.Media.MetaData.ContentRepositoryAdapter

Provides an interface to store and retrieve meta data for assets

Neos.Media.MetaData.Extractor

Uses a third party package to extract the meta data
Uses a mapping configuration to map the extracted data to defined fields
Provides configuration / maybe templates to display and edit the data in the browser

Neos.Media.Browser

(@dfeyer - what do you think about calling the package like that?)

Provides the UI to manage media files
Provides the UI to view and edit meta data of a file (without MetaData packages installed, only the data the Media Package provides)
Provides the UI to view and edit additional meta data which is provided through the meta data package.

dfeyer · October 29, 2015, 8:13am

Out of scope for the current RFC.

The RFC is about metadata handling, so we currently touch nothing in the Media package and stay with the entity based Media handling. Only the meta data will use the CR.

In another RFC, we will definitely move the resource storage from entity to node, but those are two distinct projects and RFCs.

If we respect the bounded context from DDD, it should be easy to move the resource from entity to node, without any modification on the MetaData bounded context.

From my POV:

By default you can use Neos with the current set of meta data (from the current Media package), such as title, description and tags
If you install a MetaData content repository adaptor, you have access to a new UI with MetaData based on whatever standard (IPTC, for the default implementation)
Having the advanced metadata should not be the default (maybe we can improve the setup, to choose during installation if you need basic (read current) or advanced metadata)

Naming of the packages

@daniellienert I need to think about it a bit more, but your naming sounds pretty good.

dfeyer · October 29, 2015, 11:04am

For this package it’s OK, I just updated the PR, see:

Now this PR depends on “[TASK] More flexible controllerObjectName in Request Patterns settings” by dfeyer (Pull Request #114, neos/flow-development-collection), because nothing is simple …

dfeyer · October 29, 2015, 11:37am

@theilm Would you allow me to rewrite the first post based on the current discussion, just to have an easier-to-read RFC?

daniellienert · October 29, 2015, 11:52am

Alternatively start a new thread “Project Proposal” and mark it as wiki post so we could all participate.

theilm · October 29, 2015, 11:58am

@dfeyer Yes of course! I still have some architectural doubts about context handling, but that’s another issue. (Or we do it like @daniellienert proposed, perhaps that makes more sense?)

My advice would be to keep the architectural overhead as small as possible concerning meta data storage. Meta data extraction should be extendable by third party packages. Meta data storage will probably be node based, especially when we work on the CR based asset storage later on. What do you think?

Regarding tags: I agree, it’s not worth building something special when we can switch to Taxonomy later on (although tag translation should be possible later, as the system can be used by editors who speak different languages, and tags might even be shown in the frontend).

aertmann · October 30, 2015, 8:43am

Hey, just chiming in here. Project proposals are not about technical details; to figure those out, use RFCs, and once there’s a consensus on the RFC or RFCs, create a project proposal. Project proposals are about justifying taking on a project and its scope, and about allowing people to give feedback before a project is taken on by a team.

dfeyer · November 2, 2015, 1:41pm

Thanks, I started working on something but need a bit more time to clean it up; I will post it here before the end of the week.

dfeyer · November 2, 2015, 10:32pm

Here you can find a first draft of the RFC:

http://pad.ttree.ch/p/qKkUAJo6qP

Feel free to comment here. I need to read a bit more about IPTC to see if everything fits correctly, and write something about metadata extraction.

cgat · December 23, 2015, 11:55pm

Might there be a need to save an additional ID or version info, so that an asset can be replaced with a new version while maintaining a stable symlink (URI) that always points to the latest version?

That way the webserver would know whether to answer with a short “browser cache is still valid” message or to deliver the new version.