RFC: Adding additional properties to media assets

I’ve created an epic in Jira (https://jira.neos.io/browse/NEOS-1645). The stories still need further refinement.

Orienting ourselves on the IPTC standard is a good idea. For automatic extraction, I would like to have a configurable mapping between the file’s meta data and the fields in our meta data node type. This would also make it possible to map EXIF / XMP data to these fields.

For the automatic meta data extraction, http://image.intervention.io/ could be used, which is an actively developed image handling package. It could also replace the currently used Imagine package completely.

I am also really looking forward to helping implement this feature!

Thanks for the epic! Another question / suggestion: @dfeyer started creating a Neos.MediaBrowser package (see https://github.com/neos/neos-development-collection/pull/159), which really makes sense. What do you think about combining the meta data functionality and the media browser functionality in one package (Neos.Media for example)? I know we make the meta data package less “optional” if we combine the packages. But as we probably want to use dimensions for title and caption anyway, and will use meta data or the node structure for browsing later on, it could make sense. What do you think?

We don’t have to decide it now, I’ll try to start with a basic prototype without UI (NEOS-1646 basically). But before implementing UI we should decide if we want to merge the packages, I guess, to avoid complexity.

From my current POV it’s a no go. The meta data package should be handled like the current search packages:

  • One package that provides the interfaces and common services
  • Another package that provides the implementation

Official implementation based on the CR

The first and official implementation will be node based, but that should be fully abstracted by the package, so that nodes don’t leave the “bounded context”; only DTOs provided by the common package are allowed to leave it.

This way we can have more than one implementation, based on a completely different backend (RethinkDB, Alfresco, Documentum, …), to allow maximum flexibility for projects that need advanced features.

I want to avoid putting “everything” in the official implementation package. Don’t misunderstand me, the official implementation should be solid with a complete feature set. But we need the flexibility to replace this package with another implementation, to avoid having to implement exotic things in the official package.
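
A minimal sketch of this split, written in Python only for brevity. All names (`MetaDataDto`, `MetaDataStorage`, `InMemoryStorage`) are hypothetical and not taken from any existing package; the in-memory class merely stands in for a node based or external backend.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class MetaDataDto:
    """Plain data object; the only type allowed to cross the package boundary."""
    asset_id: str
    properties: Dict[str, str] = field(default_factory=dict)


class MetaDataStorage(ABC):
    """Contract owned by the common package (hypothetical name)."""

    @abstractmethod
    def save(self, dto: MetaDataDto) -> None: ...

    @abstractmethod
    def find(self, asset_id: str) -> Optional[MetaDataDto]: ...


class InMemoryStorage(MetaDataStorage):
    """Stand-in for a node based (or RethinkDB, Alfresco, ...) implementation."""

    def __init__(self) -> None:
        # Internal representation; never handed out to callers.
        self._records: Dict[str, Dict[str, str]] = {}

    def save(self, dto: MetaDataDto) -> None:
        self._records[dto.asset_id] = dict(dto.properties)

    def find(self, asset_id: str) -> Optional[MetaDataDto]:
        record = self._records.get(asset_id)
        if record is None:
            return None
        # Copy into a fresh DTO so internal state does not leak out.
        return MetaDataDto(asset_id=asset_id, properties=dict(record))


# Callers depend on the interface only, so the backend can be swapped.
storage: MetaDataStorage = InMemoryStorage()
storage.save(MetaDataDto("asset-1", {"title": "Harbour at dusk"}))
result = storage.find("asset-1")
```

Because callers only ever see `MetaDataStorage` and `MetaDataDto`, replacing the backend would not require touching them.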

Data extraction / Mapping

Data extraction can be provided by small dedicated packages that register a “DataExtractor” and use a common mapping configuration to store the extracted meta data with the current storage implementation.

For me, data extraction is step 2 of the project; first let’s build a flexible structure to handle the storage, and a flexible implementation.

Media Browser

The media browser should detect if a meta data storage package is available, and show the UI provided by the implementation. So different implementations can provide completely different UIs.

By default the Media Browser just supports the meta data provided by the Media package (like now).

Thanks for your comment and explanation! Perhaps we are mixing up two different topics. I am absolutely in favor of creating separate packages for meta data extraction (sorry for my misleading last comment), and of handling everything via interfaces to allow for extension / replacement, like in the search package. But we have to decide on the basis for media handling (not only meta data storage) in Neos. Do we “just” want to add meta data to the assets, e.g. via the CR or any other data storage? Or do we want to use the CR as the basis for asset storage and asset handling in Neos? If I did not misunderstand his comment, @aertmann mentioned the second solution as a possibility in this post about media package and search / meta data.

Solution 1 is more or less what you have in mind, I guess. One major problem I see is the context. You have to provide the service with the asset and the current context in order to get the correct node properties for the dimensions, workspace etc. But I think it does not make sense to create a service interface that requires a context (which is CR specific) if we want to generalize. For me it feels more “natural” to use a node that already knows about its context in the first place.
To me it makes absolute sense to have DataExtractors or configurable meta data sources (Alfresco etc.). But I do not quite see the use case for different storages. I am not against it, I just don’t see the use case for now.
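
To illustrate the context concern, here is a rough sketch in Python. `Context`, `MetaDataService` and `MetaDataNode` are hypothetical, heavily simplified stand-ins and not the actual CR API; the point is only where the context has to be passed.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Context:
    """Simplified stand-in for a CR context: workspace plus language dimension."""
    workspace: str
    language: str


class MetaDataService:
    """Variant A: generic service; every call must carry the CR specific context."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, Context], Dict[str, str]] = {}

    def set(self, asset_id: str, context: Context, properties: Dict[str, str]) -> None:
        self._store[(asset_id, context)] = dict(properties)

    def get(self, asset_id: str, context: Context) -> Dict[str, str]:
        return dict(self._store.get((asset_id, context), {}))


@dataclass
class MetaDataNode:
    """Variant B: a node-like object that is already bound to its context."""
    asset_id: str
    context: Context
    properties: Dict[str, str]


live_de = Context("live", "de")
live_en = Context("live", "en")

service = MetaDataService()
service.set("asset-1", live_de, {"title": "Hafen"})
service.set("asset-1", live_en, {"title": "Harbour"})
title_via_service = service.get("asset-1", live_de)["title"]  # context passed explicitly

node = MetaDataNode("asset-1", live_en, {"title": "Harbour"})
title_via_node = node.properties["title"]  # context is implied by the node
```

In variant A the context leaks into the signature of every service method, which is what makes a backend-neutral interface awkward.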

Solution 2 is probably more complex. We could use nodes also for tags etc. We could use the node tree for navigation or nested tags. And we already have search / elasticsearch for nodes for advanced search and filtering. As such it has nothing to do with meta data handling, although the nodes could be used for meta data storage easily. With this solution, the Neos.Media package would include the node based asset storage. So my last comment was misleading, sorry for that. It’s not about packing meta data functionality in a media browser package, it’s about providing a different (node based) kind of asset handling.

As I thought of solution 2 when talking about node based storage, I initially proposed the intermediary solution, as solution 2 seems quite complex to me.

Sorry, I guess my initial RFC has not very much to do with the things we are discussing now… I don’t know if this thread is the right place to discuss all this, I feel like spoiling the thread with all my questions. Happy to use other channels like Slack if you think that’s more appropriate. :smile:

To rephrase @dfeyer, a package structure to start with could look like this:

Neos.Media.MetaData.ContentRepositoryAdapter

  • Provides an interface to store and retrieve meta data for assets

Neos.Media.MetaData.Extractor

  • Uses a third party package to extract the meta data
  • Uses a mapping configuration to map the extracted data to defined fields
  • Provides configuration / maybe templates to display and edit the data in the browser

Neos.Media.Browser

(@dfeyer - what do you think about calling the package like that?)

  • Provides the UI to manage media files
  • Provides the UI to view and edit meta data of a file (without MetaData packages installed, only the data the Media Package provides)
  • Provides the UI to view and edit additional meta data which is provided through the meta data package.

Out of scope for the current RFC.

The RFC is about metadata handling, so we currently touch nothing in the Media package and stay with entity based media handling. Only the meta data will use the CR.

In another RFC, we will definitely move the resource storage from entity to node, but these are two distinct projects and RFCs.

If we respect the bounded context from DDD, it should be easy to move the resource from entity to node without any modification to the MetaData bounded context.

From my POV:

  • By default you can use Neos with the current set of meta data (from the current Media package), such as title, description and tags
  • If you install a MetaData content repository adaptor, you have access to a new UI with MetaData based on whatever standard (IPTC, for the default implementation)
  • Having the advanced metadata should not be the default (maybe we can improve the setup, so that you choose during installation whether you need basic (read: current) or advanced metadata)

Tags

This is a bit of a challenge for the project. From my POV, tags need to move to the CR in a Taxonomy package, but I’m not sure if that can be done before the metadata project. And doing both projects at the same time is a bit too big.

So maybe we need to be pragmatic, keep the tagging based on entities, and improve / migrate when the Taxonomy project is done.

Tags do not need dimensions (by default tags are backend only stuff, and rarely translated), so it’s fine for me to live with an entity based storage for the next few months.

What do you think about this?

Naming of the packages

@daniellienert I need to think about it a bit more, but your naming sounds pretty good.

For this package it’s OK, I just updated the PR, see:

Now this PR depends on “[TASK] More flexible controllerObjectName in Request Patterns settings” (pull request #114 in neos/flow-development-collection), because nothing is simple …

@theilm Would you allow me to rewrite the first post based on the current discussion, just to make the RFC easier to read?

Alternatively start a new thread “Project Proposal” and mark it as wiki post so we could all participate.

@dfeyer Yes of course! I still have some architectural doubts about context handling, but that’s another issue. (Or we do it like @daniellienert proposed, perhaps that makes more sense?)

My advice would be to keep architectural overhead as small as possible concerning meta data storage. Meta data extraction should be extendable by third party packages. Meta data storage will probably be node based, especially when we work on the CR based asset storage later on. What do you think?

Regarding tags: I agree, it’s not worth building something special when we can switch to Taxonomy later on (although tag translation should be possible later, as the system can be used by editors who speak different languages, or tags might even be shown in the frontend).

Hey, just chiming in here. Project proposals are not about technical details; to figure those out, use RFCs, and then, when there’s a consensus on the RFC or RFCs, create a project proposal. Project proposals are about justifying taking on a project and its scope, and about allowing people to give feedback before a project is taken on by a team.

Thanks, I started working on something, but I need a bit more time to clean it up. I will post it here before the end of the week.

Here you can find a first draft of the RFC:

http://pad.ttree.ch/p/qKkUAJo6qP

Feel free to comment here. I need to read a bit more about IPTC to see if everything fits correctly, and to write something about MetaData extraction.

Might there be a need to save an additional ID or version info, so that an asset can be replaced with a new version while a stable symlink (URI) always points to the latest version?

That way the web server knows whether to answer with a short “browser cache is still valid” message or to deliver the new version.

Hi @cgat, thanks for your message. This RFC is just about handling asset meta data, not about replacing assets. We know about the problem with stable URIs for assets and nodes.

I’m currently working on a redirection module for document nodes (see https://jira.neos.io/browse/NEOS-721). If we move the asset storage to the CR, maybe we can use some parts of this API to generate redirects to the new resource.

I added a project proposal for this topic: Project Proposal: Asset MetaData Handling
@dfeyer I would be thrilled to work on that topic during the sprint in Rosenheim.

I’d join you in Rosenheim to work on this @daniellienert!

@daniellienert Feel free to work on the topic, a first prototype would be nice. Unfortunately I don’t know when I will arrive at the sprint; there are too many things to do this month.

Hello,

I worked on that topic and built a (I think quite promising) prototype. Nothing is carved in stone of course, and I appreciate any feedback.

As discussed earlier, the prototype currently consists of these three parts:

Neos.MetaData

  • Defines DTOs for metadata standards, like EXIF, IPTC or XMP
  • Provides a manager class to handle updated meta data
  • Already maps the DTOs to title / caption of the asset table. Mappings can be provided using Eel:

Settings.yaml:

Neos:
  MetaData:
    metaDataMapping:
      title: '${asset.title || iptc.documentTitle}'
      caption: '${asset.caption || iptc.caption}'
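
As a rough illustration of how such a mapping resolves, here is a small Python sketch. It assumes that `||` in the expressions yields the first non-empty operand, so values already set on the asset win and IPTC data only fills the gaps; `MAPPING` and `apply_mapping` are hypothetical names and not part of the prototype.

```python
from typing import Callable, Dict, Optional

Sources = Dict[str, Dict[str, Optional[str]]]

# Python counterpart of the two mapping expressions; `or` plays the role of `||`.
MAPPING: Dict[str, Callable[[Sources], Optional[str]]] = {
    "title": lambda s: s["asset"].get("title") or s["iptc"].get("documentTitle"),
    "caption": lambda s: s["asset"].get("caption") or s["iptc"].get("caption"),
}


def apply_mapping(sources: Sources) -> Dict[str, Optional[str]]:
    """Evaluate every mapping rule against the available data sources."""
    return {name: rule(sources) for name, rule in MAPPING.items()}


sources: Sources = {
    # The asset has no title yet, but an editor already wrote a caption.
    "asset": {"title": "", "caption": "Edited by hand"},
    "iptc": {"documentTitle": "Harbour at dusk", "caption": "Embedded caption"},
}
mapped = apply_mapping(sources)
# mapped == {"title": "Harbour at dusk", "caption": "Edited by hand"}
```

With this ordering, re-running the extraction would not overwrite a caption that an editor entered manually.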

Neos.MetaData.Extractor

  • Provides ExtractorAdapters to 3rd party packages and converts the result to the according DTOs
  • An Extractor class that selects the appropriate ExtractorAdapter by asset type and builds a collection of DTOs
  • A command to extract metaData of existing assets

Command:

./flow metadata:extract
No Extractor available for media type application/pdf
17/17 [============================] 100%
Finished extraction.
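
The adapter selection described above could be sketched roughly as follows (Python for brevity). The `can_handle` / `extract` interface and the `FakeImageAdapter` are assumptions made for illustration and do not mirror the prototype’s real classes; a real adapter would wrap a third party library.

```python
from typing import Dict, Iterable, List, Optional, Protocol, Tuple


class ExtractorAdapter(Protocol):
    """Shape every adapter is assumed to have (hypothetical interface)."""

    def can_handle(self, media_type: str) -> bool: ...

    def extract(self, path: str) -> Dict[str, str]: ...


class FakeImageAdapter:
    """Placeholder adapter returning canned data instead of parsing EXIF / IPTC."""

    def can_handle(self, media_type: str) -> bool:
        return media_type.startswith("image/")

    def extract(self, path: str) -> Dict[str, str]:
        return {"originalFileName": path.rsplit("/", 1)[-1]}


class Extractor:
    def __init__(self, adapters: Iterable[ExtractorAdapter]) -> None:
        self._adapters = list(adapters)

    def adapter_for(self, media_type: str) -> Optional[ExtractorAdapter]:
        # The first adapter that claims the media type wins.
        return next((a for a in self._adapters if a.can_handle(media_type)), None)

    def extract_all(
        self, assets: Iterable[Tuple[str, str]]
    ) -> Tuple[Dict[str, Dict[str, str]], List[str]]:
        extracted: Dict[str, Dict[str, str]] = {}
        messages: List[str] = []
        for path, media_type in assets:
            adapter = self.adapter_for(media_type)
            if adapter is None:
                # Skip unsupported assets instead of aborting the whole run.
                messages.append(f"No Extractor available for media type {media_type}")
                continue
            extracted[path] = adapter.extract(path)
        return extracted, messages


extractor = Extractor([FakeImageAdapter()])
extracted, messages = extractor.extract_all(
    [("photos/harbour.jpg", "image/jpeg"), ("docs/manual.pdf", "application/pdf")]
)
```

Supporting a new media type would then only mean adding another adapter to the list.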

Neos.MetaData.ContentRepositoryAdapter

  • Generates nodes in the root meta
  • Maps the DTOs on specific metaData Node Types
  • Provides FlowQueryOperations to handle meta data

Mapping MetaDataDTOs to nodes
There are a lot of meta data formats out there. Meta data in images, for example, is stored in three different standards. Some of the standards cover their own domain (e.g. EXIF for data generated by the capturing device), but some overlap. Therefore the mapping to the storage in Neos needs to be flexible. I would suggest configuring a small standard set, as it is easily possible to add more properties to the nodes and adjust the mapping. A nodeType definition could look like:

'Neos.MetaData:Asset':
  abstract: true
  properties:
    originalFileName:
      type: string
      mapping: '${asset.fileName}'

'Neos.MetaData:Iptc':
  abstract: true
  properties:
    creationDate:
      type: DateTime
      mapping: '${iptc.creationDate}'
    authorByline:
      type: string
      mapping: '${iptc.authorByLine}'

'Neos.MetaData:Exif':
  abstract: true
  properties:
    model:
      type: string
      mapping: '${exif.Model}'


'Neos.MetaData:Image':
  superTypes:
    'Neos.MetaData:Asset': true
    'Neos.MetaData:Exif': true
    'Neos.MetaData:Iptc': true
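
To show how the mappings of the abstract types end up on `Neos.MetaData:Image`, here is a simplified Python sketch that mirrors the definitions above as plain dictionaries. `collect_mappings` is a hypothetical helper, and real node type resolution involves more than this recursive merge.

```python
from typing import Any, Dict

# The node type definitions from above, reduced to the keys relevant here.
NODE_TYPES: Dict[str, Dict[str, Any]] = {
    "Neos.MetaData:Asset": {
        "properties": {"originalFileName": {"mapping": "${asset.fileName}"}},
    },
    "Neos.MetaData:Iptc": {
        "properties": {
            "creationDate": {"mapping": "${iptc.creationDate}"},
            "authorByline": {"mapping": "${iptc.authorByLine}"},
        },
    },
    "Neos.MetaData:Exif": {
        "properties": {"model": {"mapping": "${exif.Model}"}},
    },
    "Neos.MetaData:Image": {
        "superTypes": {
            "Neos.MetaData:Asset": True,
            "Neos.MetaData:Exif": True,
            "Neos.MetaData:Iptc": True,
        },
    },
}


def collect_mappings(name: str, node_types: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Gather property -> mapping expression, including inherited properties."""
    definition = node_types[name]
    mappings: Dict[str, str] = {}
    # Inherit from enabled super types first ...
    for super_type, enabled in definition.get("superTypes", {}).items():
        if enabled:
            mappings.update(collect_mappings(super_type, node_types))
    # ... then let the type's own properties override inherited ones.
    for prop, config in definition.get("properties", {}).items():
        if "mapping" in config:
            mappings[prop] = config["mapping"]
    return mappings


image_mappings = collect_mappings("Neos.MetaData:Image", NODE_TYPES)
```

Adding another standard such as XMP would then be a matter of defining one more abstract type and listing it under `superTypes`.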

[Figure: NodeData table after import]

TypoScript interface

Get the meta data of the property image:

prototype(TYPO3.Neos.NodeTypes:Image) {
	metaData = ${q(node).metaData('image').properties}
}

Select Images by their MetaData:

assets = ${q(meta).find('[instanceof Neos.MetaData:Image][authorByline*="Daniel Lienert"]').getAssets()}

(currently needs a hack in the TypoScriptView to work, see How to extend the Node Context to query an additional root node)

I am not confident with the API yet, so any ideas are welcome.

I haven’t added the packages to composer yet, as I hope to move them to the neos account soon. If you want to try it out - here are the packages:

What is still missing is the storage of the extracted tags.
A next step could be kickstarting the Neos.MetaData.Browser package with an interface to edit the meta data. I am also curious to see what filters are possible with the meta data.

I hope to decide on this as a project for the neos teams soon :slight_smile:

@daniellienert Thanks a lot for the prototype, it looks promising and I love the current code a lot. I’m on holiday from tomorrow for 10 days, so I have no time to test this carefully right now; I will test it when I’m back at the office.