Flow magically injects a property called
persistence_object_identifier into entities or value objects that don’t define an ID column on their own. This property is defined as a string, and Doctrine will store it in a column of type
varchar with length
40. Flow automatically stores a UUID in it.
When digging into the depths of Doctrine, I stumbled upon a field type called
guid, designed to handle UUIDs.
Some database platforms have a column type specifically for UUIDs. Doctrine will use that type to store
guid fields if it runs on such a platform. If the platform has no specific type for UUIDs, Doctrine will fall back to
For example, Doctrine will use
uuid as column type on PostgreSQL. Since MySQL has no specific type, it will default to
varchar(36) there. Other platforms supported by Doctrine with a specific
guid column type are SQLAnywhere and MS SQL Server.
Why not use
persistence_object_identifier field type?
I know this is a huge change to Flow, but it might be worth it. Please tell me what you think.
guid on a PostgreSQL machine
Two basic tables, each has two columns: A primary key,
varchar(40) on table 1,
uuid on table 2. Each table has an index on the primary column. The second column is just a data payload.
Each test was run five times, time figures are average.
INSERT of 1.000.000 rows, values generated with Flow’s
Table 1: 3899 seconds
Table 2: 3910 seconds
SELECT COUNT (DISTINCT col1):
Table 1: 629 ms
Table 2: 208 ms
SELECT * using a
WHERE clause with previously selected IDs:
Table 1: 112 seconds
Table 2: 109 seconds
UPDATEs on the payload column using a
WHERE clause with previously selected IDs:
Table 1: 391 seconds
Table 2: 381 seconds
I know that these figures only show a very moderate performance gain. The test cases are constructed and not all of them are close to what happens in the real world.
Case Study on PostgreSQL
I’m working on a Flow project that has a very large and very complex PostgreSQL database with millions of datasets. I use a custom PHP trait to change the column type of
This brought a considerable performance boost.
I don’t have solid figures on the improvement, only that the number of queries slower than 100ms is now at least 50% lower than before.
There are lots of
JOINed queries in my project, and I assume that this is common to most, if not all, Flow-based projects. In my project there is a huge performance gain for queries with complex
JOINs. Unfortunately I don’t have any before-and-after figues on them. But think of some web requests taking 1/4 less time and importer processes improving by 1/4 to 1/3.
When working with the
persistence_object_identifier on PostgreSQL, I noticed that there are issues when Flow directly uses identifiers based on user input. PostgreSQL checks incoming UUIDs for wellformedness, and if they aren’t well-formed, the query fails.
For example, if you use routes with custom
objectType, but no special
IdentityRoutePart will use the object’s identifier as the URI segment, resulting in UUIDs in the URL.
When matching an incoming URL (and there is no cache yet), Flow may query the database for objects that are identified by that segment, even if it’s not a UUID. It’s not unlikely that the incoming string is not a UUID, making Flow fail the request with error 500.
To prevent that, I had to implement a custom child class of
IdentityRoutePart that checks if the incoming segment really is a UUID first.
If Flow opts for a transition to a
persistence_object_identifier, a similar functionality would have to be implemented in
IdentityRoutePart and other classes that use foreign data as an object identifier.
Migrating an existing Flow instance
Doctrine builds a migration to change the column type from
uuid. You have to manually add a
USING persistence_object_identifier::uuid type cast to the
ALTER TABLE statement. Each existing foreign key/constraint and index on a changed column has to be dropped and re-created. Doctrine does not regard these changes in migrations on its own, so they have to be put there manually.
Consequences of the
guid change to MySQL-based installations
On migration, each
persistence_object_identifier column will change its
varchar length from 40 to 36. In most cases that shouldn’t be a problem, since canonical string representations of UUIDs are 36 chars long.
Only for some edge cases when there are non-GUID-
persistence_object_identifiers this can cause trouble.