Motivation
Flow magically injects a property called persistence_object_identifier
into entities or value objects that don’t define an ID column on their own. This property is defined as a string, and Doctrine will store it in a column of type varchar
with length 40
. Flow automatically stores a UUID in it.
When digging into the depths of Doctrine, I stumbled upon a field type called guid
, designed to handle UUIDs.
Some database platforms have a column type specifically for UUIDs. Doctrine will use that type to store guid
fields if it runs on such a platform. If the platform has no specific type for UUIDs, Doctrine will fall back to varchar(36)
instead.
For example, Doctrine will use uuid
as column type on PostgreSQL. Since MySQL has no specific type, it will default to varchar(36)
there. Other platforms supported by Doctrine with a specific guid
column type are SQLAnywhere and MS SQL Server.
Why not use guid
as persistence_object_identifier
field type?
I know this is a huge change to Flow, but it might be worth it. Please tell me what you think.
Comparison between varchar(40)
and guid
on a PostgreSQL machine
Two basic tables, each has two columns: A primary key, varchar(40)
on table 1, uuid
on table 2. Each table has an index on the primary column. The second column is just a data payload.
Each test was run five times, time figures are average.
INSERT
of 1.000.000 rows, values generated with Flow’s Algorithms::generateUUID()
:
Table 1: 3899 seconds
Table 2: 3910 seconds
SELECT COUNT (DISTINCT col1)
:
Table 1: 629 ms
Table 2: 208 ms
1.000.000 times SELECT *
using a WHERE
clause with previously selected IDs:
Table 1: 112 seconds
Table 2: 109 seconds
100.000 UPDATE
s on the payload column using a WHERE
clause with previously selected IDs:
Table 1: 391 seconds
Table 2: 381 seconds
I know that these figures only show a very moderate performance gain. The test cases are constructed and not all of them are close to what happens in the real world.
Case Study on PostgreSQL
I’m working on a Flow project that has a very large and very complex PostgreSQL database with millions of datasets. I use a custom PHP trait to change the column type of persistence_object_identifier
to guid
.
This brought a considerable performance boost.
I don’t have solid figures on the improvement, only that the number of queries slower than 100ms is now at least 50% lower than before.
There are lots of JOIN
ed queries in my project, and I assume that this is common to most, if not all, Flow-based projects. In my project there is a huge performance gain for queries with complex JOIN
s. Unfortunately I don’t have any before-and-after figues on them. But think of some web requests taking 1/4 less time and importer processes improving by 1/4 to 1/3.
Caveats
When working with the guid
ed persistence_object_identifier
on PostgreSQL, I noticed that there are issues when Flow directly uses identifiers based on user input. PostgreSQL checks incoming UUIDs for wellformedness, and if they aren’t well-formed, the query fails.
For example, if you use routes with custom routeParts
with objectType
, but no special uriPattern
, the IdentityRoutePart
will use the object’s identifier as the URI segment, resulting in UUIDs in the URL.
When matching an incoming URL (and there is no cache yet), Flow may query the database for objects that are identified by that segment, even if it’s not a UUID. It’s not unlikely that the incoming string is not a UUID, making Flow fail the request with error 500.
To prevent that, I had to implement a custom child class of IdentityRoutePart
that checks if the incoming segment really is a UUID first.
If Flow opts for a transition to a guid
ed persistence_object_identifier
, a similar functionality would have to be implemented in IdentityRoutePart
and other classes that use foreign data as an object identifier.
Migrating an existing Flow instance
PostgreSQL
Doctrine builds a migration to change the column type from varchar(40)
to uuid
. You have to manually add a USING persistence_object_identifier::uuid
type cast to the ALTER TABLE
statement. Each existing foreign key/constraint and index on a changed column has to be dropped and re-created. Doctrine does not regard these changes in migrations on its own, so they have to be put there manually.
Consequences of the guid
change to MySQL-based installations
On migration, each persistence_object_identifier
column will change its varchar
length from 40 to 36. In most cases that shouldn’t be a problem, since canonical string representations of UUIDs are 36 chars long.
Only for some edge cases when there are non-GUID-persistence_object_identifier
s this can cause trouble.
See also
https://www.postgresql.org/docs/9.6/static/datatype-uuid.html