I don’t know whether you are interested in my impressions and feedback from testing and looking at Neos 2.0. If not, just ignore or delete this discussion post.
Unicode out-of-the-box?
First I was puzzled that in the year 2015 there exists a modern CMS that does not support Unicode out-of-the-box, NEOS-1548. More on this later.
strtoupper: A problem in the year 2015? Really?
Because the “C” in “CMS” means “Content” I started to look at the important type of content: strings.
I took a look at the source code. As an example I looked at strtoupper. Interestingly, one finds two different versions: one in TYPO3/Eel/Helper/StringHelper and one in TYPO3/Flow/Utility/Unicode/Functions. Of course they differ. Let’s have a look at one of the two:
static public function strtoupper($string) {
    return str_replace('ß', 'SS', mb_strtoupper($string, 'UTF-8'));
}
Another shocking moment for me. Let’s take it step by step. Whoever wrote (or copied and pasted) this code evidently noticed that mb_strtoupper has a problem with “ß”. What is the natural thing to do? mb_strtoupper belongs to mbstring, a PHP extension.
- Why not report this bug upstream?
- Why not investigate further?
- Why not help to improve PHP?
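To reproduce the behaviour that the workaround targets, here is a minimal standalone sketch. It assumes the mbstring extension is loaded and the file is saved as UTF-8; the function name upper_with_eszett_fix is made up for this example and is not part of Neos or Flow. Whether the plain mb_strtoupper call leaves “ß” untouched depends on the PHP version and its Unicode tables, so the first output line may differ between installations.

```php
<?php
// Standalone copy of the quoted workaround, for experimenting outside Neos.
// The function name is hypothetical; it only mirrors the snippet above.
function upper_with_eszett_fix(string $string): string
{
    // Uppercase via mbstring, then replace any "ß" that was left unchanged.
    return str_replace('ß', 'SS', mb_strtoupper($string, 'UTF-8'));
}

$input = 'straße';
$raw   = mb_strtoupper($input, 'UTF-8'); // result depends on the PHP version
$fixed = upper_with_eszett_fix($input);

echo 'mb_strtoupper: ', $raw, PHP_EOL;
echo 'with fix:      ', $fixed, PHP_EOL;
```

If the two output lines differ, the installed mbstring does not map “ß” on its own, which is exactly the case worth reporting upstream.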
Further investigation shows that mbstring has a lot of problems. Why? One reason (not the only one) is that /php-src/ext/mbstring/unicode_data.h is horribly outdated. It’s from 2010(!). The PHP folks don’t seem to be very Unicode-affine. So why not report this bug? PHP is the language you have chosen for your CMS. The fewer bugs PHP has, the better for Neos. I opened a bug at php.net:
PHP Bug 70475
Datatypes
Then a nice person was kind enough to comment on my Neos bug report, suggesting flow database:setcharset if I really, really want Unicode. Did he test this? Did anybody ever test this? I do not know. I tried it on my default Neos 2.0 installation and got this error:
An exception occurred while executing 'ALTER TABLE `flow_doctrine_migrationstatus` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci':
SQLSTATE[42000]: Syntax error or access violation: 1071 Specified key was too long; max key length is 767 bytes
[ERROR] The transaction was rolled back.
Now we are entering the world of MySQL and datatypes: the table flow_doctrine_migrationstatus and its column called version. Its datatype is VARCHAR(255) with collation “utf8_unicode_ci”! The column contains datetimes. How do I know? Because in the source code I see that the values of this column are passed to DateTime::createFromFormat.
- Why not use the DATETIME datatype or the TIMESTAMP datatype?
- Are there really people who believe version information like 20110620155001 can contain up to 255 characters?
- Are there really people who believe version information like 20110620155001 can contain non-ASCII characters?
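A quick check of the example value illustrates the point. This is only a sketch: the format string 'YmdHis' is my assumption based on how the value looks, not something copied from the Flow source, and the UTC time zone is chosen just to keep the output stable.

```php
<?php
$version = '20110620155001'; // example value from the questions above

// Assumed format: four-digit year, then month, day, hour, minute, second.
$date = DateTime::createFromFormat('YmdHis', $version, new DateTimeZone('UTC'));

echo $date->format('Y-m-d H:i:s'), PHP_EOL; // 2011-06-20 15:50:01
echo strlen($version), PHP_EOL;              // 14
var_dump(ctype_digit($version));             // bool(true)
```

So the value is a fixed-width string of 14 ASCII digits that parses cleanly as a point in time.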
Is this a singular exception? No! In several other columns in other tables where hash values are stored, the datatype used is VARCHAR(x) with collation “utf8_unicode_ci”.
- Are there people out there who believe the result of a hash operation has variable length?
- Are there people out there who believe the result can contain non-ASCII characters?
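The same kind of check works for hashes. I do not know which hash functions Neos uses for these columns, so md5 and sha1 stand in here purely as examples; the input strings are made up.

```php
<?php
// Inputs of very different length and content, including non-ASCII text.
$inputs = ['', 'a', str_repeat('Neos ', 1000), 'ß and 日本語'];

foreach ($inputs as $input) {
    $md5  = md5($input);  // hex digest
    $sha1 = sha1($input); // hex digest

    // Print both digest lengths and whether only hex digits occur.
    printf(
        "%d %d %s\n",
        strlen($md5),
        strlen($sha1),
        ctype_xdigit($md5 . $sha1) ? 'hex only' : 'other'
    );
}
```

Every line of output reads “32 40 hex only”: the digest length is fixed per algorithm and the characters are plain ASCII hex digits, whatever the input was.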
I cannot find a rationale for all these decisions. In my eyes this is weird.
The consequence can be seen in the error message above: because the version column is the primary key, MySQL wants to index it. But with VARCHAR, 255 characters and Unicode combined, the column needs (in principle, not for your actual data) more than 767 bytes in the utf8mb4 (= real UTF-8) encoding: 255 characters at up to 4 bytes each is 1020 bytes.
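The arithmetic behind the 767-byte limit can be sketched in a few lines. The limit is taken from the error message; the bytes-per-character values are the maximums MySQL reserves for its utf8 and utf8mb4 character sets. The variable names are mine.

```php
<?php
$maxKeyBytes  = 767; // limit quoted in the error message
$columnLength = 255; // VARCHAR(255)

// Maximum number of bytes MySQL reserves per character.
$bytesPerChar = ['utf8' => 3, 'utf8mb4' => 4];

foreach ($bytesPerChar as $charset => $bytes) {
    $needed   = $columnLength * $bytes;      // worst-case index size
    $fits     = $needed <= $maxKeyBytes ? 'fits' : 'too long';
    $maxChars = intdiv($maxKeyBytes, $bytes); // longest indexable column

    printf("%-8s %4d bytes (%s), max %d chars\n", $charset, $needed, $fits, $maxChars);
}
```

With utf8 the column needs 765 bytes and just fits; with utf8mb4 it needs 1020 bytes, and only a column of at most 191 characters would stay under the limit.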
Quick look at the html output
I tested the output of the TYPO3.NeosDemoTypo3Org site.
The w3.org validator tells me that lang and hreflang values cannot be “en_US”, etc., because of RFC 5646, which separates subtags with a hyphen (“en-US”).
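For simple values like the one the validator complains about, converting the locale identifier into a hyphenated language tag is a one-liner. This is only a sketch with a made-up helper name; it does not validate the tag against RFC 5646 and I have not checked where Neos generates these attributes.

```php
<?php
// Hypothetical helper: turn a locale identifier such as "en_US"
// into a hyphenated language tag such as "en-US".
function locale_to_language_tag(string $locale): string
{
    return str_replace('_', '-', $locale);
}

echo locale_to_language_tag('en_US'), PHP_EOL; // en-US
```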
Summary of my first look
I have mixed feelings. On the one hand, the source uses the concept of “Domain Driven Design”. This is really cool. On the other hand, looking at basic stuff (Unicode handling, strings, SQL datatypes, etc.), I’m seriously worried.
Those are my impressions.
Bye
C. Ludwig