Feedback from a first-time user/tester (Neos 2.0)

I don’t know if you are interested in my impressions and feedback from testing and looking at Neos 2.0. If not, just ignore or delete this discussion post.

Unicode out-of-the-box?

First, I was puzzled that in the year 2015 there exists a modern CMS that does not support Unicode out of the box (NEOS-1548). More on this later.

strtoupper: A problem in the year 2015? Really?

Because the “C” in “CMS” means “Content”, I started by looking at the most important type of content: strings.

I took a look at the source code. As an example I looked at strtoupper. Interestingly, one finds two different versions: one in TYPO3/Eel/Helper/StringHelper and one in TYPO3/Flow/Utility/Unicode/Functions. Of course they differ. Let’s have a look at one of the two:

 static public function strtoupper($string) {       
   return str_replace('ß', 'SS', mb_strtoupper($string, 'UTF-8'));
 }

Another shocking moment for me. One thing at a time. Whoever wrote (or copied and pasted) this code obviously saw that mb_strtoupper has a problem with “ß”. What is the natural thing to do? mb_strtoupper belongs to mbstring, a PHP extension.

  1. Why not report this bug upstream?
  2. Why not investigate further?
  3. Why not help to improve PHP?
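To illustrate why patching a single character is fragile, here is a minimal sketch. It is written in Python rather than PHP, because Python's `str.upper()` applies Unicode full case mapping regardless of version. The helpers `simple_upper`, `patched_upper`, and `full_upper` are hypothetical names. `simple_upper` only approximates one-to-one (simple) case mapping, which is the behaviour the `str_replace` workaround suggests for `mb_strtoupper`.

```python
def simple_upper(text: str) -> str:
    """Approximate simple case mapping: uppercase a character only when
    its uppercase form is a single character, otherwise keep it as-is."""
    out = []
    for ch in text:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def patched_upper(text: str) -> str:
    """Mimic the workaround: simple mapping plus a special case for U+00DF (ß)."""
    return simple_upper(text).replace("\u00df", "SS")


def full_upper(text: str) -> str:
    """Python's built-in upper() applies Unicode full case mapping."""
    return text.upper()


if __name__ == "__main__":
    # "straße" and "ﬁn" (the latter starts with the U+FB01 fi ligature)
    for word in ("stra\u00dfe", "\ufb01n"):
        print(repr(word), repr(patched_upper(word)), repr(full_upper(word)))
```

The special case covers “ß”, but every other character whose uppercase form is longer than one character (such as the fi ligature) still slips through, which is why a fix in the case-mapping data itself is the more robust route.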

Further investigation shows that mbstring has a lot of problems. Why? One (not the only) reason is that /php-src/ext/mbstring/unicode_data.h is horribly outdated. It’s from 2010(!). The PHP folks don’t seem to have much affinity for Unicode. So why not report this bug? PHP is the language you have chosen for your CMS. The fewer bugs PHP has, the better for Neos. I opened a bug at php.net:
PHP Bug 70475

Datatypes

Then a nice person was so kind as to comment on my Neos bug report and suggest using flow database:setcharset if I really, really want Unicode. Did he test this? Has anybody ever tested this? I do not know. I tried it on my default Neos 2.0 installation and got this error:

 An exception occurred while executing 'ALTER TABLE `flow_doctrine_migrationstatus` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci':
 SQLSTATE[42000]: Syntax error or access violation: 1071 Specified key was too long; max key length is 767 bytes
 [ERROR] The transaction was rolled back.

Now we are entering the world of MySQL and datatypes. Take flow_doctrine_migrationstatus and its column called version. Its datatype is VARCHAR(255) with collation “utf8_unicode_ci”! The column contains datetimes. How do I know? Because in the source code I see that the values of this column are passed to DateTime::createFromFormat.

  1. Why not use the DATETIME datatype or the TIMESTAMP datatype?
  2. Are there really people who believe version information like 20110620155001 can contain up to 255 characters?
  3. Are there really people who believe version information like 20110620155001 can contain non-ASCII characters?
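A quick check of the example value makes the point concrete. This is a minimal sketch in Python; the format string (year, month, day, hour, minute, second without separators) is an assumption read off the example value, not taken from the Neos source.

```python
from datetime import datetime

version = "20110620155001"  # example value quoted in the post

# Assumed layout: YYYYMMDDhhmmss with no separators.
parsed = datetime.strptime(version, "%Y%m%d%H%M%S")

print(len(version))        # number of characters actually needed
print(version.isascii())   # whether only ASCII characters occur
print(parsed.isoformat())  # the value interpreted as a timestamp
```

Under that assumption the value is a plain timestamp that fits in 14 ASCII digits.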

Is this a single exception? No! In several other columns in other tables where hash values are stored, the datatype used is VARCHAR(x) with collation “utf8_unicode_ci”.

  1. Are there people out there who believe the result of a hash operation has variable length?
  2. Are there people out there who believe the result can contain non-ASCII characters?
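Both questions can be checked directly. The following minimal Python sketch hashes inputs of very different sizes and reports the length of the hexadecimal digest; the function name `hex_digest_lengths` and the sample inputs are made up for illustration.

```python
import hashlib

# Sample inputs of very different sizes, including non-ASCII text.
INPUTS = [b"", b"Neos", "stra\u00dfe".encode("utf-8"), b"x" * 10_000]


def hex_digest_lengths(algorithm: str) -> set:
    """Return the set of hex-digest lengths seen across all sample inputs."""
    return {len(hashlib.new(algorithm, data).hexdigest()) for data in INPUTS}


def hex_digests_are_ascii(algorithm: str) -> bool:
    """Check that every hex digest consists of ASCII characters only."""
    return all(hashlib.new(algorithm, data).hexdigest().isascii() for data in INPUTS)


if __name__ == "__main__":
    for algorithm in ("md5", "sha1", "sha256"):
        print(algorithm, hex_digest_lengths(algorithm), hex_digests_are_ascii(algorithm))
```

Each algorithm yields exactly one length regardless of input, and the digits are always drawn from 0-9 and a-f.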

I cannot find a rationale for all these decisions. In my eyes this is weird.

The result can be seen in the error message above: because the version column is the primary key, MySQL wants to index this column. But combining VARCHAR, 255 characters, and Unicode means that in the utf8mb4 (= real UTF-8) encoding the key can take up to 4 bytes per character, i.e. 255 × 4 = 1020 bytes. That is (in principle, not for your actual data) more than the 767 bytes allowed.
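The arithmetic behind the error can be sketched as follows. This is a simplified Python model that only multiplies the declared column length by the maximum bytes per character; the function names are hypothetical, and the 767-byte limit is the one quoted in the error message.

```python
INDEX_LIMIT_BYTES = 767  # limit quoted in the MySQL error message
COLUMN_CHARS = 255       # declared length of the VARCHAR(255) column

# Maximum bytes per character for the two MySQL character sets.
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def worst_case_key_bytes(chars: int, charset: str) -> int:
    """Worst-case size of an index key over a column of `chars` characters."""
    return chars * MAX_BYTES_PER_CHAR[charset]


def max_indexable_chars(charset: str) -> int:
    """Longest column (in characters) whose worst case still fits the limit."""
    return INDEX_LIMIT_BYTES // MAX_BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    for charset in MAX_BYTES_PER_CHAR:
        needed = worst_case_key_bytes(COLUMN_CHARS, charset)
        fits = needed <= INDEX_LIMIT_BYTES
        print(charset, needed, fits, max_indexable_chars(charset))
```

With the 3-byte utf8 charset, 255 characters just fit; with utf8mb4 they do not, which is why a column sized to its real content would have avoided the failure.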

Quick look at the html output

I tested the output for the TYPO3.NeosDemoTypo3Org Site.
The w3.org validator tells me that lang and hreflang values cannot be “en_US” etc., because RFC 5646 language tags separate subtags with a hyphen (“en-US”), not an underscore.
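A minimal sketch of the conversion in Python, assuming the locale identifiers only differ from valid language tags by their separator. The function name `to_language_tag` is hypothetical, and the sketch does not validate the subtags themselves.

```python
def to_language_tag(locale_id: str) -> str:
    """Turn an underscore-separated locale identifier such as "en_US"
    into a hyphen-separated language tag such as "en-US"."""
    return locale_id.replace("_", "-")


if __name__ == "__main__":
    for locale_id in ("en_US", "de_DE", "fr"):
        print(locale_id, "->", to_language_tag(locale_id))
```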

Summary of my first look

I have mixed feelings. On the one hand, the source uses the concept of “Domain Driven Design”. This is really cool. On the other hand, looking at basic stuff (Unicode handling, strings, SQL datatypes, etc.), I’m really worried.

Those are my impressions.
Bye
C. Ludwig


Welcome Christian, nice to read your first impressions. And nice to read comments with technical background, and not just complaining :wink:

First, I have to say one thing: we need people like you in the team, and you are highly welcome if you can contribute a fix for one of your favorite issues.

Now in more detail: moving to utf8mb4 as the default charset has been discussed multiple times, and it will be a good choice at some point. Currently the Neos requirements don’t explicitly specify a MySQL version (see http://neos.readthedocs.org/en/latest/GettingStarted/Installation.html). As utf8mb4 is only available from MySQL 5.5.3 on, we first need to change our requirements, communicate this to our user base, and then switch our code base. It’s not easy to keep backward compatibility.

Regarding the database schema, I think there are lots of improvements that can be made, but as always it’s a question of time and priority. If you have time to prepare some PRs regarding database schema, index optimization, … you are highly welcome. Keep in mind that any DB migration needs to be provided and tested for MySQL and PostgreSQL (SQLite too, but just so as not to break some tests).

About the strtoupper issue … it’s pretty ugly, I agree. It looks like some German coder fixed their bug in the wrong place. Thanks for opening the issue on the PHP bug tracker; that’s also one point where we need to be more aware: contributing upstream.

Hope to read from you again, best

Hello Dominique Feyer,

thank you for reading and answering.

For contributing fixes you need persons with php experience. I had my first contact with php (and php-fpm) a few days ago.

I hope you are switching to utf8mb4 soon. Interesting: you recently had multiple discussions [on whether you want to support more than 5.88% of the Unicode code points]. Several years ago this question was a hot topic.
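The 5.88% figure appears to correspond to the share of the Unicode code space that fits into at most 3 bytes of UTF-8, which is what MySQL's `utf8` charset can store. A minimal Python sketch of that arithmetic follows; the constant and function names are hypothetical.

```python
BMP_CODE_POINTS = 0x10000   # U+0000..U+FFFF, encodable in at most 3 UTF-8 bytes
ALL_CODE_POINTS = 0x110000  # U+0000..U+10FFFF, the whole Unicode code space

bmp_share_percent = BMP_CODE_POINTS / ALL_CODE_POINTS * 100


def utf8_length(character: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(character.encode("utf-8"))


if __name__ == "__main__":
    print(f"{bmp_share_percent:.2f}%")
    print(utf8_length("\u00df"))      # a BMP character (ß)
    print(utf8_length("\U0001F600"))  # a character outside the BMP (an emoji)
```

Anything outside the Basic Multilingual Plane needs 4 bytes, so it can only be stored once the column uses utf8mb4.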

As far as I know, MySQL 5.5.3 was released in 2010. Good luck with building a modern CMS while supporting such ancient software.

Your database schema: get it right in the first place. Otherwise you have to write all the migrations. [By the way: even ancient software supports fixed-size ASCII strings for your md5/sha hash values, if you really want to store them as characters instead of binary.]
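The difference between the two storage forms can be sketched in Python. The input `b"Neos"` is an arbitrary example; the sizes suggest a fixed-width column such as CHAR(40) with an ASCII charset or BINARY(20) for SHA-1, which is a suggestion here, not something taken from the Neos schema.

```python
import hashlib

raw_digest = hashlib.sha1(b"Neos").digest()  # binary form
hex_digest = raw_digest.hex()                # character form (ASCII hex digits)

# The hex form converts back to the binary form without loss.
round_tripped = bytes.fromhex(hex_digest)

print(len(raw_digest), len(hex_digest), round_tripped == raw_digest)
```

The binary form needs half the space of the hex form, and either one has a fixed size.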

Unfortunately, I’m not very confident that PHP’s mbstring will support full case mapping soon. See bug report 70475 there …

Bye

Thanks for the follow-up on the PHP issue tracker, nice answer

Hi Christian,

thanks for taking the time to give us feedback. This is very much appreciated, and we rely especially on comments from people who are new to Neos, because they naturally approach it from a different angle than the core team does.

Since Dominique already replied to the specific topics, I’d only like to add one remark regarding the tone of your post. I perceived the way you wrote your feedback as a bit insulting. There may be different reasons for that. For example, for most of us English is not our native language, and therefore we often misinterpret the emotions we think we read between the lines.

But whether intended or not, I’d like to ask you to re-read your next post before sending it and check whether it is written in a way that suggests you would like to be helpful and understanding, or whether it could be received as insulting.

I don’t want to complicate things here, and I’ve been part of Open Source projects for too long to feel offended. But I’d like to have a good atmosphere here, because, remember, everybody helping to improve Neos is doing this as a volunteer.

Now, I hope you don’t feel insulted by my comment, and thanks again for pointing us to your Unicode issues.

Cheers,
Robert
