Breaking bugfixes

christianm · November 9, 2018, 12:22pm

Hey all,

we currently look at a curious case of a breaking bugfix and as the opinions about this seem to be widely different I would like to get a team decision on how to deal with this.

The example case below is just for reference purposes. Generally it’s about wrong behaviour that is found in already released version branches. As changing the behaviour might be breaking for people that rely on the wrong behaviour you can either introduce a “breaking” bugfix to get things in line or leave the broken behaviour in until the next release. Both have their pros and cons so obviously there might be some edge cases were we need to do the opposite of what we decide here. But a general direction for this would be neat as it might prevent awkward situations like this:

Bugfix merged to 3.3:

People decided to revert after figuring it might be breaking in some cases:

another bugfix was merged later on again to 3.3:

which now leads to another (potential) revert:

So what do we (generally) want to do with it:

Fix bug (despite potential breaking behaviour)
Keep bug (meaning people have to deal with unexpected behaviour)

0 voters

bwaidelich · November 9, 2018, 12:44pm

In general I agree to solve bugs as early as possible but IMO this is one of the topics that should not be solved with a generic rule but needs per case analysis.

As an example, allow me to describe my point of view for the case mentioned above:

To recap, it’s all about this bug

After I found a fix and applied it to the lowest affected branch, we realized that it is more breaking than I initially thought. That’s why we quickly decided to revert the change before it was released with the following reasoning:

I thought about this once again and came to the conclusion that it really doesn’t make sense to change the behavior in a patchlevel release. Even if it’s unlikely to break nobody will actually benefit. And if it breaks it won’t be in the form of an exception but by returning unexpected nodes which can be really nasty…

I still stand by that: To risk that a running website breaks due to a patch level update is the worst case scenario, especially if it is not in the form of an exception but via some maybe subtle content changes that might only be noticed long after the fact.
If we go for a general “we just ignore breaking behavior” kind of way, we risk peoples trust in patch level updates eventually leading to outdated installations.

Again, the opposite (never introduce breaking changes to patchlevel releases) is not what I suggest. As a German I like to have simple rules, but in this case I think it’s counter-productive.

daniellienert · November 9, 2018, 1:09pm

The options to vote are IMHO very misleading. I am absolutely in favor of fixing that bug, but I would do that with a minor release at least. I like to have a carefree patch level update possible (as I recommend doing patchlevel updates regularily, to everyone asking).
I am against introducing behavior changing changes like this in a patch level release.

christianm · November 9, 2018, 1:10pm

As said, we might change this decision on a case by case basis. But I had the feeling we have very different opinions in the team and you might end up in a funny round trip between people merging (breaking) bugfixes and people reverting them, both thinking they are on the right track

For me the example bug is very clearly wrong behaviour that is super unexpected for users and so far I see the potential breaking factor as pretty low. So IMHO a clear candidate for fix. The more obviously a given behaviour is unexpected the more I would be in favour of fixing it on the spot.

IF we wanted to keep bugs like this that means we need to:

document the existence in code and documentation very clearly
Make sure we have a way to prevent people needlessly adding bugfixes that we won’t merge (or even worse, revert)
Describe a workaround to archive the “correct” behaviour.

christianm · November 9, 2018, 1:15pm

I don’t necessarily agree with that but I would be fine to keep those bugs IF we find a way to follow the things I wrote above.

Besides having those bugs being annoying I feel it’s super hard otherwise for us and users to keep track if we don’t clearly communicate that.

Alternative might be a kidn of feature switch?

daniellienert · November 9, 2018, 1:18pm

Uhh If thats the alternative I go with a patch level fix.

aberl · November 9, 2018, 1:27pm

I’m with Bastian here. In most cases, this is a case-by-case decision in the end. But to have a baseline decision, I’m in favor of having reliable patch-level releases, like Daniel says. The reason is simple:

The user should always have the power of decision if he wants to invest time into adjusting his code base to new versions in order to gain bugfixes or new features.

But it should always be possible to just do patch-level updates without skimming through all release notes of all packages and dependent packages in order to decide, if a potentially breaking change inside is worth (or even possible) to be fixed! The patchlevel contract should be “Get access to bugfixes and security fixes that you don’t have to think about”.

If we just don’t apply the breaking fixes, but instead document them and move them to higher versions, this contract still holds: You apply a patchlevel release, nothing newly breaks. You then notice the (still unfixed) bug inside the package and look at the documentation/issues and see, that the fix exists, but in a higher minor (or major version). Then at that point you have the free decision to upgrade your version, or backport the fix yourself in adjust your code, or maybe just apply a workaround in your own codebase.

The base contract should be to empower the users of our products, not to dictate when they have to adjust their code that depends on ours.

kdambekalns · November 9, 2018, 1:35pm

So, this should have gone into the next minor, at least. We missed that, meaning the broken behaviour will stick with us for 3.3, 4.0 and 4.1. That makes it really bad, IMHO. If this would have been only for 3.3 and 4.0, different story…

Anyway, the PR that started this now contains documentation on the behviour and links to all related issues (as well as this very discussion). That should keep people from fixing this again and again. And in master it is fixed, so 4.2 will end this chapter…

aberl · November 9, 2018, 1:41pm

Alternative might be a kidn of feature switch?

That would not solve the core issue from my PoV, unless we only make the bugfix opt-in. I would only do this for really nasty bugs, that are in dire need of fixing, but can not be fixed b/c. For all other cases, the opt-in is to just do a minor/major upgrade and conciously take care of all possibly breaking changes.

IF we wanted to keep bugs like this that means we need to:

document the existence in code and documentation very clearly

Yes. Specifically, it should be documented in: The release notes of the version that does not contain the bugfix, the issue and PR that contains the bugfix, as well as the release notes of the version that contains the bugfix.

Make sure we have a way to prevent people needlessly adding bugfixes that we won’t merge (or even worse, revert)

This would be the review process. The assumption here is, that before a bugfix is applied the according issue is checked for duplication and hence the “will not fix for this version” decision be transported.

Describe a workaround to archive the “correct” behaviour.

Yes, if that is possible.

bwaidelich · November 9, 2018, 2:47pm

From the voting and the discussion it seems like it. But I think it’s mainly a misunderstanding (at least I hope so):

IMO it’s a very simple matter: Avoid breaking changes in patchlevel releases whenever possible (This excludes security relevant breaking changes ofc, but I think we never had one so far).

I think there is absolutely no question in that and it’s a shame that we implemented it incorrectly in the first place.
But it’s not like somebody would apply the patchlevel update and then go “aah, finally our website shows the right content”. Instead they might have been confronted with the issue and worked around this (which is quite easy in this case). And now the update might break the work around.

As @kdambekalns wrote: This should have been gone into the next minor. The PR was merged on the 5th of September, after the 4.1 release. So 4.2 is the next minor. Or am I mistaken?

kdambekalns · November 9, 2018, 3:40pm

It is the next minor. But still, leaves three broken minors behind…

christianm · November 11, 2018, 11:11am

I can totally see the argument for stability of patch level releases. It seems this is really a difficult issue, my argument was more about the length of time / amount of updates people have to do to get a bug fixed. I feel this kind of inconsistency and bug being around for a longer time also is bad for keeping people around. I guess we have two valid arguments here and it’s a matter of weighting them. I can totally follow people being against it.

Do we want to rephrase and restart the voting? I thought it’s clear but I agree it could be even clearer that it is about fixing it in bugfix releases.
Or just drop the voting, close the matter and decide case by case with an “impact analysis” (bug impact vs. break impact)?

kdambekalns · November 12, 2018, 9:13am

To me this one makes clear, that generally we agree that

fixing bugs in “breaking ways” should not happen in patch level releases. That means fixes whose underlying bugs were “non-obvious” in a way: no exceptions, it (mostly) seemed to work, maybe someone worked around the “weird” behaviour caused by them.
such fixes should come in the (next) minor instead.
the existence and the fixing of such bugs need to be documented accordingly.
exception to the rule: it’s a security issue, then even a breaking fix is fine.

For this particular bug it’s still a bit fuzzy, though:

the bug exists in 3.3, 4.0 and 4.1, we did a really bad job at fixing it fast
the behaviour is really broken, it’s just plain wrong.
(personally) I can’t see myself working around this in a way that would break now, with the proposed fix
but it’s in 3.3 LTS anyway, so maybe it’s ok to have it fixed in 4.2 only…

In the interest of no longer blocking the next patch level releases, I’d love to take a decision on this particular issue (https://github.com/neos/neos-development-collection/pull/2265).

We can then refine (my suggested) “general outcome” and put it somewhere for reference, if we feel like it.

bwaidelich · November 12, 2018, 10:35am

Personally I’d prefer not to have conversations like “For this kind of behavior you need version 4.1.8 of Neos”. But as you wrote: It’s probably very unlikely that it actually breaks things so I will no longer put my foot down.

But we should synchronize the two fixes, as currently we have:

public function isOfType($nodeType)
{
    if ($nodeType === $this->name) {
        return true;
    }
    $inheritanceChain = $this->buildInheritanceChain();
    return isset($inheritanceChain[$nodeType]);
}

(#2217)

vs

public function isOfType($nodeType)
{
    if ($nodeType === $this->name) {
        return true;
    }
    if (array_key_exists($nodeType, $this->declaredSuperTypes) && $this->declaredSuperTypes[$nodeType] === null) {
        return false;
    }
    foreach ($this->declaredSuperTypes as $superType) {
        if ($superType !== null && $superType->isOfType($nodeType) === true) {
            return true;
        }
    }
    return false;
}

(#2147)

The former rebuilds the inheritance chain each time it is called, e.g. multiple times in allowsChildNodeType()) which can be a real performance issue.

aberl · November 12, 2018, 4:40pm

Thanks for the recapture of the situation.

I can fully agree to the mentioned general agreement

For this particular bug it’s still a bit fuzzy, though:

the bug exists in 3.3, 4.0 and 4.1, we did a really bad job at fixing it fast

That’s a shame. But IMO that’s not a reason at all to push a breaky fix into a patch-level.

the behaviour is really broken , it’s just plain wrong.

That could be a reason to push a fix exceptionally.

But the main issue here, AFAIS, is, that the fix is already merged and the question is rather if it should be reverted now, right? So for me, the main indicator on how to proceed with the issue at hand is: How long, if at all, has the fix already been released? We shouldn’t revert a fix that has been around for longer than a couple days, otherwise we only make things worse in the end. If the fix has not yet been released, only merged, then I’d have to fully rely on

I can’t see myself working around this in a way that would break now

because I can not assess fully how (little) breaking this fix really is. I can only see, that it is breaking by changing return values of an API method.

So I retract from making a final decision on the questioned bugfix.

christianm · November 12, 2018, 5:56pm

I was about to write that there is no release, but alas Karsten did releases today or end of last week, so I am not so sure if it wasn’t in there. But according to

It seems not so.

I think the main problem for me is the fact that a bugfix for this was merged twice to 3.3 and now stood to be reverted the second time. Fixing an obviously wrong behaviour, for which I still don’t know the real breaking scenario. Which made this difficult for me to accept.

I think it’s fine to say “this behaves wrongly until Neos 3.3.x in which we finally fixed it.”

Anyway, it seems we found a solution for this incident. I guess we may just need to make sure that we don’t repeat this kind of round-robin merge/revert thingy.

mstruebing · November 12, 2018, 10:17pm

Ignoring the rest of the situation at hand now, reverting a bugfix which is breaky and fix a falsy behaviour in a future breaking bugfix release (which formerly fixed exactly this wrong behaviour) is even worse than any of the discussed scenarios. So, if it is released I think it should be dealed with now because every other scenario would be worse.

bwaidelich · November 13, 2018, 9:02am

It is not and the “falsey” fix was reverted.

So to avoid any more confusion:
Right now there is no release with the fixed behavior, but it has been fixed in master so it will be in the next minor release 4.2.

Now we could backport the original fix (#2147) to the 3.3 branch in order to include it into the next patchlevel releases (3.3.16, 4.0.10 & 4.1.8)

SoulCover · November 13, 2018, 10:01am

At first I was all for a patch level fix because I was pretty sure that there shouldn’t be any workaround breaking with the fix - and since the behaviour is “obviously” wrong, anyone aware of that wrong behaviour should have reported it instead of relying on it…

But your argument is very solid. Patch level releases should never be breaking because they should “just be installed” without much thought. Even fixes like this one will weaken the trust in patch level releases and will be preventing some users from applying them in the future.

For anyone wanting to backport or implement such fixes right away, I really can recommend using cweagans/composer-patches.

bwaidelich · November 13, 2018, 2:17pm

I talked to Christian again and we both agree that

It is fine not to backport the fix to a patchlevel release in this case
We’ll have to be conscious about similar breaking changes in the future to avoid confusion

If noone objects I would consider this case closed