Development workflow for GitHub

Until the move to GitHub we used Gerrit and it’s clearly defined review process for development. With the move to GitHub that has changed, but right now there is no clear decision yet as how we want to continue.

While the overall workflow shouldn’t be a problem (fork, hack, create pull request), there are some questions to answer. To do that, we’d like to draw from your combined knowledge and preferences.

Development workflow - what is clear so far

  • Changes against packages are done in forks.
  • For packages that are part of monolithic repositores (“development collections” as we call them), pull requests need to be done against those, not against the individual (read-only) package repositories.
  • Merges can be done by any member of the Neos organisation on GitHub (which should be everyone in the Neos team)
  • To be considered merge-ready, a change must pass the existing tests and have been reviewed by some people. Basically the rules we had when using Gerrit still apply, we want clean and working code.
  • As soon as something is merged, the read-only repositories are updated automatically (GitHub webhook to our Jenkins)
  • Commit messages should still follow the established rules we have now

Q: Branch maintenance

Here comes the first open question… How do we maintain branches?

Until now we used a Releases footer in commit messages to record where a change needs to be applied (e.g. Releases: master, 2.0, 2.1). We then cherry-picked those changes to all those branches as needed, either using the corresponding button in Gerrit or the command line (in case of conflicts).

This was easy to do, but has the drawback that for keeping an overview of the status of those changes, we need to rely on Gerrit (it has “included in” information for changes) and the “merged changes overviews” for Flow and Neos. And it isn’t really an established way to maintain branches in the Git world.

An alternative would be to create any bugfix or change that needs to go into an older branch for the “lowest” (oldest) maintained branch that is affected. As bugs are fixed in those branches, we can do (occasional) merges into the newer branches and a) be sure changes are merged and b) clearly see the state of affairs by looking at the Git tree.

Q: Pull request creation & handling

Depending on the decision for branch handling we know how to create pull requests (oldest branch first or master first). When creating a pull request, in the easiest scenario it is just one commit. In this case, write the commit message following the established rules, since it will become the message on the pull request (and the commit message of the resulting merge commit). If more than one commit are in one pull request, make sure to give the pull request a message that follows the established rules - that could be done by copying the message of the first commit, or by writing a proper one from scratch.

There is one more decision to take, though: how to merge pull requests.

If a PR is just a single commit, we can of course just merge via the GitHub UI. In case a pull request went through a more or less intensive review and has some commits to fix things, we have options to choose from:

  • One is to use the GitHub UI as well and keep that history (if nothing else, it can serve as an example of how we work and what makes a good pull request).
  • Another option is to squash during merge. This keeps history more linear and can “hide stupid things that had to be fixed”, but might be more work.
  • The third option is to amend as we used to do with Gerrit and force push the branch the PR is created for. Testing has shown this works fine with comments on pull requests etc. Not the “pure form” of working with git, but might be some “middle ground”.

All that being said: I am looking forward to your input!

1 Like

As said in chat earlier I am strongly against squashing as it will diminish work done by contributors.
I would rather keep intermediate commits and focus only on the merge commit message for the change log. Everything in between SHOULD have a nice commit message but doesn’t necessarily need one.

1 Like

Did you consider usage of labels and milestones?

Personally I’m pro a clean history and high quality commits over tons of crappy commit messages (which is the what most people do). So I’d suggest using rebase & squash to improve commits before merging, however squash should only be used when it makes sense and that commit has exactly the same releases footer.

Not sure how this effects the unreliable Github statistics, but I think the commit log is more important than a those Github stats. Does setting a different commit author work for Github stats? Also it’s possible for people to improve their commit message and update the PR if they care about those statistics.

I know this adds a barrier for committing, so we should combine it with being more forgiving about commit messages as @christianm suggests, but there is a limit IMO.

I doubt a merge from lower is possible, since there are commits that will only be targeted for those branches and shouldn’t be forward ported. So I guess cherry-picking is the only way.

1 Like

Labels and milestones? So far I haven’t thought about them, since we’ll stick to JIRA for issues so… Any input welcome!

I’m at a loss what milestones are about, but would love to use “releases” on GitHub. A test I created is - but it feels weird (for me) to have those releases on the collection, whereas the individual packages have no “real” releases. And just using them for binaries in the “neos-project” repository doesn’t work, since a release must correspond to a tag.

That brings me to which might be a useful tool. It relies on format being used and helps preparing release notes for releases on GH. I am a beta user, so we could try it out (if I find the beta login page again…)

I added a third option to my post, which is force pushing to pull requests. That can be used to fix tiny and/or stupid things and would be a mixture between “as is” and “squash”. Might help to achieve the goal of making it easy but still have a nice git log. To which, for me, the merge messages are key.

Of course I’d like to not see “single-line, two word” commit messages all over the place. I’d ask people to describe wat they fixed in a “patch set commit” the same way I expected them to comment on a new patch set (instead of just expecting the reviewers to check the diff).

How do we go about deciding this then? Doodle?

I can only say that I am STRONGLY against everything modifying the history (squash or force push).
Gerrit did it that way and it was fine but now that we are not bound to do that I want my history to be as original as possible. Everyone doing a pull request can freely decide to squash if they wanted to do that and then reissue a pull request. I think that’s fine but I wouldn’t make that a required part of our workflow.
This is the chance to actually see that 3 people worked on a bigger change and who to ask for a specific line if needed.
Also for commit messages, as stated earlier there should be a soft rule about good commit messages but enforcement only on merge commits. Makes contributions so much easier IMHO. Yes by squash/force push we could still have it easy for contributors and make everything nice and shiny afterwards but it doesn’t reflect the actual change.
Sorry but when new contributor X chnaged something in the docs I want to know that it was them (for good and bad) instead of eg. Aske who maybe fixed the commit message and squashed 5 documentation changes into one…

1 Like

I agree with @christianm here, but do like clean history too… I’ve been working with a PR approach over the last months using bitbucket, so I think it’s safe to assume the following:

  • if a PR consists of multiple commit messages done by 1 committer it’s fine to squash (no contributer data lost)
  • if the merger does some minor changes to get the change merged it’s fine to squash too (the oldest committer will stay the author, the commit of the merger is ‘lost’ for contributor data, but who cares :stuck_out_tongue: )
  • if a PR contains heavy work from multiple people a commit can be squashed into multiple commits grouping the changes of the different contributors so we don’t loose contribution data and still have a sane history

Reason to care about the contribution data is not just stats by the way. I find that pretty disrespectful to the contributor… It’s also about copyright, appreciation and respect :slight_smile:

(edit: in the above I don’t mention I like a meaningful history, so all possible rewriting should be functional and clear!)

Regarding the branches: can we use the target version’s of jira maybe? If I got it right the atlassian toolset has support to see in what branches a certain change is merged…

What @radmiraal suggests would be fine for me :smile:

Quick in-between summary: We seem to agree on the importance of a meaningful history and on the fact that through taking care of the merge commit messages we can achieve that. What happens “in between” can be handled with common sense. Awesome! :smile:

Now, anyone up for actually trying out some cherry-pick vs. merge operations on some throwaway fork?

I think cherry-pick has preference for back-porting reasons, so if we merge a PR into master and start backporting it should be done by cherry-pick.

If PR’s are merged or cherry-picked is equal to me, just don’t know what github does with the PR so that indeed needs to be tested. I noticed that bitbucket and stash both have a pretty bad experience for PR’s and manual merges of the changes… They do not detect the merge then, leaving the PR empty but open… And then the only way to close the PR is by pressing “Decline” which I completely dislike because of the negative tone to the contributor :wink:

One question I’d like to ask is how to collaborate on PRs, since that happens quite often. Do I apply the PR on my fork, add additional commits and push a new PR or can you update an existing PR from someone else?

It would be pretty cumbersome if that would be difficult to do and keep track of.

I would like to forward merge and think that stuff only merged to a specific branch is rather rare. Usually only if something was forgotten or specific conditions apply because it’s an older branch, but then we can solve this during the merge up. I guess we first need a try on how easy it would be to merge the current state so that this workflow becomes available…

There are circumstances in which this is possible. I think for example if yuo have (write) access to the original repo you can push to a PR branch of someone else too.

I think we again run into a common sense issue here :wink: If a group of people start to work on a task together they should decide to move the branch to 1 central place where all contributors working on the change have access, or just give the right permissions… Can imagine that we have some temporary branches in the main repo for such tasks if they’re really big, but preferably people just find a way to work together on a fork.

If someone already made a PR you can (afaik) not update it unless you have write access to the original branch. So if you take over a PR someone else made you’ve 2 choices (I think):

  • make a PR to the branch of the fork of the PR creator
  • fork the PR creators repo and create a new PR from your fork to master (but that will make the original PR obsolete)
1 Like

For reference, here are the guidelines by the Symfony Project for how to submit a patch:

A brief overview of the core team and pull request procedure is outlined here:

And in general, I do like the structure of the overall “Contributing” section, including the section about contributing code.

Maybe a good inspiration for our own Contributing chapter?

Yes, besides that their contributing guideline is much too long and contains too many rules. It’s off putting.

Just a quick info here regarding working together on some change. if i remember correctly github itself doesn’t really do much forking in itself.
almost all stuff is done in custom feature branches in the original repository itself.

the pull-requests are then done “repository internal” between the feature branch and master. That would have the benefit, that any core member can jump in and participate in tasks of another core member. This wouldn’t solve this issue for non repository members of course.

Yes, that would be an option, but you need to rely on everyone not pushing to master (by accident).

GitLab with the approach of locked branches that require to have PRs to be changed solves that nicely, but on GH you have direct push access to all branches, if you have access at all.

1 Like

yea, that would be an issue of course. although i could imagine using a github post-receive webhook or something to “revert” such commits and send an e-mail with bleeding red letters to whoever tried to push to master ^^