Best Practice: Broken Link Checker

Hi :slight_smile:,

I have to build a backend module that gives editors an overview of all broken links (404, 500, etc.) across the whole site.

I currently see two possible approaches to achieve this:

  1. Use a crawler of some sort (like https://github.com/spatie/crawler) to crawl the whole site for dead links.
    The problem I have with this approach is that the crawler takes quite some time to finish, because it has to wait for every response to find out whether there are more links to crawl.

  2. Build up a database table with all links on the site and request them one by one with curl. I kind of like the idea, but it is probably difficult to keep the table up to date.
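
A minimal sketch of the second approach, assuming the links have already been collected into a list (for example, read from such a table). It checks them in parallel rather than one by one, which also eases the waiting problem from the first approach. The names `fetch_status` and `find_broken_links` are hypothetical, and Python is used only for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
import http.client
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    # HEAD avoids downloading the body; some servers reject it, so a
    # fallback to GET may be needed in practice.
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses are raised as exceptions
    except (OSError, ValueError, http.client.HTTPException):
        return None  # DNS failure, refused connection, timeout, malformed URL


def find_broken_links(urls, fetch=fetch_status, max_workers=8):
    """Check all urls in parallel and return {url: status} for broken ones."""
    unique_urls = list(dict.fromkeys(urls))  # de-duplicate, keep order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = list(pool.map(fetch, unique_urls))
    return {
        url: status
        for url, status in zip(unique_urls, statuses)
        if status is None or status >= 400
    }
```

The `fetch` parameter is injectable so the checker can be tried out without network access. `max_workers` is an arbitrary starting value and should be kept low enough not to hammer the target servers.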

Are there any other approaches?
Which one would you recommend?

Thank you :blush:
Torben

Hi,

You could query the content repository for all nodes that have node URIs or external URLs in certain properties. Afterwards you can check the node URIs simply by looking up whether a node with the given ID exists and is not hidden. The external URLs have to be crawled, of course.

That should be much faster and would be an interesting community package.
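
A rough sketch of that idea, under loud assumptions: the nodes are modelled as a plain in-memory dictionary instead of a real content repository query, internal links are assumed to look like `node://<id>`, and `collect_links` is a hypothetical helper. Python is used only for illustration:

```python
import re

# Assumed link formats inside property values.
NODE_URI = re.compile(r"node://([A-Za-z0-9-]+)")
EXTERNAL_URL = re.compile(r"https?://[^\s\"'<>]+")


def collect_links(nodes):
    """Split links found in node properties into two groups.

    nodes maps a node ID to {"hidden": bool, "properties": {name: value}}.
    Returns (broken_internal, external): broken_internal is a list of
    (source_id, target_id) pairs whose target is missing or hidden;
    external is a sorted list of URLs that still have to be requested.
    """
    broken_internal = []
    external = set()
    for node_id, node in nodes.items():
        for value in node["properties"].values():
            if not isinstance(value, str):
                continue  # only string properties can contain links here
            for target_id in NODE_URI.findall(value):
                target = nodes.get(target_id)
                if target is None or target["hidden"]:
                    broken_internal.append((node_id, target_id))
            external.update(EXTERNAL_URL.findall(value))
    return broken_internal, sorted(external)
```

Only the external list needs HTTP requests. The internal check is a lookup per link, which is where the speed-up would come from.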

Best,
Seb

I think, @markusguenther has done something like this…

Yes indeed. We assumed something like this already existed, but it didn't, and I didn't have much time to build one, so I went the spatie way to save time.

I always had in mind to polish the package and put more effort into it, but time was always limited.

The package is currently private, but I can publish it for you.

I had plans to rewrite the backend module with React and add some features.
It would be nice to join forces.

I thought about something like this too. I am a bit worried, though, that this would run into the same issues as the asset usage counter in the media browser (scaling problems when calculating asset usage).
What do you think?

That is very kind! :slight_smile: Thank you very much :heart:
This is definitely a step in the right direction.

Would be nice indeed! Would you create a new package, or may I open pull requests against your package?
I also have time to work on this at my workplace, so if you have anything particular in mind, let me know :wink:

I don’t know if I will find time to write something down this week, but feel free to create pull requests in the Unikka package. If you want, I can also invite you as a collaborator.

We could also have a short chat on Slack or whatever tool you want :slight_smile:

Disclaimer
The package was created under some time pressure, so it is not a bulletproof solution right now.
But maybe it is something we can start with.