Automating Dead Link Detection
A dead link (or broken link) is a hyperlink that points to a web page that has been removed or no longer exists. Beyond mere inconvenience, dead links degrade the user experience, hurt your website’s SEO, and can even create security risks. For instance:
- Phishing Risks: When a domain expires, anyone can register it, including malicious actors who may set up fake pages to capture sensitive information from users expecting the legitimate site.
- Malware Distribution: If a linked resource is taken down, a similarly named malicious site may take its place, potentially leading unsuspecting visitors to download malware.
- Loss of Trust: Frequently running into dead links erodes users’ trust in your content, because it suggests neglect or outdated information, and attackers can exploit that to lure users toward harmful alternatives.
Therefore, maintaining link integrity not only preserves your site’s professionalism and usability but also plays a critical role in safeguarding your visitors from potential security threats.
What is Deadfinder?
Deadfinder is a versatile tool that helps webmasters and bloggers keep their sites intact by finding dead links. I originally built it to manage my own website.
Deadfinder also supports GitHub Actions, so you can run it with a workflow step like the following:
```yaml
steps:
  - name: Run DeadFinder
    uses: hahwul/deadfinder@1.5.0
    # or uses: hahwul/deadfinder@latest
    id: broken-link
    with:
      command: sitemap # url / file / sitemap
      target: https://www.hahwul.com/sitemap.xml
      # timeout: 10
      # concurrency: 50
      # silent: false
      # headers: "X-API-Key: 123444"
      # worker_headers: "User-Agent: Deadfinder Bot"
      # include30x: false
      # user_agent: "Apple"
      # proxy: "http://localhost:8070"
  - name: Output Handling
    env:
      # Pass the JSON result through an environment variable instead of
      # inlining it into the script, so quotes in URLs cannot break the shell.
      OUTPUT: ${{ steps.broken-link.outputs.output }}
    run: echo "$OUTPUT"
```
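The `command` comment above lists three modes. As a sketch, assuming the `url` mode takes a single page URL as `target` (the page below is a placeholder) and that the commented-out options accept the example values shown there, checking one page instead of a whole sitemap might look like this:

```yaml
steps:
  - name: Check a single page
    uses: hahwul/deadfinder@1.5.0
    id: broken-link
    with:
      command: url
      # Placeholder page; replace it with the page you want to check
      target: https://www.hahwul.com/
      # Example values copied from the commented options above
      timeout: 10
      concurrency: 50
```

The sitemap mode remains the better fit for a whole site; a single-URL check is handy for a page you have just published.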
Integrating Deadfinder with GitHub Actions
GitHub Actions is GitHub’s CI/CD (Continuous Integration/Continuous Deployment) platform, which lets you automate software development workflows directly in your repository. Here’s how you can use it to automate dead link detection with Deadfinder:
```yaml
---
name: DeadLink
on:
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Find Broken Link
        uses: hahwul/deadfinder@1.5.0
        id: broken-link
        with:
          command: sitemap
          target: https://hahwul.com/sitemap.xml
      - name: Create Markdown Table from JSON
        id: create-markdown-table
        env:
          # JSON result, shaped as { "<target>": ["<dead link>", ...] }
          # (inferred from the jq filter below)
          OUTPUT: ${{ steps.broken-link.outputs.output }}
        run: |
          echo "## DeadLink Report" > deadlink_report.md
          echo "" >> deadlink_report.md
          echo "| Target URL | Deadlink |" >> deadlink_report.md
          echo "|------------|------------|" >> deadlink_report.md
          # Emit one table row per (target, dead link) pair
          echo "$OUTPUT" | jq -r 'to_entries[] | .key as $k | .value[] | "| \($k) | \(.) |"' >> deadlink_report.md
      - name: Read Markdown Table from File
        id: read-markdown-table
        run: |
          table_content=$(cat deadlink_report.md)
          # Multiline values need the NAME<<DELIMITER syntax in $GITHUB_ENV
          echo "TABLE_CONTENT<<EOF" >> "$GITHUB_ENV"
          echo "$table_content" >> "$GITHUB_ENV"
          echo "EOF" >> "$GITHUB_ENV"
      - name: Create an issue
        uses: dacbd/create-issue-action@main
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          title: DeadLink Issue
          body: ${{ env.TABLE_CONTENT }}
```
This workflow runs Deadfinder against the sitemap, converts its JSON result into a Markdown table with jq, and posts that table as a GitHub issue so the dead links are easy to see and track. I use it on the site you’re reading now; since there are many articles, I run it periodically and fix the dead links it reports.
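The DeadLink workflow above only declares a `workflow_dispatch` trigger, so each run is started by hand. If you want the periodic runs to happen automatically, GitHub Actions’ `schedule` event adds a cron-based trigger. Separately, in repositories where the default `GITHUB_TOKEN` is read-only, the issue step may be denied; an explicit job-level `permissions` block is one way to grant it. A sketch of both changes to the top of the workflow, assuming the steps stay as they are; the weekly cron value is an arbitrary example, not the schedule used on this site:

```yaml
name: DeadLink
on:
  workflow_dispatch:        # keep manual runs available
  schedule:
    # Example cadence: every Monday at 00:00 (cron times are in UTC)
    - cron: '0 0 * * 1'
jobs:
  build:
    runs-on: ubuntu-latest
    # Let this job's GITHUB_TOKEN open issues; scopes not listed get no access
    permissions:
      issues: write
    steps:
      # ... same steps as in the workflow above ...
```

Scheduled runs use the workflow file on the repository’s default branch, so the change only takes effect once it is merged there.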
GitHub workflow history
Dead links found and reported!
Conclusion
With this straightforward setup, you can easily improve both the quality and the security of your site. Give it a try on your own site :)