Gitea is a magnet for search engines: once they find an instance, they happily follow every link on the site, of which there are many, resulting in never-ending indexer bot traffic. Among the links followed are UI buttons (star a page, sort by XYZ, select a UI language...), as well as pages that are expensive to render but provide little value once indexed (blame, compare, commit, ...).
Ideally, these would not be (attempted to be) indexed.
I tried to accomplish this on my site via a robots.txt along the following lines, but was not exactly successful, probably because many bots don't understand the wildcard syntax:
A better approach would be to render most links with the rel="nofollow" attribute. I'd argue this could be applied to all links, except for links to:
the landing page
user / org
repo
issue(s) / PR(s) / release(s) / wiki / you get the idea
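To make the proposal above concrete, here is a minimal sketch in Go of how a link's path could be classified. It is not Gitea's code: the function name relFor, the indexableSections table, and the assumed URL layout (/{owner}, /{owner}/{repo}, /{owner}/{repo}/issues and so on) are all hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// indexableSections is a hypothetical allow-list of repo sub-sections
// that are worth indexing, following the exceptions listed above.
var indexableSections = map[string]bool{
	"issues": true, "pulls": true, "releases": true, "wiki": true,
}

// relFor returns the value to use for a link's rel attribute:
// "" for pages worth indexing, "nofollow" for everything else.
// Assumption: paths look like /{owner}/{repo}/{section}/...
func relFor(path string) string {
	// Treat any link carrying a query string (sort order, UI language,
	// filters) as a UI action rather than a page.
	if strings.Contains(path, "?") {
		return "nofollow"
	}
	// Split into non-empty path segments; "/" yields an empty slice.
	segs := strings.FieldsFunc(path, func(r rune) bool { return r == '/' })
	switch {
	case len(segs) <= 2:
		// Landing page, user/org page, or repo page.
		return ""
	case indexableSections[segs[2]]:
		// Issues, pull requests, releases, wiki.
		return ""
	default:
		// Blame, compare, commit, and any other deep link.
		return "nofollow"
	}
}

func main() {
	paths := []string{
		"/",
		"/someuser",
		"/someuser/somerepo",
		"/someuser/somerepo/issues/12",
		"/someuser/somerepo/blame/main/README.md",
		"/someuser/somerepo/issues?sort=oldest",
	}
	for _, p := range paths {
		fmt.Printf("%-45s rel=%q\n", p, relFor(p))
	}
}
```

The sketch is deliberately coarse: any path of one or two segments is treated as a user or repo page, so site-level pages would need their own entries in a real implementation.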
noerw changed the title on Oct 17, 2021
from: Mark most UI links & buttons as rel="nofollow" to avoid search engine
to: Mark most UI links & buttons as rel="nofollow" to avoid constant bot traffic