Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
At the moment, ItemLoader(response=response) fails if response is not a TextResponse instance.
Passing a binary response can still be useful, though. For example, to allow processors to access the response from their loader context, and hence be able to report the source URL (response.url) when reporting input issues.
Unless I missed something, the documentation doesn't explain how to query document metadata (searching "site:montferret.dev metadata" through Google returned nothing, neither did grepping the source code).
As an example, I tried to query the og:url metadata.
I tried variations of //meta[property='og:url']::attr(content), with or without the leading //, and with or without the `attr(conte
At the moment,
ItemLoader(response=response)
fails ifresponse
is not aTextResponse
instance.Passing a binary response can still be useful, though. For example, to allow processors to access the response from their loader context, and hence be able to report the source URL (
response.url
) when reporting input issues.