Observed behavior
The DOWNLOAD_SESSION used to download resources sets no explicit User-Agent header. This is a problem when downloading from Wikimedia sites, for example, because of their User-Agent policy: https://meta.wikimedia.org/wiki/User-Agent_policy
Expected behavior
Ideally, we would send the kind of User-Agent that the Wikimedia policy spells out. We already retrieve from Studio the email of the user whose API token we are running with, so we should reuse it to set the header.
With that in place, we would then do the following for the User Agent:
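A minimal sketch of how such a header could be built and applied, assuming the `<client name>/<version> (<contact information>) <library>/<version>` layout described in the Wikimedia policy. The helper names `build_user_agent` and `apply_user_agent`, the `0.0.0` default version, and the stand-in session object are hypothetical and used only for illustration; they are not existing ricecooker APIs.

```python
from types import SimpleNamespace


def build_user_agent(contact_email, client_name="ricecooker",
                     client_version="0.0.0", library=None):
    """Build a User-Agent string that identifies the client and a contact.

    All parameter names and defaults here are illustrative assumptions.
    """
    email = contact_email.strip()
    if not email:
        # The policy asks for contact information, so refuse to build
        # a header without it.
        raise ValueError("a contact email is required")
    user_agent = f"{client_name}/{client_version} ({email})"
    if library:
        # Optional trailing "<library>/<version>" component.
        user_agent += f" {library}"
    return user_agent


def apply_user_agent(session, user_agent):
    """Set the header on any session-like object exposing a `headers` mapping."""
    session.headers["User-Agent"] = user_agent
    return session


# Stand-in for the real download session, used so the sketch runs offline.
fake_session = SimpleNamespace(headers={})
apply_user_agent(
    fake_session,
    build_user_agent("user@example.org", client_version="1.2.3",
                     library="python-requests/2.31.0"),
)
```

In practice the session passed to `apply_user_agent` would be DOWNLOAD_SESSION, and the email would be the one already retrieved from Studio for the current API token.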
User-facing consequences
Requests sent without this header may be treated as malicious by the remote site.
Steps to reproduce
Attempt to download any file from a Wikimedia site.
Context
Ricecooker develop branch