No explicit header set for the DOWNLOAD_SESSION #483

Open
rtibbles opened this issue Mar 13, 2024 · 3 comments

Comments

@rtibbles
Member

rtibbles commented Mar 13, 2024

Observed behavior

The DOWNLOAD_SESSION that is used to download resources sets no explicit User-Agent header. This is a problem, for example, when downloading from Wikimedia sites, because of their User-Agent policy: https://meta.wikimedia.org/wiki/User-Agent_policy

Expected behavior

Ideally, we would follow the User-Agent format that the Wikimedia policy spells out. We already retrieve from Studio the email of the user whose API token we are running with, so we should reuse it to set the header.

With that in place, we would then set the User-Agent to the following:

f"Ricecooker/{ricecooker.__version__} bot ({user_email})"
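As a rough illustration of the proposal above, here is a minimal sketch of how the header could be applied. It assumes DOWNLOAD_SESSION behaves like a `requests.Session`, i.e. it exposes a dict-like `headers` attribute that is sent with every request. The helper names `build_user_agent` and `set_user_agent` are hypothetical and not part of Ricecooker; the version and email would come from `ricecooker.__version__` and the Studio user lookup respectively.

```python
def build_user_agent(version, user_email):
    """Format a User-Agent string in the form proposed in this issue."""
    return f"Ricecooker/{version} bot ({user_email})"


def set_user_agent(session, version, user_email):
    """Set the User-Agent on a session-like object (hypothetical helper).

    Assumption: `session` has a dict-like `headers` attribute, as
    `requests.Session` does, so the value is sent with every request.
    """
    session.headers["User-Agent"] = build_user_agent(version, user_email)
    return session
```

Under those assumptions, this would be called once, after the user's email has been retrieved from Studio, with DOWNLOAD_SESSION as the `session` argument.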

User-facing consequences

Attempts to scrape without setting this header may be treated as malicious by the target site.

Steps to reproduce

Attempt to download any file from a Wikimedia site.

Context

Ricecooker develop branch

@nikkuAg

nikkuAg commented Mar 26, 2024

Hey, is this issue still open? I would like to work on this

@rtibbles
Member Author

Absolutely @nikkuAg - I will assign you, thanks for volunteering!

@MisRob
Member

MisRob commented Jun 11, 2024

Hi @nikkuAg, are you still planning to work on this?
