Python Website "downloads page" returns binary data #2411

Open
maltfield opened this issue Mar 16, 2024 · 4 comments
Labels
bug This is a bug! help-wanted The maintainers would welcome help with this issue

Comments

@maltfield

maltfield commented Mar 16, 2024

Describe the bug

When fetching the downloads page with curl or wget, the web server returns binary data instead of HTML.

To Reproduce

Run either of the following commands on Debian Linux:

curl --location 'https://www.python.org/downloads/'
wget 'https://www.python.org/downloads/'

Example execution:

user@disp897:/tmp/tmp.aQ3uHh4PqB$ curl --location 'https://www.python.org/downloads'
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
user@disp897:/tmp/tmp.aQ3uHh4PqB$

user@disp897:/tmp/tmp.aQ3uHh4PqB$ wget 'https://www.python.org/downloads/'
--2024-03-15 19:17:59--  https://www.python.org/downloads/
Resolving www.python.org (www.python.org)... 199.232.16.223, 2a04:4e42:41::223
Connecting to www.python.org (www.python.org)|199.232.16.223|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19113 (19K) [text/html]
Saving to: ‘index.html’

index.html          100%[===================>]  18.67K  --.-KB/s    in 0.05s   

2024-03-15 19:18:00 (384 KB/s) - ‘index.html’ saved [19113/19113]

user@disp897:/tmp/tmp.aQ3uHh4PqB$ 

user@disp897:/tmp/tmp.aQ3uHh4PqB$ head -c256 index.html 
�}�r�F����*�CS�5����|�,;�؎'r���M�@$a����o���������'���ƥ�$(R�@�rD��s���ލ�?[�^/m6
                                                                               ����t&l��g���1vD��97���z��s�.�;v_|
                                    �ǰƯ��?m�r&������e=pۓp-�����]���J��u�߭�r��L��h�567��q�vk�r���<�^�\y����mX����:{�yӹ�Bc�O��1x�user@disp897:/tmp/tmp.aQ3uHh4PqB$ 

Expected behavior
The python.org web server(s) should return HTML.
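
A minimal sketch for checking whether a saved response such as index.html is actually compressed HTML rather than arbitrary binary data. It assumes the payload is a gzip stream, which the output above does not confirm (no Content-Encoding header is shown). The helper names `looks_like_gzip` and `to_text` are hypothetical and only illustrate the check.

```python
import gzip

# Every gzip stream begins with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the bytes start with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def to_text(data: bytes, encoding: str = "utf-8") -> str:
    """Decompress the data if it looks like gzip, then decode it to text.

    Assumption: the page is UTF-8; undecodable bytes are replaced.
    """
    if looks_like_gzip(data):
        data = gzip.decompress(data)
    return data.decode(encoding, errors="replace")
```

Reading the saved file in binary mode (`open("index.html", "rb").read()`) and passing the bytes to `to_text` would show whether the "binary data" is simply gzip-compressed HTML.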

@maltfield
Author

maltfield commented Mar 16, 2024

As a workaround, adding the --compressed argument to curl fetches the HTML as desired:

user@disp897:/tmp/tmp.aQ3uHh4PqB$ curl --location --compressed 'https://www.python.org/downloads/' | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0<!doctype html>
<!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->
<!--[if IE 7]>      <html class="no-js ie7 lt-ie8 lt-ie9">          <![endif]-->
<!--[if IE 8]>      <html class="no-js ie8 lt-ie9">                 <![endif]-->
<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr">  <!--<![endif]-->

<head>
    <!-- Google tag (gtag.js) -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-TF35YF9CVH"></script>
    <script>
 41 19113   41  8007    0     0   3748      0  0:00:05  0:00:02  0:00:03  3748
curl: (23) Failure writing output to destination
user@disp897:/tmp/tmp.aQ3uHh4PqB$ 

Setting --compression=gzip in wget is also a workaround:

user@disp897:/tmp/tmp.aQ3uHh4PqB$ wget --compression=gzip 'https://www.python.org/downloads/'
--2024-03-15 19:22:27--  https://www.python.org/downloads/
Resolving www.python.org (www.python.org)... 199.232.16.223, 2a04:4e42:41::223
Connecting to www.python.org (www.python.org)|199.232.16.223|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19113 (19K) [text/html]
Saving to: ‘index.html’

index.html          100%[===================>]  18.67K  80.3KB/s    in 0.2s    

2024-03-15 19:22:29 (80.3 KB/s) - ‘index.html’ saved [174854]

user@disp897:/tmp/tmp.aQ3uHh4PqB$ 

user@disp897:/tmp/tmp.aQ3uHh4PqB$ head index.html 
<!doctype html>
<!--[if lt IE 7]>   <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9">   <![endif]-->
<!--[if IE 7]>      <html class="no-js ie7 lt-ie8 lt-ie9">          <![endif]-->
<!--[if IE 8]>      <html class="no-js ie8 lt-ie9">                 <![endif]-->
<!--[if gt IE 8]><!--><html class="no-js" lang="en" dir="ltr">  <!--<![endif]-->

<head>
    <!-- Google tag (gtag.js) -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-TF35YF9CVH"></script>
    <script>
user@disp897:/tmp/tmp.aQ3uHh4PqB$ 
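
For a script rather than a shell one-liner, the same idea as the two workarounds above can be sketched in Python: advertise gzip support in the request, then decode the body according to the Content-Encoding response header. This is a sketch under assumptions, not a confirmed fix. It assumes the server labels compressed responses with `Content-Encoding: gzip`, which this thread does not show, and the names `decode_body` and `fetch_html` are hypothetical.

```python
import gzip
import urllib.request
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo gzip content coding; return any other body unchanged."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page, explicitly accepting gzip, and return decoded text."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    # urllib follows redirects but does not decompress responses itself.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
        content_encoding = response.headers.get("Content-Encoding")
        charset = response.headers.get_content_charset() or "utf-8"
    return decode_body(body, content_encoding).decode(charset, errors="replace")
```

Calling `fetch_html("https://www.python.org/downloads/")` would then be expected to return the page markup, provided the assumption about the response header holds.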

@hugovk
Member

hugovk commented Sep 5, 2024

Are you trying to scrape the page? What info exactly are you after? Perhaps there's a better place to fetch that from.

For example, you can also find downloads at https://www.python.org/ftp/python/

@maltfield
Author

I was programmatically downloading the GPG keys listed on that page for 3TOFU, yes.

Why shouldn't this bug be fixed?

@hugovk
Member

hugovk commented Sep 7, 2024

I'm not saying it shouldn't be fixed, but I can't say when or if that will happen.

Anyway, I think that is the only source for GPG keys, so it's good you have a workaround for now.

@JacobCoffee JacobCoffee added bug This is a bug! help-wanted The maintainers would welcome help with this issue labels Sep 13, 2024