Proposal: Remove R and Python installations from the base product images and use docker ONBUILD instructions #791

Open
dbkegley opened this issue Jun 3, 2024 · 0 comments

dbkegley commented Jun 3, 2024

Our product images currently support a matrix of Python and R versions. This matrix is also reflected in our Docker image tags, which has a number of shortcomings:

  1. The matrix of possible installation versions (and thus the number of Docker image tags) is huge.
  2. Quarto (and now TensorFlow) versions aren't represented in the Docker image tagging strategy, which is confusing and inconsistent.
  3. Changing the default R/Python installations breaks customer environments during upgrades. This is especially problematic for Python: Connect does not discover the available Python installations at startup, so they must be enumerated in the default config file, which means that a Connect version is unnecessarily coupled to one or more specific versions of Python.

I'd like to propose that we stop installing R and Python in our default product images. Instead, we should build slim base product images without any Python/R/Quarto/TensorFlow installations. These base product images should include ONBUILD instructions that install R/Python/Quarto/TensorFlow when a derivative image is built, using the installer scripts that are baked into the base images. For example:

FROM ubuntu:22.04

...

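### Install R versions ###
# Declare the build args in the derivative build so that --build-arg values
# reach the trigger below. SCRIPTS_DIR is assumed to be set with ENV in the
# elided part of this Dockerfile.
ONBUILD ARG R_VERSION
ONBUILD ARG R_VERSION_ALT
ONBUILD RUN R_VERSION=${R_VERSION} ${SCRIPTS_DIR}/install_r.sh \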
    && R_VERSION=${R_VERSION_ALT} ${SCRIPTS_DIR}/install_r.sh \
    && ln -s /opt/R/${R_VERSION} /opt/R/default \
    && ln -s /opt/R/default/bin/R /usr/local/bin/R \
    && ln -s /opt/R/default/bin/Rscript /usr/local/bin/Rscript
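
Because ONBUILD triggers are stored in the image metadata rather than executed when the base image is built, it can be useful to confirm that a published base image actually carries them. A minimal sketch, assuming a base image built as proposed above; the function name is illustrative, while `docker image inspect` and the `.Config.OnBuild` field are standard Docker:

```shell
#!/bin/sh
# Sketch: print the ONBUILD triggers recorded in an image's metadata.
# The function name is hypothetical; the image reference is passed as $1.
set -eu

onbuild_triggers() {
  # .Config.OnBuild holds the deferred instructions as a JSON array
  # (null when the image defines no triggers).
  docker image inspect --format '{{json .Config.OnBuild}}' "$1"
}
```

For example, `onbuild_triggers ghcr.io/rstudio/rstudio-connect:jammy-2024.05.0` would list the deferred install instructions once the image has been pulled, provided the image was built with them.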

Our base images are then published without R/Python version tagging, giving us a much simpler tagging/release strategy:

ghcr.io/rstudio/rstudio-connect:jammy-2024.05.0

This image tagging complexity is really only relevant for the connect-content images, but the ONBUILD strategy can be adopted there as well.

The obvious downside of this approach is that customers who want to use our images out of the box would be required to build a derivative image using our image as a base. In the simplest case, the Dockerfile is a single line:

FROM ghcr.io/rstudio/rstudio-connect:jammy-2024.05.0

and the versions are supplied at build time:

docker build . --build-arg R_VERSION=4.4.0 -t myorg/rstudio-connect:jammy-2024.05.0-R4.4.0
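
A customer who needs several pinned R versions could wrap that build command in a small script. The following is only a sketch under stated assumptions: the `myorg` name, the `DRY_RUN` switch, and the function names are hypothetical, and it assumes the base image declares `R_VERSION` through `ONBUILD ARG` so that the `--build-arg` value reaches the trigger.

```shell
#!/bin/sh
# Sketch: build one derivative image per pinned R version.
# Assumes a one-line Dockerfile (FROM <base image>) in the current directory.
set -eu

BASE_TAG="jammy-2024.05.0"   # product release tag from the proposal
ORG="myorg"                  # hypothetical registry namespace
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the commands (hypothetical switch)

# Compose the derivative tag: <base tag>-R<R version>.
derived_tag() {
  printf '%s-R%s\n' "$1" "$2"
}

# Build (or, in dry-run mode, just print) the image for one R version.
build_one() {
  tag="$ORG/rstudio-connect:$(derived_tag "$BASE_TAG" "$1")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "docker build . --build-arg R_VERSION=$1 -t $tag"
  else
    docker build . --build-arg "R_VERSION=$1" -t "$tag"
  fi
}

# Build every R version passed as an argument.
build_all() {
  for v in "$@"; do
    build_one "$v"
  done
}
```

Calling `build_all 4.3.3 4.4.0` in dry-run mode prints one `docker build` command per version; setting `DRY_RUN=0` runs them instead. Keeping the tag logic in one function means the derivative tag always encodes both the product release and the R version.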

To alleviate this, Posit could publish a single derivative image for each product release with the latest version of R/Python already installed. This image is intended as a convenience and a starting point, and we should document that its R/Python versions are not guaranteed to stay the same from one release to the next. If the customer requires a stable version of R/Python, then they must build a derivative image for each product release. An alternative to using the latest R/Python version for each release might be to keep a stable version of R/Python for three releases before upgrading to the newest version.

As things stand today, we are reluctant to upgrade the default R/Python in our base images because we know it breaks customer environments. Our preference would be for customers to decide when to do R/Python version upgrades, in a way that is decoupled from Connect version upgrades.
