Docs for partitioned backfills on sling runs #24234

Open
nrsmac opened this issue Sep 4, 2024 · 1 comment
Labels
area: docs (related to documentation in general)
area: partitions (related to Partitions)
integration: embedded-elt (related to dagster-embedded-elt, which uses Sling and data load tool (dlt))

Comments

nrsmac commented Sep 4, 2024

What's the issue or suggestion?

There isn't a clearly documented way to use partitions with Sling. I see that I can provide a partitions_def, but how do the partition values get passed to Sling for a backfill?
https://docs.dagster.io/_apidocs/libraries/dagster-embedded-elt#sling-dagster-embedded-elt-sling

My defined asset:

from dagster_embedded_elt.sling import SlingResource, sling_assets
from dagster import file_relative_path
from partitions import daily_partitions_def

replication_config = file_relative_path(__file__, "../resources/replication.yml")

@sling_assets(replication_config=replication_config, partitions_def=daily_partitions_def)
def my_sling_assets(context, sling: SlingResource):  # named so it does not shadow the imported sling_assets decorator
    yield from sling.replicate(context=context)  # Tried passing as kwargs here...
    for row in sling.stream_raw_logs():
        context.log.info(row)

In the Sling documentation, it gives an example of passing environment variables to Sling
https://docs.slingdata.io/sling-cli/run/configuration/variables.

replication.yml:

source: SQLDB
target: DUCKDB

defaults:
  mode: backfill
  object: "{stream_schema}_{stream_table}"
  source_options:
    empty_as_null: false
  target_options:
    column_casing: snake
streams:
  example.stream:
    object: example.object
    primary_key: pk
    update_key: start_time
    source_options:
      limit: 1000
      #range: 2024-07-01,2024-07-02
      range: '${START_DATE},${END_DATE}'   # How do I get partition keys to populate here?
env:
  SLING_LOADED_AT_COLUMN: true
  SLING_STREAM_URL_COLUMN: true
  start_date: '${START_DATE}'  # In the case of using envvars, but I want the partition keys from the execution context here.
  end_date: '${END_DATE}' 
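
For reference, this is the value the range option should end up with once the placeholders are filled in. The snippet is only an illustration of `${VAR}` substitution using Python's `string.Template`; it is not Sling's actual implementation, and `render_range` is a hypothetical name.

```python
from string import Template


def render_range(env: dict[str, str]) -> str:
    # Illustration only: mimics ${VAR} substitution on the range value
    # from replication.yml. Sling performs its own substitution internally.
    return Template("${START_DATE},${END_DATE}").substitute(env)


print(render_range({"START_DATE": "2024-07-01", "END_DATE": "2024-07-02"}))
# prints: 2024-07-01,2024-07-02
```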

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

@nrsmac added the area: docs label Sep 4, 2024
@garethbrickman added the integration: embedded-elt and area: partitions labels Sep 5, 2024
@cmpadden
Contributor

Hi @nrsmac - @nicklausroach and I are going to explore this, and plan to update the documentation accordingly.

The replicate method passes the environment variables to the Sling subprocess. So one possible solution is to set the environment variables from the partition key. For example:

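import os
from datetime import datetime, timedelta

@sling_assets(
    replication_config=config_dir / "example.yaml",
    dagster_sling_translator=CustomSlingTranslatorMain(),
    partitions_def=DailyPartitionsDefinition(start_date=datetime.now()),
)
def example_sling_assets(context, embedded_elt: SlingResource):
    # Daily partition keys are strings such as "2024-09-04", so parse before doing date math.
    start = datetime.strptime(context.partition_key, "%Y-%m-%d")
    end = start + timedelta(days=1)
    # os.environ only accepts strings, so format the dates before assigning.
    os.environ["START_DATE"] = start.strftime("%Y-%m-%d")
    os.environ["END_DATE"] = end.strftime("%Y-%m-%d")
    yield from embedded_elt.replicate(context=context)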
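
The date handling can also be pulled out into a small helper that is easy to test on its own. This is a minimal sketch, assuming daily partition keys in the default `%Y-%m-%d` format. `partition_env` is a hypothetical name, not part of Dagster or Sling, and the partition key has to be parsed because it arrives as a string.

```python
from datetime import datetime, timedelta

# Assumption: keys use the default DailyPartitionsDefinition format.
DATE_FORMAT = "%Y-%m-%d"


def partition_env(partition_key: str) -> dict[str, str]:
    """Map a daily partition key to START_DATE / END_DATE strings (hypothetical helper)."""
    start = datetime.strptime(partition_key, DATE_FORMAT)
    end = start + timedelta(days=1)  # end of the one-day window
    return {
        "START_DATE": start.strftime(DATE_FORMAT),
        "END_DATE": end.strftime(DATE_FORMAT),
    }
```

Inside the asset this would be used as `os.environ.update(partition_env(context.partition_key))` before calling `replicate`. Note that `os.environ` is shared by the whole process, so this approach may not be safe if several partitions run concurrently in the same process.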

Will keep you posted as we update docs. Please let me know if you make any progress yourself. Thanks!
