Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure when run inside module #105

Open
gbruer15 opened this issue Jul 10, 2024 · 10 comments
Open

Failure when run inside module #105

gbruer15 opened this issue Jul 10, 2024 · 10 comments

Comments

@gbruer15
Copy link

I am trying to run some parallel code defined inside a module via include_string, and I can't quite figure out how it works or how it's supposed to work.

It seems like a bug for such a simple case to not work, so I'm making an issue here. But it may be something that can't be fixed with Julia's current module system.

Main question: Is there a way to use a sandbox module without breaking Distributed.jl?

Short failing code

A simple remote call works when run in module Main:

using Distributed
worker_ids = addprocs(2)
remotecall_wait(() -> println("I want to run this remotely"), 2)

But doesn't work when in a different module:

module sandbox
using Distributed
worker_ids = addprocs(2)
remotecall_wait(() -> println("I want to run this remotely"), 2)
end
ERROR: On worker 2:
UndefVarError: sandbox not defined
Stacktrace:
  [1] deserialize_module
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:996
  [2] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:895
  [3] deserialize

Test scripts

I made a test script to figure out how to work with this. I run the script in a sandbox module and in the Main module.

Driver script mfe_driver.jl.

filename = ARGS[1]
include_string(Module(:sandbox_mod), read(filename, String), filename)

Escape-to-main method: mfe.jl.

The second is a script that escapes the sandbox module to get the function to run remotely: mfe.jl.

mfe.jl
using Distributed

worker_ids = addprocs(2)
try
    println("================== Main process - try to run remotely")
    try
        remotecall_wait(() -> println("I want to run this remotely"), 2)
    catch e
        showerror(stdout, e)
    end
    println()
    println()
    println("================== Main process - define and run function locally")
    function I_want_to_run_this_remotely()
        println("I want to run this remotely")
    end
    I_want_to_run_this_remotely()
    println()
    println()
    println("================== Main process - run remotely")
    try
        remotecall_wait(I_want_to_run_this_remotely, 2)
    catch e
        showerror(stdout, e)
    end
    println()
    println()
    println("================== Main process - define function in local Main and run locally")
    @everywhere [1] begin
        using Distributed
        println(" process $(myid()) has module $(@__MODULE__)")
        function I_want_to_run_this_remotely()
            println("I want to run this remotely")
        end
    end
    Main.I_want_to_run_this_remotely()
    println()
    println()
    println("================== Main process - run remotely")
    try
        remotecall_wait(Main.I_want_to_run_this_remotely, 2)
    catch e
        showerror(stdout, e)
    end
    println()
    println()
    println("================== Main process - define function in everyone's Main")
    @everywhere begin
        using Distributed
        println(" process $(myid()) has module $(@__MODULE__)")
        function I_want_to_run_this_remotely()
            println("I want to run this remotely")
        end
    end
    println()
    println()
    println("================== Main process - run remotely")
    try
        remotecall_wait(Main.I_want_to_run_this_remotely, 2)
    catch e
        showerror(stdout, e)
    end
finally
    rmprocs(worker_ids)
end
  • In the non-sandboxed case, remoteprocess_call works as expected.
  • The only case that fails is when the function is defined on the driver process using @everywhere. And I don't understand how that is different from defining the function on the driver process without @everywhere.
Output of "julia mfe.jl"
================== Main process - try to run remotely
      From worker 2:	I want to run this remotely


================== Main process - define and run function locally
I want to run this remotely


================== Main process - run remotely
      From worker 2:	I want to run this remotely


================== Main process - define function in local Main and run locally
 process 1 has module Main
I want to run this remotely


================== Main process - run remotely
On worker 2:
UndefVarError: #I_want_to_run_this_remotely not defined
Stacktrace:
  [1] deserialize_datatype
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:1364
  [2] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:866
  [3] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813
  [4] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:873
  [5] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813 [inlined]
  [6] deserialize_msg
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/messages.jl:87
  [7] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
  [8] invokelatest
    @ ./essentials.jl:726 [inlined]
  [9] message_handler_loop
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:176
 [10] process_tcp_streams
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:133
 [11] #103
    @ ./task.jl:484

================== Main process - define function in everyone's Main
 process 1 has module Main
      From worker 2:	 process 2 has module Main
      From worker 3:	 process 3 has module Main


================== Main process - run remotely
      From worker 2:	I want to run this remotely
  • In the sandboxed case, the remotecall_wait works only if the function is defined in Main for each process.
Output of "julia mfe_driver.jl mfe.jl"
================== Main process - try to run remotely
On worker 2:
UndefVarError: sandbox_mod not defined
Stacktrace:
  [1] deserialize_module
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:996
  [2] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:895
  [3] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813
  [4] deserialize_datatype
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:1363
  [5] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:866
  [6] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813
  [7] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:873
  [8] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813 [inlined]
  [9] deserialize_msg
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/messages.jl:87
 [10] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [11] invokelatest
    @ ./essentials.jl:726 [inlined]
 [12] message_handler_loop
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:176
 [13] process_tcp_streams
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:133
 [14] #103
    @ ./task.jl:484

================== Main process - define and run function locally
I want to run this remotely


================== Main process - run remotely
On worker 2:
UndefVarError: sandbox_mod not defined
Stacktrace:
  [1] deserialize_module
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:996
  [2] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:895
  [3] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813
  [4] deserialize_datatype
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:1363
  [5] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:866
  [6] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813
  [7] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:873
  [8] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813 [inlined]
  [9] deserialize_msg
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/messages.jl:87
 [10] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [11] invokelatest
    @ ./essentials.jl:726 [inlined]
 [12] message_handler_loop
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:176
 [13] process_tcp_streams
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:133
 [14] #103
    @ ./task.jl:484

================== Main process - define function in local Main and run locally
 process 1 has module Main
I want to run this remotely


================== Main process - run remotely
On worker 2:
UndefVarError: #I_want_to_run_this_remotely not defined
Stacktrace:
  [1] deserialize_datatype
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:1364
  [2] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:866
  [3] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813
  [4] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:873
  [5] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813 [inlined]
  [6] deserialize_msg
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/messages.jl:87
  [7] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
  [8] invokelatest
    @ ./essentials.jl:726 [inlined]
  [9] message_handler_loop
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:176
 [10] process_tcp_streams
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:133
 [11] #103
    @ ./task.jl:484

================== Main process - define function in everyone's Main
 process 1 has module Main
      From worker 3:	 process 3 has module Main
      From worker 2:	 process 2 has module Main


================== Main process - run remotely
      From worker 2:	I want to run this remotely

Define-sandbox-everywhere method: mfe_define_module.jl

I don't want to escape the sandbox to define things in the Main scope, so I tried to work within the sandbox.

This file tries to define the sandbox module for the worker processes.

mfe_define_module.jl
using Distributed

worker_ids = addprocs(2)
try
    println("================== Main process - determine current module information")
    current_module_local = split(string(@__MODULE__), ".")[end]
    println("  Driver module: $(@__MODULE__)")
    println("  Driver module string: $current_module_local")
    println()
    println()
    println("================== Main process - define module in workers")
    @everywhere worker_ids begin
        using Distributed
        driver_module = try
            Module(Symbol(Meta.parse($current_module_local)))
        catch e
            showerror(stdout, e)
        end
        println("Process $(myid()): ", driver_module, " ", typeof(driver_module))
    end
    println()
    println()
    driver_module = @__MODULE__
    println("================== Main process - import module in workers")
    try
        @everywhere worker_ids begin
            println("Process $(myid()): ", driver_module, " ", typeof(driver_module))
        end
    catch e
        showerror(stdout, e)
    end
    println()
    println()
    println("================== Main process - try to run remotely")
    try
        remotecall_wait(() -> println("I want to run this remotely"), 2)
    catch e
        showerror(stdout, e)
    end
    println()
    println()
    println("================== Main process - define and run function locally")
    function I_want_to_run_this_remotely()
        println("I want to run this remotely")
    end
    I_want_to_run_this_remotely()
    println()
    println()
    println("================== Main process - run remotely")
    try
        remotecall_wait(I_want_to_run_this_remotely, 2)
    catch e
        showerror(stdout, e)
    end
finally
    rmprocs(worker_ids)
end
  • It successfully defines a module of the same name in the worker processes, but it is still not possible to call functions with remotecall_wait.
Output of "julia mfe_driver.jl mfe_define_module.jl"
================== Main process - determine current module information
  Driver module: Main.sandbox_mod
  Driver module string: sandbox_mod


================== Main process - define module in workers
      From worker 2:	Process 2: Main.sandbox_mod Module
      From worker 3:	Process 3: Main.sandbox_mod Module


================== Main process - import module in workers
      From worker 2:	Process 2: Main.sandbox_mod Module
      From worker 3:	Process 3: Main.sandbox_mod Module


================== Main process - try to run remotely
On worker 2:
UndefVarError: sandbox_mod not defined
Stacktrace:
  [1] deserialize_module
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:996
  [2] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:895
  [3] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813
  [4] deserialize_datatype
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:1363
  [5] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:866
  [6] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813
  [7] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:873
  [8] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813 [inlined]
  [9] deserialize_msg
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/messages.jl:87
 [10] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [11] invokelatest
    @ ./essentials.jl:726 [inlined]
 [12] message_handler_loop
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:176
 [13] process_tcp_streams
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:133
 [14] #103
    @ ./task.jl:484

================== Main process - define and run function locally
I want to run this remotely


================== Main process - run remotely
On worker 2:
UndefVarError: sandbox_mod not defined
Stacktrace:
  [1] deserialize_module
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:996
  [2] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:895
  [3] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813
  [4] deserialize_datatype
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:1363
  [5] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:866
  [6] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813
  [7] handle_deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:873
  [8] deserialize
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Serialization/src/Serialization.jl:813 [inlined]
  [9] deserialize_msg
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/messages.jl:87
 [10] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [11] invokelatest
    @ ./essentials.jl:726 [inlined]
 [12] message_handler_loop
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:176
 [13] process_tcp_streams
    @ ~/.julia/juliaup/julia-1.8.5+0.x64.linux.gnu/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:133
 [14] #103
    @ ./task.jl:484

Version info

I tested this with Julia installed by juliaup version 1.13.0, with Julia versions below.

Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores

and

Julia Version 1.11.0-rc1
Commit 3a35aec36d1 (2024-06-25 10:23 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
@JamesWrigley
Copy link

Not an expert on the internals so take this with a grain of salt, but my understanding is that Distributed by design expects all processes in the cluster to have the same global state. And defining new modules or importing things changes that state, which causes the errors you see.

@andreyz4k
Copy link

Not an expert on the internals so take this with a grain of salt, but my understanding is that Distributed by design expects all processes in the cluster to have the same global state. And defining new modules or importing things changes that state, which causes the errors you see.

The problem is that when you create new workers with addprocs they don't get the same global state, and there is no simple way to initialize them properly. And in this case, you get into Catch-22 when your init code is a part of a module, but you can't use it in the worker because the module is not loaded.

@jpsamaroo
Copy link
Member

You should generally not be performing side-effects like this at the top-level of a module - our module top-level is basically "compile-time", if that helps think about whether this is a good or bad idea. You could maybe do this in __init__ to run it during module initialization, as the module is already compiled and loaded at that point. But personally I would make your own init function and have users call that directly, or something more "explicit" like that.

@andreyz4k
Copy link

I have a slightly different setup from the topic starter, and I can say that the problem is not related to his code being in the module's top level. If I have a module with a function that creates a new process with addprocs and then tries to run anything on it, I get an error that the module is not defined. It's not possible to run import/using because it's not at the top level, and it's impossible to run any more complicated code because it's defined as part of the module, which is not loaded.

@jpsamaroo
Copy link
Member

@andreyz4k can you provide an MWE?

@andreyz4k
Copy link

andreyz4k commented Aug 3, 2024

Ok, here it goes

module distributed_test

using Distributed

function test()
    new_pids = addprocs(1)
    try
        # @spawnat new_pids[1] using distributed_test
        # Can't call using because we're not at the top level
        @fetchfrom new_pids[1] println("I want to run this remotely")
    finally
        rmprocs(new_pids)
    end
end

function test2()
    new_pids = addprocs(1)
    try
        ex = Expr(
            :toplevel,
            :(task_local_storage()[:SOURCE_PATH] = $(get(task_local_storage(), :SOURCE_PATH, nothing))),
            :(using distributed_test),
        )
        Distributed.remotecall_eval(Main, new_pids[1], ex)
        println("Finished import")
        @fetchfrom new_pids[1] println("I want to run this remotely")
    finally
        rmprocs(new_pids)
    end
end

end

The first test is what I would prefer — clean code, no manual initialization. Here is the result

julia> using distributed_test

julia> distributed_test.test()
ERROR: On worker 9:
KeyError: key distributed_test [3f6dfd82-8c60-4c67-bce8-86bcb295ad2e] not found
Stacktrace:
  [1] getindex
    @ ./dict.jl:498 [inlined]
  [2] macro expansion
    @ ./lock.jl:267 [inlined]
  [3] root_module
    @ ./loading.jl:1878
  [4] deserialize_module
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:994
  [5] handle_deserialize
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:896
  [6] deserialize
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:814
  [7] deserialize_datatype
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:1398
  [8] handle_deserialize
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:867
  [9] deserialize
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:814
 [10] handle_deserialize
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:874
 [11] deserialize
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:814 [inlined]
 [12] deserialize_msg
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/messages.jl:87
 [13] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [14] invokelatest
    @ ./essentials.jl:889 [inlined]
 [15] message_handler_loop
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:176
 [16] process_tcp_streams
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:133
 [17] #103
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:121
Stacktrace:
 [1] remotecall_fetch(::Function, ::Distributed.Worker; kwargs::@Kwargs{})
   @ Distributed ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:465
 [2] remotecall_fetch(::Function, ::Distributed.Worker)
   @ Distributed ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:454
 [3] remotecall_fetch
   @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:492 [inlined]
 [4] test()
   @ distributed_test ~/distributed_test/distributed_test/src/distributed_test.jl:11
 [5] top-level scope
   @ REPL[34]:1

In the second test, I tried to adapt the trick from @everywhere macro to go around the limitation that using can be used only at the file's top level. Still no luck, but the log is slightly different

julia> distributed_test.test2()
Finished import
ERROR: On worker 8:
UndefVarError: `#19#20` not defined
Stacktrace:
  [1] deserialize_datatype
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:1399
  [2] handle_deserialize
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:867
  [3] deserialize
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:814
  [4] handle_deserialize
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:874
  [5] deserialize
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Serialization/src/Serialization.jl:814 [inlined]
  [6] deserialize_msg
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/messages.jl:87
  [7] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
  [8] invokelatest
    @ ./essentials.jl:889 [inlined]
  [9] message_handler_loop
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:176
 [10] process_tcp_streams
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:133
 [11] #103
    @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/process_messages.jl:121
Stacktrace:
 [1] remotecall_fetch(::Function, ::Distributed.Worker; kwargs::@Kwargs{})
   @ Distributed ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:465
 [2] remotecall_fetch(::Function, ::Distributed.Worker)
   @ Distributed ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:454
 [3] remotecall_fetch
   @ ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Distributed/src/remotecall.jl:492 [inlined]
 [4] test2()
   @ distributed_test ~/distributed_test/distributed_test/src/distributed_test.jl:27
 [5] top-level scope
   @ REPL[33]:1

@andreyz4k
Copy link

Well, it looks like I found a partial workaround

module distributed_test

using Distributed

function test4()
    new_pids = addprocs(1)
    try
        ex = Expr(
            :toplevel,
            :(task_local_storage()[:SOURCE_PATH] = $(get(task_local_storage(), :SOURCE_PATH, nothing))),
            :(using distributed_test),
        )
        Distributed.remotecall_eval(Main, new_pids[1], ex)
        println("Finished import")

        remotecall_wait(inner_func, new_pids[1])
    finally
        rmprocs(new_pids)
    end
end

function inner_func()
    println("I want to run this remotely")
end

export inner_func

end

When the called function is exported, it can be accessed in the global scope, where the worker does the evaluation, so it succeeds.

julia> distributed_test.test4()
Finished import
      From worker 11:	I want to run this remotely
Future(11, 1, 44, ReentrantLock(nothing, 0x00000000, 0x00, Base.GenericCondition{Base.Threads.SpinLock}(Base.IntrusiveLinkedList{Task}(nothing, nothing), Base.Threads.SpinLock(0)), (0, 4297724304, 0)), nothing)

But if I'm trying to use a macro like @fetchfrom new_pids[1] inner_func(), it fails because it creates a closure that is not exported upon calling using distributed_test.

@jpsamaroo
Copy link
Member

Ahh ok that makes sense - this is related to the discussion we had in the JuliaHPC call about having addprocs load user-defined startup code before it returns, so that serialized values work as you'd expect. I agree that the first example could probably work once we implement such auto-load support.

@andreyz4k
Copy link

I'm not sure if it can be solved by just running some user-defined startup code. I've also tried to add an exeflags parameter to addprocs with -L flag and a file that does import, but it looked like it didn't work for some reason.

In any case, the main problem here is that serialization is done in the module scope, and deserialization in the worker is done in Main. I don't think that there is any way to change that in the initialization script. Furthermore, module context can be different for different invocations sent to the same worker, so it should be an invocation-level setting.

@andreyz4k
Copy link

I did some more experiments and the temporary fix is to use remotecall_eval everywhere, for example, like this

macro fetchfrom2(p, expr)
    :(Distributed.remotecall_eval(@__MODULE__, $(esc(p)), $(esc(expr))))
end


remote2(p, f) = (args...; kwargs...) -> begin
    thunk = :($f($args...; $kwargs...))
    Distributed.remotecall_eval(@__MODULE__, p, thunk)
end

It still requires the same initialization hack, but I won't need to export everything I want to run. I will use this method in my project for now, but it would be great if this issue could be fixed in the main Distributed codebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants