Distributed.jl should verify Julia version between primary-worker #49

Open
Naikless opened this issue May 18, 2022 · 8 comments
Labels
good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@Naikless
Contributor

Naikless commented May 18, 2022

As described already here, running

using Distributed
addprocs(["<remotename>"], exename="<pathToRemoteJuliaExe>", dir="<pathToRemoteHomeDir>")

@fetch myid()

on a host with Julia 1.6.1 and connecting to a remote with Julia 1.7.2 leads to

ERROR: On worker 2:
TypeError: non-boolean (Nothing) used in boolean context
Stacktrace:
  [1] deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:1166
  [2] handle_deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:947
  [3] deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:801 [inlined]
  [4] deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:1018
  [5] handle_deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:947
  [6] deserialize_fillarray!
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:1230
  [7] deserialize_array
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:1222
  [8] handle_deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:852
  [9] deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:801 [inlined]
 [10] deserialize_typename
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:1296
 [11] deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Distributed/src/clusterserialize.jl:68
 [12] handle_deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:947
 [13] deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:801
 [14] handle_deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:858
 [15] deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:801
 [16] handle_deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:861
 [17] deserialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Serialization/src/Serialization.jl:801 [inlined]
 [18] deserialize_msg
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Distributed/src/messages.jl:87
 [19] #invokelatest#2
    @ ./essentials.jl:716 [inlined]
 [20] invokelatest
    @ ./essentials.jl:714 [inlined]
 [21] message_handler_loop
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Distributed/src/process_messages.jl:169
 [22] process_tcp_streams
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/Distributed/src/process_messages.jl:126
 [23] #99
    @ ./task.jl:423
Stacktrace:
 [1] #remotecall_fetch#143
   @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:394 [inlined]
 [2] remotecall_fetch(::Function, ::Distributed.Worker)
   @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:386
 [3] remotecall_fetch(::Function, ::Int64; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:421
 [4] remotecall_fetch(::Function, ::Int64)
   @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:421
 [5] top-level scope
   @ none:1

However, when I use Julia 1.7.2 instead of 1.6.1 on the host, everything works as expected. Both systems run CentOS 7.

@vchuravy
Sponsor Member

The serialization format is not stable between versions, and thus mixed versions are not supported (see https://docs.julialang.org/en/v1/stdlib/Serialization/).

I think what would be a great addition is during initial connection to check that the Julia version is the same and otherwise error.
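
As a rough illustration of the suggested check, here is a minimal sketch. It assumes the worker versions have already been obtained by some means that does not depend on the serialization format; how to do that is not shown. The helper `assert_same_julia_version` is hypothetical and not part of Distributed.jl.

```julia
# Hypothetical helper, not part of Distributed.jl: compare the primary's
# version against the versions reported by workers and fail early on a mismatch.
function assert_same_julia_version(primary::VersionNumber,
                                   workers::AbstractDict{Int,VersionNumber})
    # Collect every worker whose version differs, ordered by worker id.
    mismatched = sort!([id => v for (id, v) in workers if v != primary]; by = first)
    isempty(mismatched) && return nothing
    details = join(("worker $id runs Julia $v" for (id, v) in mismatched), ", ")
    error("Julia version mismatch: primary runs Julia $primary, but $details. " *
          "Mixed-version clusters are not supported.")
end

# Matching versions pass silently.
assert_same_julia_version(v"1.7.2", Dict(2 => v"1.7.2"))

# The setup from this issue (1.6.1 primary, 1.7.2 worker) is rejected.
msg = try
    assert_same_julia_version(v"1.6.1", Dict(2 => v"1.7.2"))
    ""
catch err
    sprint(showerror, err)
end
println(msg)
```

Failing at connection time with a message like this would replace the opaque `TypeError` raised deep inside deserialization.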

vchuravy changed the title from "Distributed.fetch errors for differing Julia versions on host and remote" to "Distributed.jl should verify Julia version between primary-worker" on May 18, 2022
vchuravy added the "help wanted" and "good first issue" labels on May 18, 2022
@Naikless
Contributor Author

Naikless commented May 18, 2022

Thanks for clearing this up!

At the very least, this should be mentioned in the documentation of the Distributed module, where I currently couldn't find any hint about this kind of issue.

For someone not as familiar with the inner workings of the remote function calls this is especially confusing, because Julia in general claims to be mostly compatible for all 1.x versions.

@giordano
Contributor

> because Julia in general claims to be mostly compatible for all 1.x versions.

The code you write is mostly backward-compatible. Communication between different processes, which is the problem here, is a different matter.

@Naikless
Contributor Author

> I think what would be a great addition is during initial connection to check that the Julia version is the same and otherwise error.

It might be sufficient to improve the error message. Since JuliaLang/julia#35376 introduced an explicit check for binaries coming from a Julia version higher than the local one, this could be extended to only allow the same version. However, the above conversation at least indicates that backward compatibility should be expected.

> The code you write is mostly backward-compatible. Communication between different processes, which is the problem here, is a different matter.

Yes, I see that now. However, I feel this is not reflected sufficiently in the documentation, so I filed PR JuliaLang/julia#45368 to improve it.

@giordano
Contributor

Maybe instead of (or in addition to?) the Julia version we should check the serialization format version?
https://github.com/JuliaLang/julia/blob/0f2ed77dca88785c9ae0fb1cf1a77593d1527c18/stdlib/Serialization/src/Serialization.jl#L82
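
For illustration, the format version can be read back from a serialized stream and compared with the local one. This is only a sketch: the header layout (one tag byte, the magic bytes "JL", then a version byte) is an assumption based on the linked source, `Serialization.ser_version` is an internal, unexported constant, and `stream_format_version` is a hypothetical helper.

```julia
using Serialization

# Assumed header layout, based on the linked Serialization.jl source:
# byte 1 is a header tag, bytes 2-3 are the magic "JL", byte 4 is the format version.
function stream_format_version(bytes::AbstractVector{UInt8})
    length(bytes) >= 4 || error("stream too short to contain a serialization header")
    (bytes[2] == UInt8('J') && bytes[3] == UInt8('L')) ||
        error("missing \"JL\" magic bytes; not a serialization stream with a header")
    return Int(bytes[4])
end

# Serialize any value to memory; `serialize(io, x)` writes the header first.
io = IOBuffer()
serialize(io, [1, 2, 3])
data_version = stream_format_version(take!(io))

# Internal constant holding the format version of the running Julia.
local_version = Serialization.ser_version
println("stream format: ", data_version, ", local format: ", local_version)
```

Two Julia releases can share the same format version, so this check is coarser than comparing `VERSION` directly.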

@Naikless
Contributor Author

As I said, that check already exists for future versions:
https://github.com/JuliaLang/julia/blob/0f2ed77dca88785c9ae0fb1cf1a77593d1527c18/stdlib/Serialization/src/Serialization.jl#L742

I would probably either

  • Extend the above code to allow only the exact same version. However, I understand the intent that it can work with earlier versions in some cases, so this might be unnecessarily strict.
  • Print a warning when the serialization formats differ, so that if it fails, the user knows why.
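
The two options above can be sketched as a single policy function. This is a standalone illustration, not the code in Serialization.jl; the function name `check_format_version` and the `strict` keyword are hypothetical.

```julia
# Hypothetical policy function; the name and the `strict` keyword are illustrative.
# A newer stream is always rejected. An older stream is rejected when `strict`
# is true and otherwise only triggers a warning.
function check_format_version(data_version::Integer, local_version::Integer;
                              strict::Bool = false)
    if data_version > local_version
        error("Cannot read stream serialized with a newer version of Julia. " *
              "Got data version $data_version > current version $local_version")
    elseif data_version < local_version
        msg = "Stream was serialized with an older format " *
              "(data version $data_version < current version $local_version); " *
              "deserialization may fail."
        strict && error(msg)
        @warn msg
        return :older
    end
    return :same
end

check_format_version(5, 5)   # same format: returns :same
check_format_version(4, 5)   # older stream: warns and returns :older
```

With `strict = true` the older-stream case throws instead of warning, which corresponds to the first bullet.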

@davpayne

Does this still need work now that JuliaLang/julia#45368 is merged? If so, I'm thinking of just adding an elseif branch to Serialization.jl that warns when the data version is lower than the local one, to put into practice what @Naikless suggested.

@Naikless
Contributor Author

My PR only addressed the documentation. If the checks are still the same, I believe this could still improve error identification.

davpayne referenced this issue in davpayne/julia Aug 1, 2023
Inserted a warning for a less than case of data version to complement error for greater than case. Modeled language after doc language update in issue JuliaLang#45368
@vtjnash vtjnash transferred this issue from JuliaLang/julia Feb 10, 2024