
Safety for poisson random calls #2301

Open
SamuelBrand1 opened this issue Aug 12, 2024 · 6 comments

@SamuelBrand1

Hi everyone,

Base Julia can't convert large floats to the `Int` type using `floor`, `round`, etc. This creates a problem when sampling from a Poisson distribution with a large mean, because such a conversion is used in the fast polyalgorithm for Poisson sampling.

e.g.

using Distributions
log_μ = 48.0  # plausible value in a log-scale random walk with a Poisson link
d = Poisson(exp(log_μ))
rand(d)

ERROR: InexactError: trunc(Int64, 7.016735912097631e20)
Stacktrace:
[1] trunc
@ ./float.jl:905 [inlined]
[2] floor
@ ./float.jl:383 [inlined]
[3] PoissonADSampler
@ ~/.julia/packages/Distributions/oGEEM/src/samplers/poisson.jl:54 [inlined]
[4] PoissonADSampler
@ ~/.julia/packages/Distributions/oGEEM/src/samplers/poisson.jl:49 [inlined]
[5] sampler
@ ~/.julia/packages/Distributions/oGEEM/src/univariate/discrete/poisson.jl:144 [inlined]
[6] rand(rng::Random.TaskLocalRNG, d::Poisson{Float64})
@ Distributions ~/.julia/packages/Distributions/oGEEM/src/univariate/discrete/poisson.jl:148
[7] rand(::Poisson{Float64})
@ Distributions ~/.julia/packages/Distributions/oGEEM/src/genericrand.jl:22
[8] top-level scope
@ REPL[18]:1

Now, this would not be problematic, because the logpdf calls are not affected, but for some reason a rand call is made when using Turing at inference time (not sure why...).

Is there any chance of a safer version of the Poisson/Negative Binomial samplers that can detect when `round(BigInt, ...)` should be used?
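To illustrate the kind of safeguard being asked for, here is a minimal sketch of a hypothetical wrapper (`safe_pois_rand` is not part of Distributions; the threshold and the normal approximation are illustrative assumptions):

```julia
using Distributions

# Hypothetical helper: sample a Poisson count, falling back to a BigInt
# result via the normal approximation when the mean is so large that the
# internal float-to-Int conversion would throw an InexactError.
function safe_pois_rand(λ::Real)
    if λ < 1e18  # comfortably inside the Int64 range; use the standard sampler
        return rand(Poisson(λ))
    else
        # For very large λ, Poisson(λ) ≈ Normal(λ, √λ); round to a BigInt count.
        return max(big(0), round(BigInt, λ + sqrt(λ) * randn()))
    end
end
```

This is only a sketch of the idea; a real fix in Distributions would presumably live inside the sampler itself.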

@penelopeysm
Member

Hi @SamuelBrand1, thanks for the issue!

Could you provide a MWE that shows what you're trying to do with Turing that causes this crash? My initial impression is that this seems like an upstream issue with Distributions.

@SamuelBrand1
Author

Hey @penelopeysm

It's actually quite hard to construct an MWE for this from a clean script. For example,

using Turing, Distributions

@model function rw_model(rw₀, n)
    ϵ_t ~ filldist(Normal(), n)
    rw = rw₀ .+ cumsum(ϵ_t)
    return rw
end

@model function data_model(ys, rw)
    for i in eachindex(rw)
        ys[i] ~ Poisson(exp(rw[i]))
    end
    return ys
end

@model function pois_rw(ys, rw₀)
    n = length(ys)
    @submodel rw = rw_model(rw₀, n)
    @submodel gen_ys = data_model(ys, rw)
    return rw, gen_ys
end

generative_mdl = pois_rw(fill(missing, 10), 3.0)
ys = generative_mdl()[2] .|> Int
inference_mdl = pois_rw(ys, 48.0)  # deliberately bad choice for rw₀

chn = sample(inference_mdl, NUTS(), 1000)

works OK with package versions:

[31c24e10] Distributions v0.25.110
[fce5fe82] Turing v0.33.3

And this is roughly analogous to what @seabbs and I are doing at the moment.

@SamuelBrand1
Author

This seems to suggest that I should take this to Distributions, as well as investigate why the large-mean Poisson problem is hitting our modelling but not this minimal script.

@seabbs

seabbs commented Aug 15, 2024

Agreed, this does seem to be a Distributions issue, so flagging it there makes sense. Face-to-face, @SamuelBrand1 and I have discussed that the SciML ecosystem has had to develop its own faster Poisson sampler (presumably due to the difficulty of changing it in Distributions). As this seems like an issue that would be common for Turing users, weighing in on the Distributions issue might be useful: the rand call being buried in the inference call is quite confusing and hard to resolve, so reducing the chance of edge-case issues with rand seems like a good thing.

@devmotion
Member

F2F @SamuelBrand1 and I have discussed that the SciML ecosystem has had to develop its own faster Poisson (presumably due to issues changing it in Distributions).

That's basically a myth, and I'm not really sure where it comes from. The samplers in SciML and Distributions are actually basically identical, and already quite a few years ago I noticed that SciML's is not faster: SciML/PoissonRandom.jl#6
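For anyone wanting to check this claim themselves, a quick benchmark sketch (this assumes the PoissonRandom and BenchmarkTools packages are installed; `pois_rand` is PoissonRandom's sampler):

```julia
using Distributions, PoissonRandom, BenchmarkTools

λ = 10.0
@btime rand(Poisson($λ))  # Distributions' polyalgorithm sampler
@btime pois_rand($λ)      # PoissonRandom's sampler
```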

@devmotion
Member

Issue in Distributions for the problem here: JuliaStats/Distributions.jl#821

We should hopefully be able to fix it when overhauling `rand` etc. more generally. But my gut feeling is that it's somewhat suspicious if such large values show up in your model at all; I'd take it as an indication that you might want to rescale your quantities.
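One hedged way to act on this advice in the model from the earlier comment would be to bound the log-rate so a badly initialised random walk cannot produce an astronomically large Poisson mean (the bound of 20 here is an arbitrary illustrative choice, not a recommendation):

```julia
using Turing, Distributions

# Variant of the earlier data_model with a clamped log-rate, so that
# exp(·) stays within a range where the Poisson sampler is safe.
@model function data_model(ys, rw)
    for i in eachindex(rw)
        ys[i] ~ Poisson(exp(min(rw[i], 20.0)))
    end
    return ys
end
```

Whether clamping is appropriate depends on the model; rescaling the quantities themselves, as suggested above, may be the cleaner fix.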
