Issues with Exercise 9.6.2 #24

Pipe-Vash opened this issue Oct 30, 2022 · 2 comments

@Pipe-Vash

1- I don't think the "fourth method" the exercise asks for is the parametric bootstrap, but rather the jackknife confidence interval that is explained in the appendix of the chapter (a minimal sketch is included after my bootstrap code below).

2- The second part of the exercise asks for "the true coverage", meaning the percentage of simulated intervals that actually contain the true value of the statistic (in this case the skewness parameter). My approach is the following (excuse me if my code is not very fluent):

import numpy as np
from scipy.stats import norm
from tqdm.notebook import tqdm

np.random.seed(1)

def skew(x):
    # Sample skewness: third central moment over the cubed standard deviation
    mean = np.mean(x)
    sd = np.std(x)
    return np.mean((x - mean)**3) / sd**3

B = 10000  # number of bootstrap resamples per data set

def T_boot(x):
    # Nonparametric bootstrap: resample x with replacement B times and
    # recompute the skewness on each resample
    t_boot = np.empty(B)
    n_x = len(x)
    for i in tqdm(range(B)):
        x_star = np.random.choice(x, n_x, replace=True)
        t_boot[i] = skew(x_star)
    return t_boot, np.std(t_boot)  # replications and their standard deviation (bootstrap se)

# Create 1000 random samples of n=50 data points each
Y2 = norm.rvs(loc=0, scale=1, size=(1000, 50))
X2 = np.exp(Y2)

# Run the bootstrap on each sample to produce the confidence intervals
t_bootV = list(map(T_boot, X2))
boot_reps, boot_se = zip(*t_bootV)  # bootstrap replications and standard errors per sample

# Normal 95% confidence intervals
z_95 = norm.ppf(0.975)
NCI = np.array(list(map(lambda x, se: (skew(x) - z_95*se, skew(x) + z_95*se), X2, boot_se)))

# Percentile 95% confidence intervals
PerCI = np.array(list(map(lambda t: (np.quantile(t, 0.025), np.quantile(t, 0.975)), boot_reps)))

# Pivotal 95% confidence intervals
PivCI = np.array(list(map(lambda x, t: (2*skew(x) - np.quantile(t, 0.975), 2*skew(x) - np.quantile(t, 0.025)), X2, boot_reps)))
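
For the jackknife interval mentioned in point 1, a minimal sketch could look like the following (it reuses `skew` and the imports above, and assumes the usual leave-one-out jackknife estimate of the standard error from the appendix; `jackknife_ci` is just an illustrative name):

# Jackknife 95% confidence interval based on the leave-one-out standard error
def jackknife_ci(x, alpha=0.05):
    n = len(x)
    theta_hat = skew(x)
    # Leave-one-out replications of the skewness statistic
    theta_jack = np.array([skew(np.delete(x, i)) for i in range(n)])
    # Jackknife estimate of the standard error
    se_jack = np.sqrt((n - 1) / n * np.sum((theta_jack - np.mean(theta_jack))**2))
    z = norm.ppf(1 - alpha/2)
    return theta_hat - z*se_jack, theta_hat + z*se_jack

# Example: jackknife interval for the first simulated sample
# jackknife_ci(X2[0])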

According to theory:

  • $E(X)=E(e^{Y})= \int_{-\infty}^\infty e^{y} {\rm{d}}F_Y(y)=e^{1/2}$.
  • $\sigma_X^2=V(X)=E[X-E(X)]^2=E[e^{Y}-E(e^Y)]^2= \int_{-\infty}^\infty \left(e^{y} - e^{1/2}\right)^2 {\rm{d}}F_Y(y)=e(e-1)$.
  • $E[X-E(X)]^3=E[e^{Y}-E(e^Y)]^3 = \int_{-\infty}^\infty \left(e^{y} - e^{1/2}\right)^3 {\rm{d}}F_Y(y) = e^{3/2} (2-3 e +e^3)$.

So the skewness parameter of $X=e^Y$ is $\theta=T(F)=\int (x-\mu)^{3} {\rm d}F(x) /\sigma_X^3=\frac{E[X-E(X)]^3}{\left(E[X-E(X)]^2\right)^{3/2}}=\frac{e^{3/2} (2-3 e +e^3)}{e^{3/2}(e-1)^{3/2}}= \frac{2-3 e +e^3}{(e-1)^{3/2}} \approx 6.18$.
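As a quick sanity check, $2-3e+e^3=(e-1)^2(e+2)$, so this is the standard skewness of a $\text{LogNormal}(\mu,\sigma^2)$ distribution evaluated at $\sigma^2=1$:

$$\theta=\left(e^{\sigma^2}+2\right)\sqrt{e^{\sigma^2}-1}\,\Big|_{\sigma^2=1}=(e+2)\sqrt{e-1}\approx 6.18.$$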
This is the value we expect to fall within our confidence intervals (for each method, and for each simulated random vector). The "true coverage" estimation continues as follows:

# The true skewness parameter, theta = (2 - 3e + e^3) / (e - 1)^(3/2)
skewness_f = (2 - 3*np.exp(1) + np.exp(1)**3) / (np.exp(1) - 1)**(3/2)

# Fraction of Normal 95% confidence intervals containing the true value
cov_NCI = np.mean((NCI.T[0] <= skewness_f) & (skewness_f <= NCI.T[1]))

# Fraction of Percentile 95% confidence intervals containing the true value
cov_PerCI = np.mean((PerCI.T[0] <= skewness_f) & (skewness_f <= PerCI.T[1]))

# Fraction of Pivotal 95% confidence intervals containing the true value
cov_PivCI = np.mean((PivCI.T[0] <= skewness_f) & (skewness_f <= PivCI.T[1]))

print('The true coverage for the simulated Normal confidence intervals is: %.3f' % cov_NCI)
print('The true coverage for the simulated Percentile confidence intervals is: %.3f' % cov_PerCI)
print('The true coverage for the simulated Pivotal confidence intervals is: %.3f' % cov_PivCI)

The output of the simulation:
[screenshot of the printed coverage values]
This means that 50 data points are not enough for the bootstrap 95% confidence intervals to reach the nominal coverage...

With $n=1000$ data points the coverage gets better:
[screenshot of the printed coverage values]

With $n=100000$ data points the coverage gets closer to the expected 95%:
[screenshot of the printed coverage values]

Assuming that my procedure is correct, I would only need more data points (which is more computationally expensive, which is why I do not report further results)...

I would appreciate some feedback to compare against.

@GustavoMeza

I got roughly the same coverage.
I came here thinking I had done something wrong.

@gallego-posada

I agree with the comments above.

@Pipe-Vash If you want to verify that you get the correct asymptotic coverage without having to generate millions of samples ($N \approx 10^6$), you can sample $Y$ from a Gaussian with smaller variance $\sigma^2 < 1$. Then you check for coverage by testing whether the bootstrap intervals contain the true skewness of a general $\text{LogNormal}(\mu=0, \sigma^2)$.
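
A minimal sketch of that check (the value $\sigma=0.5$ is only an illustrative choice; the bootstrap and coverage code from the first comment is reused unchanged):

# Smaller variance for Y
sigma = 0.5  # illustrative choice, any sigma^2 < 1 works
Y_small = norm.rvs(loc=0, scale=sigma, size=(1000, 50))
X_small = np.exp(Y_small)  # LogNormal(0, sigma^2) samples

# True skewness of LogNormal(mu, sigma^2): (exp(sigma^2) + 2) * sqrt(exp(sigma^2) - 1)
s2 = sigma**2
true_skew = (np.exp(s2) + 2) * np.sqrt(np.exp(s2) - 1)

# Then rerun the bootstrap and coverage computation with X_small in place of X2
# and true_skew in place of skewness_f.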
