resample_data

sherpa.astro.ui.resample_data(id=None, niter=1000, seed=None)

Resample data with asymmetric error bars.

The function performs a parametric bootstrap: each data point is assumed to follow a skewed normal distribution centered on the observed value, with the variance given by the low and high measurement errors. It simulates niter realizations of the data and fits each realization with the assumed model to obtain the best-fit parameters. The function returns the best-fit parameters for each realization and displays the average and standard deviation of each parameter.
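The procedure described above can be illustrated with a minimal NumPy sketch. It is not Sherpa's implementation: the helper names `draw_split_normal` and `bootstrap_line` are hypothetical, the asymmetric distribution is approximated here by a two-piece normal (each side of the observed value is picked with equal probability and scaled by the error on that side), and a straight-line fit with `numpy.polyfit` stands in for the Sherpa model fit.

```python
import numpy as np


def draw_split_normal(y, err_lo, err_hi, rng):
    """Draw one realization of y from a two-piece normal (an assumption,
    not necessarily the distribution Sherpa uses)."""
    y = np.asarray(y, dtype=float)
    # Pick the low or high side of each point with equal probability.
    use_low = rng.random(y.shape) < 0.5
    # Half-normal offsets, scaled by the error on the chosen side.
    offset = np.abs(rng.normal(size=y.shape))
    return np.where(use_low, y - offset * err_lo, y + offset * err_hi)


def bootstrap_line(x, y, err_lo, err_hi, niter=1000, seed=None):
    """Fit a straight line to niter simulated realizations of the data."""
    rng = np.random.default_rng(seed)
    samples = np.empty((niter, len(y)))
    params = np.empty((niter, 2))
    for i in range(niter):
        samples[i] = draw_split_normal(y, err_lo, err_hi, rng)
        # np.polyfit returns the highest power first: (slope, intercept).
        params[i] = np.polyfit(x, samples[i], deg=1)
    return {"samples": samples,
            "slope": params[:, 0],
            "intercept": params[:, 1]}


# Synthetic data: a line with larger upper errors than lower errors.
x = np.linspace(0.0, 1.0, 20)
y = 3.0 + 2.0 * x
err_lo = np.full(x.shape, 0.1)
err_hi = np.full(x.shape, 0.3)

out = bootstrap_line(x, y, err_lo, err_hi, niter=200, seed=42)
for name in ("intercept", "slope"):
    print(f"{name} : avg = {out[name].mean()} , std = {out[name].std()}")
```

Because the upper errors are larger than the lower ones in this synthetic data, the simulated points are shifted upwards on average, so the mean intercept ends up above the value used to generate the data. Passing the same seed reproduces the same realizations.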

New in version 4.12.2: The samples and statistic keys were added to the return value and the parameter values are returned as NumPy arrays rather than as lists.

Parameters

id (int or str, optional) – The identifier of the data set to use.
niter (int, optional) – The number of iterations to use. The default is 1000.
seed (int, optional) – The seed for the random number generator. The default is `None`.

Returns

sampled – A dictionary with the following keys: statistic, which contains the best-fit statistic value for each iteration; samples, which contains the resampled data used in the fits as a niter by ndata array; and one key for each free parameter in the fit, holding a NumPy array of size niter with the fitted parameter value for each iteration.

Return type

dict
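As a sketch of how the returned dictionary might be post-processed, the example below computes the median and a 68% interval for each parameter key. The dictionary is a synthetic stand-in built with random numbers (its values are not real fit results), and the helper `summarize` is hypothetical; only the key layout (statistic, samples, and one key per free parameter) follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
niter, ndata = 500, 61

# Synthetic stand-in for the dictionary returned by resample_data;
# the numbers are random placeholders, not fit results.
sampled = {
    "statistic": rng.chisquare(df=59, size=niter),
    "samples": rng.normal(size=(niter, ndata)),
    "p0.c0": rng.normal(loc=4.0, scale=1.0, size=niter),
    "p0.c1": rng.normal(loc=2000.0, scale=250.0, size=niter),
}


def summarize(sampled):
    """Return {parameter: (median, minus, plus)} for each parameter key."""
    summary = {}
    for key, values in sampled.items():
        # Skip the two keys that do not hold parameter values.
        if key in ("statistic", "samples"):
            continue
        lo, med, hi = np.percentile(values, [15.87, 50.0, 84.13])
        summary[key] = (med, med - lo, hi - med)
    return summary


summary = summarize(sampled)
for name, (med, minus, plus) in summary.items():
    print(f"{name} : median = {med:.4g} (-{minus:.3g}, +{plus:.3g})")
```

Percentiles are used here rather than the mean and standard deviation because the parameter distributions from asymmetric errors need not be symmetric.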

See also

load_ascii_with_errors: Load an ASCII file with asymmetric errors as a data set.

Examples

Account for asymmetric errors when calculating parameter uncertainties:

>>> load_ascii_with_errors(1, 'test.dat')
>>> set_model(polynom1d.p0)
>>> thaw(p0.c1)
>>> fit()
Dataset               = 1
Method                = levmar
Statistic             = leastsq
Initial fit statistic = 4322.56
Final fit statistic   = 247.768 at function evaluation 6
Data points           = 61
Degrees of freedom    = 59
Change in statistic   = 4074.79
p0.c0          3.2661       +/- 0.193009
p0.c1          2162.19      +/- 65.8445
>>> result = resample_data(1, niter=10)
p0.c0 : avg = 4.159973865314249 , std = 1.0575403309799554
p0.c1 : avg = 1943.5489865678633 , std = 268.64478808013547
>>> print(result['p0.c0'])
[5.856479033432613, 3.8252624107243465, ... 2.8704270612985345]
>>> print(result['p0.c1'])
[1510.049972062868, 1995.4742750432902, ... 2235.9753113309894]

Display the PDF of the p0.c0 parameter values from a run with 5000 iterations:

>>> sample = resample_data(1, 5000)
p0.c0 : avg = 3.966543284267264 , std = 0.9104639711036427
p0.c1 : avg = 1988.8417667057342 , std = 220.21903089622705
>>> plot_pdf(sample['p0.c0'], bins=40)

The samples used for the analysis are returned under the samples key (as a 2D NumPy array of size number of iterations by number of data points), which can be used for further analysis. In this case, the distribution of the first bin is shown as a CDF:

>>> sample = resample_data(1, 5000)
>>> samples = sample['samples']
>>> plot_cdf(samples[:, 0])