Cash

class sherpa.stats.Cash(name='cash')

Bases: Likelihood

Poisson Log-likelihood function.

Counts are sampled from the Poisson distribution, and so the best way to assess the quality of model fits is to use the product of individual Poisson probabilities computed in each bin i, or the likelihood L:

L = (product)_i [ M(i)^D(i) * exp(-M(i)) / D(i)! ]

where M(i) = S(i) + B(i) is the sum of source and background model amplitudes, and D(i) is the number of observed counts, in bin i.

The Cash statistic [1] is derived by (1) taking the logarithm of the likelihood function, (2) changing its sign, (3) dropping the factorial term (which remains constant during fits to the same dataset), and (4) multiplying by two:

C = 2 * (sum)_i [ M(i) - D(i) log M(i) ]

The factor of two exists so that the change in the Cash statistic from one model fit to the next, (Delta)C, is distributed approximately as (Delta)chi-square when the number of counts in each bin is high. One can then in principle use (Delta)C instead of (Delta)chi-square in certain model comparison tests. However, unlike chi-square, the Cash statistic may be used regardless of the number of counts in each bin.
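The formula for C can be sketched directly with NumPy. This is a minimal illustration of the expression above, not Sherpa's implementation; the function names cash_statistic and delta_cash are hypothetical, and the model amplitudes are assumed to be strictly positive.

```python
import numpy as np


def cash_statistic(data, model):
    """Evaluate C = 2 * sum_i [ M(i) - D(i) * log M(i) ].

    Assumes every model amplitude is > 0 (no truncation handling here).
    """
    data = np.asarray(data, dtype=float)
    model = np.asarray(model, dtype=float)
    return 2.0 * np.sum(model - data * np.log(model))


def delta_cash(data, model_a, model_b):
    """Change in C between two models evaluated on the same data."""
    return cash_statistic(data, model_a) - cash_statistic(data, model_b)


if __name__ == "__main__":
    counts = [3, 0, 2, 5]
    flat = [2.5, 2.5, 2.5, 2.5]      # hypothetical constant model
    sloped = [2.0, 1.0, 3.0, 4.0]    # hypothetical alternative model
    print(cash_statistic(counts, flat))
    print(delta_cash(counts, flat, sloped))
```

Because the factorial term is dropped, the value of C on its own has no absolute meaning; only differences such as the one returned by delta_cash are comparable between fits to the same data.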

The magnitude of the Cash statistic depends upon the number of bins included in the fit and the values of the data themselves. Hence one cannot analytically assign a goodness-of-fit measure to a given value of the Cash statistic. Such a measure can, in principle, be computed by performing Monte Carlo simulations. One would repeatedly sample new datasets from the best-fit model, fit them, and note where the observed Cash statistic lies within the derived distribution of Cash statistics. Alternatively, the cstat statistic can be used.
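The Monte Carlo procedure described above can be sketched as follows. To keep the example self-contained it assumes a constant-rate model, for which the maximum-likelihood fit is simply the mean of the counts; a real analysis would refit the full model to each simulated dataset. All function names are hypothetical, and the small floor on the rate is an assumption made only to avoid log(0).

```python
import numpy as np


def cash(data, model):
    # C = 2 * sum_i [ M(i) - D(i) * log M(i) ]
    return 2.0 * np.sum(model - data * np.log(model))


def fit_constant(data):
    # For a constant Poisson rate, the value minimising C is the sample mean.
    # The floor (an assumption of this sketch) keeps the logarithm finite.
    rate = max(float(np.mean(data)), 1.0e-25)
    return np.full(len(data), rate)


def cash_percentile(observed, n_sims=1000, seed=0):
    """Return the observed C and the fraction of simulated C values <= it."""
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed, dtype=float)
    best_fit = fit_constant(observed)
    c_obs = cash(observed, best_fit)

    sims = np.empty(n_sims)
    for k in range(n_sims):
        # Draw a new dataset from the best-fit model, refit it, and record C.
        fake = rng.poisson(best_fit).astype(float)
        sims[k] = cash(fake, fit_constant(fake))

    return c_obs, float(np.mean(sims <= c_obs))


if __name__ == "__main__":
    c_obs, frac = cash_percentile([3, 5, 4, 6, 2, 4, 5, 3])
    print(c_obs, frac)
```

A fraction close to 0 or 1 would suggest that the observed statistic is unusual compared with datasets drawn from the best-fit model.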

Notes

The background should not be subtracted from the data when this statistic is used. It should be modeled simultaneously with the source.

The Cash statistic function evaluates the logarithm of each data point. If the number of counts is zero or negative, it’s not possible to take the log of that number. The behavior in this case is controlled by the truncate and trunc_value settings in the .sherpa.rc file:

If truncate is True (the default value), then log(trunc_value) is used whenever the data value is <= 0. The default is trunc_value=1.0e-25.
If truncate is False, an error is raised.
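The truncation rule can be illustrated with a small helper that is applied to whatever array is passed to the logarithm. This is a sketch of the behavior described above, not Sherpa's code: the function name truncated_log is hypothetical, the settings are passed as arguments rather than read from .sherpa.rc, and the use of ValueError is an assumption, since the text does not say which exception Sherpa raises.

```python
import numpy as np


def truncated_log(values, truncate=True, trunc_value=1.0e-25):
    """Return log(values), replacing non-positive entries by trunc_value.

    If truncate is False and any entry is <= 0, raise an error instead.
    """
    values = np.asarray(values, dtype=float)
    bad = values <= 0
    if bad.any() and not truncate:
        # Assumed exception type, for illustration only.
        raise ValueError("cannot take the logarithm of a value <= 0")
    # Substitute before calling log so no invalid value is ever evaluated.
    return np.log(np.where(bad, trunc_value, values))


if __name__ == "__main__":
    print(truncated_log([4.0, 0.0, -1.0]))
```

With the default trunc_value the substituted logarithm is a large negative number, so truncated bins can dominate the statistic; it is worth checking how many bins are affected.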

References

[1] “Parameter estimation in astronomy through application of the likelihood ratio”, Cash, W. 1979, ApJ 228, 939 http://adsabs.harvard.edu/abs/1979ApJ…228..939C

Methods Summary

`calc_stat`(data, model)	Return the statistic value for the data and model.
`calc_staterror`(data)	Return the statistic error values for the data.
`goodness_of_fit`(statval, dof)	Return the reduced statistic and q value.

Methods Documentation

calc_stat(data, model)

Return the statistic value for the data and model.

Parameters

data (sherpa.data.Data or sherpa.data.DataSimulFit) – The data set, or sets, to use.
model (sherpa.models.model.Model or sherpa.models.model.SimulFitModel) – The model expression, or expressions. If a sherpa.models.model.SimulFitModel is given then it must match the number of data sets in the data parameter.

Returns

statval (number) – The value of the statistic.
fvec (array of numbers) – The per-bin “statistic” value.

static calc_staterror(data)

Return the statistic error values for the data.

Parameters: data (scalar or 1D array of numbers) – The data values.
Returns: staterror – The errors for the input data values (matches the data argument).
Return type: scalar or array of numbers

goodness_of_fit(statval, dof)

Return the reduced statistic and q value.

The reduced statistic is conceptually simple, as it is just statistic / degrees-of-freedom, but it is not meaningful for all statistics, and it is only valid when the number of degrees of freedom is positive.

Parameters

statval (float) – The statistic value. It is assumed to be finite.
dof (int) – The number of degrees of freedom, which may be 0 or negative.

Returns

rstat (float or NaN or None) – The reduced statistic. If the statistic does not support a goodness of fit then the return value is None. If it does then NaN is returned if either the number of degrees of freedom is 0 (or less), or the statistic value is less than 0.
qval (float or NaN or None) – The q value. If the statistic does not support a goodness of fit then the return value is None. If it does then NaN is returned if either the number of degrees of freedom is 0 (or less), or the statistic value is less than 0.
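The rules for the reduced statistic can be summarized in a short sketch. It covers only rstat, since the text does not state how the q value is computed. The function name reduced_statistic and the supports_gof flag are hypothetical and stand in for whatever mechanism a statistic uses to declare that it has a goodness-of-fit measure.

```python
import math


def reduced_statistic(statval, dof, supports_gof=True):
    """Return statval / dof following the rules listed above.

    None -> the statistic has no goodness-of-fit measure
    NaN  -> dof <= 0 or statval < 0
    """
    if not supports_gof:
        return None
    if dof <= 0 or statval < 0:
        return math.nan
    return statval / dof


if __name__ == "__main__":
    print(reduced_statistic(12.0, 8))
    print(reduced_statistic(12.0, 0))
    print(reduced_statistic(12.0, 8, supports_gof=False))
```

Callers should therefore test for None before testing for NaN, because math.isnan(None) raises a TypeError.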