CStat
- class sherpa.stats.CStat(name='cstat')
Bases: Likelihood
Poisson Log-likelihood function (XSPEC style).
This is equivalent to the XSPEC implementation of the Cash statistic [1], except that it requires a model to be fit to the background. To handle the background in the same manner as XSPEC, use the WStat statistic.
Counts are sampled from the Poisson distribution, and so the best way to assess the quality of model fits is to use the product of individual Poisson probabilities computed in each bin i, or the likelihood L:
L = (product)_i [ M(i)^D(i)/D(i)! ] * exp[-M(i)]
where M(i) = S(i) + B(i) is the sum of source and background model amplitudes, and D(i) is the number of observed counts, in bin i.
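As an illustration of the likelihood above, the following minimal sketch evaluates its logarithm for a handful of bins using only the Python standard library. The function name `poisson_log_likelihood` and the sample values are hypothetical and are not part of Sherpa. Working with log L rather than L avoids numerical underflow when the probabilities of many bins are multiplied together.

```python
import math

def poisson_log_likelihood(data, model):
    """Return log L = sum_i [ D(i)*log M(i) - log D(i)! - M(i) ].

    Hypothetical helper for illustration; it assumes M(i) > 0 and
    integer D(i) >= 0 in every bin.
    """
    total = 0.0
    for d, m in zip(data, model):
        # lgamma(d + 1) equals log(d!) for non-negative integers.
        total += d * math.log(m) - math.lgamma(d + 1) - m
    return total

counts = [3, 0, 5, 2]             # observed counts D(i), made-up values
predicted = [2.5, 0.8, 4.0, 2.2]  # model amplitudes M(i) = S(i) + B(i)
log_like = poisson_log_likelihood(counts, predicted)
```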
The cstat statistic is derived by (1) taking the logarithm of the likelihood function, (2) changing its sign, (3) dropping the factorial term (which remains constant during fits to the same dataset), (4) adding an extra data-dependent term (this is what makes it different to Cash), and (5) multiplying by two:
C = 2 * (sum)_i [ M(i) - D(i) + D(i)*[log D(i) - log M(i)] ]
The factor of two exists so that the change in the cstat statistic from one model fit to the next, (Delta)C, is distributed approximately as (Delta)chi-square when the number of counts in each bin is high. One can then in principle use (Delta)C instead of (Delta)chi-square in certain model comparison tests. However, unlike chi-square, the cstat statistic may be used regardless of the number of counts in each bin.
The inclusion of the data term in the expression means that, unlike the Cash statistic, one can assign an approximate goodness-of-fit measure to a given value of the cstat statistic, i.e. the observed statistic, divided by the number of degrees of freedom, should be of order 1 for good fits.
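The expression for C can be sketched directly in Python. This is a minimal illustration, not Sherpa's implementation: the helper name `cstat`, the sample counts, the two sets of model amplitudes, and the choice of one fitted parameter are all assumptions made for the example, and the helper requires D(i) > 0 and M(i) > 0 in every bin.

```python
import math

def cstat(data, model):
    """Return (total, per_bin) following the cstat expression above.

    Hypothetical helper for illustration; it assumes D(i) > 0 and
    M(i) > 0 in every bin so that both logarithms are defined.
    """
    per_bin = [
        2.0 * (m - d + d * (math.log(d) - math.log(m)))
        for d, m in zip(data, model)
    ]
    return sum(per_bin), per_bin

counts = [3, 1, 5, 2]            # observed counts D(i), made-up values
model_a = [2.5, 0.8, 4.0, 2.2]   # one set of model amplitudes M(i)
model_b = [3.0, 1.0, 5.0, 2.0]   # amplitudes that reproduce the data exactly

stat_a, per_bin_a = cstat(counts, model_a)
stat_b, per_bin_b = cstat(counts, model_b)

# Change in the statistic between the two sets of amplitudes, (Delta)C.
delta_c = stat_a - stat_b

# Reduced statistic, assuming (hypothetically) one fitted parameter.
dof = len(counts) - 1
reduced = stat_a / dof
```

Each per-bin term is non-negative and vanishes only when M(i) = D(i), which is why the total can be compared against the number of degrees of freedom.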
Notes
The background should not be subtracted from the data when this statistic is used. It should be modeled simultaneously with the source.
The cstat statistic function evaluates the logarithm of each data point. If the number of counts is zero or negative, it’s not possible to take the log of that number. The behavior in this case is controlled by the truncate and trunc_value settings in the .sherpa.rc file:
- if truncate is True (the default value), then log(trunc_value) is used whenever the data value is <= 0. The default is trunc_value=1.0e-25.
- when truncate is False an error is raised.
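The rule described above can be sketched as follows. This illustrates the documented behavior only and is not Sherpa's implementation: the helper name `safe_log`, its keyword arguments, and the use of `ValueError` are assumptions made for the example.

```python
import math

def safe_log(value, truncate=True, trunc_value=1.0e-25):
    """Return log(value), applying the truncation rule described above.

    Hypothetical helper: the argument names mirror the .sherpa.rc
    settings, but the function itself is not part of Sherpa.
    """
    if value > 0:
        return math.log(value)
    if truncate:
        # Non-positive input: fall back to log(trunc_value).
        return math.log(trunc_value)
    # The exception type here is an assumption made for this sketch.
    raise ValueError("cannot take the log of a non-positive value")

log_ok = safe_log(4.0)     # ordinary case, no truncation needed
log_zero = safe_log(0.0)   # replaced by log(1.0e-25)
```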
References
- [1] The description of the Cash statistic (cstat) in https://heasarc.gsfc.nasa.gov/xanadu/xspec/manual/XSappendixStatistics.html
Methods Summary
calc_stat(data, model) – Return the statistic value for the data and model.
calc_staterror(data) – Return the statistic error values for the data.
goodness_of_fit(statval, dof) – Return the reduced statistic and q value.
Methods Documentation
- calc_stat(data, model)
Return the statistic value for the data and model.
- Parameters
data (sherpa.data.Data or sherpa.data.DataSimulFit) – The data set, or sets, to use.
model (sherpa.models.model.Model or sherpa.models.model.SimulFitModel) – The model expression, or expressions. If a sherpa.models.model.SimulFitModel is given then it must match the number of data sets in the data parameter.
- Returns
statval (number) – The value of the statistic.
fvec (array of numbers) – The per-bin “statistic” value.
- static calc_staterror(data)
Return the statistic error values for the data.
- Parameters
data (scalar or 1D array of numbers) – The data values.
- Returns
staterror – The errors for the input data values (matches the data argument).
- Return type
scalar or array of numbers
- goodness_of_fit(statval, dof)
Return the reduced statistic and q value.
The reduced statistic is conceptually simple, as it is just statistic / degrees-of-freedom, but it is not meaningful for all statistics, and it is only valid when the number of degrees of freedom is greater than zero.
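- Parameters
statval (float) – The statistic value.
dof (int) – The number of degrees of freedom, which may be 0 or negative.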
- Returns
rstat (float or NaN or None) – The reduced statistic. If the statistic does not support a goodness of fit then the return value is None. If it does then NaN is returned if either the number of degrees of freedom is 0 (or less), or the statistic value is less than 0.
qval (float or NaN or None) – The q value. If the statistic does not support a goodness of fit then the return value is None. If it does then NaN is returned if either the number of degrees of freedom is 0 (or less), or the statistic value is less than 0.
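The return rules above can be sketched as follows. This is a sketch under stated assumptions rather than Sherpa's code: it covers only the case where the statistic supports a goodness-of-fit measure, and it takes the q value to be the upper-tail probability of a chi-square distribution with dof degrees of freedom, which the text above does not state. The names `reduced_and_q` and `upper_gamma_q` are hypothetical.

```python
import math

def upper_gamma_q(a, x, max_terms=10000, tol=1e-15):
    """Regularized upper incomplete gamma function Q(a, x), a > 0, x >= 0.

    Uses the standard power series for P(a, x) and returns 1 - P.
    Adequate for a sketch; not tuned for large x.
    """
    if x == 0:
        return 1.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    return 1.0 - total * math.exp(log_prefactor)

def reduced_and_q(statval, dof):
    """Return (rstat, qval); both are NaN when dof <= 0 or statval < 0."""
    if dof <= 0 or statval < 0:
        return math.nan, math.nan
    rstat = statval / dof
    # Assumption: q value = chi-square upper-tail probability.
    qval = upper_gamma_q(dof / 2.0, statval / 2.0)
    return rstat, qval

rstat, qval = reduced_and_q(2.0, 2)          # valid inputs
bad_rstat, bad_qval = reduced_and_q(5.0, 0)  # no degrees of freedom
```

Under the chi-square assumption, a q value close to 0 would indicate that a statistic this large is unlikely for a model that describes the data well.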