Chi2

class sherpa.stats.Chi2(name='chi2')

Bases: Stat

A Gaussian log-likelihood function.

It is assumed that the counts in each bin are drawn from a Gaussian (Normal) distribution, so the quality of the model fit can be assessed with the likelihood, which is the product of the individual Gaussian probabilities computed in each bin i:

L = \prod_i \frac{1}{\sigma(i)\sqrt{2\pi}} \exp\left[ -\frac{(N(i) - M(i))^2}{2\,\sigma(i)^2} \right]

where M(i) = S(i) + B(i) is the sum of the source and background model amplitudes in bin i, N(i) is the total number of observed counts in bin i, and sigma(i) is the error in bin i.
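As an illustration of how the likelihood relates to the chi-square statistic, the following pure-Python sketch evaluates ln L for a few bins and checks that -2 ln L equals the chi-square sum plus normalization terms that do not depend on the model. The helper name and the bin values are made up for illustration, and sigma(i) = sqrt(N(i)) is only one assumed choice of error.

```python
import math


def gaussian_log_likelihood(counts, model, sigma):
    """Return ln L for independent Gaussian bins.

    counts holds N(i), model holds M(i) = S(i) + B(i), sigma holds sigma(i).
    Hypothetical helper written for illustration, not part of Sherpa.
    """
    total = 0.0
    for n, m, s in zip(counts, model, sigma):
        # ln of 1/(sigma sqrt(2 pi)) * exp[-(N - M)^2 / (2 sigma^2)]
        total += -math.log(s * math.sqrt(2.0 * math.pi)) - (n - m) ** 2 / (2.0 * s ** 2)
    return total


# Illustrative values only.
counts = [12.0, 15.0, 9.0, 20.0]
model = [10.0, 14.0, 11.0, 18.0]
sigma = [math.sqrt(n) for n in counts]  # assumed error choice

lnL = gaussian_log_likelihood(counts, model, sigma)
chi2 = sum(((n - m) / s) ** 2 for n, m, s in zip(counts, model, sigma))
# Model-independent normalization terms that appear in -2 ln L.
norm = sum(2.0 * math.log(s * math.sqrt(2.0 * math.pi)) for s in sigma)

print(-2.0 * lnL, chi2 + norm)  # equal up to floating-point rounding
```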

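Taking -2 ln L gives the chi-square statistic plus normalization terms that do not depend on the model (provided sigma(i) does not depend on the model parameters), so minimizing chi-square maximizes the likelihood. The chi-square statistic is: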

\chi^2 = \sum_i \frac{[\,N(i,S) - B(i,x,pB) - S(i,x,pS)\,]^2}{\sigma(i)^2}

where N(i,S) is the total number of observed counts in bin i of the on-source region; B(i,x,pB) is the number of predicted background model counts in bin i of the on-source region (zero for background-subtracted data), rescaled from bin i of the off-source region, and computed as a function of the model argument x(i) (e.g., energy or time) and set of background model parameter values pB; S(i,x,pS) is the number of predicted source model counts in bin i, as a function of the model argument x(i) and set of source model parameter values pS; and sigma(i) is the error in bin i.
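A minimal pure-Python sketch of the per-bin chi-square terms defined above, assuming the background model counts B(i,x,pB) and source model counts S(i,x,pS) have already been evaluated on the same bins as the data. The function name and values are hypothetical. The last lines show that background-subtracted data are handled by passing B = 0.

```python
def chi2_per_bin(counts, src_model, bkg_model, sigma):
    """Return [N(i,S) - B(i,x,pB) - S(i,x,pS)]^2 / sigma(i)^2 for each bin.

    Hypothetical helper: bkg_model holds the predicted background counts
    already rescaled to the on-source region (all zeros for
    background-subtracted data).
    """
    if not (len(counts) == len(src_model) == len(bkg_model) == len(sigma)):
        raise ValueError("all inputs need one entry per bin")
    return [
        (n - b - s) ** 2 / err ** 2
        for n, s, b, err in zip(counts, src_model, bkg_model, sigma)
    ]


counts = [30.0, 42.0, 25.0]  # N(i,S), illustrative
src = [20.0, 30.0, 18.0]     # S(i,x,pS)
bkg = [8.0, 9.0, 7.0]        # B(i,x,pB)
sigma = [5.0, 6.0, 5.0]      # sigma(i)

terms = chi2_per_bin(counts, src, bkg, sigma)
print(terms, sum(terms))

# Background-subtracted data: subtract first, then use B = 0.
subtracted = [n - b for n, b in zip(counts, bkg)]
terms_sub = chi2_per_bin(subtracted, src, [0.0] * len(src), sigma)
```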

N(i,S) includes the background counts. When the background is subtracted rather than modelled, the number of background counts contributing to each on-source bin has to be estimated from an off-source (background) region, by scaling the off-source counts by A(S)/A(B). Here N(i,B) is the total number of observed counts in bin i of the off-source region; A(B) is the off-source “area”, which could be the size of the region from which the background is extracted, the length of a background time segment, a product of the two, etc.; and A(S) is the on-source “area”. These terms may be defined for a particular type of data: for example, for PHA data A(B) is set to BACKSCAL * EXPOSURE from the background data set and A(S) to BACKSCAL * EXPOSURE from the source data set.
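The role of the areas can be sketched in pure Python. The sketch assumes the usual scaling in which the off-source counts N(i,B) are multiplied by A(S)/A(B) to estimate the on-source background, and uses the PHA convention A = BACKSCAL * EXPOSURE quoted above. The variance shown is the simple error-propagation result for independent counts whose variance equals the count, which is only one possible choice of sigma(i); all names and numbers are hypothetical.

```python
def pha_area(backscal, exposure):
    """'Area' assumed for PHA data: BACKSCAL * EXPOSURE."""
    return backscal * exposure


def subtract_background(src_counts, bkg_counts, area_src, area_bkg):
    """Return (background-subtracted counts, variance) per bin.

    Hypothetical helper. The off-source counts N(i,B) are scaled by
    A(S)/A(B); the variance assumes each count has variance equal to
    itself, giving N(i,S) + (A(S)/A(B))^2 N(i,B).
    """
    ratio = area_src / area_bkg
    net = [ns - ratio * nb for ns, nb in zip(src_counts, bkg_counts)]
    var = [ns + ratio ** 2 * nb for ns, nb in zip(src_counts, bkg_counts)]
    return net, var


# Illustrative values only.
a_src = pha_area(backscal=0.01, exposure=1000.0)  # A(S)
a_bkg = pha_area(backscal=0.05, exposure=2000.0)  # A(B)
net, var = subtract_background([25.0, 30.0, 12.0], [40.0, 60.0, 20.0], a_src, a_bkg)
print(net, var)
```

Here A(B) is ten times A(S), so each background count adds only (0.1)^2 = 0.01 to the variance of the subtracted counts; a larger background area keeps the subtraction from inflating the errors.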

There are different ways of defining the sigma(i) terms, supported by the sub-classes.

Notes

It is assumed that there is a one-to-one mapping between a given background region bin and a given source region bin. For instance, in the analysis of PHA data, it is assumed that the input background counts spectrum is binned in exactly the same way as the input source counts spectrum, and that any filter applied to the source spectrum is automatically applied to the background spectrum. This means that the user cannot, for example, specify arbitrary background and source regions in two dimensions and get correct results. This limitation only applies to backgrounds included as part of the data set (e.g. as with PHA files) and can be avoided by treating the background as a separate data set.

Methods Summary

calc_chisqr(data, model)        Return the chi-square value for each bin.
calc_stat(data, model)          Return the statistic value for the data and model.
calc_staterror(data)            Return the statistic error values for the data.
goodness_of_fit(statval, dof)   Return the reduced statistic and q value.

Methods Documentation

calc_chisqr(data, model)

Return the chi-square value for each bin.

Parameters

data (sherpa.data.Data or sherpa.data.DataSimulFit) – The data set, or sets, to use.
model (sherpa.models.model.Model or sherpa.models.model.SimulFitModel) – The model expression, or expressions. If a sherpa.models.model.SimulFitModel is given then it must match the number of data sets in the data parameter.

Returns

chisqr – The per-bin chi-square values.

Return type

array of numbers

calc_stat(data, model)

Return the statistic value for the data and model.

Parameters

data (sherpa.data.Data or sherpa.data.DataSimulFit) – The data set, or sets, to use.
model (sherpa.models.model.Model or sherpa.models.model.SimulFitModel) – The model expression, or expressions. If a sherpa.models.model.SimulFitModel is given then it must match the number of data sets in the data parameter.

Returns

statval (number) – The value of the statistic.
fvec (array of numbers) – The per-bin “statistic” value.

static calc_staterror(data)

Return the statistic error values for the data.

Parameters: data (scalar or 1D array of numbers) – The data values.
Returns: staterror – The errors for the input data values (matches the data argument).
Return type: scalar or array of numbers
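Because Chi2 leaves the definition of sigma(i) to its sub-classes, a static calc_staterror-style function only needs the data values. The pure-Python sketch below shows two common rules with the same scalar-or-array behaviour: the data-variance rule sigma = sqrt(N) and the Gehrels approximation sigma = 1 + sqrt(N + 0.75). These are hypothetical helpers written for illustration, not the Sherpa implementations.

```python
import math


def _apply(func, data):
    # Accept a scalar or a 1D sequence and return the same shape.
    if isinstance(data, (int, float)):
        return func(data)
    return [func(value) for value in data]


def staterror_datavar(data):
    """sigma(i) = sqrt(N(i)); zero for empty bins."""
    return _apply(math.sqrt, data)


def staterror_gehrels(data):
    """sigma(i) = 1 + sqrt(N(i) + 0.75)."""
    return _apply(lambda n: 1.0 + math.sqrt(n + 0.75), data)


counts = [0, 4, 9]
print(staterror_datavar(counts))  # [0.0, 2.0, 3.0]
print(staterror_gehrels(counts))
```

With the sqrt(N) rule an empty bin gets sigma(i) = 0, which makes its chi-square term undefined; the Gehrels form is never smaller than 1, which is one reason it is often used for low-count data.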

goodness_of_fit(statval, dof)

Return the reduced statistic and q value.

The reduced statistic is conceptually simple, as it is just statistic / degrees-of-freedom, but it is not meaningful for all statistics, and it is only defined when the number of degrees of freedom is positive.

Parameters

statval (float) – The statistic value. It is assumed to be finite.
dof (int) – The number of degrees of freedom, which may be 0 or negative.

Returns

rstat (float or NaN or None) – The reduced statistic. If the statistic does not support a goodness of fit then the return value is None. If it does then NaN is returned if either the number of degrees of freedom is 0 (or less), or the statistic value is less than 0.
qval (float or NaN or None) – The q value. If the statistic does not support a goodness of fit then the return value is None. If it does then NaN is returned if either the number of degrees of freedom is 0 (or less), or the statistic value is less than 0.
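For a chi-square statistic the q value is usually taken to be the probability of obtaining a value at least as large as statval when the model is correct, i.e. the upper tail of the chi-square distribution with dof degrees of freedom, which equals Q(dof/2, statval/2) for the regularized upper incomplete gamma function Q. The pure-Python sketch below assumes that definition and follows the NaN rules described above. It uses the standard series and continued-fraction evaluations of the incomplete gamma function and is not Sherpa's implementation.

```python
import math


def _gammainc_upper(a, x, eps=1e-15, max_iter=1000):
    """Regularized upper incomplete gamma function Q(a, x), for a > 0, x >= 0."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower function P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        for n in range(1, max_iter):
            term *= x / (a + n)
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_prefactor) * h


def goodness_of_fit(statval, dof):
    """Return (rstat, qval) for a chi-square statistic (illustrative sketch)."""
    if dof <= 0 or statval < 0:
        return math.nan, math.nan
    rstat = statval / dof
    # Probability of a chi-square value >= statval for dof degrees of freedom.
    qval = _gammainc_upper(dof / 2.0, statval / 2.0)
    return rstat, qval
```

For two degrees of freedom the tail probability reduces to exp(-statval/2), which gives a quick check: goodness_of_fit(2.0, 2) returns a reduced statistic of 1.0 and a q value of about 0.368.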