WStat

class sherpa.stats.WStat(name='wstat')[source] [edit on github]

Bases: Likelihood

Poisson Log-likelihood function including background (XSPEC style).

This is equivalent to the XSPEC implementation of the W statistic for CStat [1], and includes the background data in the fit statistic. If a model is being fit to the background then the CStat statistic should be used.

The following description is taken from [1].

Suppose that each bin in the background spectrum is given its own parameter so that the background model is b_i = f_i. A standard fit for all these parameters would be impractical; however there is an analytical solution for the best-fit f_i in terms of the other variables which can be derived by using the fact that the derivative of the likelihood (L) will be zero at the best fit. Solving for the f_i and substituting gives the profile likelihood:

W = 2 sum_(i=1)^N t_s m_i + (t_s + t_b) f_i -
    S_i ln(t_s m_i + t_s f_i) - B_i ln(t_b f_i) -
    S_i (1- ln(S_i)) - B_i (1 - ln(B_i))

where:

f_i = (S_i + B_i - (t_s + t_b) m_i + d_i) / (2 (t_s + t_b))
d_i = sqrt([(t_s + t_b) m_i - S_i - B_i]^2 +
           4(t_s + t_b) B_i m_i)

If any bin has S_i and/or B_i zero then its contribution to W (W_i) is calculated as a special case. So, if S_i is zero then:

W_i = t_s m_i - B_i ln(t_b / (t_s + t_b))

If B_i is zero then there are two special cases. If m_i < S_i / (t_s + t_b) then:

W_i = - t_b m_i - S_i ln(t_s / (t_s + t_b))

otherwise:

W_i = t_s m_i + S_i (ln(S_i) - ln(t_s m_i) - 1)

In practice, it works well for many cases but for weak sources can generate an obviously wrong best fit. It is not clear why this happens although binning to ensure that every bin contains at least one count often seems to fix the problem. In the limit of large numbers of counts per spectrum bin a second-order Taylor expansion shows that W tends to:

sum_(i=1)^N ( [S_i - t_s m_i - t_s f_i]^2 / (t_s (m_i + f_i)) +
              [B_i - t_b f_i]^2 / (t_b f_i) )

which is distributed as chi^2 with N - M degrees of freedom, where the model m_i has M parameters (include the normalization).

References

Methods Summary

`calc_stat`(data, model)	Return the statistic value for the data and model.
`calc_staterror`(data)	Return the statistic error values for the data.
`goodness_of_fit`(statval, dof)	Return the reduced statistic and q value.

Methods Documentation

calc_stat(data, model)[source] [edit on github]

Return the statistic value for the data and model.

Parameters:

data (sherpa.data.Data or sherpa.data.DataSimulFit) – The data set, or sets, to use.
model (sherpa.models.model.Model or sherpa.models.model.SimulFitModel) – The model expression, or expressions. If a sherpa.models.model.SimulFitModel is given then it must match the number of data sets in the data parameter.

Returns:

statval (number) – The value of the statistic.
fvec (array of numbers) – The per-bin “statistic” value.

static calc_staterror(data) [edit on github]

Return the statistic error values for the data.

Parameters:: data (scalar or 1D array of numbers) – The data values.
Returns:: staterror – The errors for the input data values (matches the data argument).
Return type:: scalar or array of numbers

goodness_of_fit(statval, dof) [edit on github]

Return the reduced statistic and q value.

The reduced statisitc is conceptually simple, as it is just statistic / degrees-of-freedom, but it is not meaningful for all statistics, and it is only valid if there are any degrees of freedom.

Parameters:

statval (float) – The statistic value. It is assumed to be finite.
dof (int) – The number of degrees of freedom, which may be 0 or negative.

Returns:

rstat (float or NaN or None) – The reduced statistic. If the statistic does not support a goodness of fit then the return value is None. If it does then NaN is returned if either the number of degrees of freedom is 0 (or less), or the statistic value is less than 0.
qval (float or NaN or None) – The q value. If the statistic does not support a goodness of fit then the return values are None. If it does then NaN is returned if either the number of degrees of freedom is 0 (or less), or the statistic value is less than 0.