Simulating data

A simple case

Simulating a data set normally involves:

  1. evaluate the model

  2. add in noise

This may need to be repeated several times for complex models, such as when different components have different noise models or the noise needs to be added before evaluation by a component.

The model evaluation would be performed using the techniques described in this section, and then the noise term can be handled with sherpa.utils.poisson_noise() or routines from NumPy or SciPy to evaluate noise.

>>> import numpy as np
>>> from sherpa.models.basic import Polynom1D
>>> x = np.arange(10, 100, 12)
>>> mdl = Polynom1D('mdl')
>>> mdl.offset = 35
>>> mdl.c1 = 0.5
>>> mdl.c2 = 0.12
>>> ymdl = mdl(x)
>>> rng = np.random.default_rng(235)  # a "repeatable" set of random values
>>> from sherpa.utils import poisson_noise
>>> ypoisson = poisson_noise(ymdl, rng=rng)
>>> ypoisson = poisson_noise(ymdl, rng=rng)
>>> import matplotlib.pyplot as plt
>>> plt.plot(x, ymdl, label="model")
[<matplotlib.lines.Line2D object at ...>]
>>> plt.plot(x, ypoisson, ".", label="with noise")
[<matplotlib.lines.Line2D object at ...>]
>>> plt.legend()
<matplotlib.legend.Legend object at ...>

(png, hires.png, pdf)

../_images/simulate-1.png

X-ray data (DataPHA)

In principle, the same steps apply when simulating PHA data (DataPHA objects), however, the mechanics are a little more complicated because we need to account for the instrumental response (ARF and RMF) and possibly also the background, which may contribute to the source spectrum that we want to simulate. Readers not interested in X-ray data analysis may want to skip this section.

Sherpa offers a dedicated function sherpa.astro.fake.fake_pha() for simulations of PHA data. A DataPHA object needs to be set up with the responses for the source and an exposure time. Usually, the easiest setup is to read in a PHA file from a real observation taken in a configuration close to what we want to simulate. This PHA file that already specifies the response and backgrounds so that they can be automatically loaded. Also, the exposure time, background scaling etc. will be taken from the header of the file, but we can change these values if we want. When new data is simulated later, the counts in data will be overwritten, but all other information stays the same. In this example, we use a datafile from Sherpa’s test data files (which are large and not installed by default but can be downloaded).

>>> # Set the directory where the data is stored.
>>> # In this case, we use Sherpa's test data files
>>> from sherpa.utils.testing import get_datadir
>>> data_dir = get_datadir() + '/'
>>> # Load the data from a PHA file
>>> from sherpa.astro.io import read_pha
>>> data = read_pha(data_dir + '9774_bg.pi')
>>> data.exposure = 5e5  # Simulate longer exposure than in the original data

Alternatively, we first create a DataPHA object with the correct channel numbers, responses, and exposure time. In this example, we initially set the exposure time and then add the remaining information line by line:

>>> import numpy as np
>>> from sherpa.astro.io import read_arf, read_rmf
>>> from sherpa.astro.data import DataPHA
>>> data = DataPHA(name='any', channel=None, counts=None, exposure=10000.)
>>> data.set_arf(read_arf(data_dir + '9774.arf'))
>>> data.set_rmf(read_rmf(data_dir + '9774.rmf'))
>>> # By convention, channel numbers usually start at 1
>>> data.channel = np.arange(1, data.get_rmf().detchans + 1)
>>> data.counts = np.zeros_like(data.channel)

Next, we set up a model. In this case, we start with a powerlaw source where the slope and normalization of that powerlaw are already known, e.g. from the literature. We then add a weak emission line. Our simulation will show us if this emission line would be detectable in a real observation:

>>> from sherpa.models.basic import PowLaw1D, Gauss1D
>>> pl = PowLaw1D()
>>> line = Gauss1D()
>>> pl.gamma = 1.8
>>> pl.ampl = 2e-05
>>> line.pos = 6.7
>>> line.ampl = .0003
>>> line.fwhm = .1
>>> srcmdl = pl + line

We will need to wrap this model with the response (ARF and RMF), which takes care of converting from channels to energy:

>>> resp = data.get_full_response()
>>> full_model = resp(srcmdl)
>>> print(full_model)
apply_rmf(apply_arf(10000.0 * (powlaw1d + gauss1d)))
  Param        Type          Value          Min          Max      Units
  -----        ----          -----          ---          ---      -----
  powlaw1d.gamma thawed          1.8          -10           10
  powlaw1d.ref frozen            1 -3.40282e+38  3.40282e+38
  powlaw1d.ampl thawed        2e-05            0  3.40282e+38
  gauss1d.fwhm thawed          0.1  1.17549e-38  3.40282e+38
  gauss1d.pos  thawed          6.7 -3.40282e+38  3.40282e+38
  gauss1d.ampl thawed       0.0003 -3.40282e+38  3.40282e+38

The simplest case: Simulate the source spectrum only

With this model, it is now easy to run the simulation, which will calculate the expected number of counts in each spectral channel (where the number and width of the channels is given by the responses) and then draw from a Poisson distribution with this expected number. Thus, the simulated number of counts in each channel is always an integer and includes Poisson noise - running fake_pha() twice with identical settings will give slightly different answers. With default settings, the input model is convolved with the RMF and multiplied by the ARF, and properly scaled for the exposure time.

>>> from sherpa.astro.fake import fake_pha
>>> fake_pha(data, full_model)

We bin the counts into bins of at least 5 counts per bin and display an image of the simulated spectrum (see X-ray data (DataPHA) for details):

>>> data.set_analysis('energy')
>>> data.notice(0.3, 8)
>>> data.group_counts(5)
>>> from sherpa.plot import DataPlot
>>> nobgplot = DataPlot()
>>> nobgplot.prepare(data)

We can overplot the model (both grouped and ungrouped):

>>> from sherpa.astro.plot import ModelPHAHistogram, ModelHistogram
>>> nobgplot.plot(xlog=True, ylog=True, label='Simulated without background')
>>> m1plot = ModelPHAHistogram()
>>> m1plot.prepare(data, full_model)
>>> m1plot.overplot(alpha=0.5, label="Grouped model")
>>> m2plot = ModelHistogram()
>>> m2plot.prepare(data, full_model)
>>> m2plot.overplot(alpha=0.5, label="Ungrouped model")

(png, hires.png, pdf)

../_images/simulate-8.png

Sometimes, more than one response is needed, e.g. in Chandra LETG/HRC-S different orders of the grating overlap on the detector, so they all contribute to the observed source spectrum. The models that are passed into fake_pha() can be arbitrarily complex and can contain multiple components. For example, fake_pha() can simulate models which may include such components as the instrumental background (which should not be folded through the ARF) or arbitrary other components:

>>> fake_pha(data, model=inst_bkg + resp(srcmodel))

Adding background

Weak spectral features can be hidden in a large (instrumental or astrophysical) background, thus it can be important to include background in the simulation.

Sample background from a PHA file

One way to include background is to sample it from a DataPHA object. To do so, a background need to be loaded into the dataset before running the simulation and, if not done before, the scale of the background scaling has to be set:

>>> data.set_background(read_pha(data_dir + '9774_bg.pi'))
>>> data.backscal = 9.6e-06
>>> fake_pha(data, full_model)
>>> data.set_analysis('energy')
>>> data.notice(0.3, 8)
>>> data.group_counts(5)
>>> samplebgplot = DataPlot()
>>> samplebgplot.prepare(data)
>>> nobgplot.plot(label='No Background',linestyle="solid")
>>> samplebgplot.plot(overplot=True, label='Sample Background', linestyle="solid")

(png, hires.png, pdf)

../_images/simulate-9.png

The fake_pha function simulates the source spectrum as above, but then it samples from the background PHA. For each bin, it treats the background count number as the expected value and performs a Poisson draw. The background drawn from the Poisson distribution is than added to the simulated source spectrum (the sum of two Poisson distributions is a Poisson distribution again). This works best if the background is well exposed and has a large number or counts.

More than one background

If more than one background is set up, then the expected backgrounds will be averaged before the Poisson draw.

Background models

When the number of counts in the background is low, the above procedure amplifies the noise. The experimental background already contains noise just from Poisson statistics. Using the observed count numbers as background value and then performing a Poisson draw on them again gives a higher level of noise than a real observation would have. To avoid this problem, a background model may be used instead. In this example, we assume that the background is dominated by an astrophysical component, so it gets folded through the response:

>>> from sherpa.models.basic import Const1D
>>> bkgmdl = Const1D('bmdl')
>>> bkgmdl.c0 = 1e-5
>>> fake_pha(data, resp(srcmdl + bkgmdl))
>>> data.set_analysis('energy')
>>> data.notice(0.3, 8)
>>> data.group_counts(5)
>>> modelbgplot = DataPlot()
>>> modelbgplot.prepare(data)
>>> nobgplot.plot(xlog=True, ylog=True, label='No Background', linestyle="solid")
>>> samplebgplot.plot(overplot=True, label='Sample Background', linestyle="solid")
>>> modelbgplot.plot(overplot=True, label='Model Background', linestyle="solid")

(png, hires.png, pdf)

../_images/simulate-10.png

Reference/API

Simulate PHA datasets

The fake_pha routine is used to create simulated sherpa.astro.data.DataPHA data objects.

Functions

fake_pha(data, model[, is_source, ...])

Simulate a PHA data set from a model.