Optimizing functions

The sherpa.optmethods module provides classes that optimize models applied to data instances, which is handled by the fit module. The optimizers can also be applied directly to functions if you do not need to take advantage of Sherpa’s model and data classes.

The optimizers in sherpa.optmethods.optfcts are functions which follow the same interface:

optimizer(cb, start, lolims, hilims, **kwargs)

where cb is a callable that, given a set of parameters, returns the current statistic, start gives the starting values of the parameters, lolims and hilims give the bounds of the parameters, and the remaining keyword arguments are specific to the optimizer.
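
As a minimal sketch of this interface - the quadratic function and all names here are purely illustrative - a conforming callable and call could look like:

from sherpa.optmethods import optfcts

def cb(pars):
    # a single-parameter quadratic with its minimum at pars[0] = 3;
    # the second returned value is only needed by some optimizers,
    # so None is used here (this is discussed below)
    stat = (pars[0] - 3.0) ** 2
    return stat, None

# one parameter: start at 0, constrained to lie within [-10, 10]
flag, bestfit, statval, msg, extra = optfcts.neldermead(
    cb, [0.0], [-10.0], [10.0])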

These optimizers can be compared to SciPy’s optimization routines.
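
For instance, assuming SciPy is available, the illustrative quadratic from the sketch above could be minimized with scipy.optimize.minimize, where the callable returns just the statistic:

from scipy import optimize

def stat(pars):
    # the same illustrative quadratic, with its minimum at pars[0] = 3
    return (pars[0] - 3.0) ** 2

# unlike the Sherpa interface, no second return value is needed,
# and the bounds are passed as (low, high) pairs
res = optimize.minimize(stat, [0.0], bounds=[(-10.0, 10.0)])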

Examples

The following imports have been made:

>>> import numpy as np
>>> from matplotlib import pyplot as plt
>>> from matplotlib import cm, colors
>>> from mpl_toolkits.mplot3d import Axes3D
>>> from astropy.table import Table
>>> from sherpa.optmethods import optfcts

A simple function

The function we want to optimize is a version of the Rosenbrock function, with a=2 and b=10. Since both terms are squares, the minimum (zero) lies where both vanish, so the best-fit location is at:

x = a = 2
y = a**2 = 4

Note

The Sherpa test suite contains a number of test cases based on functions like this, taken from More, J. J., Garbow, B. S., and Hillstrom, K. E. Testing unconstrained optimization software. United States: N. p., 1978. Web. doi:10.2172/6650344.

The SciPy optimization tutorial uses a=1 and b=100.

The function is:

def rosenbrock(x, y):
    a, b = 2, 10
    return (a - x)**2 + b * (y - x**2)**2
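
A quick check confirms that the function is zero at this location:

>>> rosenbrock(2, 4)
0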

What does the function look like close to the best-fit location?

>>> y, x = np.mgrid[-2:5.1:0.1, -3:3.1:0.1]
>>> surface = rosenbrock(x, y)
>>> print(surface.shape)
(71, 61)
>>> print(surface.min())
0.0
>>> print(surface.max())
1235.0

This can be visualized as:

>>> fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
>>> surf = ax.plot_surface(x, y, surface,
...                        cmap=cm.plasma_r, alpha=0.8,
...                        vmin=0, vmax=300)
>>> ax.set_xlabel('x')
>>> ax.set_ylabel('y')
>>> ax.view_init(30, -100)
../_images/rosenbrock-surface.png

In order to optimize this function we need to determine how we want to define the “search surface”; that is, the value that the optimizer is going to try to minimize. For this dataset, where the minimum is 0, we can use the square of the function (i.e. a least-squares model). Now, the function we pass to the minimizer must return two values - the “statistic” and a per-parameter breakdown of the statistic - but for now we are going to ignore the latter, as it is only needed for some optimizers. This gives us:

def to_optimize1(args):
    x = args[0]
    y = args[1]
    pa = (2 - x)**2
    pb = 10 * (y - x**2)**2
    stat = pa**2 + pb**2
    return stat, None

The function can then be minimized by passing the routine, the starting point, and the bounds, to the optimizer - in this case the minim routine:

>>> start = [-1.2, 1]
>>> lo = [-100, -100]
>>> hi = [100, 100]
>>> res = optfcts.minim(to_optimize1, start, lo, hi)

The return value is a tuple which contains a success flag, the best-fit parameters, the value at this location, a message, and a dictionary with optimizer-specific information:

>>> print(f"Success: {res[0]}")
Success: True
>>> print(f"Message: {res[3]}")
Message: successful termination
>>> print(f"extra:   {res[4]}")
extra:   {'info': 0, 'nfev': 80}
>>> print(f"best-fit location: {res[1]}")
best-fit location: [2.00219515 4.00935405]
>>> print(f"          minimum: {res[2]}")
          minimum: 3.3675019403007895e-11

So, the correct location is (2, 4) and we can see we got close.
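
A programmatic check against the known minimum:

>>> np.allclose(res[1], [2, 4], atol=0.01)
True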

As the different optimizers use the same interface we can easily try different optimizers:

>>> tbl = Table(names=['method', 'stat0', 'x', 'y'],
...             dtype=[str, float, float, float])
>>> for method in [optfcts.minim, optfcts.neldermead, optfcts.lmdif, optfcts.montecarlo]:
...     res = method(to_optimize1, start, lo, hi)
...     if res[0]:
...         tbl.add_row([method.__name__, res[2], res[1][0], res[1][1]])
...     else:
...         print(f"Failed {method.__name__}: {res[3]}")
...
Failed lmdif: improper input parameters
>>> print(tbl)
  method            stat0                 x                  y
---------- ---------------------- ------------------ ------------------
     minim 3.3675019403007895e-11 2.0021951482261993  4.009354048420242
neldermead  1.269579878170357e-16  2.000079242383145   4.00028638884874
montecarlo  5.028337191787579e-16  2.000144877499188  4.000607622717231

In this particular case lmdif failed because the to_optimize1 function returned None as its second argument. The other optimizers all found the minimum location.

In order to use lmdif we need the per-parameter statistic, which we can get with a small tweak:

def to_optimize2(args):
    x = args[0]
    y = args[1]
    a, b = 2, 10
    pa = (a - x)**2
    pb = b * (y - x**2)**2
    stat = pa**2 + pb**2
    return stat, [pa, pb]

This lets us use lmdif:

>>> res2 = optfcts.lmdif(to_optimize2, start, lo, hi)
>>> res2[0]
True
>>> res2[1]
[1.99240555 3.9690085 ]

The lmdif optimizer is one of those that returns a number of elements in the “extra” dictionary:

>>> res2[4]
{'info': 0, 'nfev': 514, 'covar': array([[4.75572913e+03, 1.44740585e+06],
       [1.44740585e+06, 4.43876953e+08]]), 'num_parallel_map': 0}
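
If covar follows the usual least-squares convention of being an approximate covariance matrix for the best-fit parameters - an assumption here, since its exact meaning is not described above - then rough per-parameter error estimates can be read from its diagonal:

>>> # rough per-parameter error estimates (see the caveat above)
>>> errs = np.sqrt(res2[4]['covar'].diagonal())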

Optimizing a model

We can re-create the Fitting the NormGauss1D and Lorentz1D models section - at least for the NormGauss1D fit.

The normalized Gaussian model can be expressed as

def ngauss(x, ampl, pos, fwhm):
    term = 4 * np.log(2)
    numerator = ampl * np.exp(-term * (x - pos)**2 / fwhm**2)
    denominator = np.sqrt(np.pi / term) * fwhm
    return numerator / denominator

and the data we are going to fit is read from a file:

>>> d = np.loadtxt('test_peak.dat')
>>> x = d[:, 0]
>>> y = d[:, 1]

In this case we want to minimize the least-squares difference between the model - evaluated on the x axis - and the y values.

def cb(pars):
    model = ngauss(x, pars[0], pars[1], pars[2])
    delta = model - y
    statval = (delta * delta).sum()
    # Keep a record of the parameters we've visited
    store.add_row([statval] + list(pars))
    return statval, delta

Note

The use of store is not required here; it just lets the user find out how the parameter space was searched, which will be discussed below. Users of the full Sherpa system for modeling and fitting data can take advantage of the outfile argument of the sherpa.fit.Fit.fit method to save the per-iteration parameter values.
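
As a sketch of that route - the output filename is illustrative, and this assumes the standard Sherpa data, model, statistic, and method classes:

from sherpa.data import Data1D
from sherpa.fit import Fit
from sherpa.models.basic import NormGauss1D
from sherpa.optmethods import NelderMead
from sherpa.stats import LeastSq

d = Data1D('peak', x, y)  # the x and y arrays loaded above
mdl = NormGauss1D()
f = Fit(d, mdl, stat=LeastSq(), method=NelderMead())
fitres = f.fit(outfile='trail.dat', clobber=True)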

This function would normally be written as a class or a closure - as sketched below - to ensure that the use of global terms like x, y, ngauss, and store does not cause problems.
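
A minimal closure-based sketch (the make_cb name is illustrative):

def make_cb(model_fn, x, y, store):
    # bind the model function, data, and storage table in a closure
    # so that the callback does not depend on module-level globals
    def cb(pars):
        model = model_fn(x, pars[0], pars[1], pars[2])
        delta = model - y
        statval = (delta * delta).sum()
        store.add_row([statval] + list(pars))
        return statval, delta
    return cb

so that cb = make_cb(ngauss, x, y, store) creates a callback with no reliance on global state.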

For the starting point and bounds let us use an estimate - in Sherpa parameter terms a guess - based on the data:

>>> start = [y.max(), (x[0] + x[-1]) / 2, (x[-1] - x[0]) / 10]
>>> lows = [0, x[0], 0]
>>> his = [1e4, x[-1], x[-1] - x[0]]

which can be used with the neldermead optimizer:

>>> store = Table(names=['stat', 'ampl', 'pos', 'fwhm'])
>>> flag, bestfit, statval, msg, opts = optfcts.neldermead(cb, start, lows, his)
>>> flag
True
>>> print(bestfit)
[30.31354039  9.24277042  2.90156672]
>>> statval
29.994315740189727
>>> opts
{'info': True, 'nfev': 267}
>>> len(store)
267

and displayed:

>>> plt.plot(x, y, label='Data', alpha=0.5)
>>> plt.plot(x, ngauss(x, *start), label='Start')
>>> plt.plot(x, ngauss(x, *bestfit), label='Best fit', c='k')
>>> plt.legend()
../_images/normgauss1d-example.png

This result matches the Fitting the NormGauss1D and Lorentz1D models result. Note that at this point you are close to replicating the main parts of Sherpa (but with a lot less functionality)!

One tweak that was added to the cb routine, compared to to_optimize1 and to_optimize2, is the ability to store the parameter values at each iteration:

>>> print(store)
       stat               ampl               pos                fwhm
------------------ ------------------ ----------------- -------------------
1995.1782885013076          10.452393              10.0                 2.0
1830.6327671752738          11.652393              10.0                 2.0
3522.2622560146397          10.452393              11.2                 2.0
  2156.39741128647          10.452393              10.0                 3.2
 1764.261448064872 11.252393000000001               8.8                 2.8
 2715.012206450446 11.652393000000004 7.600000000000001  3.1999999999999993
 1366.678033289349 11.785726333333333               9.2   1.333333333333333
  4770.95772056462          12.452393 8.799999999999997 0.39999999999999947
1504.4139142274962 12.674615222222219 8.666666666666668  2.0888888888888886
2731.7112487850172 12.156096703703703 7.777777777777779   2.148148148148148
1379.1243615097005 11.778318925925927 9.444444444444445   2.037037037037037
1946.1190085808503 12.906713987654314 9.407407407407408  0.8395061728395059
 1495.009488013458  11.66597324691358 8.951851851851853  2.3098765432098762
1684.8390859851572  10.81206378189301 9.730864197530863  1.6979423868312757
1355.3671486446174 12.208977362139915 8.932716049382716  1.9911522633744854
1399.8451502742626 12.182708500685873 9.432921810699586   1.264471879286694
1279.2923719811783 12.053524687242799 9.312654320987653  1.5258230452674897
1610.2269471113636 12.253833329218104 8.852469135802467  1.1965020576131686
  1294.57012665695 11.897197526748972  9.29645061728395  1.8269032921810702
   1304.4764033764 12.320740050754459 9.161213991769547  2.2292524005486967
1391.2754856701451 11.971997481024239  9.58082990397805  1.7301668952903522
1285.6978966131558 12.149732391860997  9.09474451303155   1.925905921353452
 1393.127377914764  11.74622968648072  9.30801897576589  1.2898357719859779
1272.3465874443266 12.177112459686025 9.197915237768633   1.994398243408017
1238.4310470386065 12.356382165777575 9.107092097241276  1.8038481811715688
1241.4603983878433 12.585974485291878 9.012412837219937   1.792320625666818
               ...                ...               ...                 ...
 30.06395831055505 30.212991086763584 9.244215467610896   2.884083338173955
 30.01817372053474 30.398954377628876  9.24317281485574   2.904866075865189
 30.04367764579035 30.354983581046667 9.237479777065492   2.908685892005043
 30.01084181691227 30.353975570593022 9.239861257457108   2.905079691779978
 30.02346960997154 30.367629772282577 9.242083552984774   2.913048992873203
30.005505202917142  30.35033752084683 9.241560130704027   2.900240251358774
30.015978227327043  30.28083759383391 9.239221271971669  2.9010697048577017
 30.00357680322557  30.31036678978265 9.240209157692686  2.9020187976095735
29.995983764926883 30.296743880329522 9.242764762860688  2.8988154928293106
30.005633018056866 30.268128035197773 9.244216515562478   2.895683393353977
 30.00947198755381 30.264652836397516 9.241868977718209   2.902705094230872
29.997881922784604   30.3289163497345 9.241637342457572  2.9008564620767983
29.997993892783974  30.31698994076293 9.244172073906142  2.9001516576248894
 29.99494948606653 30.315334153017858 9.243181344852779  2.9006184426210604
29.997958649779395 30.289385582999984 9.243773224071393  2.9011553135205057
29.995260397879527  30.31903365805087 9.242171312861029   2.900931174937725
 29.99445003074114  30.31162163928815 9.242571790663618    2.90098720958342
 30.00122895716913 30.305366261844505  9.24324187163635  2.8966596789798125
29.994507347858953  30.31287271477688 9.242437774469073   2.901852715704142
 29.99459765965769 30.310996101543786 9.242638798760893  2.9005544565230594
29.994365579932794  30.31410343389737 9.242809559660927   2.901235579162601
29.994674729702172 30.315953186413875  9.24230454366505   2.901391945320934
29.994710843829058   30.3048082975532 9.242601268664881  2.9003341042667263
29.994741915725644 30.317183905534364 9.242676328856904   2.900774808779393
29.995132223487687  30.30603901667369 9.242973053856733  2.8997169677251855
29.995115838183946 30.307888769190196 9.242468037860858  2.8998733338835176
29.994315740189727 30.313540392830394 9.242770421408574  2.9015667179738505
Length = 267 rows

We can use this to see how the optimizer worked, color-coding each point by the statistic (using a log-normalized scale, since the values range from over 2000 down to around 30):

>>> fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
>>> vmin = store['stat'].min()
>>> vmax = store['stat'].max()
>>> norm = colors.LogNorm(vmin=vmin, vmax=vmax)
>>> ax.plot(store['ampl'], store['pos'], store['fwhm'], c='k', alpha=0.4)
>>> scatter = ax.scatter(store['ampl'], store['pos'], store['fwhm'],
...                      c=store['stat'], norm=norm)
>>> ax.set_xlabel('ampl')
>>> ax.set_ylabel('pos')
>>> ax.set_zlabel('fwhm')
>>> cbar = fig.colorbar(scatter, shrink=0.5, orientation='horizontal')
>>> cbar.set_label('least-squares statistic')
../_images/normgauss1d-trail.png