This post summarizes a lecture I have given several times to undergrads and early graduate students on model fitting. I will (hopefully) highlight the flexibility of the fitting process and a simple guide to work through most problems. I’m going to assume a basic understanding of the statistics involved, and my example problem will be in Python.

Imagine I give you a dataset and I ask you to fit a model to it. This might be a supernova light curve, a stellar spectrum or a galaxy rotation model. Whatever problem you’re facing, we can break it down into **MOO**:

1. * Model* which you need to choose to fit to the data!

2.

3.

We’re going to break down each step using a simple astronomical example. For each step, I’ll provide code to solve the problem in a few ways, so I encourage you to mix-and-match!

Let’s start with an example **model** and dataset to fit. In astronomy, a common problem is to fit Gaussian-like models to spectral lines. These models typically have three parameters: height, width and central wavelength. You can play with two models in this example: the Lorentzian and the Gaussian. The Lorentzian profile looks like:

```
def my_model(lam,center,height,width):
return height * (width/2.)**2 / ((lam - center)**2 + (width/2.)**2)
```

and the Gaussian profile is:

```
def my_model(lam,center,height,width):
return height * np.exp(-0.5 * (lam - center)**2/width**2)
```

Choose the *right* model can be very challenging. In many cases, the model will be physically motivated. For example, we can think of a physical model to describe the light curve of a transiting planet. Other times, our models are *data-driven*. For example, the famous M-sigma relation relates the stellar velocity dispersion of a galaxy bulge with the mass of its supermassive black hole. The physical basis of this relation is uncertain, but we can see that a linear model connects the two galactic features.

Back to our problem: I am going to generate some fake data which we can fit our model to by adding white ( uncorrelated) noise:

```
my_lam = np.arange(6400,6700,1)
NOISE = 5.0
my_data = my_model(my_lam,6563,100,20) + np.random.normal(0, NOISE,len(my_lam))
my_sigmas = np.zeros(len(my_lam)) + NOISE
plt.plot(my_lam,my_data)
plt.xlabel(r'$\lambda$')
plt.ylabel('Flux')
plt.title('Beautiful Data')
plt.show()
```

Our spectral line has a *true* height of 100 flux units, a width of 20 Angstrom and a central wavelength of 6563 Angstrom. You can play with the `NOISE`

parameter to change the signal-to-noise of the data. Try setting it to something very large, like `NOISE = 50`

!

Now that we have our models (and some data!), let’s choose a **metric** to quantify how *well* our model fits our dataset. This metric has many names: objective function, cost function, loss function, utility function, etc. (I called it an **objective function** so I could draw a cow.) The metric that most people are familiar with is called **chi-squared**:

```
def metric_chi2(theta,lam,data,sigmas):
center, height, width = theta
return np.sum((my_model(lam,center,height,width)-data)**2/(sigmas)**2)
```

Remember that we want to *minimize* chi-squared values to choose the best model. Chi-squared assumes that our noise is uncorrelated and Gaussian, which is true in many cases. But we can easily choose other objective functions:

```
def metric_residuals(theta,lam,data,sigmas):
center, height, width = theta
return np.sum(np.abs(my_model(lam,center,height,width)-data))
```

```
def metric_log_likelihood(theta,lam,data,sigmas):
center, height, width = theta
return -0.5*(np.sum((data-my_model(lam,center,height,width))**2/sigmas**2 + np.log(sigmas**2)))
```

`metric_residuals`

computes the sum of the absolute values of the residuals of our model, and `metric_log_likelihood`

computes the log likelihood of our model. Remember that we want to *maximize* the log likelihood, but we want to *minimize* sums of residuals.

Finally, let’s fit our data by choosing an *optimization method* to optimize our *objective function*. We’re going to start with the easiest thing we can think of and work our way up. First, let’s literally just pick random values for the height, width and center of our spectral line. We’ll choose values 50 times, and the *best* model will be our winner. this is called a *random grid search*:

```
center_choices = np.random.uniform(low=6500, high=6600, size=50)
height_choices = np.random.uniform(low=80, high=120, size=50)
width_choices = np.random.uniform(low=10, high=30, size=50)
lowest_chi = np.inf
best_c = 0
best_h = 0
best_w = 0
for i,c in enumerate(center_choices):
h = height_choices[i]
w = width_choices[i]
theta = [c,h,w]
current_chi = metric_log_likelihood(theta,my_lam,my_data,my_sigmas)
if current_chi < lowest_chi:
best_c = c
best_h = h
best_w = w
lowest_chi = current_chi
print "Chi2",lowest_chi / len(my_lam)
plt.plot(my_lam,my_data)
plt.plot(my_lam,my_model(my_lam,best_c,best_h,best_w))
plt.show()
```

```
Chi2 2.89601950512
```

Our reduced chi-squared is about 3, which is pretty bad. Let’s try to do better! We will use `scipy`

’s `minimize`

function, which uses a form of *gradient descent* to minimize the objective function:

```
from scipy.optimize import minimize
p0 = [6560,90,5]
bnds = ((6500, 6650), (0, None),(0,None))
res = minimize(metric_log_likelihood, x0=p0,args=(my_lam,my_data,my_sigmas), options={'gtol': 1e-3, 'disp': True},bounds=bnds)
plt.plot(my_lam,my_data)
plt.plot(my_lam,my_model(my_lam,res.x[0],res.x[1],res.x[2]))
plt.show()
print "Chi2",metric_chi2(res.x,my_lam,my_data,my_sigmas)/len(my_lam)
```

```
Chi2 1.02930073003
```

Great! Our reduced chi-squared value is about 1, which means that we’ve found a good fit to the data. Play around with combining different *objective functions* and *optimization methods*. Is there any metric which seems to work best for this problem?

Unfortunately, we do not have error estimates on our models using these optimization methods with the code provided. We can use an even fancier optimization technique called a Markov-Chain Monte Carlo to estimate the best-fit model *and* error bars on the model parameters by sampling from our likelihood function. We’ll use the popular `emcee`

:

```
ndim, nwalkers = 3, 100
p0 = [6560,90,5]
pos = [p0 + np.random.randn(ndim) for i in range(nwalkers)]
sampler = emcee.EnsembleSampler(nwalkers, ndim, metric_log_likelihood, args=(my_lam, my_data, my_sigmas))
sampler.run_mcmc(pos, 1000)
samples = sampler.chain[:, 500:, :].reshape((-1, ndim))
corner.corner(samples, labels=["center", "height", "width"])
plt.show()
plt.plot(samples[:,0])
plt.xlabel('Iteration Number')
plt.ylabel('Height')
plt.title('A Sample Chain')
plt.show()
print "CENTER",np.percentile(samples[:,0],[16,50,84])
print "HEIGHT",np.percentile(samples[:,1],[16,50,84])
print "WIDTH",np.percentile(samples[:,2],[16,50,84])
```

```
CENTER [ 6562.78917939 6562.96188268 6563.14312425]
HEIGHT [ 100.46597986 102.28547484 103.99994744]
WIDTH [ 19.37125165 19.84305208 20.3369578 ]
```

The `corner`

plot shows the best-fit parameter posterior distributions (assuming uniform priors), which we have used to estimate both the best-fit values and the 1-sigma error bars. Below this plot is a sample chain from the MCMC that shows a walker traversing through the posterior.

Try adding a nonuniform prior to the *objective function*. Does the MCMC produce different results?

We fit our model! Again, I highly recommend switching out the **MOO** components for this example and understanding what effects this has on your final model. Each component of **MOO** has special considerations, but it’s important to remember that *you* have the power to change each step to fit your needs!

So what should you do when faced with real data? My biggest recommendation to new researchers is to do the *easiest* thing first. Choose a simple model, objective function and optimization function. For example, I typically begin model fits by minimizing chi-squared with `minimizer`

.

Feel free to share!