This post summarizes a lecture I have given several times to undergrads and early graduate students on model fitting. I will (hopefully) highlight the flexibility of the fitting process and provide a simple guide for working through most problems. I’m going to assume a basic understanding of the statistics involved, and my example problem will be in Python.
Using MOO
Imagine I give you a dataset and I ask you to fit a model to it. This might be a supernova light curve, a stellar spectrum or a galaxy rotation model. Whatever problem you’re facing, we can break it down into MOO:
- Model, which you need to choose to fit to the data!
- Objective Function, which is a metric you will choose to quantify how well the model fits the data.
- Optimization Method, which you will use to pick the best model.
We’re going to break down each step using a simple astronomical example. For each step, I’ll provide code to solve the problem in a few ways, so I encourage you to mix-and-match!
The Model
Let’s start with an example model and dataset to fit. In astronomy, a common problem is to fit Gaussian-like models to spectral lines. These models typically have three parameters: height, width and central wavelength. You can play with two models in this example: the Lorentzian and the Gaussian. The Lorentzian profile looks like:
def my_model(lam,center,height,width):
return height * (width/2.)**2 / ((lam - center)**2 + (width/2.)**2)
and the Gaussian profile is:
def my_model(lam,center,height,width):
return height * np.exp(-0.5 * (lam - center)**2/width**2)
Choosing the right model can be very challenging. In many cases, the model will be physically motivated. For example, we can write down a physical model to describe the light curve of a transiting planet. Other times, our models are data-driven. For example, the famous M-sigma relation relates the stellar velocity dispersion of a galaxy bulge to the mass of its supermassive black hole. The physical basis of this relation is uncertain, but we can see that a simple linear model (in log space) connects the two galactic features.
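As a quick aside (a sketch of my own, with placeholder names, not a standard implementation), such a data-driven model is often just a straight line connecting the logarithms of the two quantities:
def msigma_model(log_sigma, slope, intercept):
    # A data-driven linear model: log10(M_BH) as a straight line in log10(sigma).
    # The slope and intercept are free parameters to be fit, not measured values.
    return intercept + slope * log_sigma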
Back to our problem: I am going to generate some fake data for us to fit, adding white (uncorrelated) noise:
import numpy as np
import matplotlib.pyplot as plt

my_lam = np.arange(6400,6700,1)
NOISE = 5.0
my_data = my_model(my_lam,6563,100,20) + np.random.normal(0, NOISE,len(my_lam))
my_sigmas = np.zeros(len(my_lam)) + NOISE
plt.plot(my_lam,my_data)
plt.xlabel(r'$\lambda$')
plt.ylabel('Flux')
plt.title('Beautiful Data')
plt.show()
Our spectral line has a true height of 100 flux units, a width of 20 Angstrom and a central wavelength of 6563 Angstrom. You can play with the NOISE parameter to change the signal-to-noise of the data. Try setting it to something very large, like NOISE = 50!
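As a rough sanity check (my own aside, not part of the original example), the per-pixel signal-to-noise at the line peak is roughly the true height divided by the noise level:
# Rough per-pixel signal-to-noise at the peak of the simulated line.
print("Approximate peak S/N:", 100 / NOISE)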
The Objective Function
Now that we have our models (and some data!), let’s choose a metric to quantify how well our model fits our dataset. This metric has many names: objective function, cost function, loss function, utility function, etc. (I called it an objective function so I could draw a cow.) The metric that most people are familiar with is called chi-squared:
def metric_chi2(theta,lam,data,sigmas):
center, height, width = theta
return np.sum((my_model(lam,center,height,width)-data)**2/(sigmas)**2)
Remember that we want to minimize chi-squared values to choose the best model. Chi-squared assumes that our noise is uncorrelated and Gaussian, which is true in many cases. But we can easily choose other objective functions:
def metric_residuals(theta,lam,data,sigmas):
center, height, width = theta
return np.sum(np.abs(my_model(lam,center,height,width)-data))
def metric_log_likelihood(theta,lam,data,sigmas):
center, height, width = theta
return -0.5*(np.sum((data-my_model(lam,center,height,width))**2/sigmas**2 + np.log(sigmas**2)))
metric_residuals computes the sum of the absolute values of the residuals of our model, and metric_log_likelihood computes the log likelihood of our model. Remember that we want to maximize the log likelihood, but we want to minimize sums of residuals.
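One practical consequence: most off-the-shelf optimizers only minimize, so to use a log likelihood with them you minimize its negative. A minimal sketch (the wrapper name is my own, not from any library used below):
def metric_neg_log_likelihood(theta,lam,data,sigmas):
    # Negate the log likelihood so that a minimizer can be used to maximize it.
    return -metric_log_likelihood(theta,lam,data,sigmas)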
The Optimization Method
Finally, let’s fit our data by choosing an optimization method to optimize our objective function. We’re going to start with the easiest thing we can think of and work our way up. First, let’s literally just pick random values for the height, width and center of our spectral line. We’ll choose values 50 times, and the best model will be our winner. This is called a random grid search:
center_choices = np.random.uniform(low=6500, high=6600, size=50)
height_choices = np.random.uniform(low=80, high=120, size=50)
width_choices = np.random.uniform(low=10, high=30, size=50)
lowest_chi = np.inf
best_c = 0
best_h = 0
best_w = 0
for i,c in enumerate(center_choices):
h = height_choices[i]
w = width_choices[i]
theta = [c,h,w]
    # Score each random draw with chi-squared (which we minimize; a log likelihood would be maximized).
    current_chi = metric_chi2(theta,my_lam,my_data,my_sigmas)
if current_chi < lowest_chi:
best_c = c
best_h = h
best_w = w
lowest_chi = current_chi
print "Chi2",lowest_chi / len(my_lam)
plt.plot(my_lam,my_data)
plt.plot(my_lam,my_model(my_lam,best_c,best_h,best_w))
plt.show()
Chi2 2.89601950512
Our reduced chi-squared is about 3, which is pretty bad. Let’s try to do better! We will use scipy’s minimize function, which uses a form of gradient descent to minimize the objective function:
from scipy.optimize import minimize
p0 = [6560,90,5]
bnds = ((6500, 6650), (0, None),(0,None))
# Minimize chi-squared (minimizing the log likelihood itself would find the *worst* fit).
res = minimize(metric_chi2, x0=p0,args=(my_lam,my_data,my_sigmas), options={'gtol': 1e-3, 'disp': True},bounds=bnds)
plt.plot(my_lam,my_data)
plt.plot(my_lam,my_model(my_lam,res.x[0],res.x[1],res.x[2]))
plt.show()
print "Chi2",metric_chi2(res.x,my_lam,my_data,my_sigmas)/len(my_lam)
Chi2 1.02930073003
Great! Our reduced chi-squared value is about 1, which means that we’ve found a good fit to the data. Play around with combining different objective functions and optimization methods. Is there any metric which seems to work best for this problem?
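As a starting point for that experiment, here is a minimal sketch (my own addition) that runs the same scipy minimizer over each of the objective functions defined above, negating the log likelihood so that minimizing it maximizes the likelihood:
# Try each objective function with the same starting point and bounds.
for name, metric in [('chi2', metric_chi2),
                     ('absolute residuals', metric_residuals),
                     ('negative log likelihood', lambda *args: -metric_log_likelihood(*args))]:
    res = minimize(metric, x0=p0, args=(my_lam,my_data,my_sigmas), bounds=bnds)
    print(name, 'best-fit parameters:', res.x)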
Unfortunately, we do not have error estimates on our model parameters using these optimization methods with the code provided. We can use an even fancier optimization technique called a Markov Chain Monte Carlo (MCMC) to estimate the best-fit model and error bars on the model parameters by sampling from our likelihood function. We’ll use the popular emcee package:
import emcee
import corner

ndim, nwalkers = 3, 100
p0 = [6560,90,5]
pos = [p0 + np.random.randn(ndim) for i in range(nwalkers)]
sampler = emcee.EnsembleSampler(nwalkers, ndim, metric_log_likelihood, args=(my_lam, my_data, my_sigmas))
sampler.run_mcmc(pos, 1000)
samples = sampler.chain[:, 500:, :].reshape((-1, ndim))
corner.corner(samples, labels=["center", "height", "width"])
plt.show()
# Trace of the line center for a single walker over the full run.
plt.plot(sampler.chain[0,:,0])
plt.xlabel('Iteration Number')
plt.ylabel('Center')
plt.title('A Sample Chain')
plt.show()
print "CENTER",np.percentile(samples[:,0],[16,50,84])
print "HEIGHT",np.percentile(samples[:,1],[16,50,84])
print "WIDTH",np.percentile(samples[:,2],[16,50,84])
CENTER [ 6562.78917939 6562.96188268 6563.14312425]
HEIGHT [ 100.46597986 102.28547484 103.99994744]
WIDTH [ 19.37125165 19.84305208 20.3369578 ]
The corner plot shows the best-fit parameter posterior distributions (assuming uniform priors), which we have used to estimate both the best-fit values and the 1-sigma error bars. Below this plot is a sample chain from the MCMC that shows a walker traversing through the posterior.
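To turn those percentiles into a quotable value with asymmetric 1-sigma error bars, a minimal sketch:
# Median and 1-sigma interval for the line center from the 16th/50th/84th percentiles.
p16, p50, p84 = np.percentile(samples[:,0],[16,50,84])
print("center = {:.2f} +{:.2f} -{:.2f}".format(p50, p84 - p50, p50 - p16))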
Try adding a nonuniform prior to the objective function. Does the MCMC produce different results?
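If you want a concrete starting point for that exercise, here is a minimal sketch of a log posterior with a hypothetical Gaussian prior on the line center (the prior mean of 6550 and width of 5 Angstrom are arbitrary choices, not values from this post); you would pass it to emcee.EnsembleSampler in place of metric_log_likelihood:
def metric_log_posterior(theta,lam,data,sigmas):
    center, height, width = theta
    # Hypothetical Gaussian prior on the line center; flat (improper) priors on height and width.
    log_prior = -0.5 * (center - 6550.0)**2 / 5.0**2
    return log_prior + metric_log_likelihood(theta,lam,data,sigmas)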
Conclusion
We fit our model! Again, I highly recommend switching out the MOO components in this example and seeing what effect this has on your final model. Each component of MOO has special considerations, but it’s important to remember that you have the power to change each step to fit your needs!
So what should you do when faced with real data? My biggest recommendation to new researchers is to do the easiest thing first. Choose a simple model, objective function and optimization method. For example, I typically begin model fits by minimizing chi-squared with scipy’s minimize.