Least-squares and timing noise

[Figure 4 from the paper: residuals and spectrum]
Working with long-term pulsar timing data sets is a nuisance because of so-called "timing noise". Not only is this noise above and beyond the usual observational uncertainties (perhaps because it is torque noise), it also tends to be strongest at low frequencies (it is very "red"), often so much so that leakage from the very lowest frequencies dominates at all analysis frequencies. Various people, myself included, have tried various approaches for dealing with this noise, but a recent arxiv paper shows real promise:

Pulsar timing analysis in the presence of correlated noise
Pulsar timing observations are usually analysed with least-square-fitting procedures under the assumption that the timing residuals are uncorrelated (statistically "white"). Pulsar observers are well aware that this assumption often breaks down and causes severe errors in estimating the parameters of the timing model and their uncertainties. Ad hoc methods for minimizing these errors have been developed, but we show that they are far from optimal. Compensation for temporal correlation can be done optimally if the covariance matrix of the residuals is known using a linear transformation that whitens both the residuals and the timing model. We adopt a transformation based on the Cholesky decomposition of the covariance matrix, but the transformation is not unique. We show how to estimate the covariance matrix with sufficient accuracy to optimize the pulsar timing analysis. We also show how to apply this procedure to estimate the spectrum of any time series with a steep red power-law spectrum, including those with irregular sampling and variable error bars, which are otherwise very difficult to analyse.
I'd like to look at it in more detail and try some of the techniques on test data.

Like essentially all scientific data, pulsar time-of-arrival measurements are subject to scatter due to observing-system limitations, so determining the key parameters already involves linear least-squares fitting. The idea in this paper is to treat the timing noise as simply additional noise in the fitting (previous approaches, including my own, have been mired in the swamp of trying to fit the timing noise itself). The problem is that with non-white noise, you expect correlations between the errors on successive points: if noise artificially raises one data point, it will probably raise the next by a similar amount too. Ordinary least-squares fitting ignores these correlations and gives dubious fits and wrong error bars. But there is a reasonably well-known way to deal with this sort of correlated error.

I'm going to digress a little into the mechanics of least-squares fitting here. Linear least-squares fitting begins with a set of parameters x, a matrix A expressing how each parameter affects each data point, and a set of observed data points b. The goal, then, is to minimize |Ax-b|. In the most basic kind of least-squares fitting, that norm is just the sum of the squares of the components; in other words, all data points are treated equally. numpy provides the handy function numpy.linalg.lstsq to do exactly this; it returns a certain amount of extra information we don't really care about today, but in particular it returns x.
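
To make this concrete, here is a minimal sketch (with invented numbers) of fitting a straight line with numpy.linalg.lstsq:

    import numpy as np

    # Invented toy data: a straight line plus white noise.
    rng = np.random.default_rng(0)
    t = np.linspace(0, 10, 50)
    b = 2.0 + 0.5 * t + rng.normal(0, 0.1, t.size)

    # Design matrix A: one column per parameter (intercept and slope).
    A = np.column_stack([np.ones_like(t), t])

    # lstsq minimizes |Ax - b|; it also returns residuals, rank, and
    # singular values, which we ignore.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(x)  # roughly [2.0, 0.5]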

What if some data points are more important than others? If you have an estimate of the error bars on each data point, those with smaller error bars should be weighted more heavily. Fortunately, this can be accomplished without a new function, simply by scaling the elements of Ax-b before taking the norm as before. Write D for the matrix with the uncertainty of each data point on its diagonal; then we want to hand numpy.linalg.lstsq the quantities D⁻¹A and D⁻¹b. Simple enough.
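
In code, the same toy fit with made-up per-point uncertainties looks like this; the only change is dividing the rows of A and the elements of b by the error bars:

    import numpy as np

    rng = np.random.default_rng(1)
    t = np.linspace(0, 10, 50)
    sigma = rng.uniform(0.05, 0.5, t.size)       # per-point error bars
    b = 2.0 + 0.5 * t + rng.normal(0, sigma)     # noisier where sigma is big

    A = np.column_stack([np.ones_like(t), t])

    # Weighted fit: apply D⁻¹, i.e. divide each row of A and each
    # element of b by that point's uncertainty.
    x, *_ = np.linalg.lstsq(A / sigma[:, None], b / sigma, rcond=None)
    print(x)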

Now what if the errors are correlated between data points? Well, to start with we need a description. Error bars on each data point aren't enough any more: they describe an n-dimensional Gaussian distribution whose level ellipses are necessarily axis-aligned, and correlated errors tilt the ellipses away from the axes, so you need a more elaborate description. The standard tool is a covariance matrix. If the errors really are uncorrelated, then this is simply D², the squared error bars down the diagonal; any correlation between errors appears as off-diagonal entries in the matrix.
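
As a toy illustration (the numbers and the neighbour-correlation model are invented), here is what such a matrix looks like:

    import numpy as np

    sigma = np.array([0.1, 0.2, 0.15])   # per-point error bars

    # Uncorrelated errors: the covariance matrix is just D², the
    # squared error bars on the diagonal.
    C = np.diag(sigma**2)

    # Correlated errors appear off the diagonal; suppose each pair of
    # neighbouring points shares a correlation coefficient of 0.5.
    rho = 0.5
    for i in range(len(sigma) - 1):
        C[i, i + 1] = C[i + 1, i] = rho * sigma[i] * sigma[i + 1]
    print(C)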

A brief aside: covariance matrices also arise in least-squares fitting when trying to describe the uncertainties on the fitted parameters. That's not what we're talking about here; concretely, if you have n data points and m parameters, you hopefully have n>>m, and your covariance matrix describing the fitted parameters will be a relatively small m by m matrix, while your covariance matrix describing the data will be a much larger n by n matrix.

Once you have this covariance matrix C, carrying out least-squares fitting turns out not to be terribly difficult. Conveniently, covariance matrices are symmetric and positive definite, so you can compute "square roots" for them using the Cholesky decomposition: you get a lower-triangular matrix L such that LLᵀ = C. The inverse of this matrix serves to rotate and scale the Gaussian distribution so that it is uncorrelated and has unit variance in every direction, while L itself can be (and is) used to generate samples from a correlated Gaussian distribution by colouring white noise. For us, L⁻¹ can be used to transform the least-squares fitting problem on A, x, and b into a standard one on L⁻¹A, x, and L⁻¹b. So we can fit curves to our data in a way that takes into account our correlated noise - if we have an estimate of the covariance matrix C.
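
Here is a sketch of the whole transformation, using a hypothetical helper whitened_lstsq and an invented exponential covariance for the demonstration. Note that the same factor L appears twice: its inverse whitens the fit, while L itself colours white noise into correlated noise.

    import numpy as np

    def whitened_lstsq(A, b, C):
        """Least-squares fit of Ax ~ b given the data covariance matrix C."""
        L = np.linalg.cholesky(C)      # C = L Lᵀ
        Aw = np.linalg.solve(L, A)     # L⁻¹A, without forming L⁻¹ explicitly
        bw = np.linalg.solve(L, b)     # L⁻¹b
        x, *_ = np.linalg.lstsq(Aw, bw, rcond=None)
        return x

    # Demo with an invented covariance in which nearby points are correlated.
    rng = np.random.default_rng(2)
    t = np.linspace(0, 10, 200)
    C = 0.01 * np.exp(-np.abs(t[:, None] - t[None, :]) / 2.0)
    A = np.column_stack([np.ones_like(t), t])

    # Colouring: L times white noise has covariance C.
    noise = np.linalg.cholesky(C) @ rng.standard_normal(t.size)
    b = A @ np.array([2.0, 0.5]) + noise

    print(whitened_lstsq(A, b, C))   # roughly [2.0, 0.5]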

Leaving aside for the moment how we get this estimate, what does this let us do? Well, most obviously, it lets us fit for pulsar parameters and estimate the uncertainty of the results. The timing noise will still contaminate (for example) the one-year sinusoid we fit to measure the pulsar position, but now at least we know how much contamination there is and we weight the data points appropriately. We can also, with sufficient determination, apply the process to multiple pulsars at once and look for the long-term quadrupole footprint of a gravitational-wave background. So, great. But how do we estimate C?

To get a decent estimate of C we need some sort of model for the timing noise. The time-honoured model is to assume some uncorrelated observational noise on each data point, plus timing noise whose power spectrum behaves like f^(-α) for some spectral index α. The everything-looks-like-a-nail approach to getting C from this model is to simply simulate it many times and measure the covariance matrix. You have to run the simulation very many times, because variance and covariance estimators converge much more slowly than mean estimators do, but this is simulation, so who cares? You can also, if it helps, introduce a low-frequency cutoff in your spectral model. On the other hand, I figured out in my own efforts that if all you want is a power-law spectral model, you can work out the covariances analytically in terms of the hypergeometric function 2F1. Adding white noise is easy, since covariance matrices add. Unfortunately, 2F1 is extremely obscure, and while scipy does include it, when I tried to use it I found some nasty bugs. I think they're fixed now, but really, the blunt instrument of simulation kind of appeals. In any case, given a model, we can construct a covariance matrix. But where do we get a model?
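
As a sketch of the simulation route (regular sampling assumed, overall normalization schematic, all names and parameters invented):

    import numpy as np

    def estimate_covariance(n, alpha, nsim=5000, seed=3):
        """Estimate the covariance of f^(-alpha) noise by brute force:
        simulate many realizations and average their outer products."""
        rng = np.random.default_rng(seed)
        freqs = np.fft.rfftfreq(n)
        psd = np.zeros_like(freqs)
        psd[1:] = freqs[1:] ** (-alpha)       # power law; no power at f = 0
        total = np.zeros((n, n))
        for _ in range(nsim):
            coeffs = (rng.standard_normal(freqs.size)
                      + 1j * rng.standard_normal(freqs.size))
            sample = np.fft.irfft(coeffs * np.sqrt(psd / 2), n)
            total += np.outer(sample, sample)
        return total / nsim

    # Red timing noise plus white measurement noise: covariances add.
    C = estimate_covariance(n=128, alpha=3.0) + np.diag(np.full(128, 0.1**2))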

Unfortunately, no good theoretical model for timing noise exists, and there's some evidence that not just the amplitude but also the spectral index α may vary from pulsar to pulsar, and maybe even from epoch to epoch for an individual pulsar. So one has to figure out the model from the data. In this paper they make two key observations. First, the fitting is not all that sensitive to the exact model you choose; this isn't too surprising, since ignoring or fiddling the error bars in ordinary least-squares fitting doesn't make a huge difference to the results either. Second, once you have a guess at the covariance matrix you can try to measure the spectrum: you simply do a least-squares fit of a sinusoid at each frequency of interest. If your covariance matrix is wrong, your spectral estimate will still be slightly wrong; you may even still see overwhelming leakage from the lowest frequencies, but if you do it will be fairly obvious, and it is a sign you need to increase α. So the procedure becomes: take a guess at α, plot the spectrum, and if it's too steep, crank α up. Once α is high enough, your spectral plot will be pretty accurate and you can simply read off your model parameters.
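
A sketch of that spectral estimator, reusing the Cholesky whitening from above (the function name and the normalization are mine; the paper's estimator is more careful):

    import numpy as np

    def red_spectrum(t, resid, C, freqs):
        """Estimate power at each trial frequency by a whitened
        least-squares fit of a cosine and a sine."""
        L = np.linalg.cholesky(C)
        rw = np.linalg.solve(L, resid)        # whitened residuals
        power = np.empty(len(freqs))
        for i, f in enumerate(freqs):
            A = np.column_stack([np.cos(2 * np.pi * f * t),
                                 np.sin(2 * np.pi * f * t)])
            x, *_ = np.linalg.lstsq(np.linalg.solve(L, A), rw, rcond=None)
            power[i] = x[0]**2 + x[1]**2      # squared amplitude at f
        return power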

So, in summary: if you've got a data set that suffers from nasty timing noise, you can use this least-squares fitting technique to develop a model of the timing noise and then fit any parameters you like to the result.

This is all implemented in software; in particular, tempo2 can use a covariance matrix you specify. I don't know what there is in terms of user-friendly tools for finding the right covariance matrix, but it need not be particularly complicated.

This seems great, but I still have some questions about this whole process. In particular, I worry because the gravitational-wave background is also supposed to have a red spectrum. Will this model-finding process tend to soak up the gravitational-wave signal? Gravitational-wave signals should be correlated between pulsars, while timing noise should not, but there's still a risk that this process will cost us precious signal.
