Anthony Smith's Research Blog
Surveying the Universe
Surveying the Universe
8 Nov 2012
It's been a while since I posted anything here. And today won't mark a significant change in that pattern, I'm afraid. But my python-IDL interface, pIDLy, has been mentioned elsewhere a couple of times, so I thought I might as well put the links up here.
I'm not doing much python or IDL these days, but pIDLy is on GitHub if anyone wants to develop it further...
11 Jun 2012
One of my most important roles within HerMES has been to generate the LaTeX for the lists of authors and affiliations in the astronomy journal papers (previous posts here and here). HerMES is a big consortium, so some of the papers have something like 100 authors (with each author contributing, say, 10 or 20 words, and the first author having the job of putting them in the right order). When the list of authors is determined by complicated rules of the consortium, and when each author has to be associated with one or more institutes, with the correct superscript numbers after their names, and everything in the correct order, then having a script to do that for you can save a lot of time.
As part of my new GitHub spree, I've made my HerMES authorship script more presentable, and it's now freely available for all the world in the authors_list repository. Just download the files, use the "authors_data" template to fill in the names and institutes of the members of your consortium, then follow the instructions in the usage message for the "authors_list" script. It's written in Python, but runs as a command-line script. So far, it can produce output for MNRAS, A&A and ApJ, but it wouldn't take a Python wiz to adapt it for other journals.
Feedback and improvements to the script are welcome!
7 Jun 2012

I thought I'd better figure out what Git and GitHub are. Git is a revision control system, enabling you to keep track of changes to computer software (for example). GitHub is a web-based hosting service for Git repositories. (The GitHub logo is the "octocat", by the way.)
It's all very simple to use too. My python/IDL interface, pIDLy, is now on GitHub, and it wasn't hard to set things up.
One of the points of GitHub is to make sharing and collaborating easier. So I'm kind of hoping that someone out there will want to do some work improving pIDLy, seeing as I'm spending most of my time on Java these days. I do get occasional emails from pIDLy users, so it is being used. But it's still quite easy to break it... The latest version on GitHub should be very slightly better than version 0.2.4, which I made four years ago and haven't touched since.
1 Jun 2012
That's right, not object-oriented, but aspect-oriented. What's that all about then?
Computers store data, and computer programs do stuff to the data. You can write computer programs in that way: basically as a list of things to do to some data. But those programs end up very messy (as I know from experience!). So we need a better way of representing what is going on.
Enter object-oriented programming (OOP). We represent what is going on under the surface in terms of objects. An object is the basic unit of a program. It hides (encapsulates) some data, and if you ask it nicely, it will do something and tell you what you need to know. A computer program is now a collection of objects that interact with each other to perform some complex task. This is much better, as the software developer can focus on crafting elegant objects (or, more accurately, kinds of objects, or classes). If these are well designed, the program will be robust, and you can change little bits of it without worrying that the whole thing will break.
But wait: what if we can't represent all that a program needs to do in terms of individual objects? What if there are some concerns that are relevant for every element of the program? For example, a typical program might be littered with statements dealing with logging, security, or handling errors. These are concerns that cut across the basic functionality of a program.
Enter aspect-oriented programming (AOP). This provides a way of separating out those cross-cutting concerns into separate aspects, which provide additional commands to be executed at particular points (or particular kinds of points) in the program execution.
I'm going to give it a try with the Java development I'm working on. I'll let you know how it goes. It should be useful for testing, allowing me to keep track of how many times a particular method is executed, what the values of the parameters are, how much time is spent on different steps, and what is making it go wrong, for example.
Fortunately, it's dead easy to get started. For Java, the most widely-used AOP extension is AspectJ. If you have a Java project in Eclipse, simply install the AspectJ Development Tools plugin, convert the project to an AspectJ project, and that's it. Here's "Hello world" using an aspect:
public class HelloWorld { private void sayHello() { System.out.println("Hello"); } public static void main(String[] args) { new HelloWorld().sayHello(); } } public aspect World { pointcut greeting(): execution(* HelloWorld.sayHello(..)); after() returning: greeting() { System.out.println("World!"); } } |
The sayHello method executes, and then the aspect interrupts the execution and prints "World!".
Part of the attraction of AOP to me is that it has resonances with the philosophy of Herman Dooyeweerd, who worked on a model of reality called the Theory of Aspects, in which the whole of reality can be described in terms of (around) 15 (cross-cutting) aspects. In fact, I first heard of AOP via Andrew Basden's page on the topic.
31 May 2012
Regular visitors to this, my boring technical blog, may be forgiven for thinking that nothing is going on. This is my fourth post here in two years, so that's an average of one post every eight months (think about it!). Now, I do want this blog to be boring in content (most people would think I'm succeeding in that), but not necessarily boring in that I never post anything. So here's an attempt to bring things up to speed...
What have I been doing for the past few years? Since October 2008 I've been working on data processing for the Herschel Space Observatory. In terms of the data, I've done a lot with HerMES, but more recently the focus has been on the software, working on a big Herschel software package, HIPE (about, download). As such—and much to my horror—I seem to be becoming a software developer. Java, to be more specific. Here are some things I've been working on recently:
6 Feb 2012
You have a function that you want to approximate as an N-dimensional (multivariate) Gaussian (normal) distribution. What do you do?
If you are me, you spend a couple of weeks deriving stuff, and then finally figure out the easy way.
But hopefully you are not me, so here is the easy way, to save you the bother.
First, find the peak of your function, and put a hat on the value(s) or the parameter(s) at this point. Now your (N-dimensional) function can be approximated as
![f(x) \simeq f(\hat{x})\exp \left[-\frac{1}{2} (x-\hat{x})^\mathrm{T} \Sigma^{-1} (x-\hat{x}) \right]](http://www.anthonysmith.me.uk/research/wp-content/plugins/latex/cache/tex_4ee981f1c61eb31439a1f4f78d44ed59.gif)
where
is the covariance matrix, and our task it to figure out what it is.
But fear not, help is at hand. If we find the following Hessian matrix, and if we assume that the covariance matrix is symmetric, we have

That's it. Easy. Now you can even integrate the function

27 Sep 2011
Well, it's finally here: HerMES: point source catalogues from deep Herschel-SPIRE observations, by yours truly and lots of other people otherwise known as "Al".
Some of the catalogues are available here, if you're interested.
The biggest challenge was confusion (about which there is a great deal of confusion). For example, in this HerMES image, everything you can see is light from galaxies, and each galaxy produces a small round blob in the image. The number of galaxies is so high, that light from any one galaxy is confused with light from numerous neighbouring galaxies.
So when the computer looks at the image and spots a blob with brightness 30 mJy, what does that mean? Does it mean there is one galaxy there with brightness 30 mJy? Or perhaps there are 10 galaxies, with brightnesses of 15 mJy or lower?
It's difficult to tell. But what you can do (and what we did in the paper) is to stick fake galaxies (blobs) into the image, and see what difference that makes. So if we have a fake galaxy with brightness 30 mJy, and we put it in at a random position, what effect will that have, on average? What is the probability that it will be detected? (This is approaching what we mean by the completeness of the catalogue, although it is far from easy to know what is meant by those simple words "it" and "detected"!) What brightness will it be measured to have, on average? 20 mJy? 30 mJy? 40 mJy? 100 mJy?
For very bright galaxies, it's relatively straightforward. But for fainter things, it can do your head in...
7 May 2010
In case anyone else using the MNRAS LaTeX class has been tearing their hair out inserting line breaks manually into long author (and institution) lists, using \newauthor, here's how I did it:
\author[J.~Bloggs et al.] {\parbox{\textwidth}{J.~Bloggs,$^{1}$\thanks{E-mail: \texttt{j.bloggs@no.where}} J.~Bloggs,$^{2,3}$ J.~Bloggs,$^{3}$ J.~Bloggs,$^{4}$ J.~Bloggs,$^{5}$ J.~Bloggs,$^{6}$ %... J.~Bloggs$^{99}$ and J.~Bloggs$^{100}$}\vspace{0.4cm}\\ \parbox{\textwidth}{$^{1}$Department of Something, Somewhere\\ $^{2}$Department of Something, Somewhere\\ $^{3}$Department of Something, Somewhere\\ $^{4}$Department of Something, Somewhere\\ $^{5}$Department of Something, Somewhere\\ $^{6}$Department of Something, Somewhere\\ %... $^{99}$Department of Something, Somewhere\\ $^{100}$Department of Something, Somewhere}} |
Next: a Python script to generate the above from a simple list of authors and affiliations.
30 Apr 2010
Here's a simple bit of statistics for a Friday lunchtime. You count the number of galaxies in a certain area on the sky (with the galaxies satisfying some specific properties, if you like). What is the true number density? Let the expected number be
(the true number density multiplied by the area on the sky) and the measured number be
. Then, in true Bayesian fashion, what we want is

Now, for the prior,
, we assume a prior which is flat on a logarithmic scale. That is, we guess (before making the observation) that the expected number is as likely to lie between 1 and 10 as it is to lie between 1000 and 10,000. (The alternative, a flat prior on a linear scale, would mean that we guess the true density is just as likely to lie between 10,001 and 10,010 as it is to lie between 1 and 10, which is ridiculous.) So
. The likelihood,
is given by the Poisson distribution. So, ignoring the normalizing factor of
,


And this is the Gamma distribution. Easy peasy.
9 Mar 2010
I've just learned that the Herschel Science Archive has been opened up to the world, so any old Tom, Dick or Harry can download the data and start writing their own Nature papers. Well, okay, most of the data is (are) proprietary, but there's quite a bit of public data on there. Here are some links.
19 Jan 2010
You (both of you) might well be interested in the new Astropython site, which looks excellent. Here's the site's own description:
Research in astronomy includes the analysis of astronomical images, parsing and manipulation of large catalogs, statistical yet often visual inference, and the creation of data visualizations for publication and dissemination of results.
The purpose of this web site is to act as a community knowledge base for performing this research with the open source Python language. It provides a forum for general discussion, advice, or relevant news items, collecting lists of useful resources, users' code snippets or scripts, and longer tutorials on specific topics. The topics within these pages are presented in a list view with the ability to sort by date or topic. A traditional "blog" view of the most recently posted topics is visible from the site Home page.
23 Sep 2009
Data volumes from multiple sky surveys have grown from gigabytes into terabytes during the past decade, and will grow from terabytes into tens (or hundreds) of petabytes in the next decade. ... For astronomy to effectively cope with and reap the maximum scientific return from existing and future large sky surveys, facilities, and data-producing projects, we need our own information science specialists. We therefore recommend the formal creation, recognition, and support of a major new discipline, which we call Astroinformatics. ... Now is the time for the recognition of Astroinformatics as an essential methodology of astronomical research. The future of astronomy depends on it.
4 Sep 2009
They're coming thick and fast now.
Here's a Python function to accompany the previous post. It's not maximally efficient, but should make sense...
from scipy import stats def gaussian_pixel(minxy, maxxy, sigma, meanxy=(0.,0.), norm=None): """Return the value of a pixel sampling a 2D Gaussian, normalized such that the area under the Gaussian is 1 (default) or such that the peak is given by norm.""" x1, y1 = minxy x2, y2 = maxxy x0, y0 = meanxy if norm is None: norm = 1. / 2 / math.pi / sigma ** 2 return norm * 2 * math.pi * sigma ** 2 / (x2 - x1) / (y2 - y1) * ( (1 - stats.erfc((x2 - x0) / math.sqrt(2) / sigma)) / 2. - (1 - stats.erfc((x1 - x0) / math.sqrt(2) / sigma)) / 2.) * ( (1 - stats.erfc((y2 - y0) / math.sqrt(2) / sigma)) / 2. - (1 - stats.erfc((y1 - y0) / math.sqrt(2) / sigma)) / 2.) |
4 Sep 2009
Yesterday I said that the PRF for a map in Jy/beam (or similar) should be normalized so that that peak is 1. But this is true only for an idealised (not pixelated) PRF, or if the map has infinitesimally small pixels.
If the pixels are larger than infinitesimal, as is generally the case, then the maximum value of the pixelated PRF will be the average value over the pixel, which will be less than 1.
For example, if the PRF is a two-dimensional Gaussian, centred on
, with standard deviation
, then the value in a pixel with
and
will be



Ugh. Let's make that simpler. For a PRF centred on
, and a pixel
, this is

As an example, the fairly-Gaussian beam for the Herschel Space Observatory SPIRE instrument has an FWHM of around 18", which corresponds to a standard deviation of around
. If we make a Jy/beam map with pixel size 6", then the peak value for a 1 Jy point source in the centre of a pixel will be

3 Sep 2009
You have a map and you know what a point source looks like. How do you filter the map so that the value of each pixel is now the most likely flux of a point source centred on that pixel? (An isolated point source, to be more precise.)
Easy.
First, find
, which is the point response function (PRF), telling you what a point source of flux 1 will look like in the map. This may be normalized so that the peak is 1 (if your map is in Jy/beam or similar), or so that
(if your map is in Jy/pixel or similar). If your map is in MJy/sr ... well, figure it out and add a comment below. Basically, if you normalize your PRF correctly, you won't need to worry about the map units in what follows. Phew.
Now the measured value of each pixel around the point source,
, will be

is the flux of the source and
is the noise, drawn from a normal distribution with mean zero and standard deviation
.
Now the badness of the fit is measured by the
, which is given by

,
will be at a minimum, so 

, we find the maximum likelihood solution 
Now just do this for each pixel in the map (corresponding to a point source centred on each pixel) and you're done.
Worked example.
is 0.5, 1.0 and 0.5, for three adjacent pixels (you'll have realised that the map is in Jy/beam or similar), and
is 1, 2 and 1 Jy/beam, for three adjacent pixels, with the same (tiny!) value of
for each pixel (in this case, we can ignore the value of
in what follows). So the flux at the central pixel is estimated to be

This is an example of a matched filter (I haven't read the page, but hopefully including the link will make me look clever). And, given that point sources are under no particular obligation to align themselves with the centres of the pixels of your map,
can easily be re-estimated for a source with a certain offset from the pixel centre.
13 May 2009
You have an image. Each pixel has a value with some uncertainty. How do you visualize the uncertainty in each pixel? Like this:
Here's the Python code
import numpy as np from matplotlib import pyplot as plt class FlickerImage(object): def __init__(self, im, err): self.im = im.copy() self.err = err.copy() finite = np.isfinite(self.im + self.err) self.vmin = (self.im - 2 * self.err)[finite].min() self.vmax = (self.im + 2 * self.err)[finite].max() self.im[np.invert(finite)] = self.vmax self.err[np.invert(finite)] = 0 def flicker(self): fg = plt.imshow(np.zeros(self.im.shape), interpolation='nearest', vmin=self.vmin, vmax=self.vmax) while True: ran = np.random.normal(size=im.shape) fg.set_data(im + err * ran) plt.draw() |
And here's an example script:
import pyfits f = pyfits.open('file.fits') im = f["IMAGE"].data err = f["ERROR"].data flicker_image = FlickerImage(im, err) flicker_image.flicker() |
1 Apr 2009
Here's an easy way to display FITS images (or any array) in DS9 using Python (with PyFITS, NumPy and Numdisplay, which is part of stsci_python). First launch DS9, then in Python:
import numdisplay import pyfits arr = pyfits.getdata('file.fits') numdisplay.display(arr) |
Easy!
Alternatively, the Kapteyn package seems excellent, and uses Python's matplotlib for displaying images. It requires WCSLIB to run, though, so the installation process is a bit longer [update: apparently that bit is no longer true - 17 Oct 2011].
A third option is to use python-sao:
import pysao import pyfits ds9 = pysao.ds9() f = pyfits.open('file.fits') ds9.view(f[0]) |
Easy again! And the WCS information is preserved, which doesn't seem to be the case with Numdisplay.
18 Mar 2009
Two methods of approximating a point-spread function in IDL:
1. StarFinder seems to do a great job at finding point sources in crowded fields. It includes a routine for generating the Airy pattern. For a 51 x 51 array, with the peak at [25, 25], and an FWHM of 8.0 pixels, this is the command:
psf = airy_pattern(51, 51, 25, 25, 2./8.0) isurface, psf |
The output looks something like this:

2. The IDL Astronomy User's Library contains a routine, psf_gaussian, that produces a Gaussian PSF. To produce the same as above, the command would be:
psf = psf_gaussian(npix=51, fwhm=8.0, /double) isurface, psf |
... which produces something like this:

3 Mar 2009
I've finally made the transition from IDL 6.4 to IDL 7. Here are my handy hints...
To transition to IDL Workbench: