Surveying the Universe
Anthony
This user hasn't shared any biographical information
Homepage: http://www.anthonysmith.me.uk
Posts by Anthony
MNRAS line wrapping in authors lists
May 7th
In case anyone else using the MNRAS LaTeX class has been tearing their hair out inserting line breaks manually into long author (and institution) lists, using \newauthor, here’s how I did it:
\author[J.~Bloggs et al.] {\parbox{\textwidth}{J.~Bloggs,$^{1}$\thanks{E-mail: \texttt{j.bloggs@no.where}} J.~Bloggs,$^{2,3}$ J.~Bloggs,$^{3}$ J.~Bloggs,$^{4}$ J.~Bloggs,$^{5}$ J.~Bloggs,$^{6}$ %... J.~Bloggs$^{99}$ and J.~Bloggs$^{100}$}\vspace{0.4cm}\\ \parbox{\textwidth}{$^{1}$Department of Something, Somewhere\\ $^{2}$Department of Something, Somewhere\\ $^{3}$Department of Something, Somewhere\\ $^{4}$Department of Something, Somewhere\\ $^{5}$Department of Something, Somewhere\\ $^{6}$Department of Something, Somewhere\\ %... $^{99}$Department of Something, Somewhere\\ $^{100}$Department of Something, Somewhere}}
Next: a Python script to generate the above from a simple list of authors and affiliations.
Bayesian number counts
Apr 30th
Here’s a simple bit of statistics for a Friday lunchtime. You count the number of galaxies in a certain area on the sky (with the galaxies satisfying some specific properties, if you like). What is the true number density? Let the expected number be
(the true number density multiplied by the area on the sky) and the measured number be
. Then, in true Bayesian fashion, what we want is

Now, for the prior,
, we assume a prior which is flat on a logarithmic scale. That is, we guess (before making the observation) that the expected number is as likely to lie between 1 and 10 as it is to lie between 1000 and 10,000. (The alternative, a flat prior on a linear scale, would mean that we guess the true density is just as likely to lie between 10,001 and 10,010 as it is to lie between 1 and 10, which is ridiculous.) So
. The likelihood,
is given by the Poisson distribution. So, ignoring the normalizing factor of
,


And this is the Gamma distribution. Easy peasy.
Herschel for everyone
Mar 9th
I’ve just learned that the Herschel Science Archive has been opened up to the world, so any old Tom, Dick or Harry can download the data and start writing their own Nature papers. Well, okay, most of the data is (are) proprietary, but there’s quite a bit of public data on there. Here are some links.
Astropython
Jan 19th
You (both of you) might well be interested in the new Astropython site, which looks excellent. Here’s the site’s own description:
Research in astronomy includes the analysis of astronomical images, parsing and manipulation of large catalogs, statistical yet often visual inference, and the creation of data visualizations for publication and dissemination of results.
The purpose of this web site is to act as a community knowledge base for performing this research with the open source Python language. It provides a forum for general discussion, advice, or relevant news items, collecting lists of useful resources, users’ code snippets or scripts, and longer tutorials on specific topics. The topics within these pages are presented in a list view with the ability to sort by date or topic. A traditional “blog” view of the most recently posted topics is visible from the site Home page.
Astroinformatics
Sep 23rd
Data volumes from multiple sky surveys have grown from gigabytes into terabytes during the past decade, and will grow from terabytes into tens (or hundreds) of petabytes in the next decade. … For astronomy to effectively cope with and reap the maximum scientific return from existing and future large sky surveys, facilities, and data-producing projects, we need our own information science specialists. We therefore recommend the formal creation, recognition, and support of a major new discipline, which we call Astroinformatics. … Now is the time for the recognition of Astroinformatics as an essential methodology of astronomical research. The future of astronomy depends on it.
Pixelating a 2-D Gaussian with Python
Sep 4th
They’re coming thick and fast now.
Here’s a Python function to accompany the previous post. It’s not maximally efficient, but should make sense…
from scipy import stats def gaussian_pixel(minxy, maxxy, sigma, meanxy=(0.,0.), norm=None): """Return the value of a pixel sampling a 2D Gaussian, normalized such that the area under the Gaussian is 1 (default) or such that the peak is given by norm.""" x1, y1 = minxy x2, y2 = maxxy x0, y0 = meanxy if norm is None: norm = 1. / 2 / math.pi / sigma ** 2 return norm * 2 * math.pi * sigma ** 2 / (x2 - x1) / (y2 - y1) * ( (1 - stats.erfc((x2 - x0) / math.sqrt(2) / sigma)) / 2. - (1 - stats.erfc((x1 - x0) / math.sqrt(2) / sigma)) / 2.) * ( (1 - stats.erfc((y2 - y0) / math.sqrt(2) / sigma)) / 2. - (1 - stats.erfc((y1 - y0) / math.sqrt(2) / sigma)) / 2.)
On the normalization of PRFs
Sep 4th
Yesterday I said that the PRF for a map in Jy/beam (or similar) should be normalized so that that peak is 1. But this is true only for an idealised (not pixelated) PRF, or if the map has infinitesimally small pixels.
If the pixels are larger than infinitesimal, as is generally the case, then the maximum value of the pixelated PRF will be the average value over the pixel, which will be less than 1.
For example, if the PRF is a two-dimensional Gaussian, centred on
, with standard deviation
, then the value in a pixel with
and
will be



Ugh. Let’s make that simpler. For a PRF centred on
, and a pixel
, this is

As an example, the fairly-Gaussian beam for the Herschel Space Observatory SPIRE instrument has an FWHM of around 18″, which corresponds to a standard deviation of around
. If we make a Jy/beam map with pixel size 6″, then the peak value for a 1 Jy point source in the centre of a pixel will be

No big deal really…
Estimating the flux of a point source
Sep 3rd
You have a map and you know what a point source looks like. How do you filter the map so that the value of each pixel is now the most likely flux of a point source centred on that pixel? (An isolated point source, to be more precise.)
Easy.
First, find
, which is the point response function (PRF), telling you what a point source of flux 1 will look like in the map. This may be normalized so that the peak is 1 (if your map is in Jy/beam or similar), or so that
(if your map is in Jy/pixel or similar). If your map is in MJy/sr … well, figure it out and add a comment below. Basically, if you normalize your PRF correctly, you won’t need to worry about the map units in what follows. Phew.
Now the measured value of each pixel around the point source,
, will be

is the flux of the source and
is the noise, drawn from a normal distribution with mean zero and standard deviation
.
Now the badness of the fit is measured by the
, which is given by

,
will be at a minimum, so 

, we find the maximum likelihood solution 
Now just do this for each pixel in the map (corresponding to a point source centred on each pixel) and you’re done.
Worked example.
is 0.5, 1.0 and 0.5, for three adjacent pixels (you’ll have realised that the map is in Jy/beam or similar), and
is 1, 2 and 1 Jy/beam, for three adjacent pixels, with the same (tiny!) value of
for each pixel (in this case, we can ignore the value of
in what follows). So the flux at the central pixel is estimated to be

This is an example of a matched filter (I haven’t read the page, but hopefully including the link will make me look clever). And, given that point sources are under no particular obligation to align themselves with the centres of the pixels of your map,
can easily be re-estimated for a source with a certain offset from the pixel centre.
Visualizing noisy images
May 13th
You have an image. Each pixel has a value with some uncertainty. How do you visualize the uncertainty in each pixel? Like this:
Here’s the Python code
import numpy as np from matplotlib import pyplot as plt class FlickerImage(object): def __init__(self, im, err): self.im = im.copy() self.err = err.copy() finite = np.isfinite(self.im + self.err) self.vmin = (self.im - 2 * self.err)[finite].min() self.vmax = (self.im + 2 * self.err)[finite].max() self.im[np.invert(finite)] = self.vmax self.err[np.invert(finite)] = 0 def flicker(self): fg = plt.imshow(np.zeros(self.im.shape), interpolation='nearest', vmin=self.vmin, vmax=self.vmax) while True: ran = np.random.normal(size=im.shape) fg.set_data(im + err * ran) plt.draw()
And here’s an example script:
import pyfits f = pyfits.open('file.fits') im = f["IMAGE"].data err = f["ERROR"].data flicker_image = FlickerImage(im, err) flicker_image.flicker()
Python, FITS and DS9
Apr 1st
Here’s an easy way to display FITS images (or any array) in DS9 using Python (with PyFITS, NumPy and Numdisplay, which is part of stsci_python). First launch DS9, then in Python:
import numdisplay import pyfits arr = pyfits.getdata('file.fits') numdisplay.display(arr)
Easy!
Alternatively, the Kapteyn package seems excellent, and uses Python’s matplotlib for displaying images. It requires WCSLIB to run, though, so the installation process is a bit longer.
A third option is to use python-sao:
import pysao import pyfits ds9 = pysao.ds9() f = pyfits.open('file.fits') ds9.view(f[0])
Easy again! And the WCS information is preserved, which doesn’t seem to be the case with Numdisplay.
PSFs in IDL
Mar 18th
Two methods of approximating a point-spread function in IDL:
1. StarFinder seems to do a great job at finding point sources in crowded fields. It includes a routine for generating the Airy pattern. For a 51 x 51 array, with the peak at [25, 25], and an FWHM of 8.0 pixels, this is the command:
psf = airy_pattern(51, 51, 25, 25, 2./8.0) isurface, psf
The output looks something like this:

2. The IDL Astronomy User’s Library contains a routine, psf_gaussian, that produces a Gaussian PSF. To produce the same as above, the command would be:
psf = psf_gaussian(npix=51, fwhm=8.0, /double) isurface, psf
… which produces something like this:

The eclipse of IDL 7
Mar 3rd
I’ve finally made the transition from IDL 6.4 to IDL 7. Here are my handy hints…
- IDL Workbench rocks! (This is because it is basically Eclipse, which is a proper development environment, unlike that hideous old IDLDE.)
- Another reason for using IDL Workbench (for me at least, and for now) is that IDL help doesn’t seem to work if Java 6 is the default (as it is on my Mac), but the help does work if launched through the IDL Workbench.
To transition to IDL Workbench:
- Import your code as described on David Fanning’s page – fret not, it’s easy and harmless
- Preferences -> IDL -> Startup file, if you have one, and
- Preferences -> IDL -> Paths -> Insert… for me it was just my idl folder, including all sub-folders, to mimic my $IDL_PATH environment variable.
The science of galaxy formation…
Aug 26th
…is the title of a provocative article by Gerry Gilmore(*) on today’s astro-ph. There’s a bit about the scientific method, such as:
The appropriate scientific methodology with which to address such questions is itself problematic: how does one apply what many consider the “traditional scientific method”, involving objective analysis of independent repeated experiments as a test of theory, when the Universe does not allow us to experiment, in the traditional laboratory physics sense; when we have no useful predictive theory for much of astrophysics; and when the nature of the Universe may restrict our observation to only a very small part of an unobservable larger whole? More specifically, is the observational test of prediction how science actually operates? Is that how astrophysics operates?
Good stuff. But the most cutting remarks come in his assessment of the current approach to modelling galaxy formation:
Such a long list of observations all inconsistent with apparently fundamental features of galaxy formation models suggests two approaches. In one approach, new complex physics (“feedback”) must be added, to “improve” agreement with observation. The appearances are to be saved. In another, common assumptions in the galaxy simulations could be examined further.
With the reference to the saving of appearances, the allusion is to Ptolemy’s epicycles: making a misguided model seem more plausible by making it more contrived.
The specific problem Gilmore sees with cosmological simulations is the suppression of the “ultraviolet divergence”, i.e., small-scale perturbations, by “numerical smoothing (‘finite resolution’)”: “It is unlikely that Nature does it that way.” He suggests that many of the inconsistencies between galaxy formation models and observations could be a result of this poor handling of the small-scale power spectrum.
(*) Disclaimer: I will not be held responsible for any damage sustained to your eyes as a result of following links on this page.
Papers: your personal library of science
Jul 22nd
Looking for a piece of software for your Mac that will allow you to:
- keep track of PDFs of academic papers,
- search for papers using Google Scholar, ADS, arXiv, …,
- search your personal library in an instant,
- read papers full-screen,
- add notes to papers,
- organise the papers using collections and smart collections,
- interact with BibTeX databases and citation keys,
- and do all the above in something that looks and feels like iTunes?
Here it is: Papers by mekentosj.com.
Galaxy Zoo: the independence of morphology and colour
Jun 3rd
Galaxies come in two types: red, elliptical galaxies that reside in high-density regions, and blue, spiral galaxies that reside in low-density regions. Right?
Actually, no.
At least, not according to this Galaxy Zoo paper, on the independence of morphology and colour (or here).
First of all, there’s a sizeable population of galaxies that blatantly refuse to allow their colour to determine what shape they should be. There are red galaxies with beautiful spiral morphology and blue galaxies with plain old elliptical morphology.
Okay, but we know that red galaxies like to hang out in crowded places, and that elliptical galaxies are similarly gregarious, so clearly there’s some connection between being red and being well-rounded?
Nope, wrong again!
The main reason that we see more red galaxies in dense environments is that the fraction of spiral galaxies that are red changes, and the fraction of elliptical galaxies that are blue changes. So in sparsely populated bits of the universe, most of the spiral galaxies are blue, but in densely populated regions, most of the spiral galaxies are red. It’s similar for elliptical galaxies. In low-density regions, a large fraction (not quite half) of the elliptical galaxies are blue, whereas in dense environments the vast majority of elliptical galaxies are red.
So the morphology-density relation has really very little (directly) to do with the colour-density relation.
Moral: “elliptical/spiral” doesn’t mean “red/blue”!
UKIDSS paper submitted
Jun 3rd
Well, the deed has been done, and the paper has finally been submitted to MNRAS and to astro-ph. You can read it if you really want to: Luminosity and surface brightness distribution of K-band galaxies from the UKIDSS Large Area Survey. Here’s a picture from the paper:
This is the K-band luminosity function: the number of galaxies per volume as a function of their luminosity, with low luminosity at the left and high luminosity at the right. It’s far from perfect, but hopefully a step in the right direction. There’s quite a bit of incompleteness (missing galaxies) and uncertainty (due to small numbers of galaxies and large-scale structure) at the faint end (left-hand side of the plot). But perhaps more interesting is the disagreement at the bright end (right-hand side). All of the previous results shown on the plot used 2MASS imaging, so this might explain the different results we have found. Specifically, it could be that (1) we use Petrosian magnitudes rather than Kron or total magnitudes, (2) UKIDSS photometry is better than 2MASS photometry, (3) the evolution corrections are different, (4) something else or (5) any combination of the above.
Evolution of Schechter function … so?
Apr 4th
This is some work in progress: K-band luminosity function from the UKIDSS Large Area Survey (LAS, black dots), showing the number of galaxies per unit volume depending on the luminosity of the galaxies, from faint (left) to bright (right). I.e., there are lots more small galaxies than big galaxies.
I’ve fit several Schechter functions to the data. This is a convenient way of describing the luminosity function in terms of three numbers: the slope of the faint end (alpha), the luminosity brighter than which the number of galaxies drops off rapidly (M-star) and the number of galaxies per unit volume at M-star (phi-star). To fit the Schechter functions I’ve used only a portion of the data, as shown in the figure. For example, for the green curve, I’ve used only the black points brighter than (to the right of) absolute magnitude -21.
Now here’s the point. At high redshift, it is possible to see only the brightest galaxies. So we would be able to plot only the black points towards the right-hand side of the figure. But what effect would this have on the Schechter function? Even if we assume the luminosity function does not vary with redshift, our Schechter function fits would! In fact, if we relied on the Schechter function fit to tell us how the galaxy population varied with redshift (a silly thing to do, but people do it all the time), we would infer that the high-redshift galaxy population was (1) brighter (2) more dominated by small galaxies and (3) less abundant than the low-redshift galaxy population.
(Now (1) and (3) are probably true, but we don’t need the Schechter function to tell us. Not so sure about (2).)
Moral: don’t rely on the Schechter function!
A galaxy being emitted by a star
Feb 28th
pIDLy: IDL within Python
Jan 31st
Now Python and IDL can talk to each other (okay, Python talks to IDL and IDL does what it’s told), using pIDLy (pronounce as you please). I experimented with a few other solutions available online but couldn’t get them to work. So I cobbled this one together with surprisingly little trouble, thanks largely to pexpect.




I am a research fellow in