Herschel for everyone

I’ve just learned that the Herschel Science Archive has been opened up to the world, so any old Tom, Dick or Harry can download the data and start writing their own Nature papers. Well, okay, most of the data is (are) proprietary, but there’s quite a bit of public data on there. Here are some links.

Astropython

You (both of you) might well be interested in the new Astropython site, which looks excellent. Here’s the site’s own description:

Research in astronomy includes the analysis of astronomical images, parsing and manipulation of large catalogs, statistical yet often visual inference, and the creation of data visualizations for publication and dissemination of results.

The purpose of this web site is to act as a community knowledge base for performing this research with the open source Python language. It provides a forum for general discussion, advice, or relevant news items, collecting lists of useful resources, users’ code snippets or scripts, and longer tutorials on specific topics. The topics within these pages are presented in a list view with the ability to sort by date or topic. A traditional “blog” view of the most recently posted topics is visible from the site Home page.

Astroinformatics

Data volumes from multiple sky surveys have grown from gigabytes into terabytes during the past decade, and will grow from terabytes into tens (or hundreds) of petabytes in the next decade. … For astronomy to effectively cope with and reap the maximum scientific return from existing and future large sky surveys, facilities, and data-producing projects, we need our own information science specialists. We therefore recommend the formal creation, recognition, and support of a major new discipline, which we call Astroinformatics. … Now is the time for the recognition of Astroinformatics as an essential methodology of astronomical research. The future of astronomy depends on it.

Astroinformatics: A 21st Century Approach to Astronomy

Pixelating a 2-D Gaussian with Python

They’re coming thick and fast now.

Here’s a Python function to accompany the previous post. It’s not maximally efficient, but should make sense…

import math
from scipy.special import erf

def gaussian_pixel(minxy, maxxy, sigma, meanxy=(0., 0.), norm=None):
    """Return the value of a pixel sampling a 2D Gaussian,
    normalized such that the area under the Gaussian is 1
    (default) or such that the peak is given by norm."""
    x1, y1 = minxy
    x2, y2 = maxxy
    x0, y0 = meanxy
    if norm is None:
        # Peak value of a unit-area 2D Gaussian
        norm = 1. / (2 * math.pi * sigma ** 2)
    s = math.sqrt(2) * sigma
    # Fraction of the Gaussian's volume lying between the pixel edges, in x and in y
    fx = (erf((x2 - x0) / s) - erf((x1 - x0) / s)) / 2.
    fy = (erf((y2 - y0) / s) - erf((y1 - y0) / s)) / 2.
    # Average value over the pixel = enclosed volume / pixel area
    return norm * 2 * math.pi * sigma ** 2 * fx * fy / ((x2 - x1) * (y2 - y1))

On the normalization of PRFs

Yesterday I said that the PRF for a map in Jy/beam (or similar) should be normalized so that the peak is 1. But this is true only for an idealised (not pixelated) PRF, or if the map has infinitesimally small pixels.

If the pixels are larger than infinitesimal, as is generally the case, then the maximum value of the pixelated PRF will be the average value over the pixel, which will be less than 1.

For example, if the PRF is a two-dimensional Gaussian, centred on (x_0, y_0), with standard deviation \sigma, then the value in a pixel with x_1 < x < x_2 and y_1 < y < y_2 will be

A = \frac{1}{(x_2 - x_1)(y_2 - y_1)} \int_{x_1}^{x_2} \int_{y_1}^{y_2} e^{- \left( \frac{(x-x_0)^2 + (y-y_0)^2}{2\sigma^2} \right)} \mathrm{d}x \mathrm{d}y.
which is
A = \frac{2 \pi \sigma^2}{(x_2 - x_1)(y_2 - y_1)} \left( \frac{1}{2} \mathrm{erf} \left( \frac{x_2 - x_0}{\sqrt{2}\sigma} \right) - \frac{1}{2} \mathrm{erf} \left( \frac{x_1 - x_0}{\sqrt{2}\sigma} \right) \right)
 \times \left( \frac{1}{2} \mathrm{erf} \left( \frac{y_2 - y_0}{\sqrt{2}\sigma} \right) - \frac{1}{2} \mathrm{erf} \left( \frac{y_1 - y_0}{\sqrt{2}\sigma} \right) \right).

Ugh. Let’s make that simpler. For a PRF centred on (0,0), and a pixel with -r < x < r and -r < y < r, this is

A = \frac{\pi \sigma^2}{2 r^2} \left( \mathrm{erf} \left( \frac{r}{\sqrt{2}\sigma} \right) \right)^2.

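As an example, the fairly Gaussian beam of the Herschel Space Observatory SPIRE instrument has an FWHM of around 18″, which corresponds to a standard deviation of around 18″/(2\sqrt{2 \ln 2}) \approx 7.64″. If we make a Jy/beam map with pixel size 6″ (so r = 3″), then the peak value for a 1 Jy point source in the centre of a pixel will be

A = \frac{\pi (7.64)^2}{2 \times 3^2} \left( \mathrm{erf} \left( \frac{3}{\sqrt{2} \times 7.64} \right) \right)^2 \approx 0.95 \,\mathrm{Jy/beam}.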

No big deal really…
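As a quick check on the SPIRE numbers, here is a short Python sketch that evaluates the closed-form expression for a pixel centred on the PRF and compares it with a brute-force average of the peak-1 Gaussian over a fine grid of sub-pixels. The 18″ FWHM and 6″ pixel are the values from the example; the function names and the 201 × 201 sub-grid are arbitrary choices for illustration.

```python
import math
import numpy as np
from scipy.special import erf

def centred_pixel_peak(sigma, pixel):
    """Closed form: mean of a peak-1 Gaussian over a square pixel centred on it."""
    r = pixel / 2.0
    return math.pi * sigma ** 2 / (2 * r ** 2) * erf(r / (math.sqrt(2) * sigma)) ** 2

def brute_force_peak(sigma, pixel, nsub=201):
    """Midpoint-rule average of the peak-1 Gaussian over nsub x nsub sub-pixels."""
    r = pixel / 2.0
    edges = np.linspace(-r, r, nsub + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    x, y = np.meshgrid(centres, centres)
    return np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)).mean()

fwhm = 18.0  # arcsec, approximate SPIRE beam from the example
sigma = fwhm / (2 * math.sqrt(2 * math.log(2)))  # about 7.64 arcsec
pixel = 6.0  # arcsec
print(sigma, centred_pixel_peak(sigma, pixel), brute_force_peak(sigma, pixel))
```

Both estimates come out at about 0.95 for a 1 Jy source; shrinking the pixel towards zero sends the value back up towards 1, the idealised peak.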

Estimating the flux of a point source

You have a map and you know what a point source looks like. How do you filter the map so that the value of each pixel is now the most likely flux of a point source centred on that pixel? (An isolated point source, to be more precise.)

Easy.

First, find P_i, which is the point response function (PRF), telling you what a point source of flux 1 will look like in the map. This may be normalized so that the peak is 1 (if your map is in Jy/beam or similar), or so that \sum P_i = 1 (if your map is in Jy/pixel or similar). If your map is in MJy/sr … well, figure it out and add a comment below. Basically, if you normalize your PRF correctly, you won’t need to worry about the map units in what follows. Phew.

Now the measured value of each pixel around the point source, d_i, will be

d_i = f P_i + n_i,
where f is the flux of the source and n_i is the noise, drawn from a normal distribution with mean zero and standard deviation \sigma_i.

Now the badness of the fit is measured by the \chi^2, which is given by

\chi^2 = \sum_i \left( \frac{d_i - fP_i}{\sigma_i} \right)^2.
At the maximum likelihood value of the flux, f, \chi^2 will be at a minimum, so
\frac{\mathrm{d}\chi^2}{\mathrm{d}f} = 0.
Hence
\sum_i (-2 d_i P_i + 2 f P_i^2) / \sigma_i^2 = 0.
Solving this for f, we find the maximum likelihood solution
f = \frac{\sum_i d_i P_i / \sigma_i^2}{\sum_i P_i^2 / \sigma_i^2}.

Now just do this for each pixel in the map (corresponding to a point source centred on each pixel) and you’re done.

Worked example. P_i is 0.5, 1.0 and 0.5 for three adjacent pixels (you’ll have realised that the map is in Jy/beam or similar), and d_i is 1, 2 and 1 Jy/beam for the same three pixels, with the same (tiny!) value of \sigma_i for each pixel (in this case, the \sigma_i cancel and we can ignore them in what follows). So the flux at the central pixel is estimated to be

f = (0.5 \times 1 + 1.0 \times 2 + 0.5 \times 1) / (0.5^2 + 1.0^2 + 0.5^2) = 2 \,\mathrm{Jy},
which is no surprise, since the maximum value of the map in Jy/beam is 2 at that position.

This is an example of a matched filter (I haven’t read the page, but hopefully including the link will make me look clever). And, given that point sources are under no particular obligation to align themselves with the centres of the pixels of your map, P_i can easily be re-estimated for a source with a certain offset from the pixel centre.
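Here is a minimal Python sketch of the estimator f = \sum_i d_i P_i / \sigma_i^2 / \sum_i P_i^2 / \sigma_i^2 applied to every pixel at once. It assumes the PRF is given as an odd-sized array centred on its middle pixel, that the source is centred on a pixel, and that the noise map sigma has the same shape as the data (or is a single number). Using scipy.ndimage.correlate for the two sums is just one convenient choice; pixels beyond the map edge get zero weight.

```python
import numpy as np
from scipy import ndimage

def matched_filter_flux(data, sigma, prf):
    """Maximum-likelihood flux of an isolated point source centred on each pixel.

    data  : map (1D or 2D array)
    sigma : noise per pixel, same shape as data, or a scalar
    prf   : odd-sized array with the PRF centred on its middle pixel
    """
    data = np.asarray(data, dtype=float)
    prf = np.asarray(prf, dtype=float)
    inv_var = 1.0 / np.broadcast_to(np.asarray(sigma, dtype=float), data.shape) ** 2
    # Numerator: sum_i d_i P_i / sigma_i^2 for a source centred on each pixel
    num = ndimage.correlate(data * inv_var, prf, mode='constant', cval=0.0)
    # Denominator: sum_i P_i^2 / sigma_i^2 (zero weight outside the map)
    den = ndimage.correlate(inv_var, prf ** 2, mode='constant', cval=0.0)
    return num / den

# The worked example, padded with empty pixels (map in Jy/beam, PRF peak 1)
d = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
P = np.array([0.5, 1.0, 0.5])
f = matched_filter_flux(d, 0.01, P)
print(f[2])
```

The same function works unchanged on a 2D map with a 2D PRF; for a source offset from the pixel centre, the PRF array would need to be re-sampled at that offset first.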

Visualizing noisy images

You have an image. Each pixel has a value with some uncertainty. How do you visualize the uncertainty in each pixel? Like this:

flicker_image

Here’s the Python code:

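import numpy as np
from matplotlib import pyplot as plt

class FlickerImage(object):
    def __init__(self, im, err):
        # Float copies, so the caller's arrays are left untouched
        self.im = np.array(im, dtype=float)
        self.err = np.array(err, dtype=float)
        finite = np.isfinite(self.im + self.err)
        # Fixed colour scale covering +/- 2 sigma around the data
        self.vmin = (self.im - 2 * self.err)[finite].min()
        self.vmax = (self.im + 2 * self.err)[finite].max()
        # Show bad pixels as a steady maximum value
        self.im[~finite] = self.vmax
        self.err[~finite] = 0
    def flicker(self, pause=0.1):
        fg = plt.imshow(self.im, interpolation='nearest',
                        vmin=self.vmin, vmax=self.vmax)
        # Draw a new noise realisation every `pause` seconds until the window is closed
        while plt.fignum_exists(fg.figure.number):
            ran = np.random.normal(size=self.im.shape)
            fg.set_data(self.im + self.err * ran)
            plt.pause(pause)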

And here’s an example script:

import pyfits
f = pyfits.open('file.fits')
im = f["IMAGE"].data
err = f["ERROR"].data
flicker_image = FlickerImage(im, err)
flicker_image.flicker()

Python, FITS and DS9

Here’s an easy way to display FITS images (or any array) in DS9 using Python (with PyFITS, NumPy and Numdisplay, which is part of stsci_python). First launch DS9, then in Python:

import numdisplay
import pyfits
arr = pyfits.getdata('file.fits')
numdisplay.display(arr)

Easy!

Alternatively, the Kapteyn package seems excellent, and uses Python’s matplotlib for displaying images. It requires WCSLIB to run, though, so the installation process is a bit longer.

A third option is to use python-sao:

import pysao
import pyfits
ds9 = pysao.ds9()
f = pyfits.open('file.fits')
ds9.view(f[0])

Easy again! And the WCS information is preserved, which doesn’t seem to be the case with Numdisplay.

PSFs in IDL

Two methods of approximating a point-spread function in IDL:

1. StarFinder seems to do a great job at finding point sources in crowded fields. It includes a routine for generating the Airy pattern. For a 51 x 51 array, with the peak at [25, 25], and an FWHM of 8.0 pixels, this is the command:

psf = airy_pattern(51, 51, 25, 25, 2./8.0)
isurface, psf

The output looks something like this:

PSF Airy pattern (StarFinder/IDL)

2. The IDL Astronomy User’s Library contains a routine, psf_gaussian, that produces a Gaussian PSF. To produce the same as above, the command would be:

psf = psf_gaussian(npix=51, fwhm=8.0, /double)
isurface, psf

… which produces something like this:

PSF Gaussian (IDL Astronomy User's Library)
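For anyone without IDL, here is a rough Python sketch of the same two PSFs. Both are sampled at pixel centres (not integrated over pixels) and normalized to a peak of 1, and the Airy profile is scaled using the standard result that (2J_1(x)/x)^2 falls to half its peak at x ≈ 1.6163. The function names are made up here, and the output has not been checked pixel-for-pixel against StarFinder's airy_pattern or psf_gaussian.

```python
import numpy as np
from scipy.special import j1

# (2 J1(x) / x)^2 drops to half its peak at x ~= 1.6163
AIRY_HALF_MAX_X = 1.6163

def radius_grid(npix, xc, yc):
    """Distance of each pixel centre from (xc, yc), in pixels."""
    y, x = np.indices((npix, npix))
    return np.hypot(x - xc, y - yc)

def airy_psf(npix, xc, yc, fwhm):
    """Airy pattern with peak 1, sampled at pixel centres."""
    # Scale the radius so that r = fwhm / 2 lands on the half-maximum point
    x = 2 * AIRY_HALF_MAX_X * radius_grid(npix, xc, yc) / fwhm
    psf = np.ones_like(x)  # (2 J1(x) / x)^2 tends to 1 as x -> 0
    nz = x > 0
    psf[nz] = (2 * j1(x[nz]) / x[nz]) ** 2
    return psf

def gaussian_psf(npix, xc, yc, fwhm):
    """Circular Gaussian with peak 1, sampled at pixel centres."""
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))
    r = radius_grid(npix, xc, yc)
    return np.exp(-r ** 2 / (2 * sigma ** 2))

# Same set-up as the IDL commands: 51 x 51, peak at [25, 25], FWHM 8 pixels
psf_airy = airy_psf(51, 25, 25, 8.0)
psf_gauss = gaussian_psf(51, 25, 25, 8.0)
```

Plotting row 25 of each array shows the main difference: the Airy pattern has faint rings beyond its first dark ring, which the Gaussian lacks.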

The eclipse of IDL 7

I’ve finally made the transition from IDL 6.4 to IDL 7. Here are my handy hints…

  • IDL Workbench rocks! (This is because it is basically Eclipse, which is a proper development environment, unlike that hideous old IDLDE.)
  • Another reason for using IDL Workbench (for me at least, and for now) is that IDL help doesn’t seem to work if Java 6 is the default (as it is on my Mac), but the help does work if launched through the IDL Workbench.

To transition to IDL Workbench:

  1. Import your code as described on David Fanning’s page – fret not, it’s easy and harmless
  2. Preferences -> IDL -> Startup file, if you have one, and
  3. Preferences -> IDL -> Paths -> Insert… for me it was just my idl folder, including all sub-folders, to mimic my $IDL_PATH environment variable.

The science of galaxy formation…

…is the title of a provocative article by Gerry Gilmore(*) on today’s astro-ph. There’s a bit about the scientific method, such as:

The appropriate scientific methodology with which to address such questions is itself problematic: how does one apply what many consider the “traditional scientific method”, involving objective analysis of independent repeated experiments as a test of theory, when the Universe does not allow us to experiment, in the traditional laboratory physics sense; when we have no useful predictive theory for much of astrophysics; and when the nature of the Universe may restrict our observation to only a very small part of an unobservable larger whole? More specifically, is the observational test of prediction how science actually operates? Is that how astrophysics operates?

Good stuff. But the most cutting remarks come in his assessment of the current approach to modelling galaxy formation:

Such a long list of observations all inconsistent with apparently fundamental features of galaxy formation models suggests two approaches. In one approach, new complex physics (“feedback”) must be added, to “improve” agreement with observation. The appearances are to be saved. In another, common assumptions in the galaxy simulations could be examined further.

With the reference to the saving of appearances, the allusion is to Ptolemy’s epicycles: making a misguided model seem more plausible by making it more contrived.

The specific problem Gilmore sees with cosmological simulations is the suppression of the “ultraviolet divergence”, i.e., small-scale perturbations, by “numerical smoothing (‘finite resolution’)”: “It is unlikely that Nature does it that way.” He suggests that many of the inconsistencies between galaxy formation models and observations could be a result of this poor handling of the small-scale power spectrum.

(*) Disclaimer: I will not be held responsible for any damage sustained to your eyes as a result of following links on this page.

Papers: your personal library of science

Looking for a piece of software for your Mac that will allow you to:

  • keep track of PDFs of academic papers,
  • search for papers using Google Scholar, ADS, arXiv, …,
  • search your personal library in an instant,
  • read papers full-screen,
  • add notes to papers,
  • organise the papers using collections and smart collections,
  • interact with BibTeX databases and citation keys,
  • and do all the above in something that looks and feels like iTunes?

Here it is: Papers by mekentosj.com.

Galaxy Zoo: the independence of morphology and colour

Galaxies come in two types: red, elliptical galaxies that reside in high-density regions, and blue, spiral galaxies that reside in low-density regions. Right?

Actually, no.

At least, not according to this Galaxy Zoo paper, on the independence of morphology and colour (or here).

First of all, there’s a sizeable population of galaxies that blatantly refuse to allow their colour to determine what shape they should be. There are red galaxies with beautiful spiral morphology and blue galaxies with plain old elliptical morphology.

Okay, but we know that red galaxies like to hang out in crowded places, and that elliptical galaxies are similarly gregarious, so clearly there’s some connection between being red and being well-rounded?

Nope, wrong again!

The main reason that we see more red galaxies in dense environments is that the fraction of spiral galaxies that are red changes, and the fraction of elliptical galaxies that are blue changes. So in sparsely populated bits of the universe, most of the spiral galaxies are blue, but in densely populated regions, most of the spiral galaxies are red. It’s similar for elliptical galaxies. In low-density regions, a large fraction (not quite half) of the elliptical galaxies are blue, whereas in dense environments the vast majority of elliptical galaxies are red.

So the morphology-density relation has really very little (directly) to do with the colour-density relation.

Moral: “elliptical/spiral” doesn’t mean “red/blue”!

UKIDSS paper submitted

Well, the deed has been done, and the paper has finally been submitted to MNRAS and to astro-ph. You can read it if you really want to: Luminosity and surface brightness distribution of K-band galaxies from the UKIDSS Large Area Survey. Here’s a picture from the paper:

UKIDSS K-band Luminosity Function

This is the K-band luminosity function: the number of galaxies per volume as a function of their luminosity, with low luminosity at the left and high luminosity at the right. It’s far from perfect, but hopefully a step in the right direction. There’s quite a bit of incompleteness (missing galaxies) and uncertainty (due to small numbers of galaxies and large-scale structure) at the faint end (left-hand side of the plot). But perhaps more interesting is the disagreement at the bright end (right-hand side). All of the previous results shown on the plot used 2MASS imaging, so this might explain the different results we have found. Specifically, it could be that (1) we use Petrosian magnitudes rather than Kron or total magnitudes, (2) UKIDSS photometry is better than 2MASS photometry, (3) the evolution corrections are different, (4) something else or (5) any combination of the above.

Evolution of Schechter function … so?

Schechter function

This is some work in progress: K-band luminosity function from the UKIDSS Large Area Survey (LAS, black dots), showing the number of galaxies per unit volume depending on the luminosity of the galaxies, from faint (left) to bright (right). I.e., there are lots more small galaxies than big galaxies.

I’ve fit several Schechter functions to the data. This is a convenient way of describing the luminosity function in terms of three numbers: the slope of the faint end (alpha), the characteristic absolute magnitude brighter than which the number of galaxies drops off rapidly (M-star) and a normalization (phi-star) that sets the overall number of galaxies per unit volume. To fit the Schechter functions I’ve used only a portion of the data, as shown in the figure. For example, for the green curve, I’ve used only the black points brighter than (to the right of) absolute magnitude -21.

Now here’s the point. At high redshift, it is possible to see only the brightest galaxies. So we would be able to plot only the black points towards the right-hand side of the figure. But what effect would this have on the Schechter function? Even if we assume the luminosity function does not vary with redshift, our Schechter function fits would! In fact, if we relied on the Schechter function fit to tell us how the galaxy population varied with redshift (a silly thing to do, but people do it all the time), we would infer that the high-redshift galaxy population was (1) brighter, (2) more dominated by small galaxies and (3) less abundant than the low-redshift galaxy population.

(Now (1) and (3) are probably true, but we don’t need the Schechter function to tell us. Not so sure about (2).)

Moral: don’t rely on the Schechter function!
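For anyone wanting to try this experiment, here is a minimal sketch of fitting a Schechter function to only the bright part of a binned luminosity function. It assumes the magnitude form of the Schechter function, binned number densities phi with errors phi_err, and scipy's curve_fit; the parameter values in the usage lines are made up for illustration and are not results from the UKIDSS data.

```python
import numpy as np
from scipy.optimize import curve_fit

def schechter_mag(M, phi_star, M_star, alpha):
    """Schechter luminosity function per unit absolute magnitude."""
    x = 10 ** (0.4 * (M_star - M))  # L / L*
    return 0.4 * np.log(10) * phi_star * x ** (alpha + 1) * np.exp(-x)

def fit_schechter(M, phi, phi_err, M_limit, p0=(1e-2, -23.0, -1.0)):
    """Fit (phi_star, M_star, alpha) using only points brighter than M_limit,
    i.e. with absolute magnitude M < M_limit."""
    use = M < M_limit
    popt, pcov = curve_fit(schechter_mag, M[use], phi[use], p0=p0,
                           sigma=phi_err[use], absolute_sigma=True)
    return popt, np.sqrt(np.diag(pcov))

# Hypothetical noiseless "data" drawn from made-up parameters
M = np.arange(-26.0, -17.0, 0.25)
truth = np.array([1e-2, -23.2, -1.0])
phi = schechter_mag(M, *truth)
for M_limit in (-17.0, -21.0):
    popt, perr = fit_schechter(M, phi, 0.1 * phi, M_limit)
    print(M_limit, popt)
```

With noiseless data drawn from a Schechter function, every magnitude cut should recover the input parameters, which makes a useful sanity check; the interesting behaviour described above only appears with real (or noisy, non-Schechter) data, where changing M_limit shifts the fitted parameters.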

A galaxy being emitted by a star

Star emitting galaxy

Why is the universe so crowded? This kind of thing is really messing up my data!

Makes me want to work with simulations…

pIDLy: IDL within Python

Now Python and IDL can talk to each other (okay, Python talks to IDL and IDL does what it’s told), using pIDLy (pronounce as you please). I experimented with a few other solutions available online but couldn’t get them to work. So I cobbled this one together with surprisingly little trouble, thanks largely to pexpect.

IDL code miscellany

IDLdoc 3.0 (more info here) gives my badly-written bits of IDL the deceptive appearance of being well designed, useful and user-friendly. So I’ve made a few available here for your enjoyment.

UKIDSS at ESO

Just back from my first visit to Garching (near Munich). ESO, to be more specific. The reason for the visit: a three-day workshop on Science from UKIDSS.

Here’s the gist of it. Lots of good results already, lots of work in progress, and a sense that UKIDSS has come of age: the needle-in-a-haystack hunters now have enough hay (they hope!) to find some record-breaking needles (the smallest, nearest or furthest known luminiferous objects in the Universe) and the (Galactic or extra-Galactic) Gallup pollers have now canvassed enough individuals (stars or galaxies) to be reasonably confident about the views of the whole population.

I’m one of the extra-Galactic Gallup pollers. Some slides from the talk I gave on the final morning are on my (small but growing!) publications page.

Next tasks:

  • Investigate the problem with deblending of large galaxies
  • Write paper
  • Write thesis
  • Get job

Filtering astro-ph with CosmoCoffee

One of the things mentioned in Sarah Bridle’s talk at YAM last week was a filter for arXiv.org provided by CosmoCoffee. I decided to sample it this week.

CosmoCoffee

After creating an account on CosmoCoffee, you will need to edit the keywords in your profile to reflect your interests (well, I did!). Then click on Arxiv new filter and you’re off!

Here are my settings:

  • Arxives in order of interest: astro-ph
  • Arxiv New search key strings: galaxy (redshift )?survey, luminosity (function|density), surface brightness, UKIDSS, UKIRT, VISTA, SDSS, Sloan, WFCAM, near infrared, stellar mass, star formation (rate|history), galax, Bayes, redshift, astro-ph, ADS, extragalactic
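CosmoCoffee's actual matching and ranking rules aren't spelled out here, so the following is only a hypothetical sketch of a filter in the same spirit. It assumes each key string is a case-insensitive regular expression that must start at a word boundary, and ranks papers by how many distinct keys match their title plus abstract; the sample papers are invented.

```python
import re

# The key strings from the settings above
KEYS = [
    r"galaxy (redshift )?survey", r"luminosity (function|density)",
    r"surface brightness", r"UKIDSS", r"UKIRT", r"VISTA", r"SDSS", r"Sloan",
    r"WFCAM", r"near infrared", r"stellar mass", r"star formation (rate|history)",
    r"galax", r"Bayes", r"redshift", r"astro-ph", r"ADS", r"extragalactic",
]
# Assumption: case-insensitive, anchored at the start of a word
PATTERNS = [re.compile(r"\b(?:" + k + ")", re.IGNORECASE) for k in KEYS]

def score(text):
    """Number of distinct keys that match somewhere in text."""
    return sum(1 for p in PATTERNS if p.search(text))

def filter_and_rank(papers):
    """papers: list of (title, abstract). Keep those matching at least one key,
    most matches first; ties keep their original listing order."""
    scored = [(score(title + " " + abstract), title) for title, abstract in papers]
    kept = [(s, t) for s, t in scored if s > 0]
    kept.sort(key=lambda st: st[0], reverse=True)  # sort is stable
    return [t for _, t in kept]

papers = [
    ("The K-band luminosity function from UKIDSS", "We measure the local luminosity function."),
    ("Planetary nebulae in the Milky Way", "We study dying stars."),
    ("Star formation rate in nearby galaxies", "Using SDSS spectra we measure the star formation history."),
]
print(filter_and_rank(papers))
```

Counting distinct keys rather than total matches stops one repeated phrase from dominating the ranking; a real filter would probably also weight keys by importance.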

And here are the results:

  • Monday: 54 new on astro-ph, of which 21 made it through the filter. These were not only filtered but also sorted by CosmoCoffee so the most relevant were listed first. Very useful.
  • Tuesday: 76 on astro-ph; 27 on CosmoCoffee. It missed The Future of Cosmology by George Efstathiou, which was a fun read. But I can’t think of any way to adjust the CosmoCoffee filter to catch papers like this, without catching loads of other cosmology papers. But in the full astro-ph listings I skimmed over the paper on globular clusters and their host galaxies, which was ranked highly by CosmoCoffee.
  • Wednesday: 42 on astro-ph; 14 on CosmoCoffee, filtered and sorted just right.
  • Thursday: 52 on astro-ph; 22 on CosmoCoffee. Hmm, wish I knew more about dwarf galaxies.
  • Friday: 30 on astro-ph; 7 on CosmoCoffee (must be getting near Christmas). Glad I skimmed through astro-ph, as my filter settings excluded this fascinating article on the history of dark energy. Apparently Newton thought of it (or something like it) 320 years ago!

Conclusion: based on this week’s experience, I’m likely to miss interesting and relevant papers if I use either astro-ph or CosmoCoffee … so I’ll use both! Start each day adagio on CosmoCoffee, accelerando poco a poco, then prestissimo through astro-ph.