Cataloguing astrophotography data for fun and profit

I’ve written a lot about my telescope setup and what I do with it, but not much about what comes after that.

Mostly, that’s a lot of work in PixInsight to process all the data. I take monochrome images, so every image is just looking through a red, green or blue filter, or a blend (“luminance”). Turning those into nice colour photos requires a few basic steps:

  • Get rid of all the rubbish images and rank the remaining ones by quality – sharpness of stars, noise levels, and so on
  • Process those images to remove sensor defects (“hot pixels”) and then align them on the best image so that the pixel coordinates in each image line up with the same astronomical coordinates
  • Stack those images, filter by filter, combining all the signal and reducing the noise
  • Combine those “master” (better names would be good) images to form an RGB colour image
  • Post-process for colour balance and any artistic exaggeration

The first few steps above involve every raw photo I take for a given target. I usually take images with an exposure time of about 2 minutes, depending on the target and filter, so the frame count adds up quickly over a few nights of imaging. For instance, I have around 450 frames of the Crab Nebula; for mosaic projects I’ve got thousands of frames.
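To see why it’s worth collecting and stacking hundreds of frames, here is a toy illustration. It uses simulated pixel values rather than real frames, and a plain mean where a real integration tool would also reject outliers such as satellite trails. All the numbers and function names are made up for the example.

```python
import random
import statistics

def simulate_pixel(true_signal, noise_sigma, rng):
    """One noisy measurement of a single pixel in a single sub-frame."""
    return rng.gauss(true_signal, noise_sigma)

def stack_mean(values):
    """Mean-stack: average the same pixel across all frames."""
    return sum(values) / len(values)

def stacked_noise(n_frames, trials=2000, true_signal=100.0, noise_sigma=10.0, seed=1):
    """Estimate the scatter of the stacked pixel value over many simulated stacks."""
    rng = random.Random(seed)
    results = [
        stack_mean([simulate_pixel(true_signal, noise_sigma, rng) for _ in range(n_frames)])
        for _ in range(trials)
    ]
    return statistics.pstdev(results)

single = stacked_noise(1)
stack_of_100 = stacked_noise(100)
print(f"noise, 1 frame:    {single:.2f}")
print(f"noise, 100 frames: {stack_of_100:.2f} (expect roughly {single / 10:.2f})")
```

The noise drops by roughly the square root of the number of frames, which is why an extra night of data on a faint target is usually worth having.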

The scope after a lovely clear night

This gets a bit tedious, especially if I want to re-process stuff over and over as I add data over the nights, so I made some tools!

Observing with September

I made a basic Python toolset which uses PostgreSQL as a datastore, plus an NFS share (which lives on my big ZFS NAS) for the images. It ingests all the data I produce every day at around 6AM, once the telescope’s computer has pushed any new data across to the store.

It’s called September because in the UK that’s really when we get to start observing properly again, and also after September, the Observer in Fringe, an excellent sci-fi TV series.

The first “clever bit” is the use of Q3C, a PostgreSQL extension that provides spherical indexing, so I can query by the RA/Dec of an object (here, every sub-frame within 5 degrees of the Crab Nebula):

september=# select count(*) from subframes 
  where q3c_radial_query(ra, dec, 83.6,22,5);

This is really helpful because, thanks to some Fun KStars Bugs, the filenames produced by the observation software sometimes don’t match the target that was actually observed.
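For anyone who hasn’t met a cone search before, this is roughly what a radial query answers, written as a brute-force Python sketch. Q3C does the same job with an index rather than by checking every row; the frame rows and function names here are hypothetical and only for illustration.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions, all in degrees (haversine form)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cone_search(frames, centre_ra, centre_dec, radius_deg):
    """Brute-force radial query: keep frames within radius_deg of the centre."""
    return [f for f in frames
            if angular_separation_deg(f["ra"], f["dec"], centre_ra, centre_dec) <= radius_deg]

# Hypothetical rows; a real table would hold one row per plate-solved sub-frame.
frames = [
    {"file": "crab_001.fits", "ra": 83.63, "dec": 22.01},  # near the Crab Nebula
    {"file": "m42_001.fits", "ra": 83.82, "dec": -5.39},   # Orion Nebula, well outside 5 degrees
    {"file": "wrap_001.fits", "ra": 359.9, "dec": 0.0},    # just below RA = 0
]

matches = cone_search(frames, 83.6, 22, 5)
print([f["file"] for f in matches])

# The separation works across the RA = 0/360 boundary too.
wrap_sep = angular_separation_deg(359.9, 0.0, 0.1, 0.0)
print(f"{wrap_sep:.3f} degrees")
```

Because the match is on solved coordinates, a frame saved under the wrong target name still turns up in the right search.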

So when ingesting data, I use the solve-field utility (from astrometry.net) to re-analyse each frame and store the coordinates from that solution in the database. This has an added bonus: since it’s asynchronous with the observation (unlike the automatic plate solving that KStars does to point the telescope), I can take my time and be super precise in the analysis. It all runs on a small server at home which can just chug through it.
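As a rough idea of what calling the solver from Python can look like, here is a minimal sketch. It is not September’s actual ingest code: the function names, paths and limits are invented, the flags are solve-field options as I understand them (check `solve-field --help` on your install), and the `.solved` marker check is an assumption about how the tool reports success.

```python
import shutil
import subprocess
from pathlib import Path

def build_solve_command(fits_path, out_dir, hint_ra=None, hint_dec=None,
                        hint_radius_deg=10.0, cpu_limit_s=600):
    """Build a solve-field command line; position hints are optional."""
    cmd = [
        "solve-field",
        "--overwrite",                   # replace any earlier solution files
        "--no-plots",                    # skip the diagnostic plots
        "--dir", str(out_dir),           # write output files here
        "--cpulimit", str(cpu_limit_s),  # generous limit: nothing is waiting on the result
    ]
    if hint_ra is not None and hint_dec is not None:
        # Hints only narrow the search; leave them off if the filename can't be trusted.
        cmd += ["--ra", str(hint_ra), "--dec", str(hint_dec),
                "--radius", str(hint_radius_deg)]
    cmd.append(str(fits_path))
    return cmd

def solve(fits_path, out_dir, **hints):
    """Run the solver if it is installed and report whether a solution was written."""
    if shutil.which("solve-field") is None:
        return False
    subprocess.run(build_solve_command(fits_path, out_dir, **hints),
                   capture_output=True, text=True)
    # Assumption: a successful solve leaves "<name>.solved" alongside the other outputs.
    return (Path(out_dir) / (Path(fits_path).stem + ".solved")).exists()

print(" ".join(build_solve_command("light_0001.fits", "/tmp/solved", 83.6, 22.0)))
```

Running it blind, with no hints, is slower but doesn’t inherit any mistakes from the capture software, which matters when the whole point is to catch mislabelled frames.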

Then the other clever bit is on the egress. Now I have this big database and a folder structure full of files, how do I get it out? Just query by object? Works great till you try doing mosaics. So instead, I search by the object coordinates, cluster with scikit-learn, and export each cluster of images into separate folders, ready to process.
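To show the idea of clustering frame centres into mosaic panels, here is a dependency-free sketch. Note that it swaps the scikit-learn step for a simple single-linkage grouping with a distance threshold (which should give the same groups as DBSCAN with `min_samples=1`). The coordinates, threshold and function names are hypothetical.

```python
import math
from collections import defaultdict

def separation_deg(a, b):
    """Great-circle separation between two (ra, dec) pairs, in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (*a, *b))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cluster_frames(centres, max_sep_deg):
    """Group frame centres: two frames share a panel if a chain of neighbours
    closer than max_sep_deg connects them (single linkage via union-find)."""
    parent = list(range(len(centres)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps lookups short
            i = parent[i]
        return i

    for i in range(len(centres)):
        for j in range(i + 1, len(centres)):
            if separation_deg(centres[i], centres[j]) <= max_sep_deg:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(centres)):
        groups[find(i)].append(i)
    return sorted(groups.values())

# Hypothetical plate-solved centres: two mosaic panels about half a degree apart,
# each with a little pointing scatter from night to night.
centres = [(83.60, 22.00), (83.61, 22.01), (83.59, 21.99),
           (84.14, 22.00), (84.15, 22.01)]
panels = cluster_frames(centres, max_sep_deg=0.1)
for n, members in enumerate(panels):
    print(f"panel_{n:02d}: frames {members}")
```

The pairwise loop is quadratic, which is fine for a few thousand frames; the threshold needs to sit comfortably above the pointing scatter and below the panel spacing.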

Centroids of images captured around a central target in a 2×3 mosaic
One image package, ready to go

The system also indexes all the calibration files, and takes a guess as to which files relate to which images, so when it exports the data it also spits out the calibration files required to process. Basically, I can copy this directory to a fast local disk on a machine running PixInsight – in the case of this mosaic it’s only about 50 gigabytes – and then get on with processing.
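How that guess is made isn’t spelled out here, so the following is only one plausible heuristic, not a description of September itself: pick a dark with the same exposure and the closest sensor temperature, and a flat for the same filter taken closest in time. The field names, tolerances and file names are all hypothetical.

```python
from datetime import date

def pick_dark(light, darks, max_temp_diff=2.0):
    """Choose a dark with the same exposure whose sensor temperature is closest."""
    candidates = [d for d in darks
                  if d["exposure_s"] == light["exposure_s"]
                  and abs(d["temp_c"] - light["temp_c"]) <= max_temp_diff]
    return min(candidates, key=lambda d: abs(d["temp_c"] - light["temp_c"]), default=None)

def pick_flat(light, flats):
    """Choose the flat for the same filter taken closest in time to the light."""
    candidates = [f for f in flats if f["filter"] == light["filter"]]
    return min(candidates, key=lambda f: abs((f["date"] - light["date"]).days), default=None)

# Hypothetical metadata rows, not the real September schema.
darks = [
    {"file": "dark_120s_-10C.fits", "exposure_s": 120, "temp_c": -10.0},
    {"file": "dark_120s_0C.fits", "exposure_s": 120, "temp_c": 0.0},
    {"file": "dark_300s_-10C.fits", "exposure_s": 300, "temp_c": -10.0},
]
flats = [
    {"file": "flat_R_jan.fits", "filter": "R", "date": date(2021, 1, 5)},
    {"file": "flat_R_mar.fits", "filter": "R", "date": date(2021, 3, 1)},
    {"file": "flat_L_mar.fits", "filter": "L", "date": date(2021, 3, 1)},
]
light = {"file": "crab_R_0001.fits", "filter": "R", "exposure_s": 120,
         "temp_c": -9.6, "date": date(2021, 2, 20)}

chosen_dark = pick_dark(light, darks)
chosen_flat = pick_flat(light, flats)
print(chosen_dark["file"], chosen_flat["file"])
```

Returning `None` when nothing fits is deliberate: a missing calibration file is better flagged at export time than discovered halfway through processing.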

Automating PixInsight

The eagle-eyed will have spotted the metadata files in that package. A lot of the initial processing is kinda repetitive – I don’t need to deal with it all manually. At some point I need to be in the loop to review things visually – though some of that I think I can automate. But I can deal with the rest in code!

PixInsight is a very powerful tool, and is starting to get some features for remote automation as a server, but it’s early days. However, the built-in scripting support is very well established by now.

I’ve written a September companion tool for PixInsight which takes that folder, with its metadata file, and automates all of the processing up to the point where a human needs to review it, which is basically just before registration and stacking.

Could I have written a tool that formatted everything for existing tools like WeightedBatchPreProcessing? Yes, but it’s hard, and matching calibration data to files is still manual. Plus, I know what I need in the way of processing, so I don’t need WBPP’s insane configurability.

Some of the script I’ve written for PI

All it does is automate my manual steps, and that turns out to be pretty straightforward to do, although automating PI’s user interface and interacting with files under JavaScript is a bit clunky, and PI doesn’t so much have documentation as a big pile of scripts.

This has worked pretty well for a mosaic project I’m working on, but I’m definitely now thinking more about what I can do to automate the dataflow out of the back of my telescope. It’s practically automated for capture – I go outdoors, pop the cover off, power it up and hit go in KStars on my pre-configured schedule – so it would be lovely to have the pre-processing automated as far as possible.

I’d like to get things like star counting and checks against a star catalogue working – I’m going to load in Gaia EDR2 with the same Q3C indexing (limited to stars I can see). This will let me throw away frames that are clearly just cloudy or otherwise unusable.

Ideally, if I can figure out how, I’d love to run PixInsight as an engine on a headless box – just as a network service for things like image analysis. That way I can get as much of the pre-processing done as possible to reduce the friction in my workflow.

What would be really good, though, would be to close the loop. Already I sit down, work out which clusters of images in my mosaic have enough “good” frames versus others, and then update my KStars scheduling job files to add extra observation time to the areas which need more work – but it’s time-consuming, manual, and clunky. It would be awesome to be able to simply point September at a list of targets and target exposure times, and have it spit out tasking – RA, Dec, filters and times – for KStars to automatically consume. Once I’ve got that, my observatory is completely automatic!
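The core of that tasking step is mostly bookkeeping. Here is a minimal sketch under made-up assumptions: a list of targets with per-filter exposure goals, a tally of “good” seconds already captured, and a fixed sub-exposure length. The names and data layout are hypothetical, and turning the result into KStars scheduler job files is left out entirely.

```python
def plan_tasking(targets, captured_s, sub_exposure_s=120):
    """For each target and filter, work out how many more subs are needed."""
    tasks = []
    for t in targets:
        for filt, goal_s in t["goal_s"].items():
            have_s = captured_s.get((t["name"], filt), 0)
            missing_s = max(0, goal_s - have_s)
            n_subs = -(-missing_s // sub_exposure_s)  # ceiling division
            if n_subs:
                tasks.append({"name": t["name"], "ra": t["ra"], "dec": t["dec"],
                              "filter": filt, "exposure_s": sub_exposure_s,
                              "count": n_subs})
    return tasks

# Hypothetical mosaic panels with per-filter goals in seconds.
targets = [
    {"name": "panel_00", "ra": 83.60, "dec": 22.00, "goal_s": {"L": 7200, "R": 3600}},
    {"name": "panel_01", "ra": 84.14, "dec": 22.00, "goal_s": {"L": 7200}},
]
# Seconds of usable data already in the archive, keyed by (target, filter).
captured_s = {("panel_00", "L"): 6000, ("panel_00", "R"): 3700}

tasking = plan_tasking(targets, captured_s)
for task in tasking:
    print(task)
```

Only frames that survive quality checks should count towards `captured_s`, otherwise a cloudy night would look like progress.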

Apart from the bit where I have to take the cover off and then trudge out at 2AM to put it back on. But that’s another project involving rather a bit more woodwork.

All wrapped up for the night.

If it’s of interest, I’ll try and tidy this up enough to put on GitHub or GitLab. It’s not complex code, just fiddly. I’m pondering wrapping it up in k8s or some other container environment so it can do clever things in an always-on state, like monitoring INDI for new image data. But I don’t want to run this on AWS/GCP, and given I only have one server at home there’s not much point in being able to distribute it, so for now it’s just a pile of Python scripts and a bit of JavaScript for PI. Shout if this sounds interesting!