Galaxy Zoo Talk

Galaxy Zoo's machine-learning competition (with prize money!)

  • by JeanTate

    This is the topic of the most recent GZ blog entry, posted by Kyle Willett on December 23, 2013: "Announcing Galaxy Zoo’s machine-learning competition (with prize money!)". It's a really cool idea, and this paragraph from the blog summarizes it very well:

    Here’s how the competition works. On the Kaggle website, competitors will be given a large set of JPG galaxy images (taken from Galaxy Zoo 2), as well as a big text file with a few dozen variables for each image. These data are a modified version of the classifications that citizen scientists generated in GZ2 (and published in Willett et al. 2013). The goal for competitors is to come up with an algorithm that will predict what those classifications should be based only on the picture. These algorithms are submitted to Kaggle and tested against a second, private set of GZ2 images and classifications. The highest scores on the new set will win the prize money.
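
    To make the scoring idea in that paragraph concrete, here is a minimal sketch of how a submission could be compared against the held-back classifications. The quoted text does not say which metric is used, so root-mean-square error is purely an assumption here, and the galaxy IDs and vote fractions are made up for illustration.

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error over every galaxy and every vote-fraction column.

    Both arguments map a galaxy ID to a list of vote fractions.
    RMSE is an assumed metric, not one named in the blog post.
    """
    total = 0.0
    count = 0
    for galaxy_id, observed_row in observed.items():
        predicted_row = predicted[galaxy_id]
        for p, o in zip(predicted_row, observed_row):
            total += (p - o) ** 2
            count += 1
    return math.sqrt(total / count)

# Hypothetical vote fractions for two galaxies, three answers each.
observed = {"g1": [0.8, 0.2, 0.0], "g2": [0.1, 0.6, 0.3]}
# A submission that gets g2 right but is off on g1.
guess = {"g1": [0.5, 0.5, 0.0], "g2": [0.1, 0.6, 0.3]}

score = rmse(guess, observed)       # lower is better under this metric
perfect = rmse(observed, observed)  # identical predictions score 0.0
```

    Note that under a metric like this, an algorithm is rewarded for matching what the volunteers said, whether or not what they said reflects the galaxy itself.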

    I haven't - yet - downloaded the JPG galaxy images and the big text file (I intend to), but one aspect of this challenge has me intrigued: to what extent will the winner win because the algorithm best mimics purely human quirks, as opposed to identifying real features of the real galaxies, otherwise too deep in the noise for human eyes and brains to notice?

    As a way to explain what I mean, consider the question of whether spiral galaxies have an intrinsic 'handedness', a cosmic preference for appearing to 'wind left' vs 'wind right' (if you're unfamiliar with this, check out this GZ forum thread for an intro and various analyses). It turns out that the Galaxy Zoo classifications - the original ones, not those in GZ2 - do contain a purely human quirk, a 'handedness bias' that is due to some combo of how humans' brains are wired and experimental design (the Land+2008 paper which investigated this could not completely disentangle these two). So in any machine learning challenge, this purely human bias would need to be reproduced or modeled, to be sure of being among 'the best' algorithms (I wrote about this in Curious Pattern in Longo's 2011 Net Handedness Asymmetries (in SDSS Galaxies)).
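
    One simple way to picture how a human bias differs from a real asymmetry is to think about mirrored images: a real excess of one winding direction should flip sign when the images are mirrored, while a bias in the classifiers should not. The sketch below uses a deliberately crude additive model and made-up vote counts; it is my own illustration, not the analysis in Land+2008.

```python
def asymmetry(acw, cw):
    """Fractional excess of anticlockwise over clockwise votes."""
    return (acw - cw) / (acw + cw)

def split_asymmetry(orig_acw, orig_cw, mirr_acw, mirr_cw):
    """Split the measured asymmetry into a 'human' and a 'real' part.

    Assumed toy model (not taken from any paper):
        original images: measured = real + bias
        mirrored images: measured = -real + bias
    """
    a_orig = asymmetry(orig_acw, orig_cw)
    a_mirr = asymmetry(mirr_acw, mirr_cw)
    human_bias = (a_orig + a_mirr) / 2
    real_signal = (a_orig - a_mirr) / 2
    return human_bias, real_signal

# Made-up counts: the same ACW excess shows up on originals and mirrors,
# so in this toy model all of it is attributed to the classifiers.
bias_a, real_a = split_asymmetry(520, 480, 520, 480)

# Made-up counts: the excess flips under mirroring, so it is attributed
# to the galaxies themselves.
bias_b, real_b = split_asymmetry(520, 480, 480, 520)
```

    An algorithm trained only on the unmirrored votes has no way to make this split; it just sees the sum.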

    Of course, the GZ2 classifications do not contain votes for 'CW' or 'ACW', so that particular human quirk is not contained in the data. However, it would surely be naive to assume that they are completely free of all such biases.

    Perhaps the winning submission (algorithm) will be one that correctly identifies, and models, a human quirk that the GZ Science Team itself is unaware of?
