Galaxy Zoo Talk

Computers vs. Humans in Galaxy Classification

  • williamaskew by williamaskew

    Computers vs. Humans in Galaxy Classification. Article and citation

    http://aasnova.org/2016/04/27/computers-vs-humans-in-galaxy-classification/

  • Budgieye by Budgieye moderator

    Thank you for posting that. I'll put it in The Index. It is always difficult to decide between fuzzy spirals and ellipticals, so good luck to the computer software.

    I liked the part....

    "upcoming surveys like the Large Synoptic Survey Telescope (LSST), which will produce billions of galaxy images when it comes online?"

  • williamaskew by williamaskew in response to Budgieye's comment.

    I caught that too. Exciting!

  • JeanTate by JeanTate

    The paper the PR is based on is "Computer-generated visual morphology catalog of ~3,000,000 SDSS galaxies", by Kuminski & Shamir (last revised 27 Mar 2016, arXiv:1602.06854). Here's the abstract:

    We applied computer analysis to classify the broad morphological type of ~3,000,000 SDSS galaxies. The catalog provides for each galaxy the DR8 object ID, right ascension, declination, and the certainty of the automatic classification to spiral or elliptical. The certainty of the classification allows controlling the accuracy of a subset of galaxies by sacrificing some of the least certain classifications. The accuracy of the catalog was tested using galaxies that were classified by the manually annotated Galaxy Zoo catalog. The results show that the catalog contains ~900,000 spiral galaxies and ~600,000 elliptical galaxies with classification certainty that has a statistical agreement rate of ~98% with Galaxy Zoo debiased 'superclean' dataset. That also demonstrates the ability of computers to turn large datasets of galaxy images into structured catalogs of galaxy morphology. The catalog can be downloaded at this http URL , and can be accessed through public tables on CAS: public.broadMorph.LargeGM, public.broadMorph.LargeWnnGM, and public.broadMorph.SpectraGM. The image analysis software that was used to create the catalog is also publicly available.
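
    The trade-off the abstract describes (dropping the least certain classifications to raise the accuracy of what remains) can be sketched as below. This is only an illustration: the rows are made-up toy values, not catalog data, and the function name and tuple layout are hypothetical.

```python
from typing import List, Tuple

# (automatic label, certainty, reference label).
# Made-up toy rows for illustration only, NOT taken from the catalog.
Row = Tuple[str, float, str]

TOY_ROWS: List[Row] = [
    ("spiral", 0.95, "spiral"),
    ("elliptical", 0.90, "elliptical"),
    ("spiral", 0.80, "spiral"),
    ("elliptical", 0.60, "spiral"),
    ("spiral", 0.55, "elliptical"),
    ("elliptical", 0.52, "elliptical"),
]


def agreement_above(rows: List[Row], threshold: float) -> Tuple[int, float]:
    """Keep rows with certainty >= threshold; return (count kept, agreement rate)."""
    kept = [r for r in rows if r[1] >= threshold]
    if not kept:
        return 0, float("nan")
    agree = sum(1 for auto, _, ref in kept if auto == ref)
    return len(kept), agree / len(kept)


# Raising the threshold shrinks the sample but (in this toy data) raises agreement.
for t in (0.5, 0.7, 0.9):
    n, rate = agreement_above(TOY_ROWS, t)
    print(f"certainty >= {t}: kept {n}, agreement {rate:.2f}")
```

    With real data the rows would come from the catalog's certainty column and a matched Galaxy Zoo label; how the authors actually computed their agreement rate is described in the paper, not here.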

    Shamir has been working on this kind of thing for many years, and this is nowhere near his first paper on computer classification of SDSS galaxies, explicitly referencing GZ ...

  • Budgieye by Budgieye moderator

    How long to do a billion classifications? If Zooites did a million a month, then it would take a thousand months, or about 83 years to do them.
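
    The arithmetic above, as a quick check. The rate of a million classifications a month is the round figure assumed in this post, not a measured one, and the function name is just for illustration.

```python
def months_to_classify(n_classifications: int, per_month: int) -> float:
    """Months needed to finish n_classifications at a constant monthly rate."""
    return n_classifications / per_month


# Round numbers from the post: a billion classifications at a million per month.
months = months_to_classify(1_000_000_000, 1_000_000)
years = months / 12
print(f"{months:.0f} months, about {years:.1f} years")
```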

    Thank you for the link, JeanTate. It would be nice if computers could do the boring classifications, but I suppose that eventually humans will have to check them.

  • JeanTate by JeanTate in response to Budgieye's comment.

    How long to do a billion classifications?

    The PR says this:

    How do we handle the data from upcoming surveys like the Large Synoptic Survey Telescope (LSST), which will produce billions of galaxy images when it comes online?

    But how realistic is this? Sure, the LSST will scan considerably more of the sky than SDSS did, but certainly not more than ~ten times more, say.

    Sure, LSST data will be 'deeper' than SDSS, and perhaps even DECaLS, so fainter galaxies will be imaged, but nowhere near as deep as Hubble.

    Too, the LSST will likely have a better angular resolution than SDSS (which, from memory, averaged ~1.4"), but not that much better.

    I think it's far more likely that the LSST will produce clearly classifiable images of perhaps only 10-30 times as many galaxies as SDSS did. The vast majority of 'new' galaxy images from the LSST will be formless blobs, like the majority of those with z > 0.25 in GZ1's Table 3.

    GZ1 classified ~1 million galaxies; of those with Petrosian radii < 4", say, how many did we zooites classify, unambiguously, as spiral or elliptical? I don't know, but I think far less than 1%.

  • mlpeck by mlpeck

    @JeanTate should appreciate this. The official published version of Kuminski and Shamir (2016) is freely available. The web version is at http://dx.doi.org/10.3847/0067-0049/223/2/20 and the pdf at iopscience.iop.org/article/10.3847/0067-0049/223/2/20/pdf. If the links fail just scroll down to the bottom of the PR notice linked in the OP.

    I wonder who deserves credit for lifting the paywall on this one. The authors I suppose -- Shamir at least seems pretty committed to open science.

  • mlpeck by mlpeck in response to JeanTate's comment.

    But how realistic is this? Sure, the LSST will scan considerably more
    of the sky than SDSS did, but certainly not more than ~ten times more,
    say.

    Sure, LSST data will be 'deeper' than SDSS, and perhaps even DECaLS,
    so fainter galaxies will be imaged, but nowhere near as deep as
    Hubble.

    Too, the LSST will likely have a better angular resolution than SDSS
    (which, from memory, averaged ~1.4"), but not that much better.

    Here's the semitechnical introduction to LSST: Ivezic et al. (2014), "LSST: From Science Drivers to Reference Design and Anticipated Data Products".

    A couple of factoids gleaned from scanning the paper: The telescope will have about 50 times the etendue (basically a measure of throughput) of SDSS. It will have about double the resolution and final coadded images will go about 5 magnitudes deeper than SDSS. They expect to produce about the same amount of imaging data per night (∼15 TB) as SDSS did during its lifetime.
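
    To put "about 5 magnitudes deeper" in flux terms: on the standard astronomical magnitude scale, a difference of Δm magnitudes corresponds to a flux ratio of 10^(0.4 Δm). A minimal sketch (the function name is just for illustration):

```python
def flux_ratio(delta_mag: float) -> float:
    """Flux ratio corresponding to a magnitude difference (Pogson's relation)."""
    return 10 ** (0.4 * delta_mag)


# About 5 magnitudes deeper means sources roughly 100 times fainter are reached.
ratio = flux_ratio(5.0)
print(f"5 mag deeper ~ {ratio:.0f}x fainter in flux")
```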

    Of mild personal interest to me, it seems the main data archive site will be in Champaign, Illinois.

  • mlpeck by mlpeck

    I thought Kuminski & Shamir's Table 5, which lists a sample of galaxies with high spiral certainty and redshifts z > 0.4, would be quite astonishing if it held up. Did it?

    No. After downloading their catalogs, getting the positions of the objects listed in Table 5, and checking them in SDSS, I found:

    • All objects have spectra.
    • Four of those spectra have too low S/N to get reliable redshifts (and were flagged as unreliable).
    • One had an erroneous redshift and was unflagged. That was object 2, which I estimate has a real redshift of z ≈ 0.05.
    • The other 3 had good redshifts that are not even close to the ones reported in Table 5. I also checked the photometric z's and those also weren't even close.

    Here is the first example from their table, which I think almost anyone could tell by eye can't plausibly be at z ≈ 0.4. SDSS DR12 finder chart image:

    [image: SDSS DR12 finder chart]

    Spectrum:

    [image: SDSS spectrum]

    Here is a list of positions for the 8 objects, suitable for copying and pasting into the SDSS Image list tool:

    ra,dec
    150.415105059985,14.9493573885173
    233.250874831461,54.5108059513879
    209.133107839662,17.7329636625755
    230.818436308016,22.5139790744938
    2.75225726580075,1.00445855104255
    227.04366937471,11.8507513905237
    125.713226326747,50.3512054310701
    211.887449182686,0.630959897650294
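
    If you would rather handle the list in a script than paste it by hand, here is a minimal sketch that parses it and prints one "name ra dec" line per object. The names obj1..obj8 are made up and follow the order of the list above, and the three-column layout is an assumption about what a list tool accepts, so check the tool's own instructions.

```python
import csv
import io

# The eight positions from the list above (decimal degrees).
POSITIONS_CSV = """ra,dec
150.415105059985,14.9493573885173
233.250874831461,54.5108059513879
209.133107839662,17.7329636625755
230.818436308016,22.5139790744938
2.75225726580075,1.00445855104255
227.04366937471,11.8507513905237
125.713226326747,50.3512054310701
211.887449182686,0.630959897650294
"""


def parse_positions(text: str) -> list:
    """Parse 'ra,dec' CSV text into a list of (ra, dec) floats in degrees."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["ra"]), float(row["dec"])) for row in reader]


positions = parse_positions(POSITIONS_CSV)
for i, (ra, dec) in enumerate(positions, start=1):
    # Basic sanity check on the coordinate ranges before printing.
    if not (0.0 <= ra < 360.0 and -90.0 <= dec <= 90.0):
        raise ValueError(f"row {i} has out-of-range coordinates")
    print(f"obj{i} {ra:.5f} {dec:.5f}")
```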

  • JeanTate by JeanTate

    Thanks @mlpeck.

    I read this paper ~the day it first appeared on arXiv, and downloaded the catalogs. One thing the authors made sorta clear is that some of the z > ~0.3 galaxies, even with spectra, are in fact much closer. One example is of a class I wrote about, many years ago now, in a GZ forum OOTD post: distant galaxies behind local irregulars/dwarfs ... the SDSS spectrum is of the distant background object, being the brightest blob in the (sometimes quite large) foreground one, and may show some of the foreground's emission lines.

    Like all papers of this kind, you need to be very careful when you start looking at individual objects. Even GZ! 😮

  • JeanTate by JeanTate in response to mlpeck's comment.

    A good way to get a handle on how many good galaxy images will be found in the LSST data is SDSS' Stripe 82^. We GZooites did three (Tables 7, 8, and 9 in GZ2) separate classification exercises using Stripe 82 data. None of the ones we did tried to produce a resolution significantly better than the worst ~third of the co-added data, so while what we classified was certainly deeper - the faint outer edges of many galaxies became clearly 'visible', for example - I don't think there was much improvement in the granularity of the classifications (I haven't checked this in detail, though I have had occasion to check some individual galaxies).

    The biggest addition the LSST will make, IMHO, is covering the southern sky, and the galactic plane; a GZ2-style classification exercise might have ~1-3 million targets (GZ2 had ~300k), with lots of confusion close to the galactic plane (made worse by having essentially no spectroscopic redshifts), and an original GZ-style exercise perhaps as many as 10 million targets (GZ1 had ~900k targets).

    LSST and SDSS (and DECaLS) overlaps would be interesting to study! When the current GZ/DECaLS results are published, a comparison with the GZ1 and GZ2 classifications (especially any in Stripe 82) may be quite an eye-opener.

    If the LSST co-adds are done in 'the best possible way', like Fliri & Trujillo (2016) say, maybe an awful lot of fainter, but bigger-than-mere-blobs, objects might be available for a future GZ-like classification exercise? 😃

    ^for those who don't know, SDSS imaged a chunk of sky, called "Stripe 82", dozens of times; several different versions of co-adding all the scans were published.
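
    As a rough guide to how much depth co-adding buys: in the idealized case of N equal-depth, background-limited exposures with uncorrelated noise, signal-to-noise grows as sqrt(N), so the limiting magnitude improves by about 1.25 log10(N). The sketch below only evaluates that textbook scaling; the N values are hypothetical, and real co-adds (variable seeing, different weighting schemes) will fall short of it.

```python
import math


def coadd_depth_gain_mag(n_exposures: int) -> float:
    """Idealized depth gain in magnitudes from co-adding n equal exposures.

    Assumes S/N scales as sqrt(n), so the gain is 2.5*log10(sqrt(n)) = 1.25*log10(n).
    """
    if n_exposures < 1:
        raise ValueError("need at least one exposure")
    return 1.25 * math.log10(n_exposures)


# Hypothetical exposure counts, chosen only to show the scaling.
for n in (10, 30, 100):
    print(f"N = {n:3d}: about {coadd_depth_gain_mag(n):.2f} mag deeper")
```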
