Galaxy Zoo Talk

Galaxy Morphology: Human concept of shape wrong approach? New paper

  • JeanTate by JeanTate

    Galaxy Morphology: Human concept of shape wrong approach? New paper is a thread started by zutopian on July 01, 2014:

    Here is a new paper:

    Estimating the distribution of Galaxy Morphologies on a continuous space

    The incredible variety of galaxy shapes cannot be summarized by human defined discrete classes of shapes without causing a possibly large loss of information. Dictionary learning and sparse coding allow us to reduce the high dimensional space of shapes into a manageable low dimensional continuous vector space. Statistical inference can be done in the reduced space via probability distribution estimation and manifold estimation.

    Giuseppe Vinci, Peter Freeman, Jeffrey Newman, Larry Wasserman, Christopher Genovese
    (Submitted on 29 Jun 2014)
    http://arxiv.org/abs/1406.7536

    While it didn't attract much attention, I think it's perfect for this new section of GZ Talk! 😃

    Posted

  • vrooje by vrooje admin, scientist

    I've only skimmed it, but as I understand it these authors are using a statistical technique called dictionary learning (where the method learns on the fly what the important "vocabulary" for the problem is, instead of assuming it in advance) and sparse coding (where you use as few dictionary elements as possible while still modeling the data as closely as possible) to analyze galaxy images. It looks like a collaboration between at least one astronomer and a number of statisticians.
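
    To make the two techniques concrete, here is a minimal sketch using scikit-learn's implementations of dictionary learning and sparse coding. The random "images" are stand-ins for galaxy cutouts, and the sizes (16×16 pixels, 32 dictionary elements, 5 nonzero coefficients) are illustrative choices, not the paper's actual settings:

    ```python
    # Sketch of dictionary learning + sparse coding with scikit-learn.
    # The random vectors below stand in for flattened galaxy images.
    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

    rng = np.random.RandomState(0)
    images = rng.rand(200, 16 * 16)  # 200 fake 16x16 "images" as 256-dim vectors

    # Dictionary learning: learn the "vocabulary" (basis images) from
    # the data itself, rather than fixing it in advance.
    learner = MiniBatchDictionaryLearning(n_components=32, alpha=1.0,
                                          random_state=0)
    dictionary = learner.fit(images).components_  # shape (32, 256)

    # Sparse coding: describe each image using as few dictionary
    # elements as possible (here, at most 5 nonzero coefficients).
    codes = sparse_encode(images, dictionary, algorithm='omp',
                          n_nonzero_coefs=5)
    print(codes.shape)  # (200, 32): each image reduced to a 32-dim vector
    ```

    The end result is exactly the kind of reduction the abstract describes: each high-dimensional image becomes a short continuous vector ("this galaxy is this specific combination of the learned basis images"), on which statistical inference is then tractable.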

    I think this is an interesting approach and it's always good to see a rigorous statistical approach applied to astrophysics, especially morphology. They seem to be basing their analysis only on the single-band images and not on any other metadata such as color or redshift, which is probably a good thing. And I also agree that boolean categorization (e.g. forcing things to be either "spiral" or "elliptical") does lead to loss of information and could potentially be obscuring a more complete understanding. I do like the idea of characterizing a galaxy in a more flexible way by saying "here are the important vectors to consider [the basis images the program has learned] and this galaxy is this specific combination of those". However, the paper doesn't give much consideration to the fact that some of the existing categories in galaxy morphology are based on known underlying physics, and as such are quite valuable. I'm not quite sure I'm ready to completely throw away the category of "spiral", for example, when that for me is code for some very specific physical processes.

    On the other hand, there are other methods that try to do this sort of thing using pre-selected basis images that are really quite separate from the physics, such as wavelets and fourier modes. In those cases you may be able to perfectly and completely model a galaxy, but you might also have no idea how to physically interpret those results. The basis images that this method learns are probably clues to some combination of physics and what statisticians call "measurement error," by which they mean anything that gets in between the true data and what we actually observe. Separating the underlying physics from the measurement error is a difficult problem and as far as I can see a treatment of that is beyond the scope of this paper.
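
    The fixed-basis alternative is easy to illustrate: a 2-D Fourier transform can represent any image perfectly, but its basis functions are chosen in advance and carry no physical meaning. A toy sketch (again with a random stand-in for a galaxy cutout):

    ```python
    # A fixed, pre-selected basis: the 2-D Fourier modes. Perfect
    # reconstruction is guaranteed, physical interpretability is not.
    import numpy as np

    rng = np.random.RandomState(0)
    image = rng.rand(16, 16)  # stand-in for a galaxy cutout

    coeffs = np.fft.fft2(image)  # project onto the fixed Fourier basis
    exact = np.fft.ifft2(coeffs).real  # full basis: exact reconstruction

    # Keep only the 32 largest-magnitude coefficients: a compact but
    # lossy description, with no obvious physical reading of the modes.
    flat = coeffs.flatten()
    keep = np.argsort(np.abs(flat))[-32:]
    truncated = np.zeros_like(flat)
    truncated[keep] = flat[keep]
    approx = np.fft.ifft2(truncated.reshape(16, 16)).real
    ```

    The contrast with the paper's approach is that a learned dictionary adapts its basis images to the data, so they at least have a chance of corresponding to something physical (or to measurement error), whereas Fourier modes never will.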

    That the paper doesn't deal much with physics is no surprise, as it's a statistics paper, and this is an interesting statistical method from which I suspect physical conclusions could be drawn after additional analysis. I'm not sure whether the authors could make good on their implication that this method would allow for new understanding compared to something like Galaxy Zoo, though.

    The paper doesn't mention Galaxy Zoo, and I wonder what the authors would say if I were to posit that GZ, while still making some underlying assumptions about categories, chooses those categories on a physically motivated basis and then allows for a much better exploration of that parameter space than typical visual classification methods. In other words, GZ takes an approach very similar to the italicized statement in the second paragraph above. I think it would be fun to compare methods with these authors, and we might learn a lot from an exercise like that.

    Just my first thoughts!

    Posted

  • zutopian by zutopian

    Thanks for briefly presenting the paper and for your interesting comments!

    Posted

  • Peter_Dzwig by Peter_Dzwig

    I started to leave a similar comment to Vrooje's. Human classification isn't ideal, but it is the best that we have got. We have to remember that human classification is limited by what we understand at the time. Think of the advances in astronomy over the last twenty to thirty years. That alone has changed our perceptions. Just look at what GZ has taught us about the bucket classification of "irregulars". Historically, astronomers have stuck things in there which "didn't fit" elsewhere, but Zooites have found whole new classes of objects hiding there.

    Is ML (Machine Learning) an alternative? Any ML/AI technique requires some kind of training set, and that training set, even with dictionary learning, has to be selected at some level by humans. Some areas of ML, particularly neural nets and genetic algorithms, are notoriously difficult to "audit" so that you can understand the decisions that they have made.

    None of which is to say that these aren't valuable tools if you have an initial set that you can characterise well and a large dataset which you need to search for similar objects. But they won't (they can't) catch them all. But then nor can humans.

    Posted