Galaxy Zoo Talk

Are these two galaxies similarly 'smooth and rounded with no sign of a disk'?

  • by JeanTate

    A -> [image] <- A

    B -> [image] <- B

    These come from Kaggle: Galaxy Zoo - The Galaxy Challenge, which is the topic of a recent GZ blog post: Announcing Galaxy Zoo’s machine-learning competition (with prize money!). Specifically, they are in a Kaggle forum thread, by gregl: is the input data now 100% correct?:

    In the context of the issue from last week leading to the reset of this competition, I am looking more carefully into the data, and I am not sure I fully understand the classification process.

    Let's take an example: as far as my picture similarity metric is concerned, galaxies 100122 and 454922 are pretty close to each other. Looking at the pictures confirms this. However, looking them up in the training solution file, their class1 is already different. 454922 is classified 93.5% of the time in class1.2, while 100122 is 73.8% in class1.1. The galaxies with such a high value for class1.2 don't usually look like 454922 at all.

    Am I missing something or could there still be errors in the data set?

    The references to 'class1', 'class1.2', and 'class1.1' may be a bit obscure, but they refer to the top-level question in Galaxy Zoo 2 ("Is the galaxy simply smooth and rounded, with no sign of features or disk?") and two of its three answers ("smooth" - class1.1, and "features or disk" - class1.2). For more on how the GZ2 questions/decision tree are described for the Kaggle challenge, see here.

    What do you, zooites who regularly hang out here and read threads like this, think?

    Specifically, with nothing else to guide you, do you agree that we - zooites collectively - made the right call in saying that A is more 'smooth' than 'features or disk' (by a ~3:1 margin)? And that B is almost certainly (~94%) 'features or disk'? If not, why not?
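
    For anyone who wants to check the arithmetic behind the "~3:1" and "~94%" figures, here is a minimal sketch. It uses only the two vote fractions quoted from gregl's post. The helper name odds_for is made up for illustration, the A/B labels are inferred from the percentages (A = 100122, B = 454922), and "odds" here means the chosen answer against all other answers to the first question combined, which is an assumption about how the margin was reckoned.

```python
def odds_for(fraction: float) -> float:
    """Convert a vote fraction into odds of that answer versus all other answers."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return fraction / (1.0 - fraction)


# Vote fractions quoted in gregl's post; the A/B labels are inferred, not stated there.
votes = {
    "A (100122, class1.1 'smooth')": 0.738,
    "B (454922, class1.2 'features or disk')": 0.935,
}

for label, fraction in votes.items():
    print(f"{label}: {fraction:.1%} of votes, odds about {odds_for(fraction):.1f}:1")
```

    On these numbers A comes out at a little under 3:1 in favour of 'smooth', and B at roughly 14:1 in favour of 'features or disk'.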
