Galaxy Zoo Talk

How to 'unpack' the Schawinski+ (2010) FITS?

  • JeanTate by JeanTate

    In the Galaxy Zoo data page, under the heading AGN host galaxies, one reads:

    This sample is presented in the Galaxy Zoo 1 paper on AGN host galaxies (Schawinski et al., 2010, ApJ, 711, 284). It is a volume-limited sample of galaxies (0.02 < z < 0.05, Mz < –19.5 AB) with emission line classifications, stellar masses, velocity dispersions and GZ1 morphological classifications. When using this sample, please cite Schawinski et al. (2010) and Lintott et al. (2008, 2011).

    Download here: http://galaxy-zoo-1.s3.amazonaws.com/schawinski_GZ_2010_catalogue.fits.gz

    I have successfully downloaded that file, and unzipped it. However, when I tried reading it using TOPCAT, I found that the FITS file has (per TOPCAT) just one row, with 15 columns! 😮 Using the Table Browser, I found that the columns have the same names as given on the GZ data page ("OBJID", "RA", "DEC", etc.), and that each cell apparently holds all the records, concatenated and separated by commas.

    There are a few tricks I have not yet tried, so I may be able to 'decrypt' this; however, if any reader knows of a sure-fire way ...

    Posted

  • c_cld by c_cld

    See the past discussion at http://www.galaxyzooforum.org/index.php?topic=281570.msg650401#msg650401 of the same problem with the MOSES catalog, where Schawinski's data are likewise stored in column-major order.
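
    To illustrate what "column-major order" means here, a minimal sketch with made-up values (not taken from either catalogue): the table arrives as a single row whose cells each hold an entire column, and transposing it recovers one record per object. The names packed_row and records are hypothetical.

```python
# One "row" whose cells each hold a whole column (column-major packing).
# All values below are invented for illustration only.
packed_row = {
    "OBJID": ["A1", "A2", "A3"],
    "RA": [10.0, 20.0, 30.0],
    "DEC": [-1.0, 0.0, 1.0],
}

names = list(packed_row)

# zip(*columns) walks the columns in parallel, yielding one tuple per object.
records = list(zip(*(packed_row[name] for name in names)))

for record in records:
    print(dict(zip(names, record)))
```

    The same idea applies to the real file, except that the cells are arrays read from the FITS table rather than Python lists.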

    Posted

  • JeanTate by JeanTate in response to C_cld's comment.

    Thanks C_cld, I had forgotten about that thread.

    I'll try Fv (and let you - and all other readers - know how it worked; may be several days yet though).

    Posted

  • mlpeck by mlpeck

    I got it.

    This is a quick and dirty job using tools I'm not familiar with (the fits module in astropy). The header line in the linked csv file is messed up but it should otherwise match the contents of the original. The first column looks like a DR7 ID. It was stored as a character string.

    https://www.dropbox.com/s/mi8slgc2ss7ce70/sch_10.csv?dl=0

    Posted

  • c_cld by c_cld in response to JeanTate's comment.

    Like @mlpeck, before going to Fv you need to transpose the Schawinski array.

    A quick Python script could be:

    from astropy.io import fits

    hdulist = fits.open("C:/Users/......./schawinski_GZ_2010_catalogue.fits") # to be changed to your own path

    scidata = hdulist[1].data

    len(scidata)

    1

    scidata.names # columns of the row

    ['OBJID', 'RA', 'DEC', 'REDSHIFT', 'GZ1_MORPHOLOGY', 'BPT_CLASS', 'U', 'G', 'R', 'I', 'Z', 'SIGMA', 'SIGMA_ERR', 'LOG_MSTELLAR', 'L_O3']

    OBJ = 'DR' + scidata.field('OBJID') # prefix with letters so the ID stays a string and is not converted to a float when the csv is exported or opened

    RA = scidata.field('RA')

    DEC = scidata.field('DEC')

    REDSHIFT = scidata.field('REDSHIFT')

    GZ1_MORPHOLOGY = scidata.field('GZ1_MORPHOLOGY')

    BPT_CLASS = scidata.field('BPT_CLASS')

    U = scidata.field('U')

    G = scidata.field('G')

    R = scidata.field('R')

    I = scidata.field('I')

    Z = scidata.field('Z')

    SIGMA = scidata.field('SIGMA')

    SIGMA_ERR = scidata.field('SIGMA_ERR')

    LOG_MSTELLAR = scidata.field('LOG_MSTELLAR')

    L_O3 = scidata.field('L_O3')

    Catalog = list(zip( OBJ[0], RA[0], DEC[0], REDSHIFT[0], GZ1_MORPHOLOGY[0], BPT_CLASS[0], U[0], G[0], R[0], I[0], Z[0], SIGMA[0], SIGMA_ERR[0], LOG_MSTELLAR[0], L_O3[0] )) # transposition; list() is needed on Python 3, where zip returns an iterator

    len(Catalog)

    47675

    Catalog[0]

    ('DR587722982288785725', 207.09436829000001, -0.76696659, 0.026111800223588898, 0, 2, 17.560625, 16.288317, 15.694989, 15.387244, 15.154959, 41.062729, 5.588398, 9.9080954, 1.8852608528679285e+39)

    Catalog[47674]

    ('DR588298664651718676', 199.37548229000001, 47.783146080000002, 0.026882499456405601, 1, 0, 16.65407, 14.847166, 14.082373, 13.692595, 13.418469, 209.21176, 3.515888, 11.17951, 0.0)

    import csv

    filepath = "C:/Users/……../catalog.csv" # to be changed to your own path/name for the exported catalog

    fieldnames = scidata.names

    with open(filepath, "w", newline="") as csvfile: # text mode with newline="" is what the csv module expects on Python 3

        writer = csv.writer(csvfile)

        writer.writerow(fieldnames) # header line

        writer.writerows(Catalog) # one line per galaxy

    This gives the same result as @mlpeck's file, except that the DR7 ObjID is prefixed with "DR" (to avoid Excel truncating it as a float). 😃

    Posted

  • vrooje by vrooje admin, scientist

    Yeah, these tables were all written in IDL, which often stores FITS tables in a format that TOPCAT doesn't like. The solution is to use IDL or Python or similar to read the file, then re-write it back in a way that won't make TOPCAT think it's 1 row of arrays. I haven't e-mailed the creator of TOPCAT to see whether he has a fix for this (he must have dealt with it before) but that might be a useful solution, if you haven't already found one.
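
    A minimal sketch of the read-then-rewrite approach in Python, assuming astropy is installed and that the table sits in HDU 1 as a single row of array-valued cells (as reported earlier in this thread). The function names unpack_single_row and rewrite_fits are hypothetical, and this has not been checked against every IDL-written file, so treat it as a starting point rather than a definitive recipe.

```python
def unpack_single_row(record, names):
    """Turn one row of array-valued cells into a dict of equal-length columns."""
    columns = {name: list(record[name]) for name in names}
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("cells have different lengths; not a packed table")
    return columns


def rewrite_fits(src_path, dst_path):
    """Read a one-row packed table and write it back with one row per object."""
    # astropy is imported here so the helper above can be used without it.
    from astropy.io import fits
    from astropy.table import Table

    with fits.open(src_path, memmap=False) as hdulist:
        data = hdulist[1].data  # assumption: the table is in HDU 1
        if len(data) != 1:
            raise ValueError("expected a single packed row")
        columns = unpack_single_row(data[0], data.names)

    # Each dict entry becomes an ordinary column with one value per row.
    Table(columns).write(dst_path, format="fits", overwrite=True)
```

    For example, rewrite_fits("schawinski_GZ_2010_catalogue.fits", "unpacked.fits") (the output name is a placeholder) should produce a file with one row per galaxy, which is the layout TOPCAT expects.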

    Side note: as mlpeck says, definitely store the SDSS object ID as a string and not any kind of number. If you have, say, a csv file instead of FITS, or otherwise don't specify the data type, some programs will try to read it as a Long integer and will occasionally do horrible things like rounding, or putting the number in scientific notation, or just truncating it altogether. Best avoided. 😃
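
    The rounding risk can be seen with a short Python check. This is only an illustration, using the first object ID quoted earlier in this thread; the variable names are hypothetical. A double-precision float cannot hold every 18-digit integer exactly, so an ID that passes through a float can come back changed.

```python
objid_text = "587722982288785725"  # first OBJID quoted earlier in this thread

as_int = int(objid_text)           # Python integers are exact
via_float = int(float(objid_text)) # what survives a trip through a 64-bit float

print(as_int == via_float)         # False: the float could not hold the exact value
print(str(as_int) == objid_text)   # True: the string/integer route loses nothing
```

    Keeping the ID as text (or prefixing it with letters, as in the csv export above) sidesteps the problem in programs that guess column types.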

    Posted

  • JeanTate by JeanTate in response to vrooje's comment.

    I emailed Mark Taylor, and got a very quick response; he is OK with me quoting him:

    thanks for passing this on. Contrary to vrooje's assumption,
    I haven't dealt with it before!

    I've made an adjustment which I think ought to cause topcat to
    work happily with files like this. You can find a pre-release
    version (bc43afb6) here:

    ftp://andromeda.star.bris.ac.uk/pub/star/topcat/pre/topcat-full_colfits.jar

    It appears to work on the file you mention. If you know of files
    that might be similar I'd be grateful if you or other GZ people
    could try it out to see if it behaves as expected. If it does,
    I'll make sure this fix is in the next public release.

    Posted

  • JeanTate by JeanTate in response to JeanTate's comment.

    I finally got a chance to try it out (Mark's prerelease version that is). It works, yay!

    I also tried it out on the MOSES catalog (see this GZ forum post for details), and TOPCAT could open that just fine too.

    A big thanks to Mark Taylor!

    Posted

  • vrooje by vrooje admin, scientist

    I'm so glad this fix is going to be released soon. I've downloaded it and will try it when I have a chance! 😃

    Posted