How to 'unpack' the Schawinski+ (2010) FITS?

In the Galaxy Zoo data page, under the heading AGN host galaxies, one reads:

This sample is presented in the Galaxy Zoo 1 paper on AGN host galaxies (Schawinski et al., 2010, ApJ, 711, 284). It is a volume-limited sample of galaxies (0.02 < z < 0.05, M_z < –19.5 AB) with emission line classifications, stellar masses, velocity dispersions and GZ1 morphological classifications. When using this sample, please cite Schawinski et al. (2010) and Lintott et al. (2008, 2011).

Download here: http://galaxy-zoo-1.s3.amazonaws.com/schawinski_GZ_2010_catalogue.fits.gz

I have successfully downloaded that file, and unzipped it. However, when I tried reading it using TOPCAT, I found that the FITS file has (per TOPCAT) just one row, with 15 columns! 😮 Using the Table Browser, I found that the columns have the same names as given on the GZ data page ("OBJID", "RA", "DEC", etc.), and that each cell apparently holds all the records, concatenated and separated by commas.

There are a few tricks I have not yet tried, so I may be able to 'decrypt' this; however, if any reader knows of a sure-fire way ...

Posted March 3, 2015 8:08 PM

by c_cld

See the past discussion at http://www.galaxyzooforum.org/index.php?topic=281570.msg650401#msg650401 of the same problem with the MOSES catalog: Schawinski's data are stored in column-major order.
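
To picture what "column-major order" means here: the file holds a single row whose cells are whole columns, so getting back the usual one-row-per-galaxy layout is a transposition. A minimal sketch with plain Python lists (the three short columns are made-up stand-ins, not values from the catalogue):

```python
# Hypothetical stand-in for the single FITS row: each cell is an entire column.
packed_row = {
    "OBJID": ["A1", "A2", "A3"],
    "RA": [10.0, 20.0, 30.0],
    "DEC": [-1.0, 0.0, 1.0],
}

# zip(*columns) walks the columns in parallel, yielding one tuple per object.
records = list(zip(*packed_row.values()))

for record in records:
    print(record)
```

Each printed tuple is one object, which is the layout a table viewer expects.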

Posted March 4, 2015 8:26 AM

by JeanTate in response to C_cld's comment.

Thanks C_cld, I had forgotten about that thread.

I'll try Fv (and let you - and all other readers - know how it worked; may be several days yet though).

Posted March 4, 2015 1:00 PM

by mlpeck

I got it.

This is a quick and dirty job using tools I'm not familiar with (the fits module in astropy). The header line in the linked csv file is messed up but it should otherwise match the contents of the original. The first column looks like a DR7 ID. It was stored as a character string.

https://www.dropbox.com/s/mi8slgc2ss7ce70/sch_10.csv?dl=0

Posted March 4, 2015 4:46 PM

by c_cld in response to JeanTate's comment.

As @mlpeck did, before going to Fv you need to transpose the Schawinski array.

A quick Python script could be:

from astropy.io import fits

hdulist = fits.open("C:/Users/......./schawinski_GZ_2010_catalogue.fits") # to be changed to your own path

scidata = hdulist[1].data

len(scidata)

1

scidata.names # columns of the row

['OBJID', 'RA', 'DEC', 'REDSHIFT', 'GZ1_MORPHOLOGY', 'BPT_CLASS', 'U', 'G', 'R', 'I', 'Z', 'SIGMA', 'SIGMA_ERR', 'LOG_MSTELLAR', 'L_O3']

OBJ = 'DR' + scidata.field('OBJID') # the alphabetic prefix keeps the ID a string, so it is not silently converted to a float when the csv is opened

RA = scidata.field('RA')

DEC = scidata.field('DEC')

REDSHIFT = scidata.field('REDSHIFT')

GZ1_MORPHOLOGY = scidata.field('GZ1_MORPHOLOGY')

BPT_CLASS = scidata.field('BPT_CLASS')

U = scidata.field('U')

G = scidata.field('G')

R = scidata.field('R')

I = scidata.field('I')

Z = scidata.field('Z')

SIGMA = scidata.field('SIGMA')

SIGMA_ERR = scidata.field('SIGMA_ERR')

LOG_MSTELLAR = scidata.field('LOG_MSTELLAR')

L_O3 = scidata.field('L_O3')

Catalog = list(zip( OBJ[0], RA[0], DEC[0], REDSHIFT[0], GZ1_MORPHOLOGY[0], BPT_CLASS[0], U[0], G[0], R[0], I[0], Z[0], SIGMA[0], SIGMA_ERR[0], LOG_MSTELLAR[0], L_O3[0] )) # transposition; list() so that len() and indexing also work in Python 3

len(Catalog)

47675

Catalog[0]

('DR587722982288785725', 207.09436829000001, -0.76696659, 0.026111800223588898, 0, 2, 17.560625, 16.288317, 15.694989, 15.387244, 15.154959, 41.062729, 5.588398, 9.9080954, 1.8852608528679285e+39)

Catalog[47674]

('DR588298664651718676', 199.37548229000001, 47.783146080000002, 0.026882499456405601, 1, 0, 16.65407, 14.847166, 14.082373, 13.692595, 13.418469, 209.21176, 3.515888, 11.17951, 0.0)

import csv

filepath = "C:/Users/……../catalog.csv" # to be changed to your own path/name for the exported catalog

fieldnames = scidata.names

with open(filepath, "w", newline="") as csvfile: # Python 3; on Python 2 use open(filepath, "wb")
    writer = csv.writer(csvfile)
    writer.writerow(fieldnames) # header line with the column names
    writer.writerows(Catalog) # one line per galaxy

# no explicit csvfile.close() is needed: the with block closes the file

Same result as @mlpeck's file, but with the DR7 ObjID prefixed by "DR" (to avoid Excel float truncation) 😃
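
The same steps can also be written as a loop over the column names rather than one variable per column. This is only a sketch: the function names and the id_column/id_prefix parameters are my own, and it assumes (as above) that every cell of the single row is an array that astropy returns via data[name][0].

```python
import csv
import io


def read_packed_columns(fits_path, hdu_index=1):
    """Read a one-row table whose cells are arrays into {name: list}."""
    from astropy.io import fits  # imported lazily; only this step needs astropy

    with fits.open(fits_path) as hdulist:
        data = hdulist[hdu_index].data
        # data[name][0] is the array stored in the single row for that column.
        return {name: data[name][0].tolist() for name in data.names}


def columns_to_csv_text(columns, id_column="OBJID", id_prefix="DR"):
    """Transpose {name: sequence} to rows and return them as CSV text."""
    columns = dict(columns)  # shallow copy, so the caller's dict is untouched
    # Prefix the ID so that spreadsheet programs keep it as text.
    columns[id_column] = [id_prefix + str(v).strip() for v in columns[id_column]]
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("columns differ in length: %s" % sorted(lengths))
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns.keys())               # header line
    writer.writerows(zip(*columns.values()))      # one line per object
    return buffer.getvalue()
```

Usage would be along the lines of text = columns_to_csv_text(read_packed_columns("schawinski_GZ_2010_catalogue.fits")), followed by writing text to a file opened with open(path, "w", newline="").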

Posted March 6, 2015 9:46 AM

by vrooje admin, scientist

Yeah, these tables were all written in IDL, which often stores FITS tables in a format that TOPCAT doesn't like. The solution is to use IDL or Python or similar to read the file, then re-write it back in a way that won't make TOPCAT think it's 1 row of arrays. I haven't e-mailed the creator of TOPCAT to see whether he has a fix for this (he must have dealt with it before) but that might be a useful solution, if you haven't already found one.
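
For the Python route, re-writing could look roughly like the sketch below, which saves a new FITS file with one row per object. The function names are made up for illustration, and I'm assuming that astropy's Table accepts the unpacked columns as they come out of this particular file; I haven't checked every column type.

```python
def unpack_single_row(names, row):
    """Map column names to the arrays held in the cells of a one-row table."""
    if len(names) != len(row):
        raise ValueError("expected one cell per column name")
    return {name: list(cell) for name, cell in zip(names, row)}


def rewrite_fits_row_per_object(src_path, dst_path, hdu_index=1):
    """Re-save a one-row-of-arrays FITS table with one row per object."""
    # Lazy imports: astropy is only required when this function is called.
    from astropy.io import fits
    from astropy.table import Table

    with fits.open(src_path) as hdulist:
        data = hdulist[hdu_index].data
        # Copy the cells out while the file is still open.
        columns = unpack_single_row(data.names, [data[n][0] for n in data.names])

    Table(columns).write(dst_path, format="fits", overwrite=True)
```

The output file should then open in TOPCAT as an ordinary table with one row per galaxy.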

Side note: as mlpeck says, definitely store the SDSS object ID as a string and not any kind of number. If you have, say, a csv file instead of FITS, or otherwise don't specify the data type, some programs will try to read it as a Long integer and will occasionally do horrible things like rounding, or putting the number in scientific notation, or just truncating it altogether. Best avoided. 😃
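
To see the kind of damage this refers to, here is a small illustration using the first object ID quoted earlier in the thread. It only shows the floating-point case (a double carries roughly 15 to 16 significant digits, fewer than the 18 in the ID); other programs may mangle the value differently.

```python
objid_text = "587722982288785725"  # DR7 ID quoted earlier in this thread

# Reading the ID as a double-precision float cannot keep all 18 digits.
as_float = float(objid_text)
round_tripped = int(as_float)

print(round_tripped == int(objid_text))  # False: the low digits were altered
print(repr(as_float))                    # shown in scientific notation

# Keeping the value as text (optionally with a non-numeric prefix) is lossless.
as_text = "DR" + objid_text
print(as_text[2:] == objid_text)         # True
```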

Posted March 6, 2015 11:36 AM

by JeanTate in response to vrooje's comment.

I emailed Mark Taylor, and got a very quick response; he is OK with me quoting him:

thanks for passing this on. Contrary to vrooje's assumption,
I haven't dealt with it before!

I've made an adjustment which I think ought to cause topcat to
work happily with files like this. You can find a pre-release
version (bc43afb6) here:

ftp://andromeda.star.bris.ac.uk/pub/star/topcat/pre/topcat-full_colfits.jar

It appears to work on the file you mention. If you know of files
that might be similar I'd be grateful if you or other GZ people
could try it out to see if it behaves as expected. If it does,
I'll make sure this fix is in the next public release.

Posted March 7, 2015 6:15 PM

by JeanTate in response to JeanTate's comment.

I finally got a chance to try it out (Mark's prerelease version that is). It works, yay!

I also tried it out on the MOSES catalog (see this GZ forum post for details), and TOPCAT could open that just fine too.

A big thanks to Mark Taylor!

Posted March 13, 2015 2:06 PM

by vrooje admin, scientist

I'm so glad this fix is going to be released soon. I've downloaded it and will try it when I have a chance! 😃

Posted March 28, 2015 5:56 PM