The Catalogue Files & Directories

2.1 The Directory Tree

Each catalogue available at CDS is made of several files stored in a directory of a Unix-like file system.

The directory tree naming conventions exactly follow the standards adopted at CDS in the mid-1970s: astronomical catalogues have been assigned a chronological number within categories numbered I to IX (see the tree below), reflecting the main scientific interest of the catalogue. This numbering system is shared by the CDS and the participating data centers, mainly NSSDC-ADC (the Astronomical Data Center at NASA's National Space Science Data Center).

Directory Tree of Catalogues at CDS

I/number	Astrometric Catalogues
II/number	Photometric Catalogues (except Radio)
III/number	Spectroscopic Catalogues
IV/number	Cross-Identifications
V/number	Combined Data
VI/number	Miscellaneous Catalogues
VII/number	Non-stellar Objects
VIII/number	Radio Catalogues
IX/number	High Energy Catalogues
J/abbr/Volume/first_page	Publications ordered by Journals, with abbr:

A+A	A&A
A+AS	A&A Suppl.
AJ	Astron. J.
ApJ	Astrophys. J.
ApJS	Astrophys. J., Suppl.
MNRAS	Mon. Not. R. Astron. Soc.
PASP	Publ. Astron. Soc. Pacific
AZh	Astron. Zhurnal (Russia)
PAZh	Pis'ma Astron. Zhurnal (Russia)
AN	Astronomische Nachrichten
AcA	Acta Astronomica
BaltA	Baltic Astronomy
other	Form J/other/abbr/Volume.first_page
	for other journals, abbr being the journal abbreviation as written in the bibcode
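Because the first component of a catalogue designation names its category, the category can be read directly off the designation. A minimal Python sketch of that lookup, using the subjects from the tree above; the function name category_of is hypothetical and not part of any CDS software:

```python
# Subject of each top-level directory, copied from the tree above.
CATEGORIES = {
    "I": "Astrometric Catalogues",
    "II": "Photometric Catalogues (except Radio)",
    "III": "Spectroscopic Catalogues",
    "IV": "Cross-Identifications",
    "V": "Combined Data",
    "VI": "Miscellaneous Catalogues",
    "VII": "Non-stellar Objects",
    "VIII": "Radio Catalogues",
    "IX": "High Energy Catalogues",
    "J": "Publications ordered by Journals",
}


def category_of(designation: str) -> str:
    """Return the subject of the top-level directory of a designation
    such as 'I/252' or 'J/A+AS/97/729' (hypothetical helper)."""
    top = designation.strip("/").split("/", 1)[0]
    try:
        return CATEGORIES[top]
    except KeyError:
        raise ValueError(f"unknown catalogue category: {top!r}") from None
```

For instance, category_of("III/166") gives "Spectroscopic Catalogues".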

The explosion of incoming catalogues from the beginning of 1993, due to electronic publication (see Chapter 1), led us to introduce the J category: within this category, the catalogue designation is derived from the reference of the published paper, e.g. J/A+AS/97/729 for the article published in A&A Suppl. 97, page 729.

Within this new J section, there is therefore no need for an agreement between data centers on the numbering of catalogues, and finding out where a catalogue is stored is straightforward once its reference is known. But catalogues do not have to stay in the J section forever: more "consistent" catalogues may later be generated from one or several publications, typically by merging the results published in a set of several papers.
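The mapping from a reference to a J designation can be sketched as follows, using the two forms given in the tree above (J/abbr/Volume/first_page for the listed journals, J/other/abbr/Volume.first_page otherwise). The function name j_designation is hypothetical, and the journal set is only the list reproduced in this section:

```python
# Journals with their own directory under J/ (from the list above).
J_JOURNALS = {"A+A", "A+AS", "AJ", "ApJ", "ApJS", "MNRAS", "PASP",
              "AZh", "PAZh", "AN", "AcA", "BaltA"}


def j_designation(abbr, volume, first_page) -> str:
    """Build the J-category designation of a published paper
    (hypothetical helper). volume and first_page may be int or str."""
    if abbr in J_JOURNALS:
        # Listed journals: J/abbr/Volume/first_page
        return f"J/{abbr}/{volume}/{first_page}"
    # Any other journal: J/other/abbr/Volume.first_page
    return f"J/other/{abbr}/{volume}.{first_page}"
```

Here j_designation("A+AS", 97, 729) reproduces the example above, J/A+AS/97/729.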

2.2 File Naming Conventions

All files making up a catalogue or publication are stored in a directory named according to the conventions described above. A description file named ReadMe contains the information required to understand the origin and the contents of the catalogue. The contents of this important file are described in std.

If a file named =obsolete= exists, the catalogue is obsolete, typically because it is an outdated version. The contents of this file indicate which catalogue should be used instead of the obsolete version.

Besides these two "special" files (ReadMe, which is always present, and =obsolete=, which exists only for outdated catalogues), the data files are named according to the following rules:

filenames should be compatible with MS-DOS limitations: a filename is written name.extension, with at most 8 characters for the name and 3 characters for the extension; only alphanumeric characters, the minus sign and the underscore are allowed; and case is not significant (filenames are normally displayed in lowercase letters only).
for files corresponding to published material, the names are consistent with the published paper, and we use tablen.extension to refer to the table numbered n in the published paper, fign.extension for the figure numbered n, etc.
if the rule above cannot be applied, we use mnemonic names such as main, catalog or data for the main part of the catalogue, refs for the references, notes for the notes, etc.
the extension is related to the format of the file, with the following conventions:
- [.csv] for files containing tabular data as character-separated values, i.e. columns separated by a special character, generally the semicolon ';' (see also the .tsv extension)
- [.dat] for files containing the data in plain ASCII form. The exact structure of such files, i.e. the column layout, is normally described in the ReadMe file.
- [.fit] for FITS files
- [.fih] for FITS headers, i.e. the top part of FITS files containing the keywords, with embedded newlines.
- [.gif] for data files containing images in GIF format
- [.jpg] for data files containing images in JPEG format
- [.mpg] for data files containing video sequences in MPEG format
- [.ori] for the original files, when modifications had to be made and the original files must remain available
- [.pdf] for Adobe's PDF format
- [.ps ] for PostScript files
- [.sam] for sample files, when the whole catalogue cannot be stored in the FTP directories; the total number of records is then given as the first number in the explanation. See e.g. the USNO Catalogue (I/252).
- [.sty] for style files related to TeX or LaTeX definitions.
- [.tar] for files in Tape ARchive format (Unix), allowing many files to be archived as a single file.
- [.tgz] for gzip-compressed Tape ARchive format (Unix), a shorthand for the .tar.gz suffix.
- [.tex] for files in plain TeX or in LaTeX.
- [.txt] for files containing text in plain ASCII form.
- [.tsv] for files containing tabular data as tab-separated values, i.e. columns separated by the TAB character (see also the .csv extension).
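The naming rules above can be checked mechanically. A minimal Python sketch, assuming any compression or split-part suffix has already been removed and that ReadMe and =obsolete= are handled separately; the function name check_name and the short format labels are illustrative, not part of any CDS tool:

```python
import re

# Extensions from the list above, with a short (illustrative) format label.
EXTENSIONS = {
    "csv": "character-separated values", "dat": "plain ASCII data",
    "fit": "FITS file", "fih": "FITS header", "gif": "GIF image",
    "jpg": "JPEG image", "mpg": "MPEG video",
    "ori": "original (unmodified) file", "pdf": "PDF document",
    "ps": "PostScript", "sam": "sample of a larger file",
    "sty": "TeX/LaTeX style file", "tar": "tar archive",
    "tgz": "gzip-compressed tar archive", "tex": "TeX or LaTeX source",
    "txt": "plain ASCII text", "tsv": "tab-separated values",
}

# name of 1-8 characters, a dot, extension of 1-3 characters;
# only alphanumerics, '_' and '-' are allowed.
_DOS_NAME = re.compile(r"[A-Za-z0-9_-]{1,8}\.[A-Za-z0-9_-]{1,3}")


def check_name(filename: str) -> str:
    """Validate an MS-DOS-compatible data-file name and return its format."""
    if not _DOS_NAME.fullmatch(filename):
        raise ValueError(f"not a valid name.extension filename: {filename!r}")
    ext = filename.lower().rsplit(".", 1)[1]  # case is not significant
    return EXTENSIONS.get(ext, "unknown extension")
```

With this sketch, check_name("table3.dat") and check_name("TABLE3.DAT") both return "plain ASCII data", while a 9-character name such as "catalogue.dat" is rejected.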

Files may also be Unix-compressed or gzip-compressed: a .Z suffix is appended to the filenames described above in the case of Unix compression (the Unix uncompress program must be used), and a .gz or .z suffix in the case of gzip compression (the public-domain gunzip program must be applied).
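A reader can choose the decompression method from the suffix alone. A minimal Python sketch, assuming the files are ASCII text; the function name open_data_file is hypothetical. Python's standard gzip module reads what gunzip reads, but the standard library has no decoder for Unix compress (.Z), so those files are left to the uncompress program:

```python
import gzip


def open_data_file(path: str):
    """Open a possibly compressed catalogue file as ASCII text
    (hypothetical helper)."""
    if path.endswith((".gz", ".z")):
        # gzip compression: equivalent to running gunzip first.
        return gzip.open(path, "rt", encoding="ascii")
    if path.endswith(".Z"):
        # Unix compress (LZW) is not supported by the standard library;
        # run the external 'uncompress' program on these files first.
        raise NotImplementedError("use 'uncompress' on .Z files first")
    return open(path, "r", encoding="ascii")
```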

Large files may also be cut into pieces, generally no larger than 10 megabytes each. In this case a numeric suffix of 2 or 3 digits is appended; for example, the data file of the Tycho-2 Catalogue was split into 20 parts named tyc2.dat.00, tyc2.dat.01, ..., tyc2.dat.19.
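Reassembling such a file amounts to concatenating the pieces in numeric order. A minimal Python sketch, assuming the pieces are uncompressed byte-wise cuts of the original file (compressed pieces would be decompressed first); the function names split_parts and join_parts are hypothetical:

```python
import glob
import re
import shutil


def split_parts(base: str) -> list:
    """Return the pieces base.00, base.01, ... of a split file, in order."""
    pieces = [p for p in glob.glob(glob.escape(base) + ".*")
              if re.fullmatch(r"\d{2,3}", p.rsplit(".", 1)[1])]
    # Sort on the numeric suffix rather than on the raw string.
    return sorted(pieces, key=lambda p: int(p.rsplit(".", 1)[1]))


def join_parts(base: str, dest: str) -> int:
    """Concatenate the pieces of base into dest; return how many were joined."""
    parts = split_parts(base)
    if not parts:
        raise FileNotFoundError(f"no pieces found for {base!r}")
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(parts)
```

For the Tycho-2 example, join_parts("tyc2.dat", "tyc2.dat") would expect to find tyc2.dat.00 through tyc2.dat.19 in the current directory.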

2.3 Catalogue Subdirectories

Some catalogues contain a large number of files: Catalogue III/166, for example, contains about 80 stellar spectra corresponding to a set of standard spectral types. These data files, which are simple 2-column tables, were saved in a subdirectory named sp, and the characteristics of each of these 80 spectrum files are summarized in a table named spectra.dat, which is described in the ReadMe file. In other words, files can be described with a level of indirection: a table details the characteristics of files stored in one or several subdirectories.
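With this indirection, the index table and the subdirectory should list the same files. A minimal Python sketch of that consistency check, assuming the file names have already been read from the index table (the actual column layout of spectra.dat is given in its ReadMe and is not reproduced here); the function name check_indexed_files and the sample names are hypothetical:

```python
from pathlib import Path


def check_indexed_files(subdir: str, listed: list) -> list:
    """Check that the files named in an index table are exactly the
    files present in the subdirectory; return their paths in index order."""
    root = Path(subdir)
    present = {p.name for p in root.iterdir() if p.is_file()}
    missing = [name for name in listed if name not in present]
    unlisted = sorted(present - set(listed))
    if missing or unlisted:
        raise ValueError(f"missing: {missing}, not in index: {unlisted}")
    return [root / name for name in listed]
```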

2.4 Data files

Data files in principle contain only the data, without titles, headers, comments, etc. However, since introductory comments at the beginning of a data file are convenient, a way of declaring them has been added to the Byte-by-byte Description. Introductory comments in data files can be specified in two ways:

by specifying a number of introductory lines, e.g. the first 20 lines are comments.
by specifying a character used for introductory comments, e.g. the first lines having a # as their leftmost character represent introductory comments.

Data files may also contain empty lines, which are ignored wherever they occur in the file.
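A reader honouring both conventions, plus the rule on empty lines, can be sketched in Python as below. It assumes the number of introductory lines or the comment character has already been taken from the Byte-by-byte Description, and treats the comment character as marking only the leading lines, since the comments are introductory. The function name data_lines is hypothetical:

```python
from typing import Iterable, Iterator, Optional


def data_lines(lines: Iterable[str], n_intro: int = 0,
               comment_char: Optional[str] = None) -> Iterator[str]:
    """Yield the data records of a catalogue data file.

    n_intro: number of introductory comment lines at the top of the file.
    comment_char: character marking introductory comment lines (e.g. '#').
    Empty lines are skipped wherever they occur.
    """
    in_intro = True
    for i, raw in enumerate(lines):
        # Strip only the line terminator: columns are positional.
        line = raw.rstrip("\r\n")
        if i < n_intro:
            continue  # one of the first n_intro lines
        if not line.strip():
            continue  # empty line, ignored anywhere in the file
        if in_intro and comment_char and line.startswith(comment_char):
            continue  # leading comment line
        in_intro = False
        yield line
```

A typical call would be data_lines(f, comment_char="#") on an open file f; leading blanks are kept because the Byte-by-byte Description locates fields by column position.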

2.5 Index Files

A set of files summarizing the catalogues currently available at CDS is updated regularly (normally on a weekly basis):

cats.all: lists all catalogues (flat ASCII)
cats.lis: provides only basic information about each catalogue
cats.tex: is the LaTeX version used for publication in the Bulletin d'Information du CDS
cats.dvi: is the DVI version of cats.tex, which can be used for remote display, e.g. via XMosaic
cats.new: contains the same information as cats.all, for catalogues acquired during the last month.

Note that a facility exists to query this index remotely: the findcat program, which is part of the cdsclient package described in Chapter 4.