COW Data Set Hosting: Review Procedures
The criteria for release of an updated
data set by COW include basic data standards of internal data set consistency,
comparability to and compatibility with existing data sets, and high quality
data. These criteria can be met by carefully following the coding rules
defined at the beginning of a project, and by working with the COW
central office to ensure consistency of the data set format and
structure.
When a host believes that a data set is ready for release as
an updated COW data set, he or she will submit the data to COW.
A series of checks will then be undertaken before
a version number is assigned and the updated data are released.
- A series of automated checks will be conducted to ensure that all countries and years
have been included in the data set (where a data set is cross-national and cross-time),
that all data points are unique (no duplicate records or values), and that country codes
and data points included in the data set match the Correlates of War National System
Membership lists.
- Variable names and value codes will be examined for uniqueness, descriptive accuracy,
and consistency. For example, where possible, variable names must match names from
prior data sets, and must accurately describe the variables' content. Dummy
variables will be coded as 0=no, and 1=yes. Missing value codes will be consistent
(typically -9 when possible) and clearly described in the documentation. Names and
categories deemed unique will be checked for uniqueness.
- A review of procedures will be done to ensure that coding rules have been followed.
- Spot checks of individual data points collected by the individual host will be conducted
to verify data values and source identification.
- Documentation will be reviewed, and source lists will be examined to ensure that every
new data point can be traced to a point of origin.
- The format of the data set (e.g. unit of analysis [country-year, monad-year], file type
[Excel file, Access file, flat text]) will be examined and made consistent with other data
sets.
- In case of problems, the data set may be updated by COW, or may be returned
to the host for further work.
- The COW advisory board may be routinely consulted on issues of data set
structure, coding rules, case coding, and other issues that arise in the
course of data set review.
- In the case of disagreement between the host and COW about the release
status of the data set, whether such disagreements concern issues of format
or substantive coding decisions, the COW advisory board is available for
consultation and problem resolution.
- The target for final data set release by the COW project and advisory
board is no more than six months after a candidate final release data set
is submitted.
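
As an illustration only, the automated checks described in the first two list
items above (coverage of all countries and years, no duplicate records, country
codes matching the membership lists, and dummy variables coded 0/1) could be
sketched in Python as follows. This is not COW's actual tooling. The function
name check_country_years, the column names "ccode" and "year", and the
representation of the membership list as one (first year, last year) span per
country code are all assumptions made for the example; a country with more than
one membership period would need a list of spans instead.

```python
from collections import Counter

MISSING = -9  # missing-value code mentioned in the text ("typically -9")


def check_country_years(rows, membership, dummy_vars=()):
    """Return a sorted list of problems found in country-year records.

    rows:       list of dicts with at least "ccode" and "year" keys (assumed names)
    membership: dict mapping country code -> (first_year, last_year), inclusive
                (a simplified stand-in for the system membership lists)
    dummy_vars: names of variables expected to be coded 0=no, 1=yes
    """
    problems = []
    keys = [(r["ccode"], r["year"]) for r in rows]

    # Uniqueness: each country-year may appear only once.
    for key, count in Counter(keys).items():
        if count > 1:
            problems.append(f"duplicate record: {key}")

    # Membership: every record must fall inside the country's membership span.
    for key in sorted(set(keys)):
        ccode, year = key
        span = membership.get(ccode)
        if span is None:
            problems.append(f"not in membership list: {key}")
        elif not span[0] <= year <= span[1]:
            problems.append(f"outside membership years: {key}")

    # Coverage: every member country-year must be present in the data.
    present = set(keys)
    for ccode, (first, last) in membership.items():
        for year in range(first, last + 1):
            if (ccode, year) not in present:
                problems.append(f"missing record: {(ccode, year)}")

    # Dummy variables: only 0, 1, or the missing-value code are allowed.
    for r in rows:
        key = (r["ccode"], r["year"])
        for var in dummy_vars:
            value = r.get(var)
            if value not in (0, 1, MISSING):
                problems.append(f"bad dummy value: {var}={value!r} at {key}")

    return sorted(problems)
```

An empty list from this function means that none of these particular checks
failed. It says nothing about the review of coding rules, the spot checks, or
the documentation review, which call for human judgment.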