Penn State Event Data

The Penn State Event Data Project (formerly the Kansas Event Data System) is a long-term project aimed at collecting events among political actors. The events are extracted from regular news reports in a semi-automatic fashion. Specifically, we use data encoding events in or around the Persian Gulf region from 1979 to 1999, available at http://eventdata.psu.edu/data.dir/gulf.html under the link Gulf data coded from full stories. This data set consists of more than 304,000 events; the first lines of the file look like this.

 790415	USA	SAU	042	ENDORSE     
 790415	SAU	USA	081	MAKE AGREEME
 790415	EEC	UNK	031	MEET        
 790415	KEN	TAZ	211	SEIZE POSSES
 790415	BEL	ZAR	032	VISIT       
 ...

The file encodes a tab-separated table whose rows have the following components (from left to right).

  • A 6-digit number encodes the date of the event in YYMMDD format. For instance, the string 790415 refers to April 15, 1979 (see the code sketch after this list).
  • The source actor is the one who initiates the event. This can be a country (e.g., SAU for Saudi Arabia), an organization (e.g., UNO), or, depending on the data set, even an individual person.
  • The target actor is the recipient of the event and is coded in the same way as the source.
  • The event type is a three-digit number giving the event code as defined in the World Event/Interaction Survey (WEIS) project. It specifies what happened in the event.
  • The last column is a textual description of the WEIS event type.

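As a side note on the date format, such 6-digit codes can be converted into proper R dates with as.Date and the "%y%m%d" input format; this is only a small sketch and not part of the preprocessing itself (R maps two-digit years 69 to 99 to the 1900s, so 79 becomes 1979).

 # convert a 6-digit date code to an R Date object
 as.Date("790415", format = "%y%m%d")
 # [1] "1979-04-15"
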
A slightly modified version of the file (in which we added a column for the event type and one for the event weight) is made available in Gulf_events_preprocessed.zip. Analysis of the resulting event file is illustrated in the tutorial on event networks. The remainder of this page explains how these preprocessing steps were carried out.

We edit the file by adding a header (i.e., column labels separated by tab characters) as its first line. Useful column labels are, for instance, Time, Source, Target, WEIS.code, and Description.
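
With the labels above, the first lines of the edited file then look roughly as follows (the columns remain separated by tab characters; the spacing shown here is only illustrative).

 Time    Source  Target  WEIS.code  Description
 790415  USA     SAU     042        ENDORSE
 790415  SAU     USA     081        MAKE AGREEME
 ...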

The WEIS event types have been mapped to a numerical cooperation/conflict scale (called Goldstein weights) where a positive event weight indicates a cooperative or friendly event and a negative weight indicates a conflictive or hostile event. The mapping from WEIS codes to Goldstein weights is available in the file Goldstein_weights.zip.

To make use of these event weights, we have to merge them into the event table. This can be done with any software that supports joining tables, for instance with the R software for statistical computing (also see the tutorial on using the visone-R connection). The following code examples are for R. First change the R working directory to the folder where the event file is located, using the setwd command. Then, to load the events into a data frame and see some summary statistics, type:

 gulf.data <- read.table("GULF99.ALL.events", header = TRUE, sep = "\t")
 summary(gulf.data)

Note that the WEIS.code column is treated as numeric (although character strings would seem more appropriate); this doesn't matter since we only need the code to merge the event file with the Goldstein weights. To load the mapping from WEIS codes to Goldstein weights (available in the file Goldstein_weights.zip) type:

 goldstein.weights <- read.table("goldstein_weights", header = TRUE, sep = "\t")

To merge the Goldstein weights into the events and see summary statistics, use the following commands:

 gulf.data <- merge(gulf.data, subset(goldstein.weights, select=c(WEIS.code,Goldstein.weight)), 
                    by.x="WEIS.code", by.y="WEIS.code")
 summary(gulf.data)

Note that merging reorders the events: the rows of the result are sorted by the WEIS.code key rather than by time.
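
If the chronological order is needed again, the merged table can simply be re-sorted by the time column; this is an optional sketch (the numeric YYMMDD codes sort chronologically here because all years fall between 1979 and 1999).

 # restore a by-date ordering of the merged events
 gulf.data <- gulf.data[order(gulf.data$Time), ]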

We further add a column that explicitly codes whether an event is conflictive or cooperative.

 gulf.data$Type <- "cooperation"
 gulf.data$Type[gulf.data$Goldstein.weight < 0.0] <- "conflict"
 gulf.data$Type <- as.factor(gulf.data$Type)
 summary(gulf.data)

Finally, we export the table to a semicolon-separated file.

 write.table(gulf.data, file="gulf_events_preprocessed.csv", sep=";", row.names=FALSE)

For convenience, the resulting file is made available in Gulf_events_preprocessed.zip.
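
If you start directly from the preprocessed file, it can be loaded back into an R data frame in the same way; this is a small sketch assuming the semicolon separator and the header row written by the command above.

 gulf.data <- read.table("gulf_events_preprocessed.csv", header = TRUE, sep = ";")
 summary(gulf.data)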