Data input (tutorial)

From visone manual

Revision as of 18:34, 18 March 2012

Normally, visone reads network data from GraphML files, which should never cause any problems. In some cases, however, it is necessary to import data from other sources that can, for instance, only export adjacency matrices as comma-separated-value (CSV) tables. This tutorial guides you through the various ways to get data into visone.

The usual way: read GraphML

GraphML is the usual file format for visone. It is the only format that encodes all three types of information contained in visone networks: network structure, attributes, and graphical information. To read network data from a GraphML file, use the file menu, click on open..., select files of type .graphml, and click on the ok button.

The other file types are only needed when you want to import data from other sources that cannot output GraphML.

An overview of the other possibilities

Apart from GraphML, visone can read network data from files exported by other network analysis software, including UCINET, Pajek, Siena, and others. These files are also opened via the file menu by selecting the appropriate file type. More information about reading these file types is provided at the end of this tutorial.

A more basic option that should be feasible in most situations is to read network data from comma-separated-value (CSV) files. CSV files are plain-text files that look, for example, like this:

  ;A;B;C;D
 A;0;1;1;1
 B;1;0;1;0
 C;1;1;0;0
 D;1;0;0;0

Such files can be created by spreadsheet editors (such as MS Excel), statistical software, most network analysis software, and many other programs. However, reading data from CSV files is more error-prone, since these files do not provide an unequivocal definition of how to interpret them. Most of this tutorial is dedicated to the import of CSV files.
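To illustrate why a CSV table alone does not say how it should be interpreted, here is a minimal Python sketch (not part of visone) that parses the example table and checks whether the matrix is symmetric. The function names read_matrix and is_symmetric are hypothetical, and the semicolon delimiter is an assumption taken from the example. A symmetric matrix such as this one could describe either an undirected network or a directed network in which every link happens to be reciprocated; the file itself does not tell.

```python
import csv
import io

# The example table from above; the top-left cell is empty.
CSV_TEXT = """;A;B;C;D
A;0;1;1;1
B;1;0;1;0
C;1;1;0;0
D;1;0;0;0
"""


def read_matrix(text, delimiter=";"):
    """Return (labels, matrix) for a table with a label row and a label column.

    Hypothetical helper for illustration; visone's own importer is not shown here.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [[cell.strip() for cell in row] for row in reader if row]
    labels = rows[0][1:]  # first row holds the column labels
    matrix = [[int(cell) for cell in row[1:]] for row in rows[1:]]
    return labels, matrix


def is_symmetric(matrix):
    """True if entry (i, j) equals entry (j, i) for all pairs of indices."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


labels, matrix = read_matrix(CSV_TEXT)
print(labels)
print("symmetric:", is_symmetric(matrix))
```

The delimiter is a second source of ambiguity: the same text read with a comma as delimiter would yield a single column per row and no usable labels.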

Yet another possibility is to create networks directly in visone by entering them manually (which is only appropriate if the networks are small and the data are not yet available in electronic form). This option is illustrated in the tutorial introducing the visual network editor.

The variants of comma-separated-value (CSV) files

A comma-separated-value file can be thought of as a plain-text file that encodes a table, i.e., a data array with rows and columns, sometimes also referred to as a matrix. In a network context, there are different ways to encode information about nodes, links, or attributes in such tables. The first three (adjacency matrices, link lists, and adjacency lists) provide information about nodes and links and can be opened via the file menu by selecting the appropriate file type; attribute tables provide information about node or link attributes and can be opened via the attribute manager. What follows is a short characterization of these file types; more exhaustive treatments are given in the following sections.

  • Adjacency matrix files. An adjacency matrix encodes, for all pairs of nodes (indexing the rows and columns of the table), whether or not there is a link connecting these nodes. An example follows.
  ;A;B;C;D
 A;0;1;1;1
 B;0;0;1;0
 C;1;1;0;0
 D;1;0;0;0

The first row and the first column are the labels of the nodes; the remaining part encodes whether there is a link from the node indexing the row to the node indexing the column. For instance, the character 1 in the row indexed by A and the column indexed by B indicates that there is a link going from A to B; the 0 in the row B and column A indicates that there is no link in the reverse direction.

  • Link list files. A link list contains as many rows as there are links in the network. In its most basic form it has two columns: the entry in the first column identifies the source node and the entry in the second column identifies the target node of the link. The following example
 A;C
 C;B
 B;A
 A;D

defines four links: from node A to node C, from C to B, from B to A, and from A to D.

  • Adjacency list files
  • Attribute tables
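
The relation between the first two encodings can be sketched in a few lines of Python (this is an illustration, not visone's import code). The helper names matrix_to_links and links_to_csv are hypothetical. The sketch assumes a semicolon delimiter and treats every entry other than 0 or an empty cell as a link, which is an assumption made for this example only.

```python
import csv
import io

# The directed adjacency matrix from the example above.
MATRIX_TEXT = """;A;B;C;D
A;0;1;1;1
B;0;0;1;0
C;1;1;0;0
D;1;0;0;0
"""


def matrix_to_links(text, delimiter=";"):
    """Return a list of (source, target) pairs, one per non-zero matrix entry."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [[cell.strip() for cell in row] for row in reader if row]
    targets = rows[0][1:]  # column labels are the possible target nodes
    links = []
    for row in rows[1:]:
        source = row[0]  # row label is the source node
        for target, cell in zip(targets, row[1:]):
            # Assumption: "0" and empty cells mean "no link".
            if cell not in ("", "0"):
                links.append((source, target))
    return links


def links_to_csv(links, delimiter=";"):
    """Serialize (source, target) pairs as a two-column link list."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(links)
    return out.getvalue()


links = matrix_to_links(MATRIX_TEXT)
print(links_to_csv(links))
```

Each row of the resulting link list corresponds to exactly one non-zero cell of the matrix, so the link from A to B appears while the absent reverse link from B to A does not.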

Adjacency matrix files

Link list files

Adjacency list files

Importing node and link attributes

Other supported formats

If nothing else works: use the R console or KNIME connection