Documentation

In this page a brief documentation of the ARG Database will be provided. More details can be found in this file.

The graph files

The graphs directory that you can download here contains 168 .zip files in which the actual graph files are stored in a compressed format. You will need a decompression utility able to deal with the .zip format in order to extract the graph files, if you have not yet installed one.

The name of the .zip files reflect the kind of graphs that are stored in the corresponding archives; in particular, the name is made of a prefix, which indicates the type of matching that holds between the graphs, and an identifier of the type of graphs (see section References for more details on the graph types). In TABLE 1 there is a list of prefixes with the corresponding meaning; TABLE 2 contains the description of the identifiers used for the types.

The prefix and the type identifier are separated by an underscore (“_”). Examples:

The file si2_m2D.zip has the prefix si2, meaning that it contains pairs of graphs for which there is a graph-subgraph isomorphism relation, and the subgraph contain 20% of the nodes of the full graph; the type identifier is m2D, corresponding to bidimensional meshes with no additional random edges.
The file mcs10_m2D.zip has the prefix mcs10, meaning that it contains pairs of graphs holding a common subgraph whose size is 10% of the size of the graphs; the type identifier is m2D, corresponding to bidimensional meshes with no additional random edges.

TABLE 1: Filename prefixes
Prefix	Description
iso	Pairs of isomorphic graphs
`si2`	Pairs of graph-subgraph isomorphic unlabeled graphs; the subgraph size is 20% of the full graph
`si4`	Pairs of graph-subgraph isomorphic unlabeled graphs; the subgraph size is 40% of the full graph
`si6`	Pairs of graph-subgraph isomorphic unlabeled graphs; the subgraph size is 60% of the full graph
`mcs10`	Pairs of labeled graphs, whose maximum common subgraph size is 10% of the graph size
`mcs30`	Pairs of labeled graphs, whose maximum common subgraph size is 30% of the graph size
`mcs50`	Pairs of labeled graphs, whose maximum common subgraph size is 50% of the graph size
`mcs70`	Pairs of labeled graphs, whose maximum common subgraph size is 70% of the graph size
`mcs90`	Pairs of labeled graphs, whose maximum common subgraph size is 90% of the graph size

TABLE 2: Graph type identifiers
Identifier	Description
`r001`	randomly generated graphs with eta = 0.01
`r005`	randomly generated graphs with eta = 0.05
`r01`	randomly generated graphs with eta = 0.1
`r02`	randomly generated graphs with eta = 0.2
`m2D`	bi-dimensional meshes with rho = 0
`m2Dr2`	bi-dimensional meshes with rho = 0.2
`m2Dr4`	bi-dimensional meshes with rho = 0.4
`m2Dr6`	bi-dimensional meshes with rho = 0.6
`m3D`	tri-dimensional meshes with rho = 0
`m3Dr2`	tri-dimensional meshes with rho = 0.2
`m3Dr4`	tri-dimensional meshes with rho = 0.4
`m3Dr6`	tri-dimensional meshes with rho = 0.6
`m4D`	quadri-dimensional meshes with rho = 0
`m4Dr2`	quadri-dimensional meshes with rho = 0.2
`m4Dr4`	quadri-dimensional meshes with rho = 0.4
`m4Dr6`	quadri-dimensional meshes with rho = 0.6
`b03`	bounded-valence graphs with valence = 3
`b03m`	modified bounded-valence graphs with valence = 3
`b06`	bounded-valence graphs with valence = 6
`b06m`	modified bounded-valence graphs with valence = 6
`b09`	bounded-valence graphs with valence = 9
`b09m`	modified bounded-valence graphs with valence = 9

The files inside each .zip archive are divided into subdirectories, using three levels of grouping. The first level correspond to the type of matching, and the directories are named using the prefixes of TABLE 1. The second level groups together similar graph types; TABLE 2 shows the names of the directories, and the graph types they contain. Finally, the third level corresponds to graph types, and the directories are named after TABLE 2. For example, the graphs in the file si4_r01.zip get expanded into the directory si4/rand/r01/.


Group	Corresponding types
rand	r001, r005, r01 ,r02
m2D	m2D, m2Dr2, m2Dr4, m2Dr6
m3D	m3D, m3Dr2, m3Dr4, m3Dr6
m4D	m4D, m4Dr2, m4Dr4, m4Dr6
bvg	b03, b03m, b06, b06m, b09, b09m

Type groups

The name of the graph files in each .zip archive are obtained from the name of the .zip file by adding a suffix and an extension. The suffix, separated by an underscore (“_&”) from the graph type identifier, indicates the number of nodes in the graphs. The first letter of the suffix is “s“; for small graphs (i.e. up to 100 nodes) and “m” for medium-sized graphs (i.e. above 100 nodes but below 2000); after this letter, there is the decimal representation of the number of nodes. The file extension (separated by a period “.” from the rest of the file name) is made of a letter, which can be “A” or “B“, and two digits, forming a number from 00 to 99, that identifies one of the pairs of matching graphs. Within the pair, the graph with the “A” in the filename extension is the subgraph, for the subgraph-isomorphism matching types. For example, a graph file name could be mcs50_r01_s15.A27, meaning that the graph is the first of the pair #27 of randomly-generated graphs having a maximum common subgraph whose dimension is 50% of size of the graphs, with eta=0.01 and having 15 nodes.

The graph file format

The graphs are stored in a compact binary format, one graph per file. The file is composed of 16 bit words, which are represented using the so-called little-endian convention, i.e. the least significant byte of the word is stored first. Two different formats are used for labeled and unlabeled graphs. The functions to read these two different formats are provided below.

How to read unlabeled graphs

How to read labeled graphs

The ground truth files

For each .zip file in the graphs directory there is a corresponding .gtr file containing the ground truth regarding the pairs of graphs in the .zip archive. For example, the ground truth of the file iso_b03.zip is in iso_b03.gtr.

A ground truth file is a text file containing a line for each pair of graphs in the .zip file;

Unlabeled graphs. The line has two fields, separated by a TAB character (ASCII 9), and is terminated by a LF character (ASCII 10). The first field is the name of the first graph of the pair, while the second field is the number of matchings found.
Labeled graphs. The line has four fields, separated by a TAB character (ASCII 9), and is terminated by a LF character (ASCII 10). The first field is the name of the first graph of the pair, while the other fields are the number of matchings found. More in details, the second field is the number of maximum common sungraphs when the number of attributes of the graphs is fixed to 33% of graph size. Analogously, the third and the fourth fields are respectively the number of maximum common subgraphs when the number of attributes is fixed to 50% and 75% of the graph size.

For example, a line of the iso_b03.gtr file could be iso_b03_m1000.A00 1 to signify that between the graphs iso_b03_m1000.A00 and iso_b03_m1000.B00 there is only one isomorphism, while a line of the mcs10_b03.gtr file could be mcs10_b03_s20.A00 1 2 1 meaning that the graphs mcs10_b03_s20.A00 and mcs10_b03_s20.B00 holds respectively one, two and one maximum common subgraph when the number of attributes of the graphs is fixed to 33%, 50% and 75% of graph size.
All the graph pairs in the iso_* files are present in the corresponding ground truth files. However, for graph-subgraph isomorphisms, and for maxmum common subgraphs only a subset of the pairs have been accounted for in the ground truth files, due to the considerably larger computation time needed to generate this information.

Graphs Generator

The source files of the programs used to generate the graphs composing the database can be found here. These programs have been developed and used under Linux, and are not completely portable to other operating systems, since they use the /dev/urandom pseudo-device for initializing the random number generator. Each program generates a pair of graphs; we have used shell scripts to call repeatedly the programs with the proper arguments for generating the whole database.
In the following table there is an outline of the programs and of their command line arguments.


Name	Arguments	Types generated
`gen`	nodes edges file1 file2	isor001, isor005, iso*r01
`gensub`	subgraph-nodes nodes edges file1 file2	sir001, sir005, si*r01
`genm2D`	nodes extra-edges file1 file2	isom2D, isom2Dr2, isom2Dr4, isom2Dr6
`gensubm2D`	subgraph-nodes nodes extra-edges file1 file2	sim2D, sim2Dr2, sim2Dr4, sim2Dr6
`genm3D`	nodes extra-edges file1 file2	isom3D, isom3Dr2, isom3Dr4, isom3Dr6
`gensubm3D`	subgraph-nodes nodes extra-edges file1 file2	sim3D, sim3Dr2, sim3Dr4, sim3Dr6
`genm4D`	nodes extra-edges file1 file2	isom4D, isom4Dr2, isom4Dr4, isom4Dr6
`gensubm4D`	subgraph-nodes nodes extra-edges file1 file2	sim4D, sim4Dr2, sim4Dr4, sim4Dr6
`genbvg`	nodes valence file1 file2	isob03, isob06, iso*b09
`gensubbvg`	subgraph-nodes nodes valence file1 file2	sib03, sib06, si*b09
`genbvgm`	nodes valence file1 file2	isob03m, isob06m, iso*b09m
`gensubbvgm`	subgraph-nodes nodes valence file1 file2	sib03m, sib06m, si*b09m
`genrndat`	nodes nodesub edges edgesub file1 file2	mcsr005, mcsr01, mcs*r02
`genm2Dat`	nodes nodesub edges extra-edgesub file1 file2	mcsm2D, mcsm2Dr2, mcsm2Dr4, mcsm2Dr6
`genm3Dat`	nodes nodesub edges extra-edgesub file1 file2	mcsm3D, mcsm3Dr2, mcsm3Dr4, mcsm3Dr6
`genm4Dat`	nodes nodesub edges extra-edgesub file1 file2	mcsm4D, mcsm4Dr2, mcsm4Dr4, mcsm4Dr6
`genbvgat`	nodes nodesub edges valence file1 file2	mcsb03, mcsb06, mcs*b09
`genbvgmat`	nodes nodesub edges valence file1 file2	mcsb03m, mcsb06m, mcs*b09m

For each program there is one source file with extension .cc. The programs gen, gensub, genm2D and gensubm2D need to be linked with the VFLIB graph matching library. All other files are self-contained.

Nome utente
Password

	Ricordami Password dimenticata? Registrati