In this page a brief documentation of the ARG Database will be provided. More details can be found in this file.
The graph files
The graphs directory that you can download here contains 168 .zip files in which the actual graph files are stored in a compressed format. You will need a decompression utility able to deal with the .zip format in order to extract the graph files, if you have not yet installed one.
The name of the .zip files reflect the kind of graphs that are stored in the corresponding archives; in particular, the name is made of a prefix, which indicates the type of matching that holds between the graphs, and an identifier of the type of graphs (see section References for more details on the graph types). In TABLE 1 there is a list of prefixes with the corresponding meaning; TABLE 2 contains the description of the identifiers used for the types.
The prefix and the type identifier are separated by an underscore (“_”). Examples:
- The file si2_m2D.zip has the prefix si2, meaning that it contains pairs of graphs for which there is a graph-subgraph isomorphism relation, and the subgraph contain 20% of the nodes of the full graph; the type identifier is m2D, corresponding to bidimensional meshes with no additional random edges.
- The file mcs10_m2D.zip has the prefix mcs10, meaning that it contains pairs of graphs holding a common subgraph whose size is 10% of the size of the graphs; the type identifier is m2D, corresponding to bidimensional meshes with no additional random edges.
Prefix | Description |
---|---|
iso | Pairs of isomorphic graphs |
si2 |
Pairs of graph-subgraph isomorphic unlabeled graphs; the subgraph size is 20% of the full graph |
si4 |
Pairs of graph-subgraph isomorphic unlabeled graphs; the subgraph size is 40% of the full graph |
si6 |
Pairs of graph-subgraph isomorphic unlabeled graphs; the subgraph size is 60% of the full graph |
mcs10 |
Pairs of labeled graphs, whose maximum common subgraph size is 10% of the graph size |
mcs30 |
Pairs of labeled graphs, whose maximum common subgraph size is 30% of the graph size |
mcs50 |
Pairs of labeled graphs, whose maximum common subgraph size is 50% of the graph size |
mcs70 |
Pairs of labeled graphs, whose maximum common subgraph size is 70% of the graph size |
mcs90 |
Pairs of labeled graphs, whose maximum common subgraph size is 90% of the graph size |
Identifier | Description |
---|---|
r001 |
randomly generated graphs with eta = 0.01 |
r005 |
randomly generated graphs with eta = 0.05 |
r01 |
randomly generated graphs with eta = 0.1 |
r02 |
randomly generated graphs with eta = 0.2 |
m2D |
bi-dimensional meshes with rho = 0 |
m2Dr2 |
bi-dimensional meshes with rho = 0.2 |
m2Dr4 |
bi-dimensional meshes with rho = 0.4 |
m2Dr6 |
bi-dimensional meshes with rho = 0.6 |
m3D |
tri-dimensional meshes with rho = 0 |
m3Dr2 |
tri-dimensional meshes with rho = 0.2 |
m3Dr4 |
tri-dimensional meshes with rho = 0.4 |
m3Dr6 |
tri-dimensional meshes with rho = 0.6 |
m4D |
quadri-dimensional meshes with rho = 0 |
m4Dr2 |
quadri-dimensional meshes with rho = 0.2 |
m4Dr4 |
quadri-dimensional meshes with rho = 0.4 |
m4Dr6 |
quadri-dimensional meshes with rho = 0.6 |
b03 |
bounded-valence graphs with valence = 3 |
b03m |
modified bounded-valence graphs with valence = 3 |
b06 |
bounded-valence graphs with valence = 6 |
b06m |
modified bounded-valence graphs with valence = 6 |
b09 |
bounded-valence graphs with valence = 9 |
b09m |
modified bounded-valence graphs with valence = 9 |
The files inside each .zip
archive are divided into subdirectories, using three levels of grouping. The first level correspond to the type of matching, and the directories are named using the prefixes of TABLE 1. The second level groups together similar graph types; TABLE 2 shows the names of the directories, and the graph types they contain. Finally, the third level corresponds to graph types, and the directories are named after TABLE 2. For example, the graphs in the file si4_r01.zip
get expanded into the directory si4/rand/r01/
.
Group | Corresponding types |
---|---|
rand | r001, r005, r01 ,r02 |
m2D | m2D, m2Dr2, m2Dr4, m2Dr6 |
m3D | m3D, m3Dr2, m3Dr4, m3Dr6 |
m4D | m4D, m4Dr2, m4Dr4, m4Dr6 |
bvg | b03, b03m, b06, b06m, b09, b09m |
Type groups
The name of the graph files in each .zip
archive are obtained from the name of the .zip
file by adding a suffix and an extension. The suffix, separated by an underscore (“_&”) from the graph type identifier, indicates the number of nodes in the graphs. The first letter of the suffix is “s
“; for small graphs (i.e. up to 100 nodes) and “m
” for medium-sized graphs (i.e. above 100 nodes but below 2000); after this letter, there is the decimal representation of the number of nodes. The file extension (separated by a period “.” from the rest of the file name) is made of a letter, which can be “A
” or “B
“, and two digits, forming a number from 00
to 99
, that identifies one of the pairs of matching graphs. Within the pair, the graph with the “A
” in the filename extension is the subgraph, for the subgraph-isomorphism matching types. For example, a graph file name could be mcs50_r01_s15.A27, meaning that the graph is the first of the pair #27 of randomly-generated graphs having a maximum common subgraph whose dimension is 50% of size of the graphs, with eta=0.01 and having 15 nodes.
The graph file format
The graphs are stored in a compact binary format, one graph per file. The file is composed of 16 bit words, which are represented using the so-called little-endian convention, i.e. the least significant byte of the word is stored first. Two different formats are used for labeled and unlabeled graphs. The functions to read these two different formats are provided below.
How to read unlabeled graphs
How to read labeled graphs
The ground truth files
For each .zip
file in the graphs
directory there is a corresponding .gtr file containing the ground truth regarding the pairs of graphs in the .zip archive. For example, the ground truth of the file iso_b03.zip is in iso_b03.gtr.
A ground truth file is a text file containing a line for each pair of graphs in the .zip file;
- Unlabeled graphs. The line has two fields, separated by a TAB character (ASCII 9), and is terminated by a LF character (ASCII 10). The first field is the name of the first graph of the pair, while the second field is the number of matchings found.
- Labeled graphs. The line has four fields, separated by a TAB character (ASCII 9), and is terminated by a LF character (ASCII 10). The first field is the name of the first graph of the pair, while the other fields are the number of matchings found. More in details, the second field is the number of maximum common sungraphs when the number of attributes of the graphs is fixed to 33% of graph size. Analogously, the third and the fourth fields are respectively the number of maximum common subgraphs when the number of attributes is fixed to 50% and 75% of the graph size.
For example, a line of the iso_b03.gtr
file could be iso_b03_m1000.A00 1
to signify that between the graphs iso_b03_m1000.A00
and iso_b03_m1000.B00
there is only one isomorphism, while a line of the mcs10_b03.gtr
file could be mcs10_b03_s20.A00 1 2 1
meaning that the graphs mcs10_b03_s20.A00
and mcs10_b03_s20.B00
holds respectively one, two and one maximum common subgraph when the number of attributes of the graphs is fixed to 33%, 50% and 75% of graph size.
All the graph pairs in the iso_*
files are present in the corresponding ground truth files. However, for graph-subgraph isomorphisms, and for maxmum common subgraphs only a subset of the pairs have been accounted for in the ground truth files, due to the considerably larger computation time needed to generate this information.
Graphs Generator
The source files of the programs used to generate the graphs composing the database can be found here. These programs have been developed and used under Linux, and are not completely portable to other operating systems, since they use the /dev/urandom
pseudo-device for initializing the random number generator. Each program generates a pair of graphs; we have used shell scripts to call repeatedly the programs with the proper arguments for generating the whole database.
In the following table there is an outline of the programs and of their command line arguments.
Name | Arguments | Types generated |
---|---|---|
gen |
nodes edges file1 file2 | iso*r001, iso*r005, iso*r01 |
gensub |
subgraph-nodes nodes edges file1 file2 | si*r001, si*r005, si*r01 |
genm2D |
nodes extra-edges file1 file2 | iso*m2D, iso*m2Dr2, iso*m2Dr4, iso*m2Dr6 |
gensubm2D |
subgraph-nodes nodes extra-edges file1 file2 | si*m2D, si*m2Dr2, si*m2Dr4, si*m2Dr6 |
genm3D |
nodes extra-edges file1 file2 | iso*m3D, iso*m3Dr2, iso*m3Dr4, iso*m3Dr6 |
gensubm3D |
subgraph-nodes nodes extra-edges file1 file2 | si*m3D, si*m3Dr2, si*m3Dr4, si*m3Dr6 |
genm4D |
nodes extra-edges file1 file2 | iso*m4D, iso*m4Dr2, iso*m4Dr4, iso*m4Dr6 |
gensubm4D |
subgraph-nodes nodes extra-edges file1 file2 | si*m4D, si*m4Dr2, si*m4Dr4, si*m4Dr6 |
genbvg |
nodes valence file1 file2 | iso*b03, iso*b06, iso*b09 |
gensubbvg |
subgraph-nodes nodes valence file1 file2 | si*b03, si*b06, si*b09 |
genbvgm |
nodes valence file1 file2 | iso*b03m, iso*b06m, iso*b09m |
gensubbvgm |
subgraph-nodes nodes valence file1 file2 | si*b03m, si*b06m, si*b09m |
genrndat |
nodes nodesub edges edgesub file1 file2 | mcs*r005, mcs*r01, mcs*r02 |
genm2Dat |
nodes nodesub edges extra-edgesub file1 file2 | mcs*m2D, mcs*m2Dr2, mcs*m2Dr4, mcs*m2Dr6 |
genm3Dat |
nodes nodesub edges extra-edgesub file1 file2 | mcs*m3D, mcs*m3Dr2, mcs*m3Dr4, mcs*m3Dr6 |
genm4Dat |
nodes nodesub edges extra-edgesub file1 file2 | mcs*m4D, mcs*m4Dr2, mcs*m4Dr4, mcs*m4Dr6 |
genbvgat |
nodes nodesub edges valence file1 file2 | mcs*b03, mcs*b06, mcs*b09 |
genbvgmat |
nodes nodesub edges valence file1 file2 | mcs*b03m, mcs*b06m, mcs*b09m |
For each program there is one source file with extension .cc
. The programs gen
, gensub
, genm2D
and gensubm2D
need to be linked with the VFLIB graph matching library. All other files are self-contained.