- Topic: IGB: Addressing the Gaps
in Labeling, Features, Heterogeneity, and Size of Public Graph Datasets
for Deep Learning Research
- Speakers: Vikram Sharma Mailthody
(Sr. Research Scientist, NVIDIA Research, previously UIUC) and Arpandeep
Khatua (Software Engineer, Meta, previously UIUC)
- Description: Graph neural networks
(GNNs) have shown high potential for a variety of real-world, challenging
applications, but one of the major obstacles in GNN research is the lack
of large-scale flexible datasets. Most existing public datasets for GNNs
are relatively small, which limits the ability of GNNs to generalize to
unseen data. The few existing large-scale graph datasets provide very
limited labeled data. This makes it difficult to determine if the GNN
model's low accuracy for unseen data is inherently due to insufficient
training data or if the model failed to generalize. Additionally, datasets
used to train GNNs need to offer flexibility to enable a thorough study of
the impact of various factors while training GNN models. In this work, we
introduce the Illinois Graph Benchmark (IGB), a research dataset tool that
developers can use to train, scrutinize, and systematically evaluate
GNN models with high fidelity. IGB includes both homogeneous and
heterogeneous academic graphs of enormous sizes, with more than 40% of
their nodes labeled. Compared to the largest graph datasets publicly
available, IGB provides over 162X more labeled data for deep learning
practitioners and developers to create and evaluate models with higher
accuracy. The IGB dataset is a collection of academic graphs designed to
be flexible, enabling the study of various GNN architectures, embedding
generation techniques, and system performance issues for node
classification tasks. IGB is open-sourced, supports the DGL and PyG
frameworks, and includes releases of the raw text that we believe will
foster emerging language-model and GNN research projects. An early public
version of IGB is available at this https URL.