Introduction to Graph Convolutional Networks (GCN)

Dilini Karunarathna
5 min read · Nov 11, 2020

Many real-world datasets, such as social networks, knowledge graphs, and the World Wide Web, come in the form of graphs or networks. By reading this article, you will get an overall idea of how a GCN works.

First, let's look at what a graph and a convolutional neural network are, to get oriented to the subject.

Graph

“A graph consists of a finite set of vertices (or nodes) and a set of edges that connect pairs of nodes.”

Graphs are commonly used to model different kinds of real-life problems: they can represent social networks, molecular structures, semantic structures, and geographical or physical systems.

Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a neural network architecture that breaks an input down into smaller pieces and performs feature extraction. It derives the important parts of the input that can be used to make a decision, typically a classification decision.

Graph Convolutional Networks (GCN)

A graph convolutional network (GCN) is also a kind of convolutional neural network, one that can work directly with graphs and their structural information.

Similar to how a CNN extracts the most important information from an image to classify it, a GCN passes a filter over a graph, searching for the important vertices and edges that can be used to classify nodes within the graph.

A GCN model needs to learn a function of the features on a graph. Consider a graph G = (V, E) with N nodes, where V and E denote the sets of vertices (nodes) and edges, respectively.

We need the adjacency matrix A and the feature matrix X for the next steps. A records which pairs of nodes are connected, and each row of X holds the feature vector of one node.

We can obtain the sum of the features of each node's neighbors by multiplying A and X,

f(X, A) = AX

Now you can observe that each row of the resulting matrix is the sum of the feature vectors of that node's neighbors. Consider the first row of the adjacency matrix A: node 1 is connected only to node 3, and the 3rd row of the feature matrix X holds the features of node 3. The first row of the result is therefore exactly the feature vector of node 3, node 1's only neighbor. The same behavior holds for the other nodes, so for each node we obtain the sum of all its neighbors' feature vectors.
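To make this concrete, here is a minimal NumPy sketch. The 4-node graph and the feature values are assumptions chosen for illustration (consistent with node 1 having node 3 as its only neighbor), not a real dataset.

```python
import numpy as np

# Hypothetical undirected 4-node graph with edges (1,3), (2,3), (2,4), (3,4).
# Rows and columns correspond to nodes 1..4.
A = np.array([[0, 0, 1, 0],
              [0, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# Assumed 2-dimensional feature vector for each node.
X = np.array([[0,  0],
              [1, -1],
              [2, -2],
              [3, -3]], dtype=float)

# Each row of AX is the sum of the feature vectors of that node's neighbors.
print(A @ X)
# [[ 2. -2.]   <- row 1 equals X[2], the features of node 3
#  [ 5. -5.]
#  [ 4. -4.]
#  [ 3. -3.]]
```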

Did we ignore something?

Yes, there are two things that should be improved in the above calculation.

Problem 1
When we multiply A and X, we ignore the features of the node itself. For example, the first row of the result should also contain the features of node 1.

Solution — add a self-loop to each node
We can do this by adding the identity matrix I to the adjacency matrix A before multiplying by X.

Ã = A + I
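Continuing the sketch above (with A, X, and np as defined there), adding the self-loops is a one-line change:

```python
# Add self-loops so each node's own features are included in the aggregation.
A_tilde = A + np.eye(4)

print(A_tilde @ X)
# [[ 2. -2.]   <- row 1 now also includes node 1's own (zero) features
#  [ 6. -6.]
#  [ 6. -6.]
#  [ 6. -6.]]
```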

Problem 2
A is typically not normalized, so multiplying by A completely changes the scale of the feature vectors: nodes with large degrees get large values in their feature representations, while nodes with small degrees get small values.

Solution — normalize the feature representations
For this step we need another matrix, the degree matrix D. When building D, we count the self-loops too.

The feature representations can be normalized by node degree by multiplying the self-looped adjacency matrix Ã with the inverse degree matrix D⁻¹. Our new propagation rule is therefore,

f(X, A) = D⁻¹ÃX
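Here is the same idea in the running example; note that D is built from Ã, so the self-loops are counted in each node's degree:

```python
# Degree matrix of A_tilde: D[i, i] is node i's degree, self-loop included.
D = np.diag(A_tilde.sum(axis=1))
D_inv = np.linalg.inv(D)

# Multiplying by D^-1 turns the neighborhood sum into a neighborhood mean.
print(D_inv @ A_tilde @ X)
# [[ 1.  -1. ]
#  [ 2.  -2. ]
#  [ 1.5 -1.5]
#  [ 2.  -2. ]]
```

A common alternative, used in the original GCN paper by Kipf and Welling, is a symmetric normalization (multiplying by the inverse square root of D on both sides of Ã), but the simple D⁻¹Ã average is enough to see the effect.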

Now we can apply weights. Let's take a simple weight matrix W to keep the calculations easy.

Finally, we can apply an activation function σ, such as ReLU.

f(X, A) = σ(D⁻¹ÃXW)
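Putting it all together, a single forward pass through one GCN layer might look like the sketch below. The weight matrix W here is an arbitrary assumption for illustration; in a real GCN, W is learned during training.

```python
# Arbitrary 2x2 weight matrix, for illustration only (normally learned).
W = np.array([[1.0, -1.0],
              [0.0,  2.0]])

def relu(x):
    return np.maximum(0, x)

# Full propagation rule: f(X, A) = sigma(D^-1 @ A_tilde @ X @ W)
H = relu(D_inv @ A_tilde @ X @ W)
print(H)
# [[1.  0. ]
#  [2.  0. ]
#  [1.5 0. ]
#  [2.  0. ]]
```

Stacking several such layers, each with its own W, lets information propagate in from progressively more distant neighbors.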

Applications of GCNs

  • Generating Predictions in Physical Systems
  • Image Classification
  • Community Prediction Using Semi-Supervised Learning
  • Operations Research and Combinatorial Optimization
  • Molecular Structure Analysis

Now let’s have a look at several variations of GCN.

  • Attention mechanisms — attention-based Graph Convolutional Networks maintain state information to capture the neighborhood properties of the nodes. This makes it possible to remember important nodes and give them higher weights throughout the learning process.
  • Graph Generative Networks — can generate new, realistic structures from data, similar to a Generative Adversarial Network. An external evaluation and reward system helps the network generate progressively more realistic graphs.
  • Graph Spatial-Temporal Networks — can support graphs with a fixed structure but inputs that change over time. For example, a traffic system with a fixed set of roads but variable traffic arriving over time.

Conclusion

In this article, I provided a high-level introduction to graph convolutional networks and illustrated how the feature representation of a node at each layer in the GCN is based on an aggregate of its neighborhood.
