The goal of GCN is simple: combine a node’s features with those of its neighbors in a way that is mathematically well-defined, computationally efficient, and stable when stacked across layers. Instead of learning arbitrary attention weights or complex aggregation functions, GCN takes a more structured approach. It uses a normalized adjacency matrix to average information across neighbors, correcting for node degrees and ensuring that aggregation behaves like a diffusion process on the graph. In other words, GCN can be seen as a learnable smoothing operator: each layer mixes information locally, while a linear transformation learns how to re-weight and combine features.