Image credit: Made with Midjourney
5. Spectral graph theory
In this chapter, we look at network data. We begin by defining basic graph concepts. We then introduce ideas from spectral graph theory, in particular the eigenvalues of the Laplacian and their variational characterization. We discuss and implement some applications, including community detection. Here is a more detailed overview of the main sections of the chapter.
“Background: basic concepts in graph theory” This section provides an introduction to the basic concepts of graph theory, covering both undirected and directed graphs. It introduces key definitions such as vertices, edges, paths, connectivity, and various special types of graphs like cliques, trees, and directed acyclic graphs. The section also discusses different matrix representations of graphs, with particular emphasis on the Laplacian matrix and its role in understanding graph connectivity. Finally, it illustrates the practical use of these concepts with the NetworkX Python package for graph manipulation and visualization.
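As a small sketch of the kind of computation this section builds toward (the graph below is made up for illustration), one can construct an undirected graph in NetworkX and recover its Laplacian matrix L = D - A from the degree and adjacency matrices:

```python
import networkx as nx
import numpy as np

# A small undirected graph (illustrative example): a triangle plus a pendant edge.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (0, 2), (2, 3)])

A = nx.adjacency_matrix(G).toarray()      # adjacency matrix
D = np.diag([d for _, d in G.degree()])   # diagonal degree matrix
L = D - A                                 # Laplacian matrix L = D - A

# Each row of the Laplacian sums to zero.
print(L.sum(axis=1))  # [0 0 0 0]
```

The hand-built L agrees with `nx.laplacian_matrix(G)`, which the chapter uses directly.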
“Variational characterization of eigenvalues” This section develops a variational characterization of eigenvalues, that is, a way of expressing them as solutions to optimization problems. It begins by presenting a proof of the spectral theorem, showing how orthogonal transformations can be employed to diagonalize a symmetric matrix. Building on this foundation, the section demonstrates how the largest and smallest eigenvalues can be determined through optimization of the Rayleigh quotient. Finally, it presents the Courant-Fischer theorem, which extends this characterization to all eigenvalues.
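The extreme cases of this characterization can be checked numerically; in the sketch below (using a randomly generated symmetric matrix, chosen only for illustration), the Rayleigh quotient evaluated at the extreme eigenvectors recovers the extreme eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2  # a random symmetric matrix

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(A)

def rayleigh_quotient(A, x):
    return (x @ A @ x) / (x @ x)

# The minimum of the Rayleigh quotient over nonzero vectors is the smallest
# eigenvalue, and the maximum is the largest, attained at the eigenvectors.
print(np.isclose(rayleigh_quotient(A, eigvecs[:, 0]), eigvals[0]))   # True
print(np.isclose(rayleigh_quotient(A, eigvecs[:, -1]), eigvals[-1]))  # True
```

For any other vector, the Rayleigh quotient lands strictly between these two extremes.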
“Spectral properties of the Laplacian matrix” This section discusses the spectral properties of the Laplacian matrix of a graph. It begins by observing that the Laplacian matrix is symmetric and positive semidefinite, and that the constant unit vector is always an eigenvector with eigenvalue 0. The section then explores the connection between the Laplacian matrix and graph connectivity, proving that a graph is connected if and only if the second smallest eigenvalue (i.e., the algebraic connectivity) is strictly positive. Finally, the section presents a variational characterization of the second smallest eigenvalue as the minimum of the sum of squared differences of values assigned to adjacent vertices, subject to a centering and a normalization constraint. This characterization explains why eigenvectors of the Laplacian matrix are useful for graph drawing and for revealing the underlying geometry of the graph.
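The connectivity criterion can be sketched numerically on two made-up examples: the second smallest Laplacian eigenvalue is strictly positive for a connected graph and (up to numerical precision) zero for a disconnected one.

```python
import networkx as nx
import numpy as np

def algebraic_connectivity(G):
    """Second smallest eigenvalue of the graph Laplacian."""
    L = nx.laplacian_matrix(G).toarray().astype(float)
    return np.linalg.eigvalsh(L)[1]  # eigvalsh returns ascending eigenvalues

connected = nx.path_graph(5)               # a path on 5 vertices
disconnected = nx.Graph([(0, 1), (2, 3)])  # two separate edges

print(algebraic_connectivity(connected) > 1e-8)     # True
print(algebraic_connectivity(disconnected) > 1e-8)  # False
```

For the disconnected graph, the indicator vectors of the two components are both eigenvectors with eigenvalue 0, so 0 has multiplicity two.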
“Application: graph partitioning via spectral clustering” This section explores graph partitioning, the division of a graph into two sets of nodes with minimal connections between them. It introduces the isoperimetric number (or Cheeger constant), a measure of a cut’s quality that balances the minimization of edges between sets against the need for balanced set sizes. Cheeger’s inequalities are then presented, linking the isoperimetric number to the second smallest Laplacian eigenvalue and demonstrating the relationship between spectral graph theory and graph partitioning. The section outlines a graph-cutting algorithm leveraging Laplacian eigenvectors, offering a heuristic approach to finding good cuts with provable guarantees, and discusses the computational challenges of finding optimal cuts. It concludes by revisiting community detection, applying spectral properties to identify sub-communities within a graph.
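A minimal sketch of the spectral bipartitioning heuristic, on a made-up example with an obvious best cut (two 5-cliques joined by a single edge): splitting the vertices by the sign of the Fiedler vector, the eigenvector of the second smallest Laplacian eigenvalue, recovers the two cliques.

```python
import networkx as nx
import numpy as np

# Two 5-cliques joined by a single edge (the cut we hope to recover).
G = nx.barbell_graph(5, 0)

L = nx.laplacian_matrix(G).toarray().astype(float)
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]  # eigenvector of the second smallest eigenvalue

# Partition the vertices by the sign of their Fiedler-vector entry.
side_a = sorted(v for v in G.nodes() if fiedler[v] < 0)
side_b = sorted(v for v in G.nodes() if fiedler[v] >= 0)
print(side_a, side_b)  # the two cliques: {0,...,4} and {5,...,9}
```

This is only the heuristic in its simplest form; the section's algorithm considers sweep cuts along the sorted Fiedler vector, which is what the provable guarantees apply to.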
“Erdős-Rényi random graph and stochastic blockmodel” This section discusses random graph models, focusing on the inhomogeneous Erdős-Rényi (ER) random graph and the stochastic blockmodel (SBM). The inhomogeneous ER random graph is a versatile model for generating random graphs, where each edge is included independently with a specified probability. A special case is the ER random graph, where all edges between distinct vertices have the same probability of being present. The SBM extends this concept by partitioning vertices into blocks and assigning different probabilities for connections within and between blocks, capturing the tendency of nodes within the same group to connect more frequently.
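Both models can be sampled in NetworkX; the sketch below draws a small ER graph and a two-block SBM, with parameters chosen purely for illustration:

```python
import networkx as nx

# Erdős-Rényi random graph: 20 vertices, each edge present
# independently with probability 0.2.
er = nx.gnp_random_graph(20, 0.2, seed=0)

# Stochastic blockmodel: two blocks of 10 vertices, with a higher
# within-block edge probability (0.5) than between-block (0.05).
sizes = [10, 10]
probs = [[0.5, 0.05],
         [0.05, 0.5]]
sbm = nx.stochastic_block_model(sizes, probs, seed=0)

print(er.number_of_nodes(), sbm.number_of_nodes())  # 20 20
```

With these parameters, the SBM tends to produce two visible communities, which connects this section back to the community detection application above.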