
Graph Convolutional Networks (GCN)




Content

CS224W: Analysis of Networks
Jure Leskovec and Marinka Zitnik, Stanford University
http://cs224w.stanford.edu (December 6, 2018)

Intuition: Map nodes to d-dimensional embeddings such that similar nodes in the graph are embedded close together. The input graph is mapped by a function f to (for example, 2-dimensional) node embeddings; the question is how to learn this mapping function f.

Goal: Map nodes so that similarity in the embedding space (e.g., the dot product) approximates similarity (e.g., proximity) in the network:

  similarity(u, v) ≈ z_v^T z_u

where the similarity function on the left still needs to be defined.

- Encoder: maps a node to a low-dimensional vector, enc(v) = z_v, the d-dimensional embedding of node v in the input graph.
- Similarity function: defines how relationships in the input network map to relationships in the embedding space; the similarity of u and v in the network corresponds to the dot product z_v^T z_u between their node embeddings.

So far we have focused on "shallow" encoders, i.e., embedding lookups: Z is an embedding matrix with one column per node, each column being the embedding vector for a specific node, and the number of rows being the dimension/size of the embeddings.

Shallow encoders:
- One layer of data transformation.
- A single hidden layer maps node u to its embedding z_u via a function f, e.g., z_u = f(z_v, v ∈ N(u)).

Limitations of shallow embedding methods:
- O(|V|) parameters are needed: no parameters are shared between nodes, and every node has its own unique embedding.
- Inherently "transductive": they cannot generate embeddings for nodes that are not seen during training.
- They do not incorporate node features: many graphs have features that we can and should leverage.

Today: deep methods based on graph neural networks, where enc(v) consists of multiple layers of non-linear transformation of the graph structure. Note: all of these deep encoders can be combined with the node similarity functions defined in CS224W Lecture 09.

Output: node embeddings. We can also embed larger network structures: subgraphs and entire graphs.
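The detailed GCN/GraphSAGE formulation sits in a part of the deck not reproduced above, so the following is only a minimal sketch of what one round of "simple neighborhood aggregation" inside a deep encoder can look like. The mean-aggregation rule, the variable names (adj, feats, W, W_self), and the ReLU non-linearity are assumptions made for illustration, not the exact formulation from the slides.

import numpy as np

def gcn_layer(adj, feats, W, W_self):
    """One propagation step: average neighbor features, transform, apply ReLU.

    adj    : (N, N) binary adjacency matrix
    feats  : (N, d_in) node features or previous-layer embeddings
    W      : (d_in, d_out) weight for aggregated neighbor messages
    W_self : (d_in, d_out) weight for the node's own previous embedding
    """
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)  # avoid divide-by-zero
    neigh_mean = (adj @ feats) / deg                      # average over neighbors
    h = neigh_mean @ W + feats @ W_self                   # combine neighbors + self
    return np.maximum(h, 0.0)                             # ReLU non-linearity

# Tiny example: 4 nodes on a path graph, 3 input features, 2-dimensional embeddings.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = rng.normal(size=(4, 3))
z = gcn_layer(adj, feats, rng.normal(size=(3, 2)), rng.normal(size=(3, 2)))
print(z.shape)  # (4, 2): one 2-dimensional embedding per node

Stacking several such layers, each with its own weights, gives the "multiple layers of non-linear transformation of graph structure" mentioned above; every extra layer lets information travel one hop further in the graph.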
Graph Attention Networks (GAT)
[Velickovic et al., ICLR 2018; Vaswani et al., NIPS 2017]

Can we do better than simple neighborhood aggregation? Can we let the weighting factors α_vu be implicitly defined?

Goal: Specify arbitrary importances to different neighbors of each node in the graph.
Idea: Compute the embedding h_v of each node in the graph following an attention strategy:
- Nodes attend over their neighborhoods' messages,
- implicitly specifying different weights to different nodes in a neighborhood.

Let α_vu be computed as a byproduct of an attention mechanism a:
- Let a compute attention coefficients e_vu across pairs of nodes u, v based on their messages:

    e_vu = a(W^(l) h_u^(l-1), W^(l) h_v^(l-1))

- e_vu indicates the importance of node u's message to node v.
- Normalize the coefficients using the softmax function so that they are comparable across different neighborhoods:

    α_vu = exp(e_vu) / Σ_{k ∈ N(v)} exp(e_vk)

    h_v^(l) = σ( Σ_{u ∈ N(v)} α_vu W^(l) h_u^(l-1) )

Next: what is the form of the attention mechanism a?

Attention mechanism a:
- The approach is agnostic to the choice of a; e.g., use a simple single-layer neural network.
- a can have parameters, which need to be estimated.
- The parameters of a are trained jointly: they are learned together with the weight matrices (i.e., the other parameters of the neural net) in an end-to-end fashion.

Multi-head attention stabilizes the learning process of the attention mechanism [Velickovic et al., ICLR 2018]:
- Attention operations in a given layer are independently replicated R times (each replica with different parameters).
- The outputs are aggregated (by concatenating or adding).

Properties of attention-based aggregation:
- Key benefit: allows (implicitly) specifying different importance values (α_vu) for different neighbors.
- Computationally efficient: computation of the attention coefficients can be parallelized across all edges of the graph, and aggregation can be parallelized across all nodes.
- Storage efficient: sparse matrix operations require no more than O(V + E) entries to be stored, and the number of parameters is fixed, irrespective of graph size.
- Trivially localized: only attends over local network neighborhoods.
- Inductive capability: it is a shared edge-wise mechanism and does not depend on the global graph structure.
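A minimal sketch of the attention-based aggregation above. A single-layer attention mechanism a (concatenating the two transformed messages and applying a LeakyReLU, one common choice) is assumed here; the variable names and this specific form of a are illustrative, not the only option.

import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def gat_layer(adj, h_prev, W, a_vec, leaky_slope=0.2):
    """One attention-based aggregation step.

    adj    : (N, N) adjacency matrix (adj[v, u] = 1 if u is a neighbor of v)
    h_prev : (N, d_in) previous-layer messages h_u^(l-1)
    W      : (d_in, d_out) shared weight matrix W^(l)
    a_vec  : (2 * d_out,) parameters of the attention mechanism a
    """
    msg = h_prev @ W                                   # W^(l) h_u^(l-1) for every node
    h_new = np.zeros_like(msg)
    for v in range(adj.shape[0]):
        nbrs = np.flatnonzero(adj[v])
        if len(nbrs) == 0:
            continue                                   # isolated node: keep zeros
        # e_vu = a(W h_v, W h_u): score each neighbor's message against v's
        pairs = np.concatenate([np.tile(msg[v], (len(nbrs), 1)), msg[nbrs]], axis=1)
        e = pairs @ a_vec
        e = np.where(e > 0, e, leaky_slope * e)        # LeakyReLU
        alpha = softmax(e)                             # normalize over N(v)
        h_new[v] = np.maximum(alpha @ msg[nbrs], 0.0)  # sigma = ReLU here
    return h_new

# Tiny example: 3 nodes in a triangle, 4 input features, 2 output dimensions.
rng = np.random.default_rng(0)
adj = np.ones((3, 3)) - np.eye(3)
h = gat_layer(adj, rng.normal(size=(3, 4)), rng.normal(size=(4, 2)), rng.normal(size=(4,)))

Multi-head attention would simply run this with R independent (W, a_vec) pairs and concatenate or average the resulting embeddings.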
The attention mechanism can be used with many different graph neural network models, and in many cases attention leads to performance gains.

Example result: a t-SNE plot of GAT-based node embeddings, where node color indicates the publication class and edge thickness shows the normalized attention coefficients between nodes i and j across the eight attention heads, Σ_k (α_ij^k + α_ji^k).

Outline recap: basics of deep learning for graphs; Graph Convolutional Networks (GCN); Graph Attention Networks (GAT); practical tips and demos.

Practical tips and demos

- Data preprocessing is important: use renormalization tricks, variance-scaled initialization, and network data whitening.
- ADAM optimizer: ADAM naturally takes care of decaying the learning rate.
- ReLU as the activation function often works really well.
- No activation function at your output layer: an easy mistake to make if you build layers with a shared function.
- Include a bias term in every layer.
- A GCN layer of size 64 or 128 is already plenty.

Debugging (e.g., when loss/accuracy is not converging during training); important for model development:
- Overfit on the training data: accuracy should be essentially 100%, or the error close to 0. If the neural network cannot overfit a single data point, something is wrong (a minimal sanity-check sketch is given at the end of this document).
- Scrutinize your loss function!
- Scrutinize your visualizations!

Project write-ups:
- Sun Dec, midnight (11:59pm) Pacific Time: a team member uploads the PDF to Gradescope.
- See the course website for more info.

Poster session:
- Tue Dec 11 at 3:30pm Pacific Time.
- All groups with at least one non-SCPD member must present.
- There should be a person at the poster at all times.
- Prepare a 2-minute elevator pitch of your poster.
- More instructions to follow.

Follow-up courses:
- CS246: Mining Massive Datasets (Winter 2019): data mining and machine learning for big data (big == doesn't fit in memory or on a single machine), Spark.
- CS341: Project in Data Mining (Spring 2019): groups do a research project on big data; we provide interesting data, projects, and access to the Amazon computing infrastructure. A nice way to finish up the CS224W project and publish it!

You have done a lot, and (hopefully) learned a lot: answered questions and proved many interesting results, implemented a number of methods, and are doing excellently on the class project. Thank you for the hard work!
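As a companion to the debugging tip above, here is a minimal sanity-check sketch: overfit a tiny, fixed batch with the ADAM optimizer and confirm the loss drives toward zero. The two-layer model and the synthetic data are placeholders (assumed for illustration, not the course demo code), but the sketch reflects the tips above: ReLU activations, biases in every layer, and no activation at the output.

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 16)                   # 8 examples, 16 input features
y = torch.randint(0, 3, (8,))            # 3 classes

model = nn.Sequential(
    nn.Linear(16, 64, bias=True),        # bias in every layer
    nn.ReLU(),                           # ReLU often works well
    nn.Linear(64, 3, bias=True),         # no activation at the output layer
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()

print(loss.item())  # should be close to 0; if not, something is wrong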
