
Graph Convolutional Networks (GCN)




Content

CS224W: Analysis of Networks
Jure Leskovec and Marinka Zitnik, Stanford University
http://cs224w.stanford.edu (December 6, 2018)

Intuition: Map nodes to d-dimensional embeddings such that similar nodes in the graph are embedded close together. The input graph is mapped by a function f to (for example, 2-dimensional) node embeddings; the question is how to learn this mapping function f.

Goal: Map nodes so that similarity in the embedding space (e.g., the dot product) approximates similarity (e.g., proximity) in the network:

  similarity(u, v) ≈ z_v^T z_u

where the similarity function on the left still needs to be defined.

- Encoder: maps a node to a low-dimensional vector, enc(v) = z_v, the d-dimensional embedding of node v in the input graph.
- Similarity function: defines how relationships in the input network map to relationships in the embedding space; the similarity of u and v in the network corresponds to the dot product z_v^T z_u between their node embeddings.

So far we have focused on "shallow" encoders, i.e., embedding lookups: Z is an embedding matrix with one column per node, each column being the embedding vector for a specific node, and the number of rows being the dimension/size of the embeddings.

Shallow encoders:
- One layer of data transformation.
- A single hidden layer maps node u to its embedding z_u via a function f, e.g., z_u = f(z_v, v ∈ N(u)).

Limitations of shallow embedding methods:
- O(|V|) parameters are needed: no parameters are shared between nodes, and every node has its own unique embedding.
- Inherently "transductive": they cannot generate embeddings for nodes that are not seen during training.
- They do not incorporate node features: many graphs have features that we can and should leverage.

Today: deep methods based on graph neural networks, where enc(v) consists of multiple layers of non-linear transformation of the graph structure. Note: all of these deep encoders can be combined with the node similarity functions defined in CS224W Lecture 09.

Output: node embeddings. We can also embed larger network structures: subgraphs and entire graphs.
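The detailed GCN/GraphSAGE formulation sits in a part of the deck not reproduced above, so the following is only a minimal sketch of what one round of "simple neighborhood aggregation" inside a deep encoder can look like. The mean-aggregation rule, the variable names (adj, feats, W, W_self), and the ReLU non-linearity are assumptions made for illustration, not the exact formulation from the slides.

import numpy as np

def gcn_layer(adj, feats, W, W_self):
    """One propagation step: average neighbor features, transform, apply ReLU.

    adj    : (N, N) binary adjacency matrix
    feats  : (N, d_in) node features or previous-layer embeddings
    W      : (d_in, d_out) weight for aggregated neighbor messages
    W_self : (d_in, d_out) weight for the node's own previous embedding
    """
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)  # avoid divide-by-zero
    neigh_mean = (adj @ feats) / deg                      # average over neighbors
    h = neigh_mean @ W + feats @ W_self                   # combine neighbors + self
    return np.maximum(h, 0.0)                             # ReLU non-linearity

# Tiny example: 4 nodes on a path graph, 3 input features, 2-dimensional embeddings.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = rng.normal(size=(4, 3))
z = gcn_layer(adj, feats, rng.normal(size=(3, 2)), rng.normal(size=(3, 2)))
print(z.shape)  # (4, 2): one 2-dimensional embedding per node

Stacking several such layers, each with its own weights, gives the "multiple layers of non-linear transformation of graph structure" mentioned above; every extra layer lets information travel one hop further in the graph.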
Graph Attention Networks (GAT)
[Velickovic et al., ICLR 2018; Vaswani et al., NIPS 2017]

Can we do better than simple neighborhood aggregation? Can we let the weighting factors α_vu be implicitly defined?

Goal: Specify arbitrary importances to different neighbors of each node in the graph.
Idea: Compute the embedding h_v of each node in the graph following an attention strategy:
- Nodes attend over their neighborhoods' messages,
- implicitly specifying different weights to different nodes in a neighborhood.

Let α_vu be computed as a byproduct of an attention mechanism a:
- Let a compute attention coefficients e_vu across pairs of nodes u, v based on their messages:

    e_vu = a(W^(l) h_u^(l-1), W^(l) h_v^(l-1))

- e_vu indicates the importance of node u's message to node v.
- Normalize the coefficients using the softmax function so that they are comparable across different neighborhoods:

    α_vu = exp(e_vu) / Σ_{k ∈ N(v)} exp(e_vk)

    h_v^(l) = σ( Σ_{u ∈ N(v)} α_vu W^(l) h_u^(l-1) )

Next: what is the form of the attention mechanism a?

Attention mechanism a:
- The approach is agnostic to the choice of a; e.g., use a simple single-layer neural network.
- a can have parameters, which need to be estimated.
- The parameters of a are trained jointly: they are learned together with the weight matrices (i.e., the other parameters of the neural net) in an end-to-end fashion.

Multi-head attention stabilizes the learning process of the attention mechanism [Velickovic et al., ICLR 2018]:
- Attention operations in a given layer are independently replicated R times (each replica with different parameters).
- The outputs are aggregated (by concatenating or adding).

Properties of attention-based aggregation:
- Key benefit: allows (implicitly) specifying different importance values (α_vu) for different neighbors.
- Computationally efficient: computation of the attention coefficients can be parallelized across all edges of the graph, and aggregation can be parallelized across all nodes.
- Storage efficient: sparse matrix operations require no more than O(V + E) entries to be stored, and the number of parameters is fixed, irrespective of graph size.
- Trivially localized: only attends over local network neighborhoods.
- Inductive capability: it is a shared edge-wise mechanism and does not depend on the global graph structure.
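A minimal sketch of the attention-based aggregation above. A single-layer attention mechanism a (concatenating the two transformed messages and applying a LeakyReLU, one common choice) is assumed here; the variable names and this specific form of a are illustrative, not the only option.

import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def gat_layer(adj, h_prev, W, a_vec, leaky_slope=0.2):
    """One attention-based aggregation step.

    adj    : (N, N) adjacency matrix (adj[v, u] = 1 if u is a neighbor of v)
    h_prev : (N, d_in) previous-layer messages h_u^(l-1)
    W      : (d_in, d_out) shared weight matrix W^(l)
    a_vec  : (2 * d_out,) parameters of the attention mechanism a
    """
    msg = h_prev @ W                                   # W^(l) h_u^(l-1) for every node
    h_new = np.zeros_like(msg)
    for v in range(adj.shape[0]):
        nbrs = np.flatnonzero(adj[v])
        if len(nbrs) == 0:
            continue                                   # isolated node: keep zeros
        # e_vu = a(W h_v, W h_u): score each neighbor's message against v's
        pairs = np.concatenate([np.tile(msg[v], (len(nbrs), 1)), msg[nbrs]], axis=1)
        e = pairs @ a_vec
        e = np.where(e > 0, e, leaky_slope * e)        # LeakyReLU
        alpha = softmax(e)                             # normalize over N(v)
        h_new[v] = np.maximum(alpha @ msg[nbrs], 0.0)  # sigma = ReLU here
    return h_new

# Tiny example: 3 nodes in a triangle, 4 input features, 2 output dimensions.
rng = np.random.default_rng(0)
adj = np.ones((3, 3)) - np.eye(3)
h = gat_layer(adj, rng.normal(size=(3, 4)), rng.normal(size=(4, 2)), rng.normal(size=(4,)))

Multi-head attention would simply run this with R independent (W, a_vec) pairs and concatenate or average the resulting embeddings.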
The attention mechanism can be used with many different graph neural network models, and in many cases attention leads to performance gains.

Example result: a t-SNE plot of GAT-based node embeddings, where node color indicates the publication class and edge thickness shows the normalized attention coefficients between nodes i and j across the eight attention heads, Σ_k (α_ij^k + α_ji^k).

Outline recap: basics of deep learning for graphs; Graph Convolutional Networks (GCN); Graph Attention Networks (GAT); practical tips and demos.

Practical tips and demos

- Data preprocessing is important: use renormalization tricks, variance-scaled initialization, and network data whitening.
- ADAM optimizer: ADAM naturally takes care of decaying the learning rate.
- ReLU as the activation function often works really well.
- No activation function at your output layer: an easy mistake to make if you build layers with a shared function.
- Include a bias term in every layer.
- A GCN layer of size 64 or 128 is already plenty.

Debugging (e.g., when loss/accuracy is not converging during training); important for model development:
- Overfit on the training data: accuracy should be essentially 100%, or the error close to 0. If the neural network cannot overfit a single data point, something is wrong (a minimal sanity-check sketch is given at the end of this document).
- Scrutinize your loss function!
- Scrutinize your visualizations!

Project write-ups:
- Sun Dec, midnight (11:59pm) Pacific Time: a team member uploads the PDF to Gradescope.
- See the course website for more info.

Poster session:
- Tue Dec 11 at 3:30pm Pacific Time.
- All groups with at least one non-SCPD member must present.
- There should be a person at the poster at all times.
- Prepare a 2-minute elevator pitch of your poster.
- More instructions to follow.

Follow-up courses:
- CS246: Mining Massive Datasets (Winter 2019): data mining and machine learning for big data (big == doesn't fit in memory or on a single machine), Spark.
- CS341: Project in Data Mining (Spring 2019): groups do a research project on big data; we provide interesting data, projects, and access to the Amazon computing infrastructure. A nice way to finish up the CS224W project and publish it!

You have done a lot, and (hopefully) learned a lot: answered questions and proved many interesting results, implemented a number of methods, and are doing excellently on the class project. Thank you for the hard work!
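As a companion to the debugging tip above, here is a minimal sanity-check sketch: overfit a tiny, fixed batch with the ADAM optimizer and confirm the loss drives toward zero. The two-layer model and the synthetic data are placeholders (assumed for illustration, not the course demo code), but the sketch reflects the tips above: ReLU activations, biases in every layer, and no activation at the output.

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 16)                   # 8 examples, 16 input features
y = torch.randint(0, 3, (8,))            # 3 classes

model = nn.Sequential(
    nn.Linear(16, 64, bias=True),        # bias in every layer
    nn.ReLU(),                           # ReLU often works well
    nn.Linear(64, 3, bias=True),         # no activation at the output layer
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()

print(loss.item())  # should be close to 0; if not, something is wrong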
