Abstract
- Kool's "Attention, Learn to Solve Routing Problems!" is limited to the single-agent setting.
- Most existing methods target the MinSum objective, whereas this paper focuses on MinMax.
- MinMax means minimizing the length of the longest subroute.
- mTSP is hard because a distributed scheduling policy has to produce cooperative, strategic routing.
- The delayed and sparse single reward also makes training difficult.
- ScheduleNet applies to an arbitrary number of workers and tasks -> good scalability.
- Trained with Clipped REINFORCE.
Introduction
- Unlike MinSum, the MinMax objective encourages a balanced distribution of tasks across agents.
The main contributions of this paper are as follows.
- A decentralized policy that applies to an arbitrary number of agents and tasks.
- Attention-based node embeddings
- Training with a single delayed shared reward signal
Problem Formulation
State Definition
- Salesmen (workers): V_T = {1, 2, …, m}
- Cities (tasks): V_C = {m+1, m+2, …, m+N}
- p_i: 2D coordinates of the entities (salesmen, cities, and the depot)
  - time-dependent for workers
  - static for tasks and the depot
- τ: event at which any worker reaches its assigned city
- t(τ): time of event τ
- State of entity i: s_i^τ = (p_i^τ, 1_active^τ, 1_assigned^τ)
- 1_active^τ:
  - an already-visited task is inactive
  - a worker at the depot is inactive
- 1_assigned^τ:
  - whether the worker is assigned to a task or not
- s_env^τ: the current time and the sequence of tasks visited by each worker
- s_τ, the state at the τ-th event: ({s_i^τ}_{i=1}^{m+N}, s_env^τ)
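A minimal sketch of how this state could be represented in code; the class and field names are illustrative, not from the paper.

```python
from dataclasses import dataclass

@dataclass
class EntityState:
    """Per-entity state s_i^τ = (p_i^τ, 1_active^τ, 1_assigned^τ)."""
    pos: tuple[float, float]   # p_i^τ: 2D coordinates
    active: bool               # 1_active^τ
    assigned: bool             # 1_assigned^τ (meaningful for workers)

@dataclass
class EnvState:
    """s_env^τ: global information at the τ-th event."""
    current_time: float
    visit_history: dict[int, list[int]]  # worker id -> sequence of visited task ids

@dataclass
class MTSPState:
    """s_τ = ({s_i^τ}_{i=1}^{m+N}, s_env^τ)."""
    entities: list[EntityState]  # workers (1..m) first, then tasks (m+1..m+N)
    env: EnvState
```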
Action
- worker-to-task assignment
Reward
- For nonterminal events, r(s_τ) = 0
- For the terminal event T, r(s_T) = t(T), i.e., the makespan
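The reward is therefore sparse and delayed: zero at every intermediate event and the makespan only at the end. A small sketch reusing the MTSPState class above (the function name and sign convention are assumptions):

```python
def reward(state: MTSPState, is_terminal: bool) -> float:
    """Sparse, delayed reward: 0 for nonterminal events,
    t(T) (the makespan) at the terminal event."""
    if not is_terminal:
        return 0.0
    return state.env.current_time  # t(T): completion time of the whole mTSP episode
```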
ScheduleNet
- Whenever an event occurs, construct a graph as shown above.
- The edge feature is the Euclidean distance between the two nodes.
- h_i: node embedding (maybe the node coordinates?)
- h_ij: edge embedding (a high-dimensional encoding of the Euclidean distance)
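A rough sketch of the per-event graph construction, assuming a fully connected graph over all entities with the Euclidean distance as the raw edge feature; the helper name and feature layout are illustrative only.

```python
import math

def build_graph(entities):
    """Build raw node/edge features for one event.

    entities: list of EntityState (workers, tasks, depot) as sketched above.
    Returns raw node features and Euclidean-distance edge features, which the
    raw-2-hid layers would map to the initial embeddings h_i^(0) and h_ij^(0).
    """
    node_feats = [
        (e.pos[0], e.pos[1], float(e.active), float(e.assigned))
        for e in entities
    ]
    edge_feats = {}
    for i, e_i in enumerate(entities):
        for j, e_j in enumerate(entities):
            if i != j:
                edge_feats[(i, j)] = math.dist(e_i.pos, e_j.pos)  # Euclidean distance
    return node_feats, edge_feats
```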
A single iteration of TGA embedding consists of three phases: (1) edge update, (2) message aggregation, (3) node update.
- h'_ij: edge embedding after the TGA update (the message from source node j to destination node i)
- z_ij: how valuable node i is to node j (just like the compatibility score in self-attention)
- k_j: type of entity j (e.g., an active worker if 1_active^τ = 1)
How does Type-aware Graph Attention work?
“Type-aware” edge update
i. Context embedding c_ij of edge e_ij
- The context embedding depends on the type of the source node j.
ii. Type-aware edge encoding
- The MI layer dynamically generates its parameters depending on the context c_ij.
- The result is a dynamic edge feature that varies with the source node type.
iii. Type-aware edge encoding and attention logit
- The upper equation is for TGA_E (producing the edge encoding h'_ij).
- The lower equation is for TGA_A (producing the attention logit z_ij).
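A hedged sketch of what such a type-aware, context-conditioned edge update could look like in PyTorch. The hypernetwork-style "MI" layer, the layer sizes, and the way the context is assembled are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class TypeAwareEdgeUpdate(nn.Module):
    """Sketch of a context-conditioned edge update that returns an updated
    edge encoding h'_ij (TGA_E-like) and an attention logit z_ij (TGA_A-like)."""

    def __init__(self, dim: int, num_types: int):
        super().__init__()
        self.type_emb = nn.Embedding(num_types, dim)
        self.context = nn.Sequential(nn.Linear(4 * dim, dim), nn.ReLU())
        # "MI"-style layer: its weights are generated dynamically from the context c_ij.
        self.weight_gen = nn.Linear(dim, dim * dim)
        self.bias_gen = nn.Linear(dim, dim)
        self.attn_head = nn.Linear(dim, 1)

    def forward(self, h_i, h_j, h_ij, k_j):
        # h_i, h_j, h_ij: (E, dim); k_j: (E,) integer types of the source nodes.
        c_ij = self.context(torch.cat([h_i, h_j, h_ij, self.type_emb(k_j)], dim=-1))
        E, d = h_ij.shape
        W = self.weight_gen(c_ij).view(E, d, d)     # dynamic per-edge weights
        b = self.bias_gen(c_ij)                     # dynamic per-edge bias
        h_prime = torch.relu(torch.bmm(W, h_ij.unsqueeze(-1)).squeeze(-1) + b)
        z_ij = self.attn_head(h_prime).squeeze(-1)  # attention logit
        return h_prime, z_ij
```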
Message Aggregation
i. Compute attention scores separately for each node type k.
  - For node v_i, compute attention scores over its type-k neighbors.
  - Since the current setting has three kinds of nodes (agent, task, depot), presumably two kinds of attention scores will be produced.
ii. Multiply the per-type attention scores with the node embeddings computed earlier and concatenate the results across node types.
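A sketch of per-type attention followed by concatenation, for a single destination node; the signature and masking scheme are assumptions.

```python
import torch

def type_wise_aggregate(z, h_src, src_type, num_types):
    """Sketch of type-aware message aggregation for one destination node v_i.

    z:        (E_i,) attention logits z_ij of the incoming edges of v_i
    h_src:    (E_i, d) embeddings/messages from the source nodes
    src_type: (E_i,) integer type of each source node
    Returns the concatenation of one attention-weighted sum per source type.
    """
    d = h_src.shape[-1]
    per_type = []
    for k in range(num_types):
        mask = src_type == k
        if mask.any():
            alpha = torch.softmax(z[mask], dim=0)                  # softmax within type k
            per_type.append((alpha.unsqueeze(-1) * h_src[mask]).sum(dim=0))
        else:
            per_type.append(h_src.new_zeros(d))                    # no type-k neighbors
    return torch.cat(per_type, dim=-1)                             # (num_types * d,)
```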
Node Update
i. Context embedding c_i of node v_i
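By analogy with the edge update, a hedged sketch of a node update that builds a context c_i from the node's own embedding and its aggregated message; this mirrors the structure above and is not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class NodeUpdate(nn.Module):
    """Sketch: combine node v_i's current embedding with its type-wise aggregated
    message into a context c_i, then produce the updated node embedding."""

    def __init__(self, dim: int, num_types: int):
        super().__init__()
        self.context = nn.Sequential(nn.Linear(dim + num_types * dim, dim), nn.ReLU())
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, h_i, m_i):
        # h_i: (N, dim) current node embeddings; m_i: (N, num_types * dim) aggregated messages.
        c_i = self.context(torch.cat([h_i, m_i], dim=-1))
        return torch.relu(self.update(torch.cat([h_i, c_i], dim=-1)))
```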
Why type-aware aggregation matters
- The node distribution is highly skewed: tasks are likely to far outnumber the other node types.
- Type-aware aggregation mitigates, to some extent, the training difficulty caused by this imbalance.
Overall Procedure
- raw-2-hid: encodes the raw node and edge features to obtain the initial embeddings h_i^(0), h_ij^(0)
- hid-2-hid: encodes the target subgraph G_{s_τ}
  - The subgraph is composed of the target-worker (unassigned-worker) node and all unassigned-city nodes.
- The hid-2-hid layer is repeated H times to obtain the final hidden embeddings h_i^(H), h_ij^(H).
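Wiring the earlier sketches together, the overall embedding procedure might look roughly like this; subgraph selection and per-round parameter sharing are simplified, so this is a structural illustration rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class ScheduleNetEncoderSketch(nn.Module):
    """Sketch: raw-2-hid encoding followed by H rounds of hid-2-hid TGA updates
    (edge update -> type-wise message aggregation -> node update)."""

    def __init__(self, node_in, edge_in, dim, num_types, H):
        super().__init__()
        self.node_raw2hid = nn.Linear(node_in, dim)
        self.edge_raw2hid = nn.Linear(edge_in, dim)
        self.H = H
        self.num_types = num_types
        # One TGA block reused across rounds purely to keep the sketch short.
        self.edge_update = TypeAwareEdgeUpdate(dim, num_types)
        self.node_update = NodeUpdate(dim, num_types)

    def forward(self, x_node, x_edge, edges, node_type):
        # x_node: (N, node_in), x_edge: (E, edge_in), node_type: (N,) long tensor,
        # edges: list of (dst, src) pairs, i.e. messages flow from source j to destination i.
        h = self.node_raw2hid(x_node)    # h_i^(0)
        h_e = self.edge_raw2hid(x_edge)  # h_ij^(0)
        dst = torch.tensor([i for (i, _) in edges])
        src = torch.tensor([j for (_, j) in edges])
        for _ in range(self.H):
            h_e, z = self.edge_update(h[dst], h[src], h_e, node_type[src])
            msgs = torch.stack([
                type_wise_aggregate(z[dst == i], h_e[dst == i],
                                    node_type[src[dst == i]], self.num_types)
                for i in range(h.shape[0])
            ])
            h = self.node_update(h, msgs)
        return h, h_e                    # h_i^(H), h_ij^(H)
```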
Probability Generation Process
Training
Makespan Normalization
i. The only reward signal is the makespan of the mTSP solution, M(π).
ii. This reward varies with the problem size, the topology of the map, and …
iii. So it is normalized using the equation below.
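Since the normalization equation is not reproduced in this note, the sketch below only illustrates the general idea of baseline-relative makespan normalization together with a clipped REINFORCE-style update; the exact normalization and clipping used in the paper may differ.

```python
import torch

def normalized_makespan(makespan: float, baseline_makespan: float) -> float:
    """Illustrative baseline-relative normalization (not necessarily the paper's equation):
    express the makespan relative to a baseline rollout so its scale is comparable
    across problem sizes and map topologies."""
    return (makespan - baseline_makespan) / baseline_makespan

def clipped_reinforce_loss(log_probs, makespan, baseline_makespan, clip=1.0):
    """Sketch of a clipped REINFORCE objective: the normalized makespan serves as the
    (negated, clipped) advantage shared by all actions in the episode.

    log_probs: list of log-probability tensors of the chosen assignments.
    """
    advantage = -torch.clamp(
        torch.as_tensor(normalized_makespan(makespan, baseline_makespan)), -clip, clip
    )
    return -(advantage * torch.stack(log_probs).sum())
```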
Comments