ScheduleNet: Learn to Solve MinMax Multiple Travelling Salesman Problem

Abstract

  • Kool's Attention, Learn to Solve Routing Problems! is limited to the single-agent setting.
  • Most existing methods target the MinSum objective, whereas this paper focuses on MinMax.
  • MinMax means minimizing the length of the longest sub-route.
  • mTSP is hard because a distributed scheduling policy has to produce cooperative, strategic routing.
  • The delayed and sparse single reward also makes training difficult.
  • ScheduleNet can be applied to an arbitrary number of workers and tasks -> good scalability.
  • Training uses Clipped REINFORCE.

Introduction

  • Unlike MinSum, the MinMax objective yields a balanced distribution of tasks across agents.
  • The main contributions of this paper are as follows.

    1. A decentralized policy that can be applied to an arbitrary number of agents and tasks.
    2. Attention-based node embeddings
    3. Training with a single delayed shared reward signal

Problem Formulation

State Definition

  1. Salesmen (workers): $V_T = \{1, 2, \ldots, m\}$
  2. Cities: $V_C = \{m+1, m+2, \ldots, m+N\}$
  3. $p_i$: 2D coordinates of entities (salesmen, cities, and the depot)
    1. time-dependent for workers
    2. static for tasks and the depot
  4. $\tau$: event where any worker reaches its assigned city
  5. $t(\tau)$: time of event $\tau$
  6. State: $s_i^\tau = (p_i^\tau, \mathbb{1}_{\text{active}}^\tau, \mathbb{1}_{\text{assigned}}^\tau)$
  7. $\mathbb{1}_{\text{active}}^\tau$:
    1. an already-visited task is inactive
    2. a worker at the depot is inactive
  8. $\mathbb{1}_{\text{assigned}}^\tau$:
    1. whether the worker is assigned to a task or not
  9. $s_{\text{env}}^\tau$: current time and the sequence of tasks visited by each worker
  10. $s^\tau$, the state at the $\tau$-th event: $(\{s_i^\tau\}_{i=1}^{m+N}, s_{\text{env}}^\tau)$ (see the sketch after this list)
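
To make the state definition above concrete, here is a minimal Python sketch of the per-entity and environment state at an event τ. The class and field names are my own illustration, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class EntityState:
    """s_i^tau for one entity (worker, city, or depot) at event tau."""
    pos: Tuple[float, float]  # p_i^tau: 2D coordinate (time-dependent for workers)
    active: bool              # 1_active^tau: False for visited cities / workers at the depot
    assigned: bool            # 1_assigned^tau: True if the worker currently has a task

@dataclass
class EnvState:
    """s_env^tau: global information shared by all entities."""
    current_time: float             # t(tau)
    visit_history: List[List[int]]  # sequence of cities visited by each worker

@dataclass
class MTSPState:
    """s^tau = ({s_i^tau} for i = 1..m+N, s_env^tau)."""
    entities: List[EntityState]
    env: EnvState
```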

Action

  1. Worker-to-task assignment (assigning an unassigned worker to one of the unvisited cities)

Reward

  1. For non-terminal events, $r(s^\tau) = 0$
  2. $r(s^T) = t(T)$, i.e., the reward at the terminal event $T$ is the total completion time (the makespan), as sketched below
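
A tiny sketch of the event-based reward defined above (the function name is illustrative): zero at every non-terminal event, and the completion time $t(T)$ only at the terminal event. This is exactly the single delayed, sparse reward mentioned in the abstract.

```python
def event_reward(is_terminal: bool, current_time: float) -> float:
    """r(s^tau) = 0 for non-terminal events; r(s^T) = t(T) at the terminal event."""
    return current_time if is_terminal else 0.0
```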

ScheduleNet

[Figure: graph constructed at each event]

  • Whenever an event occurs, a graph is constructed as shown above.
  • The edge feature is the Euclidean distance between the two nodes (a small construction sketch follows this list).
  • $h_i$: node embedding (maybe derived from the node coordinates?)
  • $h_{ij}$: edge embedding (a high-dimensional encoding of the Euclidean distance)
  • A single iteration of TGA embedding consists of three phases: (1) edge update, (2) message aggregation, and (3) node update.

    [Figure: the three phases of a TGA embedding iteration]

  • $h'_{ij}$: edge embedding after the TGA update (the message from source node $j$ to destination node $i$)
  • $z_{ij}$: how valuable node $j$ is to node $i$ (analogous to the compatibility score in self-attention)
  • $k_j$: type of entity $j$ (e.g., an active worker if $\mathbb{1}_{\text{active}}^\tau = 1$)
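
Below is a small sketch of the graph construction at an event: a fully connected graph over all entities whose raw edge feature is the pairwise Euclidean distance. $h_i$ and $h_{ij}$ would then come from projecting these raw features to a higher dimension (the raw-2-hid step described later). The fully connected assumption and all names are mine.

```python
import numpy as np

def build_event_graph(coords: np.ndarray) -> np.ndarray:
    """coords: (m+N+1, 2) positions of all workers, cities, and the depot at event tau.
    Returns the raw edge-feature matrix d[i, j] = ||p_i - p_j||_2."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)  # pairwise Euclidean distances

# Example: 2 workers, 3 cities, and the depot -> 6 nodes
coords = np.random.rand(6, 2)
edge_feat = build_event_graph(coords)     # (6, 6) raw edge features
```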

How does Type-aware Graph Attention work?

  1. “Type-aware” edge update

    i. Context embedding $c_{ij}$ of edge $e_{ij}$

    [equation screenshot]

    • The context embedding changes depending on the type of the source node $j$.

    ii. Type-aware edge encoding

    [equation screenshot]

    • The MI (multiplicative interaction) layer dynamically generates its parameters depending on the context $c_{ij}$.
    • This yields a dynamic edge feature that varies with the source-node type.

    iii. Edge update outputs for TGA$_E$ and TGA$_A$

    [equation screenshots]

    • The upper equation is the TGA$_E$ update (producing the updated edge embedding $h'_{ij}$).
    • The lower equation is the TGA$_A$ update (producing the attention logit $z_{ij}$).
  2. Message Aggregation

    i. Attention scores are computed separately for each node type $k$.

    [equation screenshot: type-wise attention score]

    • For node $v_i$, attention scores are computed over its type-$k$ neighbors.
    • Since the current setting has three kinds of nodes (agent, task, and depot), two kinds of attention scores are expected to arise.

    ii. The node embeddings obtained above are weighted by the attention scores per node type, and the results are concatenated.

    [equation screenshots: type-wise message aggregation and concatenation]

  3. Node Update

    i. Context embedding $c_i$ of node $v_i$

    [equation screenshots: node context embedding and node update]

  4. Why type-aware aggregation matters

    • In the node distribution, task nodes are likely to vastly outnumber the other node types.
    • Type-aware aggregation partly alleviates the training difficulty caused by this imbalance (a code sketch of one full TGA iteration follows this list).
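
To tie the three phases together, the following is a minimal PyTorch sketch of one TGA iteration: a context-conditioned edge update, type-wise attention aggregation, and a node update. It follows the description above but is not the authors' architecture; in particular, the sigmoid gating is only a stand-in for the paper's MI layer, and the layer sizes, context construction, and set of node types are assumptions.

```python
import torch
import torch.nn as nn

class TGALayer(nn.Module):
    """One Type-aware Graph Attention iteration (illustrative sketch)."""

    def __init__(self, dim: int, num_types: int):
        super().__init__()
        self.num_types = num_types
        # Context-conditioned edge encoder (gating stands in for the paper's MI layer)
        self.ctx = nn.Linear(3 * dim, dim)        # c_ij built from (h_i, h_j, h_ij)
        self.edge_gate = nn.Linear(dim, dim)      # context-dependent modulation
        self.edge_mlp = nn.Linear(3 * dim, dim)   # h'_ij (TGA_E)
        self.attn_mlp = nn.Linear(3 * dim, 1)     # z_ij  (TGA_A)
        # Node update consumes the per-type aggregated messages
        self.node_mlp = nn.Linear(dim + num_types * dim, dim)

    def forward(self, h, h_e, node_type):
        """h: (n, d) node embeddings, h_e: (n, n, d) edge embeddings,
        node_type: (n,) integer type of each node (e.g. worker / task / depot)."""
        n, d = h.shape
        hi = h[:, None, :].expand(n, n, d)        # destination node i
        hj = h[None, :, :].expand(n, n, d)        # source node j
        cat = torch.cat([hi, hj, h_e], dim=-1)

        # (1) type-aware edge update: the context gates the edge encoding
        c = torch.relu(self.ctx(cat))                     # context embedding c_ij
        gate = torch.sigmoid(self.edge_gate(c))
        h_e_new = gate * torch.relu(self.edge_mlp(cat))   # h'_ij
        z = self.attn_mlp(cat).squeeze(-1)                # z_ij attention logits

        # (2) message aggregation, separately for each source-node type k
        msgs = []
        for k in range(self.num_types):
            mask = (node_type == k)[None, :].expand(n, n)  # neighbors j of type k
            logits = z.masked_fill(~mask, float("-inf"))
            alpha = torch.softmax(logits, dim=1)
            alpha = torch.nan_to_num(alpha)                # types with no nodes -> 0
            msgs.append(alpha.unsqueeze(-1).mul(h_e_new).sum(dim=1))
        agg = torch.cat(msgs, dim=-1)                      # concat per-type messages

        # (3) node update from the node's own embedding and the aggregated messages
        h_new = torch.relu(self.node_mlp(torch.cat([h, agg], dim=-1)))
        return h_new, h_e_new
```

Aggregating per type before concatenation is what keeps the few worker nodes from being drowned out by the many task nodes, which is exactly the imbalance point made in item 4.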

Overall Procedure

[Figure: overall ScheduleNet encoding procedure]

  • raw-2-hid: encodes the initial node and edge features to obtain the initial embeddings $h_i^{(0)}, h_{ij}^{(0)}$
  • hid-2-hid: encodes the target subgraph $G_{s_\tau}$
  • The subgraph is composed of a target-worker (unassigned-worker) node and all unassigned-city nodes.
  • The hid-2-hid layer is repeated $H$ times to obtain the final hidden embeddings $h_i^{(H)}, h_{ij}^{(H)}$ (see the sketch below).
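
A short sketch of this pipeline, reusing the TGALayer sketch from the previous section (so it assumes that class is in scope): raw-2-hid projects the raw node and edge features into hidden space, and the hid-2-hid (TGA) block is applied H times. Dimensions and names are illustrative.

```python
import torch
import torch.nn as nn

class ScheduleNetEncoder(nn.Module):
    """raw-2-hid projection followed by H hid-2-hid (TGA) iterations (sketch)."""

    def __init__(self, node_in: int, edge_in: int, dim: int, num_types: int, H: int):
        super().__init__()
        self.raw2hid_node = nn.Linear(node_in, dim)   # raw node features -> h_i^(0)
        self.raw2hid_edge = nn.Linear(edge_in, dim)   # raw edge features -> h_ij^(0)
        self.hid2hid = nn.ModuleList([TGALayer(dim, num_types) for _ in range(H)])

    def forward(self, x_node, x_edge, node_type):
        h, h_e = self.raw2hid_node(x_node), self.raw2hid_edge(x_edge)
        for layer in self.hid2hid:                    # repeated H times
            h, h_e = layer(h, h_e, node_type)
        return h, h_e                                 # h_i^(H), h_ij^(H)
```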

Probability Generation Process

[equation screenshots: assignment probability generation]
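
The screenshots hold the exact probability-generation equations, which are not reproduced here. A plausible sketch consistent with the subgraph description (one target worker, all unassigned cities) is to score each worker-to-city edge from its final edge embedding and apply a masked softmax; treat this as an assumption rather than the paper's formula.

```python
import torch
import torch.nn as nn

class AssignmentHead(nn.Module):
    """Turns final edge embeddings h_ij^(H) into assignment probabilities (sketch)."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, h_e_final, target_worker: int, unassigned_mask):
        """h_e_final: (n, n, d) final edge embeddings,
        unassigned_mask: (n,) bool, True for unassigned-city nodes."""
        logits = self.score(h_e_final[target_worker]).squeeze(-1)   # one logit per node
        logits = logits.masked_fill(~unassigned_mask, float("-inf"))
        return torch.softmax(logits, dim=-1)   # probability of assigning each city
```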

Training

  1. Makespan Normalization

    i. The only reward signal is the makespan of the mTSP, $M(\pi)$.

    ii. This reward varies with the problem size, the topology of the map, and so on.

    iii. So it is normalized using the equation below.

    [equation screenshot: makespan normalization]
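
The exact normalization formula is in the screenshot above and is not reproduced here. The sketch below only illustrates the overall training idea from the abstract, Clipped REINFORCE with the normalized makespan acting as a (negative) advantage; reading "clipped" as a PPO-style clipped surrogate without a learned critic is my assumption, as are the rollout baseline and the clipping range.

```python
import torch

def clipped_reinforce_loss(log_probs: torch.Tensor,
                           old_log_probs: torch.Tensor,
                           norm_makespan: float,
                           eps: float = 0.2) -> torch.Tensor:
    """Illustrative Clipped REINFORCE loss for one episode (a sketch, not the paper's exact form).
    log_probs / old_log_probs: (T,) log-probabilities of the chosen assignments under the
    current and rollout policies; norm_makespan: normalized makespan used as the cost."""
    advantage = -norm_makespan                        # lower makespan -> larger advantage
    ratio = torch.exp(log_probs - old_log_probs)      # per-decision importance ratio
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -torch.min(unclipped, clipped).mean()      # maximize the clipped surrogate
```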

Comments