HashiCorp Nomad

Oct 2015

Ivan, @gliush

Overview

❖ Docker support

❖ Operationally simple: one binary, multi-datacenter

❖ Built for scale

❖ Microservices

❖ Hybrid cloud deployment (AWS, Azure, GCE, Bare Metal, VMWare, …)

Concepts

❖ Task -> Task Group -> Job

❖ Driver

❖ Client / Server

❖ Allocation

❖ Evaluation

❖ Regions and Datacenters
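The Task -> Task Group -> Job hierarchy can be pictured as nested records. The sketch below is illustrative only: the class and field names are assumptions chosen for this example, not Nomad's internal types. It shows how a job's task groups determine how many allocations the scheduler is asked to place.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    """Smallest unit of work, executed by a driver (e.g. "docker")."""
    name: str
    driver: str
    config: Dict[str, str] = field(default_factory=dict)


@dataclass
class TaskGroup:
    """Tasks that are placed together on the same client node."""
    name: str
    count: int
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Job:
    """Top-level unit submitted to the servers."""
    name: str
    region: str
    datacenters: List[str]
    groups: List[TaskGroup] = field(default_factory=list)

    def desired_allocations(self) -> int:
        # One allocation per requested instance of each task group.
        return sum(g.count for g in self.groups)


job = Job(
    name="my-service",
    region="us",
    datacenters=["us-west-1", "us-east-1"],
    groups=[TaskGroup("webs", 5, [Task("frontend", "docker")])],
)
print(job.desired_allocations())
```

Each allocation then maps one instance of a task group to a client node; evaluations are what trigger the scheduler to compute that mapping.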

Architecture

❖ Client-Server

❖ Multi-Region

Architecture

❖ Consensus Protocol: Raft

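❖ default read mode: answered by the leader; may return stale data in the rare case of a network partition

❖ stale read mode: any server can answer, so reads are faster but may lag behind the leader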

❖ Gossip Protocol to manage membership

❖ Single global WAN gossip pool for cross-region requests
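Because Raft commits a write only after a majority of servers acknowledge it, the number of servers in a region determines how many failures the region can survive. A minimal sketch of that arithmetic (the function names are made up for this example):

```python
def quorum(servers: int) -> int:
    """Smallest majority of a Raft cluster with the given number of servers."""
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return servers - quorum(servers)


for n in (1, 3, 5, 7):
    print(f"servers={n} quorum={quorum(n)} tolerates={failure_tolerance(n)}")
```

This is why odd cluster sizes are the usual choice: going from 3 to 4 servers raises the quorum without raising the number of tolerated failures.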

Scheduling

❖ Design inspired by Google's Omega and Borg papers

❖ Allocation: set of tasks in a job to be run on some node

❖ Scheduling: process of determining the appropriate allocations

❖ Evaluation: process of handling state change

https://research.google.com/pubs/pub41684.html

https://research.google.com/pubs/pub43438.html

Scheduling

❖ State changes -> an evaluation is created

❖ Evaluation Broker (runs on the leader): at-least-once delivery, priority ordering, manages the queue of pending evaluations

❖ Scheduler types: batch, service, core

❖ Schedulers (run on all servers): process an evaluation and produce an allocation plan
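The broker behaviour described above (priority order plus at-least-once delivery) can be sketched with a heap and an explicit acknowledgement step. This is a toy model under stated assumptions: the class `EvalBroker` and its method names are chosen for illustration and do not reproduce Nomad's implementation, which also handles timeouts and per-scheduler queues.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class EvalBroker:
    """Toy priority queue with ack/nack to model at-least-once delivery."""

    def __init__(self) -> None:
        self._ready: List[Tuple[int, int, str]] = []
        self._unacked: Dict[str, int] = {}  # eval id -> priority
        self._seq = itertools.count()       # tie-breaker: FIFO within a priority

    def enqueue(self, eval_id: str, priority: int) -> None:
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._ready, (-priority, next(self._seq), eval_id))

    def dequeue(self) -> Optional[str]:
        if not self._ready:
            return None
        neg_priority, _, eval_id = heapq.heappop(self._ready)
        # Remember the evaluation until the scheduler confirms it is done.
        self._unacked[eval_id] = -neg_priority
        return eval_id

    def ack(self, eval_id: str) -> None:
        # Processing succeeded: forget the evaluation.
        self._unacked.pop(eval_id)

    def nack(self, eval_id: str) -> None:
        # Processing failed: put the evaluation back so it is delivered again.
        self.enqueue(eval_id, self._unacked.pop(eval_id))


broker = EvalBroker()
broker.enqueue("batch-eval", 20)
broker.enqueue("service-eval", 50)

first = broker.dequeue()      # highest priority comes out first
broker.nack(first)            # simulate a scheduler failure
second = broker.dequeue()     # the same evaluation is delivered again
broker.ack(second)
third = broker.dequeue()
print(first, second, third)
```

An evaluation leaves the broker only after an explicit `ack`, so a crashed scheduler causes redelivery rather than a lost evaluation.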

Scheduler

❖ Generates an allocation plan from the desired state and the real state

❖ Plan: the set of allocations to evict, update, or create and place

❖ Placing an allocation:

❖ feasibility checking: filter out unhealthy nodes, nodes missing the required drivers, etc.

❖ ranking: score each remaining node (bin packing plus affinity/anti-affinity rules); the node with the highest score wins

❖ A failed allocation is rescheduled, taking the previous result into account
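The two placement phases can be sketched as a filter followed by a `max`. Everything here is a simplification for illustration: the `Node` and `Ask` types are invented, the capacity check inside `feasible` is an assumption, the score is a plain average of CPU and memory utilisation after placement, and affinity/anti-affinity rules are left out. Nomad's real scoring function differs.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    healthy: bool
    drivers: Set[str]
    cpu_total: int  # MHz
    cpu_used: int
    mem_total: int  # MB
    mem_used: int


@dataclass
class Ask:
    driver: str
    cpu: int     # MHz
    memory: int  # MB


def feasible(node: Node, ask: Ask) -> bool:
    """Feasibility check: healthy, has the driver, and has room left."""
    return (
        node.healthy
        and ask.driver in node.drivers
        and node.cpu_used + ask.cpu <= node.cpu_total
        and node.mem_used + ask.memory <= node.mem_total
    )


def score(node: Node, ask: Ask) -> float:
    """Bin-packing style score: fuller nodes after placement score higher."""
    cpu = (node.cpu_used + ask.cpu) / node.cpu_total
    mem = (node.mem_used + ask.memory) / node.mem_total
    return (cpu + mem) / 2


def place(nodes: List[Node], ask: Ask) -> Optional[Node]:
    """Return the best feasible node, or None if nothing fits."""
    candidates = [n for n in nodes if feasible(n, ask)]
    return max(candidates, key=lambda n: score(n, ask), default=None)


nodes = [
    Node("a", True, {"docker"}, 4000, 3000, 8192, 4096),
    Node("b", True, {"docker"}, 4000, 500, 8192, 1024),
    Node("c", False, {"docker"}, 4000, 3400, 8192, 7000),  # unhealthy
    Node("d", True, {"exec"}, 4000, 3400, 8192, 7000),     # no docker driver
]
chosen = place(nodes, Ask(driver="docker", cpu=500, memory=128))
print(chosen.name if chosen else "no feasible node")
```

Preferring the fuller node packs work densely and leaves emptier nodes free for larger tasks, which is the point of bin packing.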

Job Specification

❖ HCL or JSON

❖ Job -> [Task Group]

❖ Task Group -> [Task]

❖ Job: datacenters, region, type (service/batch), update strategy, priority, meta

❖ Task Group: count, meta

❖ Task: driver, config, resources (cpu, memory, ..), meta

❖ Resources: cpu, disk, iops, memory, network (ports, mbits)

job "my-service" {
  # Job should run in the US region
  region = "us"

  # Spread tasks between us-west-1 and us-east-1
  datacenters = ["us-west-1", "us-east-1"]

  # Rolling updates should be sequential
  update {
    max_parallel = 1
  }

  group "webs" {
    # We want 5 web servers
    count = 5

    task "frontend" {
      driver = "docker"
      config {
        image = "hashicorp/web-frontend"
      }
      resources {
        cpu = 500
        memory = 128
        network {
          dynamic_ports = ["http", "https"]
        }
      }
    }
  }
}

Runtime Environment

❖ Environment variables: set from the job specification and by the runtime during allocation

❖ NOMAD_META_{key} = {value} from job spec

❖ NOMAD_CPU_LIMIT: int, unit = 1MHz

❖ NOMAD_MEMORY_LIMIT: int, unit = 1MB

❖ NOMAD_IP, NOMAD_PORT_{LABEL} (label such as "http", …)
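A task can read these variables like any other environment variable. The helper below is a sketch, not part of Nomad: `read_task_env` is a made-up name, the sample values are invented, and the exact casing of the port label in the variable name is an assumption (the label is used here exactly as passed in).

```python
import os
from typing import Dict, Mapping


def read_task_env(env: Mapping[str, str], port_label: str) -> Dict[str, object]:
    """Collect the Nomad-provided settings from an environment mapping."""
    meta_prefix = "NOMAD_META_"
    return {
        "cpu_mhz": int(env["NOMAD_CPU_LIMIT"]),       # unit: 1 MHz
        "memory_mb": int(env["NOMAD_MEMORY_LIMIT"]),  # unit: 1 MB
        "ip": env["NOMAD_IP"],
        "port": int(env[f"NOMAD_PORT_{port_label}"]),
        # Strip the prefix so meta keys match the job specification.
        "meta": {
            key[len(meta_prefix):]: value
            for key, value in env.items()
            if key.startswith(meta_prefix)
        },
    }


# Invented sample values standing in for what the runtime would provide.
sample_env = {
    "NOMAD_CPU_LIMIT": "500",
    "NOMAD_MEMORY_LIMIT": "128",
    "NOMAD_IP": "10.0.0.5",
    "NOMAD_PORT_http": "20345",
    "NOMAD_META_team": "web",
}
settings = read_task_env(sample_env, "http")
print(settings)
```

Inside a running task the same helper would be called as `read_task_env(os.environ, "http")`.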

Task Drivers

❖ Purpose: execute a task, isolate its resources, and hide platform details behind a common abstraction

❖ Docker

❖ Fork/Exec

❖ Java

❖ Qemu

❖ Custom

HTTP API

❖ /v1/jobs

❖ /v1/nodes

❖ /v1/allocations

❖ /v1/evaluations

❖ /v1/agent/{self,join,members,force-leave,servers}

❖ /v1/status/{leader,peers}

❖ CLI invokes HTTP API
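Since the CLI is a thin wrapper over HTTP, the same endpoints can be queried directly. The sketch below uses only the Python standard library; the helper names `api_url` and `get` are invented for this example, and the address assumes an agent listening locally on Nomad's default HTTP port, 4646.

```python
import json
import urllib.request
from typing import Any

NOMAD_ADDR = "http://127.0.0.1:4646"  # assumption: a local agent


def api_url(path: str, base: str = NOMAD_ADDR) -> str:
    """Build a /v1/ endpoint URL from a relative path such as "jobs"."""
    return f"{base.rstrip('/')}/v1/{path.lstrip('/')}"


def get(path: str, base: str = NOMAD_ADDR, timeout: float = 5.0) -> Any:
    """GET an endpoint and decode the JSON response body."""
    with urllib.request.urlopen(api_url(path, base), timeout=timeout) as resp:
        return json.load(resp)


print(api_url("jobs"))
print(api_url("status/leader"))
```

With an agent running, `get("status/leader")` or `get("jobs")` would return the decoded JSON response; without one, the call raises `urllib.error.URLError`.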

Questions?
