v0.1.2

Scientific build + research workflows, one binary.

vbuild executes real shell commands, resolves dependencies as a DAG, and ships AI/research primitives such as datasets, experiments, lineage, and metrics.

workflow: "Research pipeline"

vars:
  VERSION: v0.1.2

tasks:
  default:
    deps: [data:prep, train, report]

  train:
    resources: { gpu: 1, memory: 16GB }
    seed: 1337
    metrics:
      regex: ["accuracy=(?P<accuracy>[0-9\\.]+)"]
    run:
      - python train.py --seed {{SEED}}

Overview

vbuild is a cross-platform CLI that turns YAML into deterministic workflows. It is intentionally thin: it delegates the actual work to your shell while it handles dependency resolution, scheduling, environment shaping, caching, and observability.

DAG execution

Tasks are resolved into a DAG and executed in parallel when safe, with fail-fast and retry policies.
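
The resolution step described above can be sketched in a few lines of Python. This is an illustration of the general technique (topological ordering with cycle detection), not vbuild's actual scheduler; the function name `parallel_batches` and the sample graph are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError


def parallel_batches(deps):
    """Group tasks into batches whose dependencies are already satisfied.

    `deps` maps a task name to the set of tasks it depends on.
    Tasks inside one batch do not depend on each other, so they could
    run concurrently.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError when the graph contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical task graph, for illustration only.
deps = {
    "default": {"fmt", "build"},
    "build": {"test"},
    "fmt": set(),
    "test": set(),
}
batches = parallel_batches(deps)
print(batches)  # [['fmt', 'test'], ['build'], ['default']]
```

A graph such as `{"a": {"b"}, "b": {"a"}}` raises `CycleError` in `prepare()`, which is the point at which a tool of this kind would reject the config.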

Reproducible runs

Config validation, lock files, cache keys, snapshots, and version injection keep CI and experiments deterministic.

Research primitives

Datasets, experiments, lineage, metrics, model cards, benchmarks, and canary checks are first-class.

Quick start

  1. Create a .vbuild.yml in your repo.
  2. Run vbuild (default task) or vbuild <task>.
  3. Grow the DAG with dependencies, outputs, and cache.

workflow: "Starter"

tasks:
  default:
    deps: [fmt, test]

  fmt:
    run:
      - gofmt -w .

  test:
    run:
      - go test ./...

Install

The installer scripts live in scripts/ in the GitHub repository and download the correct release asset for your platform.

Linux / macOS

curl -fsSL https://raw.githubusercontent.com/vietrix/vbuild/main/scripts/install.sh | sh

# Pin a version
curl -fsSL https://raw.githubusercontent.com/vietrix/vbuild/main/scripts/install.sh | VBUILD_VERSION=v0.1.2 sh

Windows (PowerShell)

iwr -useb https://raw.githubusercontent.com/vietrix/vbuild/main/scripts/install.ps1 | iex

# Pin a version
$env:VBUILD_VERSION = "v0.1.2"; iwr -useb https://raw.githubusercontent.com/vietrix/vbuild/main/scripts/install.ps1 | iex

CLI reference

vbuild                             Run the default task.
vbuild list                        List tasks and descriptions.
vbuild graph --format dot          Show task graph.
vbuild --dry-run                   Print commands without execution.
vbuild script -- --flag            Pass args to a task.
vbuild --since 12h                 Run tasks changed since a time.
vbuild --until build               Run up to a target.
vbuild --reverse cleanup           Reverse topo order.
vbuild watch dev                   Watch files and rerun.
vbuild dataset list                Show dataset registry.
vbuild experiment list             Show experiment registry.
vbuild lineage --format json       Show lineage graph.
vbuild registry push               Sync registry.
vbuild report --out report.json    Generate compliance report.
vbuild update --to v0.1.2          Self-update binary.

Run vbuild help for the full list of commands and flags.

Configuration

vbuild reads .vbuild.yml (or the file given with --file). The schema is validated with clear error messages and supports defaults, includes, aliases, datasets, and experiments.

workflow: "ML pipeline"

defaults:
  timeout: 45m
  shell: bash
  retries: 1
  max_retries: 3
  backoff: 5s
  jitter: 1s

include:
  - .vbuild.d/**/*.yml

vars:
  DATA_ROOT: data
  VERSION: v0.1.2

env:
  PYTHONUNBUFFERED: "1"

resources:
  cpu: 8
  memory: 32GB
  gpus: 2
  gpu_devices: ["0", "1"]
  groups:
    gpu: 1

datasets:
  raw:
    path: data/raw
    version: 2024-01-01
    tags: [source]

tasks:
  data:prep:
    run:
      - python scripts/prepare.py --out data/clean

  train:
    deps: [data:prep]
    resources: { gpu: 1, memory: 16GB, group: gpu }
    seed: 1337
    datasets:
      - name: raw
    output:
      MODEL_PATH: models/model.pt
    outputs:
      - models/model.pt
    run:
      - python train.py --data {{DATASET_RAW_PATH}} --out {{MODEL_PATH}}

Script tasks are a shorthand for a single command with CLI args.

tasks:
  train:
    script: python train.py {dir} {out}

Run: vbuild train --dir=data --out=dist

Flags like --key=value are exposed as {{KEY}} / {{VBUILD_ARG_KEY}} (uppercased, with - replaced by _).

Script tasks accept args automatically (no need for pass_args).
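
The flag-to-variable rule above can be sketched as follows. This is a minimal illustration of the naming rule only, not vbuild's parser; the function name `flags_to_vars` is hypothetical, and the sketch simply ignores arguments that are not in --key=value form.

```python
def flags_to_vars(argv):
    """Map --key=value flags to KEY and VBUILD_ARG_KEY entries.

    The key is uppercased and every '-' becomes '_', as described above.
    """
    out = {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue  # this sketch skips anything that is not --key=value
        key, value = arg[2:].split("=", 1)
        name = key.upper().replace("-", "_")
        out[name] = value
        out["VBUILD_ARG_" + name] = value
    return out


print(flags_to_vars(["--dir=data", "--out-dir=dist"]))
```

Under these assumptions, --out-dir=dist would be available as both {{OUT_DIR}} and {{VBUILD_ARG_OUT_DIR}}.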

Schema reference

Global keys

  • workflow: display name for the workflow.
  • vars, env, env_file: variable and environment sources.
  • defaults: default timeout/shell/workdir/retries/backoff/jitter.
  • include: merge additional YAML configs (local, URL, glob).
  • templates / tasks: reusable blocks and task definitions.
  • resources: global CPU/memory/GPU pools and groups.
  • datasets: dataset registry definitions.
  • experiments: experiment defaults and metadata.
  • registry: registry storage config.
  • seed, seed_env, offline, snapshot.
  • cache_remote, artifacts_upload, artifacts_dir.
  • plugins, log_plugins, secrets, fail_fast, timeout.

Task keys

  • run, pre, post, workdir, run_dir, shell.
  • script: single-command tasks with pass-through args and {name} placeholders.
  • pass_args: allow CLI args via {{ARGS}}, {{ARG_0}}, {{ARGC}}.
  • deps, depends_on, parallel, fanout, matrix, sweep.
  • when, only_on, confirm, allow_failure, continue_on_error.
  • retries, max_retries, retry_on_exit_codes, retry_on_regex, retry_on_signal.
  • inputs, outputs, output_paths, output, exports, cache, if_missing.
  • capture, silent, secrets, tags, watch, artifacts.
  • limits, resources, priority, group, remote, scheduler, isolate.
  • datasets, dataset_outputs, split, validate, stats.
  • metrics, canary, benchmark, experiment, checkpoint.
  • model_card, notebook, export, sbom, sign, snapshot, offline.
  • use / with to apply templates and parameters.

Variables & environment

  • vars expand in commands as {{VAR}}. You can override them with VBUILD_VAR_NAME.
  • env is merged in order: OS env → global env → task env → exports.
  • exports propagate to downstream tasks.
  • pass_args tasks expose {{ARGS}}, {{ARG_0}}, and {{ARGC}} from the CLI.
  • --export-env writes the resolved environment to a file.
  • --print-vars prints resolved variables for a task.

tasks:
  build:
    vars:
      OUT: bin/app
    env:
      GOOS: linux
    exports:
      APP_VERSION: "{{VERSION}}"
    run:
      - go build -o {{OUT}} ./cmd/app
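
To make the merge order and {{VAR}} expansion concrete, here is a small Python sketch. It assumes that later layers override earlier ones on conflict and that unknown placeholders are left untouched; neither behavior is confirmed by this page, and the names `merge_env` and `expand` are hypothetical.

```python
import re


def merge_env(os_env, global_env, task_env, exports):
    """Merge environment layers in the documented order."""
    merged = {}
    for layer in (os_env, global_env, task_env, exports):
        merged.update(layer)  # assumption: a later layer wins on conflict
    return merged


_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def expand(command, variables):
    """Replace each {{VAR}} with its value; unknown names stay as written."""
    return _PLACEHOLDER.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), command
    )


env = merge_env(
    {"GOOS": "darwin", "HOME": "/home/ci"},
    {"PYTHONUNBUFFERED": "1"},
    {"GOOS": "linux"},
    {"APP_VERSION": "v0.1.2"},
)
print(env["GOOS"])  # linux
print(expand("go build -o {{OUT}} ./cmd/app", {"OUT": "bin/app"}))
```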

DAG scheduling

  • Dependencies are resolved with cycle detection.
  • Independent tasks run in parallel across the DAG.
  • parallel: true runs commands inside a task concurrently with prefixed logs.
  • fanout and matrix expand tasks into multiple DAG nodes.
  • fail_fast, continue_on_error, and max-parallel control failure and concurrency.
  • Use --until, --reverse, --since, and only-changed for partial runs.

tasks:
  lint:
    run: ["golangci-lint run ./..."]

  test:
    deps: [lint]
    matrix:
      GOOS: [linux, darwin]
      GOARCH: [amd64, arm64]
    env:
      GOOS: "{{GOOS}}"
      GOARCH: "{{GOARCH}}"
    run:
      - go test ./...
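
The matrix in the example above expands to four combinations. The sketch below shows the usual way such an expansion is computed (a Cartesian product of the value lists). The node naming scheme and the function name `expand_matrix` are hypothetical; vbuild may name its DAG nodes differently.

```python
from itertools import product


def expand_matrix(task, matrix):
    """Return one (node_name, variables) pair per matrix combination."""
    keys = list(matrix)
    nodes = []
    for combo in product(*(matrix[k] for k in keys)):
        variables = dict(zip(keys, combo))
        suffix = ",".join(f"{k}={v}" for k, v in variables.items())
        nodes.append((f"{task}[{suffix}]", variables))  # hypothetical naming
    return nodes


nodes = expand_matrix(
    "test",
    {"GOOS": ["linux", "darwin"], "GOARCH": ["amd64", "arm64"]},
)
for name, variables in nodes:
    print(name, variables)
```

Each resulting node carries its own GOOS and GOARCH values, which the task's env block can then reference as {{GOOS}} and {{GOARCH}}.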

Cache & artifacts

  • Use inputs/outputs and cache: mtime|sha256 for incremental builds.
  • cache_remote supports S3/GCS/MinIO and profile-based auth.
  • if_missing skips tasks if outputs already exist.
  • artifacts collects outputs into .vbuild/artifacts, and artifacts_upload pushes them to GitHub or S3.

tasks:
  build:
    inputs:
      - cmd/app/**/*.go
    outputs:
      - bin/app
    cache: sha256
    artifacts:
      - bin/app
    run:
      - go build -o bin/app ./cmd/app
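
For intuition, a content-based cache key in the spirit of cache: sha256 could be computed as below. This is a generic sketch, not vbuild's key format: which fields go into the real key is not documented here, so hashing the command together with each input's relative path and bytes is an assumption, and `cache_key` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def cache_key(root, patterns, command):
    """Hash the command plus the path and content of every matching input."""
    digest = hashlib.sha256()
    digest.update(command.encode())
    files = sorted(
        {p for pat in patterns for p in Path(root).glob(pat) if p.is_file()}
    )
    for path in files:
        # Separate fields with NUL bytes so adjacent values cannot blur together.
        digest.update(b"\0" + path.relative_to(root).as_posix().encode() + b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()
```

Calling `cache_key(".", ["cmd/app/**/*.go"], "go build -o bin/app ./cmd/app")` twice returns the same key until an input file or the command changes, which is what lets an unchanged task be skipped. A cache: mtime mode would presumably compare modification times instead of content.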

Data & research workflows

vbuild ships domain features for AI and scientific pipelines: dataset registries, experiment tracking, metrics, benchmarks, checkpoints, model cards, notebooks, and exports.

datasets:
  images:
    path: data/images
    version: 2024-01-01
    format: files

tasks:
  train:
    datasets:
      - name: images
    dataset_outputs:
      - name: embeddings
        path: data/embeddings
        version: v1
    split:
      input: data/images
      output: data/splits
      train: 0.8
      val: 0.1
      test: 0.1
    validate:
      paths: [data/images]
      min_files: 1000
      extensions: [".jpg", ".png"]
    metrics:
      regex: ["loss=(?P<loss>[0-9\\.]+)"]
      file: metrics.json
      format: json
    benchmark:
      iterations: 5
      warmup: 1
    checkpoint:
      paths: [checkpoints/*.pt]
      var: CHECKPOINT_PATH
    model_card:
      path: .vbuild/model_cards/train.md
    notebook:
      path: notebooks/report.ipynb
      output: notebooks/report.executed.ipynb
    export:
      path: dist/train-artifacts.zip
      format: zip
    run:
      - python train.py --data {{DATASET_IMAGES_PATH}} --ckpt {{CHECKPOINT_PATH}}

Use seed and seed_env for deterministic runs, and offline to disable network access for model hubs.
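
As an illustration of regex-based metrics, the sketch below pulls named groups out of task output. It assumes the capture group's name becomes the metric name and that the last match wins; both are assumptions, and `extract_metrics` plus the sample log are hypothetical.

```python
import re


def extract_metrics(patterns, log_text):
    """Collect named-group captures from each pattern; the last match wins."""
    metrics = {}
    for pattern in patterns:
        for match in re.finditer(pattern, log_text):
            for name, value in match.groupdict().items():
                metrics[name] = float(value)
    return metrics


# Hypothetical training output, for illustration only.
log = "epoch=1 loss=0.92\nepoch=2 loss=0.41 accuracy=0.87\n"
metrics = extract_metrics(
    [r"loss=(?P<loss>[0-9.]+)", r"accuracy=(?P<accuracy>[0-9.]+)"],
    log,
)
print(metrics)  # {'loss': 0.41, 'accuracy': 0.87}
```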

Resources, scheduling, and remote

  • resources manage CPU, memory, GPUs, and group-based quotas.
  • scheduler supports Slurm/PBS wrappers for HPC queues.
  • remote runs commands over SSH; remote.hosts fans out to multiple hosts.

tasks:
  gpu:train:
    resources: { gpu: 1, memory: 24GB, group: gpu }
    scheduler:
      type: slurm
      queue: gpu
      gpus: 1
      time: "02:00:00"
    remote:
      hosts: ["gpu01", "gpu02"]
      user: ml
    run:
      - python train.py

Lineage & compliance reports

Every run can register dataset inputs/outputs, experiments, and lineage edges. Use reports to ship provenance in CI or audits.

vbuild dataset list
vbuild experiment list
vbuild lineage --format dot
vbuild report --out compliance.json
vbuild registry push

Self-update

  • vbuild update pulls from GitHub Releases and picks the correct asset.
  • If a .sha256 file exists, vbuild verifies the checksum before replacing the binary.
  • Rollback is automatic when verification fails.
  • On Windows, replacement is deferred to a helper script because a running binary cannot be overwritten.
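
The verify-then-replace step can be sketched generically as below. This is not vbuild's updater: the function names are hypothetical, the .sha256 file is assumed to hold the hex digest as its first whitespace-separated token, and instead of a rollback this sketch simply never touches the current binary when the check fails.

```python
import hashlib
import os


def verify_sha256(binary_path, checksum_path):
    """Compare the file digest with the first token of the .sha256 file."""
    with open(checksum_path) as fh:
        expected = fh.read().split()[0].lower()
    digest = hashlib.sha256()
    with open(binary_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


def replace_verified(current, downloaded, checksum_path):
    """Swap in the download only if it verifies; otherwise keep the current file."""
    if not verify_sha256(downloaded, checksum_path):
        os.remove(downloaded)  # discard the bad download, leave `current` as is
        return False
    os.replace(downloaded, current)  # atomic when both paths share a filesystem
    return True
```

On Windows a running executable cannot be replaced this way, which is consistent with the helper-script approach noted above.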

CI/CD integration

name: build
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.22.x'
      - run: curl -fsSL https://raw.githubusercontent.com/vietrix/vbuild/main/scripts/install.sh | sh
      - run: vbuild test
      - run: vbuild build

Release builds use -trimpath, -buildvcs=false, and version injection for reproducibility.

Troubleshooting

  • Run vbuild doctor if required binaries are missing.
  • Config errors show full paths to the offending field.
  • Use --json to emit logs that CI can parse and --json-summary for a machine-readable run summary.
  • Check .vbuild/registry and .vbuild/artifacts for persisted data.

License

vbuild uses a dual-license model: Apache-2.0 for open-source and non-commercial use, plus a separate commercial license for paid products, SaaS, CI/CD services, or redistribution. See LICENSE and LICENSE-COMMERCIAL.
