Overview
vbuild is a cross-platform CLI that turns YAML into deterministic workflows. It is intentionally thin: it delegates the actual work to your shell while it handles dependency resolution, scheduling, environment shaping, caching, and observability.
DAG execution
Tasks are resolved into a DAG and executed in parallel when safe, with fail-fast and retry policies.
Reproducible runs
Config validation, lock files, cache keys, snapshots, and version injection keep CI and experiments deterministic.
Research primitives
Datasets, experiments, lineage, metrics, model cards, benchmarks, and canary checks are first-class.
Quick start
- Create a .vbuild.yml in your repo.
- Run vbuild (default task) or vbuild <task>.
- Grow the DAG with dependencies, outputs, and cache.
workflow: "Starter"
tasks:
  default:
    deps: [fmt, test]
  fmt:
    run:
      - gofmt -w .
  test:
    run:
      - go test ./...
Install
The install scripts live in scripts/ on GitHub and download the right release asset.
Linux / macOS
curl -fsSL https://raw.githubusercontent.com/vietrix/vbuild/main/scripts/install.sh | sh
# Pin a version
curl -fsSL https://raw.githubusercontent.com/vietrix/vbuild/main/scripts/install.sh | VBUILD_VERSION=v0.1.2 sh
Windows (PowerShell)
iwr -useb https://raw.githubusercontent.com/vietrix/vbuild/main/scripts/install.ps1 | iex
# Pin a version
$env:VBUILD_VERSION = "v0.1.2"; iwr -useb https://raw.githubusercontent.com/vietrix/vbuild/main/scripts/install.ps1 | iex
CLI reference
- vbuild: Run the default task.
- vbuild list: List tasks and descriptions.
- vbuild graph --format dot: Show the task graph.
- vbuild --dry-run: Print commands without executing them.
- vbuild script -- --flag: Pass args to a task.
- vbuild --since 12h: Run tasks changed since a time.
- vbuild --until build: Run up to a target.
- vbuild --reverse cleanup: Run in reverse topological order.
- vbuild watch dev: Watch files and rerun.
- vbuild dataset list: Show the dataset registry.
- vbuild experiment list: Show the experiment registry.
- vbuild lineage --format json: Show the lineage graph.
- vbuild registry push: Sync the registry.
- vbuild report --out report.json: Generate a compliance report.
- vbuild update --to v0.1.2: Self-update the binary.

Run vbuild help for the full list of commands and flags.
Configuration
vbuild reads .vbuild.yml (or the file given with --file). The schema is validated with clear error messages and supports defaults, includes, aliases, datasets, and experiments.
workflow: "ML pipeline"
defaults:
  timeout: 45m
  shell: bash
  retries: 1
  max_retries: 3
  backoff: 5s
  jitter: 1s
include:
  - .vbuild.d/**/*.yml
vars:
  DATA_ROOT: data
  VERSION: v0.1.2
env:
  PYTHONUNBUFFERED: "1"
resources:
  cpu: 8
  memory: 32GB
  gpus: 2
  gpu_devices: ["0", "1"]
  groups:
    gpu: 1
datasets:
  raw:
    path: data/raw
    version: 2024-01-01
    tags: [source]
tasks:
  data:prep:
    run:
      - python scripts/prepare.py --out data/clean
  train:
    deps: [data:prep]
    resources: { gpu: 1, memory: 16GB, group: gpu }
    seed: 1337
    datasets:
      - name: raw
    output:
      MODEL_PATH: models/model.pt
    outputs:
      - models/model.pt
    run:
      - python train.py --data {{DATASET_RAW_PATH}} --out {{MODEL_PATH}}
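The defaults block sets retries, backoff, and jitter. As a rough illustration of what such a policy does, here is a minimal Python sketch. It is not vbuild's implementation: the function name run_with_retries is hypothetical, and a fixed backoff plus uniform jitter is an assumption (the tool might use exponential backoff instead). How retries and max_retries interact is not stated here, so the sketch uses a single retry budget.

```python
import random
import time


def run_with_retries(action, retries=1, backoff=5.0, jitter=1.0,
                     sleep=time.sleep, rng=random.random):
    """Call action() until it succeeds or the retry budget is spent.

    Returns the number of attempts made. When every attempt fails, the last
    exception is re-raised. `retries` counts retries after the first attempt.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            action()
            return attempt
        except Exception:
            if attempt > retries:
                raise
            # Assumed policy: fixed backoff plus uniform jitter in [0, jitter).
            sleep(backoff + rng() * jitter)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails on the first call, succeeds on the second.
        state["calls"] += 1
        if state["calls"] < 2:
            raise RuntimeError("transient failure")

    # Short delays so the demo finishes quickly.
    print(run_with_retries(flaky, retries=1, backoff=0.01, jitter=0.01))
```

Injecting sleep and rng keeps the policy testable without real waiting; the jitter term spreads out retries so parallel tasks do not all hit a shared resource at the same moment.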
Script tasks are a shorthand for a single command with CLI args.
tasks:
  train:
    script: python train.py {dir} {out}
Run: vbuild train --dir=data --out=dist
Flags like --key=value are exposed as {{KEY}} / {{VBUILD_ARG_KEY}} (uppercased, with - replaced by _).
Script tasks accept args automatically, so pass_args is not needed.
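The flag mapping described above can be sketched in a few lines of Python. This is an illustration only, not vbuild's code: parse_flags, flag_variables, and render_script are hypothetical names, and the sketch assumes that a {name} placeholder is matched against the flag key exactly as typed.

```python
import re


def parse_flags(argv):
    """Collect --key=value pairs; other arguments are ignored in this sketch."""
    flags = {}
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            key, value = arg[2:].split("=", 1)
            flags[key] = value
    return flags


def flag_variables(flags):
    """Expose each flag as KEY and VBUILD_ARG_KEY (uppercased, '-' -> '_')."""
    variables = {}
    for key, value in flags.items():
        name = key.upper().replace("-", "_")
        variables[name] = value
        variables["VBUILD_ARG_" + name] = value
    return variables


def render_script(script, flags):
    """Replace {name} placeholders; unknown placeholders are left untouched."""
    return re.sub(
        r"\{([A-Za-z_][\w-]*)\}",
        lambda match: flags.get(match.group(1), match.group(0)),
        script,
    )


if __name__ == "__main__":
    flags = parse_flags(["--dir=data", "--out=dist"])
    print(render_script("python train.py {dir} {out}", flags))
    # prints: python train.py data dist
    print(sorted(flag_variables(flags)))
```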
Schema reference
Global keys
- workflow: display name for the workflow.
- vars, env, env_file: variable and environment sources.
- defaults: default timeout/shell/workdir/retries/backoff/jitter.
- include: merge additional YAML configs (local, URL, glob).
- templates / tasks: reusable blocks and task definitions.
- resources: global CPU/memory/GPU pools and groups.
- datasets: dataset registry definitions.
- experiments: experiment defaults and metadata.
- registry: registry storage config.
- seed, seed_env, offline, snapshot.
- cache_remote, artifacts_upload, artifacts_dir.
- plugins, log_plugins, secrets, fail_fast, timeout.
Task keys
- run, pre, post, workdir, run_dir, shell.
- script: single-command tasks with pass-through args and {name} placeholders.
- pass_args: allow CLI args via {{ARGS}}, {{ARG_0}}, {{ARGC}}.
- deps, depends_on, parallel, fanout, matrix, sweep.
- when, only_on, confirm, allow_failure, continue_on_error.
- retries, max_retries, retry_on_exit_codes, retry_on_regex, retry_on_signal.
- inputs, outputs, output_paths, output, exports, cache, if_missing.
- capture, silent, secrets, tags, watch, artifacts.
- limits, resources, priority, group, remote, scheduler, isolate.
- datasets, dataset_outputs, split, validate, stats.
- metrics, canary, benchmark, experiment, checkpoint.
- model_card, notebook, export, sbom, sign, snapshot, offline.
- use / with to apply templates and parameters.
Variables & environment
- vars expand in commands as {{VAR}}. You can override them with VBUILD_VAR_NAME.
- env merges OS env → global env → task env → exports.
- exports propagate to downstream tasks.
- pass_args tasks expose {{ARGS}}, {{ARG_0}}, and {{ARGC}} from the CLI.
- --export-env writes the resolved environment to a file.
- --print-vars prints resolved variables for a task.
tasks:
  build:
    vars:
      OUT: bin/app
    env:
      GOOS: linux
    exports:
      APP_VERSION: "{{VERSION}}"
    run:
      - go build -o {{OUT}} ./cmd/app
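To make the layering and expansion rules concrete, here is a small Python sketch. It is a model of the behaviour described in the list, not vbuild's implementation. The function names are hypothetical, and reading VBUILD_VAR_NAME as "the prefix VBUILD_VAR_ followed by the variable's name" is an assumption.

```python
import re


def merge_env(os_env, global_env, task_env, exports):
    """Later layers win: OS env -> global env -> task env -> exports."""
    merged = {}
    for layer in (os_env, global_env, task_env, exports):
        merged.update(layer)
    return merged


def resolve_vars(config_vars, os_env, prefix="VBUILD_VAR_"):
    """Apply VBUILD_VAR_<NAME> overrides found in the process environment."""
    resolved = dict(config_vars)
    for key, value in os_env.items():
        if key.startswith(prefix) and len(key) > len(prefix):
            resolved[key[len(prefix):]] = value
    return resolved


def expand(command, variables):
    """Replace {{NAME}} with its value; unknown names are left as written."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda match: variables.get(match.group(1), match.group(0)),
        command,
    )


if __name__ == "__main__":
    os_env = {"GOOS": "darwin", "VBUILD_VAR_OUT": "dist/app"}
    variables = resolve_vars({"OUT": "bin/app", "VERSION": "v0.1.2"}, os_env)
    print(expand("go build -o {{OUT}} ./cmd/app", variables))
    # prints: go build -o dist/app ./cmd/app
    env = merge_env(os_env, {}, {"GOOS": "linux"}, {"APP_VERSION": "v0.1.2"})
    print(env["GOOS"])
    # prints: linux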
DAG scheduling
- Dependencies are resolved with cycle detection.
- Independent tasks run in parallel across the DAG.
- parallel: true runs commands inside a task concurrently with prefixed logs.
- fanout and matrix expand tasks into multiple DAG nodes.
- fail_fast, continue_on_error, and max-parallel control failure and concurrency.
- Use --until, --reverse, --since, and only-changed for partial runs.
tasks:
  lint:
    run: ["golangci-lint run ./..."]
  test:
    deps: [lint]
    matrix:
      GOOS: [linux, darwin]
      GOARCH: [amd64, arm64]
    env:
      GOOS: "{{GOOS}}"
      GOARCH: "{{GOARCH}}"
    run:
      - go test ./...
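The two ideas in this section, matrix expansion into DAG nodes and dependency resolution with cycle detection, can be sketched as follows. This is an illustrative Python model under stated assumptions, not vbuild's scheduler: the node naming scheme (test[GOOS=linux,GOARCH=amd64]) is invented for the example, and grouping tasks into levels is a simplification of a scheduler that may start each task as soon as its own dependencies finish.

```python
from itertools import product


def expand_matrix(name, matrix):
    """Create one node per combination of matrix values."""
    keys = list(matrix)
    nodes = {}
    for combo in product(*(matrix[key] for key in keys)):
        env = dict(zip(keys, combo))
        label = ",".join(f"{key}={value}" for key, value in env.items())
        nodes[f"{name}[{label}]"] = env
    return nodes


def schedule_levels(deps):
    """Group tasks into levels whose members can run in parallel.

    deps maps each task to the list of tasks it depends on. Raises ValueError
    for an undefined dependency or a dependency cycle.
    """
    for task, needs in deps.items():
        for dep in needs:
            if dep not in deps:
                raise ValueError(f"{task} depends on unknown task {dep}")
    remaining = {task: set(needs) for task, needs in deps.items()}
    levels = []
    while remaining:
        ready = sorted(task for task, needs in remaining.items() if not needs)
        if not ready:
            # Every remaining task still waits on another one: a cycle.
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        levels.append(ready)
        for task in ready:
            del remaining[task]
        for needs in remaining.values():
            needs.difference_update(ready)
    return levels


if __name__ == "__main__":
    nodes = expand_matrix("test", {"GOOS": ["linux", "darwin"], "GOARCH": ["amd64", "arm64"]})
    deps = {"lint": []}
    deps.update({node: ["lint"] for node in nodes})
    for level in schedule_levels(deps):
        print(level)
```

For the example config above, lint forms the first level and the four matrix nodes form the second, so all four test variants may run concurrently once lint succeeds.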
Cache & artifacts
- Use inputs/outputs and cache: mtime|sha256 for incremental builds.
- cache_remote supports S3/GCS/MinIO and profile-based auth.
- if_missing skips tasks if outputs already exist.
- artifacts collects outputs into .vbuild/artifacts and artifacts_upload pushes to GitHub or S3.
tasks:
  build:
    inputs:
      - cmd/app/**/*.go
    outputs:
      - bin/app
    cache: sha256
    artifacts:
      - bin/app
    run:
      - go build -o bin/app ./cmd/app
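As a rough picture of how a content-based cache such as cache: sha256 can decide whether a task needs to rerun, consider the Python sketch below. It is an assumption-laden illustration, not vbuild's cache format: cache_key and can_skip are hypothetical names, and which fields the real key covers (command text, environment, tool versions) is not stated in this document.

```python
import hashlib
from pathlib import Path


def cache_key(root, patterns, command):
    """Hash the command plus the relative path and bytes of every input file."""
    root = Path(root)
    files = sorted({p for pattern in patterns for p in root.glob(pattern) if p.is_file()})
    digest = hashlib.sha256(command.encode("utf-8"))
    for path in files:
        # Include the path so that renaming a file changes the key.
        digest.update(b"\0" + path.relative_to(root).as_posix().encode("utf-8") + b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def can_skip(root, patterns, outputs, command, stored_key):
    """Skip only when every output exists and the inputs hash to the stored key."""
    root = Path(root)
    if not all((root / out).exists() for out in outputs):
        return False
    return cache_key(root, patterns, command) == stored_key


if __name__ == "__main__":
    print(cache_key(".", ["cmd/app/**/*.go"], "go build -o bin/app ./cmd/app"))
```

Compared with an mtime check, hashing costs more I/O but is not fooled by a checkout or copy that touches timestamps without changing content.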
Data & research workflows
vbuild ships domain features for AI and scientific pipelines: dataset registries, experiment tracking, metrics, benchmarks, checkpoints, model cards, notebooks, and exports.
datasets:
  images:
    path: data/images
    version: 2024-01-01
    format: files
tasks:
  train:
    datasets:
      - name: images
    dataset_outputs:
      - name: embeddings
        path: data/embeddings
        version: v1
    split:
      input: data/images
      output: data/splits
      train: 0.8
      val: 0.1
      test: 0.1
    validate:
      paths: [data/images]
      min_files: 1000
      extensions: [".jpg", ".png"]
    metrics:
      regex: ["loss=(?P[0-9\\.]+)"]
      file: metrics.json
      format: json
    benchmark:
      iterations: 5
      warmup: 1
    checkpoint:
      paths: [checkpoints/*.pt]
      var: CHECKPOINT_PATH
    model_card:
      path: .vbuild/model_cards/train.md
    notebook:
      path: notebooks/report.ipynb
      output: notebooks/report.executed.ipynb
    export:
      path: dist/train-artifacts.zip
      format: zip
    run:
      - python train.py --data {{DATASET_IMAGES_PATH}} --ckpt {{CHECKPOINT_PATH}}
Use seed and seed_env for deterministic runs, and offline to disable network access for model hubs.
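To show why a fixed seed makes a train/val/test split reproducible, here is a minimal Python sketch. It is not how vbuild implements split: the function name split_files, the shuffle-then-cut strategy, and the rounding rule (remainder goes to test) are all assumptions made for the illustration.

```python
import random


def split_files(files, train=0.8, val=0.1, test=0.1, seed=1337):
    """Shuffle with a fixed seed, then cut into train/val/test by ratio."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1")
    ordered = sorted(files)                # stable starting order
    random.Random(seed).shuffle(ordered)   # same seed -> same permutation
    n_train = int(len(ordered) * train)
    n_val = int(len(ordered) * val)
    return {
        "train": ordered[:n_train],
        "val": ordered[n_train:n_train + n_val],
        "test": ordered[n_train + n_val:],  # remainder, absorbs rounding
    }


if __name__ == "__main__":
    names = [f"img_{i:04d}.jpg" for i in range(10)]
    parts = split_files(names, seed=1337)
    print({key: len(value) for key, value in parts.items()})
    # prints: {'train': 8, 'val': 1, 'test': 1}
```

Sorting before shuffling matters: directory listings are not guaranteed to come back in the same order on every machine, so the seed alone would not pin the result.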
Resources, scheduling, and remote
- resources manage CPU, memory, GPUs, and group-based quotas.
- scheduler supports Slurm/PBS wrappers for HPC queues.
- remote runs commands over SSH; remote.hosts fans out to multiple hosts.
tasks:
  gpu:train:
    resources: { gpu: 1, memory: 24GB, group: gpu }
    scheduler:
      type: slurm
      queue: gpu
      gpus: 1
      time: "02:00:00"
    remote:
      hosts: ["gpu01", "gpu02"]
      user: ml
    run:
      - python train.py
Lineage & compliance reports
Every run can register dataset inputs/outputs, experiments, and lineage edges. Use reports to ship provenance in CI or audits.
vbuild dataset list
vbuild experiment list
vbuild lineage --format dot
vbuild report --out compliance.json
vbuild registry push
Self-update
- vbuild update pulls from GitHub Releases and picks the correct asset.
- If a .sha256 file exists, it verifies the checksum before replacing the binary.
- Rollback is automatic when verification fails.
- On Windows, replacement is deferred via a helper script, because a running binary cannot be overwritten.
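The verify-before-replace step can be sketched in Python as below. This is a simplified model, not vbuild's updater: the function names are hypothetical, the .sha256 file is assumed to hold the hex digest as its first whitespace-separated token (the format sha256sum writes), and the sketch simply refuses to replace on mismatch rather than performing a rollback.

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file so large binaries are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(asset_path, checksum_path):
    """Compare the asset digest with the first token of the .sha256 file."""
    tokens = Path(checksum_path).read_text(encoding="utf-8").split()
    return bool(tokens) and tokens[0].lower() == sha256_of(asset_path)


def replace_if_verified(asset_path, checksum_path, target_path):
    """Swap in the new binary only after the checksum check passes."""
    if not checksum_matches(asset_path, checksum_path):
        return False  # the existing file at target_path is left untouched
    os.replace(asset_path, target_path)
    return True
```

os.replace swaps the file in a single step when source and target are on the same filesystem, which is why downloading next to the target is a common choice for updaters.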
CI/CD integration
name: build
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.22.x'
      - run: curl -fsSL https://raw.githubusercontent.com/vietrix/vbuild/main/scripts/install.sh | sh
      - run: vbuild test
      - run: vbuild build
Release builds use -trimpath, -buildvcs=false, and version injection for reproducibility.
Troubleshooting
- Run vbuild doctor if required binaries are missing.
- Config errors show the full path to the offending field.
- Use --json logs for CI parsing and --json-summary for machine output.
- Check .vbuild/registry and .vbuild/artifacts for persisted data.
License
vbuild uses a dual-license model: Apache-2.0 for open-source and non-commercial use, plus a separate commercial license for paid products, SaaS, CI/CD services, or redistribution. See LICENSE and LICENSE-COMMERCIAL.