Zenera Logo
Runtime

Transactional Execution Runtime

Sub-200ms code execution on data of any size — isolated, transactional, crash-safe, and Kubernetes-native from the ground up.

Transactional Execution Runtime overview

Why Code Execution Needs Its Own Runtime

Code generation and execution is the beating heart of the Zenera platform. Self-coding agents synthesize integration logic. Data pipelines transform enterprise datasets. Application generators produce full-stack code. Every one of these operations ends the same way: code runs on data, and the result must be correct, isolated, and durable.

Generic container orchestration is not enough. Running agent-generated code in a vanilla Kubernetes pod introduces problems that enterprise workloads cannot tolerate:

ProblemConsequence
Cold-start latencySpinning up a new pod takes 10–30 seconds. An agent waiting 30 seconds per tool call is useless for interactive workflows.
Data copyingDownloading gigabytes of enterprise data into each container wastes bandwidth, time, and storage — and creates ephemeral copies that are untracked and ungoverned.
No transactional safetyIf a pod crashes mid-execution, partially written files corrupt the dataset. There is no rollback. No branch isolation. No recovery.
No execution lineageWhen something goes wrong, there is no immutable record of what code ran, on what data, and what it produced.
Scale rigidityHomogeneous worker pools cannot serve the heterogeneous demands of enterprise AI: some tasks need GPUs, some need air-gapped networking, some need terabytes of memory.
"Agent-generated code is untrusted by definition. Running it without transactional isolation is the equivalent of giving every intern root access to the production database."

Architecture Overview

Everything runs inside the Kubernetes cluster. Temporal, Redis, LakeFS, object storage, and every worker pool are Kubernetes-native workloads. The entire runtime can be deployed in an air-gapped data center with a single Helm chart.

Transactional Execution Runtime architecture diagram showing Kubernetes cluster, control plane services, worker pools, and object storage

Three Layers

  1. 1Task Dispatch — Temporal workflows emit execution tasks as lightweight metadata pointers into priority-routed Redis queues.
  2. 2Warm Worker Pools — Pre-provisioned Kubernetes pods with runtimes already loaded, listening for work. Multiple pools with different hardware profiles (CPU, GPU, air-gapped, high-memory), each governed by its own NetworkPolicy.
  3. 3Virtual Data Layer — LakeFS branches mounted via FUSE sidecars inside each pod, providing instant access to datasets of any size without copying a single byte.

The Core Innovation: Data Injection, Not Data Fetching

Most execution platforms follow the same pattern: start a container, download the data, run the code, upload the results. For a 200 GB dataset, that means 200 GB of network transfer before a single line of code executes.

Zenera inverts the model entirely. Data is never copied. It is mapped.

When a task arrives, the runtime does not pull data into the worker. Instead, a high-performance FUSE driver exposes a LakeFS branch as a virtual filesystem inside the running pod. To the executing code, /data/project and /data/session appear as ordinary local directories with standard POSIX semantics.

  • Files are streamed on demand — Only the bytes that the code actually reads are fetched from object storage. A 200 GB dataset where the code reads 50 MB results in 50 MB of I/O — not 200 GB.
  • Writes go to the branch — Every file written by the code lands on an isolated LakeFS branch. The main dataset is untouched regardless of what the code does.
  • No local storage consumed — The worker’s filesystem is read-only. All mutable state lives on the versioned branch. If the pod dies, nothing is lost and nothing is corrupted.
"The code sees a local filesystem. The platform sees a versioned, transactional, streamable data layer. The two never conflict."

Task Anatomy: Metadata Pointers, Not Heavy Payloads

A task in the Zenera runtime is not a heavy object. It is a metadata pointer pushed to the Redis queue — typically under 1 KB.

Every task resolves to a set of lakefs:// URIs that define the execution context. The task envelope contains:

  • Branch URI — A lakefs:// reference that identifies the repository, the execution branch, and the paths to mount. The branch is the transactional boundary.
  • Resource profile — Routes the task to the appropriate worker pool (CPU, GPU, air-gapped, etc.).
  • Execution constraints — Timeout, network policy, and idempotency key.
  • Workflow reference — Link back to the Temporal workflow for completion callbacks and audit.

Because the task is a pointer — typically under 1 KB — queue throughput is bounded by Redis performance, not by payload size. Thousands of tasks per second is routine.

Worker Pool Architecture

Enterprise AI workloads are not uniform. Zenera operates multiple specialized worker pools, each configured for a class of workload.

Worker pool routing diagram showing task dispatch from Temporal workflows to specialized Redis queues
PoolHardwareNetworkWorkloads
General CPUMulti-core CPU, 16–64 GB RAMStandard egress rulesData transforms, API integrations, report generation
GPU-EnabledNVIDIA A100/H100, 80+ GB VRAMStandard egress rulesModel inference, embedding generation, image/video processing
High-Memory256–1024 GB RAMStandard egress rulesLarge dataset joins, in-memory analytics, graph processing
Air-GappedGeneral CPUNo internet accessRegulated data processing, classified workloads, PII handling
SandboxMinimal CPU, 4 GB RAMNo network at allUntrusted agent-generated code, integration testing

Each pool runs as a Kubernetes Deployment with a Horizontal Pod Autoscaler (HPA). When load drops, pools scale down to a configurable minimum — warm workers are always available for sub-200ms dispatch.

Transactional Execution Lifecycle

Every code execution follows a strict state machine that guarantees data consistency, crash safety, and complete auditability.

Transactional execution lifecycle state machine: IDLE → LINK → EXECUTE → SNAPSHOT → RESET, with FAILED branch

Task Arrival

IDLE → LINK

The worker listens on its pool's Redis queue via BLPOP. Upon receiving a task, it validates the idempotency key, resolves the LakeFS branch (zero-copy metadata operation), establishes the FUSE mount, and binds task-specific paths into the execution namespace.

Isolated Execution

LINK → EXECUTE

The worker spawns an isolated subprocess with restricted environment, cgroup-enforced resource limits, read-only root filesystem, and Kubernetes NetworkPolicy per pool. The code interacts with data through standard POSIX file operations — no knowledge of LakeFS or the transactional layer required.

Atomic Commit

EXECUTE → SNAPSHOT

On successful exit, the worker triggers a LakeFS commit — an atomic snapshot of all files written during execution. The commit includes task_id, workflow_id, execution duration, resource consumption, and a hash of the executed code for complete auditability.

Cleanup

SNAPSHOT → RESET

Paths are unmounted, FUSE cache is cleared, and a completion signal is sent to the originating Temporal workflow with commit reference, execution metrics, and output summary. The worker returns to IDLE.

"A pod crash mid-execution is a non-event. The main branch is untouched. The workflow retries. The data is intact. This is what transactional execution means."

Crash Safety and Recovery

In enterprise Kubernetes deployments, crashes are routine — node evictions, OOM kills, spot instance reclamation, rolling upgrades. The runtime is designed so that crashes are operationally invisible.

When a Worker Dies Mid-Execution

  • Data is safe — All writes were going to an isolated LakeFS branch. No commit was issued, so the main branch is pristine. The partial branch is preserved for debugging or garbage-collected.
  • Task is retried — The Temporal workflow detects the activity timeout and dispatches the task again. The idempotency_key ensures duplicate execution is idempotent.
  • No orphaned state — Because the worker’s filesystem is read-only and all mutable state lives on the LakeFS branch, there is nothing to clean up in the crashed pod.
  • No data pollution — Temporary branches from failed executions are tracked by the runtime’s garbage collector and deleted after a configurable TTL.

When the Entire Cluster Restarts

  • Temporal replays — The durable workflow engine replays workflow history from persistent storage. Every in-flight task is re-dispatched.
  • Workers warm up — The autoscaler provisions warm workers across all pools. Pool minimums ensure baseline capacity within seconds.
  • FUSE mounts re-establish — Mounting Sidecars lazily reconnect to LakeFS. No data migration required.
  • Execution resumes — Tasks picked up by warm workers. From the workflow’s perspective, it experienced a brief timeout — not a catastrophic failure.

Scaling Characteristics

More Workers, More Throughput

Horizontal

Each pool scales independently based on queue depth. Workers are warm (runtime loaded, FUSE driver initialized), so scale-up latency is just Kubernetes pod scheduling — 5–15 seconds for pre-pulled images.

Bigger Workers, Bigger Problems

Vertical

For workloads exceeding a single worker's capacity, specialized pool definitions provision workers with up to 1 TB of RAM or multi-GPU configurations. Pool definitions are standard Kubernetes resource specifications — declarative and version-controlled.

Size Is Irrelevant to Provisioning

Data Scale

Whether the dataset is 50 MB or 500 GB, the Link Phase takes the same < 200ms. No pre-staging. No capacity planning for data transfer. Multi-terabyte datasets are routine.

Security and Isolation Model

Every code execution passes through five independent isolation boundaries.

Five layers of isolation: Branch, Filesystem, Process, Network, and Audit isolation

Five Layers of Isolation

1

Branch Isolation

Code operates on an isolated LakeFS branch. Reads and writes are scoped. Cannot access other branches, sessions, or agents' data. Failures are free — discard the branch and retry.

2

Filesystem Isolation

Root filesystem is read-only. Writable paths limited to /data/session and /tmp. No persistent state on the worker.

3

Process Isolation

Code runs in a subprocess with restricted environment variables and cgroup-enforced CPU, memory, and time limits.

4

Network Isolation

Kubernetes NetworkPolicy per pool. Sandbox: no network. Air-gapped: no internet. General: scoped egress to whitelisted endpoints.

5

Audit Isolation

Every execution is immutably logged: code hash, data branch, inputs, outputs, duration, resource consumption. Complete chain of custody.

Network Isolation: Defense in Depth

PoolAllowed EgressRationale
GeneralLakeFS API, internal services, scoped external APIsStandard business integrations
GPULakeFS API, model registry, internal servicesModel inference does not require internet
Air-GappedLakeFS API onlyRegulated data must not leave the cluster
SandboxNothingUntrusted code gets zero network access
"This is not a simulation of transactional semantics layered on top of a filesystem. This is actual branch-merge versioning built into the storage layer — the same semantics that make Git safe for collaborative code development, applied to enterprise data at any scale."

Observability and Audit

Every execution produces a structured execution record that feeds into the platform's trajectory and observability systems.

  • Complete lineage — From the Temporal workflow that dispatched the task, through the code that ran, to the data that was produced. Every output file is traceable to the exact code and input data that generated it.
  • Performance profiling — CPU, memory, I/O, and duration metrics enable the Meta-Agent to optimize resource profiles and detect performance regressions.
  • Forensic debugging — The execution record plus the preserved LakeFS branch give developers a checkpointed, reproducible environment for exact failure reproduction.
  • Compliance evidence — For regulated industries, the immutable execution record combined with the versioned data branch provides auditable proof of what code operated on what data, when, and with what result.

Integration with the Temporal Workflow Engine

The Transactional Execution Runtime is a Temporal activity provider — every code execution is a durable, retriable activity within a larger workflow.

Example: “Enrich Customer Portfolio” Workflow

ActivityPoolCommit
Fetch raw data from ERPgeneral poolraw-data-v1
Run enrichment pipelinehigh-memory poolenriched-v1
Generate risk scoresgpu poolscored-v1
Produce compliance reportair-gapped poolreport-v1
Merge to productionLakeFS mergeatomic
  • Independently retriable — If Activity 3 fails, Temporal retries it without re-running Activities 1 and 2.
  • Versioned outputs — Each commit is an immutable checkpoint. If Activity 4 fails, outputs of Activities 1–3 are preserved and valid.
  • Different pools per activity — CPU-heavy work goes to general pools, GPU inference to GPU pools, regulated data to air-gapped pools.
  • Bounded by timeout — The task envelope’s timeout_ms sets a hard wall-clock limit. Temporal’s activity timeout wraps it with retry and escalation logic.
"Multi-step agent workflows — spanning hours, touching terabytes of data, crossing security boundaries — execute with the same transactional guarantees as a single database query."

Comparison: Zenera Runtime vs. Alternatives

DimensionDocker-in-DockerServerlessVM SandboxesZenera
Provisioning latency10–30s1–5s (cold)30–60s< 200ms
Data handlingCopy into containerCopy into functionCopy into VMVirtual mount, zero copy
Max data sizeContainer disk limitFunction memory limitVM disk limitUnlimited (streaming)
Crash recoveryLost stateRetry from scratchSnapshot restore (slow)Branch preserved, instant retry
Transactional isolationNoneNoneNoneLakeFS branch-merge
Execution lineageLogs onlyLogs onlyLogs onlyVersioned commits + records
GPU supportComplexLimitedManual configNative pool routing
Network isolationManual iptablesProvider-dependentManual firewallK8s NetworkPolicy per pool
ScalingManualAuto (with cold starts)ManualAuto (warm pools)

Design Principles

The worker is always warm

Cold starts are eliminated by pre-provisioning. The runtime never spins up a new container for a task — it dispatches to a running worker.

Data is mapped, never copied

The FUSE-based virtual mount makes dataset size irrelevant to provisioning time. A 200 GB dataset and a 200 KB dataset link in the same < 200ms.

Every execution is a transaction

Branch isolation ensures that code cannot corrupt shared data. Failures are free — discard the branch and retry.

Crashes are expected, not exceptional

The architecture assumes pods will die. Temporal retries. LakeFS preserves. The system converges to the correct state.

Isolation is defense in depth

Five independent layers — branch, filesystem, process, network, audit — ensure that no single failure in isolation can compromise the system.

Standard POSIX for generated code

Code uses open(), os.path, pandas.read_csv(). It does not need to know about LakeFS, S3, or the transactional layer.

Heterogeneity is native

Different workloads get different hardware, different network policies, and different trust levels — without changing the code running inside.

"The Transactional Execution Runtime is where Zenera's architectural promises become physical reality: self-coding agents generate code, the runtime executes it transactionally, on data of any size, with crash safety, security isolation, and sub-200ms provisioning — on Kubernetes, at enterprise scale."

Explore the Runtime

See how transactional execution, branch isolation, and warm worker pools power enterprise-grade code execution.

Request a Demo