
Announcing Vector: Human-Level Performance in the Enterprise


Today, we are introducing Vector, a new class of AI agent deeply integrated into the core of Mantle. Vector is capable of autonomously planning, executing, and verifying complex, multi-step enterprise workflows with human-level performance. While general-purpose models excel at conversation and retrieval, Vector was purpose-built to perform real work - turning hours of manual operations into minutes of auditable, automated execution.

The development of Vector is the culmination of a two-year focused effort to build AI that delivers quantifiable and transformative results for the enterprise. This required not just novel training techniques, but the creation of an entirely new standard for evaluating what matters: real-world operational performance in business-critical contexts.

Measuring What Matters

The promise of AI in the enterprise has remained largely unfulfilled. While large language models can converse, summarize, and generate content, they consistently fail when tasked with executing the precise, stateful, and unforgiving workflows that define business operations.

When we evaluated leading commercial models against the complexities our customers face daily - recognizing complex accounting transactions, reconciling inventory, and billing multi-step projects - the results were disappointing. These models lack the contextual understanding and procedural reliability required for mission-critical work.

This gap exists because ERP is not a language problem; it is a systems problem. Success requires a deep, causal understanding of business logic, data dependencies, and operational constraints. Generic models, trained on the open internet, are prone to hallucinations and reasoning errors that are unacceptable when precision is required and mistakes are costly. Businesses cannot afford to build their core operations on a foundation of probabilistic guesswork. They need results with guarantees.

To measure this gap rigorously, we had to build the ruler ourselves.

EnterpriseBench: A New Standard for Enterprise AI

We developed EnterpriseBench, the first comprehensive benchmark designed to evaluate agentic AI in real-world ERP tasks.

Constructed over 12 months in collaboration with certified public accountants, ERP implementation consultants, and supply chain logistics experts, EnterpriseBench comprises thousands of unique, multi-step tasks spanning the full continuum of business operations: finance, CRM, project management, inventory and supply chain, HR, payroll, and tax compliance.

These are not simple Q&A problems. They are real-world scenarios derived from anonymized operational data, such as:

  • Executing a multi-currency month-end financial close across multiple entities
  • Processing a complete project billing cycle, from timesheet collection to invoice generation and revenue recognition
  • Reconciling physical inventory counts with system records and automatically generating adjustment journals
  • Fulfilling complex sales orders with components sourced from multiple warehouses

Each task requires the agent to plan a sequence of actions, execute them across multiple system modules, and verify the outcome against business rules and data integrity constraints. Crucially, every task includes a deterministic, machine-verifiable definition of success, ensuring objective and repeatable evaluation. An independent panel of subject matter experts validated a representative subset of the benchmark to certify its correctness and relevance to modern enterprise challenges.
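To make the idea of a deterministic, machine-verifiable definition of success concrete, here is a minimal sketch of how such a check could be expressed as predicates over the final system state. The schema and field names are illustrative assumptions, not EnterpriseBench's actual format:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: each benchmark task pairs a scenario with
# deterministic predicates over the final system state.
@dataclass
class TaskCheck:
    name: str
    predicate: Callable[[dict], bool]  # final state -> pass/fail

def verify(state: dict, checks: list[TaskCheck]) -> dict[str, bool]:
    """Run every check against the final state; all must pass."""
    return {c.name: c.predicate(state) for c in checks}

# Example: an inventory-reconciliation task passes only if counts match
# and an adjustment journal was posted.
checks = [
    TaskCheck("counts_match", lambda s: s["system_qty"] == s["physical_qty"]),
    TaskCheck("journal_posted", lambda s: s["adjustment_journal"] is not None),
]
state = {"system_qty": 120, "physical_qty": 120, "adjustment_journal": "ADJ-0042"}
results = verify(state, checks)
assert all(results.values())
```

Because the predicates are pure functions of the final state, the same task graded twice yields the same verdict, which is what makes the evaluation objective and repeatable.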

The benchmark was constructed to mirror the complexity of modern business, featuring dependencies, edge cases, and conditional logic that frequently cause general-purpose AI to fail.

Performance on EnterpriseBench

Vector consistently outperforms leading models across all major domains within EnterpriseBench, demonstrating a profound understanding of business logic and operational context.

Domain               Vector   GPT-5   Gemini 2.5 Pro   Human Baseline
Finance              82.1%    47.2%   32.0%            89.2%
Project Management   94.6%    71.8%   70.8%            97.9%
Supply Chain         70.5%    32.7%   29.2%            81.1%

Results represent pass rates on subsets of the EnterpriseBench benchmark — higher is better.

The performance gap is not linear; it widens dramatically as task complexity increases. On multi-module operations requiring five or more sequential steps, Vector achieves up to 2.4× better performance than the next best alternative. This is because Vector was built for the specific, structured world of an ERP, not for the boundless ambiguity of the open internet.

How Vector Was Built

Achieving this level of performance required a fundamentally different approach from training general-purpose language models.

Data Sourcing Strategy

Our data strategy combined three complementary approaches:

  • Synthetic Data Generation: We generated extensive synthetic data within the Mantle simulation environment, creating thousands of task variations across core workflows like invoice processing, purchase order management, and financial reporting. This provided broad coverage of standard operating procedures and common edge cases.
  • Real-World Interaction Logs: We collected anonymized interaction logs from our internal accounting and operations teams, as well as from consenting partners, as they performed their daily work in Mantle. These real-world traces captured the nuanced decision-making patterns and error recovery strategies that experienced users employ.
  • Expert Annotation: We engaged operations consultants and senior accountants to provide expert annotations on particularly complex or ambiguous scenarios, explicitly labeling correct action sequences and documenting the business reasoning behind critical decisions.
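As a rough illustration of the first approach, synthetic task variations can be produced by sampling over a parameter grid for a workflow template. The parameters below (currencies, payment terms, line counts) are invented for the example and are not Mantle's actual generation schema:

```python
import random

# Hypothetical sketch: generate synthetic invoice-processing task
# variations by sampling over a small parameter grid.
CURRENCIES = ["USD", "EUR", "GBP"]
TERMS = ["NET30", "NET60"]

def generate_tasks(n: int, seed: int = 0) -> list[dict]:
    rng = random.Random(seed)  # seeded for reproducible datasets
    tasks = []
    for i in range(n):
        lines = [
            {"sku": f"SKU-{rng.randint(1, 999):03d}",
             "qty": rng.randint(1, 20),
             "unit_price": round(rng.uniform(5, 500), 2)}
            for _ in range(rng.randint(1, 5))
        ]
        tasks.append({
            "task_id": f"INV-{i:05d}",
            "currency": rng.choice(CURRENCIES),
            "terms": rng.choice(TERMS),
            "lines": lines,
            # The expected total doubles as the verification target.
            "expected_total": round(sum(l["qty"] * l["unit_price"] for l in lines), 2),
        })
    return tasks

tasks = generate_tasks(1000)
assert len({t["task_id"] for t in tasks}) == 1000
```

Seeding the generator makes every dataset reproducible, and attaching the expected outcome to each task is what lets the training pipeline grade attempts automatically.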

A Three-Phase Training Regime

The training followed a deliberate, multi-stage process:

  1. Supervised Pretraining: We began with behavioral cloning on expert demonstrations, teaching the model to predict appropriate actions given the current system state. This foundation enabled the agent to achieve initial success on straightforward tasks.
  2. Online Reinforcement Learning: We deployed the agent in sandbox Mantle instances where it could attempt complete workflows with automatic verification of outcomes. Our reward function balanced task completion, efficiency, and data accuracy. Throughout this phase, a human-in-the-loop feedback system allowed ERP experts to evaluate agent performance, provide preference rankings between alternative solutions, and contribute intervention data.
  3. Adversarial Training: The final phase introduced corrupted data, system failures, and configuration variations to strengthen the agent's robustness and error recovery capabilities, ensuring it could perform reliably in real-world conditions.
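A reward that balances task completion, efficiency, and data accuracy, as in the reinforcement learning phase above, might be shaped roughly as follows. The weights and signal definitions here are illustrative assumptions, not Vector's actual reward function:

```python
# Hypothetical sketch of a shaped episode reward; the 0.6/0.1/0.3
# weights are invented for illustration.
def episode_reward(completed: bool, steps_taken: int, step_budget: int,
                   fields_correct: int, fields_total: int) -> float:
    completion = 1.0 if completed else 0.0
    efficiency = max(0.0, 1.0 - steps_taken / step_budget)  # fewer steps is better
    accuracy = fields_correct / fields_total if fields_total else 1.0
    return 0.6 * completion + 0.1 * efficiency + 0.3 * accuracy

# A completed workflow with perfect data that used its full step budget:
r = episode_reward(True, 10, 10, 25, 25)
assert abs(r - 0.9) < 1e-9
```

Weighting completion most heavily discourages the agent from gaming efficiency by abandoning tasks, while the accuracy term penalizes workflows that finish but write wrong data.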

The result is an agent that has achieved human parity on a substantial subset of routine ERP tasks. By focusing on multi-stage curriculum learning, we taught an AI system to navigate complex, multi-step business processes with the same reliability and contextual understanding as an experienced human operator.

Complete Business Orchestration

In practice, Vector's capabilities translate to a shift in how work gets done. Users can make requests in natural language, and Vector will formulate a plan, seek approval, and execute it across the entire system.

"Process end-of-month for our main warehouse"

One request triggers a cascade of coordinated actions because Vector understands how your business functions are interconnected. Billing a project touches project management, sales, and accounting; paying a supplier impacts treasury, purchasing, and cash forecasting.

Vector orchestrates these workflows in real-time with guaranteed consistency. Every operation is logged in a complete audit trail, and if any step fails, the entire process is automatically rolled back, preserving system integrity.
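The roll-back-on-failure pattern described above resembles a compensating-transaction (saga) design: each step registers an undo action, and any failure unwinds completed steps in reverse order while the audit trail records every event. This is a minimal sketch of that pattern, not Vector's implementation:

```python
# Hypothetical sketch: all-or-nothing workflow execution with an audit log.
audit_log: list[str] = []

def run_workflow(steps) -> bool:
    """steps: list of (name, action, compensate) tuples."""
    done = []
    for name, action, compensate in steps:
        try:
            action()
            audit_log.append(f"OK {name}")
            done.append((name, compensate))
        except Exception as exc:
            audit_log.append(f"FAIL {name}: {exc}")
            # Undo completed steps in reverse order.
            for done_name, comp in reversed(done):
                comp()
                audit_log.append(f"ROLLBACK {done_name}")
            return False
    return True

def fail_send():
    raise RuntimeError("SMTP down")

# A three-step workflow whose final step fails:
ledger = []
steps = [
    ("post_journal", lambda: ledger.append("J1"), lambda: ledger.remove("J1")),
    ("update_stock", lambda: ledger.append("S1"), lambda: ledger.remove("S1")),
    ("send_invoice", fail_send, lambda: None),
]
ok = run_workflow(steps)
assert not ok and ledger == []  # earlier steps were undone
```

Pairing every action with its compensation at registration time is what guarantees the system can always return to a consistent state, no matter where the workflow fails.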

Limitations, Accountability and Safety

Building AI systems that operate at the core of business operations demands a keen understanding of their limitations as well as rigorous accountability and safety measures. Vector was designed from the ground up with these principles embedded in its architecture.

Despite its capabilities, Vector is not a general artificial intelligence. Its expertise is confined to the operational logic and data within the Mantle ecosystem. Its performance on highly bespoke or novel business processes not represented in its training data may be degraded. We are actively working to expand its domain knowledge and improve its handling of unfamiliar edge cases.

Every action Vector recommends is auditable and interpretable at every step. All operations generate a complete execution trail, documenting the precise reasoning, data provenance, and decision points that led to each action. This approach ensures full transparency and enables operators to verify, challenge, or override any decision.

Most importantly, we designed Vector to require human approval for high-stakes operations, a critical safeguard we believe is essential for responsible deployment. While it can plan and coordinate complex workflows autonomously, final authority rests with human operators who can control what Vector can see, what it can do, and when it can act. This human-in-the-loop design reflects our commitment to supervised autonomy rather than unconstrained automation.
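One simple way to realize such an approval gate is to route operations by type: anything on a high-stakes list is queued for human sign-off instead of executing autonomously. The operation types and API below are invented for illustration:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a human-approval gate for high-stakes operations.
@dataclass
class ApprovalGate:
    high_stakes: set[str]                               # types requiring sign-off
    pending: list[dict] = field(default_factory=list)   # awaiting a human
    executed: list[dict] = field(default_factory=list)  # completed operations

    def submit(self, op: dict) -> str:
        if op["type"] in self.high_stakes:
            self.pending.append(op)
            return "awaiting_approval"
        self.executed.append(op)
        return "executed"

    def approve(self, op_id: str) -> None:
        op = next(o for o in self.pending if o["id"] == op_id)
        self.pending.remove(op)
        self.executed.append(op)

gate = ApprovalGate(high_stakes={"payment", "payroll_run"})
assert gate.submit({"id": "1", "type": "draft_report"}) == "executed"
assert gate.submit({"id": "2", "type": "payment"}) == "awaiting_approval"
gate.approve("2")
assert len(gate.executed) == 2
```

Keeping the high-stakes list as configuration, rather than hard-coding it, is what lets operators decide per deployment what Vector may do without sign-off.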

The International Chamber of Commerce (ICC) has documented Mantle's approach to safe human-AI teaming in its overarching narrative on artificial intelligence as a case study in deploying strong technical guardrails to ensure that AI systems remain accountable, interpretable, and under human control.

As AI capabilities advance, maintaining accountability becomes more critical, not less. Vector represents the latest iteration of an architecture for AI systems that are not only powerful but also transparent, auditable, and ultimately answerable to the humans who depend on them.

Availability

We have commenced Vector's phased rollout to our existing partners. Active Mantle teams will receive 1,000 credits to use with Vector, enabling them to experience firsthand how autonomous AI execution can transform their workflows. Broader access will be made available to new Mantle customers in the coming weeks.

Looking Onwards

We are still in the earliest stages of what AI will make possible in enterprise operations. The momentum we are seeing and the capabilities now coming into focus represent not an incremental improvement but a major departure from what came before.

Our contemporaries engage in elaborate efforts to market AI offerings that are, in practice, little more than thin wrappers around general-purpose models. Their focus remains on packaging and positioning - the fashionable thing - rather than the demanding, unglamorous work of engineering systems that perform reliably in high-stakes operational contexts.

We have chosen a different path, one that required patience and a certain indifference to the received wisdom. Building AI that functions at the operational core demands a calibrated fusion of custom model training, rigorous benchmarking, and a willingness to rebuild systems from first principles rather than retrofitting broken architectures with trendy capabilities. It is harder. It takes longer. The systems we deliver either work in production or they do not - and ultimately results are the only argument that endures.

Vector is not a hypothetical system that might work someday. It works now. It represents a shift from AI that discusses work to AI that executes it - and this is only the beginning.
