Right-Sizing Your AI Team: A Composition Guide by Project Type

Apr 28, 2026

Most "how to build an AI team" advice lists roles in a vacuum. Here's what actually changes when you move from a proof of concept to production, and from an LLM project to a computer vision one.

We sell AI teams for a living, so take our framing accordingly. That said, we've staffed over 60 projects since 2018 across government and private clients, and the patterns in what works and what doesn't have become consistent enough to be worth sharing.

The biggest mistake we see companies make with AI staffing is treating it as a headcount problem. They ask "how many ML engineers do we need?" before asking what kind of project they're running, at what stage, and with what data. The answer to the headcount question depends entirely on the answers to those three.

Why project type matters more than team size

A two-person team can deliver a production LLM application. A six-person team can fail at a computer vision proof of concept. The difference is whether the team composition matches the project's technical demands.

LLM projects, computer vision projects, and voice/speech projects each pull on different skill distributions. An LLM integration might be 70% software engineering and 30% ML. A custom object detection model might be 30% software engineering and 70% ML. Staffing them identically guarantees at least one will underperform.

The same applies to project stage. A feasibility assessment needs a senior engineer who can evaluate fast and say no. A production deployment needs MLOps, monitoring, and integration work that a researcher will find tedious and do poorly. Matching skills to phase matters as much as matching skills to domain.

The composition matrix

Below is how we think about team composition across project stages and AI domains. The numbers represent full-time equivalents, not rigid headcount. A senior ML engineer covering 0.5 FTE of architecture work alongside 0.5 FTE of hands-on development is common in smaller teams.

Stage 1: Feasibility and prototyping (4-8 weeks)

The goal is to answer a question: can this work with the available data, within the constraints that matter? The output is a working prototype and a clear recommendation on whether to proceed.

LLM / RAG / agent projects: 1 senior ML engineer (prompt engineering, retrieval architecture, evaluation design), 1 software engineer (API integration, data pipeline, basic UI). The senior engineer should have production RAG experience specifically. The most common failure at this stage is assigning a generalist who builds a demo that looks good but collapses under real document volumes or edge cases.

Computer vision projects: 1 senior ML engineer (model selection, data assessment, annotation pipeline design), 1 ML engineer (training runs, baseline experiments, metric reporting). You need someone who can look at your data on day one and tell you whether the labeling quality and volume are sufficient. Software engineering is minimal at this stage because the question is whether the model can work, not whether the system can scale.

Voice / speech projects: 1 senior ML engineer (ASR/TTS pipeline assessment, language coverage evaluation), 1 software engineer (audio pipeline, real-time streaming architecture). Voice projects have an infrastructure dependency that LLM and CV projects don't: latency requirements are strict and domain-specific. If the project involves a non-English language, this stage needs to determine whether fine-tuning is required and what that implies for data collection.

Across all types at this stage: One person on the team needs to own the decision on whether to proceed. This is the most undervalued role in AI prototyping. Teams without a designated skeptic tend to produce prototypes that justify their own continuation regardless of results.

Stage 2: Development and validation (2-4 months)

The prototype worked. Now it needs to become reliable, handle edge cases, pass evaluation thresholds, and integrate with existing systems. This is where most AI projects either find their footing or enter what the industry has started calling "pilot purgatory."

LLM / RAG / agent projects: 1 team lead (architecture decisions, sprint planning, stakeholder communication), 2 ML engineers (evaluation pipeline, prompt optimization, retrieval tuning, guardrails), 1 software engineer (API development, system integration, frontend). The shift from Stage 1 is that evaluation becomes a first-class concern. Someone needs to own the eval framework, not as a side task but as their primary responsibility. Without this, the team optimizes for demo performance rather than production reliability.
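To make "owning the eval framework" concrete, here is a minimal sketch of what the starting point usually looks like: a versioned set of test cases and an automated pass/fail check that runs on every change. The answer_question stub, the test cases, and the pass threshold below are illustrative placeholders, not a prescription for any particular stack.

```python
# Minimal sketch of an evaluation harness for a RAG system.
# answer_question is a stub for the pipeline under test; the cases
# and threshold are invented for illustration.
from dataclasses import dataclass


@dataclass
class EvalCase:
    question: str
    required_facts: list[str]  # substrings the answer must contain


def answer_question(question: str) -> str:
    """Placeholder for the RAG pipeline being evaluated (hypothetical)."""
    return "The warranty period is 24 months from the delivery date."


CASES = [
    EvalCase("How long is the warranty?", ["24 months"]),
    EvalCase("When does the warranty start?", ["delivery date"]),
]


def run_eval(cases: list[EvalCase], pass_threshold: float = 0.9) -> bool:
    passed = 0
    for case in cases:
        answer = answer_question(case.question).lower()
        if all(fact.lower() in answer for fact in case.required_facts):
            passed += 1
    rate = passed / len(cases)
    print(f"pass rate: {rate:.0%} ({passed}/{len(cases)})")
    return rate >= pass_threshold


if __name__ == "__main__":
    run_eval(CASES)
```

In practice the scoring function grows from keyword matching into semantic or LLM-as-judge checks, but a versioned test set with an automated pass rate is the piece most teams skip, and it is exactly what the dedicated evaluation owner maintains.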

Computer vision projects: 1 team lead, 2 ML engineers (model training, hyperparameter optimization, data augmentation, edge case handling), 1 data engineer (annotation pipeline scaling, data versioning), 1 software engineer (inference pipeline, deployment prep). CV projects at this stage tend to be data-bottlenecked rather than compute-bottlenecked. The data engineer role is often missing from team plans, and teams compensate by having ML engineers do manual data work, which is expensive and slow.

Voice / speech projects: 1 team lead, 2 ML engineers (ASR fine-tuning, TTS pipeline development, language model integration), 1 software engineer (real-time audio infrastructure, WebSocket/streaming architecture), 1 data engineer (audio corpus management, transcription pipeline). Voice projects at Stage 2 often reveal that off-the-shelf ASR doesn't meet accuracy requirements for the target language or domain. If fine-tuning is required, the data engineering role becomes load-bearing. Our Medicall project (Lithuanian medical call transcription) required bespoke ASR using Whisper and Wav2Vec2 because no off-the-shelf model met the 85% accuracy threshold in a noisy call center environment.
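An "85% accuracy threshold" in ASR work is usually stated as a word error rate (WER) target measured on held-out recordings. As an illustration only (the transcripts below are invented, and treating accuracy as 1 − WER is one common convention rather than the only one), here is a minimal WER check:

```python
# Minimal sketch of the accuracy check behind an ASR threshold.
# WER via word-level edit distance; sample transcripts are invented.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words (substitutions, insertions, deletions).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "patient reports chest pain since monday morning"
    hypothesis = "patient report chest pain since monday"
    error = wer(reference, hypothesis)
    print(f"WER: {error:.0%}, word accuracy: {1 - error:.0%}")
```

The metric itself is trivial; the expensive part is the transcribed, domain-representative test corpus it runs against, which is why the data engineer role becomes load-bearing once fine-tuning is on the table.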

Stage 3: Production deployment and handover (2-4 months)

The system works. Now it needs to work reliably at scale, with monitoring, alerting, documentation, and a team on the client side who can maintain it.

All project types at this stage share a common shape: 1 team lead, 1-2 ML engineers (model monitoring, retraining pipelines, performance optimization), 1 MLOps / DevOps engineer (CI/CD, infrastructure, deployment automation), 1 software engineer (production hardening, integration testing, security review). The new role is MLOps. This is the hire that separates teams who ship AI into production from teams who ship AI into a demo environment and call it done.
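As a rough illustration of what "making the system observable" means, the sketch below shows the kind of threshold checks an MLOps engineer wires into alerting. The metric names, budgets, and print-based alerts are placeholder assumptions; in a real deployment these checks live in the monitoring stack, not in a standalone script.

```python
# Minimal sketch of production health checks for a deployed AI service.
# Thresholds and metrics are illustrative assumptions, not recommendations.
def check_latency(recent_latencies_ms: list[float], p95_budget_ms: float = 800.0) -> bool:
    """Return True if a crude p95 over the recent window is within budget."""
    ordered = sorted(recent_latencies_ms)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    return p95 <= p95_budget_ms


def check_answer_rate(answered: int, total: int, min_rate: float = 0.85) -> bool:
    """Return True if the share of requests producing a usable answer is acceptable."""
    return total > 0 and answered / total >= min_rate


if __name__ == "__main__":
    latencies = [420.0, 510.0, 640.0, 700.0, 1200.0]
    if not check_latency(latencies):
        print("ALERT: p95 latency over budget")
    if not check_answer_rate(answered=83, total=100):
        print("ALERT: answer rate below threshold")
```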

The handover component also requires dedicated time: admin manuals, runbooks, training sessions for the client's maintenance team. We typically allocate 2-4 weeks of explicit handover at the end of this stage, overlapping with the client's internal team. Skipping this creates a dependency that neither side wants.

The roles that actually matter (and the ones that are oversold)

After 60+ projects, our view on which roles carry the most weight:

High-impact roles that are frequently under-resourced: The evaluation engineer (in LLM projects, someone who owns eval metrics and test datasets as their primary job), the data engineer (in CV and voice projects, the person who keeps the annotation and data pipeline running), and the MLOps engineer (at production stage, the person who makes the system observable and maintainable).

Roles that matter but are oversold: "AI architect" as a standalone role rarely works in teams under 8 people. The architecture decisions should come from the team lead, who is also writing code and reviewing PRs. Separating architecture from implementation creates an accountability gap. Similarly, "AI product manager" is essential for companies building AI products, but for companies deploying AI to solve a specific business problem, the product decisions should sit with the client's domain expert working alongside the team lead.

How project complexity maps to team size

Simplified:

2 engineers can handle a feasibility assessment, a well-scoped LLM integration against a proven API, or a proof of concept in any domain.

4-5 engineers + team lead can take a validated concept from prototype to production-ready, including evaluation, edge case handling, and deployment.

6+ engineers + team lead is appropriate for enterprise-scale implementations that require ongoing operations, multiple system integrations, or multilingual/multi-domain coverage.

The most expensive mistake is staffing Stage 1 like Stage 3. The second most expensive is staffing Stage 3 like Stage 1.

The rent-vs-build decision (addressed honestly)

We're a team rental company, so our answer to this question is predictable. But here's our honest reasoning for when renting makes sense and when it doesn't.

Renting works when: the project has a defined scope and timeline (3-12 months), the company doesn't have internal ML expertise and hiring would take longer than the project itself, or the project requires a skill mix (say, CV + MLOps + data engineering) that's hard to assemble through individual hires.

Building internally works when: AI is core to the product (not a supporting function), the company has the runway to wait 6-12 months for hiring and onboarding, and the team will have continuous work beyond the initial project. If you're building a self-driving car, rent the first few sprints but build the long-term team. If you're deploying a document processing pipeline, renting through production is likely more cost-effective than permanent hires.

The hybrid model (renting a team to build and deploy, then transferring knowledge to a smaller internal maintenance team) is what we see most often among our enterprise clients.

What to look for in any AI team (rented or built)

Regardless of whether you staff internally or externally, four things predict project success more reliably than credentials:

Production experience in the relevant domain. A PhD in computer vision who has only published papers will struggle with the data pipeline and deployment challenges that define real projects. Ask for examples of systems running in production, not papers or conference talks.

Explicit evaluation practices. Any team that can't describe how they'll measure whether the AI system is working before they start building is a team that will build something hard to validate later. Evaluation methodology should be part of the project charter, not an afterthought.

Clear handover plan from day one. If the team (rented or internal) doesn't have a plan for how the system gets maintained after the initial build, you're building a dependency rather than a capability.

Willingness to say no. The best AI teams kill projects at the feasibility stage when the data or constraints don't support success. Teams that always find a way to continue past Stage 1 are optimizing for their own utilization.

AAI Labs provides dedicated AI engineering teams sized from 2-person feasibility squads to full cross-functional build teams. We work in 2-week sprints with full IP ownership for the client. See our team configurations or get in touch.