Coding Agents in Practice: How to Get Better Results
Sep 23, 2025
AI-powered coding assistants (“coding agents”) are becoming common in developers’ workflows – in fact, 92% of developers report using AI coding tools in some capacity [1]. These tools promise to speed up coding, and over 80% of developers expect AI to improve team collaboration and code quality [1].
However, simply turning an AI loose on your code doesn’t guarantee faster results – in one controlled study, experienced developers actually took 19% longer to complete tasks when using AI assistance naively [2]. The key to success is learning how to work with the AI effectively.
We learned a lot of lessons by starting a new playground project (a language-learning app) that is coded almost exclusively with LLM coding agents. The app hasn't been released yet, but we feel we can already share some of the guidelines to save others time.
This guide is a practical overview of lessons learned from many coding agent interactions, highlighting what tends to work well, what pitfalls to avoid, and how to refine your approach for better outcomes.
High-Level Observations
Small, well-defined tasks succeed quickly: Straightforward, well-scoped coding tasks – such as editing a document, making a small refactor, or migrating a single API hook – often succeeded with the AI in just one or two iterations. When the problem was clear and contained, the agent could produce a correct solution almost immediately.
Cross-cutting changes need multiple rounds: Larger refactors or features spanning multiple files and systems typically required several back-and-forth rounds. Integrations (e.g. wiring up a new session API or configuring a Docker environment) and broad changes (like reorganizing tests or adopting a new state management library) uncovered hidden context and ecosystem quirks. The AI would hit unforeseen constraints or missing information, necessitating iterative clarification and fixes.
Alignment matters more than code generation: The biggest drag on velocity wasn’t the AI’s coding speed – it was alignment. Many delays came from the agent misunderstanding requirements, lacking critical context, or running into environment mismatches. In other words, clarifying constraints and acceptance criteria (the “what” and “why” of the task) proved more important to success than the actual code output. Ambiguous prompts led to more wrong turns and retries than any inherent coding limitation of the AI.
Documentation and guidelines are critical: Having the right context and guidelines prepared for the project works wonders. A separate docs/ directory describing the project, its architecture, the domain model, and the dos and don'ts will greatly increase alignment between you and your agent. It's a big topic, so we've prepared a separate article about it.
What Worked Well
Documentation improvements: Using the agent to enhance docs (like API references or quickstart guides) yielded great results. When given clear structure and examples to follow, the AI produced significantly better organized and more consistent documentation in one pass. Terminology became more consistent and explanations clearer, indicating that well-scoped doc edits play to the AI’s strengths.
Focused, example-driven refactors: Small refactoring tasks with a crisp goal (e.g. “convert this component to use the useReducer hook” or “implement the Page Object Pattern in this test module”) went well once the AI had a concrete example of the desired outcome. Success rates improved when the developer shared a slice of existing code and a target “shape” or example for the new pattern. With that guidance, the agent could transform the code as needed, often with minimal tweaks afterward.
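For illustration, here is a minimal sketch of the kind of "target shape" we would share alongside such a request; the component, state, and action names below are hypothetical, not code from our app:

```tsx
// Illustrative "target shape" for a useReducer refactor (hypothetical SearchFilters component).
import { useReducer } from "react";

type State = { query: string; page: number };
type Action =
  | { type: "setQuery"; query: string }
  | { type: "nextPage" };

function reducer(state: State, action: Action): State {
  switch (action.type) {
    case "setQuery":
      return { ...state, query: action.query, page: 1 };
    case "nextPage":
      return { ...state, page: state.page + 1 };
  }
}

export function SearchFilters() {
  const [state, dispatch] = useReducer(reducer, { query: "", page: 1 });
  return (
    <input
      value={state.query}
      onChange={(e) => dispatch({ type: "setQuery", query: e.target.value })}
    />
  );
}
```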
Step-by-step guidance: Walking the AI through changes in discrete steps proved very productive. For instance, instructing the agent to first update an alert component’s structure, then in a follow-up prompt handle a styling change, yielded better results than asking for a large overhaul in one go. By narrowing each iteration to one change at a time, the user and agent could verify and course-correct continuously. Tasks like adding an auto-reload to an npm script or fixing a specific test failure were resolved quickly when tackled incrementally.
Explanations tied to code examples: The AI was especially helpful in teaching or enforcing concepts when prompted with both a question and a small code example. For example, asking about the “rules of React Hooks” or how to apply the Repository pattern in domain-driven design, and providing a short code snippet to work with, led to concise conceptual explanations followed by code illustrations. The combination of theory with self-contained code examples helped these insights “click” for the team.
Explicit best-practice checklists: When the AI was prompted with clear best-practice guidelines or checklists, it followed them diligently and helped enforce standards. For instance, a prompt that included a checklist for state management (“if state is simple, consider Context; if global and complex, consider Redux, etc.”) or a testing guideline (“use Given–When–Then structure for test descriptions”) resulted in outputs that the team could apply more reliably. By front-loading such rules, the agent’s suggestions aligned better with intended architecture and style.
What Didn’t Work (or Took Too Long)
Mixed or “multi-ask” prompts: Packing multiple distinct requests – e.g. asking the AI to weigh product decisions and design architecture and write implementation code in one query – consistently produced diffuse, unfocused answers. The agent would either fixate on one aspect or give shallow treatment to each, leading to rework. Breaking these concerns into separate, focused prompts was necessary to get useful results.
Unstated constraints and context gaps: A common failure pattern was not telling the AI about important constraints. Missing information like the framework version, build system, or existing code structure often led the agent down wrong paths (suggesting incompatible syntax or unsupported APIs). In several cases, forgetting to mention an environmental detail (like “we’re using Webpack, not Vite” or the specific Node.js version) caused the AI to produce solutions that didn’t fit. Remember, “AI can’t infer requirements you haven’t stated” [3] – it won’t magically know your environment or implicit expectations. Skipping this context meant spending extra cycles fixing compatibility issues.
“Big-bang” massive changes: Attempting large-scale refactors or broad feature implementations in one go was usually problematic. For example, asking the AI to migrate an entire app state from Redux to Zustand in one shot, or to reorganize an entire test suite all at once, often introduced regressions or confusion. Without a migration plan or intermediate checkpoints, the AI would make sweeping changes that were hard to validate and debug. The better approach was breaking such changes into smaller pieces behind feature flags or in parallel branches.
Vague problem statements: Providing prompts like “There’s a rendering error, please fix it” without any further detail led to generic or misguided advice. The agent might give boilerplate suggestions unrelated to the actual cause. Vague prompts wasted time – the AI would essentially guess, and often guess wrong. As a rule of thumb, specific problems get specific solutions, while vague questions get hallucinated answers [4]. Investing time to narrow down the issue (with error logs, steps to reproduce, etc.) was always worthwhile.
Upstream documentation gaps: When the code relied on third-party tools or APIs that were poorly documented, the AI had a hard time. In one scenario, integrating a new library with unclear docs forced the agent to guess how it worked – this led to a lot of trial and error. These cases resulted in backtracking and re-implementing once the correct usage became clear. It highlighted that if you don’t have certain specs or docs clarified, the AI definitely won’t either. In such cases, it was often better to find or create proper documentation (even manually) before asking the AI to proceed.
Most Common Frustrations
Even with many successes, developers encountered some recurring frustrations when using coding agents:
Iteration fatigue: It was often frustrating to go through numerous back-and-forth iterations for what felt like a straightforward issue. Each iteration consumed time and mental energy. This usually happened when initial prompts were unclear or when new hidden requirements kept emerging. Reducing iteration count became a key goal.
Misalignment with intent: Sometimes the AI’s solutions, while logical, missed the mark in terms of the user’s intent or the desired user experience. For example, the AI might propose a technically correct UI change that wasn’t in line with the product’s UX vision. These misalignments required the user to re-explain or adjust the ask, highlighting the importance of conveying the “why” behind a task, not just the “what.”
Overwhelm from AI-generated options: In some cases the assistant would propose multiple approaches or an overly complex solution, leaving the developer unsure which to choose. Ironically, having too many options from the AI caused analysis paralysis. One developer recounted that an AI helper sometimes made it “hard to choose the best [suggestion]” and even tempted them to consider unnecessarily complex solutions over simpler ones [5]. This reinforced the need to guide the AI toward the simplest acceptable solution.
Ambiguity and opaque errors: If the AI’s solution didn’t work, the error messages or failures could be hard to decipher. Sometimes the agent introduced a subtle bug that wasn’t immediately obvious, and the subsequent error output was ambiguous. Debugging those cases felt challenging, especially when the AI couldn’t directly observe the runtime error (for example, in a front-end context without the actual environment). This led to a feeling of helplessness at times, until the issue was sufficiently isolated for the AI (or a human) to reason about.
Mock data vs. real data drift: When working with test data or mocks, occasionally the structures would drift away from the real application’s data models. The AI might update code to match the mock data it saw, only for the team to later discover that production data had a different shape or additional fields. These mismatches caused downstream failures and eroded confidence in tests. It became clear that keeping test fixtures and real domain models in sync was crucial when using the AI – otherwise it was optimizing for the wrong target.
Patterns That Led to Success
From these experiences, several effective patterns emerged for getting good results with coding agents:
Narrow scope + clear acceptance criteria: Successful interactions were almost always the ones with a single, well-defined goal. For each AI query, define one outcome you’re trying to achieve (e.g. “Make this test pass” or “Implement caching for the X endpoint”). Also state explicit acceptance criteria or “done signals” – for example, “the build succeeds and all tests are green” or “the page should load in under 2 seconds.” When the scope stayed tight and success was measurable, the agent stayed on track and it was obvious when the goal was met.
Context-rich prompts: Providing the AI with ample relevant context upfront dramatically improved responses. This includes sharing your tech stack (framework, versions, OS, etc.), pertinent code snippets or file paths, and even prior attempts or error logs. The agent can only reason with what it’s told – giving it the right background prevents it from making wrong assumptions. A good prompt might say: “We’re using React 18 with Next.js 13.4; here is the Header.jsx file. We get a TypeError when clicking the login button (log excerpt included). Please help fix it.” This level of detail grounds the AI in reality. Developers who included version numbers, file names, and exact error text saw far more relevant solutions on the first try.
Plan first, code second: A major winning strategy was asking the AI to outline a plan or approach before writing any code. By having the agent explain its intended solution step-by-step (and even asking it to double-check its plan for gaps), many misunderstandings were caught early. In fact, one experienced user found that making the AI draft a plan and then critique it “eliminates 80% of the ‘AI got confused halfway through’ moments” [6]. This intermediate step ensures the AI’s understanding aligns with yours. Only after agreeing on the plan did the developer green-light the AI to generate code, resulting in far fewer surprises.
Minimal reproducible examples: When the AI was struggling or giving generic advice, turning the problem into a tiny, self-contained example was a breakthrough technique. For instance, instead of asking “How do I fix my app’s login bug?” you might present a 20-line snippet that reproduces the error in isolation. This is akin to the “minimal reproducible example” idea in bug reporting. These bite-sized contexts allowed the AI to laser-focus on the issue, often yielding a precise fix or explanation. It also avoided the agent getting distracted by irrelevant parts of the larger codebase.
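As an illustration, a minimal repro can be a single function plus the two calls that show the failure; the parsePrice example below is made up, but it shows the level of isolation that tended to work:

```typescript
// Hypothetical minimal reproduction, trimmed out of a larger component:
// prices with a comma decimal separator render as NaN.
function parsePrice(input: string): number {
  return Number(input); // the suspect line, isolated from everything else
}

console.log(parsePrice("12.99")); // 12.99 (works)
console.log(parsePrice("12,99")); // NaN (reproduces the reported display bug)
```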
Contracts and types as guardrails: Teams found that strongly typed interfaces and shared schemas greatly aided the AI in producing correct code. For example, defining a clear TypeScript type for an API response or using a schema validation library (like Zod) for data meant the AI had less room to hallucinate incorrect fields or assumptions. By working against a well-defined contract, the AI’s suggestions more often passed integration tests on the first try. These contracts served as a form of “communication” to the agent about what shape of data or interaction is expected (since the types/specs were included in the prompt context). This pattern reduced churn where the AI would otherwise introduce mismatches that had to be discovered and fixed later.
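A sketch of what such a contract might look like; the LessonSummary shape below is illustrative, not our app's real API:

```typescript
// Hypothetical response contract included in the prompt context so the agent
// knows exactly which fields exist and what they mean.
export interface LessonSummary {
  id: string;
  title: string;
  language: string;               // e.g. "es", "de"
  progress: number;               // 0..1, not a percentage
  lastPracticedAt: string | null; // ISO 8601 timestamp
}

export interface GetLessonsResponse {
  lessons: LessonSummary[];
  nextCursor: string | null;      // null when there are no more pages
}
```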
Incremental refactoring via seams: When using the AI for large refactors or new feature rollouts, the successful pattern was to introduce seams in the code – places where the old and new implementations could coexist behind an interface or flag. For example, if migrating from one library to another, you might first implement an adapter layer or toggles such that both versions can run side by side. Then, ask the AI to refactor one piece at a time using that seam. This modular approach meant the system continued to run, tests could be gradually adapted, and any AI mistakes were isolated. It also gave the human developers confidence to verify each step, rather than dealing with a giant “big bang” change.
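A minimal sketch of such a seam, with a hypothetical USE_NEW_STORE flag and simplified stand-ins for the old and new stores:

```typescript
// Old and new state implementations coexist behind one function; only the seam
// knows which one is live, so callers never change during the migration.
type SessionState = { userId: string | null; token: string | null };

const emptySession: SessionState = { userId: null, token: null };

// Simplified stand-ins for the real stores (hypothetical).
const legacyStore = { getSession: (): SessionState => emptySession };
const newStore = { getSession: (): SessionState => emptySession };

const USE_NEW_STORE = process.env.USE_NEW_STORE === "true";

export function getSession(): SessionState {
  return USE_NEW_STORE ? newStore.getSession() : legacyStore.getSession();
}
```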
Test-first for risky changes: A powerful technique was writing tests before asking the AI to implement something complex. If a bug needed fixing or a new feature was security-sensitive, the team would create or update a suite of tests capturing the expected behavior. Then the prompt to the AI became “Make these tests pass.” This approach did two things: (1) it provided extremely concrete acceptance criteria to the agent, and (2) it ensured that once the agent’s code made the tests pass, the solution was indeed verified. It turned the AI into a tool for satisfying test cases, which is a much clearer goal than “implement X correctly” in the abstract. As a result, first-try correctness improved, and regressions dropped.
Anti-Patterns to Avoid
Equally important, the team identified some anti-patterns – approaches that consistently led to poor outcomes with coding agents:
All-in-one mega-prompts: Avoid the temptation to have the AI “do everything in one go.” For example, a prompt that asks for a full design outline, coding of multiple features, and a deployment plan in one shot will almost certainly fail. Break concerns into separate prompts or iterative steps. The AI works best when it can focus on one clear objective at a time.
Dumping the whole codebase: Copy-pasting huge swaths of your code or providing excessive context can backfire. Not only can this hit token limits, but it also “destroys [the AI’s] attention” [7]. The model may get overwhelmed or focus on irrelevant details. Instead, include only the relevant snippets and refer to others by name if needed. The goal is to give just enough context for the task at hand – anything more is noise.
Mismatched context or silent assumptions: Do not give the AI stale or mismatched context, such as code that isn’t actually in use or references to libraries you haven’t installed. Similarly, don’t assume the AI knows about your project’s configuration or business domain without telling it. For instance, expecting the agent to magically know you’re using an older version of a framework (with different API calls) will lead to incorrect suggestions. Always make implicit context explicit in your prompt. As noted earlier, expecting the AI to “mind-read” your intentions or environment is a recipe for mistakes [3].
Skipping the verification step: If you don’t ask “how can we test or verify this change?”, you might apply an AI-generated patch blindly – only to discover later that it broke something. Skipping a clear verification plan is an anti-pattern. Always consider how you (and the AI) will confirm the solution works. This might be as formal as writing a test or as simple as running a command the AI suggests. Without this, you’re flying blind on the AI’s word.
Huge diffs without safety nets: Beware of letting the AI produce large diffs (lots of changes at once) when you have no tests or rollback plan. If the agent’s big suggested change doesn’t work, you’ll have a hard time untangling what went wrong. It’s far safer to make incremental changes with the ability to revert if needed. If you must apply a large change, do it on a branch and ensure you have backups. In short, treat AI changes like an experimental PR – don’t merge without review and testing.
With the above learnings in mind, let’s outline a practical playbook for working with coding agents effectively. This playbook covers how to set up a prompt for success, how to collaborate through iterations, and how to solidify changes with proper testing and documentation.
A Practical Playbook for Working with Coding Agents
Before Asking for AI Help
Define your goal clearly: Start by writing down what you want to achieve in one sentence. For example: “Enable auto-login using session tokens on the front-end” or “Reduce image load time by caching thumbnails.” A clear goal prevents the AI from meandering. It also helps you ensure you’re asking a well-scoped question.
Share environment and context: Provide the agent with key context about your project setup: operating system, programming language and framework versions, package manager or build tool, etc. Mention any relevant configuration (e.g. “This is a Next.js 13 app using TypeScript and Node 18”). Also give a quick outline of the file structure or the specific file names you’re dealing with. If the task is about modifying code, include the relevant code snippet or function in the prompt (enclosed in triple backticks for clarity). The agent will perform much better when it can see the actual code and environment it’s working with.
Include evidence of the issue: If you’re asking about a bug or failing test, include the exact error message, stack trace, or test output. For instance: “Test XYZ is failing with NullReferenceException on line 42 – here’s the stack trace…” or “When I run the app and click the button, nothing happens (no network request in the dev tools).” Concrete evidence focuses the AI’s attention on symptoms and greatly improves the relevance of its suggestions.
Outline acceptance criteria: Let the AI know how you will judge success. This could be functional criteria (“the page should render the user’s name after login”) or technical criteria (“all unit tests should pass and Lighthouse performance score should be >90”). If performance or security is a concern, state those targets too (e.g. “must handle 1000 requests/sec” or “should sanitize all inputs to prevent XSS”). Defining done-conditions ensures the agent’s solution can be verified objectively.
When Prompting the Agent
Make the agent draft a plan: Instead of diving straight into coding, ask the assistant to first outline a solution approach. Have it list the steps it intends to take or the key considerations before writing code. This not only checks the AI’s understanding, but also gives you a chance to correct any misconceptions. If something in the plan seems off, you can clarify or adjust the plan before any code is written. Many users report that this approach catches errors early – “write a plan first, let AI critique it” is cited as a game-changer for avoiding confusion [6].
Encourage clarifying questions: A well-configured agent should be allowed to ask questions back. Prompt it with something like, “Let me know if you need any clarification before proceeding.” In practice, telling the AI it can ask 2-3 questions often leads it to surface hidden assumptions or missing details. For example, it might ask, “Are you using React Router or Next.js routing?” or “Should the output be cached in memory or on disk?”. This is extremely useful – it shows the AI is aligning with your needs. Answer those questions, update your prompt with the new details, and then let it continue.
Focus on minimal changes: Instruct the AI to keep its solution as small and targeted as possible. You can say, “Provide only the diff for changes” or “Only modify the Login.js and AuthService.js files; leave others unchanged.” By constraining the scope of changes, you’ll have an easier time reviewing the AI’s output and integrating it. Small patches also reduce the chance of introducing new bugs. If the AI’s initial answer is too large or touches unrelated areas, don’t hesitate to reiterate that it should limit the changes and perhaps proceed one step at a time.
Ask for a validation plan: After the agent provides a solution, you can follow up with, “How can we verify this works?” or “Provide steps to test this change.” A good agent should then output something like: “Run npm run build and ensure no errors. Then start the dev server and try logging in; you should see a welcome message if successful.” This is incredibly helpful because it not only confirms the agent’s understanding of the acceptance criteria, but also gives you a to-do list to confirm the fix. In some cases, the AI might even suggest relevant unit tests or edge cases to double-check.
Discuss risks and rollbacks: For significant changes, ask the AI if there are any risks, side effects, or performance concerns to be aware of. Similarly, ask how to roll back if something goes wrong (e.g. “If we deploy this and it causes errors, what is the mitigation?”). This step forces the AI to think beyond the happy path. It might reveal, for example, “This change could break compatibility with older clients” or “We should monitor memory usage after this update.” It might also suggest feature flags or configuration toggles to disable the new feature if needed. Considering these factors upfront makes your deployment safer.
Refactoring and Integration Strategies
Introduce seams for large changes: When doing a refactor or integrating a new system, don’t remove the old implementation all at once. Instead, create an interface or adapter that both the old and new code implement, or use a feature flag to switch between behaviors. By doing so, you isolate the AI-driven changes behind a “seam.” For example, if you’re swapping out an auth library, first abstract the auth calls behind your own functions (the seam), then have the AI implement those functions for the new library. This way, if the AI’s code has issues, you can toggle back to the old path easily. It also makes debugging easier since you know any problems lie within that seam’s scope.
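A rough sketch of that kind of seam for an auth swap; the AuthProvider interface and both implementations are hypothetical:

```typescript
// The app codes against this interface, never against a vendor SDK directly.
export interface AuthProvider {
  signIn(credentials: { email: string; password: string }): Promise<{ token: string }>;
  signOut(): Promise<void>;
}

// Old implementation stays in place while the new one is filled in piece by piece.
export class LegacyAuthProvider implements AuthProvider {
  async signIn(credentials: { email: string; password: string }) {
    // ...calls into the current auth library would go here...
    return { token: `legacy-token-for-${credentials.email}` };
  }
  async signOut() {
    // ...current library sign-out...
  }
}

// The agent is asked to implement only this class, against the same interface.
export class NewAuthProvider implements AuthProvider {
  async signIn(credentials: { email: string; password: string }) {
    // ...calls into the new auth library would go here...
    return { token: `new-token-for-${credentials.email}` };
  }
  async signOut() {
    // ...new library sign-out...
  }
}

// A flag picks the live implementation; flipping it toggles back to the old path.
export const auth: AuthProvider =
  process.env.USE_NEW_AUTH === "true" ? new NewAuthProvider() : new LegacyAuthProvider();
```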
Migrate incrementally: Break refactoring tasks into bite-sized pieces. If you have to update multiple modules, do them one at a time with the agent, rather than all together. After each small refactor, run tests and ensure everything still works before moving on. This incremental approach applies to integration tasks too – for instance, when introducing a new API, have the AI integrate one endpoint and get that working before moving to the next. Incremental changes not only reduce risk, but also give the AI a clearer target each round.
Use explicit contracts and schemas: When integrating systems (or even between components in your app), define explicit contracts. This could mean using TypeScript interfaces for data objects, JSON schemas for API payloads, or library-specific contract tests. By having a single source of truth for data structures, you can share that with the AI so it knows exactly what shape of data to expect. In our experience, using a schema validation library like Zod to both generate fake test data and validate real data helped the AI align mock data with the actual domain model. It prevented the common “works with mock, fails with real data” problem by ensuring consistency. In essence, tell the AI what the data looks like – don’t let it guess.
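A sketch of the approach, assuming Zod and a hypothetical Lesson schema; the same schema gates both the hand-written fixture and the real response:

```typescript
import { z } from "zod";

// Single source of truth for the shape of a lesson (hypothetical domain object).
export const LessonSchema = z.object({
  id: z.string(),
  title: z.string(),
  progress: z.number().min(0).max(1),
  lastPracticedAt: z.string().datetime().nullable(),
});
export type Lesson = z.infer<typeof LessonSchema>;

// Test fixtures are parsed through the same schema, so mock data cannot
// silently drift away from what production responses must satisfy.
export const lessonFixture: Lesson = LessonSchema.parse({
  id: "lesson-1",
  title: "Greetings",
  progress: 0.25,
  lastPracticedAt: null,
});

// Real responses are validated with the same schema at the boundary.
export async function fetchLesson(id: string): Promise<Lesson> {
  const res = await fetch(`/api/lessons/${id}`);
  return LessonSchema.parse(await res.json());
}
```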
Add contract tests for integrations: For any external integration (a third-party API, a microservice, a database query, etc.), write a couple of tests that use either real calls (if possible) or simulated responses to assert that your code meets the contract. For example, if you integrate a payment API, a contract test might use a known test card number and expect a specific response. These tests act as guardrails. When the AI writes code to call the external service, you run the tests to immediately see if something’s off. If the tests fail, you now have concrete feedback to give the AI (e.g. “the test expecting error code XYZ is failing”), which it can use to adjust the implementation.
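As a sketch (assuming a Vitest-style runner and Zod; the payment response shape is purely illustrative), a contract test can be as small as parsing a recorded sandbox response against the agreed schema:

```typescript
import { describe, it, expect } from "vitest";
import { z } from "zod";

// Agreed response shape for the external payments API (illustrative).
const PaymentResultSchema = z.object({
  status: z.enum(["succeeded", "declined"]),
  errorCode: z.string().nullable(),
});

// In a real suite this would come from a sandbox call or a recorded fixture.
const recordedResponse = { status: "declined", errorCode: "card_declined" };

describe("payments API contract", () => {
  it("matches the agreed response shape for a declined test card", () => {
    // Fails loudly if either our schema or the provider's response drifts.
    expect(() => PaymentResultSchema.parse(recordedResponse)).not.toThrow();
  });
});
```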
Keep a changelog of decisions: As you make design decisions or trade-offs during an AI-assisted refactor, keep notes (even informally in a markdown file or the PR description). Note things like “Decided to use library X for Y reason” or “Dropped support for Z feature in the new implementation.” Summarizing these after each AI iteration is useful for a few reasons: it solidifies in your mind what was done, it informs colleagues (and future you) about the context of changes, and it can even be fed back into the AI in later prompts (“As noted in our changelog, we opted not to cache results due to memory constraints.”). Good documentation of decisions leads to better continuity across AI interactions.
Testing and Quality Assurance
Adopt a test-first mindset: When adding new features or fixing bugs with the help of an AI, try writing the test before the code. For example, if a bug is reported, create a unit test or integration test that fails because of that bug. Show that to the AI and say “fix the code to make this test pass.” This mirrors Test-Driven Development (TDD) with the AI as the implementer. It gives the agent an exact definition of done. We found this especially useful for tricky logic or off-by-one errors – the AI might not intuitively handle every edge case on its own, but if the test spells it out, the AI will address it. Plus, once the test passes, you have immediate confirmation that the issue is resolved.
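For example, a failing test like the sketch below (Vitest-style, with made-up streak logic standing in for the real module) gives the agent an exact, checkable definition of done:

```typescript
import { describe, it, expect } from "vitest";

// Hypothetical streak logic with the reported bug left in on purpose:
// the test below fails today and becomes the agent's definition of done.
type Streak = { days: number; lastCompleted: Date };

function updateStreak(previous: Streak, completedAt: Date): Streak {
  // Buggy current behavior: any gap over 24 hours resets the streak,
  // even when the lesson was completed on the next calendar day.
  const gapHours = (completedAt.getTime() - previous.lastCompleted.getTime()) / 3_600_000;
  return gapHours <= 24
    ? { days: previous.days + 1, lastCompleted: completedAt }
    : { days: 1, lastCompleted: completedAt };
}

describe("updateStreak", () => {
  it("does not reset the streak when a lesson is completed the next calendar day", () => {
    const previous: Streak = { days: 6, lastCompleted: new Date("2025-09-22T08:00:00Z") };
    const next = updateStreak(previous, new Date("2025-09-23T20:00:00Z"));
    expect(next.days).toBe(7); // fails with the buggy implementation above
  });
});
```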
Stabilize flaky tests and environment first: If your test suite has known flaky tests or your dev environment is finicky (maybe some tests rely on network and sometimes fail), sort those out before involving the AI in changes. Otherwise, you might waste time chasing “failures” that are unrelated to the AI’s code. You can ask the AI to help stabilize tests: e.g. “This test fails sometimes due to a race condition – rewrite it using proper async/await or fixed delays.” Tackling flakiness (like using deterministic random seeds, or injecting testable timeouts) will save you and the AI a lot of grief in the long run.
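One pattern that helped: replace fixed sleeps with a small polling helper so the test waits for the actual condition instead of racing it. A sketch, with the helper written inline rather than taken from any framework:

```typescript
// Poll for a condition with a bounded timeout instead of sleeping a fixed 500 ms.
async function waitFor(condition: () => boolean, timeoutMs = 2000, intervalMs = 25): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() > deadline) throw new Error("Timed out waiting for condition");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical usage inside a test body:
async function exampleTestBody() {
  let ready = false;
  setTimeout(() => { ready = true; }, Math.random() * 300); // nondeterministic async work
  await waitFor(() => ready); // deterministic outcome, no arbitrary sleep
}
```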
Use high-level test patterns for clarity: Encourage or refactor tests to use patterns like the Page Object Model (for UI testing) or the Given–When–Then format for describing scenarios. Why? Because these patterns encapsulate intent in a readable way, which the AI can more easily follow and maintain. For instance, if the AI sees a Given–When–Then structured test, it knows exactly what the expected behavior narrative is, and it’s less likely to introduce unrelated steps. These patterns also make it easier for a developer to verify that the AI’s changes in tests are logically sound and not just cargo-culting some fix.
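A sketch of what that reads like in practice (Vitest-style, with made-up review-queue logic as the subject under test):

```typescript
import { describe, it, expect } from "vitest";

// Made-up subject under test: which flashcards are due for review.
function dueCards(cards: { dueAt: Date }[], now: Date) {
  return cards.filter((c) => c.dueAt.getTime() <= now.getTime());
}

describe("review queue", () => {
  it("given two cards with one overdue, when the queue is built, then only the overdue card is included", () => {
    const now = new Date("2025-09-23T10:00:00Z");
    const cards = [
      { dueAt: new Date("2025-09-22T10:00:00Z") }, // overdue
      { dueAt: new Date("2025-09-25T10:00:00Z") }, // not due yet
    ];
    expect(dueCards(cards, now)).toHaveLength(1);
  });
});
```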
Add smoke tests around boundaries: When the AI helps integrate something at the boundaries of your system (a new external API call, a major library upgrade, etc.), add a simple smoke test. A smoke test might be as basic as “call the new API client’s health-check method and assert we get a 200 response.” Or “spin up the app with the new config and hit the ping endpoint.” These are not full verifications, but they catch obvious misconfigurations or errors early. They serve as a quick sanity check that the integration is at least wired correctly. If a smoke test fails, you can immediately loop back with the AI on that specific failure.
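A smoke test can be a handful of lines; in this Vitest-style sketch the /api/health path and local base URL are assumptions for illustration:

```typescript
import { describe, it, expect } from "vitest";

describe("smoke: new API client wiring", () => {
  it("reaches the health endpoint and gets a 200", async () => {
    // Not a full verification, just proof that the integration is wired correctly.
    const res = await fetch("http://localhost:3000/api/health");
    expect(res.status).toBe(200);
  });
});
```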
Use the agent to improve tests: You can also leverage the coding agent to improve your test coverage or quality. For example, after fixing a bug, ask the AI: “Generate additional test cases for this module, especially around edge cases.” It might come up with cases you didn’t think of. (Of course, review them for relevance.) Over time, having a more robust test suite in place makes future AI-assisted changes safer and easier to validate – it’s a virtuous cycle.
Documentation and Communication
Write example-first documentation: When updating documentation (be it README, API docs, or code comments) with the help of the AI, instruct it to provide examples. Examples are the quickest way to communicate usage. For instance, if documenting a new API endpoint, have the agent produce a sample request and response JSON. Or if writing a quickstart guide, the first thing should be a “Hello World” type minimal code snippet. We found that agents do well at generating these because examples are concrete, and it prevents the documentation from staying too abstract. Make sure the examples are actually runnable or realistic.
Keep quickstart guides minimal: One thing that consistently worked was keeping the “Getting Started” instructions lean. Aim for the shortest path by which a new user (or developer) can see the feature working. If the AI is helping write a guide, prompt it to assume the reader knows nothing and just wants to get something running. Remove extraneous steps or optional configuration from the quickstart – those can go in advanced sections later. By focusing on the core happy-path, the AI-written docs became much easier to follow, and users could actually get things working without confusion.
Align documentation changes with code changes: Whenever the AI introduces a new feature or changes a behavior, update the documentation in the same iteration. This might mean asking in your prompt, “Also update the README to reflect this new environment variable,” or “Add JSDoc comments for the modified functions.” Keeping docs in sync avoids the scenario where code evolves but docs lag behind (which would set up future AI interactions or developers for confusion). In cases of bigger changes, consider adding a small “change log” or an “Upgrade Guide” for your project. The AI can help generate a table of changes, which is useful for communicating to users or other team members what to expect.
Summarize after each iteration: This is more of a communication practice, but it proved valuable. After an AI completes a task (especially if it took a couple of rounds), write a short summary of what was done, any open questions, and what the next step is. You can even ask the AI to produce this summary! For example: “Summarize the changes you made and any assumptions for the next steps.” This creates a clear handoff at each stage. It’s helpful if multiple developers are collaborating, and it’s also something you can put in commit messages or PR descriptions. Moreover, if you re-prompt the AI later, you can include that summary as part of the context to remind it what has been accomplished so far.
Debugging Ambiguous Issues: A Triage Ladder
When neither you nor the AI are immediately sure what’s causing a bug, use a structured approach to break down the problem. We can think of it as a ladder of steps:
Reproduce the issue reliably: First, ensure there’s a clear way to trigger the bug. If it’s not already reproducible on demand, create a minimal scenario or a test that consistently shows the failure. This might involve simplifying inputs or isolating the component with the problem. Once you have that, you can show it to the AI (in code or description) which makes the situation concrete.
Localize the source: Try to narrow down where in the code the issue lies. Is it in the frontend component, or the backend API, or the database schema? If you have a stack trace, identify the likely file or function. You can then focus the AI on that area specifically (“The error seems to originate in PaymentService.processOrder method…”). This saves the agent from searching your entire codebase.
Inspect all clues: Gather and present all evidence around the bug. This includes log outputs, browser console errors, network request details, etc. If something looks suspicious in the environment (like a version mismatch or a config flag), mention that too. Often, the bug isn’t in code logic but in these external factors. By providing these to the AI, you’re essentially doing the “detective work” together. It might spot a pattern (e.g., “the log shows a UTF-8 encoding error whenever you call endpoint X”) that leads to the solution.
Constrain and isolate variables: If the cause is still unclear, start eliminating possibilities. You might disable certain features or swap components to see if the problem persists. For instance, test with a different browser, or replace the real API with a mock response, or try an older commit of the code to see when the issue was introduced. This process can often pinpoint the trigger. You can then ask the AI specifically about that narrower scenario. For example: “When I use the mock data, the error disappears, so it’s likely something about the real API response formatting – given this info, what could be wrong in our parsing logic?”
Fix in small steps and verify: Once you and the AI have a hypothesis of what’s wrong, apply a small fix and then test immediately. If you have an idea (“I think the date format is causing the crash”), implement or ask the AI for a targeted fix (“convert the date strings using format XYZ in that function”). Then run the app or test to see if it’s resolved. If the issue changes (maybe you get a different error now), that’s progress – it means you cleared the first hurdle and can move to the next symptom. Iterate like this, one fix at a time, rather than having the AI overhaul a bunch of things hoping one will work. This careful approach ensures you’re zeroing in on the true root cause.
By following this ladder, ambiguous problems become more manageable. The AI is good at helping with each step (e.g. writing a test, suggesting what evidence to gather, proposing isolation tactics, etc.), but you need to guide it through the process methodically.
Tooling Practices that Boosted Success
Leverage types and schemas: We touched on this in the context of integrations, but it’s generally useful to lean on strong typing. If you’re in a language like TypeScript, keep your type definitions up-to-date and comprehensive. If not, use things like JSON schemas or explicit validation checks. Not only do these catch human mistakes, they also guide the AI’s responses. When the AI sees a type definition, it adheres to it. For instance, if User objects have a lastLogin: Date field in the type, the AI is less likely to invent a lastLoginTimestamp field by mistake. Essentially, types/schema act as an additional communication channel to the agent about what’s valid. They encode your business logic and constraints in a format the AI can consume [8].
Streamline the dev environment: Make sure your development scripts and environment are easy to use – for both you and the AI. If running the project or tests is a convoluted multi-step process, simplify it. For example, add an npm run start:dev that does all necessary setup, or a single command to run the full test suite. Why does this matter for the AI? Because you can then instruct the AI with those commands (and it can even read your package.json scripts). If an AI knows exactly how to run your app or tests, it can better assist in debugging (“run npm run test:unit and see if any tests fail”) and it will tailor its instructions to your tools. We found that after we added a few convenient npm scripts (like npm run ci:lint), the AI started suggesting using those scripts rather than generic suggestions – which saved time. Also, using standard tools (like nodemon for reload, concurrently for parallel tasks) in a clear way means the AI is more likely to be familiar with them and recommend best practices around them.
Automate code quality checks: Use linters (ESLint, Pylint, etc.) and formatters (Prettier, Black) in your project, and ideally integrate them in your Git hooks or CI pipeline. This has two benefits. First, it keeps the code style consistent no matter who (or what) writes it. Second, if the AI introduces something that violates these rules, you’ll catch it early and can feed that back into the prompt. For example, if the AI gives a snippet that your linter flags (unused variable, or a discouraged API), you can tell the AI “please fix the code to pass ESLint rules, specifically no unused vars”. It will understand and adjust. We noticed that once we included linter output in an iteration, the AI quickly learned to avoid those mistakes in subsequent suggestions. Essentially, these tools provide an external feedback mechanism that the AI can learn from if you incorporate their reports.
Use typed API clients & middleware: For any external API or system that requires repetitive boilerplate (like attaching auth tokens, handling errors uniformly), it’s worth creating a thin client library or using an existing SDK. For instance, instead of scattering raw HTTP calls across the codebase, have an ApiClient class that the AI can use. This not only reduces duplication, but also means when you prompt the AI, you can say “use ApiClient.getUser() instead of raw fetch.” The agent will follow suit and produce cleaner code aligned with your architecture. Similarly, implement interceptors or middleware for cross-cutting concerns (logging, retries, auth) so the AI doesn’t have to implement those from scratch every time. This approach played nicely with the AI – once it was shown the pattern, it used it consistently, which improved the uniformity and reliability of the code it generated.
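A rough sketch of such a thin client; the class shape and the getUser wrapper are illustrative rather than a prescribed API:

```typescript
// One place for the base URL, auth header, and error handling, so neither the
// agent nor humans hand-roll raw fetch calls across the codebase.
export class ApiClient {
  constructor(private baseUrl: string, private getToken: () => string | null) {}

  private async request<T>(path: string, method = "GET", body?: unknown): Promise<T> {
    const token = this.getToken();
    const res = await fetch(`${this.baseUrl}${path}`, {
      method,
      headers: {
        "Content-Type": "application/json",
        ...(token ? { Authorization: `Bearer ${token}` } : {}),
      },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`API ${res.status} on ${path}`); // uniform error handling
    return (await res.json()) as T;
  }

  // Hypothetical typed endpoint wrapper the agent is told to call instead of raw fetch.
  getUser(id: string) {
    return this.request<{ id: string; name: string }>(`/users/${id}`);
  }
}
```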
Run full test suites in CI (and use the results): Make sure your continuous integration runs not just unit tests, but integration tests, contract tests, lint checks – the whole shebang. When the AI contributes changes, run them through CI. If something fails, you have a wealth of information to feed back. For example, if 3 integration tests fail, you can copy-paste those failure messages to the AI and say “these tests failed after your change, please investigate and fix.” This closes the loop with concrete data. The AI will analyze the failures (perhaps noticing that all failing tests relate to a certain module) and propose fixes. We effectively treated the CI as part of the AI feedback cycle. It helped catch things one might miss in local testing, and provided the AI with objective signals on what was broken. Over time, as the AI’s suggestions improved, we saw fewer CI failures – a sign that the agent was internalizing some of those quality signals.
Key Takeaways for Developers
Working with AI coding agents is a skill, much like any other tool in software development. By reflecting on these experiences, we can distill a few key lessons:
Invest time in clarity up front: Ten minutes spent crafting a precise prompt with all relevant details can save you hours of confusion and debugging later. Clearly state the problem, the context, and the desired outcome. That initial effort pays huge dividends in the quality of the AI’s first response.
Use the AI as a collaborator, not an omniscient solver: Treat the agent like a junior developer who is extremely fast and knowledgeable, but needs guidance. This means you should direct it with plans and examples, and you should review its output critically. Make the AI “prove” it understands by having it explain its approach or by passing your tests. Remember the adage from one guide: “The AI is your intern, not your architect.” Your vision and oversight are still crucial [9].
Keep iteration loops tight and purposeful: It’s normal to have back-and-forth with the AI, but try to structure each iteration. For instance: one iteration to outline approach, one to implement, one to fix tests or edge cases. If you find you’re on the tenth round of “try something and see if it works,” step back and reconsider the strategy (maybe break the task down more, or gather more information). A disciplined, stepwise workflow yields better results than hoping a single long conversation will magically hit the mark [9].
Small, verifiable changes are the way to go: Whether it’s writing code or updating documentation, work in small increments that you or your CI can verify. It’s easier to trust and merge a 10-line change that you fully understand than a 500-line change that “hopefully” covers everything. Use tests and type checks as safety nets, and integrate them into the AI interaction. This not only gives you confidence but also trains the AI on what “correct” means in your project’s context.
Design for safe experimentation in integrations: Anticipate that anything involving external systems or major architecture changes will need a few tries. Feature flags, toggles, and thorough logging are your friends. They allow you to test AI-generated changes without jeopardizing production. And if something goes wrong, you can roll back swiftly. Essentially, create an environment where the AI (and you) can iterate safely.
Documentation is a force multiplier: Don’t neglect updating docs and comments when you make changes with the AI. Not only will this save future developers (or the future you) from confusion, it also improves future AI interactions by providing better context. A well-documented codebase is easier for an AI to navigate. Plus, many of the practices that help AI (clear naming, explicit contracts, clear module boundaries) are just good software engineering. They pay off even without AI, but doubly so with it.
By adopting these practices, developers can turn coding agents from a novelty into a genuine productivity boost. In our experience, when we applied the above lessons, we saw fewer iterations to reach a working solution, higher-quality code on the first attempt, and an overall smoother rhythm in collaborating with the AI. Coding agents excel at executing well-defined tasks and exploring solutions quickly – it’s up to us to set them up for success with well-scoped problems, rich context, and oversight. Embrace the mindset of human + AI working in tandem: use the AI to accelerate routine work and provide inspiration, while you steer the project’s direction and maintain the creative insight. With practice, this collaboration can feel like an extension of your development team, helping you focus on the interesting problems while it handles the boilerplate and heavy lifting under your guidance.
References: The insights above are drawn from a combination of hands-on experience and emerging research/best practices in the field. For instance, a recent developer survey by GitHub highlights the widespread adoption and anticipated benefits of AI coding tools [1], while a controlled study observed that without proper workflow adjustments, AI assistance can sometimes slow down experienced developers [2]. The importance of explicit prompts and step-by-step planning is echoed by experts who recommend making the AI outline solutions first to avoid confusion [6]. Likewise, avoiding vague instructions (“mind-reading”) and giving concrete requirements is consistently advised [3]. Developers have reported that over-reliance on AI suggestions without human judgment can lead to analysis paralysis and over-complication [5], reinforcing the need for a balanced, guided approach. Ultimately, successful use of coding agents comes from blending human expertise in problem definition and review with AI’s speed and knowledge, a theme emphasized across multiple real-world case studies and guides [9]. By learning from these experiences and strategies, you can harness AI agents to become a powerful ally in your software development journey.
[1] Survey reveals AI’s impact on the developer experience - The GitHub Blog
https://github.blog/news-insights/research/survey-reveals-ais-impact-on-the-developer-experience/
[2] Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity - METR
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
[3] [4] [8] [9] AI Agent Best Practices: 12 Lessons from AI Pair Programming for Developers | Forge Code
https://forgecode.dev/blog/ai-agent-best-practices/
[5] I Spent 30 Days Pair Programming with AI—Here’s What It Taught Me - DEV Community
https://dev.to/arpitstack/i-spent-30-days-pair-programming-with-ai-heres-what-it-taught-me-4dal
[6] [7] After 6 months of daily AI pair programming, here's what actually works (and what's just hype) - r/ClaudeAI