How long does an AI implementation take?

Most single workflow implementations take 2-6 weeks from kickoff to production. Full AI transformation programmes run 6-12 weeks.

Do you work with specific AI models?

We are model-agnostic and work with all major providers including Anthropic Claude, OpenAI GPT, Google Gemini, Meta Llama, Mistral, and more.

Can you deploy AI on our own servers?

Yes. Our Local & Private AI service deploys models on your own infrastructure or private cloud.

7 Prompt Engineering Mistakes That Kill AI Output Quality

The difference between a mediocre AI implementation and a great one often comes down to prompt engineering. We've audited dozens of business AI deployments and the same mistakes appear repeatedly. The model is capable - it's the instructions that are letting it down.

Here are the seven most common prompt engineering mistakes we see, along with concrete fixes you can apply immediately.

1. Being Vague About Output Format

The most common mistake by far. A prompt like "Summarise this document" gives the model no guidance on length, format, focus, or audience. You get a different style of summary every time, making the output unreliable for downstream processes.

The fix: Be explicit about what you want. Instead of "Summarise this document", try: "Summarise this document in exactly 5 bullet points. Each bullet should be one sentence under 25 words. Focus on financial implications and action items. Use plain English suitable for a non-technical executive."

The more specific your format instructions, the more consistent and usable the output. For any output that feeds into automated systems, specify the exact JSON schema you expect.

2. Not Providing Examples

Describing what you want in natural language is surprisingly imprecise. You might write three paragraphs explaining the desired output format, but a single example communicates the same thing more clearly and effectively.

The fix: Include 2-3 examples of input/output pairs in your prompt. This technique - called few-shot prompting - is the single most effective way to improve output quality. It works because the model can pattern-match against concrete examples rather than interpreting abstract instructions.

For classification tasks, extraction tasks, or any task where consistency matters, few-shot examples are not optional. They are the primary mechanism for communicating your expectations.

3. Stuffing Everything Into the User Message

Many implementations put all instructions into a single user message: role definition, context, constraints, formatting rules, and the actual query all jumbled together. This makes prompts hard to maintain, debug, and iterate on.

The fix: Use the system prompt for persistent instructions - role, constraints, formatting rules, and behavioural guidelines. Use the user message for the specific query or content to process. This separation makes it easy to update instructions without touching the application code, and it gives the model clearer signal about what is context versus what requires a response.

4. Not Setting Constraints

Without explicit constraints, models will do their best to be helpful - which sometimes means going far beyond what you asked for. Ask for a product description and you might get a full marketing brochure. Ask for an email response and you might get an essay.

The fix: Set clear boundaries. Specify what the model should not do as well as what it should do. "Do not include pricing information. Do not make claims about competitors. Keep the response under 150 words. If you are not confident in an answer, say so rather than guessing."

Negative constraints are especially important for customer-facing applications where the model might otherwise volunteer information about topics you'd rather it avoided - legal advice, medical recommendations, or competitor comparisons.

5. Ignoring the Chain-of-Thought

For complex reasoning tasks - analysis, decision-making, multi-step extraction - asking the model to jump straight to the answer produces worse results than asking it to think through the problem step by step.

The fix: For complex tasks, explicitly instruct the model to reason before answering. "First, identify the key clauses in this contract. Then, for each clause, assess the risk level and explain your reasoning. Finally, provide a summary of the top three risks." This structured approach produces more accurate and more explainable output.

If you only need the final answer (not the reasoning), you can still instruct the model to think step by step internally and then provide only the conclusion. The reasoning process improves the quality of the final output even when you discard the intermediate steps.

6. Using One Prompt for Everything

A single "super prompt" that handles every variation of a task is tempting from an engineering perspective but almost always produces mediocre results. A prompt that tries to handle customer complaints, product queries, shipping questions, and refund requests all in one will be mediocre at each.

The fix: Use a routing pattern. A fast, cheap model (like Haiku) classifies the incoming request, and then routes it to a specialised prompt designed for that specific task type. Each specialised prompt can be optimised independently with task-specific instructions, examples, and constraints.

This pattern costs marginally more in API calls but produces significantly better output quality. It also makes the system easier to maintain - when you need to update how refund requests are handled, you change one prompt instead of carefully editing a monolithic one.

7. Not Evaluating Systematically

Most teams "evaluate" their prompts by running a few test cases manually and eyeballing the results. This is inadequate for production systems. It misses edge cases, doesn't catch regressions when you update the prompt, and gives you no objective basis for comparing approaches.

The fix: Build a proper evaluation set. Collect 50-100 representative inputs with expected outputs (or at minimum, quality criteria). Score every prompt change against this set before deploying. Automate the scoring where possible - LLM-as- judge, exact match, or regex patterns depending on the task. Track your scores over time.

This might sound like overkill for "just a prompt", but prompts are code. They determine the behaviour of your system. And like all code, they need tests.

The Bigger Picture

Prompt engineering is not a dark art. It is a systematic discipline of clearly communicating requirements to a capable but literal system. The model will do exactly what you ask - the challenge is asking precisely.

Fix these seven mistakes and you will see an immediate improvement in output quality, consistency, and reliability. In our experience, good prompt engineering is worth more than upgrading to a more expensive model - a well-prompted Sonnet outperforms a poorly-prompted Opus in most business applications.

Want an expert review of your AI prompts? We audit and optimise prompt engineering for business AI deployments. Book a strategy call and we'll show you where the quick wins are.