This case study shows how the conversational redesign of EVO Assistant turned a struggling voice bot into a more human, efficient, and on-brand experience, raising NPS from 5.4 to 8.2, reaching an 82% average annual self-service resolution rate, and positioning EVO Banco among the most innovative digital banks in Europe.

Company
EVO Banco

Year
2020–2025

My role
As a Conversation Designer, I led the definition of EVO Assistant’s voice and content strategy, collaborating with linguists, engineers, marketing, and customer success teams. My work focused on four key goals:

  • Humanize the experience: craft an empathetic, clear, and brand-aligned tone.

  • Optimize understanding: improve NLU intents and flows using real data.

  • Enhance comprehension: implement an LLM-NLU architecture to boost understanding and contextual responses.

  • Reduce friction and costs: simplify journeys, minimize transfers, and increase self-service resolution.

EVO Assistant

The Challenge

  • For EVO Banco

    EVO Banco was struggling to deliver a consistent and efficient digital experience through its voice assistant. With a Net Promoter Score of 5.4, low intent recognition, and high call transfers to human agents, operational costs were rising while customer trust was declining. The bank needed to turn EVO Assistant into a reliable, on-brand, and cost-efficient self-service channel.

  • For the clients

    Users couldn’t complete their operations through the assistant, often faced long waiting times in the contact center, and experienced frustration and inconsistency in the overall experience. What was meant to be a quick, simple, and human digital interaction had turned into a source of friction.

  • Both the bank and its customers needed a conversational experience that truly worked, one that felt natural, solved problems effortlessly, and reflected EVO’s promise of a smarter, more intuitive digital bank.

The design process

Before GEN AI

Phase 1

1.1 Understand the user

The first and most important step was to empathize with our users and understand who was actually talking to our assistant.

We analyzed user profiles and real conversation logs to identify expectations and communication patterns. This research helped define how the assistant should sound and behave in each context, setting the foundation for a voice and tone that truly resonated with EVO’s users.

Illustration of a woman with dark purple hair tied up in a bun and wearing a yellow top with a white collar, and a man with blond hair, round glasses, and wearing a blue shirt.

Findings

Our main audience ranged from 30 to 45 years old, followed by younger users aged 18 to 29 and a smaller group over 50. All were digitally savvy and expected quick, friendly interactions.

Pains

  • EVO Assistant’s tone and persona were too formal and technical, using language that didn’t match users’ everyday vocabulary.

  • Responses often ended abruptly without guiding users toward completing their task.

  • EVO Assistant failed to acknowledge users’ emotions, such as frustration when an error happened, or to celebrate when an operation was successful.

Screenshot of a chatbot error message with a robot icon on the left and a speech bubble saying 'A system error has occurred and the operation could not be completed.' On the right, a person with glasses and a question mark above their head looks confused.

Solution

We created tone and voice guidelines that defined how EVO Assistant should communicate:

  • Use simpler, conversational language instead of jargon.

  • Acknowledge emotions and provide clear next steps.

  • Adapt tone based on the channel, the user’s level of knowledge, and the context of the interaction.

These guidelines became the foundation for a more natural, consistent, and user-centred conversational experience.

Chat conversation between a chatbot represented by a robot icon and a user with glasses. The chatbot indicates a transfer problem with a message, and the user responds with a question mark.

1.2 Understand the team

Just as important as understanding our users was understanding how the team behind the assistant worked.

Early in the research phase, I identified a key operational gap: there was no centralized documentation for conversational designs.

Illustration of three faceless diverse people with simple features and round backgrounds, representing a man, a woman, and a man.
Illustration of three people with dialogue bubbles showing they are having a conversation.

Findings

Pains:

  • Lack of a single source of truth: decisions, iterations, and validations were scattered across chat threads and personal folders.

  • Inconsistent updates: without documentation, past decisions were lost or repeated, making it difficult to track progress.

  • Slow onboarding: new team members had no visibility into previous work or next steps.

Solution

We established a shared documentation system where the team could store, review, and update:

  • Conversation flows, intents and entities.

  • Tone and voice guidelines.

  • Design decisions and the reasons behind them.

This centralized workspace improved collaboration across linguists, software engineers, customer success, and marketing teams, and created a scalable foundation for the assistant’s continuous evolution.

Illustration of three people having a conversation, each with a speech bubble above their head.

Phase 2

2.1 Review EVO Assistant’s conversational flows

After identifying who was using the assistant, the next step was to understand how they were interacting with it.

I analyzed thousands of real conversations and key performance metrics to identify the most common and high-impact use cases. Applying an 80/20 approach allowed us to focus on the flows that concentrated the majority of user needs and had the biggest potential to improve satisfaction and reduce operational costs.

Illustration of a woman with red hair standing next to a large smartphone displaying a robot with a microphone symbol, representing AI voice assistant technology.

Findings

The analysis revealed several issues across the existing flows:

  • Generic questions led to confusion: the assistant lacked proper disambiguation prompts, so broad or unclear requests often produced irrelevant answers.

  • No error handling strategy: conversations frequently reached dead ends with no way to guide users back into the happy path.

  • Missing handovers: the assistant didn’t reliably detect moments when human assistance was required, prolonging failures and frustration.

  • Underused services and data: although the bank had transactional APIs and contextual data available, the assistant wasn’t leveraging them.

A comic-style illustration showing a woman requesting help with a money transfer. The conversation includes the woman asking for help, a chatbot instructing how to transfer money, the woman insisting she already made the transfer and wants to talk to a person, and the chatbot expressing confusion and offering to help with something else. The woman appears angry, indicated by a frowning face and an angry emoji.
Conversation between a woman, a robot, and a customer support agent discussing a money transfer. The robot asks about the transfer type, the woman hesitantly responds that the transfer is old, and the robot explains how it handles transfer issues and detects problems to escalate to a live agent.
  • User intent often appears in ambiguous forms, and when the assistant proceeds without confirming it, friction and loss of trust follow. Ensuring that users feel understood is a fundamental part of any conversational experience.

    With that in mind, we implemented disambiguation strategies to clarify intent upfront and avoid wrong turns in the conversation.

  • Conversations can get messy; they don’t always follow a straight line. So when a no-match or no-input happened, we added recovery points at key moments to avoid drop-offs and guide users back on track.

    • What happens when the assistant can’t access a service?

    • What should we do when a user has a complex issue?

    • How should we respond when someone explicitly asks to speak to a person?

    We asked these questions and worked hand in hand with the Customer Support team to define clear handover rules that ensure a smooth escalation to human agents whenever needed.

  • Users don’t want generic answers; they want the assistant to do things for them.

    By integrating available transactional services and contextual data, we turned static responses into real actions the assistant could perform (like retrieving balances, blocking cards, or checking past transfers).

  • To handle the complexity of real conversations, we strengthened the assistant’s logic with better prompts, sharper intent definitions, and more reliable entity extraction, ensuring each interaction matched both user behavior and system capabilities.

Solution

In short, the assistant wasn’t guiding users effectively, couldn’t recover when conversations went off track, and wasn’t using the bank’s capabilities to provide smarter, more useful, and meaningful support.

To address these issues, we redesigned the core flows with a focus on clarity, usability, and meeting our clients’ needs. We worked closely with data, software engineering, and business teams to understand the problems behind each interaction and rebuild the experience with a user-centered approach.

Together, these improvements made conversations more predictable, helpful, and far better aligned with what users expected, while supporting the bank’s operational and business goals.
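
To make those rules a bit more tangible, here is a minimal sketch of how disambiguation, no-match recovery, and handover triggers could sit together in a single routing step. The thresholds, action labels, and helper names are illustrative assumptions, not the assistant’s actual implementation:

```python
# Minimal sketch only: thresholds, action labels, and handover triggers are
# illustrative assumptions, not EVO Assistant's production logic.
from dataclasses import dataclass

MAX_NO_MATCH = 2             # recovery attempts before escalating
CONFIDENCE_THRESHOLD = 0.75  # below this we treat the turn as a no-match
HANDOVER_PHRASES = {"talk to a person", "speak to an agent", "real person"}


@dataclass
class TurnState:
    no_match_count: int = 0


def next_action(user_text: str, intents: list[tuple[str, float]], state: TurnState) -> str:
    """Decide the assistant's next move for a single user turn.

    `intents` is the NLU output: (intent name, confidence), best first.
    """
    # 1. Explicit requests for a human always escalate.
    if any(phrase in user_text.lower() for phrase in HANDOVER_PHRASES):
        return "handover:schedule_agent_call"

    # 2. No-match: recover at key moments instead of dead-ending, then escalate.
    if not intents or intents[0][1] < CONFIDENCE_THRESHOLD:
        state.no_match_count += 1
        if state.no_match_count > MAX_NO_MATCH:
            return "handover:schedule_agent_call"
        return "reprompt:offer_examples"

    # 3. Ambiguous input: two plausible intents -> clarify before acting.
    if len(intents) > 1 and intents[0][1] - intents[1][1] < 0.15:
        return f"disambiguate:{intents[0][0]}|{intents[1][0]}"

    # 4. Clear intent: route to the transactional flow (e.g. block a card).
    state.no_match_count = 0
    return f"fulfil:{intents[0][0]}"
```

The ordering is the point of the sketch: explicit requests for a human always win, recovery is attempted before escalating, and disambiguation happens before any transactional action is triggered.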

2.2 Review the team’s workflow

Once we understood how users interacted with the assistant, we also needed to understand how the team worked behind it. Reviewing the internal workflow was essential to improving consistency, quality, and the ability to scale new conversational designs.

I analyzed how the engineering, QA, data, and business teams collaborated during each release cycle to identify gaps, blockers, and missed opportunities that were affecting the assistant’s overall performance.

Findings

The analysis revealed several issues across the existing workflow:

  • Reactive testing: automated tests were used mainly to fix bugs after going live, instead of preventing them before deployment.

  • No documented test cases: there was no shared list of scenarios or flows to validate before each release.

  • Missing quality checkpoints: without a clear review process, critical errors reached production and affected the customer experience.

  • Limited communication with the metrics and business teams: analysts only flagged urgent issues, and there was no open channel to review new flows or report findings after each release.

  • Delayed business updates: new app features were sometimes communicated days after launch, leaving the assistant unable to answer users’ questions about them.

Stylized illustration of people working with computer and voice recognition technology, with charts and speech bubbles in the background.

Solutions

  1. We moved from reactive fixes to a clearer, preventative workflow by documenting test cases, adding quality checkpoints, and opening communication with business and customer success teams. With a proper release-notification process in place, the assistant stayed aligned with new features and delivered a more consistent, reliable experience.

Illustration of a woman analyzing financial data on a large laptop screen showing colorful graphs and charts.

Making testing smarter, not harder

A clear testing baseline before and after each release, allowing the team to monitor flows effectively and protect critical paths from breaking.
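
As a rough picture of what such a baseline could look like, the pytest-style check below replays a few critical utterances and asserts that routing still behaves as expected. The client, utterances, and flow names are hypothetical, not the team’s real test cases:

```python
# Hypothetical baseline check: AssistantClient, the utterances, and the
# expected flows are illustrative, not the team's real test cases.
from dataclasses import dataclass

import pytest


@dataclass
class TurnResult:
    flow: str                  # flow the assistant routed to
    reached_end_state: bool    # conversation ended on a happy path


class AssistantClient:
    """Stand-in for a client that would replay utterances against staging."""

    def ask(self, utterance: str) -> TurnResult:
        routed = {
            "i want to block my card": "block_card",
            "what is my account balance": "check_balance",
            "i want to talk to a person": "handover",
        }
        return TurnResult(routed.get(utterance.lower(), "fallback"), True)


CRITICAL_PATHS = [
    ("I want to block my card", "block_card"),
    ("What is my account balance", "check_balance"),
    ("I want to talk to a person", "handover"),
]


@pytest.mark.parametrize("utterance,expected_flow", CRITICAL_PATHS)
def test_critical_paths_still_work(utterance, expected_flow):
    result = AssistantClient().ask(utterance)
    assert result.flow == expected_flow    # routing must not regress
    assert result.reached_end_state        # no conversational dead ends
```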

A woman interacts with an abstract digital chart on a large monitor, with colorful undulating lines representing data trends.

Catching issues before users do

A clear review workflow and mandatory approvals to prevent errors from reaching production.

Two people interacting with a large, cylindrical, robot-like figure with two speech bubbles.

Turning silos into shared insight

Regular meetings with the metrics and business teams, plus shared dashboards to review new flows, surface insights, and monitor releases.

A taxonomy that simplifies testing

2. We built a clear intent taxonomy so the team could quickly identify whether a request needed disambiguation, an action, or a simple FAQ. This improved flow consistency, NLU accuracy, and overall user experience.

Diagram illustrating a troubleshooting process with three steps: 'consult.card.product.action' for disambiguation and calling external service, 'block.card.product.action' for specific requests and calling external service, and 'conditions.card.product.info' for specific requests and providing FAQ response.
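
A rough way to express that taxonomy in code, reusing the intent names from the diagram above, is a small registry that tells the flow engine whether an intent needs disambiguation first, a transactional action, or a FAQ answer. The structure below is only an illustration; the real taxonomy lived in the conversational platform’s configuration:

```python
# Illustrative registry only; the real taxonomy lived in the conversational
# platform's configuration, not in application code.
from enum import Enum, auto


class Handling(Enum):
    DISAMBIGUATE = auto()  # ask a clarifying question first
    ACTION = auto()        # call an external / transactional service
    FAQ = auto()           # answer with curated static content


INTENT_TAXONOMY = {
    # intent name                  -> ordered handling steps
    "consult.card.product.action": [Handling.DISAMBIGUATE, Handling.ACTION],
    "block.card.product.action": [Handling.ACTION],
    "conditions.card.product.info": [Handling.FAQ],
}


def steps_for(intent: str) -> list[Handling]:
    """Return the handling steps for an intent; unknown intents fall back to FAQ."""
    return INTENT_TAXONOMY.get(intent, [Handling.FAQ])
```

The naming convention itself carries part of the decision: an ".action" suffix points to a transactional step, while ".info" points to static content.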

How EVO Assistant handled real conversations

Screenshot of a mobile banking app showing multiple bank accounts, including 'Cuenta Inteligente' and 'Cuentas Corriente,' with account numbers and obscured balances in euros.

Clear intent creates a clear path

The assistant doesn’t just guess what the user means. Through disambiguation, it understands the real need behind each request.

Screenshot of a mobile banking app showing multiple bank accounts, account types, account numbers, and a total balance.

From intent to action

The assistant not only understands the user’s intent, it can also navigate directly to the right screen in the app, allowing clients to resolve their needs quickly and without friction.

Screenshot of a mobile banking app showing multiple accounts, including a smart account, with masked balances and account numbers, and options to add another bank.

Escalate when it matters

Some cases need a human. We identified those scenarios and designed a smooth escalation path. When detected, the assistant simply schedules a call with a real agent.

Just when we had clarity, alignment, and solid designs, Gen AI changed everything.

The design process

From NLU to LLMs: A New Era

Phase 3

Adding an LLM Layer

Before integrating Gen AI into EVO Assistant, we first explored how LLMs could meaningfully improve conversational management. Given how new the technology was, we didn’t want to design from hype, but from real, measurable benefits that would enhance the experience without breaking it.

Digital illustration of two women interacting with technology. One woman sitting on large stacked books using a laptop, and the other standing beside a large microphone icon on a blue device. The background features icons related to Wi-Fi, temperature, light bulb, and settings.

Findings

User Experience Issues

  • Limited emotional intelligence: the assistant’s rigid responses didn’t acknowledge user frustration, making conversations feel cold and increasing friction.

  • Disambiguation gaps: some requests, such as multi-intent phrases, were too complex for the NLU to interpret accurately.

  • Generic responses: static prompts couldn’t adapt to context, making it difficult to handle natural, detailed user explanations.

  • High rate of false negatives: many valid requests were misclassified as “no match,” forcing users into unnecessary handovers and breaking the conversation flow.

An illustration of a person presenting data on a large computer screen, showing bar graphs, pie charts, and line charts with gears in the background.

Business Impact Issues

  • Loss of insights: NLU missed emotional signals and product context, limiting visibility into user needs and opportunities.

  • Unnecessary transfers: false negatives and poor interpretation increased call-center load and costs by sending calls to agents that the assistant could have handled.

Solution

To address these challenges, we designed and implemented an LLM-powered layer that complemented (not replaced) the existing NLU and flow logic.

We focused on:

  • Using the LLM for comprehension only (not free-form decisions), ensuring safety and predictability.

  • Achieving more accurate intent routing through assisted interpretation and entity extraction.

  • Improving disambiguation by better understanding vague or multi-intent messages.

  • Enhancing fallback behavior by reinterpreting unclear inputs instead of triggering generic error messages immediately.

  • Keeping control through guardrails, ensuring every output followed structure, boundaries, and brand tone.
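
To illustrate what “comprehension only” means in practice, here is a minimal sketch assuming an OpenAI-style chat API; the model name, prompt, and allowed-intent list are assumptions for the example, while the real layer ran inside the bank’s own stack with its own guardrails:

```python
# Minimal sketch assuming an OpenAI-style API; model, prompt, and the allowed
# intent list are assumptions for illustration, not the production setup.
import json

from openai import OpenAI

client = OpenAI()

ALLOWED_INTENTS = [
    "consult.card.product.action",
    "block.card.product.action",
    "conditions.card.product.info",
    "unknown",
]

SYSTEM_PROMPT = (
    "You classify banking requests. Return JSON with the fields 'intent' "
    "(one of the allowed values), 'entities' (object), and 'frustrated' "
    "(true or false). Never answer the user directly."
)


def interpret(user_text: str) -> dict:
    """Comprehension step: classify and extract, never generate the reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": f"Allowed intents: {ALLOWED_INTENTS}\nMessage: {user_text}"},
        ],
    )
    parsed = json.loads(response.choices[0].message.content)

    # Guardrail: anything outside the taxonomy falls back to the deterministic
    # NLU and flow logic instead of letting the model improvise an answer.
    if parsed.get("intent") not in ALLOWED_INTENTS:
        parsed["intent"] = "unknown"
    return parsed
```

The key constraint is visible in the code: the LLM never writes the reply; it only returns structured fields that the existing NLU and flow logic can act on.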

A comic-style conversation between a woman and a robot about password recovery, with text indicating a chatbot handling a password reset for a package delivery.

This hybrid architecture allowed EVO Assistant to stay consistent, safe, and predictable while becoming significantly more intelligent.

Chat conversation about managing user frustration with IAMX, featuring a woman with blonde hair and a robot assistant. The woman mentions needing to see her balance after a month, and the robot explains resetting passwords can help check balance quickly.

Conversations stopped being purely transactional. The assistant could now acknowledge emotions, adapt its tone, and respond in a way that felt more human.

The anatomy of a hybrid design

Below is how the hybrid architecture works: LLM layers handle comprehension and context, while NLU takes control at critical moments of the flow, ensuring accuracy, safety, and predictability where it matters most.

Flowchart illustrating a process in natural language processing involving large language models and user interactions, with steps for disambiguation, account management, and handling user frustration or out-of-scope requests.

From Conversations to Business Strategy

We added an LLM to process feedback collected across multiple flows, extracting key pain points and turning raw comments into actionable insights. This helped improve processes, uncover blind spots, and prioritize changes based on real user needs.
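
A simplified sketch of that feedback-mining step, again assuming an OpenAI-style client; the prompt and the category list are illustrative, and only the “switched to another bank” category comes from our actual findings:

```python
# Sketch of the feedback-mining step, again assuming an OpenAI-style API;
# the category list (except "switched to another bank") and the prompt are
# illustrative assumptions.
from collections import Counter

from openai import OpenAI

client = OpenAI()

CHURN_CATEGORIES = [
    "switched to another bank",
    "fees or pricing",
    "app or assistant issues",
    "customer service",
    "other",
]


def summarize_closure_feedback(comments: list[str]) -> Counter:
    """Map each free-text closure comment to one reason and count the totals."""
    counts: Counter = Counter()
    for comment in comments:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": ("Classify the account-closure comment into exactly one "
                             f"of these categories: {CHURN_CATEGORIES}. "
                             "Reply with the category text only.")},
                {"role": "user", "content": comment},
            ],
        )
        label = response.choices[0].message.content.strip().lower()
        counts[label if label in CHURN_CATEGORIES else "other"] += 1
    return counts
```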

Here are the key reasons the LLM identified when clients closed their accounts:

One insight stood out clearly: most clients were leaving because they switched to another bank.


With this knowledge, we worked closely with the Customer Success team to go deeper into why users were choosing competitors. This allowed us to shift the focus toward retention, better understand what clients were getting elsewhere, and design targeted strategies to improve the experience and address those gaps.

Designing for measurable impact

  • We saved €6M by preventing unnecessary call transfers.

    2021–2024

  • On average, clients resolved their queries in 1 minute and 37 seconds using EVO Assistant.

    2021–2024

  • We achieved an 82% average annual conversational resolution rate.

    2021–2024

The assistant evolved into a strategic brand asset, reinforcing EVO’s positioning as one of the most innovative digital banks in Spain.

Awards & Recognitions

World Finance Banking Awards
Most Innovative Bank in Europe

Global Finance Banking Awards
Best Consumer Digital Bank in Spain for 2022

2022



World Finance Banking Awards
Most Innovative Bank in Europe

2020