VUI Challenge #1: my designs

Benjamin McCulloch (conch.design)
Jul 22, 2021
Photo by HalGatewood.com on Unsplash

I’m doing Jesús Martín’s VUI Challenge to sharpen my skills. Jesús has a great deal of experience to share, and I enjoy doing his course when I have the time. Keeping my writing sharp is vital for great conversation design.

I spend a few hours on each challenge within my daily tasks. Here’s my writing with my thought process…

VUI Challenge Email #1

Each day he sends a new challenge by email.

This is challenge #1:

Challenge: Design a Welcome message for a service called TVguide for a Google Action or an Alexa Skill. The service provides information about what’s set on TV and allows customers to ask for the information they need.

Tips: The welcome message is the first prompt your customers will receive from your interaction. It is the entrance door to your experience and needs to help customers understand what your experience is all about while keeping them motivated to carry on.

Most welcome messages have 3 parts:

  • Greetings
  • Scope explanation
  • Call to action

Ben’s notes: Jesús doesn’t say exactly what type of TV this is for. I assumed from the words ‘what’s set’ that it’s for classic ‘programmed’ TV listings and not recommendations for Netflix/Prime/Disney+ etc. If this were for a real client and not just a learning exercise, that would be my first question!

Although streaming is now very popular, I know a lot of people who still watch programmed TV in the UK and Czechia (my homes). Therefore it doesn’t seem unlikely that a skill would focus on old-skool programmed TV.

Writing the GREETING

1st attempt:

“Welcome to TV Guide”

(Notes: I don’t like it because I didn’t go anywhere. I find ‘welcome’ in an ambient experience IN MY OWN HOME to be odd. Ambient voice assistants reside in my house — that’s where they help me. The experience isn’t like a TV show or other entertainment where I’m transported to another place)

2nd attempt:

“This is your TV Guide”

(Notes: To me that sounds more personal and helpful. The emphasis is on it being for me. It’s like they’re coming to help ME, rather than me having to go and ask THEM)

Writing the SCOPE EXPLANATION

1st attempt:

“Taking the mystery out of finding the right show to watch”

(Notes: weak)

2nd attempt:

“Find the shows you want to watch”

(Notes: dull)

3rd attempt:

“Let’s quickly find something great to watch and leave mystery and suspense to the actors”

(Notes: better — has more witty character — but could be sharpened)

4th attempt:

“Let’s find something great to watch! We can leave mystery and suspense for actors”

(Notes: better still, but could be improved)

5th attempt:

“Let’s take the drama out of finding great TV! We’ll leave mystery and suspense to actors”

(Notes: that feels strong to me and I like it. This sums up the service in an enjoyable way. It’s engaging if not rib-ticklingly funny… and that’s ok when people just want to find something to watch.

You could say that ‘takes the drama out of’ is an over-used marketing expression but this is a hypothetical project for a hypothetical brand. Using it in a context where drama is common (TV) perhaps gives the tired old words a new lease of life. If I knew the brand and their previous marketing materials I could create something more fitting to their house communication style)

Writing the CALL TO ACTION

1st attempt:

“Are you looking for viewing times for a show you already love or a new recommendation?”

(Notes: Why do 2 options feel like too few? By the ‘rule of 3’ I know that a user won’t feel overwhelmed if there’s 1 more choice. As this is the start of the experience I can perhaps give them options that get results quicker…

And what about ‘Viewing times’? Should it be ‘Listing times’? Listings? Information? What’s the most common way to refer to old TV schedules? ‘TV schedules’ sounds horribly archaic, but if that’s what people know then it’s the best wording)

2nd attempt:

“Tell me; do you want me to recommend something that’s on now, to find something you’d love to watch later, or look up the time and day a particular show will air?”

(Notes: I assume some people would like to quickly find something that’s currently playing. I’m happy to say ‘time and day’ because that’s clearly understandable language. Now I don’t need to use the words ‘TV Schedule’!)

3rd attempt:

“Tell me; shall I recommend something on now, find something to watch later, or look up the time and day for a particular show?”

(Notes: I like how the verbs are different for each option — good for NLU — users could just say ‘recommend’, ‘find’ or ‘look up’ and get to the next step. I like how it goes from specific to vague (now, later, anytime). ‘Now’ option could be optimised for users in a hurry, with barge-in so they could immediately interrupt the assistant and move forward!)
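To illustrate why distinct verbs help, here is a toy sketch of keyword routing in Python. It is not how Alexa or Google Assistant NLU works internally (those platforms use their own trained models); the intent names and the `match_intent` helper are hypothetical and exist only to show that one unique verb per option is enough to tell the three choices apart.

```python
import re
from typing import Optional

# Hypothetical intent names; a real skill would define these in its NLU model.
INTENT_KEYWORDS = {
    "recommend_now": ["recommend"],
    "find_later": ["find"],
    "lookup_show_time": ["look up", "lookup"],
}


def match_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose keyword appears as a whole phrase."""
    text = utterance.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        for keyword in keywords:
            # \b keeps "find" from matching inside longer words.
            if re.search(rf"\b{re.escape(keyword)}\b", text):
                return intent
    return None


if __name__ == "__main__":
    for said in ["Recommend", "find something", "look up Doctor Who", "hello"]:
        print(said, "->", match_intent(said))
```

Because no verb is shared between options, a one-word reply is unambiguous. If two options both started with "find", this kind of matching would need extra context to choose between them.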

4th attempt:

“Tell me; shall I recommend something playing now, find something to watch later today, or look up the time and day for a particular show?”

(Notes: This feels good to me.

TTS doesn’t know what I intend with the semicolon. I think there should be a tiny but perceptible pause after “tell me” because I want it to catch the user’s ear and make them realise they need to listen closely. It won’t work if the TTS engine just says “tell me shall i recommend…” without a pause.

This line could work without “tell me”. I remember the advice in Evanhoe/Deibel’s Conversations with Things which recommends against assistants of this type referring to themselves as “I” or “me”. Also these words could be seen as just extra noise (and anything superfluous should probably be cut for brevity).

On the other hand I like that it warns the user that they need to listen. Perhaps there’s a better wording out there than “tell me” that wouldn’t have the assistant referring to itself. The wording I want to avoid is “just say” as I think it’s a cliché within conversation design. The more people hear soundalikes the less effect they have.

“Now listen up” would be too commanding and would make it seem like the assistant is asserting its authority.

“Let’s decide” could work.

“You decide” is better and could work.

“How about?” would work if I reworded the following sentences to fit that structure)

Final wording for the prompt

This is YOUR TV GUIDE. Let’s take the drama out of finding great tv! We’ll leave mystery and suspense to actors. Tell me; shall I RECOMMEND something playing now, FIND something to watch later today, or LOOK UP the time and day for a particular show?

Notes:

  • CAPS — used to mark TTS emphasis
  • ; — TTS should have a tiny but perceptible pause (after ‘Tell me’) — perhaps ~25–200 ms (test it)
  • If it’s too long cut “We’ll leave mystery and suspense to actors.” It would still work without it.
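The notation above can be turned into a first-draft SSML string mechanically. The following Python sketch is one possible way to do it, not the process used for this article. `annotated_to_ssml`, the `ACRONYMS` set and the 200 ms default are assumptions made for illustration. It maps CAPS runs to `<prosody volume="loud">` and each semicolon to a `<break>`, and it leaves acronyms such as "TV" alone when they stand by themselves.

```python
import re
from xml.sax.saxutils import escape

# Assumption: words that are normally written in capitals, not emphasis marks.
ACRONYMS = {"TV"}

# A semicolon, or a run of one or more ALL-CAPS words (2+ letters each).
TOKEN = re.compile(r"(;|\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b)")


def annotated_to_ssml(prompt: str, pause_ms: int = 200) -> str:
    """Convert a CAPS/semicolon annotated prompt into a simple SSML string."""
    parts = []
    for piece in TOKEN.split(prompt):
        if piece == ";":
            # The semicolon becomes an explicit pause.
            parts.append(f' <break time="{pause_ms}ms"/>')
        elif TOKEN.fullmatch(piece) and not set(piece.split()) <= ACRONYMS:
            # Emphasised run: lower-case the words, but keep acronyms as-is.
            words = [w if w in ACRONYMS else w.lower() for w in piece.split()]
            parts.append(f'<prosody volume="loud">{" ".join(words)}</prosody>')
        else:
            # Plain text: escape &, < and > so the SSML stays well-formed.
            parts.append(escape(piece))
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(annotated_to_ssml("Tell me; shall I RECOMMEND something on TV?"))
```

The output is only a starting point. It needs tuning by ear, for example by adding `<emphasis>` to a single word or adjusting the pause length.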

I recorded the prompt as VO for guidance when generating TTS

(This is a quick way to get a feel for the final result before getting lost down the SSML rabbit-hole. I’m not aiming for a perfect read, just something that shows me the cadence and flow — it’s a ‘scratch take’. TTS is ALWAYS going to sound different anyway — any 2 voices would read something differently)

And I created TTS with SSML using Amazon Polly (Matthew voice)

SSML for first TTS example

<speak>This is <prosody volume="loud"><emphasis level="strong"> your </emphasis> TV Guide. </prosody>Let’s take the drama out of finding great TV! We’ll leave mystery and suspense to actors. Tell me <break time="200ms"/> shall I <prosody volume="loud"> recommend </prosody>something playing now, <prosody volume="loud"> find </prosody>something to watch later today, or <prosody volume="loud"> look up </prosody>the time and day for a particular show?</speak>
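For readers who would rather script this than use the Polly console, here is a minimal Python sketch using the `boto3` SDK. It assumes `boto3` is installed and AWS credentials are already configured. The helper names `build_polly_request` and `synthesize_to_file` are hypothetical, and this is not necessarily how the audio for this article was produced.

```python
def build_polly_request(ssml: str, voice_id: str = "Matthew") -> dict:
    """Assemble the keyword arguments for Polly's synthesize_speech call."""
    return {
        "Text": ssml,
        "TextType": "ssml",  # tell Polly the text is SSML, not plain text
        "VoiceId": voice_id,
        "OutputFormat": "mp3",
    }


def synthesize_to_file(ssml: str, path: str, voice_id: str = "Matthew") -> None:
    """Send the SSML to Amazon Polly and save the returned audio."""
    # Imported here so the request builder can be used without boto3 installed.
    import boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_polly_request(ssml, voice_id))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


if __name__ == "__main__":
    request = build_polly_request("<speak>This is your TV Guide.</speak>", "Joanna")
    print(request)
```

Switching `voice_id` between "Matthew" and "Joanna" makes it easy to hear how two voices read the same markup.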

With Amazon Polly TTS (Joanna voice)

As you can hear she jams the first sentence together. For an introduction that sounds too rapid (this is the user’s 1st impression of the skill — it needs to be clear and sound confident).

Although it’s a bit awkward, I added a pause just to make it sound stronger. This isn’t a perfect solution but it works better:

SSML for second Joanna TTS example

<speak>This is <break time="5ms"/> <prosody volume="loud"><emphasis level="strong"> your </emphasis> TV Guide. </prosody>Let’s take the drama out of finding great TV! We’ll leave mystery and suspense to actors. Tell me <break time="200ms"/> shall I <prosody volume="loud"> recommend </prosody>something playing now, <prosody volume="loud"> find </prosody>something to watch later today, or <prosody volume="loud"> look up </prosody>the time and day for a particular show?</speak>

The gap is a little awkward, but it does work. It sounds like she’s emphasizing the words ‘your TV Guide’ and that’s what I wanted. Strangely it made no difference how long I set the first break (5ms sounded as long as 50ms).
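One way to check an observation like this is to generate the same opening line with several break lengths and listen to them back to back. The Python sketch below only builds the SSML strings; `break_variants` is a hypothetical helper and the durations are arbitrary test values.

```python
def break_variants(durations_ms):
    """Build one SSML string per break duration for side-by-side listening."""
    variants = {}
    for ms in durations_ms:
        # A duration of 0 means "no break tag at all", as a baseline.
        pause = f'<break time="{ms}ms"/> ' if ms > 0 else ""
        variants[ms] = (
            f"<speak>This is {pause}"
            '<prosody volume="loud"><emphasis level="strong">your</emphasis> '
            "TV Guide.</prosody></speak>"
        )
    return variants


if __name__ == "__main__":
    for ms, ssml in break_variants([0, 5, 50, 200]).items():
        print(ms, ssml)
```

Each string can then be synthesized with the same voice so that the break length is the only difference between takes.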

Final thoughts

I’m doing this challenge as a learning exercise. The process is its own reward — I get to think about how to write prompts that are strong, informative and engaging.

My results could definitely do with being tweaked (as ever — I’m one of those types who need a deadline to know when to stop!) but I wouldn’t be ashamed to show this work to a team-mate for feedback and let it be tested within the live skill.

What do you think?

If this appeals to you, you should sign up for the challenge yourself!

Benjamin McCulloch

Conversation Designer (with audio superpowers)

Conch.design
