There’s a lot online about how easy it is to build chatbots. Just search for “create a chatbot in 5 minutes” or “chatbot in 5 easy steps” or “do you need to be a programmer to build a chatbot?” You’ll find a Disney chorus of easy peasy lemon squeeziness. But is it true? Or are all these articles written by earnest young marketing types who think natural language intents sound jolly useful but have never personally … well … you know … had to use one? Let’s see for ourselves by racing through a real-world example.
For simplicity I’m going to limit the real-world example to the first two questions from a mobile device troubleshooter chatbot — it says hello then asks for a device model and problem symptom. What could be simpler? These are the opening questions from a chatbot built by my startup eXvisory.ai to demonstrate how our dev tools add deep logic (expertise) to chatbots, but the first two questions use a conventional chatbot builder.
There are lots of visual chatbot builder tools out there (which rather disingenuously claim not to require programming because you’re not writing raw code) but our example chatbot uses Google DialogFlow. Don’t worry, its building blocks are common to most tools, and I’ll describe each of them in layperson terms. For fun, you can try out the full mobile troubleshooter on Google Assistant, Telegram or Kik.
Intents, entities and training phrases
Like any programming task you build chatbots by combining simple building blocks. The most important blocks are intents and entities, which you use to model what a user might say. So if your chatbot asks “What is the make and model of your device?” the user might respond “iPhone 7”, or “iPad mini” or “iPad”. To act intelligently the chatbot must handle any possible response and there are often many possibilities. So you define intents and entities that capture likely responses via training phrases.
Training phrases are a set of examples of what the user might say in a given situation. Here it’s responses to a “which device?” question, but it could equally be the user asking a question. Mercifully, you don’t need to list all the possibilities (there could be thousands), just 10–50 good examples, so the magic of machine learning gets the picture. The intent is a specific instance of what the user means (here, that they have a problem with an iPad). The highlighted words within the phrases are entities — the objects referred to within intents. As a chatbot developer you need to figure out the intents and entities for your application. It’s not that hard, but you need to think logically.
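To make that concrete, here’s a rough sketch in Python of what an intent boils down to. This is not real DialogFlow syntax — just an illustrative data structure showing training phrases with entity-typed spans marked inside them, and how the entity spans can be pulled out.

```python
# Illustrative sketch only (not DialogFlow's actual format): an intent
# is essentially a named bundle of training phrases, where some spans
# of each phrase are annotated with an entity type.
device_ipad_intent = {
    "name": "device.ipad",
    "training_phrases": [
        # Each phrase is a list of (text, entity) parts;
        # entity is None for plain filler text.
        [("I have an ", None), ("iPad mini", "@model")],
        [("my ", None), ("iPad", "@model"), (" is broken", None)],
        [("it's an ", None), ("iPad Pro", "@model")],
    ],
}

def extract_entities(phrase):
    """Collect the entity-annotated spans from one training phrase."""
    return {entity: text for text, entity in phrase if entity}

# The machine learning layer generalises from these examples, so an
# unseen utterance like "my iPad Air won't start" can still match.
first_match = extract_entities(device_ipad_intent["training_phrases"][0])
```

The point of the sketch: the developer supplies the examples and the annotations; the tool’s job is to generalise from them.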
Actions, webhooks and parameters
When an intent matches a user utterance (the posh term for anything the user says) the intent activates and does something to move the application along, like asking another question. What it does is called its action. The device.ipad intent described above sends an exvisory.event_symptom_switch action to an eXvisory webhook, which is programming code running in the cloud. Marketing folk rarely mention that most chatbots will require some webhook programming. Here the exvisory.event_symptom_switch action activates another intent that asks the user for their problem symptom.
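Here’s roughly what that webhook plumbing looks like. The handler below follows the shape of DialogFlow’s v2 webhook JSON (`queryResult` and `followupEventInput` are real field names in that format), but the routing logic is my own illustrative sketch, not eXvisory’s actual code, and the event name is borrowed from the article.

```python
# A minimal fulfillment handler in the shape of a DialogFlow v2
# webhook. In production this function would sit behind an HTTPS
# endpoint; here it is a pure function so the logic is easy to see.

def handle_webhook(request_json):
    """Route a webhook call: when a device.* intent matched, fire the
    follow-up event that activates the symptom question intent."""
    query = request_json["queryResult"]
    intent_name = query["intent"]["displayName"]
    params = query.get("parameters", {})
    if intent_name.startswith("device."):
        # Returning followupEventInput tells DialogFlow to activate
        # whichever intent is configured to listen for this event.
        return {
            "followupEventInput": {
                "name": "exvisory.event_symptom_switch",  # from the article
                "parameters": params,  # carry device data forward
            }
        }
    # Fallback: just send a plain text response back to the user.
    return {"fulfillmentText": "Sorry, I didn't catch that."}

request = {
    "queryResult": {
        "intent": {"displayName": "device.ipad"},
        "parameters": {"model": "iPad mini"},
    }
}
response = handle_webhook(request)
```

Note the pattern: the intent matches, the webhook decides what happens next, and an event hands control to the next intent. That “some webhook programming” the marketing folk gloss over is exactly this kind of glue code.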
Why create a specific device.ipad intent? Well, we get the model entity (e.g. “iPad mini”) and can deduce iPad-specific troubleshooting information, like the operating system being “ios” and the device a “tablet”. This data is stored in parameters for use by other intents, later in the conversation.
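That deduction step is simple enough to sketch directly. The helper below is hypothetical (the parameter names are mine, not eXvisory’s), but it shows the idea: one matched entity lets you fill in several parameters for later intents to use.

```python
# Hypothetical sketch: derive troubleshooting parameters from the
# matched model entity, so later intents can branch on them.

def deduce_device_params(model):
    """Map a matched device model to os/device-type parameters."""
    params = {"model": model}
    lowered = model.lower()
    if "ipad" in lowered:
        params.update({"os": "ios", "device_type": "tablet"})
    elif "iphone" in lowered:
        params.update({"os": "ios", "device_type": "phone"})
    return params

ipad_params = deduce_device_params("iPad mini")
```

In DialogFlow itself these values would be stashed in context parameters rather than a plain dict, but the deduction logic is the same.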
Connecting intents and context
A chatbot conversation is hardly a conversation unless it has multiple connected questions and answers. As you’d imagine, this is implemented by connecting intents together. That turns out to be the hardest part of building chatbots, because what you may not have imagined is how quickly it gets complicated. But we’ll see that soon enough.
The symptom.question intent asks the “what symptom?” question in our example chatbot. It’s invoked by the user’s response to the “which device?” question, via the exvisory.event_symptom_switch action (described above), and we also have to define intents for each of its possible answers, for example connectivity symptoms like “can’t connect to WiFi”.
Here’s an example symptom intent. This should now look familiar. The training phrases match connectivity symptoms and we extract specific symptom and device type entities. We then prompt for confirmation that we have identified the correct symptom (see the Responses section). This symptom.connectivity intent is implicitly connected to its preceding symptom.question intent and to its following “Yes” or “No” confirmation intents by shared conversational contexts.
Contexts are an important chatbot concept. They capture the everyday fact that the same words mean different things in different contexts. So an intent should only activate if a user utterance matches both the conversational context and its training phrases. See how the above intent must match the context_symptom context (scroll back up to see how that context was created by the preceding symptom.question intent) and how it in turn creates a symptomconnectivity-followup context, which will only be matched by its “Yes” or “No” confirmation intents. Contexts are also used to store parameters gathered by intents and typically have finite lifetimes (expressed as periods of time or turns of the conversation or both).
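The context mechanism can be sketched in a few lines of Python. This is a toy model of how matching and lifespans behave, not DialogFlow’s actual implementation; the intent and context names are the ones from our example.

```python
# Toy model of conversational contexts: an intent only activates when
# all its input contexts are active, and activating it (a) ages the
# existing contexts by one conversational turn and (b) creates its
# output contexts with fresh lifespans.

def intent_eligible(intent, active_contexts):
    """An intent can fire only if all its input contexts are active."""
    return all(c in active_contexts for c in intent["input_contexts"])

def activate(intent, active_contexts):
    """One conversational turn: decrement lifespans, drop expired
    contexts, then add the intent's output contexts."""
    remaining = {c: n - 1 for c, n in active_contexts.items() if n > 1}
    remaining.update(intent["output_contexts"])
    return remaining

symptom_connectivity = {
    "input_contexts": ["context_symptom"],
    "output_contexts": {"symptomconnectivity-followup": 2},
}

# context_symptom was created earlier by symptom.question.
contexts = {"context_symptom": 2}
eligible = intent_eligible(symptom_connectivity, contexts)
contexts = activate(symptom_connectivity, contexts)
```

The same utterance (“Yes”, say) can therefore mean completely different things depending on which followup context is alive when it arrives — which is exactly why contexts matter.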
So the flow of a chatbot conversation is either explicitly controlled by links from one intent to another or implicitly by matching contexts. But I sense your eyes beginning to cross, so let’s see what it takes to put it all together.
My first intent
It’s true, creating the first intent in a chatbot could hardly be easier. It’s called the Welcome intent and will usually be created for you.
Here we see contexts being reset (clearing out any old data) and training phrases matching what a user might say to start or restart a conversation. One thing we haven’t mentioned before is events. Events are another building block we can use to activate specific intents. Some events are system-defined, for example the Google Assistant Welcome event is sent to the chatbot when a user says “OK Google” on an Android phone, or you can send custom events from your webhook (more programming I’m afraid).
In a simple chatbot you might define a greeting in the welcome intent (“Hi, I’m mobile troubleshooter …”) but here we use the exvisory.reset action to reset the eXvisory deep logic webhook. The webhook then uses an event to invoke another intent that says hello and asks the first question.
Easy peasy. 1 intent and counting.
Question 1: which device?
This is the intent invoked by the exvisory.reset action (after the webhook authenticates the user and does a bunch of housekeeping). It introduces itself and asks the user to identify the device experiencing a problem.
To handle the device responses we define 10 intents. Four are variations on the device.ipad intent we’ve seen before, and you can probably guess what they do from their names (device.iphone, device.make_model and device.make_only). Four more are a device.fallback intent, its “Yes” and “No” confirmations and device.question.retry, which together handle the fallback case where we don’t recognise the device and ask the user if they want to try again. The two remaining intents, device.explain and device.continue, handle user requests to explain the question and to continue (after the explanation).
Still not so bad. 1 question and answer. 11 intents and counting.
Question 2: what symptom?
We saw earlier how the device.* intents are connected to the symptom.* intents, each of which requires a pair of “Yes” and “No” confirmation intents (to check we’re recognising the correct symptom).
So far we recognise 9 symptoms (so 27 intents). Add in fallback intents for unknown symptoms and digressions like symptom.back, symptom.explain and symptom.continue (to go back to the device question, explain the symptom question or continue), and you end up with a whopping 37 intents.
Gulp. 48 intents for two questions and answers!
Handing over to eXvisory
The multi-step nature of real conversations required to complete real tasks, and the need to react appropriately to all the possibilities, mean that building non-trivial chatbots can require hundreds if not thousands of intents.
So after using conventional intents to introduce itself and perform a quick device and symptom triage our wily mobile troubleshooter chatbot hands off to the eXvisory deep logic webhook for subsequent troubleshooting flow.
For example, the symptom.connectivity-yes intent activates if the user agrees their symptom is related to network connectivity. The intent sends the exvisory.start action to the webhook, which initialises the eXvisory deep logic network with the x_test_* parameters gathered during the triage phase, returns the next question in the troubleshooting conversation and handles its answer. A typical troubleshooting session can have at least 10 more questions and answers, which would normally require adding thousands of intents, but with eXvisory the only new intents are to handle user digressions, for example asking for help how to answer a question.
So do chatbots require programmers?
Yes. Building successful non-trivial chatbots requires quite significant programming skills. The combinatorial complexity of implementing decision trees (see Chatbot Decision Trees — seriously, how hard can they be?) and anticipating unconstrained natural language flows requires the same ultra-logical approach as coding large software applications. My personal opinion is that ideal chatbot developers are graduates with one or two years of software development experience. Time enough to have learnt what is possible, with enough experience to be confident and enough intellect to get their head around the tools and the business process they are automating.
But don’t let this article put you off chatbots. It takes about as much programming effort to build a useful chatbot as for one intelligent human to learn to do a non-trivial task well (I will modestly call this Lambert’s Law 🙂), but the resulting chatbot will work almost for free, 24×7, alongside a legion of its clones. High quality chatbots make for revolutionary economics.
In part 2 of this article I’ll show you how the eXvisory.ai deep logic editor works. Follow me from my profile page to be notified when it’s ready. Please leave feedback below or say hello at contact eXvisory.