Chatbots have come a long way since then. They are built on AI technologies, including deep learning, natural language processing and machine learning algorithms, and require massive amounts of data. The more an end user interacts with the bot, the better voice recognition becomes at predicting what the appropriate response is when communicating with an end user.
It won’t be an easy march though once we get to the nitty-gritty details. For example, I heard through the grapevine that when Starbucks looked at the voice data they collected from customer orders, they found that there are a few millions unique ways to order. (For those in the field, I’m talking about unique user utterances.) This is to be expected given the wild combinations of latte vs mocha, dairy vs soy, grande vs trenta, extra-hot vs iced, room vs no-room, for here vs to-go, snack variety, spoken accent diversity, etc. The AI practitioner will soon curse all these dimensions before taking a deep learning breath and getting to work. I feel though that given practically unlimited data, deep learning is now good enough to overcome this problem, and it is only a matter of couple of years until we see these TODA solutions deployed. One technique to watch is Generative Adversarial Nets (GAN). Roughly speaking, GAN engages itself in an iterative game of counterfeiting real stuffs, getting caught by the police neural network, improving counterfeiting skill, and rinse-and-repeating until it can pass as your Starbucks’ order-taking person, given enough data and iterations.
We then ran a second test with a very specific topic aimed at answering very specific questions that a small segment of their audience was interested in. There, the engagement was much higher (97% open rate, 52% click-through rate on average over the duration of the test). Interestingly, drop-off went wayyy down there. At the end of this test, only 0.29% of the users had unsubscribed.
Companies most likely to be supporting bots operate in the health, communications and banking industries, with informational bots garnering the majority of attention. However, challenges still abound, even among bot supporters, with lack of skilled talent to develop and work with bots cited as a challenge in implementing solutions, followed by deployment and acquisition costs, as well as data privacy and security.
Chatbots can have varying levels of complexity and can be stateless or stateful. A stateless chatbot approaches each conversation as if it was interacting with a new user. In contrast, a stateful chatbot is able to review past interactions and frame new responses in context. Adding a chatbot to a company's service or sales department requires low or no coding; today, a number of chatbot service providers that allow developers to build conversational user interfaces for third-party business applications.
Chatbots are a great way to answer customer questions. According to a case study, Amtrak uses chatbots to answer roughly 5,000,000 questions a year. Not only are the questions answered promptly, but Amtrak saved $1,000,000 in customer service expenses in the year the study was conducted. It also experienced a 25 percent increase in travel bookings.
I will not go into the details of extracting each feature value here. It can be referred from the documentation of rasa-core link that I provided above. So, assuming we extracted all the required feature values from the sample conversations in the required format, we can then train an AI model like LSTM followed by softmax to predict the next_action. Referring to the above figure, this is what the ‘dialogue management’ component does. Why LSTM is more appropriate? — As mentioned above, we want our model to be context aware and look back into the conversational history to predict the next_action. This is akin to a time-series model (pls see my other LSTM-Time series article) and hence can be best captured in the memory state of the LSTM model. The amount of conversational history we want to look back can be a configurable hyper-parameter to the model.
Message generator component consists of several user defined templates (templates are nothing but sentences with some placeholders, as appropriate) that map to the action names. So depending on the action predicted by the dialogue manager, the respective template message is invoked. If the template requires some placeholder values to be filled up, those values are also passed by the dialogue manager to the generator. Then the appropriate message is displayed to the user and the bot goes into a wait mode listening for the user input.
ELIZA's key method of operation (copied by chatbot designers ever since) involves the recognition of clue words or phrases in the input, and the output of corresponding pre-prepared or pre-programmed responses that can move the conversation forward in an apparently meaningful way (e.g. by responding to any input that contains the word 'MOTHER' with 'TELL ME MORE ABOUT YOUR FAMILY'). Thus an illusion of understanding is generated, even though the processing involved has been merely superficial. ELIZA showed that such an illusion is surprisingly easy to generate, because human judges are so ready to give the benefit of the doubt when conversational responses are capable of being interpreted as "intelligent".