This article is the second in the series “Hop on your Natural Language Processing Journey” about NLP.

Last week, we managed to convince Joe to look into his modus operandi and think about ways of improving it through NLP. One of the main missions of Joe and his team is to handle customer questions and reply to them in a timely manner, as no customer likes waiting.

Joe likes simple

Joe likes simple. And that is exactly how their process for handling customer requests is: simple.

 “Simple works, simple never gives you up, simple never lets you down”

But is simple the most efficient?


Stubborn as Joe is, he still agreed to rethink the process. Why? Recent satisfaction surveys clearly highlighted the limits of his strategy:

  • Employee satisfaction is low because the work is repetitive. The agents spend most of their day answering the same simple questions, leaving almost no time to solve the real issues.
  • Customers complained a lot about how long it took for their requests to be handled.

After some workshops and discussions, it was easy to identify where the frustrations originated:

  • The 10 most frequently asked questions account for 90% of the volume, leaving the agents frustrated at answering the same questions again and again.
  • There is no central knowledge base of questions and answers. Every time agents receive an email, they must write a new answer from scratch.

To improve the efficiency of the process and to give the agents a real sense of added value, it was decided to create reply templates for all the questions customers might ask, and then to implement an NLP solution that would categorize emails into the 10 most frequently asked questions. In doing so, the agents would get a suggested answer, proposed by the NLP solution, for most of the questions asked by customers.

With all of that in mind, here is what the new process should look like:

Text classifier to the rescue

We want to build a text classifier: a solution that takes an email (text) as input and outputs a tag with the corresponding category (in our case, the categories are the different questions asked by customers).

Machine learning text classifiers can learn to predict the category of a new, unseen email based on previous observations. To do so, they must be trained with labelled data, i.e. examples of emails associated with the right category. The classifier ingests that data and learns to generalize to unseen data.

Before we can feed the classifier labelled data and train it, we need to speak the same language as our classifier. Machines are more comfortable with numbers than with text. The process of transforming the input text into a numerical form (a vector) is called feature extraction.

One way of doing so (among others) is the bag-of-words representation, which transforms the text into a vector counting how often each word of a predefined dictionary appears in it. For example, given the sentence “My car just broke, could you help me?” and the dictionary {my, car, boat, circle, broke, help}, we get the vector {1, 1, 0, 0, 1, 1}. “my”, “car”, “broke” and “help” appear in the sentence, which is reflected by a 1 at their position in the vector. The other words do not, and are thus represented by a 0.
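A minimal sketch of the bag-of-words representation in plain Python, reproducing the example above. The tokenizer (lowercasing and splitting on non-letter characters) and the function name `bag_of_words` are our own assumptions for illustration; in practice, a library vectorizer such as scikit-learn's `CountVectorizer` does the same job more thoroughly.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens (a simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    """Return, for each dictionary word, how many times it occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]  # Counter returns 0 for absent words


vocabulary = ["my", "car", "boat", "circle", "broke", "help"]
vector = bag_of_words("My car just broke, could you help me?", vocabulary)
print(vector)  # [1, 1, 0, 0, 1, 1]
```

Because the vector holds counts, a word appearing twice gets a 2, and word order is discarded entirely, hence the "bag" in bag of words.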

Other techniques are used to pre-process the text:

  • We might want to leave out of our vector representation the unimportant words that don’t help to understand the sentence (e.g. “My”, “just”, “could”). These are called stop words: they are the most frequently used words in a given language. Most NLP libraries ship ready-made stop-word lists that can be used to filter them out.
  • We also want to group the different forms of the same word together (“be”, “are”, “am”…). To do so, the text is preprocessed to replace each word with its root (its base form) instead of the specific form found in the text, a step known as lemmatization.
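The two preprocessing steps above can be sketched as follows. The stop-word list and the lemma lookup table are tiny hypothetical stand-ins for the much larger resources an NLP library would provide; only the order of operations (tokenize, drop stop words, map to base form) is the point here.

```python
import re

# Hypothetical, deliberately tiny stop-word list; NLP libraries ship much longer ones.
STOP_WORDS = {"my", "just", "could", "you", "me", "the", "a", "an"}

# Hypothetical lookup table mapping inflected forms to their base form (lemma).
LEMMAS = {"are": "be", "am": "be", "is": "be", "was": "be", "broke": "break"}


def preprocess(text: str) -> list[str]:
    """Tokenize, drop stop words, then replace each remaining word by its base form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(token, token) for token in tokens if token not in STOP_WORDS]


print(preprocess("My car just broke, could you help me?"))  # ['car', 'break', 'help']
```

A production system would rely on a library's stop-word list and lemmatizer rather than hand-written tables, but the resulting tokens would feed the bag-of-words step in the same way.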

Figure 1 – training phase of a classifier –


Once the classifier has been trained, the model can make predictions on unseen texts.

Figure 2 – prediction phase of a classifier –
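The training and prediction phases pictured in Figures 1 and 2 can be sketched with a from-scratch multinomial Naïve Bayes classifier working on word counts (a bag-of-words representation). This is only an illustration under stated assumptions: the class name `NaiveBayesClassifier`, the category names and the four training emails are made-up toy examples, not Joe's actual data or solution.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, emails: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        # Training phase: count emails per category and word occurrences per category.
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for email, label in zip(emails, labels):
            self.word_counts[label].update(tokenize(email))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, email: str) -> str:
        # Prediction phase: pick the category with the highest log-probability.
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)  # prior
            denominator = sum(counts.values()) + len(self.vocabulary)
            for word in tokenize(email):
                if word in self.vocabulary:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled emails (toy data for illustration only).
emails = [
    "How can I reset my password?",
    "I forgot my password and cannot log in",
    "Where is my order? It has not arrived",
    "My order is late, when will it be delivered?",
]
labels = ["password_reset", "password_reset", "order_status", "order_status"]

model = NaiveBayesClassifier().fit(emails, labels)
print(model.predict("I cannot remember my password"))     # password_reset
print(model.predict("When will my order be delivered?"))  # order_status
```

Working in log-probabilities avoids numerical underflow when many small word probabilities are multiplied, and the add-one smoothing keeps a word unseen in one category from zeroing out that category entirely.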

Nothing escapes Joe’s attention and he starts wondering:

“How would we make this work, given that we have no labelled emails?”

Machine learning vs rule-based

Labelled data is not always available at first. One way to get it would be to manually tag thousands of emails and then use them to train our machine learning model. This is very time-consuming and, let’s be honest, not the most gratifying task.

We’ve been talking about machine learning, but that does not mean we should limit ourselves to it. The first step could be to implement a rule-based system that classifies the emails. The agents then accept its suggestion, or reject it and choose the right category themselves. In no time, we will have enough data to train our model.
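To make this feedback loop concrete, here is a minimal sketch of how the agents' decisions could be turned into labelled training data. The `LabelledDataset` class, its `record` method and the minimum-examples threshold are hypothetical choices for illustration; the process above does not prescribe how the data is stored or when there is "enough" of it.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LabelledDataset:
    """Collects (email, category) pairs validated by the agents."""

    examples: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, email: str, suggested: str, accepted: bool,
               corrected: Optional[str] = None) -> str:
        """Store the final label: the rule-based suggestion if accepted, else the agent's choice."""
        label = suggested if accepted else corrected
        if label is None:
            raise ValueError("a rejected suggestion needs the category chosen by the agent")
        self.examples.append((email, label))
        return label

    def ready_for_training(self, min_per_category: int) -> bool:
        """Hypothetical stopping rule: every category seen so far has enough examples."""
        counts = Counter(label for _, label in self.examples)
        return bool(counts) and min(counts.values()) >= min_per_category


dataset = LabelledDataset()
dataset.record("Where is my parcel?", suggested="order_status", accepted=True)
dataset.record("I lost my password", suggested="order_status", accepted=False,
               corrected="password_reset")
print(dataset.examples)
# [('Where is my parcel?', 'order_status'), ('I lost my password', 'password_reset')]
print(dataset.ready_for_training(min_per_category=1))  # True
```

Rejected suggestions are just as valuable as accepted ones: the agent's correction becomes the label, so the dataset improves even where the rules are wrong.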

“A rule-based text classifier, how does that work?”

The approach is to use a set of handcrafted linguistic rules to choose the right category. These rules tell the system which semantically relevant elements of a text to look for. For instance, we could associate a list of words with each category, count how many words from each list occur in the email, and pick the category with the most occurrences. Let’s take an example:

Let’s say we want to categorize sentences into two groups. One category is about “fruits” and the other one about “cars”. We then create a list of words for each category: {apple, pear, strawberry…} for fruits and {sport car, BMW, brake, wheel…} for cars. The sentence “Why do apples taste better than pears when I am in my car?” will be detected as a sentence about fruits, as there are 2 occurrences of fruit words against only 1 for cars. Other rules can be added, such as priorities between categories or extra weight for certain concepts…
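The keyword-counting rule above can be sketched in a few lines of Python. Two assumptions are made to keep the matching word-by-word: "car" is added to the car list, and multi-word phrases such as "sport car" are left out. The crude plural stripping, the per-keyword weights and the function names are our own illustrative choices.

```python
import re
from collections import Counter
from typing import Dict, Optional

# Keyword lists adapted from the example above (assumption: "car" added, phrases dropped).
# Each keyword carries a weight; 1.0 means normal importance.
KEYWORDS: Dict[str, Dict[str, float]] = {
    "fruits": {"apple": 1.0, "pear": 1.0, "strawberry": 1.0},
    "cars": {"car": 1.0, "bmw": 1.0, "brake": 1.0, "wheel": 1.0},
}


def normalize(token: str) -> str:
    """Crude plural stripping ("apples" -> "apple"); a real system would lemmatize."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def classify(text: str, keywords: Dict[str, Dict[str, float]] = KEYWORDS) -> Optional[str]:
    """Return the category whose keywords score highest, or None if nothing matches."""
    counts = Counter(normalize(t) for t in re.findall(r"[a-z]+", text.lower()))
    scores = {
        category: sum(weight * counts[word] for word, weight in words.items())
        for category, words in keywords.items()
    }
    # max() keeps the first category on ties, so dictionary order acts as a priority rule.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


sentence = "Why do apples taste better than pears when I am in my car?"
print(classify(sentence))  # fruits (2 fruit words vs. 1 car word)

# "More importance for certain concepts": weighting "car" three times flips the decision.
weighted = {"fruits": KEYWORDS["fruits"], "cars": {**KEYWORDS["cars"], "car": 3.0}}
print(classify(sentence, weighted))  # cars
```

Returning `None` when no keyword matches lets such emails fall through to an agent instead of being forced into a category.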

This approach requires a lot of expert knowledge and doesn’t scale well, but it can easily be implemented as a temporary solution to grow our database of labelled data.

What are those algorithms

Soon, Joe can see the real benefit of what has been put in place. Employee satisfaction skyrockets as the agents now have a more varied and fulfilling mission, and customers are happy with how fast and efficient the customer service is. But Joe’s curiosity has not been satisfied yet:

“We didn’t talk about machine learning algorithms, what are those?”

There are different machine learning algorithms that can be used for text classification, such as Support Vector Machines, Naïve Bayes, or even deep learning algorithms based on neural networks. To better understand what is behind those words, let’s follow Joe next week as he gets his hands dirty and implements his first simple classifier.

If you have any questions, please send us an email at

Written by Charles-Antoine Vanbeers