Why has storytelling become so important for data scientists? Why do data scientists need to learn storytelling?
In my last five years of working in analytics and consulting, I have come across this phrase thousands of times:
“The story is not coming out properly.
We need to work on story first, rest can be done easily.”
Why so much emphasis on story? After all, in data science and analytics what matters most is the accuracy of the model, isn’t it?
Are you sure?
Let’s figure that out.
Let’s assume that you have worked on a churn prediction model in your organization and you have to present the entire engagement to your CEO and CFO, who have no background on the project. Seems easy? Since it’s a data science project, all you have to do is tell them about the model and its accuracy, and that’s it. Or is it?
Let’s take different ways of presenting our results and see which one you would prefer.
Approach 1:
“We have developed a churn prediction model with an accuracy of 82.7%.”
Approach 2:
“We developed an ensemble of random forest and logistic regression to predict the likelihood of a customer churning in the next three months. We have achieved an accuracy of 82.7% using the ensemble. The previous model, which was based only on logistic regression, had an accuracy of 71.1%.”
Approach 3:
“Our North India zone was facing the challenge of high churn rates. A lot of old customers were churning out from our network. We undertook this engagement with the objective of predicting which customers are likely to churn in the next 3 months. The existing model lacked the accuracy that we wanted to achieve. For this, we used internal customer data and complemented it with external social media and demographic data to develop an ensemble of models. We developed a random forest model with 1000 trees and a logistic regression model, which is based on maximum likelihood estimation. This ensemble gave us an accuracy of 82.7%, which is better than that of our previous model.”
Approach 4:
“Our North India zone was facing the challenge of high churn rates. For the past four quarters, churn rates were over 10% each quarter which was leading to a revenue loss of around USD 100,000 per quarter. A lot of old customers were churning out from our network.
We undertook this engagement with the objective of predicting which customers are likely to churn in the next 3 months. By estimating this probability, we want to focus (through offers, promotions or improved customer relationships) on customers who have a high likelihood of churning out.
Though we already had a model for this, its accuracy was not helping us much. So, we developed a new algorithm that combines two machine learning models and gives us better accuracy (82.7%) than the previous model (71.1%). For developing this new model, we used internal customer data and enriched it with external social media and demographic data. The overall impact of this model is estimated to be around USD 75,000 per quarter.”
Which approach would you choose if you were to explain the engagement to your CEO and CFO?
Approach 1? Hell, No…
Approach 2? No…
Approach 3? Umm… Maybe
Approach 4? Yes, certainly.
Now, one can always argue that approach 4 is nothing but providing complete details about the project – background, data, algorithm, output and impact.
That’s what storytelling is.
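To see why the impact numbers land better with a CEO or CFO than the accuracy alone, here is a minimal sketch that turns the figures quoted in Approach 4 into two headline numbers. The variable names are only illustrative, and it assumes the estimated impact can be compared directly with the quarterly revenue loss, which the example does not state explicitly.

```python
# Figures quoted in Approach 4; variable names are illustrative only.
quarterly_loss_usd = 100_000    # revenue lost to churn per quarter
estimated_impact_usd = 75_000   # estimated impact of the new model per quarter
old_accuracy = 71.1             # previous model, in percent
new_accuracy = 82.7             # new ensemble, in percent

# Accuracy improvement in percentage points (rounded to avoid float noise).
accuracy_gain_points = round(new_accuracy - old_accuracy, 1)

# Assumption: the impact figure is directly comparable to the loss figure.
impact_share = estimated_impact_usd / quarterly_loss_usd

print(f"Accuracy gain: {accuracy_gain_points} percentage points")
print(f"Impact relative to quarterly loss: {impact_share:.0%}")
```

The first number speaks to a technical audience; the second is the one a business audience is likely to remember.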
Five Key Aspects of Storytelling
One of the most important things to consider while providing all these details is your ‘audience.’ Understand who your audience is. If your audience is the Chief Data Scientist, you should go into the technicalities of the model; in this case, however, our audience is the CEO and CFO. They are unlikely to care much about the underlying model; they are more concerned about the revenue and cost impact it will have on the business.
I will define storytelling as “presenting different aspects of a project in such a manner that they are exhaustive, coherent, succinct and audience-friendly.”
Now, let’s understand the different aspects of a data science project and learn how to present them in your story.
Any data science or data analysis project essentially covers five aspects: background, data, algorithm, output and impact.
Can you relate this to ‘Approach 4’ mentioned above? Does any other approach cover all five aspects?
No, right?
When you are working on large datasets and your aim is to develop a model that improves accuracy, it is not uncommon to lose sight of the larger picture. That’s why, while presenting your results, you should keep the above framework in mind and build your entire story around it.
Your model may be among the best in the world, but if you can’t convince business users of its value, they will always have apprehensions about implementing it. So, you should give due attention to the first and last aspects of the above framework: ‘Background’ and ‘Impact’. These two aspects will help business users understand the criticality and importance of your project.