Data Strategy to Leverage Artificial Intelligence in Corporations
Authors: Daniel Dippold, Miranda Zachopoulou
Companies often approach us wanting to “tap into the power of AI”. Almost as often, we find that their requests are either too generic or too specific. Some companies have a general interest in exploring AI use cases, while others want specific problems solved. The former group often focuses on AI use cases with low impact, while the latter group often misses out on the broader potential of AI beyond a single application.
Moreover, among all the different terms such as data science, data engineering, machine learning, artificial intelligence, natural language processing, deep learning and many more, decision makers struggle to get their focus right. Even though all these terms are defined in theory, studying mere terminology is not what we would recommend to knowledge workers and executives in the 21st century. Instead, companies should focus on how far they have progressed along the artificial intelligence value chain.
Before the full value of AI can be unlocked, a set of prerequisites needs to be in order. The artificial intelligence value chain is designed for companies that are not AI-first, i.e. are working on solving problems unrelated to AI. For example, BMW is primarily concerned with building cars — autonomous driving is only one of the many AI capabilities that help BMW create more value for the customer.
We define the chain as follows:
- Data Strategy. Understand which data assets are most relevant in your company, identify which already exist and which have to be created in the future. Link the data assets to your key business metrics, such as revenue, costs, customer satisfaction, market share, etc.
- Data Engineering. Clean and preprocess your data to ensure that a meaningful link can be made during the next step, data analytics. Moreover, investigate the potential of combining different data assets.
- Data Analytics. Analyse the relationships between your data sources and your key business metrics. Only meaningful relationships should be explored further.
- Machine Learning. A) Create in-depth, non-linear and multivariate insights through machine learning and B) build powerful machine learning applications to improve business cases.
- Ensembles. Combine different (machine learning) models for maximum performance.
- Learn & Repeat. Take stock of the results, adjust your data strategy and restart the process.
While it is possible to begin the AI value chain journey at any point, the benefits will be much greater if executives, or in the best case the entire executive board, work through the process in order, that is, they start with data strategy and approach the topic of artificial intelligence methodically.
It is important to note that a company faces an exploration-exploitation trade-off when going through this process. In order to capitalise fully on machine learning applications (exploitation), enough time should be spent on evaluating which models are the best fit (exploration). Neither exploring too much nor exploiting too quickly is advisable: the two phases need to be balanced. Beyond balance, it is important to keep an explorative attitude until the end of the data analytics phase, as many insights are only generated then. We recommend that executives stay involved until this stage and make sure that data analytics is in line with strategic priorities and key business metrics.
In the following paragraphs, we want to shed light on each of the steps and walk you through an example of how we experienced this process with one of the biggest financial institutions world-wide.
1 Data Strategy
Before thinking about artificial intelligence, we recommend that a company first thinks about its data assets, as data is the cornerstone of most artificial intelligence algorithms. One should ask: which valuable data do I possess, and how is it linked to my key business outcomes? A retail bank, for example, possesses data about clients and their product usage, its own employees, financial metrics of branches and departments, recorded calls, complaints, written emails, etc. Thinking about all those assets holistically, a retail bank might conclude that customer data is clearly the most valuable and that it enables better decisions about customers: who gets a loan, which cluster of customers is most likely to buy which additional product, who is most susceptible to marketing material, etc. Linking a data asset with a business metric is what we call a data opportunity. Once a list of all major data opportunities has been created, we recommend investigating them on a two-dimensional plot.
The plot needs only two axes: technical feasibility (i.e. is it possible to build the technology and, if so, how quickly can it be built?) and business viability (i.e. how likely is it to generate monetary and strategic value?).
Once all data opportunities have been mapped, we recommend applying a threshold. Note that our threshold line is deliberately steeper than a typical 45° line. In our experience, most companies underestimate the difficulty of building and integrating real-life AI applications. Technologies that are highly strategic but hardly feasible often require companies to shift their entire focus towards developing a strong AI capacity. This is culturally, monetarily, and strategically very difficult, as top talent in the AI space is incredibly rare and very expensive. Moreover, we recommend that companies consult an expert in the field, especially when investigating the technical feasibility of data opportunities. This can be a consultancy or a freelancer, but the person(s) involved in the process should have a background in building and deploying machine learning applications.
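As a rough illustration of such a threshold, the sketch below scores each data opportunity and keeps only those above a line that weights feasibility more heavily than viability. The 0 to 1 scores, the opportunity names, the weight of 2.0 and the cutoff of 1.5 are all hypothetical values chosen for the example, and we assume feasibility sits on the horizontal axis, so that a weight above 1 corresponds to a line steeper than 45°.

```python
from dataclasses import dataclass


@dataclass
class DataOpportunity:
    name: str
    feasibility: float  # hypothetical 0..1 score for technical feasibility
    viability: float    # hypothetical 0..1 score for business viability


def weighted_score(opp, feasibility_weight=2.0):
    # A weight above 1 makes the threshold line steeper than 45 degrees:
    # low feasibility is hard to offset with high viability.
    return feasibility_weight * opp.feasibility + opp.viability


def passes_threshold(opp, feasibility_weight=2.0, cutoff=1.5):
    # Assumed linear threshold; weight and cutoff are illustrative only.
    return weighted_score(opp, feasibility_weight) >= cutoff


def prioritise(opportunities, feasibility_weight=2.0, cutoff=1.5):
    # Keep the opportunities above the line, best score first.
    kept = [o for o in opportunities
            if passes_threshold(o, feasibility_weight, cutoff)]
    return sorted(kept,
                  key=lambda o: weighted_score(o, feasibility_weight),
                  reverse=True)


# Made-up example opportunities for a retail bank.
opportunities = [
    DataOpportunity("loan approval scoring", 0.8, 0.7),
    DataOpportunity("cross-sell clustering", 0.6, 0.6),
    DataOpportunity("fully automated advisory", 0.2, 0.9),
]

for opp in prioritise(opportunities):
    print(opp.name)
```

In this made-up example, the highly strategic but hardly feasible opportunity falls below the line, while the two more feasible ones remain on the shortlist.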
2 Data Engineering
Once a data opportunity has been prioritised, compiling and preprocessing the relevant data is crucial. We recommend that companies moving from data strategy to data engineering address each of the following points:
- Is the data quality sufficient for exploiting the data opportunity?
- Is the data quantity sufficient for exploiting the data opportunity?
- Can data assets be linked through unique identifiers? If not, can additional code be generated to make data assets linkable?
For example, a large retail bank might notice at this stage that customer data sets concerning complaints, product usage, customer satisfaction surveys, internal CRM, profitability, and call centre activity, despite being stored in different locations, can all be merged into one master data table that generates multiple insights at once. It is also quite possible that larger gaps in the data are identified at this stage.
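The merge into a master table can be sketched as follows. This is a minimal illustration under stated assumptions: the identifier is called `customer_id`, each source is a list of plain dictionaries, and the source and field names are invented for the example. Records without an identifier are counted per source, which gives a first impression of the gaps in the data.

```python
from collections import defaultdict


def build_master_table(sources, key="customer_id"):
    """Merge several data sources into one row per customer.

    sources maps a source name to a list of dict records.
    Returns (master, unlinked), where unlinked counts the records
    per source that lack the identifier and cannot be linked.
    """
    master = defaultdict(dict)
    unlinked = defaultdict(int)
    for source_name, records in sources.items():
        for record in records:
            customer_id = record.get(key)
            if customer_id is None:
                unlinked[source_name] += 1
                continue
            for field, value in record.items():
                if field != key:
                    # Prefix with the source name to avoid field clashes.
                    # A later record for the same customer overwrites
                    # an earlier one from the same source.
                    master[customer_id][f"{source_name}.{field}"] = value
    return dict(master), dict(unlinked)


# Hypothetical sources and fields.
sources = {
    "complaints": [
        {"customer_id": 1, "count": 2},
        {"customer_id": None, "count": 1},  # cannot be linked
    ],
    "product_usage": [
        {"customer_id": 1, "products": 3},
        {"customer_id": 2, "products": 1},
    ],
}

master, unlinked = build_master_table(sources)
print(master)
print(unlinked)
```

In practice, a data frame library or a database join would usually do this work. The point of the sketch is only that a shared unique identifier is what makes the data assets linkable.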
3 Data Analytics
Now, data opportunities can finally be put into practice: at this stage, meaningful relationships can be shown. The data scientists in your company should draw on a variety of data analytics tools, ranging from ordinary statistics to advanced data visualisation techniques. In our retail bank example, we could show that customer lifetime value varies greatly and follows a power-law distribution, that is, most of the revenue comes from a few customers. Moreover, only 40% of the people who start opening a bank account complete the process. From market research, we know that those who drop out have likely settled for another bank. This, of course, is just one of many insights generated during an exploratory data analysis. From those insights, though, we can reasonably conclude that the customer population can be summarised in clusters with different likelihoods of completion and different customer lifetime values.
Furthermore, the investigation invites more specific data analysis: we might start different explanatory data analyses to show, for example, the impact of certain interventions such as a pop-up window, a call from the call centre or a targeted email. Thanks to the generated insights, we can specifically target customers who are more susceptible to interventions and therefore achieve higher customer lifetime values. Such a system is implemented during the next step: machine learning.
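Two of the exploratory checks described above can be sketched in a few lines: how concentrated customer lifetime value is, and how the completion rate differs between customer segments. All figures and segment names below are invented for illustration and are not taken from the bank example.

```python
def top_share(values, top_fraction=0.2):
    """Share of the total contributed by the top fraction of customers."""
    ordered = sorted(values, reverse=True)
    top_n = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:top_n]) / sum(ordered)


def completion_rate_by_segment(applications):
    """applications: iterable of (segment, completed) pairs."""
    totals = {}
    for segment, completed in applications:
        done, count = totals.get(segment, (0, 0))
        totals[segment] = (done + int(completed), count + 1)
    return {seg: done / count for seg, (done, count) in totals.items()}


# Hypothetical customer lifetime values with a heavy right tail.
lifetime_values = [5000, 1200, 400, 150, 100, 60, 40, 25, 15, 10]

# Hypothetical account-opening attempts: (segment, completed?).
applications = [
    ("student", True), ("student", False),
    ("student", False), ("student", False),
    ("professional", True), ("professional", True),
    ("professional", True), ("professional", False),
]

print(f"Top 20% share of lifetime value: {top_share(lifetime_values):.0%}")
print(completion_rate_by_segment(applications))
```

A high top-20% share is only a quick indication of concentration, not a test for a power law. Confirming the distribution would require a proper fit.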
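The impact of an intervention can be estimated, in its simplest form, as the difference in completion rate between customers who received it and customers who did not. The sketch below assumes that the intervention was assigned at random. Without that, the difference cannot be read as a causal effect. The data is invented for illustration.

```python
def uplift(records):
    """Completion rate of treated customers minus that of the control group.

    records: iterable of (received_intervention, completed) boolean pairs.
    """
    counts = {True: [0, 0], False: [0, 0]}  # [completed, total] per group
    for treated, completed in records:
        counts[treated][0] += int(completed)
        counts[treated][1] += 1
    if counts[True][1] == 0 or counts[False][1] == 0:
        raise ValueError("need at least one treated and one control record")
    treated_rate = counts[True][0] / counts[True][1]
    control_rate = counts[False][0] / counts[False][1]
    return treated_rate - control_rate


# Hypothetical experiment: 10 customers received a targeted email
# (6 completed), 10 did not (4 completed).
records = (
    [(True, True)] * 6 + [(True, False)] * 4
    + [(False, True)] * 4 + [(False, False)] * 6
)

print(f"Estimated uplift: {uplift(records):.2f}")
```

Running the same calculation per customer cluster gives a first view of which clusters are more susceptible to a given intervention.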
4 Machine Learning
At this point, the data scientists should put forward recommendations to the management of the company, who will in turn decide on the measures to be taken. Now, a variety of machine learning applications can be implemented, and it should be possible to estimate the rough monetary and strategic value of implementing them. We recommend that a company hires skilled machine learning engineers who can help implement the chosen machine learning models. There is a certain overlap between the data analytics phase and the machine learning phase: the former investigates which machine learning models might be most useful, and the latter focuses on their implementation.
In our retail bank example, we would use machine learning models to define optimal multi-dimensional clusters of customers based on a two-dimensional optimisation problem: splitting by susceptibility to interventions and by customer lifetime value. The cluster with the highest average customer lifetime value will then be targeted by interventions. In our experience, it is plausible that a bank doubles its profit from new customers with such methods.
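To make the two-dimensional split concrete, the sketch below replaces the machine learning clustering with a deliberately simple stand-in: a median split on each dimension, which yields four segments. It then picks, among the segments that respond to interventions, the one with the highest average customer lifetime value. The field names `susceptibility` and `clv` and all numbers are hypothetical. A real implementation would use a proper clustering or uplift model.

```python
from statistics import mean, median


def segment_customers(customers):
    """Split customers into four segments by median susceptibility and CLV."""
    s_cut = median(c["susceptibility"] for c in customers)
    v_cut = median(c["clv"] for c in customers)
    segments = {}
    for c in customers:
        key = (
            "high" if c["susceptibility"] > s_cut else "low",
            "high" if c["clv"] > v_cut else "low",
        )
        segments.setdefault(key, []).append(c)
    return segments


def pick_target_segment(segments):
    """Among susceptible segments, return the key with the highest mean CLV."""
    susceptible = {k: v for k, v in segments.items() if k[0] == "high"}
    if not susceptible:
        raise ValueError("no segment with above-median susceptibility")
    return max(susceptible,
               key=lambda k: mean(c["clv"] for c in susceptible[k]))


# Hypothetical customers; susceptibility is an estimated response
# to interventions, clv the estimated customer lifetime value.
customers = [
    {"id": "a", "susceptibility": 0.9, "clv": 5000},
    {"id": "b", "susceptibility": 0.8, "clv": 200},
    {"id": "c", "susceptibility": 0.1, "clv": 6000},
    {"id": "d", "susceptibility": 0.2, "clv": 100},
]

segments = segment_customers(customers)
target = pick_target_segment(segments)
print(target, [c["id"] for c in segments[target]])
```

Customer "c" has the highest lifetime value in this made-up data but barely responds to interventions, so it is not in the targeted segment. This is why both dimensions are needed.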
5 Ensembles | 6 Learn & Repeat
One might go even further and combine different machine learning models for optimal performance. Especially when one deals with millions of data points and improvements at the level of decimal places translate into significant money, we recommend exploring different advanced machine learning models and combining them to enhance their predictive performance.
Moreover, during the implementation of machine learning models and the exploratory and explanatory data analyses, multiple unique insights are generated. These insights might invite the exploration of new data opportunities and the correction of existing ones. In our experience, the second round of the NEWNOW Strategic Artificial Intelligence Process is even more effective than the first one. As more data has been explored and the team has a deeper understanding of the intricacies of the company, more effective models can be built.
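One of the simplest ways to combine models is to average their predictions. The sketch below uses two hypothetical models whose errors fall on different data points. The predictions are invented, and averaging is only one of several ensemble techniques.

```python
def ensemble_average(predictions_per_model):
    """Average the predictions of several models, point by point."""
    return [sum(p) / len(p) for p in zip(*predictions_per_model)]


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical outcomes (1.0 = customer completed) and model predictions.
y_true = [1.0, 0.0, 1.0, 0.0]
model_a = [0.9, 0.4, 0.6, 0.1]  # weak on the 2nd and 3rd customer
model_b = [0.6, 0.1, 0.9, 0.4]  # weak on the 1st and 4th customer

combined = ensemble_average([model_a, model_b])

print("model A :", mean_squared_error(y_true, model_a))
print("model B :", mean_squared_error(y_true, model_b))
print("ensemble:", mean_squared_error(y_true, combined))
```

Here the averaged predictions have a lower squared error than either model alone, because the two models err in different places. When the models make the same mistakes, averaging gains little, so the benefit should be verified on held-out data.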
When this process is done diligently, we have always seen a positive return on investment on machine learning models. We recommend tracking and extrapolating the return on investment in order to plan and allocate sufficient resources for the next machine learning projects. Once a strong data strategy has been set and a machine learning capability has been built up, significant competitive advantages can be generated by doubling down on the technology.
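Tracking the return on investment can start with very simple bookkeeping. The sketch below computes the return per project and for the whole portfolio, and makes a naive projection for the next budget. The project names and figures are invented, and the projection assumes that the past return carries over unchanged, which is a strong simplification.

```python
def roi(gain, cost):
    """Return on investment as a fraction: 1.5 means a 150% return."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


def portfolio_roi(projects):
    """projects: iterable of (name, cost, gain) tuples."""
    total_cost = sum(cost for _, cost, _ in projects)
    total_gain = sum(gain for _, _, gain in projects)
    return roi(total_gain, total_cost)


def projected_gain(budget, expected_roi):
    # Naive extrapolation: assumes the next project earns the same return.
    return budget * (1 + expected_roi)


# Hypothetical projects: (name, cost, gain).
projects = [
    ("onboarding interventions", 120_000, 300_000),
    ("cross-sell model", 80_000, 100_000),
]

for name, cost, gain in projects:
    print(f"{name}: ROI {roi(gain, cost):.0%}")

overall = portfolio_roi(projects)
print(f"portfolio: ROI {overall:.0%}")
print(f"projected gain on a 50,000 budget: {projected_gain(50_000, overall):,.0f}")
```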