In this next installment of the Data Science Maturity Model (DSMM) dimension discussion, I focus on 'deployment':
How easily can data science work products be placed
into production to meet timely business objectives?
Data science comes with the expectation that amazing insights and predictions will transform the business and take the enterprise to a new level of performance. Too often, however, data science projects fail to "lift off," resulting in significant opportunity cost for the enterprise. A data scientist may produce a predictive model with high accuracy, but if the model's scores are not effectively put into production (that is, deployed), or if deployment is significantly delayed, the desired gains are not realized.
A more general definition of 'deployment' that seems relevant in this discussion is "the action of bringing resources into effective action." The resources in this context refer to data science work products such as machine learning models, visualizations, statistical analyses, etc. Effective action is to deliver these resources in a way that they provide business benefit: timely insights presented in interactive dashboards, predictions affecting which actions enterprises will undertake with respect to customers, employees, assets, etc.
For data science in general, and machine learning in particular, much of the deployment mechanism, or plumbing, is the same across projects. Yet enterprises often find individual projects reinventing deployment infrastructure: logic for data access, for spawning separate analytic engines, and for recovery, along with rigorous testing that is often missing. Leveraging tools that provide such plumbing can greatly reduce overhead and risk in deploying data science projects.
The 5 maturity levels of the "deployment" dimension are:
Level 1: Data science results have limited reach and hence provide limited business value.
At Level 1 enterprises, results from data science projects often take the form of insights documented in slide presentations or textual reports. Data analyses, visualizations, and even predictive models may provide guidance for human decision making, but such results must be manually conveyed on a per-project basis.
Level 2: Production model deployment is seen as valuable, but often involves reinventing infrastructure for each project.
In Level 2 enterprises, the realization that machine learning models can and should be leveraged in front-line applications and systems takes hold. Some insights may be explicitly coded into application or dashboard logic. However, the time between model creation and deployment can significantly impact model accuracy. This deployment latency matters because, the longer it lasts, the more the patterns in the data used for model building can diverge from the current data used for scoring. Moreover, manually re-coding a model for easier integration with existing applications or dashboards (for example, typing predictive model coefficients into C, Java, or even SQL for scoring) takes developer time and can introduce coding errors that only rigorous code reviews and testing can reveal. As a result, enterprises incur costs for data science projects, but do not fully realize potential project benefits.
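To make the risk of hand-coded scoring concrete, here is a minimal Python sketch of the kind of parity test that can catch a transcription error. Everything in it is an assumption for illustration: the feature names, the coefficient values, and the function names are hypothetical and do not come from any particular tool.

```python
import math

# Hypothetical model specification, as a data scientist might hand it over.
REFERENCE_MODEL = {
    "intercept": -1.5,
    "coefficients": {"tenure_months": -0.04, "support_calls": 0.30},
}


def reference_score(model, row):
    """Score a row directly from the model specification (logistic regression)."""
    z = model["intercept"] + sum(
        weight * row[name] for name, weight in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def hand_coded_score(row):
    """The same model re-typed by hand into application logic."""
    z = -1.5 + (-0.04 * row["tenure_months"]) + (0.30 * row["support_calls"])
    return 1.0 / (1.0 + math.exp(-z))


def max_score_gap(rows, scorer, model=REFERENCE_MODEL):
    """Largest absolute difference between the reference and a re-coded scorer."""
    return max(abs(reference_score(model, row) - scorer(row)) for row in rows)


# Hypothetical sample rows used only for the parity check.
SAMPLE_ROWS = [
    {"tenure_months": 3, "support_calls": 4},
    {"tenure_months": 48, "support_calls": 0},
]

if __name__ == "__main__":
    print("max gap:", max_score_gap(SAMPLE_ROWS, hand_coded_score))
```

A check like this only protects a project if someone writes and maintains it for every hand-coded model, which is part of the per-project overhead described above.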
Level 3: Enterprise begins leveraging tools that provide simplified, automated model deployment, inclusive of open source software and environments.
As more data science projects are undertaken, the Level 3 enterprise realizes that one-off deployment approaches waste valuable development resources, incur deployment latency that reduces model effectiveness, and increase project risk. In today's internet-enabled world, patterns in data, e.g., customer preferences, can change overnight, requiring enterprises to have greater agility to build, test, and deploy models using the latest data. Enterprises at Level 3 begin to leverage tools that provide the needed infrastructure to support simplified and automated model deployment.
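One way to notice that patterns in the data have shifted since a model was built is to compare the distribution of a feature at training time with its distribution today. The sketch below uses the population stability index (PSI), a common drift measure that I am substituting here for illustration; the post itself does not prescribe any particular measure. The bin edges, the small floor that avoids taking the log of zero, and the sample values are all assumptions.

```python
import math


def population_stability_index(baseline, current, bin_edges, floor=1e-6):
    """PSI between two samples of one numeric feature, using shared bin edges.

    PSI = sum over bins of (actual_share - expected_share) * ln(actual_share / expected_share).
    Empty bins are floored at a small share so the logarithm is defined.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")

    def bin_shares(values):
        counts = [0] * (len(bin_edges) + 1)
        for value in values:
            # The bin index is the number of edges at or below the value.
            index = sum(1 for edge in bin_edges if value >= edge)
            counts[index] += 1
        return [max(count / len(values), floor) for count in counts]

    expected = bin_shares(baseline)
    actual = bin_shares(current)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


if __name__ == "__main__":
    training_sample = [1, 2, 3, 4]  # hypothetical values seen at model build time
    todays_sample = [3, 4, 5, 6]    # hypothetical values seen at scoring time
    print(population_stability_index(training_sample, todays_sample, [2.5, 4.5]))
```

A value of zero means the binned distributions match, and larger values mean more divergence. What counts as "too much" is a judgment each enterprise has to make for its own models.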
Level 4: Increased heterogeneity of enterprise systems requires cross-platform model deployment, with a growing need to incorporate models into streaming data applications.
The Level 4 enterprise has a combination of database, Hadoop, Spark, and other platforms for managing data and computation. Increasingly, the enterprise needs models and scripts produced in one environment to be deployed in another. This increases the need for tools that enable exporting models for use in a scoring engine library that can be easily integrated into applications. Level 4 enterprises seek tools that facilitate script and model deployment in real-time or streaming analytics situations as they begin to use data science results involving fast data.
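The idea of exporting a model from one environment and scoring it in another can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any specific product: the JSON layout, the `export_model` function, and the `ScoringEngine` class are hypothetical, and real deployments often rely on established interchange formats such as PMML or ONNX instead.

```python
import json
import math


def export_model(intercept, coefficients):
    """Serialize a logistic regression model to a platform-neutral JSON string."""
    spec = {
        "type": "logistic_regression",
        "intercept": intercept,
        "coefficients": coefficients,
    }
    return json.dumps(spec, sort_keys=True)


class ScoringEngine:
    """Minimal scorer that depends only on the exported specification."""

    def __init__(self, exported):
        spec = json.loads(exported)
        if spec["type"] != "logistic_regression":
            raise ValueError("unsupported model type: " + spec["type"])
        self._intercept = spec["intercept"]
        self._coefficients = spec["coefficients"]

    def score(self, row):
        """Return the predicted probability for one row (a dict of features)."""
        z = self._intercept + sum(
            weight * row[name] for name, weight in self._coefficients.items()
        )
        return 1.0 / (1.0 + math.exp(-z))

    def score_stream(self, rows):
        """Lazily score an iterable of rows, standing in for a streaming source."""
        for row in rows:
            yield self.score(row)


if __name__ == "__main__":
    # The exporting side and the scoring side share only the JSON string.
    exported = export_model(0.0, {"x": 1.0})
    engine = ScoringEngine(exported)
    print(list(engine.score_stream([{"x": 0.0}, {"x": 2.0}])))
```

Because the scoring side needs nothing but the exported string, the same model can be embedded in an application, a batch job, or a stream processor without re-typing coefficients by hand.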
Level 5: Enterprise has realized benefits of immediate data science work product (re)deployment across heterogeneous environments.
The Level 5 enterprise has adopted a standard set of tools to support deployment of data science work products across all necessary environments. Machine learning models and scripts created in one environment can immediately be deployed and refreshed (redeployed) with minimal latency.
In my next post, I'll provide a summary Data Science Maturity Model table and a corresponding spreadsheet to aid enterprises in conducting a DSMM assessment.