In this next installment of the Data Science Maturity Model (DSMM) dimension discussion, I focus on 'asset management':
How are data science assets managed and controlled?
Assets are things of value, whether tangible or intangible. For this discussion, we treat the array of data science work products as assets and define 'asset management' at a high level as "any system that monitors and maintains things of value to an entity or group." As introduced earlier in this blog series, work products include data (raw and transformed), data visualization plots and graphs, requirements and design specifications, code written as R / Python / SQL / other scripts directly or in web-based notebooks (e.g., Zeppelin, Jupyter), predictive models, and virtual machine / container images, among others. In this context, asset management should cover the full asset life cycle, from creation to retirement. Throughout the life cycle, the enterprise must address asset storage / backup / recovery, metadata-based search and retrieval, security (e.g., privilege-based access control, auditability), versioning, archiving, and lineage; in short, governance. Specific to data science is the need for model management, which encompasses model life cycle, governance, repeatability, monitoring, and reporting.
The five maturity levels of the 'asset management' dimension are:
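To make a few of these governance needs concrete, here is a minimal sketch of an in-memory asset catalog that supports versioning and metadata-based search. It is illustrative only: the `Asset` and `AssetCatalog` names, their fields, and the sample assets are all hypothetical, and a real enterprise would rely on a proper repository with backup, access control, and auditing.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical metadata kept for one version of a work product."""
    name: str
    kind: str                      # e.g. "data", "notebook", "model"
    owner: str
    version: int = 1
    tags: set = field(default_factory=set)


class AssetCatalog:
    """Toy catalog: keeps every version and searches the latest by metadata."""

    def __init__(self):
        self._versions = {}        # asset name -> list of Asset, oldest first

    def register(self, name, kind, owner, tags=()):
        # Registering an existing name creates the next version
        # instead of overwriting the previous one.
        history = self._versions.setdefault(name, [])
        asset = Asset(name, kind, owner, version=len(history) + 1, tags=set(tags))
        history.append(asset)
        return asset

    def latest(self, name):
        return self._versions[name][-1]

    def search(self, kind=None, tag=None):
        # Match the latest version of each asset against the given filters.
        hits = []
        for history in self._versions.values():
            asset = history[-1]
            if kind is not None and asset.kind != kind:
                continue
            if tag is not None and tag not in asset.tags:
                continue
            hits.append(asset)
        return hits


# Made-up example assets, for illustration only.
catalog = AssetCatalog()
catalog.register("sales_raw", "data", "ana", tags={"sales"})
catalog.register("churn_model", "model", "raj", tags={"churn"})
catalog.register("churn_model", "model", "raj", tags={"churn", "retrained"})

print([a.name for a in catalog.search(kind="model")])
print(catalog.latest("churn_model").version)
```

Even a toy like this shows why metadata matters: once owner, kind, and tags are recorded at registration time, work products stop being "hidden" and become searchable.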
Level 1: Analytical work products owned, organized, and maintained by individual data science players.
Data science players at Level 1 enterprises are essentially 'winging it', taking an ad hoc approach to asset management. Players are responsible for maintaining their own data science work products, typically on their local machines, which may or may not be backed up or secure. Asset loss and an inability to reproduce results are not uncommon. Across the enterprise, data science work products are "hidden" on individual machines, with no effective way to search for them.
Level 2: Initial efforts underway to provide security, backup, and recovery of data science work products.
The Level 2 enterprise recognizes the need to manage data science work products. This typically begins with organization-based repositories that provide storage with backup and recovery to reduce asset loss, as well as security to control access.
Level 3: Data science work product governance is being addressed systematically.
The Level 3 enterprise begins to see data science work products as an important corporate asset. As such, tools and procedures are introduced to centrally manage assets throughout their life cycle. As the enterprise expands its data science effort with machine learning models, the need for model management also gains visibility. The enterprise also increasingly recognizes the need to determine which data and processes produced a given work product, and takes steps to answer basic questions definitively, e.g., on what is this result based?
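One simple way to start answering "on what is this result based?" is to record a content fingerprint of the data and code behind each work product. The sketch below assumes this hashing approach for illustration; the `LineageRecord` class, the `record_lineage` helper, and the sample inputs are hypothetical and not part of any particular tool.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(content: bytes) -> str:
    """Return a short SHA-256 digest identifying an asset's exact content."""
    return hashlib.sha256(content).hexdigest()[:12]


@dataclass
class LineageRecord:
    """Hypothetical record linking a work product to what produced it."""
    product: str
    data_fingerprints: dict        # input name -> content fingerprint
    code_fingerprint: str


def record_lineage(product, inputs, code):
    # Fingerprint every input and the code so the result can be traced later.
    return LineageRecord(
        product=product,
        data_fingerprints={name: fingerprint(blob) for name, blob in inputs.items()},
        code_fingerprint=fingerprint(code),
    )


# Made-up data and script contents, for illustration only.
raw = b"id,spend\n1,20\n2,35\n"
script = b"model = fit(spend ~ id)"

rec = record_lineage("churn_model_v1", {"customers.csv": raw}, script)
changed = record_lineage("churn_model_v1", {"customers.csv": raw + b"3,50\n"}, script)

# Same code, different data: the data fingerprints no longer match.
print(rec.code_fingerprint == changed.code_fingerprint)
print(rec.data_fingerprints == changed.data_fingerprints)
```

Storing such a record next to each result makes it possible to tell whether a result can still be reproduced from the data and code currently on hand.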
Level 4: Data science work product governance is firmly established at the enterprise level with increasing support for model management.
The Level 4 enterprise has adopted best practices for data science work product governance. Data science players, as well as the overall enterprise, reap productivity gains by being able to easily locate, execute, reproduce, and enhance project content. The question of "how was this result produced and on what data?" can readily be answered.
Level 5: Systematic management of all data science work products with full support for model management.
The Level 5 enterprise surpasses the Level 4 enterprise by having introduced tools and procedures that fully support model management. As data science projects are deployed, their outcomes are fully monitored, with reporting on the value provided to the enterprise. Such outcomes are factored back into the project, forming a closed loop that ensures data science projects continue to provide value based on current, relevant data and trends.
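As a rough illustration of the closed loop, the sketch below compares a deployed model's recent predictions with observed outcomes and flags the model for retraining when accuracy falls too far below its baseline. The `needs_retraining` function, the accuracy metric, the 0.05 tolerance, and the sample values are all assumptions made for this example; real monitoring would use metrics and thresholds suited to the project.

```python
def needs_retraining(predictions, outcomes, baseline_accuracy, tolerance=0.05):
    """Return (accuracy, flag); flag is True when accuracy has degraded.

    Hypothetical helper: accuracy is the share of predictions that match
    the observed outcomes, and the model is flagged when accuracy drops
    more than `tolerance` below the accuracy measured at deployment.
    """
    if not predictions or len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must be non-empty and equal in length")
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    accuracy = correct / len(predictions)
    return accuracy, accuracy < baseline_accuracy - tolerance


# Made-up monitoring window: recent predictions versus what actually happened.
recent_predictions = [1, 0, 1, 1, 0, 1, 0, 0]
observed_outcomes = [1, 0, 0, 1, 1, 1, 0, 1]

accuracy, retrain = needs_retraining(
    recent_predictions, observed_outcomes, baseline_accuracy=0.80
)
print(accuracy, retrain)
```

In a Level 5 setting, a flag like this would feed back into the project, for example by triggering retraining on current data and reporting the change in value delivered.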
In my next post, I'll cover the 'tools' dimension of the Data Science Maturity Model.