
Best Practice in Data Science – a view of Docker production

by cswindell · September 14, 2021

This article is originally published at https://www.mango-solutions.com

Best practice in data science can lead to long-lived business results. A structure that encourages repeatable processes for generating value from data leads to a fully productive team that delivers reproducible results time and time again. When this process is ingrained in a company’s culture, and the business and data teams work together in line with the business goals, the value of data can be realised through a centre of excellence and a shared language for best practice.

A shared language of best practice

Layers of operational best practice allow a standard way of working to be adopted, helping to secure the best possible outcome from your data science investment. For a data science team, best practices could relate to developing models, structuring analysis, quality standards or how a project is delivered. They could also cover the selection of your data and analysis tools, as these can easily affect the success of your project.

With data science teams coming from a diverse range of backgrounds and experiences, what may be obvious to one person can be a novelty to another. A shared language of best practice allows collaborators to focus on the all-important value generated. A workflow that adheres to best practice ensures quality, from the business value of insights to the accuracy of models. Best practices take the guesswork out, minimise mistakes and create a platform for future success.

4 best practices every data delivery team should focus on:

Reproducibility – Whatever the task is, if your results can’t be repeated, is it really done?
Robustness – Results and quality of analysis can have a huge impact. Best practices that include checks and balances lead to better quality.
Collaboration – What use are your results if they are difficult to share? Standards for collaboration help business value to be attained.
Automation – It is very easy to do work with no automation, but frameworks for automation can help accelerate teams.
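
As a rough illustration of the reproducibility and automation points above, the following Python sketch runs a toy analysis twice with the same seed and checks automatically that both runs agree. The function names (run_analysis, fingerprint) and the toy data are hypothetical and only serve as an example; they are not taken from the talk or from any Mango tooling.

```python
import hashlib
import json
import random


def run_analysis(seed: int) -> dict:
    """Toy stand-in for an analysis step: the mean of a seeded random resample."""
    rng = random.Random(seed)  # local generator, so global random state cannot leak in
    data = [rng.gauss(10.0, 2.0) for _ in range(1_000)]
    sample = [rng.choice(data) for _ in range(200)]
    return {"n": len(sample), "mean": round(sum(sample) / len(sample), 6)}


def fingerprint(result: dict) -> str:
    """Hash a result so that two runs can be compared without manual inspection."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Run the same analysis twice and let the script decide whether it reproduced.
first = fingerprint(run_analysis(seed=42))
second = fingerprint(run_analysis(seed=42))
assert first == second, "same seed and same code should give the same result"
```

A check like this only covers repeatability on one machine with one set of installed packages. Keeping the interpreter and library versions identical across colleagues and servers is a separate problem, and it is the kind of gap that containers such as Docker are intended to close.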

Best practice in Dockerisation

My talk at the Big Data London Meet Up, ‘How Docker can help you become more reproducible’, takes one element of best practice in data science and focuses on Dockerisation, which is proving to be a powerful tool, one that is already turning established best practices in teams on their head. Docker allows teams to collaborate much more easily, to be much more reproducible and to automate workflows. Yet it has not had as much adoption within data science as it has within software engineering. My talk will explore just how Docker can supercharge your workflow and your valuable use cases.

This talk will be of interest to any data scientist who has had trouble deploying, working with engineering teams or reproducing colleagues’ analysis. It will also be of interest to anyone wanting to know how Docker can scale a team. The talk aims to make Docker less intimidating and to arm practitioners with the tools to give it a go.

I look forward to seeing you at Mango’s Big Data London Meet Up, 22nd September, 6-8pm, Olympia AI & MLOPS Theatre. You can sign up here.

Kapil Patel is one of Mango’s Data Science Consultants.

The post Best Practice in Data Science – a view of Docker production appeared first on Mango Solutions.

