Advent of 2023, Day 9 – Building custom environments
This article is originally published at https://tomaztsql.wordpress.com
In this Microsoft Fabric series:
- Dec 01: What is Microsoft Fabric?
- Dec 02: Getting started with Microsoft Fabric
- Dec 03: What is lakehouse in Fabric?
- Dec 04: Delta lake and delta tables in Microsoft Fabric
- Dec 05: Getting data into lakehouse
- Dec 06: SQL Analytics endpoint
- Dec 07: SQL commands in SQL Analytics endpoint
- Dec 08: Using Lakehouse REST API
We have explored Data Engineering in Fabric, and today we will check out the “Environment”.
Environment (still in preview)
Microsoft Fabric provides you with the capability to create a new environment, where you can select different Spark runtimes, configure your compute resources, and create a list of Python libraries (public or custom; from Conda or PyPI) to be installed. Custom environments behave like any other environment: they can be attached to a notebook, set as the default for a workspace, or attached to Spark job definitions.
Building a list of public libraries is a straightforward process: add a library from PyPI, select the version, and the dependencies are resolved for you. In this case, I have selected boto3, pandas and urllib3. Every time you add a library to the environment, you have to save the change first, and at the end you have to publish the environment.
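If you only need a library for a single notebook session, rather than baked into an environment, Fabric notebooks also support in-line installation. A minimal sketch (the pinned versions below simply mirror the environment list above):

```python
# Session-scoped install in a Fabric notebook cell; this applies only to the
# current Spark session, unlike libraries published in a custom environment.
%pip install boto3==1.33.11 urllib3==2.1.0 pandas==2.1.4
```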
You can also add libraries by importing a .yml file. Just import the file and you should be good. The YAML file is a regular Conda environment file, like this one:
```yaml
dependencies:
  - pip:
      - boto3==1.33.11
      - urllib3==2.1.0
      - pandas==2.1.4
```
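Once the environment is published and attached, you can do a quick sanity check in a notebook cell to confirm that the pinned versions were actually picked up:

```python
# Verify that the libraries from the custom environment are importable
# and that the versions match what was published.
import boto3
import pandas
import urllib3

print("boto3:", boto3.__version__)
print("pandas:", pandas.__version__)
print("urllib3:", urllib3.__version__)
```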
For the Spark compute, you can also customize the settings by selecting a different runtime and compute pool, as well as the driver and executor memory and cores.
Spark properties can also be tweaked here, but as of writing this blog post, the list of properties is empty. Since Spark exposes a large number of configurable properties, I am confident that these capabilities will become available down the road.
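Until then, you can still inspect and override standard Spark properties from a notebook session attached to the environment; a minimal sketch, assuming the usual PySpark session object (Fabric notebooks expose it as `spark`):

```python
from pyspark.sql import SparkSession

# In a Fabric notebook `spark` already exists; getOrCreate() simply returns it.
spark = SparkSession.builder.getOrCreate()

# Read a current Spark property, then override it for this session only.
print(spark.conf.get("spark.sql.shuffle.partitions"))
spark.conf.set("spark.sql.shuffle.partitions", "64")
```

Note that these are session-level Spark settings, not the environment-level properties from the (currently empty) UI list.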
After you have finished, remember to publish your environment (publishing might take a couple of minutes). Once the custom environment is published, you can attach it to a new notebook or to your current workspace. Go to your workspace settings -> Data engineering/Science -> Spark settings -> Environment.
Make sure that “Set default environment” is set to “On”, and then choose the desired environment.
You have to restart the session for the environment to take effect.
In the same way, the environment will be available in the drop-down menu when using a notebook in the same workspace.
Tomorrow we will look into Spark job definitions.
The complete set of code, documents, notebooks, and all of the materials will be available in the GitHub repository: https://github.com/tomaztk/Microsoft-Fabric
Happy Advent of 2023!