Advent of 2023, Day 9 – Building custom environments
This article is originally published at https://tomaztsql.wordpress.com
In this Microsoft Fabric series:
- Dec 01: What is Microsoft Fabric?
- Dec 02: Getting started with Microsoft Fabric
- Dec 03: What is lakehouse in Fabric?
- Dec 04: Delta lake and delta tables in Microsoft Fabric
- Dec 05: Getting data into lakehouse
- Dec 06: SQL Analytics endpoint
- Dec 07: SQL commands in SQL Analytics endpoint
- Dec 08: Using Lakehouse REST API
We have explored Data Engineering in Fabric, and today we will check out the “Environment”.
Environment (still in preview)
Microsoft Fabric provides you with the capability to create a new environment, where you can select different Spark runtimes, configure your compute resources, and create a list of Python libraries (public or custom; from Conda or PyPI) to be installed. Custom environments behave like any other environment: they can be attached to a notebook, set as the default for a workspace, or attached to Spark job definitions.
Building a list of public libraries is a straightforward process: add a library from PyPI, select the version, and the dependencies are resolved for you. In this case, I have selected boto3, pandas and urllib3. Every time you add a library to the environment, you have to save the change first, and at the end you have to publish the environment.
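If you only need a library for a single notebook session, rather than baked into an environment, Fabric notebooks also support in-line installation. A minimal sketch (the pinned versions below simply mirror the environment list above):

```python
# Session-scoped install in a Fabric notebook cell; this applies only to the
# current Spark session, unlike libraries published in a custom environment.
%pip install boto3==1.33.11 urllib3==2.1.0 pandas==2.1.4
```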
You can also add libraries by importing a .yml file. Just import the file and you should be good. The YAML file is a regular Conda environment file, like this one:
```yaml
dependencies:
  - pip:
      - boto3==1.33.11
      - urllib3==2.1.0
      - pandas==2.1.4
```
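Once the environment is published and attached, you can do a quick sanity check in a notebook cell to confirm that the pinned versions were actually picked up:

```python
# Verify that the libraries from the custom environment are importable
# and that the versions match what was published.
import boto3
import pandas
import urllib3

print("boto3:", boto3.__version__)
print("pandas:", pandas.__version__)
print("urllib3:", urllib3.__version__)
```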
For the Spark compute, you can also customize the settings by selecting a different runtime and compute pool, as well as the driver and executor memory and cores.
Spark properties can also be tweaked here, but as of writing this blog post, the list of properties is empty. Since Spark exposes a large number of configurable properties, I am confident that these capabilities will become available down the road.
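Until then, you can still inspect and override standard Spark properties from a notebook session attached to the environment; a minimal sketch, assuming the usual PySpark session object (Fabric notebooks expose it as `spark`):

```python
from pyspark.sql import SparkSession

# In a Fabric notebook `spark` already exists; getOrCreate() simply returns it.
spark = SparkSession.builder.getOrCreate()

# Read a current Spark property, then override it for this session only.
print(spark.conf.get("spark.sql.shuffle.partitions"))
spark.conf.set("spark.sql.shuffle.partitions", "64")
```

Note that these are session-level Spark settings, not the environment-level properties from the (currently empty) UI list.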
After you have finished, remember to publish your environment (publishing might take a couple of minutes). Once the custom environment is published, you can attach it to a new notebook or to your current workspace. Go to your workspace settings -> Data engineering/Science -> Spark settings -> Environment.
Make sure that “Set default environment” is set to “On”, and then choose the desired environment.
You have to restart the session for the environment to take effect.
In the same way, the environment will be available in the drop-down menu when using a notebook in the same workspace.
Tomorrow we will look into Spark job definitions.
The complete set of code, documents, notebooks, and all of the materials will be available in the GitHub repository: https://github.com/tomaztk/Microsoft-Fabric
Happy Advent of 2023!