Advent of 2021, Day 3 – Getting around CLI and WEB UI in Apache Spark
This article is originally published at https://tomaztsql.wordpress.com
Series of Apache Spark posts:
- Dec 01: What is Apache Spark
- Dec 02: Installing Apache Spark
Today we will get familiar with the Apache Spark CLI and Web UI. This assumes you have read the previous blog post and installed Spark on your client.
Open your Command line tool and run:
spark-shell
And you should get the Spark instance up and running:
The spark-shell startup output already hints at the Web UI address, and you can access the following pages:
| URL | Page |
|---|---|
| http://localhost:4040/ | Spark Web UI on the client |
| http://localhost:4040/storage/ | Storage manager |
| http://localhost:4040/executors/ | Node executor info |
| http://localhost:4040/jobs/ | Spark job tracker |
The Spark Web UI (or Spark shell application UI) looks like this:
Putting Spark to the test
In the CLI we will type and run a simple Scala script and observe the behaviour in the Web UI.
We will read a text file into an RDD (Resilient Distributed Dataset). The Spark installation resides at:
/usr/local/Cellar/apache-spark/3.2.0 on macOS and
C:\SparkApp\spark-3.2.0-bin-hadoop3.2 on Windows (based on the blog post from Dec 01).
The files we want to use, however, can be stored anywhere, so let's create two text files and store them in a location of our choosing. I will create a folder at /Users/TomazKastrun/SparkDataFiles and store two txt files there:
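The folder and files can be created from the command line. A minimal sketch follows; the file contents are illustrative, and I use a temporary directory so the commands run anywhere, so substitute your own path (e.g. /Users/TomazKastrun/SparkDataFiles):

```shell
# Create a data folder and two sample text files for Spark to read.
DATA_DIR="$(mktemp -d)/SparkDataFiles"
mkdir -p "$DATA_DIR"
printf 'line one\nline two\nline three\n' > "$DATA_DIR/day3_1.txt"
printf 'another file\nwith two lines\n' > "$DATA_DIR/day3_2.txt"

# List the created files to confirm.
ls "$DATA_DIR"
```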
We can then point Spark at this path and read the file content:

println("##spark read text files from a directory into RDD")
val rddFromFile = spark.sparkContext.textFile("/Users/TomazKastrun/SparkDataFiles/day3_1.txt")
println(rddFromFile.getClass)

println("##Get data using collect")
rddFromFile.collect().foreach(f => {
  println(f)
})
And the content of the file is printed to the console:
So the Scala code returns the actual content of the txt file. Since we have the Web UI at our disposal, let's dive in and check whether the job was executed.
The Spark job has indeed been triggered, and we can further examine the detailed stages of the job:
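Each RDD action is submitted as a separate job, so a quick way to generate more entries in the Jobs and Storage pages is to run a few more actions in the same spark-shell session. A small sketch, assuming the rddFromFile value from above is still in scope:

```scala
// Cache the RDD so it appears under the Storage tab
// once an action materialises it.
rddFromFile.cache()

// Each action below is submitted as a separate job,
// visible under http://localhost:4040/jobs/.
val lineCount = rddFromFile.count()   // counts the lines in the file
val firstLine = rddFromFile.first()   // fetches the first line

println(s"Lines: $lineCount, first line: $firstLine")
```

After running these, refresh the Web UI: the Jobs page should list one job per action, and the Storage page should show the cached RDD with its memory footprint.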
These were the first steps in getting around the CLI and Web UI. Tomorrow we will also introduce the GUI for easier work with Scala and Spark.
Complete set of code, documents, notebooks, and all of the materials will be available at the GitHub repository: https://github.com/tomaztk/Spark-for-data-engineers
Happy Spark Advent of 2021!