Advent of 2021, Day 3 – Getting around CLI and WEB UI in Apache Spark
This article is originally published at https://tomaztsql.wordpress.com
Series of Apache Spark posts:
- Dec 01: What is Apache Spark
- Dec 02: Installing Apache Spark
Today we will get familiar with the Apache Spark CLI and Web UI. This assumes you have read the previous blog post and installed Spark on your client.
Open your Command line tool and run:
spark-shell
And you should get the Spark instance up and running:
The spark-shell startup output already hints at the Web UI address, and you can access the following pages:
| URL | Page |
|---|---|
| http://localhost:4040/ | Spark Web UI on the client |
| http://localhost:4040/storage/ | Storage manager |
| http://localhost:4040/executors/ | Node executor info |
| http://localhost:4040/jobs/ | Spark job tracker |
The Spark Web UI (or Spark shell application UI) looks like this:
Putting Spark to the test
In the CLI we will type and run a simple Scala script and observe the behaviour in the Web UI.
We will read a text file into an RDD (Resilient Distributed Dataset). The Spark installation resides at:
/usr/local/Cellar/apache-spark/3.2.0 on macOS and
C:\SparkApp\spark-3.2.0-bin-hadoop3.2 on Windows (based on the blog post from Dec 01).
The files we want to use, however, can be stored anywhere, so let's create two text files and store them in a location of our choosing. I will create a folder at /Users/TomazKastrun/SparkDataFiles and store two txt files there:
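The folder and files can be created from the command line. A minimal sketch follows; the file contents are illustrative, and I use a temporary directory so the commands run anywhere, so substitute your own path (e.g. /Users/TomazKastrun/SparkDataFiles):

```shell
# Create a data folder and two sample text files for Spark to read.
DATA_DIR="$(mktemp -d)/SparkDataFiles"
mkdir -p "$DATA_DIR"
printf 'line one\nline two\nline three\n' > "$DATA_DIR/day3_1.txt"
printf 'another file\nwith two lines\n' > "$DATA_DIR/day3_2.txt"

# List the created files to confirm.
ls "$DATA_DIR"
```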
We can then point Spark at this path and read the file content:

println("##spark read text files from a directory into RDD")
val rddFromFile = spark.sparkContext.textFile("/Users/TomazKastrun/SparkDataFiles/day3_1.txt")
println(rddFromFile.getClass)

println("##Get data using collect")
rddFromFile.collect().foreach(f => {
  println(f)
})
And the content of the file is printed to the console:
So the Scala code returns the actual content of the txt file. Since we have the Web UI at our disposal, let's dive in and check whether the job was executed.
The Spark job has indeed been triggered, and we can further examine the detailed stages of the job:
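Each RDD action is submitted as a separate job, so a quick way to generate more entries in the Jobs and Storage pages is to run a few more actions in the same spark-shell session. A small sketch, assuming the rddFromFile value from above is still in scope:

```scala
// Cache the RDD so it appears under the Storage tab
// once an action materialises it.
rddFromFile.cache()

// Each action below is submitted as a separate job,
// visible under http://localhost:4040/jobs/.
val lineCount = rddFromFile.count()   // counts the lines in the file
val firstLine = rddFromFile.first()   // fetches the first line

println(s"Lines: $lineCount, first line: $firstLine")
```

After running these, refresh the Web UI: the Jobs page should list one job per action, and the Storage page should show the cached RDD with its memory footprint.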
These were the first steps in getting around the CLI and Web UI. Tomorrow we will also introduce the GUI for easier work with Scala and Spark.
Complete set of code, documents, notebooks, and all of the materials will be available at the GitHub repository: https://github.com/tomaztk/Spark-for-data-engineers
Happy Spark Advent of 2021!