Kick Off Spark
This article is originally published at https://statcompute.wordpress.com
My first Spark session:
scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sdf = spark.read.option("header", true).csv("Documents/spark/credit_count.txt")
sdf: org.apache.spark.sql.DataFrame = [CARDHLDR: string, DEFAULT: string ... 12 more fields]

scala> sdf.printSchema()
root
 |-- CARDHLDR: string (nullable = true)
 |-- DEFAULT: string (nullable = true)
 |-- AGE: string (nullable = true)
 |-- ACADMOS: string (nullable = true)
 |-- ADEPCNT: string (nullable = true)
 |-- MAJORDRG: string (nullable = true)
 |-- MINORDRG: string (nullable = true)
 |-- OWNRENT: string (nullable = true)
 |-- INCOME: string (nullable = true)
 |-- SELFEMPL: string (nullable = true)
 |-- INCPER: string (nullable = true)
 |-- EXP_INC: string (nullable = true)
 |-- SPENDING: string (nullable = true)
 |-- LOGSPEND : string (nullable = true)

scala> sdf.createOrReplaceTempView("tmp1")

scala> spark.sql("select count(*) as obs from tmp1").show()
+-----+
|  obs|
+-----+
|13444|
+-----+
A PySpark session doing the same thing:
In [1]: import pyspark as spark

In [2]: sc = spark.SQLContext(spark.SparkContext())

In [3]: sdf = sc.read.csv("Documents/spark/credit_count.txt", header = True)

In [4]: sdf.printSchema()
root
 |-- CARDHLDR: string (nullable = true)
 |-- DEFAULT: string (nullable = true)
 |-- AGE: string (nullable = true)
 |-- ACADMOS: string (nullable = true)
 |-- ADEPCNT: string (nullable = true)
 |-- MAJORDRG: string (nullable = true)
 |-- MINORDRG: string (nullable = true)
 |-- OWNRENT: string (nullable = true)
 |-- INCOME: string (nullable = true)
 |-- SELFEMPL: string (nullable = true)
 |-- INCPER: string (nullable = true)
 |-- EXP_INC: string (nullable = true)
 |-- SPENDING: string (nullable = true)
 |-- LOGSPEND : string (nullable = true)

In [5]: sdf.createOrReplaceTempView("tmp1")

In [6]: sc.sql("select count(*) as obs from tmp1").show()
+-----+
|  obs|
+-----+
|13444|
+-----+