Scala introduction and cheatsheet
This post is part of my preparation series for the Cloudera CCA175 exam, “Certified Spark and Hadoop Developer”. It is intentionally concise so that it can serve as my cheat sheet.
You need familiarity with Python as well as Scala for the Cloudera exam. I have never coded in Scala before, so I am taking this course and preparing a brief Scala summary in this post.
Fortunately, the spark-shell is very helpful in the exam, and you don’t need to compile any .jar files.
Scala Basics
// Values are immutable (can't be changed once defined):
val hello: String = "Hello!"
println(hello)
// Variables are mutable:
var good_morrow: String = hello
good_morrow = hello + " Good morning!"
// Basic types (the annotations are optional; Scala can infer them):
val myInt: Int = 1
val myBool: Boolean = true
val myLetter: Char = 'a'
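val piFloat: Float = 3.14159265f // a Float literal needs the f suffix; a second 'val pi' in the same scope would not compile
val pi: Double = 3.1415926535
val myByte: Byte = 127 // the largest Byte value
val largeInt: Long = 123451234512L // without the L suffix this literal is too large for an Int and does not compile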
// String interpolation: s"..." inserts values, f"..." additionally supports printf-style formatting:
println(f"Pi is approx. $pi%.3f")
println(s"$hello Good day! $good_morrow")
println(s"Three plus four equals ${3+4}")
// A block { ... } is an expression; its value is the value of its last expression:
println({val x=10; x + 20}) // this will print out 30.
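The following minimal sketch gathers the basics in one runnable file. It shows that type annotations are optional, that a Long literal needs the `L` suffix, and that a block evaluates to its last expression. The object name `BasicsDemo` is a hypothetical choice for illustration only.

```scala
object BasicsDemo {
  // Type annotations are optional: Int, Double and String are inferred here
  val inferredInt = 42
  val inferredDouble = 3.1415926535
  val greeting = "Hello!"

  // Literals beyond the Int range need an L suffix to become a Long
  val largeInt: Long = 123451234512L

  // A block is an expression: its value is the value of its last expression
  val thirty: Int = { val x = 10; x + 20 }

  // f"..." formats printf-style (the decimal separator follows the default locale)
  val piText: String = f"Pi is approx. $inferredDouble%.3f"
  // s"..." substitutes values and ${...} expressions
  val sumText: String = s"Three plus four equals ${3 + 4}"

  def main(args: Array[String]): Unit = {
    println(piText)
    println(sumText)  // Three plus four equals 7
    println(thirty)   // 30
    println(largeInt) // 123451234512
  }
}
```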
Strings
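val s = List("I see a little silhouetto of a man",
"scaramouche, scaramouche, will you do the fandango")
s.map(_.startsWith("scar")) // List(false, true): one result per element
s.filter(_.contains("little")) // keeps only the first line
s.flatMap(_.split(" ")) // splits each line into words and flattens everything into one list
s.flatMap(_.split(" ")).filter(_.length < 4) // only the words shorter than 4 characters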
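The next sketch stores each result from the string operations above in a named value so the output can be inspected. The object and value names are hypothetical.

```scala
object StringListDemo {
  val s = List("I see a little silhouetto of a man",
    "scaramouche, scaramouche, will you do the fandango")

  // map applies a function to every element; the list keeps its length
  val startsWithScar: List[Boolean] = s.map(_.startsWith("scar"))

  // filter keeps only the elements for which the predicate is true
  val withLittle: List[String] = s.filter(_.contains("little"))

  // flatMap turns each line into an Array of words and flattens them into one List
  val words: List[String] = s.flatMap(_.split(" "))

  // chaining: only words with fewer than 4 characters
  val shortWords: List[String] = words.filter(_.length < 4)

  def main(args: Array[String]): Unit = {
    println(startsWithScar) // List(false, true)
    println(shortWords)     // List(I, see, a, of, a, man, you, do, the)
  }
}
```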
Regular expressions
val myString: String = "And I would walk 500 miles"
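// create a regex pattern with triple quotes (backslashes need no escaping) and an appended '.r'
val pattern = """.* (\d+).*""".r // captures a number preceded by a space; the greedy '.*' makes it the last such number
val pattern(resultString) = myString // Scala's weird-looking extractor syntax: the WHOLE string must match, otherwise a scala.MatchError is thrown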
val result = resultString.toInt // .toFloat etc. all exist too, of course
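If a string might not match, the `val pattern(...) = ...` form throws an exception. The sketch below shows two safer options from `scala.util.matching.Regex`. The first uses the same extractor inside a `match`. The second uses `findFirstIn`, which searches anywhere in the text and returns an `Option`. The helper names `lastNumber` and `firstNumber` are hypothetical.

```scala
import scala.util.matching.Regex

object RegexDemo {
  // Same pattern as above: the greedy '.*' captures the LAST space-preceded number
  val pattern: Regex = """.* (\d+).*""".r

  // The extractor inside a match: no MatchError when the text does not fit
  def lastNumber(text: String): Option[Int] = text match {
    case pattern(num) => Some(num.toInt)
    case _            => None
  }

  // findFirstIn needs no full match; it returns the first hit as an Option[String]
  val numberPattern: Regex = """\d+""".r
  def firstNumber(text: String): Option[Int] =
    numberPattern.findFirstIn(text).map(_.toInt)

  def main(args: Array[String]): Unit = {
    println(lastNumber("And I would walk 500 miles"))      // Some(500)
    println(lastNumber("Walk 500 miles, then 1000 more"))  // Some(1000)
    println(firstNumber("Walk 500 miles, then 1000 more")) // Some(500)
    println(lastNumber("No numbers here"))                 // None
  }
}
```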
Boolean logic
val certainly = 1 < 2
val unlikely = 4 < 0
val nope = certainly & unlikely // '&' always evaluates both operands
val nope2 = certainly && unlikely // same result, but '&&' short-circuits: the right operand is only evaluated if the left one is true
val password = "dootdoot"
val entry = "123456"
val isMatch: Boolean = entry == password // 'match' is a reserved word and cannot be a name; '==' compares string contents
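The difference between `&` and `&&` only becomes visible when the right operand has a side effect. The sketch below counts those evaluations using a hypothetical helper, `rightSide()`. It also shows that `==` on Strings compares their contents.

```scala
object ShortCircuitDemo {
  // Counts how often the right-hand operand is actually evaluated
  private var evaluations = 0
  private def rightSide(): Boolean = { evaluations += 1; true }

  // Returns (runs for `false && rightSide()`, runs for `false & rightSide()`)
  def evaluationCounts(): (Int, Int) = {
    evaluations = 0
    val shortCircuited = false && rightSide() // left side is false, so rightSide() is skipped
    val afterDoubleAmp = evaluations
    evaluations = 0
    val bothEvaluated = false & rightSide() // '&' always evaluates both sides
    (afterDoubleAmp, evaluations)
  }

  // '==' on Strings delegates to equals, so two distinct String objects with equal text are ==
  val sameText: Boolean = new String("dootdoot") == "dootdoot"

  def main(args: Array[String]): Unit = {
    println(evaluationCounts()) // (0,1)
    println(sameText)           // true
  }
}
```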
Flow control
// If/else
if (1<3) println("Yes!") else println("No :(")
if (1 < 3){
println("Yes!")
} else {
println("No :(")
}
// Matching (switch/case)
val n = 2
n match {
case 1 => println("One")
case 2 => println("Two")
case 3 => println("Three")
case _ => println("Default output")
}
// For loops
for (x <- 1 to 4) { // '1 to 4' is an inclusive Range; the '<-' generator draws each element from it
val squared = x*x
println(squared)
}
// While loops
var x = 100
while(x >= 0){
println(s"$x bottles of beer on the wall.")
x -= 1
}
// do/while runs the body at least once (Scala 2 syntax; it was dropped in Scala 3)
do {
println(x)
x += 1
} while (x <= 10)
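In Scala, `if`/`else` and `match` are expressions, not just statements, so their results can be assigned directly. A `for` loop can also collect values with `yield` instead of printing them. The following is a minimal sketch; the function names `sign` and `numberName` are hypothetical.

```scala
object ExpressionDemo {
  // if/else is an expression: its result can be returned or assigned directly
  def sign(n: Int): String =
    if (n < 0) "negative" else if (n == 0) "zero" else "positive"

  // match is an expression as well; every case yields a value
  def numberName(n: Int): String = n match {
    case 1 => "One"
    case 2 => "Two"
    case 3 => "Three"
    case _ => "Default output"
  }

  // for ... yield collects the loop results instead of printing them
  val squares: List[Int] = (for (x <- 1 to 4) yield x * x).toList

  def main(args: Array[String]): Unit = {
    println(sign(-5))      // negative
    println(numberName(2)) // Two
    println(squares)       // List(1, 4, 9, 16)
  }
}
```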
Functions
// Definition:
def isEven(x: Int) : Boolean = {
x%2 == 0
}
println(isEven(15)) // false
// No return value: the return type is Unit (Scala's equivalent of void):
def printSomething(text: String): Unit = {
println("### " + text + " ###")
}
// Functions as parameters:
def squareIt(x: Int): Int = { // the '=' is required before the body
x*x
}
def transformInt(x: Int, fct: Int => Int): Int = { // fct is any function from Int to Int
fct(x)
}
val result = transformInt(2, squareIt)
// Lambda functions (anonymous functions)
transformInt(15, (x: Int) => x*x*x) // full notation
transformInt(15, x => x*x*x) // type inferred
transformInt(15, scala.math.pow(_, 3).toInt) // placeholder syntax; pow works on Doubles, hence the .toInt
transformInt(12, x => {val y=x+3; y*y}) // this implements (x+3)^2
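The function examples above can be collected into one runnable file. In this sketch, `cube` is a hypothetical function value added to show that functions can be stored in a `val`. The comment on the placeholder line spells out the lambda that `_` expands to.

```scala
object FunctionDemo {
  def squareIt(x: Int): Int = x * x

  // Higher-order function: fct can be any function from Int to Int
  def transformInt(x: Int, fct: Int => Int): Int = fct(x)

  // A function value stored in a val
  val cube: Int => Int = x => x * x * x

  val results: List[Int] = List(
    transformInt(2, squareIt),                      // method passed where a function is expected: 4
    transformInt(15, cube),                         // 3375
    transformInt(15, scala.math.pow(_, 3).toInt),   // expands to x => scala.math.pow(x, 3).toInt: 3375
    transformInt(12, x => { val y = x + 3; y * y }) // (12 + 3)^2 = 225
  )

  def main(args: Array[String]): Unit = println(results) // List(4, 3375, 3375, 225)
}
```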
Data structures
// Tuples: much like in Python, immutable and fixed-size; the elements may have different types:
val myTuple = ("Good", "morning", "Sir", 3.14159)
// Weird indexing syntax:
println(myTuple._1) // index is one-based
// Key/value pair: 'a -> b' is shorthand for the tuple (a, b)
val myKV = "house" -> "Haus"
println(myKV._2) // "Haus"
// Lists: immutable, with a single element type (mixing types yields a common supertype such as Any).
val inventory = List("Apples", "Oranges", "Bananas")
println(inventory(0)) // Lists have *zero based indexing*
println(inventory.head) // first element
println(inventory.tail) // all but first element
for (item <- inventory){ // the '<-' generator works on any collection, not just ranges
println(item)
}
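Two related idioms are shown in the sketch below: destructuring a tuple into separate values in one step, and prepending to an immutable list with `::`. The prepend returns a new list and leaves the original unchanged. All names are hypothetical, apart from the values taken from the examples above.

```scala
object DataStructureDemo {
  val myTuple = ("Good", "morning", "Sir", 3.14159)

  // Destructure all four tuple elements into separate vals at once
  val (greeting, timeOfDay, title, number) = myTuple

  // 'a -> b' builds a Tuple2, so ._1 is the key and ._2 the value
  val myKV: (String, String) = "house" -> "Haus"

  // Lists are immutable: '::' prepends and returns a NEW list
  val inventory = List("Apples", "Oranges", "Bananas")
  val moreInventory: List[String] = "Cherries" :: inventory

  def main(args: Array[String]): Unit = {
    println(title)         // Sir
    println(myKV._2)       // Haus
    println(moreInventory) // List(Cherries, Apples, Oranges, Bananas)
    println(inventory)     // List(Apples, Oranges, Bananas)
  }
}
```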
Working with lists
// Mapping
val myList = List("I", "see", "a", "little", "silhouetto", "of", "a", "man")
val reversedStrings = myList.map( (x: String) => {x.reverse})
for (str <- reversedStrings) {println(str)}
// Reducing
val intList = List(1, 2, 3, 4, 5)
val sum = intList.reduce( (x: Int, y:Int) => x+y)
println(sum)
intList.reduce(_ + _) // Shortcut
// Filtering
intList.filter( (x:Int) => x != 3)
intList.filter( _ != 3) // A shortcut: placeholder syntax, where '_' stands for the single parameter
// Iterating
intList.foreach(x => println(x))
intList.foreach(println) // same
// Concatenating lists
val newList = List(6, 7, 8)
val bigList = intList ++ newList
// Some list methods
bigList.reverse // List(8, 7, 6, 5, 4, 3, 2, 1)
bigList.sorted
bigList.distinct
bigList.max // 8
bigList.sum // 36
bigList.contains(3) // true
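One caveat with `reduce` is that it needs at least one element; on an empty list it throws an `UnsupportedOperationException`. `foldLeft` avoids this by taking an explicit start value. The sketch below wraps the failing call in `scala.util.Try` so the difference can be checked. The object and value names are hypothetical.

```scala
import scala.util.Try

object ListOpsDemo {
  val intList = List(1, 2, 3, 4, 5)
  val bigList: List[Int] = intList ++ List(6, 7, 8)

  // reduce combines the elements pairwise; it needs at least one element
  val sum: Int = intList.reduce(_ + _)

  // On an empty list reduce throws, which Try turns into a Failure
  val emptyReduceFails: Boolean = Try(List.empty[Int].reduce(_ + _)).isFailure

  // foldLeft starts from an explicit value (0), so an empty list simply yields 0
  val emptySum: Int = List.empty[Int].foldLeft(0)(_ + _)

  def main(args: Array[String]): Unit = {
    println(sum)              // 15
    println(emptyReduceFails) // true
    println(emptySum)         // 0
    println(bigList.reverse)  // List(8, 7, 6, 5, 4, 3, 2, 1)
    println(bigList.sum)      // 36
  }
}
```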
Maps
// Like dictionaries in Python
val EngGer = Map("House" -> "Haus", "Bench" -> "Bank", "Bank" -> "Bank")
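println(EngGer("Bench")) // Bank
println(EngGer.contains("Tree")) // false
val TreeInGerman = util.Try(EngGer("Tree")) getOrElse "Unknown" // EngGer("Tree") alone would throw a NoSuchElementException
val TreeInGerman2 = EngGer.getOrElse("Tree", "Unknown") // same result, using the Map's own getOrElse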
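// Adding entries: the default Map is immutable, so '+' returns a new Map ('+=' on this val does not compile)
val EngGer2 = EngGer + ("Car" -> "Auto")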
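The sketch below shows the `Option`-based lookups of the standard `Map` and the two ways to add entries. An immutable `Map` returns a new map from `+`. A `scala.collection.mutable.Map` is updated in place with `+=`. The names `withCar` and `mutableEngGer` are hypothetical.

```scala
import scala.collection.mutable

object MapDemo {
  val EngGer = Map("House" -> "Haus", "Bench" -> "Bank", "Bank" -> "Bank")

  // get returns an Option: Some(value) if the key exists, None otherwise
  val bench: Option[String] = EngGer.get("Bench")
  val tree: Option[String] = EngGer.get("Tree")

  // getOrElse avoids the exception that EngGer("Tree") would throw
  val treeInGerman: String = EngGer.getOrElse("Tree", "Unknown")

  // Immutable Map: + returns a new Map and leaves EngGer unchanged
  val withCar: Map[String, String] = EngGer + ("Car" -> "Auto")

  // Mutable Map: += updates the map in place
  val mutableEngGer: mutable.Map[String, String] = mutable.Map("House" -> "Haus")
  mutableEngGer += ("Car" -> "Auto")

  def main(args: Array[String]): Unit = {
    println(bench)                    // Some(Bank)
    println(tree)                     // None
    println(treeInGerman)             // Unknown
    println(EngGer.contains("Car"))   // false
    println(withCar("Car"))           // Auto
    println(mutableEngGer("Car"))     // Auto
  }
}
```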