My Python Cheatsheet
Lists
>>> x = [2, 4, 8, 16, 32, 64, 128, 256]
>>> x[3:]
[16, 32, 64, 128, 256]
>>> x[1:6:2]
[4, 16, 64]
>>> x.append(512)
>>> x
[2, 4, 8, 16, 32, 64, 128, 256, 512]
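A minimal sketch of how the slice notation above reads as start:stop:step. The variable names (`powers`, `tail`, and so on) are made up for illustration.

```python
# Slices follow the pattern sequence[start:stop:step]; stop is exclusive.
powers = [2, 4, 8, 16, 32, 64, 128, 256]

tail = powers[3:]             # from index 3 to the end
every_other = powers[1:6:2]   # indices 1, 3, 5
last_two = powers[-2:]        # negative indices count from the end
reversed_copy = powers[::-1]  # a step of -1 walks the list backwards

# Slicing returns a new list, so appending to the original
# does not change slices taken earlier.
powers.append(512)

print(tail)           # [16, 32, 64, 128, 256]
print(every_other)    # [4, 16, 64]
print(last_two)       # [128, 256]
print(reversed_copy)  # [256, 128, 64, 32, 16, 8, 4, 2]
```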
Dictionaries
>>> translations = {'car': 'Auto', 'boat': 'Boot', 'boot': 'Stiefel'}
>>> translations
{'car': 'Auto', 'boat': 'Boot', 'boot': 'Stiefel'}
>>> translations.items()
dict_items([('car', 'Auto'), ('boat', 'Boot'), ('boot', 'Stiefel')])
>>> translations[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 1
>>> translations['car']
'Auto'
>>> translations.keys()
dict_keys(['car', 'boat', 'boot'])
>>> translations.values()
dict_values(['Auto', 'Boot', 'Stiefel'])
>>> 'car' in translations
True
>>> 'Auto' in translations
False
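A short sketch of two points the session above implies: a missing key raises KeyError unless you use `.get()`, and `in` checks keys, not values. The dictionary is redefined here under the name `translations` so the block runs on its own; the other variable names are illustrative.

```python
translations = {'car': 'Auto', 'boat': 'Boot', 'boot': 'Stiefel'}

# Indexing with a missing key raises KeyError; .get() returns a default instead.
missing = translations.get('plane')              # None
fallback = translations.get('plane', 'unknown')  # 'unknown'

# `in` tests keys only; to test for a value, look in .values().
has_key = 'Auto' in translations             # False
has_value = 'Auto' in translations.values()  # True

# .items() yields (key, value) pairs, which is handy in loops.
pairs = [f"{en} -> {de}" for en, de in translations.items()]
print(pairs)  # ['car -> Auto', 'boat -> Boot', 'boot -> Stiefel']
```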
Modules
- Packages (e.g. `pyspark`) contain modules (e.g. `pyspark.sql`), which in turn define classes, functions, etc. (e.g. `pyspark.sql.HiveContext(SparkContext)`).
- Load a package or a module with `import numpy`. You can then access its functions as `numpy.array([2,4,6])`. If you want to type less, use `import numpy as np`, which lets you call `np.array([2,4,6])`.
- You can hand-pick single functions with `from numpy import array` and then call `array([2,4,6])` directly, but this is discouraged because it can lead to namespace clashes.
- A possibility (although not recommended) is `from numpy import *`. This imports all public names from numpy directly into the current namespace, which makes clashes even more likely.
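The import styles above can be sketched with the standard library's `math` module standing in for numpy, so the block runs without third-party packages. The alias `m` and the clashing `sqrt` function are made up for illustration.

```python
import math            # access as math.sqrt(...)
import math as m       # same module under a shorter alias
from math import sqrt  # hand-picked name, callable without a prefix

a = math.sqrt(16)
b = m.sqrt(16)
c = sqrt(16)
print(a, b, c)  # 4.0 4.0 4.0

# The alias and the original name refer to the same module object.
print(m is math)  # True

# A namespace clash: defining our own sqrt silently replaces the imported one.
def sqrt(x):
    return "not a square root"

print(sqrt(16))       # not a square root
print(math.sqrt(16))  # 4.0, the qualified name is unaffected
```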
Functional programming
The following two versions are equivalent:
# Version 1: Anonymous function
rdd.map(lambda x: x*x)
# Version 2: Named function
def squareIt(x):
    return x*x
rdd.map(squareIt)
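The `rdd` above needs a running Spark context. As a stand-in, here is a sketch of the same equivalence using Python's built-in `map` on a plain list; the list `numbers` is an assumption for illustration, and the built-in `map` is only an analogy to `rdd.map`, not the Spark API.

```python
numbers = [1, 2, 3, 4]

# Version 1: anonymous function
squares_lambda = list(map(lambda x: x * x, numbers))

# Version 2: named function
def squareIt(x):
    return x * x

squares_named = list(map(squareIt, numbers))

print(squares_lambda)                   # [1, 4, 9, 16]
print(squares_lambda == squares_named)  # True
```

The built-in `map` returns a lazy iterator, hence the `list(...)` call. In Spark, `rdd.map` is likewise lazy, and an action such as `collect()` triggers the computation.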