Alpha Epsilon | The Alpha Epsilon Blog

All things data science

Welcome to my blog! Here, I publish tutorials related to data science, which also serve as convenient cheatsheets and references for myself. I learn best when I organize and summarize material for presentation, so this blog also serves as a learning vehicle for me.

My projects

This list contains "mother" posts for my larger undertakings, each spanning multiple blog posts.

Cloudera CCA175 exam

All Posts

All posts ordered by newest

Date	Title	Category	Tags
10 October 2020	Starting a new Python Package	Programming	Python
27 August 2019	Interpretable Machine Learning / Explainable Artificial Intelligence	DataScience	CaseStudy, MachineLearning
22 July 2018	An auto-scaling Shiny Server on AWS	R	AWS, Shiny
21 July 2018	Restoring a WordPress site from a manual backup	Misc	Wordpress, AWS
13 July 2018	A Data Science Case Study With R and mlr	R	DataScience, CaseStudy
04 April 2018	An LSTM-based Startup Name Generator	Python	DeepLearning, NLP
22 March 2018	How to set up a Jupyter Notebook server for Deep Learning on AWS	DataScience	MachineLearning, Python, DeepLearning, AWS
06 December 2017	Personal Extreme Programming	Programming	Agile
15 November 2017	Find and delete unused R functions	R	codestyle
09 November 2017	My git cheatsheet	Programming	Git
08 November 2017	Why become an Open Source Developer?	Programming	Python, OpenSource
17 October 2017	What is Data Science?	DataScience
08 October 2017	The differences when using Spark with Scala	CCA175	Spark, Scala
07 October 2017	Spark SQL with Python	CCA175	Spark, SQL
13 September 2017	Filter, aggregate, join, rank, and sort datasets (Spark/Python)	CCA175	Python, Spark
07 September 2017	Reading and writing data with Spark and Python	CCA175	Spark, Python
08 August 2017	A basic Spark/Python script	CCA175	Spark
05 August 2017	The LaTeX for WordPress plugin and PHP 7.0 / 7.1	Misc	LaTeX, MathJaX, Wordpress
30 July 2017	Scala introduction and cheatsheet	CCA175	Scala, Spark, Cheatsheet
26 July 2017	Disabling IPv6 on Arch Linux and NetworkManager	Linux	IPv6, VPN
25 July 2017	Command-line options for spark-submit	CCA175	Spark
23 July 2017	My Python Cheatsheet	Python	Cheatsheet
22 July 2017	How to design a Hadoop architecture	BigData	Architecture, Hadoop
21 July 2017	Using Sqoop to move data between HDFS and MySQL	CCA175	MySQL, SQL, Sqoop
21 July 2017	Spark Streaming	BigData	Hadoop, Spark, Streaming
21 July 2017	Load data into and out of HDFS using the Hadoop File System commands	CCA175	Hadoop
18 July 2017	Getting streaming data with Kafka and Flume	BigData	Flume, Hadoop, Kafka, Streaming
16 July 2017	Preparing for the Cloudera Exam CCA175: Spark and Hadoop Developer	CCA175	Hadoop, Spark, Cloudera
11 July 2017	MongoDB	BigData	MongoDB, NoSQL
09 July 2017	NoSQL: non-relational databases	BigData	Hadoop, NoSQL, SQL
09 July 2017	Cassandra	BigData	NoSQL
08 July 2017	Hive	BigData	Hadoop
04 July 2017	Spark	BigData	Hadoop, Spark
30 June 2017	The Hadoop core: HDFS and MapReduce	BigData	Hadoop, MapReduce
29 June 2017	The Hadoop ecosystem: An overview	BigData	Hadoop
28 June 2017	Connect R with Access2007 via RODBC	R	ODBC, Access
17 June 2017	Dear Recruiters: Please send e-mails	Work	Freelancing
13 June 2017	Sharing confidential data with nginx and htaccess	Linux	VPS
11 June 2017	Administrating your own git server	Linux
12 April 2017	lFTP usage	Linux	FTP, Linux
12 April 2017	SSH and scp	Linux
16 December 2016	diff tips and tricks	Linux	Diff, Linux
14 December 2016	grep - Tips and Tricks	Linux
14 August 2015	Cluster computing on the Sun Grid Engine	Programming	Cluster computing, Sun Grid Engine
02 May 2014	Awk tips and tricks and Bioinformatics applications	Programming	awk
08 January 2014	Data analysis Hadley Wickham style	R
12 October 2013	Arch Linux on a MacBook Pro 9.2	Linux

All posts by category

Posts in Linux

Date	Title	Tags
26 July 2017	Disabling IPv6 on Arch Linux and NetworkManager	IPv6, VPN
13 June 2017	Sharing confidential data with nginx and htaccess	VPS
11 June 2017	Administrating your own git server
12 April 2017	lFTP usage	FTP, Linux
12 April 2017	SSH and scp
16 December 2016	diff tips and tricks	Diff, Linux
14 December 2016	grep - Tips and Tricks
12 October 2013	Arch Linux on a MacBook Pro 9.2

Posts in R

Date	Title	Tags
22 July 2018	An auto-scaling Shiny Server on AWS	AWS, Shiny
13 July 2018	A Data Science Case Study With R and mlr	DataScience, CaseStudy
15 November 2017	Find and delete unused R functions	codestyle
28 June 2017	Connect R with Access2007 via RODBC	ODBC, Access
08 January 2014	Data analysis Hadley Wickham style

Posts in Programming

Date	Title	Tags
10 October 2020	Starting a new Python Package	Python
06 December 2017	Personal Extreme Programming	Agile
09 November 2017	My git cheatsheet	Git
08 November 2017	Why become an Open Source Developer?	Python, OpenSource
14 August 2015	Cluster computing on the Sun Grid Engine	Cluster computing, Sun Grid Engine
02 May 2014	Awk tips and tricks and Bioinformatics applications	awk

Posts in Work

Date	Title	Tags
17 June 2017	Dear Recruiters: Please send e-mails	Freelancing

Posts in BigData

Date	Title	Tags
22 July 2017	How to design a Hadoop architecture	Architecture, Hadoop
21 July 2017	Spark Streaming	Hadoop, Spark, Streaming
18 July 2017	Getting streaming data with Kafka and Flume	Flume, Hadoop, Kafka, Streaming
11 July 2017	MongoDB	MongoDB, NoSQL
09 July 2017	NoSQL: non-relational databases	Hadoop, NoSQL, SQL
09 July 2017	Cassandra	NoSQL
08 July 2017	Hive	Hadoop
04 July 2017	Spark	Hadoop, Spark
30 June 2017	The Hadoop core: HDFS and MapReduce	Hadoop, MapReduce
29 June 2017	The Hadoop ecosystem: An overview	Hadoop

Posts in CCA175

Date	Title	Tags
08 October 2017	The differences when using Spark with Scala	Spark, Scala
07 October 2017	Spark SQL with Python	Spark, SQL
13 September 2017	Filter, aggregate, join, rank, and sort datasets (Spark/Python)	Python, Spark
07 September 2017	Reading and writing data with Spark and Python	Spark, Python
08 August 2017	A basic Spark/Python script	Spark
30 July 2017	Scala introduction and cheatsheet	Scala, Spark, Cheatsheet
25 July 2017	Command-line options for spark-submit	Spark
21 July 2017	Using Sqoop to move data between HDFS and MySQL	MySQL, SQL, Sqoop
21 July 2017	Load data into and out of HDFS using the Hadoop File System commands	Hadoop
16 July 2017	Preparing for the Cloudera Exam CCA175: Spark and Hadoop Developer	Hadoop, Spark, Cloudera

Posts in Python

Date	Title	Tags
04 April 2018	An LSTM-based Startup Name Generator	DeepLearning, NLP
23 July 2017	My Python Cheatsheet	Cheatsheet

Posts in Misc

Date	Title	Tags
21 July 2018	Restoring a WordPress site from a manual backup	Wordpress, AWS
05 August 2017	The LaTeX for WordPress plugin and PHP 7.0 / 7.1	LaTeX, MathJaX, Wordpress

Posts in DataScience

Date	Title	Tags
27 August 2019	Interpretable Machine Learning / Explainable Artificial Intelligence	CaseStudy, MachineLearning
22 March 2018	How to set up a Jupyter Notebook server for Deep Learning on AWS	MachineLearning, Python, DeepLearning, AWS
17 October 2017	What is Data Science?