Monday, September 22, 2014

Learn Spark with Python

1. Install Spark (this assumes the spark-1.1.0.tgz source tarball has already been downloaded into ~/tools)
cd ~/tools/
tar -zxvf spark-1.1.0.tgz
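For reference, the tar flags used above are -z (filter through gzip), -x (extract), -v (verbose listing), and -f (archive file name). A quick self-contained demonstration of the same flags with a throwaway archive (paths under /tmp are arbitrary, not part of the Spark install):

```shell
# Build a tiny gzipped tarball, mirroring what spark-1.1.0.tgz looks like.
mkdir -p /tmp/demo-src && echo "hello" > /tmp/demo-src/README
tar -zcf /tmp/demo.tgz -C /tmp demo-src

# Extract it the same way as in step 1.
mkdir -p /tmp/demo-out
tar -zxvf /tmp/demo.tgz -C /tmp/demo-out

cat /tmp/demo-out/demo-src/README
```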

2. Build Spark against Hadoop 2
cd ~/tools/spark-1.1.0
SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly

3. Install py4j
sudo pip install py4j

4. Modify ~/.bash_profile by adding two lines (the second puts Spark's Python package on the import path, so that step 6 works)
export SPARK_HOME=$HOME/tools/spark-1.1.0
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH

5. Source ~/.bash_profile so the new variables take effect in the current shell
source ~/.bash_profile
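It's worth confirming the variables are actually visible before testing the import. A quick check (the export lines are repeated here only so the snippet stands alone; in practice they come from ~/.bash_profile, and the path is the install location assumed above):

```shell
# Re-state the exports so this snippet is self-contained; adjust the
# path if Spark was unpacked somewhere other than ~/tools.
export SPARK_HOME="$HOME/tools/spark-1.1.0"
export PYTHONPATH="$SPARK_HOME/python:$PYTHONPATH"

# Both should print non-empty paths mentioning spark-1.1.0.
echo "$SPARK_HOME"
echo "$PYTHONPATH"
```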

6. Test. Start a Python shell and check that the import succeeds without errors
import pyspark
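If `import pyspark` fails with an ImportError, the usual cause is that PYTHONPATH does not include $SPARK_HOME/python. Python resolves imports by scanning the directories on `sys.path`, which is seeded from PYTHONPATH at startup. A minimal sketch of that mechanism, using a throwaway module name (`fakespark`, a stand-in invented for this demo) rather than pyspark itself:

```python
import os
import sys
import tempfile

# Create a directory holding a dummy module; this plays the role of
# $SPARK_HOME/python, which holds the real pyspark package.
libdir = tempfile.mkdtemp()
with open(os.path.join(libdir, "fakespark.py"), "w") as f:
    f.write("version = '1.1.0'\n")

# Prepending the directory to sys.path is what setting PYTHONPATH
# accomplishes at interpreter startup.
sys.path.insert(0, libdir)

import fakespark
print(fakespark.version)  # the import now succeeds
```

The same logic applies to the real package: once $SPARK_HOME/python is on the path, `import pyspark` resolves.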
