Set Up Spark on Ubuntu Server

Install Java (OpenJDK)

Spark runs on the JVM, so Java must be installed first. Spark 4.0 supports Java 17 and 21; this guide uses OpenJDK 21.

sudo apt update
sudo apt install openjdk-21-jdk -y

Verify:

java -version
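To check the version in a script rather than by eye, the major version can be parsed out of the banner. This is a minimal sketch: java_major_version is a hypothetical helper name, and it assumes the usual banner format with the version in double quotes on the first line (java -version prints to stderr, hence the 2>&1).

```shell
# Extract the major Java version from the first line of `java -version` output.
# Handles the modern scheme ("21.0.4") and the legacy one ("1.8.0_292").
java_major_version() (
    # Pull out the quoted version string, e.g. 21.0.4
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$version" in
        1.*) version=${version#1.} ;;   # legacy scheme: 1.8.0_292 -> 8.0_292
    esac
    # Keep only the leading number (cut at the first '.', '_' or '-')
    printf '%s\n' "${version%%[._-]*}"
)

# Only query Java if it is actually installed
if command -v java >/dev/null 2>&1; then
    major=$(java_major_version "$(java -version 2>&1 | head -n 1)")
    echo "Detected Java major version: ${major:-unknown}"
fi
```

With the package installed above, the detected major version should be 21.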

Install Python & pip (if not installed)

sudo apt install python3 python3-pip python3-venv -y

Install Spark

Download the Spark binary distribution (pre-built for Hadoop 3), extract it under /opt, and rename the directory to spark:

cd /opt
sudo wget https://archive.apache.org/dist/spark/spark-4.0.0/spark-4.0.0-bin-hadoop3.tgz
sudo tar -xzf spark-4.0.0-bin-hadoop3.tgz
sudo mv spark-4.0.0-bin-hadoop3 spark
sudo rm spark-4.0.0-bin-hadoop3.tgz

Spark is now installed in /opt/spark.
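It can be worth checking the archive's integrity before the extraction step above. Apache publishes a .sha512 file next to each release archive. The sketch below compares a file's digest with an expected hex string that you copy from that file by hand. verify_sha512 is a hypothetical helper name, and the exact layout of the published .sha512 file is not assumed here.

```shell
# Compare a file's SHA-512 digest with an expected hex string.
# Returns 0 on match, non-zero on mismatch.
verify_sha512() (
    # $1 = path to file, $2 = expected digest (hex, case-insensitive)
    actual=$(sha512sum "$1" | cut -d ' ' -f 1)
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$actual" = "$expected" ]
)

# Hypothetical usage, run from /opt before extracting the archive:
#   verify_sha512 spark-4.0.0-bin-hadoop3.tgz "<digest copied from the .sha512 file>" \
#       && echo "checksum OK" || echo "checksum MISMATCH" >&2
```

The placeholder digest in the usage comment is deliberately left unfilled; take the real value from the Apache download site.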

Set Environment Variables

Edit your shell profile:

nano ~/.bashrc

Add at the end (the JAVA_HOME path shown is for amd64 machines; on arm64 the directory name ends in -arm64):

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export JAVA_HOME=/usr/lib/jvm/java-21-openjdk-amd64
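Because the JAVA_HOME path above is architecture-specific, it may help to derive it from the java binary that is actually on the PATH instead of typing it. This is a sketch under the assumption that java is a symlink chain ending in <JDK directory>/bin/java, as with Ubuntu's OpenJDK packages; java_home_from_binary is a hypothetical helper name.

```shell
# Derive a JAVA_HOME candidate from the path of a java binary by
# stripping the trailing /bin/java.
java_home_from_binary() {
    dirname "$(dirname "$1")"
}

if command -v java >/dev/null 2>&1; then
    # readlink -f follows the symlink chain from /usr/bin/java to the real binary
    java_home_from_binary "$(readlink -f "$(command -v java)")"
fi
```

The printed directory is the value to use in the export JAVA_HOME= line.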

Apply changes:

source ~/.bashrc
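A quick sanity check that the variables took effect can save a confusing "command not found" later. The sketch below treats a directory as a valid Spark home if it contains an executable bin/spark-submit. is_spark_home is a hypothetical helper name, not part of Spark.

```shell
# Return 0 if the given directory looks like a Spark installation,
# i.e. it contains an executable bin/spark-submit.
is_spark_home() {
    [ -n "$1" ] && [ -x "$1/bin/spark-submit" ]
}

if is_spark_home "${SPARK_HOME:-}"; then
    echo "SPARK_HOME looks valid: $SPARK_HOME"
else
    echo "SPARK_HOME is unset or does not contain bin/spark-submit" >&2
fi
```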

Test Spark

spark-shell

You should get an interactive Spark shell with a scala> prompt. Exit with :quit.

For PySpark (exit with exit() or Ctrl-D):

pyspark
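The interactive shells can also be complemented by a non-interactive smoke test, which is handy on a headless server. The sketch below writes a tiny PySpark job to a temporary file and runs it with spark-submit in local mode. The file name, the write_smoke_test helper, and the app name are all hypothetical choices for illustration.

```shell
# Write a tiny PySpark job to the given path. The job counts five rows on a
# single local core, so it needs no cluster.
write_smoke_test() {
    cat > "$1" <<'EOF'
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("smoke-test").getOrCreate()
print("row count:", spark.range(5).count())
spark.stop()
EOF
}

# spark-submit recognises Python jobs by the .py extension
script="${TMPDIR:-/tmp}/spark_smoke_test.py"
write_smoke_test "$script"

if command -v spark-submit >/dev/null 2>&1; then
    spark-submit "$script"
else
    echo "spark-submit not found on PATH; check SPARK_HOME and PATH" >&2
fi
```

If the installation works, a "row count: 5" line should appear among Spark's log output.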

Install MySQL JDBC Driver

To allow Spark to read from and write to MySQL over JDBC, place the MySQL Connector/J jar in Spark's jars directory:

sudo mkdir -p /opt/spark/jars
sudo wget https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/9.3.0/mysql-connector-j-9.3.0.jar -P /opt/spark/jars/
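Once the driver is in place, Spark addresses MySQL through a JDBC URL of the form jdbc:mysql://host:port/database. The sketch below confirms the jar landed where expected and builds such a URL. mysql_jdbc_url is a hypothetical helper, and the host and database names in the usage comment are placeholders, not values from this guide.

```shell
# Build a MySQL JDBC URL.
# $1 = host, $2 = database, $3 = port (optional, defaults to MySQL's 3306)
mysql_jdbc_url() {
    printf 'jdbc:mysql://%s:%s/%s\n' "$1" "${3:-3306}" "$2"
}

# Confirm the Connector/J jar is present in Spark's jars directory
if ls "${SPARK_HOME:-/opt/spark}"/jars/mysql-connector-j-*.jar >/dev/null 2>&1; then
    echo "MySQL driver jar found"
else
    echo "MySQL driver jar not found in ${SPARK_HOME:-/opt/spark}/jars" >&2
fi

# Hypothetical usage:
#   mysql_jdbc_url db.example.internal sales
```

The resulting string is what goes into the url option of Spark's JDBC reader and writer, with com.mysql.cj.jdbc.Driver as the driver class.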