Set Up Spark on Ubuntu Server
Install Java (OpenJDK)
Spark runs on the JVM, so a JDK must be installed first. Spark 4.0 supports Java 17 and 21.
sudo apt update
sudo apt install openjdk-21-jdk -y
Verify:
java -version
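The check above can also be scripted. The following is a minimal sketch, assuming the usual `version "N..."` format on the first line of the `java -version` output; the helper name java_major is hypothetical and not part of Java or Spark.

```shell
# Hypothetical helper: read `java -version` output on stdin and print the
# major version, e.g. 'openjdk version "21.0.3" 2024-04-16' -> 21
java_major() {
  sed -n 's/.* version "\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# java -version writes to stderr, so redirect it into the pipe.
if command -v java >/dev/null 2>&1; then
  major=$(java -version 2>&1 | java_major)
  if [ "${major:-0}" -ge 17 ]; then
    echo "Java $major found: new enough for Spark 4.0"
  else
    echo "Java ${major:-unknown} found: Spark 4.0 expects Java 17 or 21"
  fi
else
  echo "java not found on PATH"
fi
```

Legacy version strings such as "1.8.0_292" yield a major version of 1 and are reported as too old, which is the intended result here.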
Install Python and pip (if not already installed)
sudo apt install python3 python3-pip python3-venv -y
Install Spark
Download the Spark binary distribution (pre-built for Hadoop 3) and unpack it under /opt:
cd /opt
sudo wget https://archive.apache.org/dist/spark/spark-4.0.0/spark-4.0.0-bin-hadoop3.tgz
sudo tar -xzf spark-4.0.0-bin-hadoop3.tgz
sudo mv spark-4.0.0-bin-hadoop3 spark
sudo rm spark-4.0.0-bin-hadoop3.tgz
Spark is now installed in /opt/spark.
Set Environment Variables
Edit your shell profile:
nano ~/.bashrc
Add at the end:
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export JAVA_HOME=/usr/lib/jvm/java-21-openjdk-amd64
Apply changes:
source ~/.bashrc
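The JAVA_HOME path above is specific to amd64 machines; on other architectures the directory suffix differs. As a sketch, the path can be derived from the java binary instead of being hard-coded. The helper name java_home_for is hypothetical, and the snippet assumes GNU readlink (the default on Ubuntu).

```shell
# Hypothetical helper: resolve all symlinks for the given java binary and
# strip the trailing /bin/java to get the JDK home directory.
java_home_for() {
  dirname "$(dirname "$(readlink -f "$1")")"
}

# Print the export line that could go into ~/.bashrc.
if command -v java >/dev/null 2>&1; then
  echo "export JAVA_HOME=$(java_home_for "$(command -v java)")"
fi
```

On Ubuntu, /usr/bin/java is a symlink managed by update-alternatives, which is why the symlinks have to be resolved before the two directory levels are stripped.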
Test Spark
spark-shell
This should start Spark's interactive Scala shell. Exit with :quit.
For the Python shell (PySpark), run the following and exit with exit() or Ctrl+D:
pyspark
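For a non-interactive check, for example over SSH or in a provisioning script, the bundled SparkPi example can be run instead of opening a shell. This is a sketch that assumes $SPARK_HOME/bin is on PATH as configured above; the function name smoke_test is hypothetical.

```shell
# Hypothetical helper: confirm the Spark launchers are on PATH, print the
# version banner, then run the bundled SparkPi example and show its result.
smoke_test() {
  if ! command -v spark-submit >/dev/null 2>&1; then
    echo "spark-submit not found on PATH; check SPARK_HOME and PATH"
    return 1
  fi
  spark-submit --version
  # SparkPi prints a line starting with "Pi is roughly"; hide the log noise.
  run-example SparkPi 10 2>/dev/null | grep "Pi is roughly"
}

smoke_test || echo "Spark smoke test did not pass"
```

The function returns a non-zero status if the launcher is missing or the example does not print its result line, so it can be used as a gate in a larger script.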
Install MySQL JDBC Driver
To let Spark read from and write to MySQL over JDBC, place the MySQL Connector/J jar in Spark's jars directory:
sudo mkdir -p /opt/spark/jars
sudo wget https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/9.3.0/mysql-connector-j-9.3.0.jar -P /opt/spark/jars/