Configuration¶
Apache Zeppelin supports different interpreters in its interactive notebooks (see Getting-Started). Each interpreter can be configured in the Zeppelin interpreter settings, which can be accessed by selecting ‘Interpreters’ from the drop-down menu that appears when clicking the username in the top-right corner of the Zeppelin UI.
Spark Interpreter Configuration¶
The following table details selected settings of the Spark interpreter. Further configuration values can be found in the documentation of Apache Spark.
| Parameter | Description | Env Var | Default Value |
|---|---|---|---|
| spark.master | Spark Master URI for Kubernetes | SPARK_MASTER | local |
| spark.kubernetes.namespace | Kubernetes Namespace | SPARK_K8S_NAMESPACE | default |
| spark.executor.instances | Number of Spark executor pods | SPARK_EXECUTOR_INSTANCES | 1 |
| spark.kubernetes.container.image | Image for Spark executor pods | SPARK_IMAGE | 709825985650.dkr.ecr.us-east-1.amazonaws.com/dxc-technology/robodrive:rda_1.14_java_1.8_spark_3.1.2_python_3.7 |
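Since each of these settings is backed by an environment variable, the defaults can be overridden before Zeppelin is started. A minimal sketch; the namespace and executor count below are illustrative values, not recommendations:

```shell
# Illustrative overrides of the interpreter defaults via the
# environment variables listed in the table above.
export SPARK_MASTER="k8s://https://kubernetes.default.svc.cluster.local:443"
export SPARK_K8S_NAMESPACE="spark-jobs"   # hypothetical namespace
export SPARK_EXECUTOR_INSTANCES="4"
```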
By default, the spark.master property is set to local.
This allows out-of-the-box execution of the demo notebooks once they have been started as detailed in Setup RDA.
To run notebooks in a distributed setup, this property should be changed to
k8s://https://kubernetes.default.svc.cluster.local:443.
In that case, spark.executor.instances can be raised to run more executor pods in parallel.
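These properties can also be set per note from a configuration paragraph at the top of a notebook, using Zeppelin's generic configuration interpreter (`%spark.conf`), which must run before the Spark interpreter starts. A sketch with an illustrative executor count:

```
%spark.conf
spark.master k8s://https://kubernetes.default.svc.cluster.local:443
spark.executor.instances 4
```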
Information on tuning S3 access via the ‘spark.hadoop.fs.*’ settings can be found in the documentation of the Hadoop S3A connector.
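As an illustration, when reading from an S3-compatible endpoint, settings of this family could be passed to Spark as follows. The property names come from the Hadoop S3A connector; the endpoint and the concrete values are assumptions for the sketch:

```
%spark.conf
spark.hadoop.fs.s3a.endpoint https://s3.example.com
spark.hadoop.fs.s3a.path.style.access true
spark.hadoop.fs.s3a.connection.maximum 64
```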