Configuration

Apache Zeppelin supports different interpreters in its interactive notebooks (see Getting-Started). Each interpreter can be configured in the Zeppelin interpreter settings, which can be accessed by selecting ‘Interpreters’ in the drop-down menu that opens when clicking the username in the top-right corner of the Zeppelin UI.

Spark Interpreter Configuration

The following table details selected settings of the Spark interpreter. Further configuration values can be found in the documentation of Apache Spark.

spark.master
  Description: Spark master URI; for Kubernetes
  Environment variable: SPARK_MASTER
  Default value: local

spark.kubernetes.namespace
  Description: Kubernetes namespace
  Environment variable: SPARK_K8S_NAMESPACE
  Default value: default

spark.executor.instances
  Description: Number of Spark executor pods
  Environment variable: SPARK_EXECUTOR_INSTANCES
  Default value: 1

spark.kubernetes.container.image
  Description: Image for Spark executor pods
  Environment variable: SPARK_IMAGE
  Default value: 709825985650.dkr.ecr.us-east-1.amazonaws.com/dxc-technology/robodrive:rda_1.14_java_1.8_spark_3.1.2_python_3.7
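As a sketch, the environment variables listed above could be used to override the interpreter defaults before launching Zeppelin. The variable names are taken from the table; the executor count of 4 is an arbitrary illustration:

```shell
# Hypothetical overrides of the Spark interpreter defaults via the
# environment variables from the table above (values are examples only).
export SPARK_MASTER="k8s://https://kubernetes.default.svc.cluster.local:443"
export SPARK_K8S_NAMESPACE="default"
export SPARK_EXECUTOR_INSTANCES="4"
```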

By default, the spark.master property is set to local. This allows out-of-the-box execution of the demo notebooks once they have been started as detailed in Setup RDA. To run notebooks in a distributed setup, change this property to k8s://https://kubernetes.default.svc.cluster.local:443. In that case, spark.executor.instances can be raised to run more executor pods in parallel. Information on tuning S3 access via ‘spark.hadoop.fs.*’ settings can be found in the Hadoop S3A documentation.
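For illustration, a distributed setup might combine the settings described above in the interpreter configuration along these lines (the k8s master URI matches the paragraph above; the executor count of 4 and the image placeholder are example values, not prescribed defaults):

```
spark.master                      k8s://https://kubernetes.default.svc.cluster.local:443
spark.kubernetes.namespace        default
spark.executor.instances          4
spark.kubernetes.container.image  <Spark executor image from the table above>
```

With spark.master pointing at the in-cluster Kubernetes API endpoint, the Spark driver in Zeppelin requests the configured number of executor pods in the given namespace instead of running everything locally.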

Contact

RD_analyzer_support@dxc.com


© Copyright 2022, DXC Technology