Getting Started¶
This guide details how to access the Robotic Drive Analyzer Notebook on AWS Marketplace once a pod has been started by following the guide in Setup RDA. As a first step, a connection to the pod is required; we establish it via port-forwarding:
kubectl -n <Namespace> port-forward --address 0.0.0.0 svc/<deployment-name>-zeppelin-svc 8080:8080
The deployment-name can be specified as a parameter during the setup process.
Alternatively, the name of the service can be inferred from kubectl -n <Namespace> get svc.
Accessing the provided URL (here localhost:8080) opens the UI of the Apache Zeppelin server.
Once the link is accessed, we are prompted to enter a password.
The default user and password are both set to admin; the password can be changed via the ADMIN_PASSWORD configuration of the run_cfn.sh script (see Setup RDA).
Demo Notebooks:¶
We provide two demo notebooks: 1. DXC Robotic Drive Analyzer Rosbag, which analyzes a rosbag file, and 2. DXC Robotic Drive Analyzer DataFusion, which analyzes and relates a rosbag and an MDF4 file.
Once the Zeppelin UI is accessible as described above, the notebooks can be opened by following the links with the respective names
on the lower left side of the website. Starting with the first notebook, they include self-contained instructions on how to analyze the demo files.
By default, Spark is executed on a driver pod only. With this setting, the notebooks can be executed out of the box;
the change to a distributed setup is described in 1. DXC Robotic Drive Analyzer Rosbag.
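To verify which mode is active, the master URL can be inspected from a notebook cell. The following is a minimal sketch, assuming the Zeppelin Spark interpreter exposes the usual sc (SparkContext) variable; the distributed master URL shown in the comments is illustrative.

# Print the active Spark master to check for driver-only mode.
print(sc.master)              # e.g. local[*] in the default, driver-only setup
print(sc.defaultParallelism)  # number of available cores/slots

# In a distributed setup, the master points at the cluster instead,
# e.g. k8s://https://<api-server>:443 (illustrative value); the exact
# properties to change are described in the first demo notebook.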
Recommended Readings:¶
The following list provides an overview that helps to familiarize oneself with the concepts, tools, and formats used by the provided demo notebooks:
Apache Spark: Apache Spark is a framework that enables the distributed execution of workloads on a cluster. By managing the processes across different worker nodes, it can heavily parallelize data selection and code execution on huge amounts of data. Here, we leverage it on a Kubernetes cluster in order to distribute analyses of automotive file formats. Further information can be found on the Apache Spark homepage and in the courses listed under the overview link provided below.
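To illustrate the programming model, here is a minimal PySpark sketch with toy data standing in for sensor readings; all names and values are hypothetical, and inside Zeppelin the session is already provided.

from pyspark.sql import SparkSession

# Reuse or create a session; Zeppelin provides `spark` out of the box,
# so this line is only needed when running standalone.
spark = SparkSession.builder.appName("spark-intro").getOrCreate()

# A toy DataFrame standing in for sensor readings; Spark distributes
# the aggregation across the available executors.
df = spark.createDataFrame(
    [("front_camera", 0.5), ("lidar", 0.9), ("front_camera", 0.7)],
    ["sensor", "value"],
)
df.groupBy("sensor").avg("value").show()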
Apache Zeppelin: Apache Zeppelin offers an interactive environment for code in various languages and formats, such as Angular, Java, Markdown, Python, PySpark, and Shell. Cells can be independently created, deleted, and executed. Additionally, the interpreter keeps the state of defined variables during its runtime: if a cell contained an error or a result should be reproduced, it is often enough to re-run that single cell. For this reason, Apache Zeppelin is well suited for data exploration and for designing data transformation steps. More information can be found on the Apache Zeppelin homepage.
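As a small illustration of this behavior (a conceptual sketch, not taken from the demo notebooks), a variable defined in one paragraph remains available in later paragraphs:

# Paragraph 1: define a variable; the interpreter keeps it alive
# for as long as the interpreter process runs.
threshold = 0.8

# Paragraph 2, executed later as a separate cell: the variable is
# still defined, so only this cell needs to be re-run after changes.
print(f"current threshold: {threshold}")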
Rosbag: Rosbag is the data recording format of ROS (Robot Operating System). ROS is a framework of tools that supports the development of robotics applications. Actuators, sensors, and control systems are connected via data buses called topics; data (e.g., sensor data) is transmitted over these buses in the form of messages. Recording the messages of each bus in rosbag files enables their analysis and playback (e.g., for simulations). More information can be found on the Rosbag homepage.
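Outside of the Analyzer, rosbag files can also be inspected with the standard rosbag Python API that ships with ROS 1; the following sketch assumes a hypothetical recording demo.bag and a hypothetical topic name.

import rosbag  # part of ROS 1, not of the Analyzer itself

# Iterate over the recorded messages of a single, hypothetical topic.
with rosbag.Bag("demo.bag") as bag:
    for topic, msg, t in bag.read_messages(topics=["/vehicle/speed"]):
        print(topic, t.to_sec(), msg)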
MDF4: ASAM MDF4 is an automotive file format created by the ASAM working group. It is employed, for example, to standardize the storage and compression of sensor data and bus data. More information can be found here.
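For a quick look at MDF4 files outside of the Analyzer, the open-source asammdf library can be used; the file and channel names below are hypothetical.

from asammdf import MDF  # open-source ASAM MDF reader (pip install asammdf)

# Hypothetical measurement file; channel names depend on the recording.
mdf = MDF("recording.mf4")
speed = mdf.get("VehicleSpeed")  # returns a Signal object
print(speed.timestamps[:5], speed.samples[:5])

# The whole file as a pandas DataFrame indexed by time:
df = mdf.to_dataframe()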
DataFusion: It is difficult to relate messages that were recorded by different sensors, as they might have been recorded at different timestamps. The DataFusion functionality of the Analyzer solves this problem by providing several functions that join Spark dataframes based on their timestamps. An overview of fusion types is given in the User Guide of our documentation page.
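To make the idea concrete, the following is a generic PySpark sketch of a tolerance-based timestamp join; it is not the Analyzer's actual DataFusion API, and all data is hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Toy stand-ins for two sensor streams with slightly offset clocks.
cam = spark.createDataFrame([(1.00, "img_a"), (2.00, "img_b")], ["cam_ts", "frame"])
gps = spark.createDataFrame([(1.02, 48.10), (2.05, 48.20)], ["gps_ts", "lat"])

# Join rows whose timestamps lie within a tolerance of 0.1 s.
fused = cam.join(gps, F.abs(cam["cam_ts"] - gps["gps_ts"]) <= 0.1)
fused.show()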
Tips & Tricks:¶
- Sometimes, when starting a new tunnel, the browser cache needs to be updated as well, i.e., CTRL+F5 should be used to force-reload the UI.
- The Analyzer creates index files that describe how .bag files can be read. When the underlying rosbag file changes, it is recommended to delete the index file so that it is recreated.
- In case of timeouts while a large file is being processed, try setting file system options via spark.hadoop.fs.* in the Spark interpreter settings (see Configuration), as shown in the sketch below this list. Tuning parameters are described here.
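The following is a minimal sketch of such options; the keys shown stem from Hadoop's S3A connector and are illustrative examples, so they need to be adapted to the file system actually in use. In Zeppelin, these keys would normally be entered as properties of the Spark interpreter, since spark.hadoop.* settings must be in place before the Spark context is created.

from pyspark.sql import SparkSession

# Illustrative S3A options: raise the connection timeout and the
# number of retry attempts when processing large files.
spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.s3a.connection.timeout", "600000")
    .config("spark.hadoop.fs.s3a.attempts.maximum", "20")
    .getOrCreate()
)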