Setup RDA
This page describes how to set up the Robotic Drive Analyzer demo provided on AWS Marketplace.
Prerequisites
The demo is deployed into an existing Kubernetes cluster. The following prerequisites must be fulfilled:
- Unix/WSL should be used to run the commands below.
- The AWS CLI should be installed.
- kubectl should be installed.
- An AWS EKS cluster is available and an OIDC provider is configured (see OIDC Providers). A guide for setting up a new EKS cluster can be found here. A simple two-node cluster can be created using `eksctl create cluster --name <cluster-name> --region <aws-region> --with-oidc`. The eksctl utility can be installed with `curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp && sudo mv /tmp/eksctl /usr/local/bin`. Afterwards, the cluster can be deleted using `eksctl delete cluster --name <cluster-name> --region <aws-region>`. Note that the user who is deploying the Robotic Drive Analyzer should have permissions to add IAM roles and to add these roles as cluster administrators (to the `aws-auth` configmap). When an EKS cluster is created by the above command, the user automatically has this permission.
- An existing namespace in the EKS cluster should be provided. A new namespace can be created with `kubectl create namespace <name>`.
- (Optional) If the demo is used in a larger distributed setup, the EKS cluster should have auto-scaling configured (see EKS autoscaling).
- (Optional) A writable S3 bucket should be available when the distributed features of the demo are used.
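For illustration, the prerequisite steps above can be combined into a single shell session. The cluster name, namespace, and region below are placeholder values, not defaults of the demo:

```bash
# Install eksctl if it is not already available
curl --silent --location \
  "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" \
  | tar xz -C /tmp && sudo mv /tmp/eksctl /usr/local/bin

# Create a simple two-node EKS cluster with an OIDC provider (names are examples)
eksctl create cluster --name rda-demo --region eu-central-1 --with-oidc

# Create the namespace the Analyzer will be deployed into
kubectl create namespace rda
```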
Setup
To deploy the demo, two files are required:
- `run_cfn.sh` - A convenience script that automatically invokes the necessary CloudFormation templates from S3 and spins up an instance of the Analyzer on the cluster (the next section details which resources are created). This script should not be modified.
- `analyzer.conf` - A configuration file with details of the account and the cluster to deploy to. The parameters in this file should be adapted as required.
In order to invoke the script, the configuration file needs to be specified as the first parameter (e.g. `run_cfn.sh analyzer.conf`).
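A minimal sketch of an invocation, assuming both files are in the current directory:

```bash
chmod +x run_cfn.sh        # ensure the convenience script is executable
./run_cfn.sh analyzer.conf # pass the configuration file as the first parameter
```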
The following table lists the available configuration parameters.
Required parameters must be set by the user, while optional parameters can be left at their defaults.
Note that, while optional, we recommend changing the default `ADMIN_PASSWORD` from `admin`.
| Parameter | Default | Required | Description |
|---|---|---|---|
| AWS_REGION | | Yes | The AWS region of the EKS cluster to deploy to. |
| AWS_ACCOUNT_ID | | Yes | The AWS account ID of an AWS account that is subscribed to the Robotic Drive Analyzer on AWS Marketplace. |
| CLUSTER_NAME | | Yes | The name of the EKS cluster the Robotic Drive Analyzer is deployed to. |
| NAMESPACE | | Yes | The name of an existing namespace on the EKS cluster the Robotic Drive Analyzer is deployed to. |
| STACK_NAME | DXC-Robotic-Drive-Analyzer-Demo | No | A unique name for a CloudFormation stack that is created during deployment. |
| IAM_ROLE_NAME | dxc-robotic-drive-analyzer-awsqs-role | No | A unique name for an IAM role that is created during deployment (for AWSQS type activations). |
| SERVICE_ACCOUNT_NAME | dxc-robotic-drive-analyzer-serviceaccount | No | A unique name for a Kubernetes service account that is created for the deployment. |
| DEPLOYMENT_NAME | dxc-robotic-drive-analyzer | No | A unique name for a Kubernetes deployment. |
| ADMIN_PASSWORD | admin | No | The password that is used to log in to the Apache Zeppelin UI. |
| PYTHON_PACKAGES | | No | A comma-separated list of Python packages to install at runtime, e.g. "numpy,pillow==6.0.0". |
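As an illustration, an `analyzer.conf` filled in for a hypothetical account and cluster might look as follows. The file format is assumed here to be shell-style KEY=value assignments sourced by `run_cfn.sh`; all values are placeholders:

```bash
# analyzer.conf -- example values only; adapt to your account and cluster
AWS_REGION="eu-central-1"
AWS_ACCOUNT_ID="123456789012"
CLUSTER_NAME="rda-demo"
NAMESPACE="rda"

# Optional parameters; defaults taken from the table above
STACK_NAME="DXC-Robotic-Drive-Analyzer-Demo"
ADMIN_PASSWORD="change-me"           # strongly recommended over the default "admin"
PYTHON_PACKAGES="numpy,pillow==6.0.0"
```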
Executing the script creates a server pod in the specified namespace.
Guidance for accessing the deployed application is provided in Getting-Started.
The stack can be deleted via `aws cloudformation delete-stack --stack-name <stack name>`.
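To confirm the deployment and, later, the deletion, standard kubectl and AWS CLI commands can be used. A brief sketch, with the placeholder namespace and the default stack name:

```bash
# Verify that the server pod came up in the target namespace
kubectl get pods -n rda

# Delete the CloudFormation stack and wait until the deletion completes
aws cloudformation delete-stack --stack-name DXC-Robotic-Drive-Analyzer-Demo
aws cloudformation wait stack-delete-complete --stack-name DXC-Robotic-Drive-Analyzer-Demo
```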
Resources
The demo creates the following resources:
- If the tool `eksctl` is not available, it will be installed to `/usr/local/bin`.
- IAM Execution Role: An IAM role used by the AWSQS extensions to create resources in EKS. The role has the permissions recommended in the documentation of the AWSQS resources. Further, the role is added as an EKS admin using the eksctl installed in the previous step, i.e., the role is added to the `aws-auth` configmap. The role is necessary because AWSQS only supports roles, not users.
- Type activations AWSQS::Kubernetes::Helm and AWSQS::Kubernetes::Resource. These are third-party resources for deployment on Kubernetes clusters. As described before, they require the previously created IAM role. Further, they are configured to log to CloudWatch under awsqs-helm and awsqs-resource.
- A nested stack is created from the first one:
  - A Lambda function + custom resource to acquire the OIDC provider of the EKS cluster by name
  - IAM Service Account Role: An IAM role that is used as the IRSA role for a service account in Kubernetes. The role has permissions for the AWS Marketplace metering API to track usage. It also allows S3 full access in order to enable RDA to read from S3 using the s3a protocol.
  - Kubernetes Service Account: A service account that is used for pod creation and to allow S3 access in pods
  - Edit-Role: The service account is assigned an edit role at namespace level
  - Helm deployment: The service account is used to deploy the RDA Helm charts
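The aws-auth mapping mentioned above can be inspected with eksctl, and, purely for illustration, reproduced manually. The role ARN below is a placeholder; `run_cfn.sh` performs an equivalent step automatically:

```bash
# List the current IAM identity mappings of the cluster
eksctl get iamidentitymapping --cluster rda-demo --region eu-central-1

# Sketch of how an execution role could be mapped as a cluster admin
eksctl create iamidentitymapping --cluster rda-demo --region eu-central-1 \
  --arn arn:aws:iam::123456789012:role/dxc-robotic-drive-analyzer-awsqs-role \
  --group system:masters --username awsqs-admin
```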
The progress of the stack deployment can be checked in the CloudFormation console.
Retained Resources
The following resources are not deleted automatically and need to be removed manually, if necessary:
- Entry of the IAM role in the aws-auth configmap
- Installation of eksctl
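A sketch of the manual cleanup, assuming the default role name and the placeholder cluster from above:

```bash
# Remove the IAM role's entry from the aws-auth configmap
eksctl delete iamidentitymapping --cluster rda-demo --region eu-central-1 \
  --arn arn:aws:iam::123456789012:role/dxc-robotic-drive-analyzer-awsqs-role

# Remove the eksctl binary if it was installed by the script
sudo rm /usr/local/bin/eksctl
```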
Troubleshooting
The user executing the script has no admin permissions on the EKS cluster (an error is thrown on the command line):
2021-11-24 15:42:23 [ℹ] eksctl version 0.74.0
2021-11-24 15:42:23 [ℹ] using region eu-central-1
Error: getting auth ConfigMap: Unauthorized
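To diagnose this, check which identity is actually executing the script and whether it is mapped in the cluster's aws-auth configmap, for example:

```bash
# Show the IAM identity the AWS CLI (and hence the script) is running as
aws sts get-caller-identity

# Inspect the aws-auth configmap; the identity above must be mapped as an admin
kubectl describe configmap aws-auth -n kube-system
```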
The stack failed and cannot be deleted:
This can have multiple causes. For example, it occurs when the AWSQS resource that creates the service account in the EKS cluster fails because of missing permissions (its execution role was not added to the cluster administrators). Try to delete the stacks on the CloudFormation console and, if necessary, retain resources (e.g. the non-existing service account has to be retained). If the nested stack cannot be deleted, either wait until a second, working deployment is available, or manually activate the AWSQS third-party extensions to enable the deletion.
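When deleting via the CLI instead of the console, failed resources can be retained explicitly. A sketch with a hypothetical logical resource ID:

```bash
# Delete the failed stack while retaining the resource that could not be created;
# "AnalyzerServiceAccount" is a placeholder for the actual logical resource ID
aws cloudformation delete-stack --stack-name DXC-Robotic-Drive-Analyzer-Demo \
  --retain-resources AnalyzerServiceAccount
```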
The stack failed with 'internal error':
This means that the AWSQS resource failed to create resources in the EKS cluster. Please check that (see also the sketch below):
- the cluster name and an existing namespace are provided correctly in the `analyzer.conf` configuration file
- the executing account has permissions to edit the cluster's aws-auth configmap (`kubectl edit configmap aws-auth -n kube-system`)
- the cluster nodes are in 'Ready' state
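A quick sketch of these checks, with placeholder names:

```bash
# Confirm the cluster and namespace referenced in analyzer.conf actually exist
aws eks describe-cluster --name rda-demo --region eu-central-1 --query cluster.status
kubectl get namespace rda

# Confirm the worker nodes are Ready
kubectl get nodes
```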
Debugging
Accessing cfn-stack logs in AWS CloudWatch (a CLI sketch follows below):
- Logs for the AWSQS resources can be found in the log groups awsqs-resource and awsqs-helm
- Logs regarding the cluster detection via OIDC can be found by searching for the tag OIDCLambda
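For example, with AWS CLI v2 (log group names as listed above; the exact name of the OIDC Lambda's log group is generated, so searching for "OIDCLambda" in the CloudWatch console works as well):

```bash
# Tail the AWSQS extension logs
aws logs tail awsqs-helm --follow
aws logs tail awsqs-resource --follow

# Locate the log group of the OIDC-detection Lambda
aws logs describe-log-groups --log-group-name-prefix /aws/lambda/ \
  --query "logGroups[].logGroupName"
```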
CloudFormation logs:
- The deployment process can be viewed on the 'Events' page of the CloudFormation console for the stack with the given stack name, as well as the nested stack that is created with it
- The activation of the AWSQS extensions can be viewed on the CloudFormation console under 'Registry' > 'Activated extensions' > 'Activated third-party'
Debugging the EKS pods:
The logs of the pods on Kubernetes can be viewed using `kubectl logs -n <namespace> <pod-name>`.
Further, pods can be described with `kubectl describe pod -n <namespace> <pod-name>`. Events in the namespace can be displayed
with `kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp`.
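Putting these together, a typical debugging session might look like this (the namespace and pod name are placeholders):

```bash
# Find the Analyzer pod in the target namespace
kubectl get pods -n rda

# Inspect its logs and its scheduling/startup events
kubectl logs -n rda dxc-robotic-drive-analyzer-<pod-suffix>
kubectl describe pod -n rda dxc-robotic-drive-analyzer-<pod-suffix>
kubectl get events -n rda --sort-by=.metadata.creationTimestamp
```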