Setup RDA

This page describes how to set up the Robotic Drive Analyzer demo provided on the AWS Marketplace.

Prerequisites

The demo is deployed in an existing Kubernetes cluster. The following prerequisites should be fulfilled:

  • A Unix shell (or WSL on Windows) should be used to run the commands below.

  • The AWS CLI should be installed.

  • kubectl should be installed.

  • An AWS EKS cluster is available and an OIDC provider is configured (see OIDC Providers). A guide for setting up a new EKS cluster can be found in the AWS EKS documentation. A simple two-node cluster can be created using eksctl create cluster --name <cluster-name> --region <aws-region> --with-oidc (see the sketch after this list). The eksctl utility can be installed with

    curl --silent --location \
    "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" \
    | tar xz -C /tmp && sudo mv /tmp/eksctl /usr/local/bin
    

    Afterwards, the cluster can be deleted using eksctl delete cluster --name <cluster-name> --region <aws-region>. Note that the user deploying the Robotic Drive Analyzer must have permissions to add IAM roles and to add these roles as cluster administrators (to the aws-auth configmap). When an EKS cluster is created with the command above, the user automatically has this permission.

  • An existing namespace in the EKS cluster should be provided. A new namespace can be created with kubectl create namespace <name>.

  • (Optional) If the demo is to be used in a larger distributed setup, the EKS cluster should have auto-scaling configured (see EKS autoscaling).

  • (Optional) A writable S3 bucket should be available when the distributed features of the demo are used.
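
As an example, the tooling checks and the cluster bootstrap described above can be performed with the following commands (a sketch; the cluster name rda-demo, the region eu-central-1, and the namespace rda are placeholders to be replaced with your own values):

    # Verify that the required tools are available.
    aws --version
    kubectl version --client

    # Create a simple two-node EKS cluster with an OIDC provider (this takes several minutes).
    eksctl create cluster --name rda-demo --region eu-central-1 --with-oidc

    # Create the namespace the Analyzer will be deployed to.
    kubectl create namespace rda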

Setup

To deploy the demo, two files are required:

  • run_cfn.sh - A convenience script that automatically invokes the necessary CloudFormation templates from S3 and spins up an instance of the Analyzer on the cluster (the Resources section below details which resources are created). This script should not be modified.
  • analyzer.conf - A configuration file with details of the account and the cluster to deploy to. The parameters in this file should be adapted as required.

In order to invoke the script, the configuration file needs to be specified as the first parameter (e.g. run_cfn.sh analyzer.conf).

The following list describes the available configuration parameters. Required parameters must be set by the user, while optional parameters can be left at their defaults. Note that, while optional, we recommend changing the parameter ADMIN_PASSWORD from its default value admin. An example configuration file is shown after the list.

  • AWS_REGION (required): The AWS region of the EKS cluster to deploy to.
  • AWS_ACCOUNT_ID (required): The AWS account ID of an AWS account that is subscribed to the Robotic Drive Analyzer on the AWS Marketplace.
  • CLUSTER_NAME (required): The name of the EKS cluster the Robotic Drive Analyzer is deployed to.
  • NAMESPACE (required): The name of an existing namespace on the EKS cluster the Robotic Drive Analyzer is deployed to.
  • STACK_NAME (optional, default: DXC-Robotic-Drive-Analyzer-Demo): A unique name for the CloudFormation stack that is created during deployment.
  • IAM_ROLE_NAME (optional, default: dxc-robotic-drive-analyzer-awsqs-role): A unique name for an IAM role that is created during deployment (for the AWSQS type activations).
  • SERVICE_ACCOUNT_NAME (optional, default: dxc-robotic-drive-analyzer-serviceaccount): A unique name for a Kubernetes service account that is created for the deployment.
  • DEPLOYMENT_NAME (optional, default: dxc-robotic-drive-analyzer): A unique name for the Kubernetes deployment.
  • ADMIN_PASSWORD (optional, default: admin): The password used to log in to the Apache Zeppelin UI.
  • PYTHON_PACKAGES (optional): A comma-separated list of Python packages to install at runtime, e.g. “numpy,pillow==6.0.0”.
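
The exact file format of analyzer.conf is not prescribed here; the sketch below assumes the common shell-style KEY=value convention and uses placeholder values that must be replaced with your own:

    # analyzer.conf - example configuration (placeholder values)
    AWS_REGION=eu-central-1
    AWS_ACCOUNT_ID=123456789012
    CLUSTER_NAME=rda-demo
    NAMESPACE=rda
    # Optional parameters; omit them to keep the defaults listed above.
    ADMIN_PASSWORD=change-me
    PYTHON_PACKAGES="numpy,pillow==6.0.0"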

Executing the script creates a server pod in the specified namespace. Guidance for accessing the deployed application is provided in Getting-Started. The stack can be deleted via aws cloudformation delete-stack --stack-name <stack-name>.
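
The deployment can also be verified from the command line, for example (a sketch; the namespace rda is a placeholder, and the default stack name from the parameter list is assumed):

    # Watch the server pod come up in the target namespace.
    kubectl get pods -n rda --watch

    # Query the deployment status of the CloudFormation stack.
    aws cloudformation describe-stacks \
        --stack-name DXC-Robotic-Drive-Analyzer-Demo \
        --query "Stacks[0].StackStatus"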

Resources

The demo creates the following resources:

  • If the eksctl tool is not available, it is installed to /usr/local/bin.
  • IAM Execution Role: An IAM role used by the AWSQS extensions to create resources in EKS. The role has the permissions recommended in the documentation of the AWSQS resource types. Furthermore, the role is added as an EKS administrator using the eksctl installed in the previous step, i.e. the role is added to the aws-auth configmap. The role is necessary because AWSQS only supports roles, not users.
  • Type activations AWSQS::Kubernetes::Helm and AWSQS::Kubernetes::Resource. These are third-party resources for deployments on Kubernetes clusters. As described above, they require the previously created IAM role. Furthermore, they are configured to log to CloudWatch under awsqs-helm and awsqs-resource.
  • A nested stack, created from the main stack, which contains:
    • A Lambda function and a custom resource to acquire the OIDC provider of the EKS cluster by name
    • IAM Service Account Role: An IAM role that is used as the IRSA role for a service account in Kubernetes. The role has permissions for the AWS Marketplace metering API to track usage. It also allows full S3 access to enable RDA to read from S3 using the s3a protocol
    • Kubernetes Service Account: A service account that is used for pod creation and to allow S3 access in pods
    • Edit role: The service account is assigned an edit role at the namespace level
    • Helm deployment: The service account is used to deploy the RDA Helm charts

The progress of the stack deployment can be checked on the CloudFormation console.
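
The created resources can also be inspected from the command line, for example (a sketch using the default names from the parameter list above; cluster name, region, and namespace are placeholders):

    # Inspect the execution role created for the AWSQS extensions.
    aws iam get-role --role-name dxc-robotic-drive-analyzer-awsqs-role

    # List the identity mappings in the aws-auth configmap.
    eksctl get iamidentitymapping --cluster rda-demo --region eu-central-1

    # Confirm that the service account exists in the target namespace.
    kubectl get serviceaccount dxc-robotic-drive-analyzer-serviceaccount -n rda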

Retained Resources

The following resources are not removed automatically and must be removed manually if necessary (see the sketch after this list):

  • Entry of the IAM role in the aws-auth configmap
  • Installation of eksctl
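
A sketch of the manual cleanup (role ARN, cluster name, and region are placeholders):

    # Remove the IAM role's entry from the aws-auth configmap.
    eksctl delete iamidentitymapping --cluster rda-demo --region eu-central-1 \
        --arn arn:aws:iam::123456789012:role/dxc-robotic-drive-analyzer-awsqs-role

    # Remove the eksctl installation if it is no longer needed.
    sudo rm /usr/local/bin/eksctl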

Troubleshooting

The user executing the script has no admin permissions on the EKS cluster (an error is thrown on the command line):

2021-11-24 15:42:23 [ℹ]  eksctl version 0.74.0
2021-11-24 15:42:23 [ℹ]  using region eu-central-1
Error: getting auth ConfigMap: Unauthorized

The stack failed and cannot be deleted:

This can have multiple causes. For example, it occurs when the AWSQS resource that creates the service account in the EKS cluster fails because of missing permissions (its execution role was not added to the cluster administrators). Try to delete the stacks on the CloudFormation console and, if necessary, retain resources (e.g. a service account that was never created has to be retained). If the nested stack cannot be deleted, either wait until a second, working deployment is available, or manually activate the AWSQS third-party extensions to enable the deletion.

The stack failed with ‘internal error’:

This means that the AWSQS resource failed to create resources in the EKS cluster. Please check that:

  • the cluster name and an existing namespace are provided correctly in the analyzer.conf configuration file
  • the executing account has permissions to edit the cluster’s aws-auth configmap (kubectl edit configmap aws-auth -n kube-system)
  • the cluster nodes are in the ‘Ready’ state (see the checks after this list)
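
These checks can be performed with the following commands (a sketch):

    # Check that all cluster nodes report the Ready state.
    kubectl get nodes

    # Verify that the aws-auth configmap can be read (and hence edited).
    kubectl get configmap aws-auth -n kube-system -o yaml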

Debugging

Accessing CloudFormation stack logs in AWS CloudWatch:

  • Logs for the AWSQS resources can be found in the log groups awsqs-resource and awsqs-helm (see the example below)
  • Logs regarding the cluster detection via OIDC can be found by searching for the tag OIDCLambda
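
For example, the AWSQS log groups can be tailed with the AWS CLI (a sketch; aws logs tail requires AWS CLI v2):

    # Follow the logs of the AWSQS extensions in CloudWatch.
    aws logs tail awsqs-resource --follow
    aws logs tail awsqs-helm --follow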

CloudFormation logs:

  • The deployment process can be viewed on the ‘Events’ page of the CloudFormation console, both for the stack with the given stack name and for the nested stack created with it
  • The activation of the AWSQS extensions can be viewed on the CloudFormation console under ‘Registry’ > ‘Activated extensions’ > ‘Activated third-party’ (see the example below)
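
The activated extensions can also be listed from the command line, for example (a sketch; the filter syntax assumes a recent AWS CLI version):

    # List activated third-party extensions in the current region.
    aws cloudformation list-types \
        --visibility PUBLIC \
        --filters Category=ACTIVATED,TypeNamePrefix=AWSQS::Kubernetes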

Debugging the EKS pods:

The logs of the pods on Kubernetes can be viewed using kubectl logs -n <namespace> <pod-name>. Furthermore, pods can be described with kubectl describe pod -n <namespace> <pod-name>. Events in the namespace can be displayed with kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp.

Contact

RD_analyzer_support@dxc.com

