Oct 8, 2020

Upgrading vRealize Automation 8.x to 8.2

Arun Nukula

Updated: Nov 24, 2020

Before upgrading vRealize Automation 8.x to 8.2, vRealize Suite Lifecycle Manager (vRLCM) must itself be upgraded to 8.2.

With vRLCM on 8.2, let’s walk through the steps I took to upgrade vRA to version 8.2.

User validations

Validate Postgres Replication

I confirmed that there were no Postgres replication issues by running the command below:

seq 0 2 | xargs -r -n 1 -I {} kubectl -n prelude exec postgres-{} -- chpst -u postgres repmgr node status

DEBUG: connecting to: "user=repmgr-db passfile=/scratch/repmgr-db.cred connect_timeout=10 dbname=repmgr-db host=postgres-0.postgres.prelude.svc.cluster.local fallback_application_name=repmgr"
Node "postgres-0.postgres.prelude.svc.cluster.local":
        PostgreSQL version: 10.10
        Total data size: 936 MB
        Conninfo: host=postgres-0.postgres.prelude.svc.cluster.local dbname=repmgr-db user=repmgr-db passfile=/scratch/repmgr-db.cred connect_timeout=10
        Role: primary
        WAL archiving: enabled
        Archive command: /bin/true
        WALs pending archiving: 0 pending files
        Replication connections: 2 (of maximal 10)
        Replication slots: 0 physical (of maximal 10; 0 missing)
        Replication lag: n/a

DEBUG: connecting to: "user=repmgr-db passfile=/scratch/repmgr-db.cred connect_timeout=10 dbname=repmgr-db host=postgres-1.postgres.prelude.svc.cluster.local fallback_application_name=repmgr"
Node "postgres-1.postgres.prelude.svc.cluster.local":
        PostgreSQL version: 10.10
        Total data size: 933 MB
        Conninfo: host=postgres-1.postgres.prelude.svc.cluster.local dbname=repmgr-db user=repmgr-db passfile=/scratch/repmgr-db.cred connect_timeout=10
        Role: standby
        WAL archiving: disabled (on standbys "archive_mode" must be set to "always" to be effective)
        Archive command: /bin/true
        WALs pending archiving: 0 pending files
        Replication connections: 0 (of maximal 10)
        Replication slots: 0 physical (of maximal 10; 0 missing)
        Upstream node: postgres-0.postgres.prelude.svc.cluster.local (ID: 100)
        Replication lag: 0 seconds
        Last received LSN: 2/DA9C5A00
        Last replayed LSN: 2/DA9C5A00

DEBUG: connecting to: "user=repmgr-db passfile=/scratch/repmgr-db.cred connect_timeout=10 dbname=repmgr-db host=postgres-2.postgres.prelude.svc.cluster.local fallback_application_name=repmgr"
Node "postgres-2.postgres.prelude.svc.cluster.local":
        PostgreSQL version: 10.10
        Total data size: 933 MB
        Conninfo: host=postgres-2.postgres.prelude.svc.cluster.local dbname=repmgr-db user=repmgr-db passfile=/scratch/repmgr-db.cred connect_timeout=10
        Role: standby
        WAL archiving: disabled (on standbys "archive_mode" must be set to "always" to be effective)
        Archive command: /bin/true
        WALs pending archiving: 0 pending files
        Replication connections: 0 (of maximal 10)
        Replication slots: 0 physical (of maximal 10; 0 missing)
        Upstream node: postgres-0.postgres.prelude.svc.cluster.local (ID: 100)
        Replication lag: 0 seconds
        Last received LSN: 2/DA9C5DA8
        Last replayed LSN: 2/DA9C5DA8
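
Reading three blocks of output by eye is easy to get wrong, so here is a minimal sketch of how the same check could be scripted. It assumes the text layout shown above (a `Node "...":` header followed by `Key: value` lines). The function names `parse_repmgr_status` and `replication_problems` are hypothetical helpers of mine, not part of repmgr or vRA, and the sample is a shortened copy of the output above.

```python
import re

# Shortened copy of the output above, used only to exercise the parser.
SAMPLE = """
DEBUG: connecting to: "user=repmgr-db dbname=repmgr-db host=postgres-0.postgres.prelude.svc.cluster.local"
Node "postgres-0.postgres.prelude.svc.cluster.local":
        Role: primary
        Replication connections: 2 (of maximal 10)
        Replication lag: n/a

Node "postgres-1.postgres.prelude.svc.cluster.local":
        Role: standby
        Upstream node: postgres-0.postgres.prelude.svc.cluster.local (ID: 100)
        Replication lag: 0 seconds
        Last received LSN: 2/DA9C5A00
        Last replayed LSN: 2/DA9C5A00
"""


def parse_repmgr_status(text):
    """Split `repmgr node status` output into one dict of fields per node."""
    nodes = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r'^Node "([^"]+)":', line)
        if header:
            current = {"node": header.group(1)}
            nodes.append(current)
        elif current is not None and ":" in line and not line.startswith("DEBUG"):
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return nodes


def replication_problems(nodes, max_lag_seconds=0):
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    primaries = [n for n in nodes if n.get("Role") == "primary"]
    if len(primaries) != 1:
        problems.append(f"expected exactly 1 primary, found {len(primaries)}")
    for n in nodes:
        if n.get("Role") != "standby":
            continue
        lag = re.match(r"(\d+) seconds", n.get("Replication lag", ""))
        if lag is None or int(lag.group(1)) > max_lag_seconds:
            problems.append(f"{n['node']}: replication lag is {n.get('Replication lag')!r}")
        if n.get("Last received LSN") != n.get("Last replayed LSN"):
            problems.append(f"{n['node']}: received and replayed LSN differ")
    return problems


if __name__ == "__main__":
    print(replication_problems(parse_repmgr_status(SAMPLE)) or "replication looks healthy")
```

To use it against a live cluster, capture the output of the command above as text and pass it to `parse_repmgr_status`. The zero-second lag threshold is my own choice and can be relaxed through `max_lag_seconds`.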

My vRA 8.x environment is a distributed (clustered) deployment, so it consists of three vRA nodes.

Each Postgres instance runs on one of those nodes, and the primary is continuously replicated to the two standbys.

No LB Changes Needed

I did not make any changes to the load balancer that manages my distributed vRA nodes.

Validate Pods Health

Ensure all pods are in the Running and Ready state.
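
As a rough sketch, this check can be automated by parsing the default table printed by `kubectl -n prelude get pods` (columns NAME, READY, STATUS, RESTARTS, AGE). The `prelude` namespace comes from the Postgres command earlier in this post. The helper name `unhealthy_pods` and the pod names in the sample are made up for illustration.

```python
# Hypothetical sample in the default `kubectl get pods` table layout.
SAMPLE_PODS = """
NAME                READY   STATUS             RESTARTS   AGE
example-app-0       1/1     Running            0          2d
example-app-1       0/1     CrashLoopBackOff   5          2d
example-db-0        1/2     Running            0          2d
"""


def unhealthy_pods(kubectl_output):
    """Return (name, ready, status) for pods that are not Running and fully Ready."""
    bad = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, ready, status = fields[:3]
        ready_count, _, total = ready.partition("/")
        # A pod is healthy only if it is Running and every container is ready.
        if status != "Running" or ready_count != total:
            bad.append((name, ready, status))
    return bad


if __name__ == "__main__":
    for name, ready, status in unhealthy_pods(SAMPLE_PODS):
        print(f"{name}: READY={ready} STATUS={status}")
```

An empty result means every listed pod is Running with all of its containers ready, which is the state to reach before continuing.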

Trigger Inventory Sync

Trigger an inventory sync in vRLCM before starting the upgrade.

Submitting Upgrade Request and Prechecks

Step:1

Create a snapshot using vRLCM.

Browse to the vRA environment in vRLCM and then select UPGRADE.

Step:2

This opens the upgrade wizard, where you select the Repository Type.

In my case, I had downloaded 8.2 beforehand and had it ready under Product Binaries.

Step:3

This pane gives you the option to trigger an inventory sync if one was not performed earlier. If you already ran it before starting the upgrade, you can skip it.

Once the inventory sync is complete, you can proceed to the next step.

Step:4

In this step, you run a precheck before the upgrade can proceed.

Once you click Run Precheck, a pane asks you to confirm that all manual validations have been performed. This refers to the vIDM hardware resources.

Prechecks start

There is a failure. VMware introduced a check to ensure that /services-logs has enough space on all the vRealize Automation appliances.

This is a mandatory step that should not be missed.

Clicking VIEW under the Recommendations pane shows all the steps needed to resolve the problem.

The exception states that /dev/sdc, which is Hard Disk 3 on the virtual appliance, does not have enough space.

Remember, I had already taken snapshots of my vRealize Automation appliances. So, to extend the disk, I first had to remove the snapshots.

I then extended Hard Disk 3 from 8 GB to 30 GB, adding 22 GB of space.

As you can see in the screenshot below, my /dev/sdc was only 8 GB.

Even after the resize, the new size was not reflected.

The resize threw an error:

[2020-10-08T04:41:12.050Z] Disk size for disk /dev/sdb has not changed.
[2020-10-08T04:41:12.079Z] Rescanning disk /dev/sdc...
[2020-10-08T04:41:12.222Z] Disk size for disk /dev/sdc has increased from  8589934592 to 32212254720.
[2020-10-08T04:41:12.423Z] Resizing physical volume...
  Physical volume "/dev/sdc" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized
[2020-10-08T04:41:12.559Z] Physical volume resized.
[2020-10-08T04:41:12.722Z] Extending logical volume services-logs...
  Size of logical volume logs_vg/services-logs changed from <8.00 GiB (2047 extents) to <30.00 GiB (7679 extents).
  Logical volume logs_vg/services-logs successfully resized.
[2020-10-08T04:41:12.903Z] Logical volume resized.
[2020-10-08T04:41:12.916Z] Resizing file system...
resize2fs 1.43.4 (31-Jan-2017)
open: No such file or directory while opening /dev/mapper/logs_vg-services-logs
[2020-10-08T04:41:13.029Z] ERROR: Error resizing file system.
[2020-10-08T04:41:13.053Z] Rescanning disk /dev/sdd...
[2020-10-08T04:41:13.178Z] Disk size for disk /dev/sdd has not changed.
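
The log reports disk sizes in bytes, so it helps to convert them and to pull out the error lines. The snippet below is a small sketch that works only on the log format shown above. `disk_size_changes` and `error_lines` are hypothetical helper names, and the sample is a three-line excerpt of that log.

```python
import re

GIB = 1024 ** 3  # bytes per GiB

# Three lines copied from the resize log above.
SAMPLE_LOG = """\
[2020-10-08T04:41:12.222Z] Disk size for disk /dev/sdc has increased from  8589934592 to 32212254720.
[2020-10-08T04:41:13.029Z] ERROR: Error resizing file system.
[2020-10-08T04:41:13.178Z] Disk size for disk /dev/sdd has not changed.
"""

_INCREASED = re.compile(
    r"Disk size for disk (\S+) has increased from\s+(\d+) to (\d+)\."
)


def disk_size_changes(log_text):
    """Map each grown disk to its (old, new) size in GiB."""
    return {
        m.group(1): (int(m.group(2)) / GIB, int(m.group(3)) / GIB)
        for m in _INCREASED.finditer(log_text)
    }


def error_lines(log_text):
    """Return the log lines that carry an ERROR marker."""
    return [line for line in log_text.splitlines() if "ERROR:" in line]


if __name__ == "__main__":
    for disk, (old, new) in disk_size_changes(SAMPLE_LOG).items():
        print(f"{disk}: {old:g} GiB -> {new:g} GiB")
    for line in error_lines(SAMPLE_LOG):
        print(line)
```

The conversion confirms that the disk and the logical volume did grow from 8 GiB to 30 GiB. Only the final file system resize failed, which is why the new size was not visible yet.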

The same instructions appear under the VIEW pane. If you hit this exception, follow Step 3 of KB article 79925.

After this step, the new size is reflected, and we can move forward knowing that the prechecks will succeed.

As expected, after resolving the /services-logs partition sizing issue, all precheck validations passed.

Clicking NEXT takes us to the final phase: submitting the vRA upgrade request.

Once you click SUBMIT, the upgrade is initiated.

Upgrade

There is nothing the user has to do once the upgrade request is submitted. In my environment, it took 2 hours and 35 minutes to complete the two stages of the upgrade.

Stage 1 is called vRealize Automation Upgrade/Patch/Internal Network Range Change

Stage 2 is called productupgradeinventoryupdate

Stage 1 in detail

1. Starts the upgrade
2. Checks the vRealize Automation version
3. Copies the vIDM admin token to vRA
4. Initiates the vRA upgrade
5. Uploads the vRA upgrade pre-hook script
6. Runs the vRA upgrade pre-hook script
7. vRA upgrade status check
8. Prepares vRA for the upgrade (this loops for a while until all the nodes are prepared)
9. Proceeds to take a snapshot
10. Extracts the vRA nodes
11. Extracts the vMoid from the vRA VMs
12. Takes a snapshot of vRA using the vMoid
13. Powers on vRA using the vMoid
14. Performs hostname and IP checks until the appliance is back
15. The vRealize Automation upgrade is triggered
16. This goes in a loop with the upgrade status check
17. Waits for initialization after the vRA upgrade
18. Finalization

That’s it for Stage 1. It takes a long time: most of the 2 hours and 35 minutes for a three-node architecture is spent in the 15th and 16th steps (the upgrade itself and its status-check loop), which is to be expected.

The second stage, productupgradeinventoryupdate, takes only milliseconds.

Logs to check during an upgrade

These are a few of the logs that are involved in the upgrade and can be monitored while it runs.

They are not listed in the order in which they are written during the upgrade.

/var/log/vmware/prelude/upgrade-YYYY-MM-DD-HH-NN-SS.log
/var/log/vmware/prelude/upgrade-report-latest
/var/log/vmware/prelude/upgrade-report-latest.json
/var/log/deploy.log
/opt/vmware/var/log/vami/vami.log
/opt/vmware/var/log/vami/updatecli.log

I will deep-dive into the upgrade from a logs perspective in my next blog.