Written by Suhas Savkoor
So recently I was working on a colleague's case where the backup verification, both manual and automatic were failing with VMware tools timeout. Now, there were multiple restore points being verified here and only one of them was failing. The VDP admin guide for verification states the below:
"Set the heartbeat timeout interval to its optimal value, depending on the environment. Note that some VMs may take longer to send and receive the VMware Tools heartbeat than others"
On the verification job logs, the following was observed. ">> " Marks the explanation of the event:
2016-07-14T21:52:10.541-02:00 avvcbimage Info <19672>: vmAction powerOnVM()
>> The VM power On function is called
2016-07-14T21:52:10.549-02:00 avvcbimage Info <17787>: Modifying E1000 adapter: Network adapter 1 to not Connect at Power On
>> The network adapter is disconnected on power on to avoid IP conflict
2016-07-14T21:52:17.665-02:00 avvcbimage Info <14632>: VM '[virtual_machines1] VDP_VERIFICATION_camserver_1468522800276/VDP_VERIFICATION_camserver_1468522800276.vmx' Power On task completed, moref=
>> VM is powered ON
2016-07-14T21:52:17.665-02:00 avvcbimage Info <19674>: vmAction waitforToolRunning()
>> Waiting for heartbeat verification
2016-07-14T21:57:22.177-02:00 avvcbimage Warning <19675>: wait for toolrunning is timed out or cancelled
>> Verification timed out
2016-07-14T21:57:22.178-02:00 avvcbimage Info <19685>: vmAction poweroffVM()
>> power off VM function is called
2016-07-14T21:57:25.225-02:00 avvcbimage Info <14632>: VM '[virtual_machines1] VDP_VERIFICATION_camserver_1468522800276/VDP_VERIFICATION_camserver_1468522800276.vmx' Power Off task completed, moref=
>> power off completed
2016-07-14T21:57:25.225-02:00 avvcbimage Info <19686>: vmAction deleteVM()
>> delete VM function is called
2016-07-14T21:57:28.295-02:00 avvcbimage Info <14632>: VM '[virtual_machines1] VDP_VERIFICATION_camserver_1468522800276/VDP_VERIFICATION_camserver_1468522800276.vmx' deletion task completed, moref=
>> VM is deleted successfully
2016-07-14T21:57:28.311-02:00 avvcbimage Error <19702>: ABV failed
>> Automatic backup verification failed.
So, there is a parameter that we need to add in the avvcbimageAll.cmd file to increase the heartbeat wait timeout for the appliance.
The steps would be:
1. Open a SSH to the VDP appliance
2. Browse to the following directory
# usr/local/avamarclient/var/avvcbimageAll.cmd
--validate_script_max_in_min=20
service avagent-vmware restart
Post this, the VDP appliance will wait for 20 minutes for VM tools heartbeat to be received before it could time out.
If the tools validation complete before 20 minutes, then it will proceed to the next steps. So setting the timeout to a large value will not have any performance impact on the VMs which are not facing the VM tools timeout during verification.
Simple, yes!