
Avamar Virtual Edition 7.1: Failed To Communicate To vCenter During Client Registration

Once you set up your Avamar server, the next step is to add the VMware vCenter as a client. You provide the vCenter IP/FQDN along with the administrator user credentials and the default HTTPS port 443. However, it errors out stating:

Failed to communicate to vCenter. Unable to find valid certification path to the vCenter. 


The error is due to Avamar not being able to acknowledge the VMware vCenter certificate warning. To work around this, we will force Avamar to ignore the vCenter certificate check.

Log in to the Avamar server over SSH and edit the mcserver.xml file:
# vi /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml

Locate the vCenter certificate ignore parameter by searching for /cert in the vi editor. You will find an entry resembling the following (the key is typically named ignore_vc_cert):

<entry key="ignore_vc_cert" value="false" />

Change this value from false to true and save the file.

Switch to the admin user using sudo su - admin and run the below script to restart MCS:
# mcserver.sh --restart
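
If you prefer to script the change rather than edit the file by hand, here is a minimal sketch, assuming the entry key is named ignore_vc_cert as above:

# Back up the file, flip the certificate-ignore flag, then restart MCS.
cp /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml{,.bak}
sed -i 's/key="ignore_vc_cert" value="false"/key="ignore_vc_cert" value="true"/' /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml
sudo su - admin -c "mcserver.sh --restart"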

Post this, you should be able to add the vCenter as a client to Avamar successfully.


That should be it.


Dude, Where's My Logs?

Well, what do we have here? Let's say we created one backup job in the vSphere Data Protection appliance and added 30 virtual machines to it. We started the backup for this job and let it run overnight. The next morning, when you log in, you see that 29 VMs completed successfully and 1 VM failed.


You log in to the SSH of the VDP, browse to /usr/local/avamarclient/var, and notice that there are 30 sets of logs for the same backup job with different IDs, which makes no sense. You don't know which log contains the entries for the failed VM.


Here's a sneaky way to get this done.

I have created 3 VMs: Backup-Test, Backup-Test-B and Backup-Test-C
I have a backup Job called Job-A

I initiated a manual backup and all 3 VMs completed successfully. If I go to the backup job logs folder, I see the following:

-rw-r----- 1 root root 1.7K Nov 18 04:33 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel-xmlstats.log
-rw-r----- 1 root root  15K Nov 18 04:34 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel.alg
-rw-r----- 1 root root  40K Nov 18 04:34 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel.log
-rw-r----- 1 root root  18K Nov 18 04:33 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel_avtar.log
-rw-r----- 1 root root  383 Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew-xmlstats.log
-rw-r----- 1 root root  15K Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew.alg
-rw-r----- 1 root root  40K Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew.log
-rw-r----- 1 root root  19K Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew_avtar.log
-rw-r----- 1 root root 1.7K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel-xmlstats.log
-rw-r----- 1 root root  15K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel.alg
-rw-r----- 1 root root  41K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel.log
-rw-r----- 1 root root  18K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel_avtar.log

Looking at this, neither you nor I have any idea which log contains the entries for Backup-Test-C.

Step 1 of the sneaky trick:

Run the following command:
# mccli activity show --verbose=true

The output:

0,23000,CLI command completed successfully.
ID               Status                   Error Code Start Time           Elapsed     End Time             Type             Progress Bytes New Bytes Client        Domain                               OS            Client Release Sched. Start Time    Sched. End Time      Elapsed Wait Group                                      Plug-In              Retention Policy Retention Schedule                 Dataset                                    WID                 Server Container
---------------- ------------------------ ---------- -------------------- ----------- -------------------- ---------------- -------------- --------- ------------- ------------------------------------ ------------- -------------- -------------------- -------------------- ------------ ------------------------------------------ -------------------- ---------------- --------- ------------------------ ------------------------------------------ ------------------- ------ ---------
9147942378903409 Completed w/Exception(s) 10020      2016-11-18 04:33 IST 00h:00m:22s 2016-11-18 04:33 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test   /vc65.happycow.local/VirtualMachines windows9Guest 7.2.180-118    2016-11-18 04:33 IST 2016-11-19 04:33 IST 00h:00m:11s  /vc65.happycow.local/VirtualMachines/Job-A Windows VMware Image Job-A            D         Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423789022 Avamar N/A

9147942378902609 Completed                0          2016-11-18 04:33 IST 00h:00m:21s 2016-11-18 04:33 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test-C /vc65.happycow.local/VirtualMachines rhel7_64Guest 7.2.180-118    2016-11-18 04:33 IST 2016-11-19 04:33 IST 00h:00m:21s  /vc65.happycow.local/VirtualMachines/Job-A Linux VMware Image   Job-A            DWMY      Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423789022 Avamar N/A

9147942357262909 Completed w/Exception(s) 10020      2016-11-18 04:29 IST 00h:00m:43s 2016-11-18 04:30 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test   /vc65.happycow.local/VirtualMachines windows9Guest 7.2.180-118    2016-11-18 04:29 IST 2016-11-19 04:29 IST 00h:00m:12s  /vc65.happycow.local/VirtualMachines/Job-A Windows VMware Image Job-A            DWMY      Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423572598 Avamar N/A

9147942378903009 Completed                0          2016-11-18 04:33 IST 00h:00m:22s 2016-11-18 04:34 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test-B /vc65.happycow.local/VirtualMachines centosGuest   7.2.180-118    2016-11-18 04:33 IST 2016-11-19 04:33 IST 00h:00m:31s  /vc65.happycow.local/VirtualMachines/Job-A Linux VMware Image   Job-A            DWMY      Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423789022 Avamar N/A

Confusing again? Well, we need to look at two fields here: the Job ID field and the Work Order ID field.

In the output above, the Job ID is the value in the first column (ID), and the Work Order ID is the value in the WID column.

The Work Order ID matches the Name-ID prefix of the files in the log directory, but this alone is not helpful if there are many VMs in the same backup job, as they will all share the same Work Order ID.

Step 2 of the sneaky trick:

We will use the Job ID to view the logs. The command would be:
# mccli activity get-log --id=<Job-ID> | less

The first thing you will see as the output is:

0,23000,CLI command completed successfully.
Attribute Value

Followed by tons of blank spaces and dashes. 

Step 3 of the sneaky trick:

The first event logged for any backup records where the client-side logging goes, and this event ID is 5008.

So, as soon as you run the get-log command, search for this event ID and you will be taken directly to the start of the client's backup log. You will see:

2016-11-18T04:33:29.903-05:-30 avvcbimage Info <5008>: Logging to /usr/local/avamarclient/var/Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel.log

Not only can you view the logs from here, you also get the complete log file name.
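
To skip the manual search entirely, you can let grep pull that line out for you. A quick sketch, using the Job ID of Backup-Test-C from the earlier output:

# Print only the event 5008 line, which names the client log file.
mccli activity get-log --id=9147942378902609 | grep "<5008>"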

That's it. Sneaky tricks in place. 

Slow GUI Response On VDP 6.1.3

I recently ran into an issue where the Backup Job tab was extremely slow in loading the jobs, and when I say extremely slow, I mean it took forever. The other tabs in the Web Client VDP GUI behaved the same way.

In the axis2.log under /usr/local/avamar/var/mc/server_log the following was logged:

2017-01-16 12:57:35,105 [1690894503@qtp-1786872722-26] ERROR org.apache.axis2.transport.http.AxisServlet  - Java heap space
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Unknown Source)
        at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(Unknown Source)
        at java.lang.AbstractStringBuilder.append(Unknown Source)
        at java.lang.StringBuffer.append(Unknown Source)
        at java.io.StringWriter.write(Unknown Source)
        at com.ctc.wstx.sw.BufferingXmlWriter.flushBuffer(BufferingXmlWriter.java:1103)
        at com.ctc.wstx.sw.BufferingXmlWriter.fastWriteRaw(BufferingXmlWriter.java:1114)
        at com.ctc.wstx.sw.BufferingXmlWriter.writeStartTagEnd(BufferingXmlWriter.java:743)
        at com.ctc.wstx.sw.BaseNsStreamWriter.closeStartElement(BaseNsStreamWriter.java:388)
        at com.ctc.wstx.sw.BaseStreamWriter.writeCharacters(BaseStreamWriter.java:446)
        at org.apache.axiom.util.stax.wrapper.XMLStreamWriterWrapper.writeCharacters(XMLStreamWriterWrapper.java:100)
        at org.apache.axiom.om.impl.MTOMXMLStreamWriter.writeCharacters(MTOMXMLStreamWriter.java:289)
        at org.apache.axis2.databinding.utils.writer.MTOMAwareXMLSerializer.writeCharacters(MTOMAwareXMLSerializer.java:139)
        at com.avamar.mc.sdk10.EventMoref.serialize(EventMoref.java:155)
        at com.avamar.mc.sdk10.EventMoref.serialize(EventMoref.java:78)
        at com.avamar.mc.sdk10.ActivityInfo.serialize(ActivityInfo.java:889)
        at com.avamar.mc.sdk10.ActivityInfo.serialize(ActivityInfo.java:198)
        at com.avamar.mc.sdk10.ArrayOfActivityInfo.serialize(ArrayOfActivityInfo.java:216)
        at com.avamar.mc.sdk10.TaskInfo.serialize(TaskInfo.java:630)
        at com.avamar.mc.sdk10.DynamicValue.serialize(DynamicValue.java:244)
        at com.avamar.mc.sdk10.DynamicValue.serialize(DynamicValue.java:152)
        at com.avamar.mc.sdk10.ArrayOfDynamicValues.serialize(ArrayOfDynamicValues.java:216)
        at com.avamar.mc.sdk10.ArrayOfDynamicValues.serialize(ArrayOfDynamicValues.java:160)
        at com.avamar.mc.sdk10.GetDynamicPropertyResponse.serialize(GetDynamicPropertyResponse.java:165)
        at com.avamar.mc.sdk10.GetDynamicPropertyResponse.serialize(GetDynamicPropertyResponse.java:109)
        at com.avamar.mc.sdk10.GetDynamicPropertyResponse$1.serialize(GetDynamicPropertyResponse.java:97)
        at org.apache.axis2.databinding.ADBDataSource.serialize(ADBDataSource.java:93)
        at org.apache.axiom.om.impl.llom.OMSourcedElementImpl.internalSerialize(OMSourcedElementImpl.java:692)
        at org.apache.axiom.om.impl.util.OMSerializerUtil.serializeChildren(OMSerializerUtil.java:556)
        at org.apache.axiom.om.impl.llom.OMElementImpl.internalSerialize(OMElementImpl.java:874)
        at org.apache.axiom.soap.impl.llom.SOAPEnvelopeImpl.internalSerialize(SOAPEnvelopeImpl.java:230)
2017-01-16 12:58:20,038 [1690894503@qtp-1786872722-26] ERROR org.apache.axis2.transport.http.AxisServlet  - Java heap space
java.lang.OutOfMemoryError: Java heap space
2017-01-16 12:59:07,070 [486089829@qtp-1786872722-25] ERROR org.apache.axis2.transport.http.AxisServlet  - Java heap space
java.lang.OutOfMemoryError: Java heap space

Because of this, the backup jobs never ran and the mccli commands took forever to report their outputs.
The solution was to increase the Java heap memory. The steps (back up the files before editing them):

1. Browse to the following location:
# cd /usr/local/avamar/var/mc/server_data/prefs
2. Make a backup of the mcserver.xml file before editing it, then open it using a vi editor:
# vi mcserver.xml
3. Locate the line <entry key="maxJavaHeap" value="-Xmx1G" /> and change the value from 1G to 2G.

Before Edit:
<entry key="maxJavaHeap" value="-Xmx1G" />
After Edit:
<entry key="maxJavaHeap" value="-Xmx2G" />

4. Save this file

5. Go to the following location
# cd /usr/local/avamar/lib/
6. Make a backup copy of the mcserver.xml file in this location, then open it in a vi editor and edit the same parameter in this file too:

Before Edit:
<entry key="maxJavaHeap" value="-Xmx1G" merge="newvalue" />
After Edit:
<entry key="maxJavaHeap" value="-Xmx2G" merge="newvalue" />

7. Save the file

8. Switch to the admin user on the VDP (sudo su - admin) and restart MCS using:
# mcserver.sh --restart
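
If you prefer to script steps 1 through 8, a minimal sketch (run as root, assuming mcserver.sh is on the admin user's PATH as above) would be:

# Back up both copies of mcserver.xml, raise the heap from 1G to 2G, restart MCS.
cp /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml{,.bak}
cp /usr/local/avamar/lib/mcserver.xml{,.bak}
sed -i 's/-Xmx1G/-Xmx2G/' /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml
sed -i 's/-Xmx1G/-Xmx2G/' /usr/local/avamar/lib/mcserver.xml
sudo su - admin -c "mcserver.sh --restart"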

Post this, the GUI should load noticeably faster, and you should no longer see the Java heap errors in axis2.log.

Hope this helps.

VDP 6.1.3 Backups Hang At 92 Percent When Backing Up VMs On vSAN

Earlier, we had seen an issue where the VDP appliance crashes when backing up a virtual machine residing on a vSAN datastore. This was documented in the VMware KB article here.

VDP 6.1.3 uses VDDK 6.5 (VixDiskLib 6.5, release build-4241604). The earlier issue in VDDK 6.0, as described in the KB article above, was an inability to mount a vSAN sparse delta disk on a proxy server, or on the VDP server itself when using the internal proxy, if that proxy resided on a non-vSAN datastore.

In VDDK 6.5, the behavior changed. The issue still exists, but rather than crashing the VDP appliance, the VDDK library crashes, causing the backups to fail. The workarounds per the same article were:

1. If the internal proxy is being used, deploy an external proxy and place it on the vSAN datastore.
2. If space usage is not a concern and the internal proxy is being used, move the entire VDP appliance to the vSAN datastore.

However, in some cases this did not work and the backups still failed; the tasks and events view in vSphere would show the progress stuck at 92 percent, and mccli activity show --active would not show any progress bytes.

I do not have vSAN storage in my setup; the back-trace below is from a couple of cases I worked on and a community discussion thread, which can be found here.

If you view the failed backup's vmimagew log under /usr/local/avamarclient/var, you will notice the following back-trace:

2017-01-17T04:02:16.101+04:00 avvcbimage FATAL <5889>: Fatal signal 11 in pid 15065
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  FATAL ERROR: <0001> uapp::handlefatal: Fatal signal 11
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000006b93b1
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000006ba127
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000006ba44b
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000006ba53e
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 0000000000628810
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e56c90850
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5050193f
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5051c277
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5051c75e
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5052b142
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e5052b678
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e56ecab73
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e56ecb123
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000004284bc
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000004291f9
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00000000004993c4
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 000000000070ac4c
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e56c88806
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  | 00007f2e558cd9bd
2017-01-17T04:02:16.103+04:00 [VcbImageBackupAssistThread]  ERROR: <0001> uapp::handlefatal: aborting program pid=15065, sig=11
2017-01-17T04:02:16.103+04:00 avvcbimage FATAL <5890>: handlefatal: Aborting program with code 176, pid=15065, sig=11
2017-01-17T04:02:16.103+04:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Exit: Unmatched Init calls so far: 1.

To fix this, change the backup transport to NBD. The steps:

1. SSH into VDP and browse to the below location:
# cd /usr/local/avamarclient/var
2. Edit the file called avvcbimageAll.cmd using a vi editor

3. Add the below parameter into this file:
--transport=nbd

4. Save the file and restart the avagent service using:
 # service avagent-vmware restart
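
The same change can be made in one pass from the shell; a minimal sketch:

# Append the NBD transport flag and restart the agent.
echo "--transport=nbd" >> /usr/local/avamarclient/var/avvcbimageAll.cmd
service avagent-vmware restart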

Now your backups should complete successfully. I will update this post when a fix is released for this bug.
Hope this helps.

VDP Verification Job Leaves Undeleted Folders

There was an interesting case opened quite a while back that took a few months to figure out. A backup verification job was run for a set of VMs. The verification job completed successfully; however, it left behind a folder named after the VM's verification job ID, and in that folder was one file: the vmxf file of that virtual machine.

If you take a look at the verification job logs on the VDP server, you will not see any errors or warnings. The VM is test-restored, powered on, tested for a VMware Tools heartbeat, and then powered off and deleted.

So, on the ESXi datastore, the folder left behind had a name similar to VDP_VERIFICATION_Windows_clone_2_1485298729541, and in this directory the file left behind was VDP_VERIFICATION_Windows_clone_2.vmxf.

If you look at the hostd.log during this process, you will notice the following:

2017-01-24T23:44:04.127Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteFile dele
ted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/VDP_VERIFICATION_Windows_clone_2_1485298729541.nvram'.

2017-01-24T23:44:04.128Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteFile dele
ted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/vmware.log'.

2017-01-24T23:44:04.128Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteFile dele
ted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/564d7f1f-1606-518d-6b4e-d8689735d481.vmem.WRITELOCK'.

2017-01-24T23:44:04.128Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteFile dele
ted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/VDP_VERIFICATION_Windows_clone_2_1485298729541.vmsd~'.

2017-01-24T23:44:04.128Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteFile dele
ted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/VDP_VERIFICATION_Windows_clone_2_1485298729541.vmsd.lck'.

2017-01-24T23:44:04.129Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteFile dele
ted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/VDP_VERIFICATION_Windows_clone_2_1485298729541.vmsd'.

2017-01-24T23:44:04.129Z info hostd[234C2B70] [Originator@6876 sub=vm:SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFI
CATION_Windows_clone_2_1485298729541/VDP_VERIFICATION_Windows_clone_2_1485298729541.vmx opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] lck'.

2017-01-24T23:44:04.130Z info hostd[234C2B70] [Originator@6876 sub=Libs opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] SNAPSHOT: SnapshotDeleteVMInt: Couldn't remove directory '/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541'.

2017-01-24T23:44:04.130Z verbose hostd[234C2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/57ac84be-efa08410-3cf9-549f3515a09a/VDP_VERIFICATION_Windows_clone_2_1485298729541/VDP_VERIFICATION_Windows_clone_2_1485298729541.vmx opID=3a74dd6f-58-d7b3 user=vpxuser:VMWARE.LOCAL\Administrator] Virtual machine delete completed.


Notice here that the verification job removes all the test-restored files, but then says it is unable to remove the VDP verification directory. At this point, if you list the directory contents on the ESXi datastore, you will see that a vmxf file is still present, and because of this VDP is unable to remove the verification folder.

During a backup, VDP only backs up the vmx, nvram, vmdk, and flat files. It does not back up the vmxf file, and since it does not back it up, it does not restore it either. So what is this vmxf file? It was initially created to hold virtual machine metadata; later, VMware Tools information was added to it. This means that if you have a virtual machine with VMware Tools running, you should see a vmxf file associated with that VM. If no tools are installed on the VM, there will be no vmxf file.

So, if you perform an automatic backup verification (ABV) of a VM without any tools, this vmxf file will not be left behind. During the ABV, VDP tests for a VMware Tools heartbeat, so when the VM is powered on, the vmxf file gets created under the ABV VM's folder. But since VDP did not create this file, it will not delete it either. It deletes all the files that it restored, leaving this vmxf file behind in the verification folder, and the cleanup in this case has to be done manually.

The vmxf file is of little use in a production environment. If you delete it manually, take a new backup of the VM, and verify the restore point, the file comes back. This means something in the vmx file is causing the vmxf file to reappear.

The resolution:

1. Power off the virtual machine
2. If you vi the virtual machine's vmx file, you will see the following entry:
extendedConfigFile = "Windows_clone_2.vmxf"

3. Use the vi editor to comment this entry out with a leading #.

Before edit:
extendedConfigFile = "Windows_clone_2.vmxf"
After edit:
#extendedConfigFile = "Windows_clone_2.vmxf"

4. Save the file and reload the virtual machine's vmx. To reload, you first need the VM ID; to find it, run the below command on the ESXi host:
# vim-cmd vmsvc/getallvms
5. Once you get the ID, reload it using:
# vim-cmd vmsvc/reload <vm-id>
6. Remove the vmxf file manually from the virtual machine directory

7. Power on the VM and wait for it to be completely up, with the tools status showing Running (Current). Now, if you browse the VM directory, the vmxf file is no longer created.

8. Perform another backup for this VM and then perform the verification; this time there will be no stale files left behind.
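
For reference, steps 4 and 5 can be combined into a quick sketch, run on the ESXi host (Windows_clone_2 is the example VM name from this post):

# Look up the VM ID by name, then reload its vmx configuration.
VMID=$(vim-cmd vmsvc/getallvms | awk '/Windows_clone_2/ {print $1}')
vim-cmd vmsvc/reload "$VMID"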

Note:
If you make any changes to VMware Tools, such as a reinstall or an upgrade, another vmxf entry will be added to the virtual machine's vmx file, and this entire process has to be redone.

Hope this helps!

VDP Restore Fails With "Operation Not Permitted"

Today, a restore of a virtual machine was failing. In fact, restores of all virtual machines were failing. The data resided on a Data Domain, but this issue can occur regardless of the backend storage.

In the restore logs under /usr/local/avamarclient/var, named MOD-<epoch>-vmimagel.log, the following was seen:

2017-01-26T13:11:53.027+04:00 avvcbimage Info <6688>: Process 13007 (/usr/local/avamarclient/bin/avtar) finished (code 1: Operation not permitted)
2017-01-26T13:11:53.027+04:00 avvcbimage Info <7238>: Stream complete: 0 bytes read from stream
2017-01-26T13:11:53.027+04:00 avvcbimage Warning <6690>: CTL workorder "MOD-1485454308259" non-zero exit status 'code 1: Operation not permitted'
2017-01-26T13:11:53.028+04:00 avvcbimage Error <0000>: [IMG0011] Avtar exited with 'code 1: Operation not permitted'
2017-01-26T13:11:53.028+04:00 avvcbimage Info <19647>: Restore Job Wrap-Up Stats:

The avtar log for the same had this:

2017-01-26T13:11:53.025+04:00 avtar Warning <10786>: Cannot use --verbose with --stream=&1, remove the verbose option flag to stop this error

The restore used to fail right at 0 percent progress.

As the error message stated, there was an unwanted extra entry in the avtar.cmd file. To resolve this:

1. Go to /usr/local/avamarclient/var
2. Open the avtar.cmd file in a vi editor
3. Remove the "--verbose" flag from this and save the file
4. Restart the avagent using:
# service avagent-vmware restart
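
Or, as a one-pass sketch from the shell:

# Strip any --verbose line from avtar.cmd and restart the agent.
sed -i '/--verbose/d' /usr/local/avamarclient/var/avtar.cmd
service avagent-vmware restart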

Post this, the restore should work without issues.

Hope this helps.

Unable To Send Test Email In VDP 6.x

In VDP, you can configure SMTP so that the appliance periodically forwards the appliance status and backup job status to the configured email address. Once you configure all the settings under Configuration > Email, you use the Send test email option to check that the configuration is working. However, once you select this, you might receive the following message:


The first troubleshooting step is to verify that the SSO, vCenter, and VDP times are all synchronized. More about this is discussed in the article here.

Now there are two ways of looking at this issue:

1. You had the NTP issue mentioned above and corrected it, but you keep running into the same message over and over even though the time is now synced. (This is the issue I worked on.)

2. You are receiving something else, a different error, but the vdr-server.log stack trace matches the one below:

2017-01-31 13:07:22,607 ERROR [http-nio-8543-exec-3]-email.EmailBase: On try 1to send the email, recieved a connection error from the external email server:null
2017-01-31 13:07:22,607 ERROR [http-nio-8543-exec-3]-services.EmailConfigServiceWS: Unable to trigger test email
javax.mail.AuthenticationFailedException
        at javax.mail.Service.connect(Service.java:306)
        at javax.mail.Service.connect(Service.java:156)
        at javax.mail.Service.connect(Service.java:105)
        at javax.mail.Transport.send0(Transport.java:168)
        at javax.mail.Transport.send(Transport.java:98)
        at com.emc.vdp2.common.email.EmailBase.sendReport(EmailBase.java:174)
        at com.emc.vdp2.common.email.EmailBase.sendReport(EmailBase.java:63)
        at com.emc.vdp2.email.EmailSummaryReport.sendTestEmail(EmailSummaryReport.java:129)
        at com.emc.vdp2.services.EmailConfigServiceWS.runEmailTest(EmailConfigServiceWS.java:93)
        at com.emc.vdp2.rest.ReportService.runEmailTest(ReportService.java:241)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:540)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:715)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at com.emc.vdp2.services.AuthenticationFilter.doFilter(AuthenticationFilter.java:117)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:613)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:421)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1074)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1739)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1698)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Unknown Source)

This is specific to the javax.mail.AuthenticationFailedException error observed in the above log. If you do not see this, then this will probably not help you.

Again, two causes for this:

1. You ran into the same NTP issue, and the credentials were not updated or re-synced before and after the time change.
2. You are indeed entering a wrong password.

If the issue is (1), edit the SMTP settings and re-enter the password, after which the test email should work. If the issue is (2), you will have to get the right password.

Hope this helps. 

VDP Backup Fails With "Failed To Remove Snapshot"

There might be scenarios where you execute a backup for a virtual machine: it starts successfully, takes a snapshot successfully, and completes the backup process, but at the very end it fails to remove the snapshot from the VM. This can be seen persistently for one or more virtual machines.

At the very end of the backup job log you would see something like:

2017-02-06T11:53:39.552+04:00 avvcbimage Warning <16004>: Soap fault detected, Query problem, Msg:'SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"Connection timed out"
Detail: connect failed in tcp_connect()"

2017-02-06T11:53:39.552+04:00 avvcbimage Error <17773>: Snapshot (snapshot-5656) removal for VM '[Suhas-Store-2] VM01/VM01.vmx' task failed to start

2017-02-06T11:53:39.552+04:00 avvcbimage Info <18649>: Removal of snapshot 'VDP-1486397576f70379edb62fb81285abbf68dfadc0bd0758ba83' is not complete, moref 'snapshot-5656'.

2017-02-06T11:53:39.552+04:00 avvcbimage Info <9772>: Starting graceful (staged) termination, Problem with the snapshot removal. (wrap-up stage)

Notice the connection timed out message once the snapshot removal call is handed down to the virtual machine. If you search the VM's vmware.log for this VDP snapshot ID, you will notice the following:

2017-02-06T16:13:02.636Z| vmx| I125: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'VDP-1486397576f70379edb62fb81285abbf68dfadc0bd0758ba83': 55

2017-02-06T16:58:30.826Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2017-02-06T16:59:11.235Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2017-02-06T16:59:13.117Z| vmx| I125: GuestRpcSendTimedOut: message to toolbox-dnd timed out.

We see a lot of timeouts coming from VMware Tools. At the same time, if you check the datastore where this VM resides, you will see that it is on NFS storage:

2017-02-06T11:13:40.355+04:00 avvcbimage Info <0000>: checking datastore type for special processing, checking for type VVOL, actual type = NFS

And if you check the mode of backup, you see it is using hot-add transport:

2017-02-06T11:13:40.337+04:00 avvcbimage Info <9675>: Connected with hotadd transport to virtual disk [Suhas-Store-2] VM01/VM01.vmdk

Now, when the VM resides on NFSv3, there are timeout issues due to an NFS lock during snapshot consolidation. This KB explains the cause. The workaround here is to disable the hot-add mode of backup and switch to NBD or NBDSSL.

1. SSH into the VDP appliance and browse to the below directory:
# cd /usr/local/avamarclient/var
2. Edit the avvcbimageAll.cmd using a vi editor and add the below line:
--transport=nbd

3. Save the file and restart avagent using:
# service avagent-vmware restart
Post this, the backup should complete successfully. Hope this helps.


VDP FLR Does Not Expand Mount Points

The File Level Recovery (FLR) client is used when you want to restore specific files rather than an entire VM. To understand the FLR login modes and how to perform an FLR, you can refer to the link here.

Now, you might come across an issue where you are able to mount the required backup but unable to expand that mount point to view the drives and folders. All you will see is the name of the VM image and an arrow that does not expand.


 Occasionally, you might see an error stating:

"Failed to get disks: Unable to browse as proxies are unavailable"

And you might also have persistent or intermittent login issues with the FLR client (either basic or advanced login) with the below message.


There are a few explanations for this, but the root cause is an unresponsive proxy.

Case 1: When internal proxy is being used

By default, VDP has an internal proxy that caters to backup and restore tasks. If the internal proxy is down, FLR authentication will fail persistently; you will not even have an opportunity to mount the backup.
Log in to the https://vdp-ip:8543/vdp-configure page and verify that the internal proxy is in a healthy state.

Restart the proxy service using the below command:
# service avagent-vmware restart

If this does not provide relief, disable and re-enable the internal proxy from the vdp-configure page.

Case 2: When multiple external proxies are used. 

There might also be a case where multiple external proxies (a maximum of 8) are in use. A few of these might be responsive while the rest are unresponsive; in this case, the issue is intermittent. Let's say you have 4 proxies, 2 responding and 2 not. When a login request or a backup/restore comes in, it will utilize any one of these proxies. If the request goes to a proxy that is up and running, your FLR login and mount point expansion will work. If the request is handed to an unresponsive proxy, the login or mount point expansion fails.

To restart an external proxy, log in to the vdp-configure page, select the unresponsive proxy, click the gear icon, and click "Restart Proxy". Once all the proxies are confirmed to be ticked green, attempt the FLR again.

Hope this helps.

VDP Backup Fails After A Storage vMotion

If you have a virtual machine added to a VDP backup job and a Storage vMotion is then performed, the next backup of this client might fail. The error you will see in the vSphere Client is:

VDP: An unexpected error occurred with the following error code: 10058



The backup jobs log would record the following:

2017-03-02T03:18:42.561-05:-30 avvcbimage Info <18664>: Login(https://vcenter-prod.happycow.local:443/sdk) Datacenter: 'HappyCow-DC'
2017-03-02T03:18:42.561-05:-30 avvcbimage Info <19728>:      - connected to 'VirtualCenter' - version: 'VMware vCenter Server 6.0.0 build-3634793',  apiVersion:'6.0'
2017-03-02T03:18:42.604-05:-30 avvcbimage Warning <16004>: Soap fault detected, Find VM - NOT ok, Msg:''
2017-03-02T03:18:42.605-05:-30 avvcbimage Error <0000>: [IMG0014] Problem opening vCenter:'HappyCow-DC', path:'[datastore1 (1)] VM-A/VM-A.vmx'.
2017-03-02T03:18:42.605-05:-30 avvcbimage Info <9772>: Starting graceful (staged) termination, Failed to log into web service. (wrap-up stage)
2017-03-02T03:18:42.606-05:-30 avvcbimage Warning <40657>: Login failed
2017-03-02T03:18:42.606-05:-30 avvcbimage Info <40654>: isExitOK()=208
2017-03-02T03:18:42.606-05:-30 avvcbimage Info <17823>: Body- abortrecommended(t)
2017-03-02T03:18:42.606-05:-30 avvcbimage Info <40658>: vmparams (vcenter-prod.happycow.local)
2017-03-02T03:18:42.606-05:-30 avvcbimage Info <40654>: isExitOK()=208
2017-03-02T03:18:42.615-05:-30 avvcbimage Info <18664>: Login(https://vcenter-prod.happycow.local:443/sdk) Datacenter: 'HappyCow-DC'
2017-03-02T03:18:42.616-05:-30 avvcbimage Info <19728>:      - connected to 'VirtualCenter' - version: 'VMware vCenter Server 6.0.0 build-3634793',  apiVersion:'6.0'
2017-03-02T03:18:42.651-05:-30 avvcbimage Warning <16004>: Soap fault detected, Find VM - NOT ok, Msg:''
2017-03-02T03:18:42.651-05:-30 avvcbimage Error <0000>: [IMG0014] Problem opening vCenter:'HappyCow-DC', path:'[datastore1 (1)] VM-A/VM-A.vmx'.
2017-03-02T03:18:42.658-05:-30 avvcbimage Info <18664>: Login(https://vcenter-prod.happycow.local:443/sdk) Datacenter: 'HappyCow-DC'

A similar scenario is discussed in the release notes here; refer to the section "Backup of a VM fails when ESX is moved from one datacenter to other within same vCenter inventory (207375)".

This is a known issue in VDP caused by the vmx_path variable not being updated after a Storage vMotion operation.

In this case, the initial datastore of my virtual machine was datastore1 (1), and it was later moved to datastore2. However, after this migration the backup job still picked up the vmx_path as datastore1 (1), and since the VM was no longer available there, the backup failed.

A couple of solutions:
The cache value update takes up to 24 hours, so it should update automatically after some time without any manual changes.

Another workaround is to perform another Storage vMotion of this virtual machine to any datastore.

To force-sync the cache, use the "mccli vmcache sync" command from the above release notes.

A sample command would be:
# mccli vmcache sync --domain=/vcenter-FQDN(or)IP/VirtualMachines --name=<VM-Name>

Post this, run a backup to verify if things are in place. Hope this helps!

VDP Backup Fails With The Error "Failed To Attach Disks"

Earlier, we had seen a compatibility issue with VDP 6.1.3 in an ESXi 5.1 environment, where backups failed with the message "Failed to attach disks". More about this can be read here.
However, this is a very generic message and can mean something different when running VDP on a compatible version.

In this case, VDP 6.1.3 was running in a vSphere 6.0 environment, and backups failed only when an external proxy was deployed. If the external proxy was discarded, the backups utilized the internal VDP proxy and completed successfully.

With an external proxy, the logs live on the proxy machine under /usr/local/avamarclient/var.
The backup job logs had the following entries:

2017-03-02T16:13:18.762Z avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_PrepareForAccess: Disable Storage VMotion failed. Error 18000 (Cannot connect to the host) (fault (null), type GVmomiFaultInvalidResponse, reason: (none given), translated to 18000) at 4259.

2017-03-02T15:46:19.092Z avvcbimage Info <16041>: VDDK:VixDiskLibVim: Error 18000 (listener error GVmomiFaultInvalidResponse).

2017-03-02T15:46:19.092Z avvcbimage Warning <16041>: VDDK:VixDiskLibVim: Login failure. Callback error 18000 at 2444.

2017-03-02T15:46:19.092Z avvcbimage Info <16041>: VDDK:VixDiskLibVim: Failed to find the VM. Error 18000 at 2516.

2017-03-02T15:46:19.093Z avvcbimage Info <16041>: VDDK:VixDiskLibVim: VixDiskLibVim_FreeNfcTicket: Free NFC ticket.

2017-03-02T15:46:19.093Z avvcbimage Info <16041>: VDDK:VixDiskLib: Error occurred when obtaining NFC ticket for [Datastore_A] Test_VM/Test_VM.vmdk. Error 18000 (Cannot connect to the host) (fault (null), type GVmomiFaultInvalidResponse, reason: (none given), translated to 18000) at 2173.

2017-03-02T15:46:19.093Z avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_OpenEx: Cannot open disk [Datastore_A] Test_VM/Test_VM.vmdk. Error 18000 (Cannot connect to the host) at 4964.

2017-03-02T15:46:19.093Z avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Open: Cannot open disk [Datastore_A] Test_VM/Test_VM.vmdk. Error 18000 (Cannot connect to the host) at 5002.

2017-03-02T15:46:19.093Z avvcbimage Error <0000>: [IMG0008] Failed to connect to virtual disk [Datastore_A] Test_VM/Test_VM.vmdk (18000) (18000) Cannot connect to the host

In the mcserver.log, the following was noted:

WARNING: com.avamar.mc.sdk10.McsFaultMsgException: E10055: Attempt to connect to virtual disk failed.
at com.avamar.mc.sdk10.util.McsBindingUtils.createMcsFaultMsg(McsBindingUtils.java:35)
at com.avamar.mc.sdk10.util.McsBindingUtils.createMcsFault(McsBindingUtils.java:59)
at com.avamar.mc.sdk10.util.McsBindingUtils.createMcsFault(McsBindingUtils.java:63)
at com.avamar.mc.sdk10.mo.JobMO.monitorJobs(JobMO.java:299)
at com.avamar.mc.sdk10.mo.GroupMO.backupGroup_Task(GroupMO.java:258)
at com.avamar.mc.sdk10.mo.GroupMO.execute(GroupMO.java:231)
at com.avamar.mc.sdk10.async.AsyncTaskSlip.run(AsyncTaskSlip.java:77)

The cause is an IPv6 AAAA record in DNS. VDP does not support dual-stack networking and needs either IPv4 settings or IPv6 settings, not both.

Resolution:
1. Log in to the external proxy machine using root credentials
2. Run the below command to test DNS resolution:
# nslookup -q=any <vcenter-fqdn>

An ideal output should be as follows:

root@vdp-dest:~/#: nslookup -q=any vcenter-prod.happycow.local
Server:         10.109.10.140
Address:        10.109.10.140#53

Name:   vcenter-prod.happycow.local
Address: 10.109.10.142

But, if you see the below output, then you have an IPv6 AAAA record as well:

root@vdp-dest:~/#: nslookup -q=any vcenter-prod.happycow.local
Server:         10.109.10.140
Address:        10.109.10.140#53

Name:   vcenter-prod.happycow.local
Address: 10.109.10.142
vcenter-prod.happycow.local   has AAAA address ::9180:aca7:85e7:623d

3. Run the below command to set IPv4 precedence over IPv6:
# echo "precedence ::ffff:0:0/96  100" >> /etc/gai.conf

4. Restart the avagent service using the below command:
# service avagent-vmware restart
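
To confirm the precedence change took effect, query through getaddrinfo itself rather than nslookup, since nslookup talks to DNS directly and ignores /etc/gai.conf. A quick check using this post's example FQDN:

# getent honors /etc/gai.conf ordering; the IPv4 address should now be listed first.
getent ahosts vcenter-prod.happycow.local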

Post this, the backups should work successfully. If the IPv6 entry is not displayed in the nslookup output and the backup still fails, then please raise a support request with VMware.

Deploying vSphere Data Protection From govc CLI

In vSphere 6.5 GA there have been a lot of reported instances of being unable to deploy OVA templates. In this article, I will talk specifically about vSphere Data Protection. As you know, vSphere 6.5 supports only VDP 6.1.3. If you try to deploy this via the Web Client, you will run into an error stating "Use a 6.5 version of web client". The usual workaround is to use ovftool to deploy the appliance to vCenter; personally, I find ovftool a bit challenging for first-time users. A simpler way is to use the govc CLI to deploy the template. William Lam has written about this tool in greater detail, and you can read it here.

Here, I am using a CentOS machine to stage the deployment. 

1. The first step is to download the appropriate govc binary. To see the list, you can visit the GitHub link here. Once you have identified the required binary from that link, run the below command to download it onto your Linux box.
# curl -L https://github.com/vmware/govmomi/releases/download/v0.5.0/govc_linux_386.gz | gunzip -d > /usr/local/bin/govc

I am using govc_linux_386.gz as it is compatible with my CentOS; if you are using a different distro or a Windows-based system, choose accordingly.

You should see a below task to indicate the download is in progress:

[root@centOS /]# curl -L https://github.com/vmware/govmomi/releases/download/v0.5.0/govc_linux_386.gz | gunzip -d > /usr/local/bin/govc
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 6988k  100 6988k    0     0   713k      0  0:00:09  0:00:09 --:--:-- 1319k

2. Once the download is done, provide execute permissions to this binary. Run this command:
# chmod +x /usr/local/bin/govc

Note: The download can be done to any required directory.

3. Verify the download is successful and govc is working by checking for the version:
# govc version

You should see:
govc v0.5.0

4. We will have to set a few environment variables to define which host, datastore, and network this VDP virtual machine should be deployed on.

export GOVC_INSECURE=1
export GOVC_URL=<Specify-ESXi-or-VC-FQDN>
export GOVC_USERNAME=<User-login-for-the-above>
export GOVC_PASSWORD=<Your-Password>
export GOVC_DATASTORE=<Datastore-Name>
export GOVC_NETWORK=<Network-Portgroup>
export GOVC_RESOURCE_POOL=<Resource-pool-if-you-have-one>
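
For reference, a filled-in example using this lab's names (all values are illustrative; substitute your own):

export GOVC_INSECURE=1
export GOVC_URL=vcenter-prod.happycow.local
export GOVC_USERNAME=administrator@vsphere.local
export GOVC_PASSWORD='YourPassword'
export GOVC_DATASTORE=datastore1
export GOVC_NETWORK="VM Network"
export GOVC_RESOURCE_POOL=Resources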

5. Next, we will create a JSON specification file to provide the details of the VDP appliance. Run the below command to view the specification:
# govc import.spec /vSphereDataProtection-6.1.3.ova | python -m json.tool

You will notice the below:
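
A representative spec looks roughly like this (the exact fields depend on the govc version, and the vami property names below are illustrative placeholders for the OVA's actual keys):

{
    "Deployment": "small",
    "DiskProvisioning": "flat",
    "IPAllocationPolicy": "dhcpPolicy",
    "IPProtocol": "IPv4",
    "Name": null,
    "NetworkMapping": [
        {
            "Name": "Isolated Network",
            "Network": ""
        }
    ],
    "PowerOn": false,
    "PropertyMapping": [
        {
            "Key": "vami.ip0.vSphere_Data_Protection_6.1",
            "Value": ""
        },
        {
            "Key": "vami.gateway.vSphere_Data_Protection_6.1",
            "Value": ""
        },
        {
            "Key": "vami.DNS.vSphere_Data_Protection_6.1",
            "Value": ""
        }
    ],
    "WaitForIP": false
}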


6. Redirect this output to a file so that we can edit and provide the necessary details. Run this command:
# govc import.spec /vSphereDataProtection-6.1.3.ova | python -m json.tool > vdp.json

7. Open the file in a vi editor and enter the networking details. Remove the first line, which reads "Deployment": "small", (including the trailing comma). If this is not done, your deployment will fail with:
"govc: ServerFaultCode: A specified parameter was not correct: cisp.deploymentOption"

You should have something similar once you have edited the file:


Save the file.

8. Lastly, we deploy the OVA template using the import.ova function. During this deployment we make use of the JSON file we created, which has the networking details, and of the environment variables where we specified the deployment location. The command:
# govc import.ova -options=vdp.json /vSphereDataProtection-6.1.3.ova

You should now see a progress bar:
[root@centOS ~]# govc import.ova -options=vdp.json /vSphereDataProtection-6.1.3.ova
[06-03-17 14:14:57] Warning: Line 139: Invalid value 'Isolated Network' for element 'Connection'.
[06-03-17 14:15:03] Uploading vSphereDataProtection-0.0TB-disk1.vmdk... (1%, 5.1MiB/s)

9. Post this, you can power on your VDP appliance and begin the configuration.

Hope this helps. 

Automating Backup Cancellation From Command Line

This script allows you to mass-cancel active backup jobs from the command line of the vSphere Data Protection appliance.
#!/bin/bash
# This script cancels all active backup jobs from the command line
value=$(mccli activity show --active | cut -c1-16 | sed -e '1,3d')
if [ -z "$value" ]; then
    echo "No active job to cancel"
else
    for p in $value; do
        mccli activity cancel --id=$p
    done
fi


If you would like to cancel only a subset of the running backup jobs, say 13 out of 20, add those job IDs to a file and run a script that pulls its input from that file (one ID per line):
#!/bin/bash
# This script cancels jobs from IDs provided in the id.txt file
while read p; do
    mccli activity cancel --id=$p
done < id.txt

This script can be modified for other backup states, such as waiting-client: grep for the state, cut the ID column, and feed the job IDs to the same loop, as in the sketch below.
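
A hedged sketch of that waiting-client variant (the exact status string may differ on your build; adjust the grep pattern as needed):

#!/bin/bash
# Cancel all jobs currently in the waiting-client state.
value=$(mccli activity show | grep -i "waiting" | cut -c1-16)
if [ -z "$value" ]; then
    echo "No waiting-client job to cancel"
else
    for p in $value; do
        mccli activity cancel --id=$p
    done
fi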

Run chmod a+x on the script file to make it executable. Hope this helps!

Cowsay For Linux SSH Login

Cowsay has been around for quite a while now, but I only came across it recently. I wanted a more interesting login banner for a couple of data protection VMs and other CentOS boxes. If you follow this blog, you will know my domain is happycow.local; the name "HappyCow" is quite fascinating, and it is also my GamerTag on GTA5 (Hehe!).

Cowsay came to the rescue here and was up and running in a few steps. First, I had to get the cowsay package; you can download it from here. SSH into your Linux box and copy the package over.

Extract the tar file:
# tar -zxvf cowsay_3.03+dfsg2.orig.tar.gz
Post this, get into the cowsay-3.03+dfsg2 directory and run the installation script:
# sh install.sh
Post this, create the below file:
# vi ~/.ssh/rc
Paste the content you want displayed at SSH login. My content was:
#!/bin/bash
clear
echo -e "Welcome to VDP \n If it is broken, redeploy" | cowsay
echo -e "\nYour system is been up for $(uptime | cut -d '' -f 4,5,6,7)"

Run chmod u+x on the rc file and then restart the sshd service:
# service sshd restart
Log back into the terminal and you will see the "Zen-Cow" greeting you.


Looks fun!

Unable To Start Backup Scheduler In VDP 6.x

You might come across issues where the backup scheduler does not start when you try it from the vdp-configure page or from the command line using dpnctl start sched. It fails with:

2017/03/22-18:58:53 dpnctl: ERROR: error return from "[ -r /etc/profile ] && . /etc/profile ; /usr/local/avamar/bin/mccli mcs resume-scheduler" - exit status 1

And the dpnctl.log will have the following:

2017/03/22-18:58:53 - - - - - - - - - - - - - - - BEGIN
2017/03/22-18:58:53 1,22631,Server has reached the capacity health check limit.
2017/03/22-18:58:53 Attribute Value
2017/03/22-18:58:53 --------- -------------------------------------------------------------------------------
2017/03/22-18:58:53 error     Cannot enable scheduler until health check limit reached event is acknowledged.
2017/03/22-18:58:53
2017/03/22-18:58:53 - - - - - - - - - - - - - - - END
2017/03/22-18:58:53 dpnctl: ERROR: error return from "[ -r /etc/profile ] && . /etc/profile ; /usr/local/avamar/bin/mccli mcs resume-scheduler" - exit status 1

If you run the below command, you can see there are quite a few unacknowledged health check events:

# mccli event show --unack=true | grep "22631"

1340224 2017-03-22 13:58:53 CDT WARNING 22631 SYSTEM   PROCESS  /      Server has reached the capacity health check limit.
1340189 2017-03-22 13:58:01 CDT WARNING 22631 SYSTEM   PROCESS  /      Server has reached the capacity health check limit.

To resolve this, acknowledge these events using the below command:

# mccli event ack --include=22631

Post this, start the scheduler either from the GUI or from the command line using dpnctl start sched.

Hope this helps.


Farewell vSphere Data Protection - End of Availability.

On April 5, VMware announced the end of vSphere Data Protection. vSphere 6.5 will be the last release to include VDP, which means that after this you will need to migrate to a third-party backup product.

The EOA details can be found in this link here:

The EOA KB article is published here:

"On April 5th, 2017, VMware announced the End of Availability (EOA) of the VMware vSphere Data Protection (VDP) product.
VMware vSphere 6.5 is the last release to include vSphere Data Protection and future vSphere releases will no longer include this product. We have received feedback that customers are looking to consolidate their backup and recovery solutions in support of their overall software-defined data center (SDDC) efforts. As a result, we are focusing our investments on vSphere Storage APIs – Data Protection to further strengthen the vSphere backup partner ecosystem that provides you with a choice of solution providers.
  
All existing vSphere Data Protection installations with active Support and Subscription (SnS) will continue to be supported until their End of General Support (EOGS) date. The EOGS dates for vSphere Data Protection are published on the VMware Lifecycle Product Matrix under the dates listed for different versions. After the EOA date, you can continue using your existing installations until your EOGS dates.
VMware supports a wide ecosystem of backup solutions that integrate with vSphere and vCenter using vSphere Storage APIs – Data Protection framework. You can use any data protection products that are based on this framework. 

Beginning today, Dell EMC is offering you a complimentary migration to the more robust and scalable Dell EMC Avamar Virtual Edition. VMware vSphere Data Protection is based on Dell EMC Avamar Virtual Edition, a key solution for protecting and recovering workloads across the SDDC. To learn more about this offer please go to the Dell EMC website.

If you have additional questions please contact your VMware Sales Representative or read the FAQ document "


However, support for VDP will continue as per the VMware SnS agreement; see this link:

Dell EMC's offer to migrate VDP to AVE (Avamar Virtual Edition) is here:

For any questions about the migration, refer to the FAQ below:

I will continue to post articles on VDP and answer your questions as long as I am supporting it. Starting today, I will be exploring more of the vRealize Suite, beginning with vRealize Operations.

Comment to leave your thoughts. 

Well, you never know what you got until it's gone. 

Unable To Configure VDP To vCenter - Unable to find this VDP in the vCenter inventory

So, you might run into an issue where you are unable to configure VDP to vCenter, failing with this error:
Unable to find this VDP in the vCenter inventory



In the vdr-configure.log you will notice the following (again, for all issues with the vdp-configure page, refer to vdr-configure.log):

2017-04-10 10:41:13,365 WARN  [http-nio-8543-exec-2]-vi.VCenterServiceImpl: No VCenter found in MC root domain
2017-04-10 10:41:13,365 INFO  [http-nio-8543-exec-2]-reconfig.VcenterConfigurationImpl: Failed to locate vCenter Client in Avamar, reconfiguration is required
2017-04-10 10:41:13,365 INFO  [http-nio-8543-exec-2]-sso.VmwareSsoServiceImpl: Getting SSL certificates for https://psc-prod:7444/lookupservice/sdk
2017-04-10 10:41:13,715 INFO  [http-nio-8543-exec-2]-services.VcenterConnectionTestService: Finished vCenter Connection test with result:
                <?xml version="1.0"?><vCenter><certValid>true</certValid><connection>true</connection><userAuthorized>true</userAuthorized><ave_in_vcenter>false</ave_in_vcenter><switch_needed>true<
/switch_needed><persistent_mode>true</persistent_mode><ssoValid>true</ssoValid><httpPortValid>true</httpPortValid></vCenter>

2017-04-10 10:41:13,025 WARN  [http-nio-8543-exec-2]-vi.VCenterServiceImpl: Failed to get root domain from MC
2017-04-10 10:41:13,025 WARN  [http-nio-8543-exec-2]-vi.VCenterServiceImpl: No VCenter found in MC root domain
2017-04-10 10:41:13,025 INFO  [http-nio-8543-exec-2]-vi.ViJavaServiceInstanceProviderImpl: visdkUrl = https://vc-prod:443/sdk
2017-04-10 10:41:13,337 INFO  [http-nio-8543-exec-2]-util.UserValidationUtil: vCenter user has sufficient privileges to run VDP.
2017-04-10 10:41:13,339 INFO  [http-nio-8543-exec-2]-network.NetworkInfoApi: Found IP Address: [10.116.189.178] link local? [false], site local? [true], loopback? [false]
2017-04-10 10:41:13,339 INFO  [http-nio-8543-exec-2]-network.NetworkInfoApi: Found IP Address: 10.116.189.178

2017-04-10 10:41:13,353 ERROR [http-nio-8543-exec-2]-vi.ViJavaAccess: getPoweredOnVmByIpAddr(): Cannot determine appropriate powered on AVE virtual machine with IP Address [10.x.x.x] since there exist many of them (2): type=VirtualMachine name=vdp-vm mor-id=vm-208, type=VirtualMachine name=Windows-Jump mor-id=vm-148


So in this case, 10.x.x.x is the IP of my VDP appliance, and a duplicate IP is in use by another VM in the vCenter inventory, Windows-Jump. If this happens, determine whether you can remove the duplicate IP or change the IP of the VDP appliance. The configuration test should then complete without issues.
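If you want to confirm the duplicate from the appliance itself before hunting through the inventory, arping's duplicate address detection mode is a quick check. A minimal sketch, assuming eth0 is the VDP's interface and the iputils arping is available on the appliance:

# ip addr show eth0
# arping -D -c 3 -I eth0 10.x.x.x

If anything answers the probe, another machine on the segment is claiming the same address.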

Hope this helps.

Failed To Start Internal Proxy In VDP 6.x

Typically after an upgrade, most of your backups fail with a status of "No eligible proxies" or "No data".
In some cases you will not be able to run on-demand backups either; these fail with the error "Adhoc Backup Request Error - Exception".

root@vdp-dest:/data01/home/admin/#: mccli client backup-dataset --domain=/vcenter-prod.happycow.local/VirtualMachines --name=VM-C
1,22253,Client Adhoc Backup Request Error - Exception.

If you try to enable the internal proxy from the vdp-configure page, it will fail with the error below:


In the vdr-configure.log you will notice the following:

2017-04-15 03:50:52,463 ERROR [pool-22-thread-1]-cmdline.RuntimeExecImpl: avagent Info <5008>: Logging to /usr/local/avamarclient/var/avagent.log
2017-04-15 03:50:52,463 ERROR [pool-22-thread-1]-cmdline.RuntimeExecImpl: avagent Error <7531>: Unable to register clients/vdp-dest with Administrator 127.0.0.1:28001
2017-04-15 03:50:52,464 ERROR [pool-22-thread-1]-cmdline.RuntimeExecImpl:  'Could not reconcile proxy with vCenter.' (203)
2017-04-15 03:50:52,464 ERROR [pool-22-thread-1]-cmdline.RuntimeExecImpl: avagent Info <5008>: Logging to /usr/local/avamarclient/var/avagent.log

You will see the vCenter connections reported as down if you run the command below:
# mccli server show-services

You will see something similar to:

0,23000,CLI command completed successfully.
Name                               Status
---------------------------------- -----------------------------
Hostname                           vdp-dest.happycow.local
IP Address                         10.109.10.167
Load Average                       0.24
Last Administrator Datastore Flush 2017-04-15 04:45:00 IST
PostgreSQL database                Running
Web Services                       Error
Web Restore Disk Space Available   256,417,868K
Login Manager                      Running
snmp sub-agent                     Disabled
ConnectEMC                         Disabled
snmp daemon                        Disabled
ssh daemon                         Running
Data Domain SNMP Manager           Not Running
Remote Backup Manager Service      Running
RabbitMQ                           Not Running
Replication cron job               Not Running
/vcenter-prod.happycow.local       5 vCenter connection(s) down.

If you try to register the proxy from the command line using the command below, it will fail as well:
# /usr/local/avamarclient/etc/initproxy.sh start

avagent.d Info: Stopping Avamar Client Agent (avagent-vmware)...
avagent.d Info: Client Agent stopped.
avagent Info <5008>: Logging to /usr/local/avamarclient/var/avagent.log
avagent Error <7531>: Unable to register clients/vdp-dest with Administrator 127.0.0.1:28001
 'Could not reconcile proxy with vCenter.' (203)
avagent.d Info: Client activation error.
avagent Info <5008>: Logging to /usr/local/avamarclient/var/avagent.log
avagent Info <5417>: daemonized as process id 351
avagent.d Info: Client Agent started.

Registration Failed.
initproxy.sh FAIL: registerproxy failed

The cause:
There is a key called "ignore_vc_cert" that has been flipped to false. VDP then waits for a process to acknowledge the vCenter certificate warning, which never happens, and hence the proxy fails to start.

The fix:
1. Run the below command to verify the key value:
# grep -i ignore /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml

The output should be similar to:
     <entry key="ddr_ignore_snmp_errors" value="false" />
     <entry key="email_logs_tar_cmd" value="tar -cz --atime-preserve=system --dereference --ignore-failed-read --one-file-system --absolute-names" />
      <entry key="ignore_vc_cert" value="false" />

2. Edit this mcserver.xml file, change the ignore_vc_cert value to true, and save the file (or use the sed one-liner shown after this list)

3. Switch to admin mode of VDP (sudo su - admin) and restart the mcs using:
# mcserver.sh --restart

4. Register the internal proxy from the GUI; it should now complete successfully, and none of the vCenter connections will be reported as down.
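If you prefer to flip the key non-interactively instead of editing in vi, a sed one-liner along these lines also works (a sketch; take a copy of the file first):

# cp /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml /tmp/mcserver.xml.bak
# sed -i 's/key="ignore_vc_cert" value="false"/key="ignore_vc_cert" value="true"/' /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml

Re-run the grep from step 1 to confirm the value now reads true before restarting MCS.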

Hope this helps.

VDP Configure Page Reports - Server Is Still Starting

You might sometimes restart your appliance and be presented with the message:

The server is still starting. Depending on the configuration, this could take up to 25 minutes. Try again later.

No matter how many times you try to log in, you will run into the same message.


Again, if you look at the vdr-configure.log, you will notice the following:

2017-04-15 05:44:49,242 INFO  [http-nio-8543-exec-3]-services.LoginService: Login service called with action: [login]
2017-04-15 05:44:49,243 INFO  [http-nio-8543-exec-3]-services.LoginService: Checking if the server is in a running state...
2017-04-15 05:44:49,243 INFO  [http-nio-8543-exec-3]-services.LoginService: Server is not running
2017-04-15 05:45:06,592 WARN  [pool-21-thread-1]-backupagent.BackupAgentUpdaterImpl: No proxy-clients are available.

This does not really help much in understanding what is going on. The cause here is a missing .av_sys_state_marker_running file. This file appears to record the running state of the VDP appliance; if it goes missing, the server cannot determine its state, which is why vdr-configure reports "Server is not running" in the logs.

The file is located under /usr/local/avamar/var

Go to this directory and recreate this file using:
# touch .av_sys_state_marker_running
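Since the file name starts with a dot, a plain ls will not show it; you can confirm the marker is in place with:

# ls -la /usr/local/avamar/var/ | grep av_sys_state_marker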

Post this, refresh the vdp-configure page and you should have access.

VDP - Avamar Migration: Part-1: Deploying Avamar Virtual Edition 7.2

Since VMware announced the end of vSphere Data Protection, there is an option to migrate existing deployments to EMC Avamar. More about the EOA announcement can be found in this link.

In this article, we will be looking at deploying Avamar Virtual Edition 7.2. You can go ahead and download the required version of AVE from the EMC download portal. The version I will be using is Avamar 7.2.1.

Log in to the Web Client or vSphere Client, select the ESXi host where you want to deploy your AVE, and select File > Deploy OVF Template. Browse to the location of the AVE download and add the file. Click Next.


Review the details of the OVF template and click Next.


Accept the EULA and click Next.


Provide a name for this AVE virtual machine. Click Next.


If available and required, select a resource pool in which to place this VM. Click Next.


Select the datastore where you want to deploy this. Remember, the VMDK bundled with the AVE OVF is only the appliance OS disk; the data drives are configured later, just like with a VDP appliance. Click Next.


Select a disk provisioning type. Thick provision is recommended. Click Next.


Select a network where this AVE should be connected to. Click Next.


Review the changes and click Finish. Do not check Power On after deployment, because there are a couple of steps to be done once the OVF deployment completes. (If you prefer the command line, see the ovftool sketch below.)
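If you would rather script the deployment than click through the wizard, VMware's ovftool can push the same package from the command line. A sketch only: the file name, datastore, network, inventory path, and credentials below are placeholders to substitute with your own. Note that ovftool leaves the VM powered off by default, which is what we want here:

# ovftool --acceptAllEulas --name=AVE-01 --datastore=Datastore01 \
    --network="VM Network" --diskMode=thick \
    AVE-7.2.1.ova "vi://administrator@vc-prod.happycow.local/Datacenter/host/esxi-prod"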


Just like VDP, AVE comes with four supported backup storage configurations. You can refer to the table below to size your AVE accordingly.

Once you choose the deployment type, refer to the table below to plan the drive sizes. Just like in VDP, a 512 GB deployment will have 3 drives of 256 GB each; the additional space is for checkpoint maintenance overhead.

So the rule goes like this:
Total size = GSAN capacity + 1/2 of GSAN capacity.

GSAN capacity would be the actual space for storing backup data.
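To make the arithmetic concrete with the 512 GB example above:

GSAN capacity       = 512 GB (usable space for backup data)
Checkpoint overhead = 512 GB / 2 = 256 GB
Total provisioned   = 512 + 256 = 768 GB, i.e. 3 data disks of 256 GB each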


So go ahead and add three disks manually (depending on your AVE configuration) to this VM. Only thick provisioning is supported for AVE; I will be using thin in this lab purely because of space constraints.


Once the drives are added, power on the AVE virtual machine.

The default login is root and changeme

First, we will have to configure network settings for the AVE machine. After logging in to the AVE from the VM console, run yast2 to begin the network configuration. You will see a similar interface:


Select Network Devices and then Network Services to begin the network configuration wizard and you should see something similar:


You will need to set the IP configuration under Overview, the Hostname/DNS settings, and the Gateway under Routing.

Once the appliance network is configured, restart the guest and then verify the network with ping and nslookup (a quick sketch follows). If this works fine, proceed to Part 2 in the link below.
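A quick verification pass from the appliance console could look like this (the hostname, domain, and IPs are placeholders for your environment):

# ping -c 3 10.x.x.1
# nslookup ave-01.happycow.local
# nslookup 10.x.x.x

Both the forward and the reverse lookup should resolve cleanly before you move on; Avamar is particular about DNS.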

VDP - Avamar Migration: Part-2: Configuring Avamar Virtual Edition 7.2