
Error While Binding iSCSI To VMKernel Adapters: IscsiManager.QueryBoundVnics


Written by Suhas Savkoor



So when you click Properties on the iSCSI software adapter to modify certain settings, you see the following error:

Error:

Call "IscsiManager.QueryBoundVnics" for object "iscsiManager-##" on vCenter Server "VC-name" failed



Cause:

This event occurs when ESXi's internal iSCSI daemon becomes corrupted, requiring a cleanup of the daemon's files in ESXi.

How to resolve this:

1. Migrate all virtual machines using vMotion to ESXi hosts that are unaffected.
2. Review the existing iSCSI configuration and copy the IQN and adapter settings for your iSCSI software adapter to a text document.
3. Disable the iSCSI adapter.
4. Navigate to the /etc/vmware/vmkiscsid/ directory on your ESXi host and back up the contents of the folder to a safe location.
5. Delete the contents of /etc/vmware/vmkiscsid/
6. Write the changes to the ESXi boot bank using this command:
# backup.sh
7. Reboot the ESXi host.
8. Create a new software iSCSI adapter and configure it as per the backup you saved in step 2.
9. Add the iSCSI port bindings and targets.
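
For reference, steps 4 through 7 look something like this on the ESXi shell (the /tmp backup location is just an example I chose; keep the backup somewhere persistent if you prefer):

# cp -r /etc/vmware/vmkiscsid/ /tmp/vmkiscsid-backup/
# rm /etc/vmware/vmkiscsid/*
# backup.sh
# reboot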

That's pretty much it. You should be good to go.


VDP File Level Recovery Client


Written by Suhas Savkoor



So, you would be familiar with restoring entire virtual machines with VDP, which is done from the Restore tab of the VDP plugin in the Web Client GUI. However, there are cases where you need to restore only certain files or folders and not the entire virtual machine or virtual disk. In those cases, instead of a disk or virtual machine restore, we perform a File Level Restore (FLR).

Like the VDP-configure page, FLR also has its own client. The FLR client has to be accessed from within the virtual machine where you want to perform the file restores. The URL for this is:

https://VDP-IP:8543/flr

The login screen would look something like this:


This is the simple login page and there is an option for advanced login as well. We will have a look at the advanced login a little later in this article. 

For the simple login, you provide the Windows machine's local administrator credentials. For a small demo, on the machine where I accessed the FLR client, I had a file in the Desktop directory called "Critical data" which contained some text ("Critical Text"). To give a little more background: after this file was created, a regular backup was executed for this virtual machine and completed successfully. Some hours later, however, the file was lost. We will now restore that file using the FLR client.

So I have logged in with my local admin credentials to the FLR client and this is the first screen I would see after a successful login. 


I have just one restore point for this virtual machine since only one backup was taken. If you have multiple restore points from a series of backups, choose the restore point you think is best and select the Mount option.

You will then see a directory view. Expand the drive and folders until the necessary file is located. My file was created under C > Users > Administrator > Desktop, and the critical data text file is seen there.


I will select only this file and click the "Restore selected files" option. You will then have to provide a location where you want the file restored. I will select the destination where it originally resided; if you want to specify a different destination, you can do that here as well. Click Restore once the directory is decided.


You will receive a generic prompt stating that the operation might be time consuming. Select Yes to start the restore. 


Click the Monitor Restores tab to check the restore status. The progress percentage is not shown here; the only view available is the one seen below:


From the vSphere client you can monitor the actual status of the restore.


Once the restore completes successfully, you can go to that directory and verify the file has been restored.



The next option is restoring using the advanced login method. The advanced login is used when the restore points you need do not belong to the machine you are logged in to, that is, when you want to restore files from a different virtual machine's backup onto this one. With the advanced login, you provide the vCenter credentials along with the local credentials of the machine.

The login screen would look something like this:


Here the restore point screen looks a bit different. Instead of seeing only the restore points of the current virtual machine, you will see the restore points of all the virtual machines that were backed up.


Click the drop-down for the required virtual machine, or filter by name to locate it quickly. Select the required restore point from the list and click Mount.

The remaining process is the same: you can restore the file to the same location or to a different one.


One more thing to note:
Let's say you initially had a text file with the contents ABC and then took a backup of the virtual machine. After the backup, you made changes to the text file so its contents are now ABCDE. If you now restore this file to the directory where the original file exists, the newer contents will be overwritten with the contents of the restored file. So make sure the restore is done appropriately.

That's it!

To know more about the limitations of FLR, you can click this link here to read the VDP admin guide (see page 152).

VDP Backup Job Not Appearing In Recent Tasks


Written by Suhas Savkoor




This is going to be a small article about an issue that I worked on recently. When you log in to the vSphere Web Client and run a backup job, the task progress does not show up in the "Recent Tasks" pane. When you log in to the vSphere Windows client, you can see the Create Snapshot and VDP backup job tasks. And in the Web Client, when you click the Tasks section on the left-hand side, you can see the task there as well.


This was the situation when my VDP user (the user with which I registered VDP to vCenter) was different from the user I was currently logged into the Web Client with.

The VDP user that I have configured is vsphere.local\Suhas, and the user that I was logged in with was vsphere.local\Administrator (the Single Sign On user).

You can see the user with which VDP is registered below:



Next, we log in to the Web Client with the user that VDP was registered to vCenter with, which is vsphere.local\Suhas. Once logged in, I started another backup job, and this time I can see the task in the "Recent Tasks" pane. See the screenshot below:



The resolution is really simple. When you are logged in with a user different from the one VDP is registered with, the Recent Tasks pane has a drop-down that reads "My Tasks". Click this drop-down and select "All Tasks"; now you can see all the VDP backup tasks. See the screenshot below for the different user (the SSO user):


That's it, really!

Unable To Connect VDP 6.1 To Web Client


Written by Suhas Savkoor



So once the new VDP 6.1 appliance is deployed, we log in to the Web Client and select the vSphere Data Protection plugin. From the drop-down, we select the required VDP appliance and click Connect. However, upon doing this, the screen grays out forever, and the operation does not fail with an error either.

The things to notice here are:

1. This issue occurs when the VDP 6.1 appliance is on a distributed switch.
2. If the VDP virtual machine is migrated to a standard switch, the appliance is able to connect to the Web Client.
3. If the VMs that need to be backed up are on a distributed switch, then the backup job creation task grays out forever.
4. If the entire environment is migrated to standard switches, everything goes back to normal.

In short, VDP 6.1 has issues in a vDS environment. Since migrating your entire networking to standard switches is obviously not a feasible or recommended task, there is a hot-patch released to fix this.

I am sharing these steps here along with the patch solely because I have already distributed both to every customer who opened a case with us to get this fixed, so why not share them here for ease of access.

Before we get to the resolution, this is what was noticed in the virgo logs on vCenter when the Connect option was clicked while the VDP appliance was on a distributed switch:

[2016-04-11 18:36:45.604] [INFO ] http-bio-9443-exec-3 System.out [BlazeDS]Cannot create class of type 'com.vmware.vim.binding.vim.dvs.PortConnection'.
[2016-04-11 18:36:45.604] [INFO ] http-bio-9443-exec-3 System.out flex.messaging.MessageException: Cannot create class of type 'com.vmware.vim.binding.vim.dvs.PortConnection'. Type 'com.vmware.vim.binding.vim.dvs.PortConnection' not found.
[2016-04-11 18:36:45.608] [INFO ] http-bio-9443-exec-3 System.out [BlazeDS]Cannot create class of type 'com.vmware.vim.binding.vim.dvs.PortConnection'.
[2016-04-11 18:36:45.608] [INFO ] http-bio-9443-exec-3 System.out flex.messaging.MessageException: Cannot create class of type 'com.vmware.vim.binding.vim.dvs.PortConnection'. Type 'com.vmware.vim.binding.vim.dvs.PortConnection' not found.
[2016-04-11 18:36:45.612] [INFO ] http-bio-9443-exec-3 System.out [BlazeDS]Cannot create class of type 'com.vmware.vim.binding.vim.dvs.PortConnection'.
[2016-04-11 18:36:45.612] [INFO ] http-bio-9443-exec-3 System.out flex.messaging.MessageException: Cannot create class of type 'com.vmware.vim.binding.vim.dvs.PortConnection'. Type 'com.vmware.vim.binding.vim.dvs.PortConnection' not found.
[2016-04-11 18:36:45.616] [INFO ] http-bio-9443-exec-3 System.out [BlazeDS]Cannot create class of type 'com.vmware.vim.binding.vim.dvs.PortConnection'.
[2016-04-11 18:36:45.616] [INFO ] http-bio-9443-exec-3 System.out flex.messaging.MessageException: Cannot create class of type 'com.vmware.vim.binding.vim.dvs.PortConnection'. Type 'com.vmware.vim.binding.vim.dvs.PortConnection' not found.


So the resolution:

1. Download the vdr-ui-war-6.1.2.war file from this link here.
2. Determine the version of vCenter and navigate to one of the following directories accordingly:

5.5 vCenter
Windows:
C:\ProgramData\VMware\vSphere Web Client\vc-packages\vsphere-client-serenity\com.vmware.vdp2-<version>\plugins
Appliance:
/var/lib/vmware/vsphere-client/vc-packages/vsphere-client-serenity/com.vmware.vdp2-<version>/plugins

6.0 vCenter
Windows:
C:\ProgramData\VMware\vCenterServer\cfg\vsphere-client\vc-packages\vsphere-client-serenity\com.vmware.vdp2-6.1.*\plugins
Appliance:
/etc/vmware/vsphere-client/vc-packages/vsphere-client-serenity/com.vmware.vdp2-6.1.*/plugins/

3. Here you can see two files, a .jar file and a .war file. 
4. Rename the existing .war file to vdr-ui-war-6.1.x-old.war (where x is the version of your plugin).
5. Copy the patched .war file you downloaded and place it in this folder, named the same as the original file.
6. On the vCenter appliance 6.0, I had to perform a couple of additional steps. The older .jar and .war files had the following permissions:

Owner: vsphere-client
Group: Users

7. The patched file has its Owner and Group set to root. Change these to match the values above. You can do this easily by opening a WinSCP connection to vCenter > right-click the file > Properties; both options are available there. (A shell sketch covering steps 4 through 8 on the appliance follows this list.)
8. Restart the web client service. 

On Windows, you can find this service in services.msc.

On the appliance:
service vsphere-client stop
service vsphere-client start
9. Log in to the Web Client and click Connect; we should now be able to successfully connect to the appliance from the Web Client.
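
Putting steps 4 through 8 together on the vCenter Server Appliance 6.0, the shell side would look roughly like this (this assumes the patched .war was already copied to /tmp; the 6.1.2 file names are examples, so use your plugin's version):

# cd /etc/vmware/vsphere-client/vc-packages/vsphere-client-serenity/com.vmware.vdp2-6.1.*/plugins/
# mv vdr-ui-war-6.1.2.war vdr-ui-war-6.1.2-old.war
# cp /tmp/vdr-ui-war-6.1.2.war .
# chown vsphere-client:users vdr-ui-war-6.1.2.war
# service vsphere-client stop
# service vsphere-client start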

That's pretty much it. 
If something does not work, comment below! And always take care while patching.

Thank you!

Fatal Error While Creating Storage For A New VDP Appliance Deployment


Written by Suhas Savkoor



After deploying the OVA template for VDP, you need to configure the appliance from the vdp-configure page. During the storage configuration, I had chosen about 8 TB of storage for the VDP, and partway through the configuration progress the setup encountered a fatal error. The exact message (and the screenshot) is:

“A fatal error occurred during the storage configuration and the appliance is unrecoverable. This could be due to the datastore not having enough free space, does not support the required VMDK size, or is in inactive state. Please check the vCenter Tasks pane for the exact error and redeploy a new VDP appliance”


A couple of pre-checks:

1. Make sure the destination VMFS/NFS datastore has sufficient space to accommodate the selected VDP data storage disks.
2. If you are using a VMFS datastore, make sure its block size supports the VMDK size being created on it. (For reference, on VMFS-3 a 1 MB block size caps file size at 256 GB, 2 MB at 512 GB, 4 MB at 1 TB, and 8 MB at roughly 2 TB; VMFS-5 uses a unified 1 MB block size, so this mainly matters on older or upgraded datastores.)
3. Lastly, check the permissions of the user VDP is being registered to vCenter with, also called the VDP user. The user should at least be able to create virtual disks, or better yet, hold the administrator role.

So I used a domain user with administrator privileges (you can also use the SSO user) to register the VDP appliance to the vCenter Server.

The disk creation completed successfully and I was able to continue to use the VDP appliance. 

VDP Full and File Level Restore Fails: Failed to get disks: Unable to browse as proxies are unavailable


Written by Suhas Savkoor



So when trying to perform a full virtual machine restore or a File Level Restore for any virtual machine, the task fails at various possible percentages. The failure described here is:

Failed to get disks: Unable to browse as proxies are unavailable

If I try to restore this virtual machine to any other host or datastore, restore it as a new virtual machine, or replace the existing virtual machine, the task still fails with the same error.

The restore logs are in the following location:

Location: # cd /usr/local/avamarclient/var

The log named MOD-*EPOCH*-vmimage1.log is the restore log, and it had the following:

2016-05-23T16:01:39.723-02:00 avvcbimage Error <0000>: [IMG0011] Timeout on wait for spawned restore metadata avtar process to complete
2016-05-23T16:06:11.947-02:00 avvcbimage FATAL <17824>: GetDiskAttributed Failed
2016-05-23T16:06:12.121-02:00 avvcbimage Error <17771>: Invalid request to create a VM.
2016-05-23T16:06:12.121-02:00 avvcbimage Error <0000>: [IMG2012] VM creation failed during restore.

Here, a timeout occurs while avvcbimage spawns the avtar process for the metadata restore, which in turn causes the creation of the virtual machine for the restore to fail.

The resolution:

Increase the avvcbimage sub-process timeout using the steps below:

1. If the VDP is using an external proxy, SSH to the proxy machine; if it is using the internal proxy, SSH to the VDP appliance itself. Change the directory to:
# cd /usr/local/avamarclient/var
2. Edit the following file:
# vi avvcbimage.cmd
3. Add the below and save the file
--subprocesstimeout=600
4. Restart the avagent service using the below command:
# service avagent-vmware restart
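
Put together, the change looks something like this on the proxy or appliance (appending with echo instead of opening vi is just a shortcut, and the backup copy is optional):

# cd /usr/local/avamarclient/var
# cp avvcbimage.cmd avvcbimage.cmd.bak
# echo "--subprocesstimeout=600" >> avvcbimage.cmd
# service avagent-vmware restart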

Run the restore operation now, and it should work fine.
If not, leave a comment! Thank you.

Unable to vMotion Due To Difference In vDS Vendor Information Between Source And Destination vDS


Written by Suhas Savkoor




So I was trying to vMotion my virtual machines between two vDS and it was failing with the following error: 

"The destination distributed switch has a different version or vendor than the source distributed switch"

A quick search brings up the following KB article, which is great; however, there are certain cases where people do not want to upgrade the DVS just to change the vendor ID.

In that situation, there are two workarounds:

1. Changing the entry from the MOB page of the vCenter:

• Open a browser and go to the following URL to open the Managed Object Browser page:

https://<vCenter-IP>/mob
• Click content.
• Under Name: rootFolder, select the value group-d1 (Datacenters).
• Under Name: childEntity, select the value for your datacenter.
• Under Name: networkFolder, select group-n* (network).
• Under Name: childEntity, locate the DVSwitch; the moid of the switch appears before its name, in the format dvs-<number>.
• Once the moid is obtained, go to the following location:
https://<vCenter-IP>/mob/?moid=dvs-141
• Click PerformDvsProductSpecOperation_Task on the page.
• In the operation field, enter: upgrade
• In the productSpec field, enter:

<productSpec>
<vendor>VMware, Inc.</vendor>
</productSpec>
• Click Invoke Method.
Go back to the vCenter Server and verify the vendor name on your vDS.

2. If you are using a SQL database, you can run the below query (replace XXX with your dvs ID, obtained from the steps above):
update vpx_dvs set product_vendor='VMware, Inc.' where id=XXX;

However, before performing any of these steps, take snapshots and backups of your vCenter Server and its database.

    Unable To Add A Client To A Backup Job In VDP: Already Registered


    Written by Suhas Savkoor




So today I was working on a case where a vCenter virtual machine and a PSC virtual machine were added to a backup job. One fine day, the PSC machine started behaving weirdly and had to be discarded, so the end solution was to redeploy the PSC machine completely. The name of the faulty PSC machine was XYZ. This was renamed to XYZ-Old, and a new PSC was deployed with the same old name, XYZ.

The vCenter was back up and running fine now. So we went to the Web Client, connected to the VDP plugin, and discarded the old backup job; the Backup tab now had no backup jobs at all. Then we went ahead and created a new backup job, added the new PSC machine, XYZ, to it, and clicked Finish. The screen grayed out for a few seconds and the task finished with an error stating:

    Backup job was created successfully, but the system failed to add the following clients to the job. ERROR: Host Named (<host-name>) is already registered as........



Why this happens:

VDP says that the PSC, XYZ, is already registered because the old machine's entry is still present in the protected client list in the VDP backup client domain. This stale entry was preventing us from adding the new machine with the old name, even though the old PSC machine had been renamed and the backup job corresponding to it removed.

    The solution:

1. Open an SSH session to the VDP appliance and run the below command:
mccli client show --recursive

This shows all the protected clients associated with this VDP appliance.
Here I noticed that the XYZ PSC machine's entry was still present; this stale entry had to be removed.

2. Run the below command to retire that VM from the protected list:
mccli client retire --domain=/vCenter-IP/VirtualMachines --name="VM-Name"

You will receive an output saying the virtual machine was retired successfully.
Once this was done, we went ahead and added the XYZ PSC virtual machine to the backup job successfully.

Sharing is caring! You will not find many VDP articles out there, believe me!

    Thank you!


    Unable To Connect VDP To Web Client: Failed To Validate vCenter certificate

    So out of nowhere today, when I tried to connect the VDP appliance to web client, I got the following error message:

    "Failed to validate the vCenter certificate. Either install or verify the certificate by using the vSphere Data configuration utility"

    A screenshot of the message:


    The vdr-configure logs displayed the following: 
    Location: /usr/local/avamar/var/vdr/server_logs

    2016-06-06 20:42:26,067 INFO  [http-nio-8543-exec-6]-rest.AuthenticationService: Vcentre Certificate validation had failed
    2016-06-06 20:42:27,454 INFO  [Thread-15]-vi.ViJavaServiceInstanceProviderImpl: vcenter-ignore-cert ? true
    2016-06-06 20:42:27,514 INFO  [Thread-15]-vi.ViJavaServiceInstanceProviderImpl: visdkUrl = https://192.168.1.1:443/sdk
    2016-06-06 20:42:31,127 ERROR [Thread-15]-vi.ViJavaServiceInstanceProviderImpl: Failed To Create ViJava ServiceInstance owing to Remote VCenter connection error

So I was not using any custom certificates here; everything was on the default VMware certificates on my 5.5 vCenter.

    The resolution:

1. Restart all VDP services using the below commands:

    dpnctl stop all
    dpnctl start all

    2. When I logged into https://VDP-IP:8543/vdp-configure page I noticed the following:


I have an external proxy here, and it looked like the proxy VM was down. I restarted the proxy VM, yet the services and proxy status were not updated, which is when I proceeded to step 3.

    3. Re-register your VDP appliance to vCenter. You can follow this link here to get this task done.

Once the re-register was done (with the same VDP user or a different one, and without any changes to the vCenter port or IP), the appliance was able to connect to the Web Client successfully.

Interestingly enough, I was unable to find the root cause of this, although I suspect that the connectivity between vCenter and VDP was broken. If you have something more detailed, do comment below. Always looking for VDP deep-dives.

    That's pretty much it!

    Configure Syslog For VDP Appliance


    Written by Suhas Savkoor




    So you want to forward VDP logs to a syslog server? Okay. Is it supported officially? No, VDP does not support forwarding its logs to a different machine or a log solution. Is a workaround available? Yes.

The VDP appliance is based on a SUSE Linux box, and its syslog-ng configuration can be used to set up remote syslog for the machine.

    In my lab, I have a VDP 6.1.2 appliance which is configured to a 6.0 U2 vCenter Server. The syslog collector solution I have here is VMware vRealize Log Insight. 

    So how do we achieve this?

1. William Lam has written a script which can be used to configure syslog for various VMware products. We will be using that script to configure syslog for the VDP server. Download the script from this link here.

2. Once this shell script is downloaded, copy it to the VDP appliance (for example, the root user's home directory or /tmp). I used WinSCP to move the file to the appliance.

    3. Run the script using the below format:
    # ./configurevCloudSuiteSyslog.sh vdp <Remote-Syslog-IP>

4. Once the script has run, browse to the following directory to view the syslog-ng configuration file:
# cd /etc/syslog-ng
# less syslog-ng.conf

    5. The end of the file will have these lines appended:

    # Configured using vCloud Suite Syslog Configuration Script by William Lam
    source vdp {
           file("/space/avamar/var/log/av_boot.rb.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/space/avamar/var/log/dpnctl.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/space/avamar/var/log/dpnnetutil-av_boot.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/usr/local/avamar/var/log/dpnctl.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/usr/local/avamar/var/log/av_boot.rb.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/usr/local/avamar/var/log/av_boot.rb.err.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/usr/local/avamar/var/log/dpnnetutil-av_boot.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/usr/local/avamar/var/avi/server_log/flush.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/usr/local/avamar/var/avi/server_log/avinstaller.log.0" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/usr/local/avamar/var/vdr/server_logs/vdr-server.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/usr/local/avamar/var/vdr/server_logs/vdr-configure.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/usr/local/avamar/var/flr/server_logs/flr-server.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/data01/cur/err.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/usr/local/avamarclient/bin/logs/VmMgr.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/usr/local/avamarclient/bin/logs/MountMgr.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/usr/local/avamarclient/bin/logs/VmwareFlrWs.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
           file("/usr/local/avamarclient/bin/logs/VmwareFlr.log" log_prefix("vdp: ") follow_freq(1) flags(no-parse));
    };

    # Remote Syslog Host
    destination remote_syslog {
          udp("192.168.10.6" port (514));

    };

    log {
            source(vdp);
            destination(remote_syslog);
    };

The protocol used here is UDP on port 514, and 192.168.10.6 is my Log Insight server IP; this will of course vary for you.
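
If the forwarding does not start right away (for example, if you appended these lines by hand instead of running the script), reloading syslog-ng should pick the changes up; on the SLES-based appliance that is typically:

# service syslog restart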

6. Over in Log Insight, under the Hosts section, you can see my VDP appliance added:


7. When I click vdpnew (the hostname of my VDP appliance), I can see the logs coming in:



    Well, that's pretty much it. Just a workaround, not officially supported though. 

    VDP Backup Fails: Error In Creating Quiesce Snapshot


    Written by Suhas Savkoor




So there are many causes for failing to quiesce a VM while taking a snapshot, and this is one of them that I came across while working on an issue.

A brief summary of what was happening here:
A virtual machine was being backed up by VDP, and the backup would fail whenever VDP initiated a quiesced snapshot on it.

    The backup job logs were showing the below:

    2016-06-10T14:00:47.451+04:00 avvcbimage Info <0000>: vm-2412
    2016-06-10T14:00:47.469+04:00 avvcbimage Info <0000>: Snapshot 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e' creation for VM '[XYZ] abc/abc.vmx' task still in progress, sleep for 3 sec
    2016-06-10T14:00:50.524+04:00 avvcbimage Info <0000>: Snapshot 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e' creation for VM '[XYZ] abc/abc.vmx' task still in progress, sleep for 3 sec
    2016-06-10T14:00:53.576+04:00 avvcbimage Warning <19733>: vSphere Task failed (quiesce, snapshot error=45): 'An error occurred while saving the snapshot: Failed to quiesce the virtual machine.'.
    2016-06-10T14:00:53.576+04:00 avvcbimage Error <17775>: Snapshot 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e' creation for VM '[XYZ] abc/abc.vmx' task creation encountered a quiesce problem
    2016-06-10T14:00:53.576+04:00 avvcbimage Warning <0000>: The VM could not be quiesced prior to snapshot creation and this backup will not be used as a base for subsequent CBT backups if successful.

So we tried a couple of basic troubleshooting steps, such as uninstalling and reinstalling VMware Tools and moving the virtual machine to a different ESXi host and a different datastore. Everything produced the same result: Failed to quiesce the virtual machine. However, if I took a quiesced snapshot of the virtual machine manually from vCenter, it completed successfully. Only the quiesce initiated by VDP failed.

The next step was to check the VSS writer status on the virtual machine. From an elevated command prompt, I ran the below command:
    vssadmin list writers

All the VSS writers were in a healthy state without any errors. Then I proceeded to check the Event Logs for VSS issues, and there were none.

Next, the vmware.log for this virtual machine displayed the following:

    2016-06-13T18:01:55.531Z| vmx| I120: SnapshotVMX_TakeSnapshot start: 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e', deviceState=0, lazy=0, logging
    =0, quiesced=1, forceNative=0, tryNative=1, sibling=0 saveAllocMaps=0 cb=B986C80, cbData=323CE050
    2016-06-13T18:01:55.739Z| vcpu-2| I120: ToolsBackup: changing quiesce state: IDLE -> STARTED
    2016-06-13T18:01:57.770Z| vcpu-0| I120: Msg_Post: Warning
    2016-06-13T18:01:57.770Z| vcpu-0| I120: [msg.snapshot.quiesce.vmerr] The guest OS has reported an error during quiescing.
    2016-06-13T18:01:57.770Z| vcpu-0| I120+ The error code was: 5
    2016-06-13T18:01:57.770Z| vcpu-0| I120+ The error message was: 'VssSyncStart' operation failed: IDispatch error #8451 (0x80042303)
    2016-06-13T18:01:57.770Z| vcpu-0| I120: ----------------------------------------
    2016-06-13T18:01:57.775Z| vcpu-0| I120: ToolsBackup: changing quiesce state: STARTED -> ERROR_WAIT
    2016-06-13T18:01:59.797Z| vcpu-0| I120: ToolsBackup: changing quiesce state: ERROR_WAIT -> IDLE
    2016-06-13T18:01:59.797Z| vcpu-0| I120: ToolsBackup: changing quiesce state: IDLE -> DONE
    2016-06-13T18:01:59.797Z| vcpu-0| I120: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'VDP-1465581646a5cf9db1234fe3558e090fec0f29a7eaa4807b2e': 0
    2016-06-13T18:01:59.797Z| vcpu-0| I120: SnapshotVMXTakeSnapshotComplete: Snapshot 0 failed: Failed to quiesce the virtual machine (40).
    2016-06-13T18:02:08.245Z| vmx| I120: SnapshotVMX_Consolidate: Starting
    2016-06-13T18:02:08.245Z| vmx| I120: SnapshotVMXConsolidateOnlineCB: nextState = 0 uid 0


So from vmware.log I could see that the guest was unable to quiesce and that the 'VssSyncStart' operation failed, which led to the failure of the snapshot creation.

Something was going on at the VSS end. The VSS writers were good, so I proceeded to check the VSS providers:
vssadmin list providers

This listed two providers, each with its own UUID. However, in the registry (under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\VSS\Providers), there were three UUID entries. See the image below:


As you can see, the entry beginning with 7669 did not match any provider UUID from the vssadmin output. So I took a backup of the registry and deleted this stale entry, after which quiesced snapshots from VDP were successful.

Hope this helps someone out there facing this issue.

    Perform A VDP Backup When The Datastore For Client VM Is Running On Low Space


    Written by Suhas Savkoor




If you try to back up a virtual machine residing on a datastore that is running low on space, the backup fails. VDP tries to issue a snapshot creation call, but the task is never initiated for either Create Snapshot or VDP: Backup. Instead you see the following in the Reports tab:

    VDP: Failed to create snapshot

    The backup job log located at /usr/local/avamarclient/var will have the following logging:

    2016-06-30T09:59:11.371+07:00 avvcbimage Info <19704>: DataStore Storage Info:Local-esxi02 capacity=4831838208     free=138412032
    2016-06-30T09:59:11.372+07:00 avvcbimage Info <19716>: DS Capacity=4831838208     FreeSpace=138412032      / HD committed=3951319082     unCommitted=1680           unShared=3758096384
    2016-06-30T09:59:11.372+07:00 avvcbimage Info <19717>: DS(Local-esxi02) does not have enough free space (138412032     ) for disks used (197565952).
    2016-06-30T09:59:11.372+07:00 avvcbimage Error <19661>: Datastore does not have enough free space for snapshot
    2016-06-30T09:59:11.372+07:00 avvcbimage Info <9772>: Starting graceful (staged) termination, failed to create snapshot (wrap-up stage)
    2016-06-30T09:59:11.372+07:00 avvcbimage Error <0000>: [IMG0009] createSnapshot: snapshot creation  or pre/post snapshot script failed
    2016-06-30T09:59:11.372+07:00 avvcbimage Error <0000>: [IMG0009] createSnapshot: snapshot creation/pre-script/post-script failed
    2016-06-30T09:59:11.372+07:00 avvcbimage Info <40654>: isExitOK()=202
    2016-06-30T09:59:11.372+07:00 avvcbimage Info <40659>: snapshot created:false NOMC:false ChangeBlTrackingAvail:true UsingChBl:true, ExitOK:false, cancelled:false, fatal: true
    2016-06-30T09:59:11.372+07:00 avvcbimage Info <40654>: isExitOK()=202
    2016-06-30T09:59:11.372+07:00 avvcbimage Info <40660>: vcbimage_progress::terminate
    2016-06-30T09:59:11.373+07:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_EndAccess: Disk access completed.

Now, there is a parameter that can be added for the avvcbimage process to ignore the free space on the datastore and still proceed with the backup operation. This workaround is tricky. Why? Consider the following situation:

You have a virtual machine of about 500 GB on a 550 GB datastore, so the free space on the datastore is 50 GB. Say we add the parameter to ignore this and take a backup anyway, and new data written to the VM during the backup fills up that 50 GB of free space. The VM will simply stop functioning because there is no space left for the new data, and at that point you will end up expanding the datastore anyway just so the VM can be powered back on.

This is why it is always recommended to expand the datastore first and then perform the backup. That said, there are cases where that 500 GB VM is, say, a file server with no new data coming in; in that case we are good to add this parameter and run the backup.

    How do we do this?

1. SSH into your VDP appliance and change to the following directory:
# cd /usr/local/avamarclient/var
2. You will find a file called avvcbimageAll.cmd.
3. Open this file with a vi editor and add the following parameter (see the sketch after this list):
--snapshot_max_change_percent=0
4. Restart the avagent daemon using the below command:
service avagent-vmware restart
5. Now you should be able to run the backup job even when your datastore is running low on space.
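
Put together, the change looks something like this (appending with echo instead of opening vi is just a shortcut):

# cd /usr/local/avamarclient/var
# echo "--snapshot_max_change_percent=0" >> avvcbimageAll.cmd
# service avagent-vmware restart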

Note:
If you are using the internal proxy, this change is made on the VDP appliance itself, as avvcbimage runs on the appliance. If you are using an external proxy, edit the avvcbimageAll.cmd file on the proxy machine, as the proxy VM is responsible for opening/closing the VMDKs and runs the avvcbimage process. If you have multiple proxy VMs, add this parameter to all of your external proxies and restart the avagent service on each of them.

    This is supported on 6.1 VDP as well. 

    Automatic/Manual Backup Verification Fails In VDP


    Written by Suhas Savkoor




To check the consistency of your restore points, you have backup verification jobs. These can be either Automatic Backup Verification (ABV) or manual verification jobs. At a high level, the backup verification flow goes as follows:

    >> Restore: Restores the restore point as a temporary VM on the ESXi host and datastore which is defined on the backup verification job
    >> Power On: Powers On the VM.
    >> Heartbeat Verification: Verifies the heartbeat for the restored virtual machine
    >> Power Off: Powers Off the VM once the verification is done
    >> Delete VM: Remove the temporary restored VM from the inventory and delete from disk.

The issue I am going to discuss here is not a general one; it was caused by something very specific. However, the troubleshooting steps can still be used, and you might have similar causes that make your verification jobs fail.

All the verification job logs are present under the following directory:
/usr/local/avamarclient/var/

The verification job log that I had was named something like:
xyz-backup-verify-1467724890971-c2857d179f4b9e67465bf496709d8bc1f43149ef-1016-vmimage1.log

I named the job xyz because I have a VM named xyz; hence the temporary restored VM has a name similar to VDP_VERIFICATION_xyz.

So in the verification job, the initial logging refers to the following:

>> Which vCenter this VM is going to be restored to for verification
>> The ESXi host
>> The VMFS/NFS datastore

These entries appear at the start of the verification log and look something like this:


    2016-07-05T09:21:32.956+04:00 avvcbimage Info <16010>: vCenter 'ABC.vcloud.local' is 192.168.1.1


    2016-07-05T09:21:32.956+04:00 avvcbimage Info <11981>: VM's host is ESXi.vcloud.local


    2016-07-05T09:21:32.956+04:00 avvcbimage Info <11982>: VM's primary storage location is [Local-DS-1] VDP_VERIFICATION_xyz_1467724891189/VDP_VERIFICATION_xyz_1467724891189.vmx

Now, to the error. The very end of the verification log shows the final moments of the process, and in my case, the errors:
    2016-07-05T09:35:00.128+04:00 avvcbimage Info <19670>: vmAction runRemote()

    2016-07-05T09:35:00.177+04:00 avvcbimage Info <19672>: vmAction powerOnVM()

    2016-07-05T09:35:00.187+04:00 avvcbimage Info <17789>: Modifying VmxNet3 adapter: Network adapter 1 to not Connect at Power On

    2016-07-05T09:35:00.213+04:00 avvcbimage Info <0000>: VM '[Local-DS-1] VDP_VERIFICATION_xyz_1467724891189/VDP_VERIFICATION_xyz_1467724891189.vmx' setNics config at PowerOn task still in progress, sleep for 3 sec

    2016-07-05T09:35:03.243+04:00 avvcbimage Info <14632>: VM '[Local-DS-1] VDP_VERIFICATION_xyz_1467724891189/VDP_VERIFICATION_xyz_1467724891189.vmx' setNics config at PowerOn task completed, moref=

    2016-07-05T09:35:03.266+04:00 avvcbimage Info <14629>: VM '[Local-DS-1] VDP_VERIFICATION_xyz_1467724891189/VDP_VERIFICATION_xyz_1467724891189.vmx' Power On task queued, sleep for 1 sec

    2016-07-05T09:35:04.287+04:00 avvcbimage Error <16006>: vSphere Task failed: 'The operation is not allowed in the current state.'.

    2016-07-05T09:35:04.287+04:00 avvcbimage Error <14635>: VM '[Local-DS-1] VDP_VERIFICATION_xyz_1467724891189/VDP_VERIFICATION_xyz_1467724891189.vmx' Power On task creation encountered a problem

    2016-07-05T09:35:04.287+04:00 avvcbimage Warning <19673>: PowerOnVM failed or cancelled

    2016-07-05T09:35:04.287+04:00 avvcbimage Info <19684>: vmAction cleanupVM() DeletingVM=0

    2016-07-05T09:35:04.287+04:00 avvcbimage Info <19685>: vmAction poweroffVM()

    2016-07-05T09:35:04.311+04:00 avvcbimage Info <0000>: VM '[Local-DS-1] VDP_VERIFICATION_xyz_1467724891189/VDP_VERIFICATION_xyz_1467724891189.vmx' Power Off task still in progress, sleep for 3 sec

    2016-07-05T09:35:07.345+04:00 avvcbimage Error <16006>: vSphere Task failed: 'The attempted operation cannot be performed in the current state (Powered off).'.

    2016-07-05T09:35:07.345+04:00 avvcbimage Error <14635>: VM '[Local-DS-1] VDP_VERIFICATION_xyz_1467724891189/VDP_VERIFICATION_xyz_1467724891189.vmx' Power Off task creation encountered a problem

    2016-07-05T09:35:07.345+04:00 avvcbimage Info <19686>: vmAction deleteVM()

    2016-07-05T09:35:07.387+04:00 avvcbimage Info <0000>: VM '[Local-DS-1] VDP_VERIFICATION_xyz_1467724891189/VDP_VERIFICATION_xyz_1467724891189.vmx' deletion task still in progress, sleep for 3 sec

    2016-07-05T09:35:10.416+04:00 avvcbimage Info <14632>: VM '[Local-DS-1] VDP_VERIFICATION_xyz_1467724891189/VDP_VERIFICATION_xyz_1467724891189.vmx' deletion task completed, moref=

    2016-07-05T09:35:10.416+04:00 avvcbimage Info <9772>: Starting graceful (staged) termination, ABV failed (wrap-up stage)

    2016-07-05T09:35:10.416+04:00 avvcbimage Error <19702>: ABV failed

    2016-07-05T09:35:10.419+04:00 avvcbimage Info <16038>: Final summary, cancelled/aborted 0, snapview 0, exitcode 170: completed with errors, client log should be examined

So here, the restore itself completed successfully, and the network adapter was disconnected as usual (this is always done for the verification VM to avoid an IP conflict).
Then several attempts were made to power on the virtual machine, and all of them failed. Since the power on never completed, the power off failed as well.
The heartbeat verification step was skipped since the virtual machine was never powered on, which led to the final state: delete the VM, which completed successfully.

That's pretty much it in the verification logs. This was not sufficient to find a cause, which led me to the next couple of tests:

1. For this verification job, I changed the destination host and datastore; basically, I did the restore to a different host and a different datastore, and it went through successfully. So something was wrong with either the host or the datastore.

2. I then changed the datastore back to the old one while keeping the new host, and the verification job completed successfully again. When I edited the job back to the old host, it failed with the same error.

So something is going on with this host, and we need to troubleshoot at the host level.

    From the vobd.log during this time, I saw the following:
    2016-07-06T14:01:49.787Z: [UserWorldCorrelator] 3011315291947us: [vob.uw.core.dumped] /bin/hostd(2038251) /var/core/hostd-zdump.003

    2016-07-06T14:04:26.406Z: [UserWorldCorrelator] 3011471909126us: [vob.uw.core.dumped] /bin/hostd(2043706) /var/core/hostd-zdump.000

    2016-07-06T14:08:35.096Z: [UserWorldCorrelator] 3011720596785us: [vob.uw.core.dumped] /bin/hostd(2099166) /var/core/hostd-zdump.001

    2016-07-06T14:11:12.313Z: [UserWorldCorrelator] 3011877811665us: [vob.uw.core.dumped] /bin/hostd(2849623) /var/core/hostd-zdump.002

    2016-07-06T14:13:48.795Z: [UserWorldCorrelator] 3012034293301us: [vob.uw.core.dumped] /bin/hostd(2040079) /var/core/hostd-zdump.003
So here the hostd daemon on the host had crashed and zdumps were being created. The hostd and hostd-worker threads were in a crashed state, and hence I rebooted the host.

After the reboot, the hostd and worker threads were no longer in an inconsistent state, allowing me to run verification tasks on this host without any issues.
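
If rebooting the affected host right away is not an option, restarting the management agents on the ESXi host is a less disruptive first step you can try; these are the standard ESXi init scripts, not anything specific to this case:

# /etc/init.d/hostd restart
# /etc/init.d/vpxa restart
# ls -lh /var/core/
The last command simply lets you check whether new hostd-zdump files keep appearing afterwards.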

    There can be multiple causes for backup verification failure. Well this is one of them!

    Web Client Login Page Displays vRA or vCAC As The Banner Name

So in 6.0, you should see "VMware vCenter Single Sign On" as the banner on the Web Client login page. However, if you apply a branding name on the vRA appliance, the Web Client banner gets renamed. In some cases, the change occurs even when branding is not selected. This is a bug in vRA and is discussed in this link here.

Now, I am not a vRA guy; that is left to the handful of them in my organization. However, the case I worked on was something similar:

A customer had a 5.5 vCenter with vCAC installed. The vCAC was discarded and the vCenter was upgraded to 6.0. When he logged in to the Web Client, instead of the Single Sign On banner it displayed vCloud Automation Center. This was indeed confusing and needed a fix.

Upon installation of vRA/vCAC, the attribute vmwSTSBrandName gets populated with the banner information for vRA. Upon removal of the product, this attribute is not cleared, which leads to the issue. It has to be removed from vmdir for all the tenants present under the Identity Manager.

    Before you perform the workaround, please have a snapshot and/or a backup of the vCenter machine.

1. Download JXplorer by clicking this link here.
2. Log in to the PSC machine from JXplorer using this link here.
3. Expand Services > Identity Manager > Tenants.
4. Click the tenant and switch to the Table Editor view.
5. In the Table Editor view, locate the field called vmwSTSBrandName. This will be populated with a value, as displayed in the below screenshot:


6. Right-click this attribute and select Delete.
7. Click Submit.
8. If this is a Windows server, go to services.msc and restart the VMware Security Token Service. If it asks to restart the dependent services, click OK.
9. If it is an appliance, restart the STS identity management service using the below command:
# service vmware-sts-idmd restart
10. Reload the Web Client page. Once the vmwSTSBrandName attribute is empty, it will display VMware vCenter Single Sign On by default.

    Creating A Local User And Granting Shell Access In ESXi 6.0


    Written by Suhas Savkoor




So, from ESXi 6.0 onward, if you log in to ESXi directly from the vSphere client, you do not have the option to specify shell access when creating a local user. The screen that you will see when creating a new user is:


If you create this user and try to log in over SSH (PuTTY), you get a message saying Access denied.
The access.conf file should be updated automatically once the user is created; since it is not, perhaps due to security enhancements, a little tweaking needs to be done.

Note: Please test this in your lab before you implement it in production. All the steps here were implemented in a non-production environment.

    What you need to do is:

1. Create the user locally from the above wizard.
2. Log in to that ESXi host over SSH (as root).
3. Change the directory to:
# cd /etc/security
4. You will find a file called access.conf (back up the file before editing). Open this file with the vi editor:
# vi access.conf

    The contents look like below:


5. You need to add your user here in the format:
+:<username>:ALL
6. Save the file.
7. Restart your SSH session.
8. Now you can log in to your ESXi host with the local user. (A short shell sketch of these steps follows.)
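
The shell side of this, put together, would look something like the following (testuser is an example username; restarting the SSH service is optional, reconnecting your session is usually enough):

# cp /etc/security/access.conf /etc/security/access.conf.bak
# vi /etc/security/access.conf
(append a line such as +:testuser:ALL)
# /etc/init.d/SSH restart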

This user has shell access but not root privileges. If I run a command to list the details of the devices connected to this host, it displays the following:


    Well that's pretty much it.


    VDP Automatic Backup Verification Fails With VMware Tools Heartbeat Timeout

    Written by Suhas Savkoor


So recently I was working on a colleague's case where backup verification, both manual and automatic, was failing with a VMware Tools timeout. There were multiple restore points being verified here, and only one of them was failing. The VDP admin guide says the following about verification:

    "Set the heartbeat timeout interval to its optimal value, depending on the environment. Note that some VMs may take longer to send and receive the VMware Tools heartbeat than others"


On the verification job logs, the following was observed. The ">>" lines mark my explanation of each event:


    2016-07-14T21:52:10.541-02:00 avvcbimage Info <19672>: vmAction powerOnVM()
    >> The VM power On function is called

    2016-07-14T21:52:10.549-02:00 avvcbimage Info <17787>: Modifying E1000 adapter: Network adapter 1 to not Connect at Power On
    >> The network adapter is disconnected on power on to avoid IP conflict 

    2016-07-14T21:52:17.665-02:00 avvcbimage Info <14632>: VM '[virtual_machines1] VDP_VERIFICATION_camserver_1468522800276/VDP_VERIFICATION_camserver_1468522800276.vmx' Power On task completed, moref=
    >> VM is powered ON 

    2016-07-14T21:52:17.665-02:00 avvcbimage Info <19674>: vmAction waitforToolRunning()
    >> Waiting for heartbeat verification

    2016-07-14T21:57:22.177-02:00 avvcbimage Warning <19675>: wait for toolrunning is timed out or cancelled
    >> Verification timed out

    2016-07-14T21:57:22.178-02:00 avvcbimage Info <19685>: vmAction poweroffVM()
    >> power off VM function is called

    2016-07-14T21:57:25.225-02:00 avvcbimage Info <14632>: VM '[virtual_machines1] VDP_VERIFICATION_camserver_1468522800276/VDP_VERIFICATION_camserver_1468522800276.vmx' Power Off task completed, moref=
    >> power off completed

    2016-07-14T21:57:25.225-02:00 avvcbimage Info <19686>: vmAction deleteVM()
    >> delete VM function is called

    2016-07-14T21:57:28.295-02:00 avvcbimage Info <14632>: VM '[virtual_machines1] VDP_VERIFICATION_camserver_1468522800276/VDP_VERIFICATION_camserver_1468522800276.vmx' deletion task completed, moref=
    >> VM is deleted successfully

    2016-07-14T21:57:28.311-02:00 avvcbimage Error <19702>: ABV failed
    >> Automatic backup verification failed. 

So, there is a parameter that we need to add to the avvcbimageAll.cmd file to increase the heartbeat wait timeout for the appliance.

    The steps would be:

1. Open an SSH session to the VDP appliance.
2. Browse to the following directory:
# cd /usr/local/avamarclient/var
3. Open the avvcbimageAll.cmd file in a vi editor and add the following parameter:
--validate_script_max_in_min=20
4. Save the file and restart the avagent daemon using the below command:
service avagent-vmware restart

After this, the VDP appliance will wait up to 20 minutes for the VMware Tools heartbeat to be received before timing out.
If the tools validation completes before 20 minutes, it proceeds to the next steps immediately, so setting the timeout to a large value has no performance impact on the VMs that are not hitting the tools timeout during verification.

    Simple, yes!

    VDP Backup Fails: PAX stream append at sector offset 128 length 128 sectors=65536 failed (32)


    Written by Suhas Savkoor




So I have run into this issue a couple of times now and was finally able to drill down to a solution, so I thought of sharing it, as there is nothing about this issue anywhere on the web.

In the few scenarios where I came across this issue, sometimes expanding a VMDK caused the backup to start failing, sometimes the backup of a VM with a disk greater than 2 TB failed, and sometimes the backups failed continuously for no apparent reason.

This failure occurs prominently when:

1) The VMDK is larger than 2 TB.
2) The VMDK size is not an integral multiple of 1 MB.
3) The VMDK has changes in its last extent.

Let's see what this means a little further down.

To start off, when you open the backup job log to look at the failure, you will see the following:

    Location:
    /usr/local/avamarclient/var

    2016-07-25T20:50:48.091+04:00 avvcbimage Info <14700>: submitting pax container in-use block extents:
    2016-07-25T20:50:48.221+04:00 avvcbimage Info <6688>: Process 9604 (/usr/local/avamarclient/bin/avtar) finished (code 176: fatal signal)
    2016-07-25T20:50:48.221+04:00 avvcbimage Error <0000>: [IMG0010] PAX stream append ([Store_1] test/test.vmdk) at sector offset 128 length 128 sectors=65536 failed (32)
    2016-07-25T20:50:48.221+04:00 avvcbimage Warning <6690>: CTL workorder "Test-backup-1469491200006" non-zero exit status 'code 176: fatal signal'
    2016-07-25T20:50:48.221+04:00 avvcbimage Info <9772>: Starting graceful (staged) termination, PAX stream append failed (wrap-up stage)
    2016-07-25T20:50:48.221+04:00 avvcbimage Info <40654>: isExitOK()=157 
    2016-07-25T20:50:48.221+04:00 avvcbimage Info <16022>: Cancel detected(miscellaneous error), isExitOK(0).
    2016-07-25T20:50:48.221+04:00 avvcbimage Info <9746>: bytes submitted was 65536
    2016-07-25T20:50:48.221+04:00 avvcbimage Error <0000>: [IMG0010] pax_container::endfile(virtdisk-flat.vmdk,65536) returned problem:32
    2016-07-25T20:50:48.221+04:00 avvcbimage Error <0000>: [IMG0010] pax_container::enddir returned problem:32
    2016-07-25T20:50:48.222+04:00 avvcbimage Info <40654>: isExitOK()=157 
    2016-07-25T20:50:48.222+04:00 avvcbimage Info <16022>: Cancel detected(miscellaneous error), isExitOK(0).
    2016-07-25T20:50:48.222+04:00 avvcbimage Info <12032>: backup (Store_1] test/test.vmdk) terminated 3145728.00 MB
    2016-07-25T20:50:48.222+04:00 avvcbimage Info <40654>: isExitOK()=157 
    2016-07-25T20:50:48.222+04:00 avvcbimage Error <0000>: [IMG1003] Backup of [Store_1] test/test.vmdk failed
    2016-07-25T20:50:48.222+04:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Close: Close disk.

    The avtar.log for this backup job in the same directory has the following back trace.

    2016-08-08T20:42:17.180+04:00 avtar FATAL <5889>: Fatal signal 11 in pid 28769
    > 2016-08-08T20:42:17.186+04:00 [avtar]  FATAL ERROR: <0001> uapp::handlefatal: Fatal signal 11
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000b3abe1
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000b3b957
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000b3bc7b
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000b3bd6e
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000a86530
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00007f9388482850
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000ac7364
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006762ef
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000678363
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006c8db8
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006cc683
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006cd083
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000009a5b76
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000009ad0ed
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000612bce
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006c1492
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006c36b6
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000004cfcfb
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000004d6d4e
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000004d82e0
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000a88129
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000004ea627
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 00007f93864b1c36
    > 2016-08-08T20:42:17.186+04:00 [avtar]  | 000000000048a029
    > 2016-08-08T20:42:17.186+04:00 [avtar]  ERROR: <0001> uapp::handlefatal: aborting program pid=28769, sig=11
    > 2016-08-08T20:42:17.186+04:00 avtar FATAL <5890>: handlefatal: Aborting program with code 176, pid=28769, sig=11

So I had one disk on this VM which had been expanded beyond 2 TB, and when converted into MB it was 3397386.24 MB (3.24 TB). In plain words, the size in MB is fractional, not a whole number, and hence not an integral multiple of 1 MB.
When VDP sees the disk as a fractional value and beyond 2 TB, the backup fails. This is a known issue in the avtar version running on VDP.

To find your current avtar version you can run the following:

> This command from the root user of the VDP SSH session:
status.dpn

> These two commands from admin mode only of the VDP SSH session:
gsan --version
mcserver.sh --version

    A VDP 6.1 has an avTar version of 7.2 with a varying minor build. 

    How do we resolve this? 

Step 1: A quick way to get a backup so that everything on this VM is saved. Not recommended as a permanent solution.

Disable CBT on the VM, so that incremental backups are disabled and every backup of this VM is a full backup. If you do not have a recent restore point and badly need one, you can implement this. If you already have a good known restore point, this is not required, as it unnecessarily consumes space. To disable CBT you can follow this VMware KB article here.

Step 2: Recommended. Make sure a good backup is available before doing this.

Here, you will have to locate the sizes of all the disks of that VM and convert them to MB. If the converted value in MB is a whole number, that disk is good to go.

If not, as in my case where the drive was 3397386.24 MB, you will have to round the drive up to the next whole number, here 3397387 MB. You can do this from the Edit Settings dialog of the virtual machine. There is no need to expand the drive from within the guest; this is done at the VMDK level only, so that VDP sees the drive as an integral multiple of 1 MB.
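
A quick way to check a disk, if you prefer the ESXi shell to doing the math by hand, is to take the flat VMDK's size in bytes and look at the remainder when divided by 1 MiB; this is just a generic sketch, with the path as an example:

# ls -l /vmfs/volumes/Store_1/test/test-flat.vmdk
# echo $(( <size-in-bytes-from-ls> % 1048576 ))
A non-zero remainder means the disk is not a whole number of MB and should be rounded up as described above.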

    Post this, you can have incremental backups running. 

This is acknowledged as a known issue, and the fix will be in a future release of VDP. The release version and ETA are currently unknown.

    If you have questions before implementation, let me know.


    Thanks!

    VDP Backup Fails: Failed To Download VM Metadata


    Written by Suhas Savkoor




    A virtual machine backup was continuously failing with the error: VDP: Miscellaneous error.

When you look at the backup job log at the following location, you see:
    # cd /usr/local/avamarclient/var

    2016-08-18T08:24:59.099+07:00 avvcbimage Warning <40722>: The VM 'Win2012R2/' could not be located on the datastore:[FreeNAS-iSCSI] Win2012R2/Win2012R2.vmx
    2016-08-18T08:24:59.099+07:00 avvcbimage Warning <40649>: DataStore/File Metadata download failed.
    2016-08-18T08:24:59.099+07:00 avvcbimage Warning <0000>: [IMG0016] The datastore from VMX '[FreeNAS-iSCSI] Win2012R2/Win2012R2.vmx' could not be fully inspected.
    2016-08-18T08:24:59.128+07:00 avvcbimage Info <0000>: initial attempt at downloading VM file failed, now trying by re-creating the HREF, (CURL) /folder/Win2012R2%2fWin2012R2.vmx?dcPath=%252fDatacenter&dsName=FreeNAS-iSCSI, (LEGACY) /fold
    er/Win2012R2%2FWin2012R2.vmx?dcPath=%252FDatacenter&dsName=FreeNAS%252DiSCSI.

    2016-08-18T08:25:29.460+07:00 avvcbimage Info <40756>: Attempting to Download file:/folder/Win2012R2%2FWin2012R2.vmx?dcPath=%252FDatacenter&dsName=FreeNAS%252DiSCSI
    2016-08-18T08:25:29.515+07:00 avvcbimage Warning <15995>: HTTP fault detected, Problem with returning page from HTTP, Msg:'SOAP 1.1 fault: SOAP-ENV:Server [no subcode]
    "HTTP Error"
    Detail: HTTP/1.1 404 Not Found

    2016-08-18T08:25:59.645+07:00 avvcbimage Warning <40650>: Download VM .vmx file failed.
    2016-08-18T08:25:59.645+07:00 avvcbimage Error <17821>: failed to download vm metadata, try later
    2016-08-18T08:25:59.645+07:00 avvcbimage Info <9772>: Starting graceful (staged) termination, failed to download vm metadata (wrap-up stage)
    2016-08-18T08:25:59.645+07:00 avvcbimage Error <0000>: [IMG0009] createSnapshot: snapshot creation  or pre/post snapshot script failed
    2016-08-18T08:25:59.645+07:00 avvcbimage Error <0000>: [IMG0009] createSnapshot: snapshot creation/pre-script/post-script failed
    2016-08-18T08:25:59.645+07:00 avvcbimage Info <40654>: isExitOK()=157

Here we see that the backup fails because VDP is unable to gather the .vmx file details of the client it is trying to back up, and it is unable to gather the .vmx details because it cannot locate the client's files on the datastore.

    This issue occurs when the virtual machine files or the folder itself are manually moved to a different folder. 

    If you look at the below screenshot, you see my virtual machine is under a folder called "do not power on"


Now, if you have a look at the actual vmx path of this Windows2012R2 machine, you see it is different.


From this, my virtual machine's files were initially under FreeNAS-iSCSI / VM folder / VM files, and were then moved into the "do not power on" folder. If you just move the virtual machine's files like this, the registered vmx path is not updated. During the backup, VDP looks at this registered path and fails because the .vmx file is no longer in that location.

To fix this, remove the virtual machine from the inventory and re-add it. Doing so updates the registered vmx location, as seen below. Once the correct location is populated, you can run backups for this virtual machine again.


To register the VM back in the inventory, you can follow this KB article here.
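
If you prefer doing this from the ESXi shell rather than the vSphere client, the standard vim-cmd calls achieve the same thing (the datastore path below is a placeholder; power the VM off before unregistering it):

# vim-cmd vmsvc/getallvms
# vim-cmd vmsvc/unregister <Vmid>
# vim-cmd solo/registervm /vmfs/volumes/<datastore>/<folder>/<VM-name>.vmx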

    That's pretty much it!

    Installation Of SQL Agent For VDP Fails.


    Written by Suhas Savkoor




    When trying to install the SQL agent for vSphere Data Protection the agent installation fails with the error:

    "VMware VDP for SQL Server Setup Wizard ended prematurely"


Here, under <Installation directory>:\Program Files, there will be a directory called avp that gets created. This folder contains the logs with the details of the failure. However, when the agent installation fails and rolls back, the folder is removed automatically, so the logs are lost. To capture the logs, we have to redirect the MSI installer logging to a different location.

To do this, launch the installer with msiexec and verbose logging redirected to a file of your choice (the paths below are examples):
    msiexec /i "C:\MyPackage\Example.msi" /L*V "C:\log\example.log"

The /i parameter launches the MSI package and /L*V writes a verbose log; after the installation attempt finishes, the log is complete.

Once the installation fails again, the above log file will contain the logging for the cause of the failure.
In my case, the following was noticed:

    Property(S): UITEXT_SUPPORT_DROPPED = This release of the VMware VDP client software does not support the operating system version on this computer. Installing the client on this computer is not recommended.

    The "EMC Avamar Compatibility and Interoperability Matrix" on EMC Online Support at https://support.EMC.com provides client and operating system compatibility information.

So the failure here is due to an unsupported guest OS. The SQL version was 2014, which is well supported; the problem was that this SQL instance was running on a Windows 2012 R2 box.

From the VDP admin guide (page 162), we see that Windows 2012 R2 is an unsupported guest OS for this client:

    From the EMC Avamar Compatibility matrix (Page 38) we see this again is not supported. 

So, if you see such issues, please verify that the agent being installed is compatible with the guest OS release.

    That's pretty much it!

    Expired Certificate On VDP Port 29000


    Written by Suhas Savkoor




So recently I was working on a case where the certificate had expired on port 29000 of a VDP 6.1.2 appliance. Port 29000 is used for replication data, as can be seen on page 148 of the admin guide here. The same admin guide, on page 46, talks about replacing the certificate for the appliance; however, that applies to the web management page on port 8543, which is your vdp-configure page.

If you go to " https://vdp-IP:29000 ", you will still see the certificate as expired, as below:


How do we get this certificate replaced? Well, here are the steps. We are using self-signed certificates here, and the replacement is also done with a new self-signed certificate:

1. Log in to the VDP appliance as admin.
2. Change to the admin home directory:
# cd ~
3. Use the below command to generate a new self-signed certificate:
# openssl req -x509 -newkey rsa:3072 -keyform PEM -keyout avamar-1key.pem -nodes -outform PEM -out avamar-1cert.pem -subj "/C=US/ST=California/L=Irvine/O=Dell EMC, Inc./OU=Avamar/CN=vdp-hostname.customersite.com" -days 3654
Replace vdp-hostname.customersite.com with your VDP appliance hostname.
The "days" parameter can be altered to set the certificate validity as per your requirement.

    The above command generates a SHA1 certificate. If you would like to generate a SHA256 certificate, then the command would be:
    # openssl req -x509 -sha256 -newkey rsa:3072 -keyform PEM -keyout avamar-1key.pem -nodes -outform PEM -out avamar-1cert.pem -subj "/C=US/ST=California/L=Irvine/O=Dell EMC, Inc./OU=Avamar/CN=vdp-hostname.customersite.com" -days 3654
    4. The key and certificate will be written to the /home/admin directory and called "avamar-1key.pem" and "avamar-1cert.pem", respectively.

    5. Follow Page 38 of this Avamar security Guide to perform the replace operation of certificates. 

6. After restarting the gsan service as per the security guide, go to " https://vdp-IP:29000 " and verify that the certificate has been renewed.
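
Besides checking in the browser, you can also verify the certificate that port 29000 now presents from any machine with openssl (replace the hostname with your appliance's):

# echo | openssl s_client -connect vdp-hostname.customersite.com:29000 2>/dev/null | openssl x509 -noout -subject -dates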

    That's pretty much it. Should be helpful if you receive warnings during security scans against VDP.
