Tuesday 17 February 2015

Test recovery with EMC RecoverPoint SRA and SRM 5.0 fails with the error: Failed to recover the datastore

Symptoms :
=========
Attempting to perform a Test Failover fails with the error:

Error - Failed to recover datastore 'XX_HDXX_LUNXX_T4'. VMFS volume residing on recovered devices "5431", "5432", "5433" and expected to be auto-mounted during HBA rescan cannot be found

Started the replication via SRM; the LUN was mounted successfully. Checked and found that the problem was with EMC replication. The dr-vmware.log file showed the error:

Failed to recover datastore 'VNX-MP-LUN07-TIER3'. VMFS volume residing on recovered devices '"60:06:01:60:07:E0:22:00:20:DD:9E:42:CF:0F:E1:11"' and expected to be auto-mounted during HBA rescan cannot be found

Logs :
=======
2012-01-06T23:06:54.263Z cpu14:114970)WARNING: HBX: 1889: Failed to initialize VMFS3 distributed locking on volume 4ef4e35b-1442b43c-038c-0025b5000c19: No
2012-01-06T23:06:54.271Z cpu14:114970)FSS: 4333: No FS driver claimed device 'snap-50d5bf4f-4ef4e359-709ff89d-a083-0025b5000c19': Not supported
2012-01-06T23:06:54.274Z cpu14:114970)VC: 1449: Device rescan time 38 msec (total number of devices 12)
2012-01-06T23:06:54.274Z cpu14:114970)VC: 1452: Filesystem probe time 64 msec (devices probed 9 of 12)
2012-01-06T23:06:54.292Z cpu11:2059)ScsiDeviceIO: 2316: Cmd(0x4124415c3540) 0x9e, CmdSN 0xcbc2 to dev "naa.50060160bce010a750060160bce010a7" failed H:0x0
2012-01-06T23:06:54.292Z cpu11:2059)ScsiDeviceIO: 2316: Cmd(0x4124415c3540) 0x25, CmdSN 0xcbc3 to dev "naa.50060160bce010a750060160bce010a7" failed H:0x0
2012-01-06T23:06:54.294Z cpu11:2059)ScsiDeviceIO: 2316: Cmd(0x4124415c3540) 0x28, CmdSN 0xcbcb to dev "naa.6006016007e02200608991b2ad2de111" failed H:0x0
2012-01-06T23:06:54.294Z cpu14:114970)Partition: 484: Read of GPT header failed on "naa.6006016007e02200608991b2ad2de111": I/O error
2012-01-06T23:06:54.294Z cpu11:2059)ScsiDeviceIO: 2316: Cmd(0x4124415c3540) 0x28, CmdSN 0xcbcc to dev "naa.6006016007e02200608991b2ad2de111" failed H:0x0
2012-01-06T23:06:54.294Z cpu14:114970)WARNING: Partition: 944: Partition table read from device naa.6006016007e02200608991b2ad2de111 failed: I/O error
2012-01-06T23:06:54.345Z cpu14:114970)HBX: 676: Setting pulse [HB state abcdef02 offset 3764224 gen 1 stampUS 177206183150 uuid 4f04ca84-1f05f6a1-c310-d8d
2012-01-06T23:06:54.345Z cpu14:114970)WARNING: FSAts: 1263: Denying reservation access on an ATS-only vol 'VNX-HP-LUN01-TIER2'
2012-01-06T23:06:54.345Z cpu14:114970)WARNING: HBX: 1889: Failed to initialize VMFS3 distributed locking on volume 4ef4e35b-1442b43c-038c-0025b5000c19: No
2012-01-06T23:06:54.352Z cpu14:114970)FSS: 4333: No FS driver claimed device 'snap-50d5bf4f-4ef4e359-709ff89d-a083-0025b5000c19': Not supported
2012-01-06T23:06:54.376Z cpu14:114970)Vol3: 647: Couldn't read volume header from control: Invalid handle
2012-01-06T23:06:54.376Z cpu14:114970)FSS: 4333: No FS driver claimed device 'control': Not supported
2012-01-06T23:06:54.396Z cpu14:114970)VC: 1449: Device rescan time 41 msec (total number of devices 12)

Resolution :
============
Our initial workaround was to disable the single VAAI primitive VMFS3.HardwareAcceleratedLocking on the hosts. On EMC's advice, we then changed the affected hosts from Failover Mode 1 to Failover Mode 4 (within the Unisphere Storage System Connectivity Status menu). Once the hosts were changed to Failover Mode 4, we re-enabled the VAAI primitive and no longer encountered the issue.
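
The workaround above (turning the ATS locking primitive off and later back on) can be sketched as a small helper that only builds the esxcli command line for an ESXi 5.x host. This is a minimal sketch, not the exact procedure we ran: the function name is hypothetical, and the command still has to be run on each affected host (for example over SSH).

```python
from __future__ import annotations


def build_ats_locking_command(enable: bool) -> list[str]:
    """Build the esxcli argv that turns hardware-accelerated locking (ATS) on or off.

    The helper name is hypothetical; it only assembles the command and does not run it.
    """
    return [
        "esxcli", "system", "settings", "advanced", "set",
        "--int-value", "1" if enable else "0",
        "--option", "/VMFS3/HardwareAcceleratedLocking",
    ]


if __name__ == "__main__":
    # Workaround step: disable the primitive.
    print(" ".join(build_ats_locking_command(enable=False)))
    # After moving the hosts to Failover Mode 4: re-enable it.
    print(" ".join(build_ats_locking_command(enable=True)))
```

Keeping the command as a list makes it easy to hand to whatever remote-execution tool is in use without worrying about shell quoting.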

Unable to power on/off the VM Error : msg.checkpoint.cpufeaturecheck.fail

We have often run into situations where a virtual machine cannot be powered on or off. Below is one troubleshooting case in which the detailed VMware logs helped us identify and resolve the problem.

Symptoms :
==========
  • Unable to power on/off the virtual machine
  • Unable to revert to the previous snapshot
  • Unable to unregister the virtual machine and register it on a different host
Purpose :
=========
  • To power on the virtual machine
Cause :
=======
The vmware.log file showed:
vmx| [msg.checkpoint.cpufeaturecheck.fail] The features supported by the processor(s) in this machine are different from the features supported by the processor(s) in the machine on which the checkpoint was saved. Please try to resume the snapshot on a machine where the processors have the same features
Resolution :
============
Commented out the checkpoint entries in the vmx file and was successfully able to power on the virtual machine.

vMotion gets stuck at In progress status & unable to connect to the VirtualCenter service + Tomcat service utilizing high memory

In this case we were facing issues performing a vMotion of a VM. In the course of troubleshooting we found that, although the initial problem was the database size, there was an underlying issue with the Tomcat service which led to the vMotion problem.

Symptoms / Troubleshooting Performed :
==============================
    Unable to connect to the Virtual Center / unable to perform vMotion.
    Tried to truncate the database, following KB 1003980.
    Tried to stop the Virtual Center service: unable to stop the service.
    Tried to reboot the Virtual Center server: successful.
    Unable to start the Virtual Center service.
    Checked the size of the database: 50 GB.
    Found that Microsoft SQL had exceeded its maximum limit.
    Followed KB articles 1007453 and 1000125: got 525 GB of free space.
    Was successfully able to connect to the Virtual Center.

Purpose :
=======
     Was unable to perform vMotion.
     Checked the Task Manager and found that the Tomcat service was utilizing high memory.

Cause :
======
    The Tomcat service was utilizing high memory.

 Resolution :
=========
     Increased the memory of the Tomcat service from 256 MB to 1024 MB and was successfully able to perform vMotion.

Unable to ping the virtual machine: "Could not find the file specified" while starting the IPSEC services

Symptoms :
==========
  • Unable to ping the virtual machine from another VM in the same subnet.
  • Unable to ping the default gateway.
Purpose :
==========
  • To be able to connect to the network.
  • To be able to connect to the domain network.
Cause :
========
  • Checked the Event Viewer logs and found that we were getting error messages related to the IPSEC service.
  • Checked services.msc and found that the IPSEC service was not started.
  • Error: "Could not find the file specified" while starting the IPSEC services.
Resolution :
===========
  • To resolve this issue, followed these steps:
  • Rebuild a new local policy store. To do this, Click Start, click Run, type regsvr32 polstore.dll in the Open box, and then click OK.
  • Verify that the IPSEC Services component is set to automatic, and then restart the domain controller.
  • Restarted the Virtual machine and was able to connect to the network.
More Information : 
===================
Followed this Knowledge Base article: http://support.microsoft.com/kb/912023

Error "failed: Unable to create a VSS snapshot of the source volume. Error code 2147754774 (0x80042316)"

Symptoms :
==========
  • Error "failed: Unable to create a VSS snapshot of the source volume. Error code 2147754774(0x80042316)"
  • Converter logs show the following error: "a general system error occurred.Found dangling SSL error"
  • Conversion fails at 1%
Purpose :
=========
  •     To be able to convert from Physical Machine to Virtual Machine.
  •     To be able to convert from Virtual Machine to Virtual Machine.
Resolution :
============
  • Placed the computer in a clean boot state and disabled the firewall, then tried the conversion again: same issue.
  • Tried to perform cold cloning by attaching the virtual machine CD drive to the standalone Converter: error "no network driver found".
  • Checked and found that the problem was related to the VSS service.
  • Restarted the VSS service: same issue.
  • Tried to perform a backup through the Net backup utility: unable to take a backup.
  • Found the following Knowledge Base article on the Microsoft website to fix VSS: http://support.microsoft.com/kb/940184
  • Installed and ran the Fix it, and was successfully able to start and complete the conversion process.

Duplicate IP error on a VM Unable to connect to Network

Symptoms :
=========
  •     Unable to connect the Virtual machine to the Network
Purpose :
=======
  •     To get the Virtual machine to the network.
Cause :
======
  •     Only Windows 2008 machines were getting the duplicate IP errors.
  •     Tried resetting TCP/IP stack.
  •     Tried providing different static IP : same problem
Resolution :
=========
When this problem occurs, the ProxyArp device responds to all ARP requests.
To work around this problem, we can turn off gratuitous ARP by setting the value of the ARPRetryCount registry entry to 0. To do this, follow these steps.

1.  Click Start, type regedit in the Start Search box, and then press ENTER.
2.  Locate the following registry key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
3.  On the Edit menu, point to New, and then click DWORD Value.
4.  Type ArpRetryCount.
5.  Right-click the ArpRetryCount registry entry, and then click Modify.
6.  In the Value data box, type 0, and then click OK.
7.  Exit Registry Editor.
8.  Reboot the machine.
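
The registry steps above can also be expressed as a single reg add command. The sketch below only builds that command line; the helper name is hypothetical, and the command would still need to be run from an elevated prompt on the affected Windows 2008 machine, followed by a reboot.

```python
from __future__ import annotations

# Same key as in step 2, written with the HKLM abbreviation that reg.exe accepts.
TCPIP_PARAMETERS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def build_arp_retry_command(retry_count: int = 0) -> list[str]:
    """Build the reg.exe argv that sets ArpRetryCount (hypothetical helper, does not run it)."""
    if retry_count < 0:
        raise ValueError("retry_count must not be negative")
    return [
        "reg", "add", TCPIP_PARAMETERS_KEY,
        "/v", "ArpRetryCount",      # value name from step 4
        "/t", "REG_DWORD",          # DWORD value, as in step 3
        "/d", str(retry_count),     # 0 turns off gratuitous ARP
        "/f",                       # overwrite without prompting
    ]


if __name__ == "__main__":
    print(" ".join(build_arp_retry_command()))
```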

See below link :
http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2networking/thread/d7bda315-6366-4e0a-bdcf-dc875ff6963e

Conversion fails with the error: failed: Unable to create a VSS snapshot of the source volume

Symptoms :
========
  •     Cannot perform Physical to Virtual or Virtual to Virtual conversion.
  •     Conversion fails at 1%
  •     You see the error:
Error :
=====
  •  Unable to create a VSS snapshot of the source volume. Error code 2147754774 (0x80042316)
Resolution :
========
  • This issue may occur when there are problems with Volume Shadow Copy Service (VSS) on the Windows operating system.
  • To resolve this issue, isolate the issue using the NT Backup Utility. For more information, see Using the Windows NT Backup Utility.
  • If the NT Backup utility fails to resolve this issue and you continue to see VSS errors, see the Microsoft Knowledge Base Article 940184.

VirtualCenter Server service does not start after performing security hardening

Symptoms :
========
You cannot start the VirtualCenter Server service if you disable the managed object browser by setting this element in the vpxd.cfg file:

<enableDebugBrowse>false<enableDebugBrowse/>

Note: Page 83 of the vSphere 4.1 Hardening Guide provides instructions to disable the managed object browser.

Cause :
======
This issue occurs because the element as written in the vSphere 4.1 Hardening Guide is malformed XML.

The vpxd.cfg file becomes invalid because the slash (/) is placed at the end of the closing tag instead of at the beginning: <enableDebugBrowse>false<enableDebugBrowse/>.

Resolution :
=========

  • To resolve this issue:   
  • Change the parameter to:  <enableDebugBrowse>false</enableDebugBrowse>
  • Restart the VirtualCenter Server Service.
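
Before restarting the service, it can help to confirm that the edited file is well-formed XML. The sketch below shows one way to do that with Python's standard library; the surrounding <config> wrapper in the two sample strings is an assumption for illustration, not a copy of a real vpxd.cfg.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical minimal fragments; only the enableDebugBrowse element comes from the text above.
BROKEN = "<config><enableDebugBrowse>false<enableDebugBrowse/></config>"
FIXED = "<config><enableDebugBrowse>false</enableDebugBrowse></config>"

if __name__ == "__main__":
    print("broken element well-formed:", is_well_formed(BROKEN))
    print("fixed element well-formed:", is_well_formed(FIXED))
```

To check the real file, read its contents and pass them to is_well_formed before restarting the service.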

Unable to open web console for vCloud Director externally


Symptoms :
=========
  •     Unable to open the vCloud web console from any other computer in the same network.
Purpose :
=======
  •     To be able to open the vCloud web console externally.
Cause :
======
  •     Stopped the services for the iptables to isolate the problem.
  •     After stopping the IPtables we were successfully able to open the web console externally.
  •     IPtables had rules which restricted the connection.
Resolution :
=========
  •     iptables rules are stored in /etc/sysconfig/iptables.
  •     After commenting out the reject rules, we flushed iptables to get the default rules.
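
As an illustration of the edit to /etc/sysconfig/iptables, the sketch below comments out every appended rule that jumps to REJECT. It is a rough sketch under assumptions: the function name and the sample rules are hypothetical, and disabling all REJECT rules opens the firewall more than is usually wanted, so in practice only the rule blocking the console port should be changed.

```python
def comment_out_reject_rules(rules_text: str) -> str:
    """Prefix every '-A ... -j REJECT' rule with '# ' and leave all other lines unchanged."""
    result = []
    for line in rules_text.splitlines():
        stripped = line.strip()
        # Only touch active append rules; already-commented lines start with '#'.
        if stripped.startswith("-A ") and "-j REJECT" in stripped:
            result.append("# " + line)
        else:
            result.append(line)
    return "\n".join(result) + "\n"


# Hypothetical sample in the iptables-save format used by /etc/sysconfig/iptables.
SAMPLE_RULES = """*filter
:INPUT ACCEPT [0:0]
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
"""

if __name__ == "__main__":
    print(comment_out_reject_rules(SAMPLE_RULES))
```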

Cannot ping or RDP a Windows 7 VM with Wake on LAN enabled

Symptoms :
=========

When Wake on LAN (WoL) is enabled on a Windows 7 guest operating system, you may experience these symptoms:
  • You cannot ping the virtual machine
  • Remote Desktop Protocol (RDP) connections to the virtual machine fail
Cause :
======
  • This issue occurs because Windows 7 guest operating systems do not wake on directed packets that are created using the ping command.
Resolution :
=========
  • To resolve this issue, disable WoL for the virtual network adapter in Windows 7.
  • For more information about WoL, see Power Management for Network Devices in Windows 7.



Configuring Microsoft Cluster Service fails with the Error: "Validate SCSI-3 Persistent Reservation"

Symptoms :
=========

When attempting to set up a Microsoft Cluster, you receive an error in the Failover Cluster Validation Report during the validation step.

You see the error:
  • One or more tests indicate that the configuration is not suitable for clustering.
  • In the Failover Cluster Validation Report, you see errors similar to:
  • Successfully put PR reserve on cluster disk 0 from node VIRTUAL_MACHINE_HOSTNAME.DOMAINNAME while it should have failed
  • Cluster Disk 0 does not support persistent reservations.

Cause :
======
This issue can be caused by one of these situations:
  • If you are attempting to set up a Cluster in a Single Host, and the shared disk (or disks) for the cluster is RDM (Raw Device Mapping, or Raw Disk Mapping), in Physical Compatibility Mode. If this is the case, you can either change the disks to Virtual Compatibility mode, or if you wish to use Physical Compatibility Mode, the cluster nodes must be on different hosts.
  • If the storage array is not configured correctly to support SCSI-3 compliant commands.
  • A third-party plug-in is interfering with the SCSI reservation process (such as the PowerPath/VE PSA).
  • If virtual machines are running under the MSCS configuration on two separate ESX/ESXi hosts, and the .vmdk files are on a VMFS datastore which is configured as physical bus sharing mode. (This configuration of CAB is unsupported). For more information, see the Cluster Virtual Machines on One Physical Host section of the Setup for Failover Clustering and Microsoft Cluster Service Guide.
Resolution :
=========
If you are attempting to create a cluster which is not one of these:
  • Cluster on a Single Host with one or more shared eagerzeroedthick virtual disks
  • Cluster on a Single Host with one or more shared RDMs in Virtual Compatibility mode
  • Cluster across Hosts with one or more shared RDMs in Physical Compatibility mode
Then you are not configuring an MSCS cluster with a supported configuration. For more/related information, see Microsoft Clustering on VMware vSphere: Guidelines for Supported Configurations (1037959) and Microsoft Cluster Service (MSCS) support on ESX/ESXi (1004617). It is critical that your configuration be set up consistent with the requirements specified in these articles.
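
The three supported combinations listed above can be written down as a small lookup, which is handy when reviewing several planned clusters. This is only a sketch of that list: the function name and the string labels are hypothetical, and the KB articles named above remain the authoritative source.

```python
# (placement, shared disk type) pairs taken from the three supported cases listed above.
SUPPORTED_MSCS_CONFIGS = {
    ("single_host", "eagerzeroedthick_vmdk"),
    ("single_host", "rdm_virtual"),
    ("across_hosts", "rdm_physical"),
}


def is_supported_mscs_config(placement: str, shared_disk: str) -> bool:
    """Return True if the combination matches one of the supported cases in the list."""
    return (placement, shared_disk) in SUPPORTED_MSCS_CONFIGS


if __name__ == "__main__":
    # Physical-mode RDM with both nodes on one host is the first cause described earlier.
    print(is_supported_mscs_config("single_host", "rdm_physical"))
    print(is_supported_mscs_config("across_hosts", "rdm_physical"))
```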

If you are having issues with one of the supported MSCS configurations, resolve the issue with these steps:


     1. Remove all RDMs from the Windows 2008 virtual servers:
  • Power off the virtual machine.
  • Right-click the virtual machine and click Edit Settings.
  • Take note of the RDM mapping.
  • Remove the RDM hard disk from the virtual machine and click OK.
     2. Contact the storage vendor and make sure that the array has been configured correctly to support SCSI-3 compliant commands.
  • Improvements in failover clusters require that the storage respond correctly to specific SCSI commands. The storage must follow the SPC-3 standard. In particular, the storage must support Persistent Reservations as specified in the SPC-3 standard.
     3. Add RDMs in physical compatibility (pass-through) mode or virtual compatibility (non-pass-through) mode. VMware recommends physical compatibility mode. For more information, see the Add Hard Disks to the First Node for Clusters Across Physical Hosts section of the Setup for Failover Clustering and Microsoft Cluster Service guide.

     4.  Ensure that the relevant affinity/anti-affinity rule is configured in the DRS cluster to prevent the virtual machines from powering on on an incorrect host.
Notes:
=====
  • Cluster in a box: use an affinity rule so that the virtual machines stay on the same host.
  • Cluster across boxes: use an anti-affinity rule so that the virtual machines stay on separate hosts.
  • When adding RDMs, select Physical as the compatibility mode and select a new virtual device node. For example: select SCSI (1:0).
  • If these steps do not resolve the issue, check for the existence of a third-party plug-in such as PowerPath/VE to determine if it is causing the issue. If you believe the issue may be related to PowerPath/VE, contact EMC for further assistance.
  • In vSphere 5.x, clusters across physical machines with non-pass-through RDMs are supported only for clustering with Windows Server 2003. They are not supported for clustering with Windows Server 2008.

Installing vCenter Server 4.1 on Windows 2008 R2 fails with the error: Error 1402. Could not open key: UNKNOWN...

Symptoms :
=========
  • Cannot install vCenter Server 4.1 on Windows 2008 R2
  • Installing vCenter Server 4.1 on Windows 2008 R2 fails
  • You see the error:
  • Error 1402. Could not open key: UNKNOWN..
Note: This issue may also occur with other VMware products, such as VMware Workstation.

Resolution :
========
  • This issue occurs due to insufficient or incorrect permissions on a certain registry key or its parent container.
Steps:
=====

To resolve this issue, apply appropriate permission to the Windows Registry.
  • Click Start > Run, type regedit, and click OK. The Registry Editor window opens.
  • Navigate to the parent container HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\UserData\S-1-5-18\Components.
  • Note: The S-1-5-18 key may be different on your computer.
  • Locate the correct container registry entry, right click the parent container registry key, such as C9AE13788D0B61F80AF18C3B9B1A1EE8, and click Permissions. Note, the relevant container registry entry causing the install issue can be identified in a fully detailed error message appended to the error outlined in the above Symptoms.
  • In the Permissions dialog, click Add and add the Administrator, Administrators and SYSTEM accounts.
  • Click each account in the list and ensure that it has Full Control = Allow and Read = Allow permissions.
  • Select the Replace all child object permissions with inheritable permissions from this object check box and click Apply.
  • In the Owner tab, click Administrators in the list, select the Replace owner on subcontainers and objects check box and click Apply.
  • Retry the vCenter Server 4.1 installation.

vMotion fails with the following error: "problem detected at CPUID level 0x80000001"

Symptoms :
=========
  • Migrating from, for example, esxhost1 to esxhost2, modifies the virtual machine’s configuration to preserve the CPU feature requirements for its guest operating system.
  • Performing a vMotion from one host to another within a cluster fails  
  • The vMotion check fails with this CPU mismatch 
  • Error: Unable to migrate from esxhost365x1 to esxhost365x2: The CPU of the host is incompatible with the CPU feature requirements of the virtual machine; problem detected at CPUID level 0x80000001 register 'edx'
  • In vCenter Server, you may see an error that indicates that a mismatch has been detected.

Cause :
======
  • The issue can occur on IBM x3650 systems. 
Resolution :
==========
  • To resolve this issue, upgrade the BIOS firmware on the host.

Friday 13 February 2015

Unable to mount the CD/DVD-ROM drive inside a Windows VM

Symptoms :
=========
  • You are unable to see a DVD/CD-ROM drive after mounting
  • You are unable to install VMware Tools because the drive is not mountable
  • Reinstalling the drivers for the DVD/CD-ROM on the virtual machine does not resolve the issue
Resolution :
=========
  • This issue occurs due to a Microsoft registry problem.  
  • To resolve this issue, you must delete the UpperFilters and LowerFilters registry keys of the CD-ROM drive from the registry. 
For more information, see the Microsoft Knowledge Base article 929461.

Notes:
======
This article contains steps to modify the registry. However, serious problems might occur if you modify the registry incorrectly. Therefore, make sure that you follow these steps carefully.
For additional safety, back up the registry before modifying it. For more information, see the Microsoft Knowledge Base article 322756.

SRM : Failed to add array manager for MirrorView SRA with error: Failed to connect to management system address while executing 'discoverArrays' command (2009074)

Symptoms :
==========
  • You cannot add an array manager when using MirrorView 1.4.0.16 SRA with vCenter Site Recovery Manager 4.0/4.1.
  • Attempting to configure the array manager results in the following error:
  • Failed to connect to management system address while executing 'discoverArrays' command
  • When checking the MirrorView SRA DR.log logfile the following errors can be seen:

Logs :
=====
[2011-11-04 14:33:01.730 02700 verbose 'SysCommandLineWin32'] Starting process: "C:\\Program Files (x86)\\VMware\\VMware vCenter Site Recovery Manager\\external\\perl-5.8.8\\bin\\perl.exe" "C:/Program Files (x86)/VMware/VMware vCenter Site Recovery Manager/scripts/SAN/MirrorView SRA/command.pl"
[2011-11-04 14:33:02.183 02700 trivia 'PrimarySanProvider'] discoverArrays's output:
MirrorViewSRACLI Start
[#2] MirrorViewSRACLI End
[#2] [001768][000796][11/04 14:33:01 995][MirrorViewSRACoreUtilities.cpp@001600 InitLogging                             ] XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
[#2] [001768][000796][11/04 14:33:01 995][MirrorViewSRACoreUtilities.cpp@001601 InitLogging                             ] MirrorView SRA Version: 1.4.0.16
[#2] [001768][000796][11/04 14:33:01 995][DiscoverArrays.cpp            @000213 DiscoverArrays                          ] Enter
[#2] [001768][000796][11/04 14:33:01 995][MirrorViewSRACoreUtilities.cpp@000007 InitStorAPI                             ] Enter1
[#2] [001768][000796][11/04 14:33:01 995][MirrorViewSRACoreUtilities.cpp@000044 InitStorAPI                             ] Trying database file: C:\Program Files (x86)\VMware\VMware vCenter Site Recovery Manager\scripts\SAN\MirrorView SRA\symapi_db.bin
[#2] [001768][000796][11/04 14:33:02 042][MirrorViewSRACoreUtilities.cpp@000053 InitStorAPI                             ] StorInit suceeded for database file: C:\Program Files (x86)\VMware\VMware vCenter Site Recovery Manager\scripts\SAN\MirrorView SRA\symapi_db.bin
[#2] [001768][000796][11/04 14:33:02 042][MirrorViewSRACoreUtilities.cpp@000127 InitStorAPI                             ] Exit. hr:0x0
[#2] [001768][000796][11/04 14:33:02 042][MirrorViewSRACoreUtilities.cpp@000622 StoreCredentials                        ] Enter
[#2] [001768][000796][11/04 14:33:02 120][MirrorViewSRACoreUtilities.cpp@000674 StoreCredentials                        ] Calling StorAccessCredentialsDefine Host: bne01-spa Username: emceng
[#2] [001768][000796][11/04 14:33:02 120][MirrorViewSRACoreUtilities.cpp@000682 StoreCredentials                        ] Invalid ip address 'bne01-spa' or username 'emceng' specified. HRESULT:0xC00407D5
[#2] [001768][000796][11/04 14:33:02 120][DiscoverArrays.cpp            @000236 DiscoverArrays                          ] Exit. hr:0xc00407d5
[2011-11-04 14:33:02.183 02700 info 'PrimarySanProvider'] discoverArrays exited with exit code 0
[2011-11-04 14:33:02.183 02700 trivia 'PrimarySanProvider'] 'discoverArrays' returned <Response>
[#2]   <ReturnCode>2</ReturnCode>
[#2]   <ArrayList />
[#2] </Response>
[2011-11-04 14:33:02.183 02700 info 'PrimarySanProvider'] Return code for discoverArrays: 2
[2011-11-04 14:33:02.183 02700 error 'PrimarySanProvider'] The scripts returned an error, leaving the temporary file 'C:\Windows\TEMP\vmware-SYSTEM-1279426608\dr-sanprovider2700-0'
[2011-11-04 14:33:02.183 02700 error 'PrimarySanProvider'] Invalid array management system address reported by the script
[2011-11-04 14:33:02.183 02700 info 'ArrayManagerImpl.QueryInfoTask-Task'] Work function threw MethodFault: dr.san.fault.InvalidAddressFault

Cause :
======

This issue occurs when the SRM server is unable to resolve the hostname or FQDN of the Storage Manager of the array.

Resolution :
=========

To resolve this issue, when adding the array manager, use the IP addresses of the array's Storage Managers instead of their hostnames or FQDNs.
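
A quick way to confirm the cause is to test name resolution from the SRM server itself. The sketch below is one possible check, not part of the SRA: the function name is hypothetical, and the resolver is passed in as a parameter so the logic can be exercised without touching real DNS.

```python
import socket
from typing import Callable


def can_resolve(hostname: str,
                resolver: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if the resolver can turn hostname into an address, False otherwise."""
    try:
        resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


if __name__ == "__main__":
    # 'bne01-spa' is the Storage Manager host name that appears in the log above.
    name = "bne01-spa"
    if can_resolve(name):
        print(name, "resolves from this machine")
    else:
        print(name, "does not resolve; use the IP address in the array manager")
```

The check is only meaningful when run on the SRM server, since that is the machine on which the SRA performs the lookup.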

VMware SRM installation fails with error: Visual Studio C++ Redistributable could not be found

Symptoms:
=========
  • You are unable to install VMware Site Recovery Manager.
  • Installation of VMware Site Recovery Manager fails
  • You see this error: "Visual Studio C++ Redistributable could not be found".
Cause :
=========

This error appears when the installation wizard is not able to detect the Visual Studio C++ Redistributable application on the Windows machine.

Resolution :
==========

To resolve the issue, install the Visual C++ redistributable from VMware vCenter:
  1. Download VMware vCenter Server from the VMware Download Center, selecting the version and build that match your existing environment.
  2. Explore the Setup folder. Go to redist > vcredist and open the folder for the version of your Operating System.
  3. Double-click the .exe file to install the Visual C++ Redistributable.
After the Visual C++ Redistributable is installed, you can install VMware Site Recovery Manager.