HPE SimpliVity

fahlis
Frequent Advisor

New Deployment of OmniStack 3.7.10 stalls at step 9 of 34

Hi!
I started a new deployment for a customer today.
It is a 2-node, 10Gb switch-connected configuration.
vCenter has been upgraded to 6.7 U2c.
The nodes came with OmniStack 3.7.9, so of course I upgraded to 3.7.10 before deploying.
SPP 20190903 is installed on both nodes.
All Deployment Manager test steps passed.
Deployment of the first node stalls with the message "ERROR:step 9 of 34 - Failed to register host in datacenter".
Before that error, it sat at 62% "Running: Wait for Hypervisor to boot" for a long time.
I closed it and tried to deploy the second node, which stalled with exactly the same error.
I should mention that I added the new SVT cluster to the existing datacenter and disabled admission control for the current cluster, as stated in the "Post-Deployment-Checklist".
Any solutions at hand?

Br / Tony
gustenar
HPE Pro

Hello @fahlis 

Did you capture the deployment manager logs? Could you post the output of orchestrator.log a little before the error and after? 
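
If the log is long, a small script can pull out just the lines around each error. The following is only a sketch: it assumes the timestamp and level layout seen in typical orchestrator.log lines (`YYYY-MM-DD HH:MM:SS,mmmZ LEVEL ...`), and the function name `error_context` and its defaults are made up for illustration.

```python
import re
import sys
from collections import deque

# Matches the level field in lines such as:
# 2019-11-26 15:09:53,508Z ERROR pool-2-thread-1 [...] ...
LEVEL_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}Z (\w+) ")


def error_context(lines, before=5, after=5, levels=("ERROR",)):
    """Return lines near each entry whose level is in `levels`.

    Keeps up to `before` lines ahead of each hit and `after` lines
    following it. Overlapping windows are merged, so no line is
    emitted twice. Stack-trace lines count as ordinary lines.
    """
    out = []
    recent = deque(maxlen=before)  # rolling buffer of lines not yet emitted
    remaining = 0                  # lines still to emit after the last hit
    for line in lines:
        line = line.rstrip("\n")
        m = LEVEL_RE.match(line)
        if m and m.group(1) in levels:
            out.extend(recent)
            recent.clear()
            out.append(line)
            remaining = after
        elif remaining > 0:
            out.append(line)
            remaining -= 1
        else:
            recent.append(line)
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python error_context.py orchestrator.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print("\n".join(error_context(fh)))
```

Increase `before` and `after` if the excerpt does not show enough of what led up to the failure.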

Is vCenter able to resolve the DNS name of this host properly? After you factory reset and have the IP assigned make sure you can reach it from vCenter by IP and DNS.


I am an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
DaveOb
HPE Pro

Can you manually add the ESXi host to vCenter by either DNS name or hostname? This may give you a clue as to the problem.

Can you ping the ESXi host? At this point the hypervisor should be booted and reachable.

If you connect to the virtual console via iLO, has ESXi booted successfully?



I am an HPE employee
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
fahlis
Frequent Advisor

@gustenar  and @DaveOb 

Thanks for your suggestions.
I found out that the DNS server configured on the Deployment Manager workstation differs from the one used by vCenter and the ESXi hosts. Could this be the reason?

I looked briefly through the orchestrator logs and noted this:

2019-11-26 14:53:22,845Z INFO main [c.s.d.o.Orchestrator] examineResult(Orchestrator.java:1163) - Done with Standard Input File ready to deploy 1 OmniCubes
2019-11-26 14:53:22,850Z INFO main [c.s.d.o.Orchestrator] decryptPasswords(Orchestrator.java:531) - decrypting one or more passwords
2019-11-26 14:53:22,856Z DEBUG main [c.s.d.o.Orchestrator] decryptCredential(Orchestrator.java:417) - Decrypting password for hostCredential
2019-11-26 14:53:23,383Z DEBUG main [c.s.d.o.Orchestrator] decrypt(DeployCrypt.java:96) - Failed to decrypt a password - exception javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
2019-11-26 14:53:23,385Z DEBUG main [c.s.d.o.Orchestrator] decryptCredential(Orchestrator.java:425) - Failed to decrypt hostCredential
2019-11-26 14:53:23,386Z DEBUG main [c.s.d.o.Orchestrator] decryptCredential(Orchestrator.java:417) - Decrypting password for hostCredential
2019-11-26 14:53:23,387Z DEBUG main [c.s.d.o.Orchestrator] decrypt(DeployCrypt.java:96) - Failed to decrypt a password - exception javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
2019-11-26 14:53:23,388Z DEBUG main [c.s.d.o.Orchestrator] decryptCredential(Orchestrator.java:425) - Failed to decrypt hostCredential
2019-11-26 14:53:23,388Z DEBUG main [c.s.d.o.Orchestrator] decryptCredential(Orchestrator.java:417) - Decrypting password for hostCredential
2019-11-26 14:53:23,389Z DEBUG main [c.s.d.o.Orchestrator] decrypt(DeployCrypt.java:96) - Failed to decrypt a password - exception javax.crypto.BadPaddingException: Given final block not properly padded. Such issues can arise if a bad key is used during decryption.
2019-11-26 14:53:23,389Z DEBUG main [c.s.d.o.Orchestrator] decryptCredential(Orchestrator.java:425) - Failed to decrypt hostCredential
2019-11-26 14:53:23,389Z DEBUG main [c.s.d.o.Orchestrator] decryptCredential(Orchestrator.java:417) - Decrypting password for hostCredential
2019-11-26 14:53:23,393Z INFO main [c.s.d.o.Orchestrator] decryptCredential(Orchestrator.java:421) - Password for hostCredential decrypted
2019-11-26 14:53:23,397Z DEBUG main [c.s.d.o.Orchestrator] decryptCredential(Orchestrator.java:417) - Decrypting password for svtcliCredential
2019-11-26 14:53:23,398Z INFO main [c.s.d.o.Orchestrator] decryptCredential(Orchestrator.java:421) - Password for svtcliCredential decrypted
2019-11-26 14:53:23,399Z DEBUG main [c.s.d.o.Orchestrator] decryptPasswords(Orchestrator.java:602) - Decrypted all passwords
2019-11-26 14:53:23,399Z INFO main [c.s.d.o.Orchestrator] examineResult(Orchestrator.java:1163) - Done with Standard Input File ready to deploy 1 OmniCubes
2019-11-26 14:53:23,503Z INFO main [c.s.h.f.SessionFactory] getSystemTypeFromSvtConfig(SessionFactory.java:524) - Unable to determine hypervisor type based on config file.

Maybe I need to use a simpler host password during deployment?

2019-11-26 15:09:48,544Z INFO OmniStack-172.19.2.15 [c.s.d.o.Orchestrator] info(DeployLogger.java:214) - step 9 of 34 (70.56%) - Find or create a datacenter
2019-11-26 15:09:48,589Z DEBUG OmniStack-172.19.2.15 [c.s.h.h.v.n.s.VMwareHostnameVerifier] verify(VMwareHostnameVerifier.java:84) - Verifying host '172.21.0.11'
2019-11-26 15:09:48,601Z DEBUG OmniStack-172.19.2.15 [c.s.h.h.v.n.s.VMwareHostnameVerifier] verify(VMwareHostnameVerifier.java:102) - Verifying host '172.21.0.11' - peer cert Subject DN 'C=US, CN=vCenter.company.com'
2019-11-26 15:09:48,601Z DEBUG OmniStack-172.19.2.15 [c.s.h.h.v.n.s.VMwareHostnameVerifier] verify(VMwareHostnameVerifier.java:106) - Retrieved cert SubjectAltNames: [vCenter.company.com]
2019-11-26 15:09:48,602Z DEBUG OmniStack-172.19.2.15 [c.s.h.h.v.n.s.VMwareHostnameVerifier] matchIpAddressByDnsName(VMwareHostnameVerifier.java:223) - Found address '172.21.0.11' via DNS lookup of subjectAltName 'vCenter.company.com/172.21.0.11'
2019-11-26 15:09:48,603Z DEBUG OmniStack-172.19.2.15 [c.s.h.h.v.n.s.VMwareHostnameVerifier] matchIpAddressByDnsName(VMwareHostnameVerifier.java:227) - SubjectAltName 'vCenter.company.com' matches host '172.21.0.11'
2019-11-26 15:09:49,080Z INFO OmniStack-172.19.2.15 [c.s.d.o.Orchestrator] info(DeployLogger.java:214) - step 9 of 34 (70.58%) - Successful obtained finger print from Host - took 0.418 seconds
2019-11-26 15:09:49,139Z INFO OmniStack-172.19.2.15 [c.s.h.h.v.VMWareSessionImpl] disconnect(VMWareSessionImpl.java:572) - Disconnecting from 'https://172.19.2.15/sdk' ...
2019-11-26 15:09:49,146Z INFO OmniStack-172.19.2.15 [c.s.h.h.v.VMWareSessionImpl] disconnect(VMWareSessionImpl.java:574) - Disconnected
2019-11-26 15:09:49,150Z DEBUG OmniStack-172.19.2.15 [c.s.d.o.Orchestrator] debug(DeployLogger.java:224) - step 9 of 34 (70.58%) - Controller ID: c11bc2b1-94be-47ca-b2bf-1c13f509053b
2019-11-26 15:09:49,150Z INFO OmniStack-172.19.2.15 [c.s.d.o.Orchestrator] info(DeployLogger.java:214) - step 9 of 34 (70.58%) - Creating a cluster in datacenter - SVT_Clu01
2019-11-26 15:09:49,216Z INFO pool-2-thread-1 [c.s.h.HVALTask] watchdogTask(HVALTask.java:301) - Scheduling watchdog to fire in 1800s
2019-11-26 15:09:53,508Z ERROR pool-2-thread-1 [c.s.h.h.v.t.c.ControllerAddHostToCluster] addHostToCluster(ControllerAddHostToCluster.java:177) - HVALTaskAbortException occurred while adding host 172.19.2.15 to cluster SVT_Clu01 on datacenter CompanyDC
com.simplivity.hval.exceptions.HVALTaskAbortException: Failed to add host to cluster: A general system error occurred: Unable to push CA certificates and CRLs to host 172.19.2.15
at com.simplivity.hval.hms.vmware.tasks.controller.ControllerAddHostToCluster.addHostToCluster(ControllerAddHostToCluster.java:156)
at com.simplivity.hval.hms.vmware.tasks.controller.ControllerAddHostToCluster.performTask(ControllerAddHostToCluster.java:117)
at com.simplivity.hval.HVALTask.call(HVALTask.java:171)
at com.simplivity.hval.HVALTask.call(HVALTask.java:70)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
2019-11-26 15:09:53,537Z WARN OmniStack-172.19.2.15 [c.s.d.o.Orchestrator] warn(DeployLogger.java:136) - step 9 of 34 (70.75%) - Warning: Failed to add host - Host name: 172.19.2.15, Datacenter: CompanyDC, Cluster: SVT_Clu01
2019-11-26 15:09:53,538Z ERROR OmniStack-172.19.2.15 [c.s.d.o.Orchestrator] error(DeployLogger.java:98) - step 9 of 34 (70.75%) - Failed to register host in datacenter - Datacenter: CompanyDC Cluster: SVT_Clu01
2019-11-26 15:09:53,539Z ERROR OmniStack-172.19.2.15 [c.s.d.o.Orchestrator] rememberLowestLogLevel(Orchestrator.java:240) - Setting lower log level
2019-11-26 15:09:53,540Z INFO OmniStack-172.19.2.15 [c.s.d.o.Orchestrator] info(DeployLogger.java:180) - step 9 of 34 (100.00%) - Getting host by ip: 172.19.2.15
2019-11-26 15:09:53,548Z DEBUG OmniStack-172.19.2.15 [c.s.h.h.v.VMWareHostImpl] checkObject(VMWareUtil.java:234) - com.vmware.vim25.mo.HostSystem object is null or empty.
2019-11-26 15:09:53,554Z ERROR OmniStack-172.19.2.15 [c.s.d.o.Orchestrator] error(DeployLogger.java:98) - step 9 of 34 (100.00%) - An exception occurred registering the host in the datacenter
com.simplivity.hval.exceptions.HVALUnknownHostException: Exception found while retrieving Host object
at com.simplivity.hval.hms.vmware.VMWareHostImpl.fetchHostObject(VMWareHostImpl.java:665)
at com.simplivity.hval.hms.vmware.VMWareHostImpl.checkObject(VMWareHostImpl.java:5123)
at com.simplivity.deploy.orchestrator.DeployActions.getHvalHost(DeployActions.java:199)
at com.simplivity.deploy.orchestrator.DeployActions.addHostToHms(DeployActions.java:1739)
at com.simplivity.deploy.orchestrator.CustomerDeploy.runSteps(CustomerDeploy.java:122)
at com.simplivity.deploy.orchestrator.DeployExecutor.run(DeployExecutor.java:496)
at java.lang.Thread.run(Unknown Source)
Caused by: com.simplivity.hval.exceptions.HVALHmsObjectNotFoundException: com.vmware.vim25.mo.HostSystem object is null or empty.
at com.simplivity.hval.hms.vmware.VMWareUtil.checkObject(VMWareUtil.java:235)
at com.simplivity.hval.hms.vmware.VMWareUtil.checkObject(VMWareUtil.java:254)
at com.simplivity.hval.hms.vmware.VMWareHostImpl.fetchHostObject(VMWareHostImpl.java:654)
... 6 common frames omitted
2019-11-26 15:09:53,987Z INFO main [c.s.d.o.Orchestrator] deployAndLog(Orchestrator.java:1117) - Deployed 0 OmniCubes
2019-11-26 15:09:53,989Z DEBUG main [c.s.d.o.Orchestrator] waitForThriftTerminationToBeServiced(Orchestrator.java:692) - waiting for thrift server thread
2019-11-26 15:09:54,046Z DEBUG pool-1-thread-1 [c.s.d.o.Orchestrator] terminate(OrchestratorServiceHandler.java:116) - terminate
2019-11-26 15:09:54,116Z DEBUG main [c.s.d.o.Orchestrator] auditThriftInterfacePollingResultForThriftServiceTermination(Orchestrator.java:718) - thrift server terminated
2019-11-26 15:09:54,118Z WARN Thread-1 [o.a.t.s.TThreadPoolServer] execute(TThreadPoolServer.java:230) - Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: socket closed
at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:134)
at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:60)
at org.apache.thrift.server.TThreadPoolServer.execute(TThreadPoolServer.java:185)
at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:175)
at com.simplivity.deploy.orchestrator.Orchestrator.thriftServer(Orchestrator.java:367)
at com.simplivity.deploy.orchestrator.Orchestrator.access$000(Orchestrator.java:63)
at com.simplivity.deploy.orchestrator.Orchestrator$1.run(Orchestrator.java:398)
at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketException: socket closed
at java.net.DualStackPlainSocketImpl.accept0(Native Method)
at java.net.DualStackPlainSocketImpl.socketAccept(Unknown Source)
at java.net.AbstractPlainSocketImpl.accept(Unknown Source)
at java.net.PlainSocketImpl.accept(Unknown Source)
at java.net.ServerSocket.implAccept(Unknown Source)
at java.net.ServerSocket.accept(Unknown Source)
at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:129)
... 8 common frames omitted

I hope this helps you come up with more suggestions.

Br / Tony

AnkiN
Valued Contributor

@fahlis 
>What is the license type for the vCenter?

>How many nodes are there in the existing HPE SimpliVity federation?

gustenar
HPE Pro

@fahlis The DNS could definitely be a problem. 

After the failure, did you try adding the host to vCenter manually using its IP or FQDN? That can tell us whether vCenter has a problem reaching the host. It looks like the problem occurred when trying to transfer the certificates:

2019-11-26 15:09:53,508Z ERROR pool-2-thread-1 [c.s.h.h.v.t.c.ControllerAddHostToCluster] addHostToCluster(ControllerAddHostToCluster.java:177) - HVALTaskAbortException occurred while adding host 172.19.2.15 to cluster SVT_Clu01 on datacenter CompanyDC
com.simplivity.hval.exceptions.HVALTaskAbortException: Failed to add host to cluster: A general system error occurred: Unable to push CA certificates and CRLs to host 172.19.2.15

Another test you could do is run Deployment manager from another machine with same DNS servers and see if it does better. 

Let me know. 

Gus. 

 


fahlis
Frequent Advisor

@AnkiN 

 

vCenter Server 6 Standard license.

There is no existing federation; this is a new 2-node, 10Gb switch-connected deployment.

Br / Tony

fahlis
Frequent Advisor

@gustenar 

I am actually deploying from the arbiter server, and I have changed the DNS to the same as vCenter / ESXi.

I will try again tomorrow, but I need to factory reset first, of course.

Br / Tony

gustenar
HPE Pro

@fahlis Ok let us know how it goes.


fahlis
Frequent Advisor

@gustenar @AnkiN @DaveOb 

I did some more troubleshooting today.

I found out that the customer had not added reverse DNS records for the hosts, so nslookup from the vCSA CLI did not work.
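
For anyone who wants to script the same forward and reverse lookup check, here is a rough sketch using only the Python standard library. It is an illustration, not an HPE or VMware tool: the function name `check_dns` is made up, and it uses whatever resolver the machine running it is configured with, so it should be run from a system that uses the same DNS servers as vCenter.

```python
import socket
import sys


def check_dns(name, forward=socket.gethostbyname_ex, reverse=socket.gethostbyaddr):
    """Resolve `name` to addresses, then check that each address maps back.

    Returns a list of (address, reverse_name_or_None, matches) tuples.
    `forward` and `reverse` default to the system resolver and can be
    swapped out for testing.
    """
    _, _, addresses = forward(name)
    wanted = name.lower().rstrip(".")
    results = []
    for addr in addresses:
        try:
            rname, aliases, _ = reverse(addr)
        except (socket.herror, socket.gaierror):
            # No PTR record (or the lookup failed) for this address.
            results.append((addr, None, False))
            continue
        names = {n.lower().rstrip(".") for n in [rname, *aliases]}
        results.append((addr, rname, wanted in names))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_dns.py esx1.company.com
    for addr, rname, ok in check_dns(sys.argv[1]):
        print(f"{addr} -> {rname or 'no reverse record'} ({'OK' if ok else 'MISMATCH'})")
```

A missing PTR record shows up as "no reverse record", which is the situation described above.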

I fixed that myself and did another factory reset, but the issue is still the same.

I followed your suggestions and tried to add one host to the cluster manually, which gave this error message:

"A general system error occurred: Unable to push CA certificates and CRLs to host esx1.company.com"

Further troubleshooting turned up this link and its proposed solution:
https://communities.vmware.com/thread/619169

Joining new hosts failed with certificate issues - I was getting certificate issues when trying to join NEW hosts to a new host cluster in this datacenter in vSphere.  There is a vCenter setting (vCenter -> Configure -> Settings -> Advanced Settings -> vpxd.certmgmt.mode) with a default value of 'vmca', and VMware support had changed the value to 'thumbprint' which then allowed the new hosts to join the cluster using their default certificates (these were newly installed ESXi 6.7 hosts).  Once they were added successfully, this setting was changed back to its default 'vmca'.
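
In 'thumbprint' mode, as I understand it, vCenter trusts a host based on the fingerprint of the certificate the host presents rather than on a VMCA-signed chain. As a sanity check, the fingerprint of a host's current certificate can be read with a few lines of standard-library Python. This is only a sketch: the function names are made up, and the colon-separated SHA-1 format is an assumption about how the fingerprint is usually displayed, so compare it against what your own vSphere Client shows.

```python
import hashlib
import ssl
import sys


def thumbprint(der_cert, algorithm="sha1"):
    """Format the digest of a DER-encoded certificate as AA:BB:CC:..."""
    digest = hashlib.new(algorithm, der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def host_thumbprint(host, port=443, algorithm="sha1"):
    """Fetch the certificate a host presents and return its thumbprint.

    The certificate is retrieved without validation, which is the point
    here: a freshly installed host normally has a self-signed certificate.
    """
    pem = ssl.get_server_certificate((host, port))
    return thumbprint(ssl.PEM_cert_to_DER_cert(pem), algorithm)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python thumbprint.py esx1.company.com
    print(host_thumbprint(sys.argv[1]))
```

Pass `algorithm="sha256"` if the fingerprint you are comparing against is a SHA-256 one.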

 

So I have changed the setting to 'thumbprint' and am ready to redeploy the first host.
I do believe this is the solution.

 

Just one thing first.

By accident I upgraded iLO to v1.45 from the iLO Repository.
I don't know if that is why the servers are rebooting very slowly now.
Am I required to downgrade to v1.44?
If so, I will try that by booting the SPP ISO in interactive mode and forcing the downgrade.

 

Br / Tony