yahoo-eng-team team mailing list archive
Message #25363
[Bug 1400477] Re: In Juno Cannot Create Spark Cluster From Horizon
** Also affects: sahara
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/1400477
Title:
In Juno Cannot Create Spark Cluster From Horizon
Status in OpenStack Dashboard (Horizon):
New
Status in OpenStack Data Processing (Sahara, ex. Savanna):
New
Bug description:
I am trying to instantiate a Spark 1.0.0 cluster using the "Data Processing"
element under Horizon, and am having the following problems:
1- Security Group: I am having problems with security groups, both when
defining "Node Group Templates" and when instantiating a cluster. In
the first case, if I use an existing group, say "default", the GUI shows
the error "Error Security group '2' not found". The Sahara log file
indicates the same thing:
2014-12-03 23:41:24.144 30234 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): cloudctrl1.maas17
2014-12-03 23:41:24.146 30234 DEBUG urllib3.connectionpool [-] Setting read timeout to None _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:375
2014-12-03 23:41:24.163 30234 DEBUG urllib3.connectionpool [-] "GET /v2/926c31c887f441f6a4e4b8031b8cc528/os-security-groups HTTP/1.1" 200 682 _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:415
2014-12-03 23:41:24.165 30234 DEBUG sahara.utils.api [-] Validation Error occurred: error_code=400, error_message=Security group '2' not found, error_name=INVALID_REFERENCE bad_request /usr/local/lib/python2.7/dist-packages/sahara/utils/api.py:245
2014-12-03 23:41:24.165 30234 INFO sahara.cli.sahara_all [-] 10.0.0.86 - - [03/Dec/2014 23:41:24] "POST /v1.1/926c31c887f441f6a4e4b8031b8cc528/node-group-templates HTTP/1.1" 400 221 0.063121
2014-12-03 23:41:24.257 30234 DEBUG keystonemiddleware.auth_token [-] Authenticating user token __call__ /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:650
2014-12-03 23:41:24.258 30234 DEBUG keystonemiddleware.auth_token [-] Removing headers from request environment: X-Identity-Status,X-Domain-Id,X-Domain-Name,X-Project-Id,X-Project-Name,X-Project-Domain-Id,X-Project-Domain-Name,X-User-Id,X-User-Name,X-User-Domain-Id,X-User-Domain-Name,X-Roles,X-Service-Catalog,X-User,X-Tenant-Id,X-Tenant-Name,X-Tenant,X-Role _remove_auth_headers /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:707
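To pull the validation failure out of a noisy DEBUG log like the one above, something along these lines may help. It is only a sketch: the regular expression is inferred from the single "Validation Error occurred" line shown here, and the function name `extract_validation_errors` is made up for illustration; it is not part of Sahara.

```python
import re

# Pattern inferred from the one "Validation Error occurred" line in the log
# above; other Sahara versions may format this message differently.
VALIDATION_RE = re.compile(
    r"Validation Error occurred: "
    r"error_code=(?P<code>\d+), "
    r"error_message=(?P<message>.*?), "
    r"error_name=(?P<name>\w+)"
)


def extract_validation_errors(lines):
    """Return (code, name, message) tuples for each validation error line."""
    errors = []
    for line in lines:
        match = VALIDATION_RE.search(line)
        if match:
            errors.append(
                (int(match.group("code")), match.group("name"), match.group("message"))
            )
    return errors


# Two lines copied from the log above: one noise line, one validation error.
SAMPLE_LINES = [
    "2014-12-03 23:41:24.146 30234 DEBUG urllib3.connectionpool [-] "
    "Setting read timeout to None _make_request "
    "/usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:375",
    "2014-12-03 23:41:24.165 30234 DEBUG sahara.utils.api [-] "
    "Validation Error occurred: error_code=400, "
    "error_message=Security group '2' not found, "
    "error_name=INVALID_REFERENCE bad_request "
    "/usr/local/lib/python2.7/dist-packages/sahara/utils/api.py:245",
]

for code, name, message in extract_validation_errors(SAMPLE_LINES):
    print(code, name, message)
```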
I can temporarily avoid the problem by selecting the "Auto Security
Group" option. This allows a node group to be created;
however, I do not see any new security group under Compute -> Access
& Security. At any rate, this also fails during cluster instantiation:
2014-12-03 23:55:06.285 30234 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): cloudctrl1.maas17
2014-12-03 23:55:06.286 30234 DEBUG urllib3.connectionpool [-] Setting read timeout to None _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:375
2014-12-03 23:55:06.409 30234 DEBUG urllib3.connectionpool [-] "POST /v2/926c31c887f441f6a4e4b8031b8cc528/servers HTTP/1.1" 400 116 _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:415
2014-12-03 23:55:06.451 30234 ERROR sahara.service.ops [-] Error during operating cluster 'Sprk265' (reason: Security group 6 not found for project 926c31c887f441f6a4e4b8031b8cc528. (HTTP 400))
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops Traceback (most recent call last):
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops File "/usr/local/lib/python2.7/dist-packages/sahara/service/ops.py", line 113, in wrapper
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops f(cluster_id, *args, **kwds)
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops File "/usr/local/lib/python2.7/dist-packages/sahara/service/ops.py", line 198, in _provision_cluster
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops INFRA.create_cluster(cluster)
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops File "/usr/local/lib/python2.7/dist-packages/sahara/service/direct_engine.py", line 51, in create_cluster
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops self._create_instances(cluster)
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops File "/usr/local/lib/python2.7/dist-packages/sahara/service/direct_engine.py", line 168, in _create_instances
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops self._run_instance(cluster, node_group, idx, aa_group=aa_group)
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops File "/usr/local/lib/python2.7/dist-packages/sahara/service/direct_engine.py", line 305, in _run_instance
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops **nova_kwargs)
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops File "/usr/lib/python2.7/dist-packages/novaclient/v1_1/servers.py", line 883, in create
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops **boot_kwargs)
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops File "/usr/lib/python2.7/dist-packages/novaclient/v1_1/servers.py", line 546, in _boot
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops return_raw=return_raw, **kwargs)
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops File "/usr/lib/python2.7/dist-packages/novaclient/base.py", line 100, in _create
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops _resp, body = self.api.client.post(url, body=body)
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 490, in post
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops return self._cs_request(url, 'POST', **kwargs)
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 465, in _cs_request
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops resp, body = self._time_request(url, method, **kwargs)
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 439, in _time_request
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops resp, body = self.request(url, method, **kwargs)
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 433, in request
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops raise exceptions.from_response(resp, body, url, method)
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops BadRequest: Security group 6 not found for project 926c31c887f441f6a4e4b8031b8cc528. (HTTP 400)
2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops
2014-12-03 23:55:06.611 30234 INFO sahara.service.direct_engine [-] Cluster 'Sprk265' creation rollback (reason: Security group 6 not found for project 926c31c887f441f6a4e4b8031b8cc528. (HTTP 400))
2014-12-03 23:55:06.616 30234 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): cloudctrl1.maas17
2014-12-03 23:55:06.622 30234 DEBUG urllib3.connectionpool [-] Setting read timeout to None _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:375
2014-12-03 23:55:07.018 30234 DEBUG keystonemiddleware.auth_token [-] Authenticating user token __call__ /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:650
2014-12-03 23:55:07.019 30234 DEBUG keystonemiddleware.auth_token [-] Removing headers from request environment: X-Identity-Status,X-Domain-Id,X-Domain-Name,X-Project-Id,X-Project-Name,X-Project-Domain-Id,X-Project-Domain-Name,X-User-Id,X-User-Name,X-User-Domain-Id,X-User-Domain-Name,X-Roles,X-Service-Catalog,X-User,X-Tenant-Id,X-Tenant-Name,X-Tenant,X-Role _remove_auth_headers /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:707
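Both failures above are lookups of a numeric security group reference ('2', then 6) that the service could not resolve. The sketch below shows, under stated assumptions, how such a lookup can miss: it assumes a listing of dicts with `id` and `name` keys, and the sample IDs are invented for illustration. It is not Sahara's or Nova's actual code, and it does not claim to be the root cause of this bug.

```python
def find_security_group(groups, reference):
    """Return the group whose id or name equals `reference`, else None.

    `groups` is assumed to be a list of dicts with "id" and "name" keys.
    """
    ref = str(reference)
    for group in groups:
        if str(group.get("id")) == ref or group.get("name") == ref:
            return group
    return None


# Hypothetical listing: the IDs below are made up, not taken from the report.
SAMPLE_GROUPS = [
    {"id": "0a1b2c3d-0000-4000-8000-000000000001", "name": "default"},
    {"id": "0a1b2c3d-0000-4000-8000-000000000002", "name": "web"},
]

# A lookup by name succeeds, while a bare numeric reference finds nothing.
print(find_security_group(SAMPLE_GROUPS, "default"))
print(find_security_group(SAMPLE_GROUPS, 2))
```

Comparing the reference sent by the dashboard with the identifiers the backing service actually lists is one way to narrow down where the mismatch comes from.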
Here is my conf file:
[DEFAULT]
use_floating_ips=True
use_neutron=True
[keystone_authtoken]
auth_uri = http://keystone1.maas17:5000/v2.0/
identity_uri=http://keystone1.maas17:35357/
admin_user=sahara
admin_password=sahara
admin_tenant_name=sahara
periodic_enable=true
plugins=vanilla,hdp,idh,spark,cdh
[database]
connection=mysql://sahara:sahara@mysql1.maas17/sahara
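One thing worth checking in a file like this is which section each option actually sits in, since an option placed under the wrong section header is typically ignored. The sketch below uses Python's standard `configparser` to report options found outside `[DEFAULT]`. The set `EXPECTED_IN_DEFAULT` is an assumption on my part about where these options belong, not something confirmed in this report; verify it against the sample configuration shipped with your Sahara version.

```python
import configparser

# The configuration exactly as posted above.
CONF_TEXT = """\
[DEFAULT]
use_floating_ips=True
use_neutron=True
[keystone_authtoken]
auth_uri = http://keystone1.maas17:5000/v2.0/
identity_uri=http://keystone1.maas17:35357/
admin_user=sahara
admin_password=sahara
admin_tenant_name=sahara
periodic_enable=true
plugins=vanilla,hdp,idh,spark,cdh
[database]
connection=mysql://sahara:sahara@mysql1.maas17/sahara
"""

# ASSUMPTION: these options are expected under [DEFAULT].
EXPECTED_IN_DEFAULT = frozenset(
    {"use_floating_ips", "use_neutron", "periodic_enable", "plugins"}
)


def options_outside_default(conf_text, expected=EXPECTED_IN_DEFAULT):
    """Return (section, option) pairs for expected-DEFAULT options found elsewhere."""
    # Rename configparser's special default section so that [DEFAULT] is read
    # as an ordinary section and its values are not merged into the others.
    parser = configparser.ConfigParser(
        default_section="__unused__", interpolation=None
    )
    parser.read_string(conf_text)
    misplaced = []
    for section in parser.sections():
        if section == "DEFAULT":
            continue
        for option in parser.options(section):
            if option in expected:
                misplaced.append((section, option))
    return misplaced


for section, option in options_outside_default(CONF_TEXT):
    print("%s is under [%s]" % (option, section))
```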
Sahara is also registered with Keystone:
# keystone service-list
+----------------------------------+----------+-----------------+----------------------------+
|                id                |   name   |       type      |        description         |
+----------------------------------+----------+-----------------+----------------------------+
| 75b2c466c35a44d5bbe7167c1ed38e20 | cinder   | volume          | Cinder Volume Service      |
| d5b014c2d96e4d619f9d9b8e646f0f5b | ec2      | ec2             | EC2 Compatibility Layer    |
| 8180061e79a24627be43485910a9e16a | glance   | image           | Glance Image Service       |
| 4dad7b5145c842a4a8fbbab1f158629c | keystone | identity        | Keystone Identity Service  |
| d833594550e14d49be4df4394886c849 | nova     | compute         | Nova Compute Service       |
| fe42b5328002433989ffd6b18414aacc | quantum  | network         | Quantum Networking Service |
| fac3f901841c4528b3902ae1b7265b4e | s3       | s3              | S3 Compatible object-store |
| 12104e23db1441a28cb42cf8c9437139 | sahara   | data_processing | Data processing service    |
+----------------------------------+----------+-----------------+----------------------------+
See https://ask.openstack.org/en/question/55161/juno-sahara-spark-100-security-group-error/ for workarounds.
2- Spark Login: I am not able to log in to the recommended Spark
image, i.e., http://sahara-files.mirantis.com/saha... . Launching this
image, either by itself or through Sahara/Data Processing, results in
"invalid user" errors for ubuntu:
Generation complete.
* Stopping Handle applying cloud-config [ OK ]
* Starting Hadoop namenode:
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-ubuntu.out
open-vm-tools: not starting as this is not a VMware VM
landscape-client is not configured, please run landscape-config.
* Restoring resolver state... [ OK ]
chown: invalid user: 'ubuntu:ubuntu'
chown: invalid user: 'ubuntu:ubuntu'
rm: cannot remove '/tmp/in_target.d/post-install.d/20-spark': No such file or directory
* Stopping System V runlevel compatibility [ OK ]
* Starting execute cloud user/final scripts [ OK ]
Cloud-init v. 0.7.5 running 'modules:final' at Thu, 04 Dec 2014 04:48:56 +0000. Up 28.44 seconds.
2014-12-04 04:48:56,208 - util.py[WARNING]: Running ssh-authkey-fingerprints (<module 'cloudinit.config.cc_ssh_authkey_fingerprints' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_ssh_authkey_fingerprints.pyc'>) failed
ec2:
ec2: #############################################################
ec2: -----BEGIN SSH HOST KEY FINGERPRINTS-----
ec2: -----END SSH HOST KEY FINGERPRINTS-----
ec2: #############################################################
-----BEGIN SSH HOST KEY KEYS-----
-----END SSH HOST KEY KEYS-----
Cloud-init v. 0.7.5 finished at Thu, 04 Dec 2014 04:48:56 +0000. Datasource DataSourceNone. Up 28.55 seconds
2014-12-04 04:48:56,237 - cc_final_message.py[WARNING]: Used fallback datasource
Ubuntu 14.04.1 LTS ubuntu ttyS0
ubuntu login:
The Sahara log file shows repeated failed login attempts:
2014-12-04 05:01:40.099 30327 DEBUG sahara.service.engine [-] Can't login to node sprk265-worker-002 (10.0.200.59), reason error: [Errno 110] Connection timed out _wait_until_accessible /usr/local/lib/python2.7/dist-packages/sahara/service/engine.py:110
2014-12-04 05:01:40.116 30327 DEBUG sahara.utils.ssh_remote [-] [sprk265-worker-001] _execute_command took 127.5 seconds to complete _log_command /usr/local/lib/python2.7/dist-packages/sahara/utils/ssh_remote.py:459
2014-12-04 05:01:40.117 30327 DEBUG sahara.service.engine [-] Can't login to node sprk265-worker-001 (10.0.200.56), reason error: [Errno 110] Connection timed out _wait_until_accessible /usr/local/lib/python2.7/dist-packages/sahara/service/engine.py:110
2014-12-04 05:01:44.644 30327 DEBUG sahara.utils.ssh_remote [-] [sprk265-worker-003] Executing "ls .ssh/authorized_keys" _log_command /usr/local/lib/python2.7/dist-packages/sahara/utils/ssh_remote.py:459
2014-12-04 05:01:45.015 30327 DEBUG sahara.utils.ssh_remote [-] [sprk265-controller-001] Executing "ls .ssh/authorized_keys" _log_command /usr/local/lib/python2.7/dist-packages/sahara/utils/ssh_remote.py:459
2014-12-04 05:01:45.141 30327 DEBUG sahara.openstack.common.periodic_task [-] Running periodic task SaharaPeriodicTasks.update_job_statuses run_periodic_tasks /usr/local/lib/python2.7/dist-packages/sahara/openstack/common/periodic_task.py:193
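The "[Errno 110] Connection timed out" entries above are raised at the TCP level, before any username or key is exchanged, so it can be useful to check plain reachability of the SSH port separately from the login itself. A minimal sketch, assuming SSH listens on port 22; the helper name `ssh_port_reachable` is hypothetical and not part of Sahara:

```python
import socket


def ssh_port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    This only tests network reachability; it does not attempt an SSH login.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

For example, `ssh_port_reachable("10.0.200.59")` run from the Sahara host would test the worker address from the log. A False result points toward networking (security group rules, floating IPs, routing) rather than toward the image's user accounts.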
Note that I am able to launch other Ubuntu images using the same key
pair.