yahoo-eng-team team mailing list archive

[Bug 1400477] Re: In Juno Cannot Create Spark Cluster From Horizon

 

** Changed in: horizon
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1400477

Title:
  In Juno Cannot Create Spark Cluster From Horizon

Status in OpenStack Dashboard (Horizon):
  Invalid
Status in OpenStack Data Processing (Sahara, ex. Savanna):
  Invalid

Bug description:
  I am trying to instantiate a Spark 1.0.0 cluster using the "Data
  Processing" section of Horizon, and I am running into the following
  problems:

  1-      Security Group: I have problems with security groups both when
  defining "Node Group Templates" and when instantiating a cluster. In
  the first case, if I select an existing group, say "default", the GUI
  shows the error "Error Security group '2' not found". The Sahara log
  file indicates the same thing:

      2014-12-03 23:41:24.144 30234 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): cloudctrl1.maas17
      2014-12-03 23:41:24.146 30234 DEBUG urllib3.connectionpool [-] Setting read timeout to None _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:375
      2014-12-03 23:41:24.163 30234 DEBUG urllib3.connectionpool [-] "GET /v2/926c31c887f441f6a4e4b8031b8cc528/os-security-groups HTTP/1.1" 200 682 _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:415
      2014-12-03 23:41:24.165 30234 DEBUG sahara.utils.api [-] Validation Error occurred: error_code=400, error_message=Security group '2' not found, error_name=INVALID_REFERENCE bad_request /usr/local/lib/python2.7/dist-packages/sahara/utils/api.py:245
      2014-12-03 23:41:24.165 30234 INFO sahara.cli.sahara_all [-] 10.0.0.86 - - [03/Dec/2014 23:41:24] "POST /v1.1/926c31c887f441f6a4e4b8031b8cc528/node-group-templates HTTP/1.1" 400 221 0.063121
      2014-12-03 23:41:24.257 30234 DEBUG keystonemiddleware.auth_token [-] Authenticating user token __call__ /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:650
      2014-12-03 23:41:24.258 30234 DEBUG keystonemiddleware.auth_token [-] Removing headers from request environment: X-Identity-Status,X-Domain-Id,X-Domain-Name,X-Project-Id,X-Project-Name,X-Project-Domain-Id,X-Project-Domain-Name,X-User-Id,X-User-Name,X-User-Domain-Id,X-User-Domain-Name,X-Roles,X-Service-Catalog,X-User,X-Tenant-Id,X-Tenant-Name,X-Tenant,X-Role _remove_auth_headers /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:707

  I can temporarily avoid the problem by selecting the "Auto Security
  Group" option. This allows a node group to be created; however, I do
  not see any new security group under Compute -> Access & Security. In
  any case, this also fails during cluster instantiation:

  2014-12-03 23:55:06.285 30234 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): cloudctrl1.maas17
  2014-12-03 23:55:06.286 30234 DEBUG urllib3.connectionpool [-] Setting read timeout to None _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:375
  2014-12-03 23:55:06.409 30234 DEBUG urllib3.connectionpool [-] "POST /v2/926c31c887f441f6a4e4b8031b8cc528/servers HTTP/1.1" 400 116 _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:415
  2014-12-03 23:55:06.451 30234 ERROR sahara.service.ops [-] Error during operating cluster 'Sprk265' (reason: Security group 6 not found for project 926c31c887f441f6a4e4b8031b8cc528. (HTTP 400))
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops Traceback (most recent call last):
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops   File "/usr/local/lib/python2.7/dist-packages/sahara/service/ops.py", line 113, in wrapper
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops     f(cluster_id, *args, **kwds)
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops   File "/usr/local/lib/python2.7/dist-packages/sahara/service/ops.py", line 198, in _provision_cluster
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops     INFRA.create_cluster(cluster)
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops   File "/usr/local/lib/python2.7/dist-packages/sahara/service/direct_engine.py", line 51, in create_cluster
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops     self._create_instances(cluster)
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops   File "/usr/local/lib/python2.7/dist-packages/sahara/service/direct_engine.py", line 168, in _create_instances
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops     self._run_instance(cluster, node_group, idx, aa_group=aa_group)
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops   File "/usr/local/lib/python2.7/dist-packages/sahara/service/direct_engine.py", line 305, in _run_instance
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops     **nova_kwargs)
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops   File "/usr/lib/python2.7/dist-packages/novaclient/v1_1/servers.py", line 883, in create
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops     **boot_kwargs)
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops   File "/usr/lib/python2.7/dist-packages/novaclient/v1_1/servers.py", line 546, in _boot
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops     return_raw=return_raw, **kwargs)
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops   File "/usr/lib/python2.7/dist-packages/novaclient/base.py", line 100, in _create
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops     _resp, body = self.api.client.post(url, body=body)
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops   File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 490, in post
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops     return self._cs_request(url, 'POST', **kwargs)
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops   File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 465, in _cs_request
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops     resp, body = self._time_request(url, method, **kwargs)
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops   File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 439, in _time_request
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops     resp, body = self.request(url, method, **kwargs)
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops   File "/usr/lib/python2.7/dist-packages/novaclient/client.py", line 433, in request
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops     raise exceptions.from_response(resp, body, url, method)
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops BadRequest: Security group 6 not found for project 926c31c887f441f6a4e4b8031b8cc528. (HTTP 400)
  2014-12-03 23:55:06.451 30234 TRACE sahara.service.ops 
  2014-12-03 23:55:06.611 30234 INFO sahara.service.direct_engine [-] Cluster 'Sprk265' creation rollback (reason: Security group 6 not found for project 926c31c887f441f6a4e4b8031b8cc528. (HTTP 400))
  2014-12-03 23:55:06.616 30234 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): cloudctrl1.maas17
  2014-12-03 23:55:06.622 30234 DEBUG urllib3.connectionpool [-] Setting read timeout to None _make_request /usr/lib/python2.7/dist-packages/urllib3/connectionpool.py:375
  2014-12-03 23:55:07.018 30234 DEBUG keystonemiddleware.auth_token [-] Authenticating user token __call__ /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:650
  2014-12-03 23:55:07.019 30234 DEBUG keystonemiddleware.auth_token [-] Removing headers from request environment: X-Identity-Status,X-Domain-Id,X-Domain-Name,X-Project-Id,X-Project-Name,X-Project-Domain-Id,X-Project-Domain-Name,X-User-Id,X-User-Name,X-User-Domain-Id,X-User-Domain-Name,X-Roles,X-Service-Catalog,X-User,X-Tenant-Id,X-Tenant-Name,X-Tenant,X-Role _remove_auth_headers /usr/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py:707
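
  The two failures above report different security group references. As a
  rough aid for scanning a longer Sahara log, the sketch below pulls out
  every reference reported as "not found". The log lines are shortened
  copies of the excerpts above; the helper name and the regular expression
  are illustrative assumptions, not part of Sahara.

```python
import re

# Shortened copies of two lines from the Sahara log excerpts above
# (the trailing source-location fields are omitted).
LOG_LINES = [
    "2014-12-03 23:41:24.165 30234 DEBUG sahara.utils.api [-] Validation Error occurred: "
    "error_code=400, error_message=Security group '2' not found, "
    "error_name=INVALID_REFERENCE",
    "2014-12-03 23:55:06.451 30234 ERROR sahara.service.ops [-] Error during operating "
    "cluster 'Sprk265' (reason: Security group 6 not found for project "
    "926c31c887f441f6a4e4b8031b8cc528. (HTTP 400))",
]

# Matches both the quoted form ('2') and the unquoted form (6).
PATTERN = re.compile(r"Security group '?([^'\s]+)'? not found")


def missing_security_groups(lines):
    """Return the distinct security group references reported as not found."""
    seen = []
    for line in lines:
        for ref in PATTERN.findall(line):
            if ref not in seen:
                seen.append(ref)
    return seen


print(missing_security_groups(LOG_LINES))
```

  Both references are short integers rather than a name such as "default".
  Neutron identifies security groups by UUID, so this could point to a
  mismatch between the ID that is sent and the backend that is queried,
  though the logs alone do not confirm that.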

  Here is my conf file:

  [DEFAULT]
  use_floating_ips=True
  use_neutron=True
  [keystone_authtoken]
  auth_uri = http://keystone1.maas17:5000/v2.0/
  identity_uri=http://keystone1.maas17:35357/
  admin_user=sahara
  admin_password=sahara
  admin_tenant_name=sahara
  periodic_enable=true
  plugins=vanilla,hdp,idh,spark,cdh
  [database]
  connection=mysql://sahara:sahara@mysql1.maas17/sahara
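
  In an INI-style file, an option belongs to the most recent section
  header above it. The sketch below uses Python's standard configparser
  as a stand-in (Sahara itself reads its configuration through
  oslo.config, so this is only an approximation) to show which section
  each option in the file above falls under. The helper name is
  hypothetical, and the text is the file above with the email indentation
  removed.

```python
import configparser

# The conf file quoted above, with the email indentation removed.
CONF_TEXT = """\
[DEFAULT]
use_floating_ips=True
use_neutron=True
[keystone_authtoken]
auth_uri = http://keystone1.maas17:5000/v2.0/
identity_uri=http://keystone1.maas17:35357/
admin_user=sahara
admin_password=sahara
admin_tenant_name=sahara
periodic_enable=true
plugins=vanilla,hdp,idh,spark,cdh
[database]
connection=mysql://sahara:sahara@mysql1.maas17/sahara
"""


def sections_defining(conf_text, option):
    """Return the names of the sections in which `option` is set."""
    # configparser normally merges [DEFAULT] into every other section.
    # Renaming the default section makes [DEFAULT] behave like an
    # ordinary section, so each option is reported only where it is written.
    parser = configparser.ConfigParser(
        default_section="__unused__", interpolation=None
    )
    parser.read_string(conf_text)
    return [name for name in parser.sections() if parser.has_option(name, option)]


for opt in ("use_neutron", "periodic_enable", "plugins", "connection"):
    print(opt, "->", sections_defining(CONF_TEXT, opt))
```

  As written, periodic_enable and plugins are parsed as part of
  [keystone_authtoken]. Whether Sahara expects them under [DEFAULT]
  instead is an assumption worth checking against the sample
  configuration shipped with the installed release.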

  Sahara is also registered with Keystone:

  # keystone service-list
  +----------------------------------+----------+-----------------+----------------------------+
  |                id                |   name   |       type      |        description         |
  +----------------------------------+----------+-----------------+----------------------------+
  | 75b2c466c35a44d5bbe7167c1ed38e20 |  cinder  |      volume     |   Cinder Volume Service    |
  | d5b014c2d96e4d619f9d9b8e646f0f5b |   ec2    |       ec2       |  EC2 Compatibility Layer   |
  | 8180061e79a24627be43485910a9e16a |  glance  |      image      |    Glance Image Service    |
  | 4dad7b5145c842a4a8fbbab1f158629c | keystone |     identity    | Keystone Identity Service  |
  | d833594550e14d49be4df4394886c849 |   nova   |     compute     |    Nova Compute Service    |
  | fe42b5328002433989ffd6b18414aacc | quantum  |     network     | Quantum Networking Service |
  | fac3f901841c4528b3902ae1b7265b4e |    s3    |        s3       | S3 Compatible object-store |
  | 12104e23db1441a28cb42cf8c9437139 |  sahara  | data_processing |  Data processing service   |
  +----------------------------------+----------+-----------------+----------------------------+
  See https://ask.openstack.org/en/question/55161/juno-sahara-spark-100-security-group-error/ for workarounds.

  2-      Spark Login: I am not able to log in to the recommended Spark
  image, i.e., http://sahara-files.mirantis.com/saha... . Launching this
  image, either by itself or through Sahara/Data Processing, results in
  an "invalid user" error for ubuntu:

  Generation complete.
   * Stopping Handle applying cloud-config [ OK ]
   * Starting Hadoop namenode: 
  starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-ubuntu.out
  open-vm-tools: not starting as this is not a VMware VM
  landscape-client is not configured, please run landscape-config.
   * Restoring resolver state... [ OK ]
  chown: invalid user: 'ubuntu:ubuntu'
  chown: invalid user: 'ubuntu:ubuntu'
  rm: cannot remove '/tmp/in_target.d/post-install.d/20-spark': No such file or directory
   * Stopping System V runlevel compatibility [ OK ]
   * Starting execute cloud user/final scripts [ OK ]
  Cloud-init v. 0.7.5 running 'modules:final' at Thu, 04 Dec 2014 04:48:56 +0000. Up 28.44 seconds.
  2014-12-04 04:48:56,208 - util.py[WARNING]: Running ssh-authkey-fingerprints (<module 'cloudinit.config.cc_ssh_authkey_fingerprints' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_ssh_authkey_fingerprints.pyc'>) failed
  ec2: 
  ec2: #############################################################
  ec2: -----BEGIN SSH HOST KEY FINGERPRINTS-----
  ec2: -----END SSH HOST KEY FINGERPRINTS-----
  ec2: #############################################################
  -----BEGIN SSH HOST KEY KEYS-----
  -----END SSH HOST KEY KEYS-----
  Cloud-init v. 0.7.5 finished at Thu, 04 Dec 2014 04:48:56 +0000. Datasource DataSourceNone.  Up 28.55 seconds
  2014-12-04 04:48:56,237 - cc_final_message.py[WARNING]: Used fallback datasource

  Ubuntu 14.04.1 LTS ubuntu ttyS0

  ubuntu login:

  Sahara log file shows repeated failed login attempts:

  2014-12-04 05:01:40.099 30327 DEBUG sahara.service.engine [-] Can't login to node sprk265-worker-002 (10.0.200.59), reason error: [Errno 110] Connection timed out _wait_until_accessible /usr/local/lib/python2.7/dist-packages/sahara/service/engine.py:110
  2014-12-04 05:01:40.116 30327 DEBUG sahara.utils.ssh_remote [-] [sprk265-worker-001] _execute_command took 127.5 seconds to complete _log_command /usr/local/lib/python2.7/dist-packages/sahara/utils/ssh_remote.py:459
  2014-12-04 05:01:40.117 30327 DEBUG sahara.service.engine [-] Can't login to node sprk265-worker-001 (10.0.200.56), reason error: [Errno 110] Connection timed out _wait_until_accessible /usr/local/lib/python2.7/dist-packages/sahara/service/engine.py:110
  2014-12-04 05:01:44.644 30327 DEBUG sahara.utils.ssh_remote [-] [sprk265-worker-003] Executing "ls .ssh/authorized_keys" _log_command /usr/local/lib/python2.7/dist-packages/sahara/utils/ssh_remote.py:459
  2014-12-04 05:01:45.015 30327 DEBUG sahara.utils.ssh_remote [-] [sprk265-controller-001] Executing "ls .ssh/authorized_keys" _log_command /usr/local/lib/python2.7/dist-packages/sahara/utils/ssh_remote.py:459
  2014-12-04 05:01:45.141 30327 DEBUG sahara.openstack.common.periodic_task [-] Running periodic task SaharaPeriodicTasks.update_job_statuses run_periodic_tasks /usr/local/lib/python2.7/dist-packages/sahara/openstack/common/periodic_task.py:193
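
  On Linux, Errno 110 is ETIMEDOUT, meaning the TCP connection attempt
  never completed; that is a different failure from a rejected login. A
  minimal sketch of a check that separates the two cases follows. The
  function name, the default port 22, and the timeout are assumptions
  made for illustration; this is not how Sahara performs its own check.

```python
import errno
import socket


def probe_tcp(host, port=22, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        # A completed handshake means something is listening on the port;
        # it says nothing yet about whether a login would succeed.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply within `timeout` seconds.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on the port.
        return "refused"
    except OSError as exc:
        # Kernel-level timeout (Errno 110 on Linux) or another network error.
        if exc.errno == errno.ETIMEDOUT:
            return "timeout"
        return "error"
```

  Calling probe_tcp("10.0.200.59") from the host that runs Sahara would
  show whether port 22 on that worker is reachable at all. A "timeout"
  result would suggest, but not prove, a reachability problem (for
  example floating IP or security group rules) rather than a problem with
  the ubuntu account inside the image.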

  Note that I am able to launch other Ubuntu images using the same key
  pair.

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1400477/+subscriptions

