mahara-contributors team mailing list archive
Message #28602
[Bug 1487274] [NEW] Elasticsearch choking on non-ASCII characters
Public bug reported:
In 15.10 I've added code to "quarantine" records that Elasticsearch
won't index. That is, if Elasticsearch returns an error while processing
a batch of records, I retry each record in that batch individually. If
it returns an error again for one of those individual records, I mark
that record as quarantined and keep it in the
search_elasticsearch_queue table.
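The retry-then-quarantine flow described above can be sketched roughly as follows. This is only an illustration, not the Mahara code (which is PHP): the function name index_with_quarantine, the send_batch callable, and the record layout with an "id" key are all hypothetical stand-ins for whatever actually submits rows from search_elasticsearch_queue to Elasticsearch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical record shape: one queued row, with at least an "id" key.
Record = Dict[str, object]


def index_with_quarantine(
    records: List[Record],
    send_batch: Callable[[List[Record]], None],
) -> Tuple[list, list]:
    """Try the whole batch first; on failure, retry record by record.

    send_batch is an assumed callable that raises an exception when the
    search server rejects the records it is given.
    Returns (indexed_ids, quarantined_ids).
    """
    try:
        send_batch(records)
        return [r["id"] for r in records], []
    except Exception:
        pass  # fall through to the per-record retry

    indexed, quarantined = [], []
    for record in records:
        try:
            send_batch([record])
            indexed.append(record["id"])
        except Exception:
            # The record failed on its own: mark it as quarantined so it
            # stays in the queue instead of blocking the whole batch.
            quarantined.append(record["id"])
    return indexed, quarantined
```

In this sketch one bad record costs one extra request per record in its batch, but the remaining records in that batch still get indexed.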
I've backported that to one of our large 15.04 sites, and since then
I've looked at the data in the records that caused Elasticsearch to
choke. They all contain non-ASCII characters. These range from
something as simple as "e with an accent over it" to more exotic ones
like emoji and the Unicode snowman.
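For anyone inspecting their own quarantined records, a small helper along these lines could list which non-ASCII characters a piece of text contains. It is a triage sketch only: the function name non_ascii_chars is made up for illustration, and nothing here identifies why Elasticsearch rejects these records.

```python
import unicodedata


def non_ascii_chars(text):
    """List each distinct non-ASCII character in order of first appearance.

    Each entry is (code point, Unicode name, length in UTF-8 bytes).
    """
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [
        (
            "U+%04X" % ord(ch),
            unicodedata.name(ch, "UNKNOWN"),
            len(ch.encode("utf-8")),
        )
        for ch in seen
    ]
```

The UTF-8 byte length is included because it separates accented Latin letters (2 bytes) from symbols like the snowman (3 bytes) and most emoji (4 bytes), which may help when comparing affected records across servers.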
I was not able to replicate this when testing on my local machine, but
it is definitely happening on our production servers, and bugs such as
Bug 1408577 make me think it's probably present on some other servers
as well.
** Affects: mahara
Importance: High
Status: Confirmed
** Affects: mahara/1.10
Importance: High
Status: Confirmed
** Affects: mahara/1.9
Importance: High
Status: Confirmed
** Affects: mahara/15.04
Importance: High
Status: Confirmed
** Affects: mahara/15.10
Importance: High
Status: Confirmed
** Tags: elasticsearch i18n search unicode
** Also affects: mahara/15.04
Importance: Undecided
Status: New
** Also affects: mahara/15.10
Importance: High
Status: Confirmed
** Also affects: mahara/1.10
Importance: Undecided
Status: New
** Also affects: mahara/1.9
Importance: Undecided
Status: New
** Changed in: mahara/15.04
Milestone: None => 15.04.4
** Changed in: mahara/1.9
Milestone: None => 1.9.9
** Changed in: mahara/1.10
Milestone: None => 1.10.7
** Changed in: mahara/15.04
Status: New => Confirmed
** Changed in: mahara/1.9
Status: New => Confirmed
** Changed in: mahara/1.10
Status: New => Confirmed
** Changed in: mahara/15.04
Importance: Undecided => High
** Changed in: mahara/1.9
Importance: Undecided => High
** Changed in: mahara/1.10
Importance: Undecided => High
--
You received this bug notification because you are a member of Mahara
Contributors, which is subscribed to Mahara.
Matching subscriptions: Subscription for all Mahara Contributors -- please ask on #mahara-dev or mahara.org forum before editing or unsubscribing it!
https://bugs.launchpad.net/bugs/1487274
Title:
Elasticsearch choking on non-ASCII characters
Status in Mahara:
Confirmed
Status in Mahara 1.10 series:
Confirmed
Status in Mahara 1.9 series:
Confirmed
Status in Mahara 15.04 series:
Confirmed
Status in Mahara 15.10 series:
Confirmed
To manage notifications about this bug go to:
https://bugs.launchpad.net/mahara/+bug/1487274/+subscriptions
Follow ups
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2018-04-05
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2018-03-06
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2017-12-29
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Kristina Hoeppner, 2017-10-22
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2017-09-18
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2017-09-18
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Kristina Hoeppner, 2017-03-26
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Kristina Hoeppner, 2017-03-26
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-12-29
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-12-29
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Kristina Hoeppner, 2016-12-12
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-12-11
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-10-24
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-10-21
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-10-21
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-10-20
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-08-08
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-08-08
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-07-10
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-07-10
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-07-07
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-06-09
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-06-08
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-05-01
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-05-01
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-04-28
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2016-03-22
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Aaron Wells, 2015-11-26
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Aaron Wells, 2015-10-27
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2015-10-18
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2015-10-06
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Robert Lyon, 2015-10-06
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Aaron Wells, 2015-09-30
- [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
  From: Aaron Wells, 2015-08-21