mahara-contributors team mailing list archive

[Bug 1293272] [NEW] Feature: Detect spam using heuristics

 

Public bug reported:

We put in place a new user probation system that blocks spammers from
posting links, images, or URLs to the forums and other publicly
accessible areas. This has greatly reduced the amount of spam on
mahara.org. However, in the past week there have been three posts by
spammers posting a web domain without a protocol, e.g.
"bugs.launchpad.com" instead of "http://bugs.launchpad.com".
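To make the distinction concrete, here is a minimal sketch in Python (Mahara itself is written in PHP) of a check that only catches links with a protocol, next to one that also catches bare domains. The function names and patterns are hypothetical illustrations, not the actual probation code.

```python
import re

# Hypothetical patterns for illustration only; not Mahara's probation code.
# A link with an explicit protocol, e.g. "http://bugs.launchpad.com".
URL_WITH_PROTOCOL = re.compile(r"\bhttps?://\S+", re.IGNORECASE)

# A bare domain such as "bugs.launchpad.com": one or more dot-terminated
# labels followed by an alphabetic top-level domain of two or more letters.
BARE_DOMAIN = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
)


def has_protocol_link(text: str) -> bool:
    """True if the text contains an http:// or https:// link."""
    return URL_WITH_PROTOCOL.search(text) is not None


def has_bare_domain(text: str) -> bool:
    """True if the text contains a domain-like token, with or without a protocol."""
    return BARE_DOMAIN.search(text) is not None
```

The broader pattern also fires on a file name such as "notes.txt", which hints at why matching bare domains invites false positives.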

I avoided adding non-linked domain names to the new user probation code,
because it becomes a lot more difficult to match those in a robust way
(spammers will, for instance, replace periods with hyphens or spaces,
etc.) and I didn't want to get too many false positives.
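As a rough illustration of the robustness problem, the sketch below tries to catch obfuscated domains by looking for a word, then a hyphen, a space or the word "dot", then a common top-level domain. The TLD list and function name are hypothetical. Even this small pattern flags ordinary English.

```python
import re

# Hypothetical, deliberately small list of top-level domains to look for.
COMMON_TLDS = ("com", "net", "org", "info", "biz")

# A word, then a separator a spammer might use instead of a period
# (a hyphen, the word "dot", or plain whitespace), then one of the TLDs above.
OBFUSCATED_DOMAIN = re.compile(
    r"\b([a-z0-9]+)(?:\s*-\s*|\s+dot\s+|\s+)(" + "|".join(COMMON_TLDS) + r")\b",
    re.IGNORECASE,
)


def find_obfuscated_domains(text: str) -> list[str]:
    """Return each domain-like match, rewritten with a real period."""
    return [f"{name}.{tld}".lower() for name, tld in OBFUSCATED_DOMAIN.findall(text)]
```

For example, "visit cheapwatches dot com" yields ["cheapwatches.com"], but "Our school has a safety net" yields ["safety.net"], a false positive.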

But if this trend continues, we'll need to add something eventually.
Pretty much the only way to deal with these more subtly constructed spam
messages is to detect them using heuristics. That approach has a high
rate of false positives, so in most cases you'd want to combine it with
another safeguard, such as a moderation queue or the user's probation
status, rather than relying on it alone.
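Purely to illustrate what "heuristics" could mean here, the sketch below adds up hand-picked weights for several weak signals and flags a post when the total reaches a threshold, in the spirit of rule-based filters such as SpamAssassin. The signals, weights and threshold are all hypothetical; an external library would likely replace this.

```python
import re
from dataclasses import dataclass


@dataclass
class Signal:
    name: str
    pattern: re.Pattern
    weight: float


# Hypothetical signals and weights, chosen only for illustration.
SIGNALS = [
    Signal("bare domain", re.compile(r"\b[a-z0-9-]+\.(?:com|net|org|info|biz)\b", re.I), 2.0),
    Signal("spelled-out dot", re.compile(r"\bdot\s+(?:com|net|org)\b", re.I), 2.0),
    Signal("phone number", re.compile(r"\+?\d[\d\s().-]{8,}\d"), 1.5),
    Signal("sales wording", re.compile(r"\b(?:cheap|discount|buy now|free shipping)\b", re.I), 1.0),
]

THRESHOLD = 3.0  # hypothetical cut-off


def spam_score(text: str) -> tuple[float, list[str]]:
    """Sum the weights of every signal that fires at least once, and name them."""
    fired = [s for s in SIGNALS if s.pattern.search(text)]
    return sum(s.weight for s in fired), [s.name for s in fired]


def looks_spammy(text: str) -> bool:
    """Flag the text when its combined score reaches the threshold."""
    score, _ = spam_score(text)
    return score >= THRESHOLD
```

No single weak signal is enough on its own; "Buy now! cheap watches at cheapwatches dot com" is flagged only because two signals fire together.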

So we've got the following things under this bug:

1. Get a heuristic spam detection system (ideally an external library)

2. Put possibly spammy forum posts into a moderation queue -- this
requires implementing a forum moderation queue, which should be a
separate bug

3. A moderation queue doesn't make sense for wall posts, comments, and
personal messages. Perhaps block a message if it triggers heuristic
detection and the user is on probation?
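Taken together, the three items suggest a simple routing rule. A minimal sketch, assuming hypothetical content-type names and a yes/no "flagged" result from the heuristic check. This report does not say what to do with flagged wall posts, comments or messages from users who are not on probation, so the sketch simply publishes them.

```python
from enum import Enum


class Action(Enum):
    PUBLISH = "publish"
    QUEUE_FOR_MODERATION = "queue for moderation"
    BLOCK = "block"


# Hypothetical identifiers for the content types named in the bug.
QUEUEABLE = {"forum_post"}
NON_QUEUEABLE = {"wall_post", "comment", "message"}


def decide(content_type: str, flagged: bool, on_probation: bool) -> Action:
    """Route a post based on the heuristic result and the author's probation status."""
    if not flagged:
        return Action.PUBLISH
    if content_type in QUEUEABLE:
        # Item 2: possibly spammy forum posts go to a moderation queue.
        return Action.QUEUE_FOR_MODERATION
    if on_probation:
        # Item 3: no queue for these types, so block flagged posts from probation users.
        return Action.BLOCK
    # Not specified by the report; this sketch lets them through.
    return Action.PUBLISH
```

For example, decide("comment", flagged=True, on_probation=True) returns Action.BLOCK.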

** Affects: mahara
     Importance: Wishlist
     Assignee: Aaron Wells (u-aaronw)
         Status: Confirmed


** Tags: mahara.org spam

-- 
You received this bug notification because you are a member of Mahara
Contributors, which is subscribed to Mahara.
Matching subscriptions: Subscription for all Mahara Contributors -- please ask on #mahara-dev or mahara.org forum before editing or unsubscribing it!
https://bugs.launchpad.net/bugs/1293272

To manage notifications about this bug go to:
https://bugs.launchpad.net/mahara/+bug/1293272/+subscriptions