launchpad-reviewers team mailing list archive

[Merge] ~ikoruk/launchpad:user-agent into launchpad:master

Yuliy Schwartzburg has proposed merging ~ikoruk/launchpad:user-agent into launchpad:master.

Commit message:
Add blocked user agents to Apache for mainsite and API

This is specifically to block Bytedance from scraping LP, which degrades performance.

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~ikoruk/launchpad/+git/launchpad/+merge/470982

The blocked user agents should be "Bytespider|Bytedance"
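
As a rough illustration of what that value would match, the sketch below approximates the proposed RewriteCond in Python. The helper name `is_blocked` and the sample user-agent strings are hypothetical, and Python's `re` module only approximates Apache's PCRE matching, so this is a way to sanity-check the pattern rather than a statement of how Apache evaluates it.

```python
import re


def is_blocked(user_agent: str, blocked_user_agents: str) -> bool:
    """Approximate the proposed RewriteCond for a given User-Agent.

    blocked_user_agents is the '|'-separated charm config value.
    An empty value means no rule is rendered, so nothing is blocked.
    """
    if not blocked_user_agents:
        return False
    # Same shape as the template: ^.*(<agents>).*$ with [NC] (case-insensitive).
    pattern = re.compile(r"^.*(" + blocked_user_agents + r").*$", re.IGNORECASE)
    return pattern.search(user_agent) is not None


# Hypothetical sample values for illustration only.
blocked = "Bytespider|Bytedance"
print(is_blocked("Mozilla/5.0 (compatible; Bytespider)", blocked))
print(is_blocked("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0", blocked))
```

Because the config value is interpolated directly into the regular expression, any regex metacharacters in it are interpreted as such; the sketch keeps that behaviour on purpose.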
-- 
Your team Launchpad code reviewers is requested to review the proposed merge of ~ikoruk/launchpad:user-agent into launchpad:master.
diff --git a/charm/launchpad-appserver/config.yaml b/charm/launchpad-appserver/config.yaml
index b82a3d1..31aed04 100644
--- a/charm/launchpad-appserver/config.yaml
+++ b/charm/launchpad-appserver/config.yaml
@@ -12,6 +12,11 @@ options:
     description: >
       Cognitive Services subscription key for the Bing Custom Search API.
     default:
+  blocked_user_agents:
+    type: string
+    description: >
+      User agents that should be blocked from Launchpad, separated by '|'.
+    default:
   csrf_secret:
     type: string
     description: >
diff --git a/charm/launchpad-appserver/reactive/launchpad-appserver.py b/charm/launchpad-appserver/reactive/launchpad-appserver.py
index 879dbe4..5446487 100644
--- a/charm/launchpad-appserver/reactive/launchpad-appserver.py
+++ b/charm/launchpad-appserver/reactive/launchpad-appserver.py
@@ -331,6 +331,12 @@ def deconfigure_vhost():
     remove_state("launchpad.vhost.configured")
 
 
+@when("config.changed.blocked_user_agents")
+def reconfigure_blocked_user_agents():
+    remove_state("launchpad.vhost.configured")
+    remove_state("launchpad.api-vhost.configured")
+
+
 @when("api-vhost-config.available", "service.configured")
 @when_not("launchpad.api-vhost.configured")
 def configure_api_vhost():
diff --git a/charm/launchpad-appserver/templates/vhosts/api-https.conf.j2 b/charm/launchpad-appserver/templates/vhosts/api-https.conf.j2
index 52b9225..f9ed5fe 100644
--- a/charm/launchpad-appserver/templates/vhosts/api-https.conf.j2
+++ b/charm/launchpad-appserver/templates/vhosts/api-https.conf.j2
@@ -30,6 +30,12 @@
 
     RewriteEngine on
 
+{% if blocked_user_agents %}
+    # Block certain user agents
+    RewriteCond %{HTTP_USER_AGENT} ^.*({{ blocked_user_agents }}).*$ [NC]
+    RewriteRule .* - [F,L]
+{%- endif %}
+
     RewriteRule ^/offline\.html$ - [PT]
     RewriteRule ^/robots\.txt$ - [PT]
     RewriteRule ^/\+apidoc/(.*) /$1 [PT]
diff --git a/charm/launchpad-appserver/templates/vhosts/mainsite-https.conf.j2 b/charm/launchpad-appserver/templates/vhosts/mainsite-https.conf.j2
index 16708c2..7aac31e 100644
--- a/charm/launchpad-appserver/templates/vhosts/mainsite-https.conf.j2
+++ b/charm/launchpad-appserver/templates/vhosts/mainsite-https.conf.j2
@@ -38,6 +38,12 @@
 
     RewriteEngine on
 
+{% if blocked_user_agents %}
+    # Block certain user agents
+    RewriteCond %{HTTP_USER_AGENT} ^.*({{ blocked_user_agents }}).*$ [NC]
+    RewriteRule .* - [F,L]
+{%- endif %}
+
 {% if google_site_verification %}
     # https://portal.admin.canonical.com/C49078: File needed for Google to
     # verify domain control.
