calibre-devs team mailing list archive

[Merge] lp:~miurahr/calibre/experimental-recipes into lp:calibre

To: mp+41555@xxxxxxxxxxxxxxxxxx
From: Hiroshi Miura <miurahr@xxxxxxxxx>
Date: Tue, 23 Nov 2010 07:00:36 -0000
Reply-to: mp+41555@xxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

Hiroshi Miura has proposed merging lp:~miurahr/calibre/experimental-recipes into lp:calibre.

Requested reviews:
  Kovid Goyal (kovid)


Introduces 15 Japanese recipes, along with some cover images and icons.

  - CNET Japan
  - Engadget Japan
  - MSN Sankei News
  - Reuters Japan
  - Nikkei(Free)
  - Nikkei::Sports
  - Nikkei::Industry
  - Nikkei::Life
  - Nikkei::Economy
  - Nikkei::Headline
  - Jiji Express
  - Mainichi Daily News
  - Mainichi Daily News::IT and electronics

Removes nikkei_sub.recipe, which was problematic because it fetched too many feeds at once.

-- 
https://code.launchpad.net/~miurahr/calibre/experimental-recipes/+merge/41555
Your team calibre developers is subscribed to branch lp:~miurahr/calibre/experimental-recipes.

=== added file 'resources/images/news/cnetjapan.png'
Binary files resources/images/news/cnetjapan.png	1970-01-01 00:00:00 +0000 and resources/images/news/cnetjapan.png	2010-11-23 07:00:03 +0000 differ
=== added file 'resources/images/news/endgadget_ja.png'
Binary files resources/images/news/endgadget_ja.png	1970-01-01 00:00:00 +0000 and resources/images/news/endgadget_ja.png	2010-11-23 07:00:03 +0000 differ
=== added file 'resources/images/news/jijinews.png'
Binary files resources/images/news/jijinews.png	1970-01-01 00:00:00 +0000 and resources/images/news/jijinews.png	2010-11-23 07:00:03 +0000 differ
=== added file 'resources/images/news/msnsankei.png'
Binary files resources/images/news/msnsankei.png	1970-01-01 00:00:00 +0000 and resources/images/news/msnsankei.png	2010-11-23 07:00:03 +0000 differ
=== added file 'resources/images/news/nikkei_free.png'
Binary files resources/images/news/nikkei_free.png	1970-01-01 00:00:00 +0000 and resources/images/news/nikkei_free.png	2010-11-23 07:00:03 +0000 differ
=== added file 'resources/images/news/nikkei_sub_economy.png'
Binary files resources/images/news/nikkei_sub_economy.png	1970-01-01 00:00:00 +0000 and resources/images/news/nikkei_sub_economy.png	2010-11-23 07:00:03 +0000 differ
=== added file 'resources/images/news/nikkei_sub_industory.png'
Binary files resources/images/news/nikkei_sub_industory.png	1970-01-01 00:00:00 +0000 and resources/images/news/nikkei_sub_industory.png	2010-11-23 07:00:03 +0000 differ
=== added file 'resources/images/news/nikkei_sub_life.png'
Binary files resources/images/news/nikkei_sub_life.png	1970-01-01 00:00:00 +0000 and resources/images/news/nikkei_sub_life.png	2010-11-23 07:00:03 +0000 differ
=== added file 'resources/images/news/nikkei_sub_main.png'
Binary files resources/images/news/nikkei_sub_main.png	1970-01-01 00:00:00 +0000 and resources/images/news/nikkei_sub_main.png	2010-11-23 07:00:03 +0000 differ
=== added file 'resources/images/news/nikkei_sub_sports.png'
Binary files resources/images/news/nikkei_sub_sports.png	1970-01-01 00:00:00 +0000 and resources/images/news/nikkei_sub_sports.png	2010-11-23 07:00:03 +0000 differ
=== added file 'resources/images/news/reuters.png'
Binary files resources/images/news/reuters.png	1970-01-01 00:00:00 +0000 and resources/images/news/reuters.png	2010-11-23 07:00:03 +0000 differ
=== added file 'resources/images/news/reuters_ja.png'
Binary files resources/images/news/reuters_ja.png	1970-01-01 00:00:00 +0000 and resources/images/news/reuters_ja.png	2010-11-23 07:00:03 +0000 differ
=== added file 'resources/recipes/cnetjapan.recipe'
--- resources/recipes/cnetjapan.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/cnetjapan.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,30 @@
+import re
+
+class CNetJapan(BasicNewsRecipe):
+    title          = u'CNET Japan'
+    oldest_article = 3
+    max_articles_per_feed = 30
+
+    feeds          = [(u'cnet rss', u'http://feeds.japan.cnet.com/cnet/rss')]
+    language       = 'ja'
+    encoding       = 'Shift_JIS'
+    remove_javascript = True
+
+    preprocess_regexps = [
+       (re.compile(ur'<!--\u25B2contents_left END\u25B2-->.*</body>', re.DOTALL|re.IGNORECASE|re.UNICODE),
+        lambda match: '</body>'),
+       (re.compile(r'<!--AD_ELU_HEADER-->.*</body>', re.DOTALL|re.IGNORECASE),
+        lambda match: '</body>'),
+       (re.compile(ur'<!-- \u25B2\u95A2\u9023\u30BF\u30B0\u25B2 -->.*<!-- \u25B2ZDNet\u25B2 -->', re.UNICODE),
+        lambda match: '<!-- removed -->'),
+        ]
+
+    remove_tags_before = dict(name="h2")
+    remove_tags = [
+                   {'class':"social_bkm_share"},
+                   {'class':"social_bkm_print"},
+                   {'class':"block20 clearfix"},
+                   dict(name="div",attrs={'id':'bookreview'}),
+                    ]
+    remove_tags_after = {'class':"block20"}
+
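
For illustration only, and not part of the proposed branch: the preprocess_regexps list in the CNET Japan recipe pairs a compiled pattern with a replacement function, and each pair is meant to be applied to the raw HTML before parsing. The sketch below is plain Python 3 and does not use calibre; the helper apply_preprocess_regexps and the sample page are assumptions made up for the example, standing in for whatever calibre does internally.

```python
import re

# Two of the (pattern, replacement) pairs from the recipe, written for Python 3.
# Everything from the marker comment up to the last </body> is dropped.
preprocess_regexps = [
    (re.compile('<!--\u25B2contents_left END\u25B2-->.*</body>',
                re.DOTALL | re.IGNORECASE),
     lambda match: '</body>'),
    (re.compile(r'<!--AD_ELU_HEADER-->.*</body>', re.DOTALL | re.IGNORECASE),
     lambda match: '</body>'),
]


def apply_preprocess_regexps(html, rules):
    """Hypothetical helper: run every (pattern, replacement) pair over html in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html


# Made-up sample page with an ad block after the article body.
sample = ('<html><body><h2>Title</h2><p>Story</p>'
          '<!--AD_ELU_HEADER--><div>ad</div>\n<div>more</div></body></html>')
cleaned = apply_preprocess_regexps(sample, preprocess_regexps)
```

Because the patterns use re.DOTALL with a greedy `.*`, the match runs across line breaks to the final `</body>`, so only the closing tag survives from the trailing part of the page.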

=== added file 'resources/recipes/endgadget_ja.recipe'
--- resources/recipes/endgadget_ja.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/endgadget_ja.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,20 @@
+#!/usr/bin/env  python
+
+__license__   = 'GPL v3'
+__copyright__ = '2010, Hiroshi Miura <miurahr@xxxxxxxxx>'
+'''
+japan.engadget.com
+'''
+
+from calibre.web.feeds.news import BasicNewsRecipe
+
+class EndgadgetJapan(BasicNewsRecipe):
+    title          = u'Endgadget\u65e5\u672c\u7248'
+    cover_url      = 'http://skins18.wincustomize.com/1/49/149320/29/7578/preview-29-7578.jpg'
+    masthead_url   = 'http://www.blogsmithmedia.com/japanese.engadget.com/media/eng-jp-logo-t.png'
+    oldest_article = 7
+    max_articles_per_feed = 100
+    no_stylesheets = True
+    language = 'ja'
+    encoding = 'utf-8'
+    feeds          = [(u'engadget', u'http://japanese.engadget.com/rss.xml')]

=== added file 'resources/recipes/jijinews.recipe'
--- resources/recipes/jijinews.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/jijinews.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,24 @@
+#!/usr/bin/env  python
+
+__license__   = 'GPL v3'
+__copyright__ = '2010, Hiroshi Miura <miurahr@xxxxxxxxx>'
+'''
+www.jiji.com
+'''
+
+class JijiDotCom(BasicNewsRecipe):
+    title          = u'\u6642\u4e8b\u901a\u4fe1'
+    __author__     = 'Hiroshi Miura'
+    description    = 'World News from Jiji Press'
+    publisher      = 'Jiji Press Ltd.'
+    category       = 'news'
+    encoding       = 'utf-8'
+    oldest_article = 6
+    max_articles_per_feed = 100
+    language       = 'ja'
+    cover_url       = 'http://www.jiji.com/img/top_header_logo2.gif'
+    masthead_url    = 'http://jen.jiji.com/images/logo_jijipress.gif'
+
+    feeds          = [(u'\u30cb\u30e5\u30fc\u30b9', u'http://www.jiji.com/rss/ranking.rdf')]
+    remove_tags_after = dict(id="ad_google")
+

=== added file 'resources/recipes/mainichi.recipe'
--- resources/recipes/mainichi.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/mainichi.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,24 @@
+#!/usr/bin/env  python
+
+__license__   = 'GPL v3'
+__copyright__ = '2010, Hiroshi Miura <miurahr@xxxxxxxxx>'
+'''
+www.mainichi.jp
+'''
+
+class MainichiDailyNews(BasicNewsRecipe):
+    title          = u'\u6bce\u65e5\u65b0\u805e'
+    __author__     = 'Hiroshi Miura'
+    oldest_article = 2
+    max_articles_per_feed = 20
+    description    = 'Japanese traditional newspaper Mainichi Daily News'
+    publisher      = 'Mainichi Daily News'
+    category       = 'news, japan'
+    language       = 'ja'
+
+    feeds          = [(u'daily news', u'http://mainichi.jp/rss/etc/flash.rss')]
+
+    remove_tags_before = {'class':"NewsTitle"}
+    remove_tags = [{'class':"RelatedArticle"}]
+    remove_tags_after = {'class':"Credit"}
+

=== added file 'resources/recipes/mainichi_it_news.recipe'
--- resources/recipes/mainichi_it_news.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/mainichi_it_news.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,16 @@
+class MainichiDailyITNews(BasicNewsRecipe):
+    title          = u'\u6bce\u65e5\u65b0\u805e(IT&\u5bb6\u96fb)'
+    __author__     = 'Hiroshi Miura'
+    oldest_article = 2
+    max_articles_per_feed = 100
+    description    = 'Japanese traditional newspaper Mainichi Daily News - IT and electronics'
+    publisher      = 'Mainichi Daily News'
+    category       = 'news, Japan, IT, Electronics'
+    language       = 'ja'
+
+    feeds          = [(u'IT News', u'http://mainichi.pheedo.jp/f/mainichijp_electronics')]
+
+    remove_tags_before = {'class':"NewsTitle"}
+    remove_tags = [{'class':"RelatedArticle"}]
+    remove_tags_after = {'class':"Credit"}
+

=== added file 'resources/recipes/msnsankei.recipe'
--- resources/recipes/msnsankei.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/msnsankei.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,22 @@
+#!/usr/bin/env  python
+
+__license__   = 'GPL v3'
+__copyright__ = '2010, Hiroshi Miura <miurahr@xxxxxxxxx>'
+'''
+sankei.jp.msn.com
+'''
+
+class MSNSankeiNewsProduct(BasicNewsRecipe):
+    title          = u'MSN\u7523\u7d4c\u30cb\u30e5\u30fc\u30b9(\u65b0\u5546\u54c1)'
+    __author__      = 'Hiroshi Miura'
+    description     = 'Products release from Japan'
+    oldest_article = 7
+    max_articles_per_feed = 100
+    encoding       = 'Shift_JIS'
+    language       = 'ja'
+
+    feeds          = [(u'\u65b0\u5546\u54c1', u'http://sankei.jp.msn.com/rss/news/release.xml')]
+
+    remove_tags_before = dict(id="__r_article_title__")
+    remove_tags_after  = dict(id="ajax_release_news")
+    remove_tags = [{'class':"parent chromeCustom6G"}]

=== added file 'resources/recipes/nikkei_free.recipe'
--- resources/recipes/nikkei_free.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/nikkei_free.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,58 @@
+#!/usr/bin/env  python
+
+__license__   = 'GPL v3'
+__copyright__ = '2010, Hiroshi Miura <miurahr@xxxxxxxxx>'
+'''
+www.nikkei.com
+'''
+
+class NikkeiNet(BasicNewsRecipe):
+    title          = u'\u65e5\u7d4c\u65b0\u805e\u96fb\u5b50\u7248(Free)'
+    __author__     = 'Hiroshi Miura'
+    description    = 'News and current market affairs from Japan'
+    cover_url       = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    masthead_url    = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    oldest_article = 2
+    max_articles_per_feed = 20
+    language       = 'ja'
+ 
+    feeds          =  [ (u'\u65e5\u7d4c\u4f01\u696d', 	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=sangyo'),
+		 (u'\u65e5\u7d4c\u88fd\u54c1',	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=newpro'),
+		 (u'internet',		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=internet'),
+		 (u'\u653f\u6cbb', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=seiji'),
+		 (u'\u8ca1\u52d9', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=zaimu'), 
+		 (u'\u7d4c\u6e08', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=keizai'),
+		 (u'\u56fd\u969b', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=kaigai'),
+		 (u'\u79d1\u5b66', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=kagaku'),
+		 (u'\u30de\u30fc\u30b1\u30c3\u30c8', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=market'),
+		 (u'\u304f\u3089\u3057', 	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=kurashi'), 
+		 (u'\u30b9\u30dd\u30fc\u30c4', 	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=sports'),
+		 (u'\u793e\u4f1a', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=shakai'), 
+		 (u'\u30a8\u30b3', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=eco'),
+		 (u'\u5065\u5eb7', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=kenkou'), 
+		 (u'\u96c7\u7528', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=koyou'), 
+		 (u'\u6559\u80b2', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=kyouiku'),
+		 (u'\u304a\u304f\u3084\u307f', 	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=okuyami'),
+		 (u'\u4eba\u4e8b', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=zinzi'),
+		 (u'\u7279\u96c6', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=special'),
+		 (u'\u5730\u57df\u30cb\u30e5\u30fc\u30b9', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=local'),
+		 (u'\u7d71\u8a08\u30fb\u767d\u66f8', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=report'),
+		 (u'\u30e9\u30f3\u30ad\u30f3\u30b0', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=ranking'), 
+		 (u'\u4f1a\u898b', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=interview'),
+		 (u'\u793e\u8aac\u30fb\u6625\u79cb', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=shasetsu'), 
+		 (u'\u30b9\u30dd\u30fc\u30c4\uff1a\u30d7\u30ed\u91ce\u7403', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=baseball'),
+		 (u'\u30b9\u30dd\u30fc\u30c4\uff1a\u5927\u30ea\u30fc\u30b0', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=mlb'),
+		 (u'\u30b9\u30dd\u30fc\u30c4\uff1a\u30b5\u30c3\u30ab\u30fc', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=soccer'), 
+		 (u'\u30b9\u30dd\u30fc\u30c4\uff1a\u30b4\u30eb\u30d5', 	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=golf'),
+		 (u'\u30b9\u30dd\u30fc\u30c4\uff1a\u76f8\u64b2', 	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=sumou'),
+		 (u'\u30b9\u30dd\u30fc\u30c4\uff1a\u7af6\u99ac', 	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=keiba'),
+ 		 (u'\u8abf\u67fb\u30fb\u30a2\u30f3\u30b1\u30fc\u30c8', 	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=research')
+		]
+
+    remove_tags_before = dict(id="CONTENTS")
+    remove_tags = [
+                   dict(name="form"),
+                   {'class':"cmn-hide"},
+                  ]
+    remove_tags_after = {'class':"cmn-pr_list"}
+
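
For illustration only, and not part of the proposed branch: the feed URLs in the Nikkei recipe differ only in the value of the `head` query parameter, so the list could plausibly be generated from (title, head) pairs. This is a Python 3 sketch; the names NIKKEI_RSS_TEMPLATE, SECTIONS and build_feeds are hypothetical, and only a few of the sections are shown.

```python
# Assumption: every feed goes through the same nikkei2rss.php proxy URL.
NIKKEI_RSS_TEMPLATE = 'http://www.zou3.net/php/rss/nikkei2rss.php?head=%s'

# (feed title, value of the head parameter); a subset taken from the recipe.
SECTIONS = [
    ('\u653f\u6cbb', 'seiji'),
    ('\u8ca1\u52d9', 'zaimu'),
    ('\u7d4c\u6e08', 'keizai'),
    ('internet', 'internet'),
]


def build_feeds(sections, template=NIKKEI_RSS_TEMPLATE):
    """Return a list of (title, url) tuples in the shape the feeds attribute uses."""
    return [(title, template % head) for title, head in sections]


feeds = build_feeds(SECTIONS)
```

Keeping the sections in one table would also make it easier to split them across the per-category recipes without retyping the URLs.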

=== added file 'resources/recipes/nikkei_sub_economy.recipe'
--- resources/recipes/nikkei_sub_economy.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/nikkei_sub_economy.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,111 @@
+#!/usr/bin/env  python
+
+__license__   = 'GPL v3'
+__copyright__ = '2010, Hiroshi Miura <miurahr@xxxxxxxxx>'
+'''
+www.nikkei.com
+'''
+
+import string, re, sys
+from calibre import strftime
+from calibre.web.feeds.recipes import BasicNewsRecipe
+import mechanize
+from calibre.ptempfile import PersistentTemporaryFile
+
+
+class NikkeiNet_sub_economy(BasicNewsRecipe):
+    title           = u'\u65e5\u7d4c\u65b0\u805e\u96fb\u5b50\u7248(\u7d4c\u6e08)'
+    __author__      = 'Hiroshi Miura'
+    description     = 'News and current market affairs from Japan'
+    cover_url       = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    masthead_url    = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    needs_subscription = True
+    oldest_article  = 2
+    max_articles_per_feed = 20
+    language        = 'ja'
+    remove_javascript = False
+    temp_files = []
+
+    remove_tags_before = {'class':"cmn-section cmn-indent"}
+    remove_tags = [
+                       {'class':"JSID_basePageMove JSID_baseAsyncSubmit cmn-form_area JSID_optForm_utoken"},
+                       {'class':"cmn-article_keyword cmn-clearfix"},
+                       {'class':"cmn-print_headline cmn-clearfix"},
+                         ]
+    remove_tags_after = {'class':"cmn-pr_list"}
+
+    feeds = [  (u'\u653f\u6cbb', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=seiji'),
+		 (u'\u8ca1\u52d9', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=zaimu'), 
+		 (u'\u7d4c\u6e08', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=keizai'),
+		 (u'\u30de\u30fc\u30b1\u30c3\u30c8', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=market'),
+		 (u'\u96c7\u7528', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=koyou'), 
+		 (u'\u6559\u80b2', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=kyouiku'),
+		 (u'\u304a\u304f\u3084\u307f', 	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=okuyami'),
+		 (u'\u4eba\u4e8b', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=zinzi'),
+        ]
+    
+    def get_browser(self):
+        br = BasicNewsRecipe.get_browser()
+
+        cj = mechanize.LWPCookieJar()
+        br.set_cookiejar(cj)
+
+        #br.set_debug_http(True)
+        #br.set_debug_redirects(True)
+        #br.set_debug_responses(True)
+
+        if self.username is not None and self.password is not None:
+            #print "----------------------------get login form--------------------------------------------"
+            # open login form
+            br.open('https://id.nikkei.com/lounge/nl/base/LA0010.seam')
+            response = br.response()
+            #print "----------------------------get login form---------------------------------------------"
+            #print "----------------------------set login form---------------------------------------------"
+            # remove disabled input which brings error on mechanize
+            response.set_data(response.get_data().replace("<input id=\"j_id48\"", "<!-- "))
+            response.set_data(response.get_data().replace("gm_home_on.gif\" />", " -->"))
+            br.set_response(response)
+            br.select_form(name='LA0010Form01')
+            br['LA0010Form01:LA0010Email']   = self.username
+            br['LA0010Form01:LA0010Password'] = self.password
+            br.form.find_control(id='LA0010Form01:LA0010AutoLoginOn',type="checkbox").get(nr=0).selected = True
+            br.submit()
+            response1 = br.response()
+            #print "----------------------------send login form---------------------------------------------"
+            #print "----------------------------open news main page-----------------------------------------"
+            # open news site
+            br.open('http://www.nikkei.com/')
+            response2 = br.response()
+            #print "----------------------------www.nikkei.com BODY   --------------------------------------"
+            #print response2.get_data()
+            #print "-------------------------^^-got auto redirect form----^^--------------------------------"
+            # forced redirect in default
+            br.select_form(nr=0)
+            br.submit()
+            response3 = br.response()
+            # return some cookie which should be set by Javascript
+            #print response3.geturl()
+            raw = response3.get_data()
+            #print "---------------------------response to form --------------------------------------------"
+            # grab cookie from JS and set it
+            redirectflag = re.search(r"var checkValue = '(\d+)';", raw, re.M).group(1)
+            br.select_form(nr=0)
+
+            self.temp_files.append(PersistentTemporaryFile('_fa.html'))
+            self.temp_files[-1].write("#LWP-Cookies-2.0\n")
+
+            self.temp_files[-1].write("Set-Cookie3: Cookie-dummy=Cookie-value; domain=\".nikkei.com\"; path=\"/\"; path_spec; secure; expires=\"2029-12-21 05:07:59Z\"; version=0\n")
+            self.temp_files[-1].write("Set-Cookie3: redirectFlag="+redirectflag+"; domain=\".nikkei.com\"; path=\"/\"; path_spec; secure; expires=\"2029-12-21 05:07:59Z\"; version=0\n")
+            self.temp_files[-1].close()
+            cj.load(self.temp_files[-1].name)
+
+            br.submit()
+
+            #br.set_debug_http(False)
+            #br.set_debug_redirects(False)
+            #br.set_debug_responses(False)
+        return br
+
+
+
+
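
For illustration only, and not part of the proposed branch: the last step of get_browser reads a value that the site would normally set through JavaScript, writes it to a temporary file in the LWP cookie format, and loads that file into the cookie jar. The sketch below reproduces that step with the Python 3 standard library (http.cookiejar instead of mechanize). The function names and the sample page are assumptions for the example, and ignore_expires/ignore_discard are passed only so the sketch does not depend on the current date.

```python
import os
import re
import tempfile
from http.cookiejar import LWPCookieJar


def cookie_line(name, value, domain=".nikkei.com"):
    """Format one cookie in the Set-Cookie3 syntax used by LWP cookie files."""
    return ('Set-Cookie3: %s=%s; domain="%s"; path="/"; path_spec; secure; '
            'expires="2029-12-21 05:07:59Z"; version=0\n' % (name, value, domain))


def load_redirect_flag(jar, page_source):
    """Pull checkValue out of page_source and load it into jar as redirectFlag."""
    match = re.search(r"var checkValue = '(\d+)';", page_source, re.M)
    if match is None:
        raise ValueError("checkValue not found in page source")
    fd, path = tempfile.mkstemp(suffix="_fa.html")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("#LWP-Cookies-2.0\n")  # magic header the loader checks
            handle.write(cookie_line("redirectFlag", match.group(1)))
        jar.load(path, ignore_discard=True, ignore_expires=True)
    finally:
        os.remove(path)  # the file is only needed while loading
    return match.group(1)


# Made-up page fragment standing in for the response to the redirect form.
sample_page = "<script>var checkValue = '12345';</script>"
jar = LWPCookieJar()
flag = load_redirect_flag(jar, sample_page)
cookies = {cookie.name: cookie.value for cookie in jar}
```

Unlike the recipe, the sketch raises an explicit error when the checkValue pattern is missing, and it removes the temporary file as soon as the jar has been loaded.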

=== added file 'resources/recipes/nikkei_sub_industry.recipe'
--- resources/recipes/nikkei_sub_industry.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/nikkei_sub_industry.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,109 @@
+#!/usr/bin/env  python
+
+__license__   = 'GPL v3'
+__copyright__ = '2010, Hiroshi Miura <miurahr@xxxxxxxxx>'
+'''
+www.nikkei.com
+'''
+
+import string, re, sys
+from calibre import strftime
+from calibre.web.feeds.recipes import BasicNewsRecipe
+import mechanize
+from calibre.ptempfile import PersistentTemporaryFile
+
+
+class NikkeiNet_sub_industory(BasicNewsRecipe):
+    title           = u'\u65e5\u7d4c\u65b0\u805e\u96fb\u5b50\u7248(\u7523\u696d)'
+    __author__      = 'Hiroshi Miura'
+    description     = 'News and current market affairs from Japan'
+    cover_url       = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    masthead_url    = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    needs_subscription = True
+    oldest_article  = 2
+    max_articles_per_feed = 20
+    language        = 'ja'
+    remove_javascript = False
+    temp_files = []
+
+    remove_tags_before = {'class':"cmn-section cmn-indent"}
+    remove_tags = [
+                       {'class':"JSID_basePageMove JSID_baseAsyncSubmit cmn-form_area JSID_optForm_utoken"},
+                       {'class':"cmn-article_keyword cmn-clearfix"},
+                       {'class':"cmn-print_headline cmn-clearfix"},
+                         ]
+    remove_tags_after = {'class':"cmn-pr_list"}
+
+    feeds = [ (u'\u65e5\u7d4c\u4f01\u696d', 	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=sangyo'),
+		 (u'\u65e5\u7d4c\u88fd\u54c1',	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=newpro'),
+		 (u'internet',		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=internet'),
+		 (u'\u56fd\u969b', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=kaigai'),
+		 (u'\u79d1\u5b66', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=kagaku'),
+
+        ]
+    
+    def get_browser(self):
+        br = BasicNewsRecipe.get_browser()
+
+        cj = mechanize.LWPCookieJar()
+        br.set_cookiejar(cj)
+
+        #br.set_debug_http(True)
+        #br.set_debug_redirects(True)
+        #br.set_debug_responses(True)
+
+        if self.username is not None and self.password is not None:
+            #print "----------------------------get login form--------------------------------------------"
+            # open login form
+            br.open('https://id.nikkei.com/lounge/nl/base/LA0010.seam')
+            response = br.response()
+            #print "----------------------------get login form---------------------------------------------"
+            #print "----------------------------set login form---------------------------------------------"
+            # remove disabled input which brings error on mechanize
+            response.set_data(response.get_data().replace("<input id=\"j_id48\"", "<!-- "))
+            response.set_data(response.get_data().replace("gm_home_on.gif\" />", " -->"))
+            br.set_response(response)
+            br.select_form(name='LA0010Form01')
+            br['LA0010Form01:LA0010Email']   = self.username
+            br['LA0010Form01:LA0010Password'] = self.password
+            br.form.find_control(id='LA0010Form01:LA0010AutoLoginOn',type="checkbox").get(nr=0).selected = True
+            br.submit()
+            response1 = br.response()
+            #print "----------------------------send login form---------------------------------------------"
+            #print "----------------------------open news main page-----------------------------------------"
+            # open news site
+            br.open('http://www.nikkei.com/')
+            response2 = br.response()
+            #print "----------------------------www.nikkei.com BODY   --------------------------------------"
+            #print response2.get_data()
+            #print "-------------------------^^-got auto redirect form----^^--------------------------------"
+            # forced redirect in default
+            br.select_form(nr=0)
+            br.submit()
+            response3 = br.response()
+            # return some cookie which should be set by Javascript
+            #print response3.geturl()
+            raw = response3.get_data()
+            #print "---------------------------response to form --------------------------------------------"
+            # grab cookie from JS and set it
+            redirectflag = re.search(r"var checkValue = '(\d+)';", raw, re.M).group(1)
+            br.select_form(nr=0)
+
+            self.temp_files.append(PersistentTemporaryFile('_fa.html'))
+            self.temp_files[-1].write("#LWP-Cookies-2.0\n")
+
+            self.temp_files[-1].write("Set-Cookie3: Cookie-dummy=Cookie-value; domain=\".nikkei.com\"; path=\"/\"; path_spec; secure; expires=\"2029-12-21 05:07:59Z\"; version=0\n")
+            self.temp_files[-1].write("Set-Cookie3: redirectFlag="+redirectflag+"; domain=\".nikkei.com\"; path=\"/\"; path_spec; secure; expires=\"2029-12-21 05:07:59Z\"; version=0\n")
+            self.temp_files[-1].close()
+            cj.load(self.temp_files[-1].name)
+
+            br.submit()
+
+            #br.set_debug_http(False)
+            #br.set_debug_redirects(False)
+            #br.set_debug_responses(False)
+        return br
+
+
+
+

=== added file 'resources/recipes/nikkei_sub_life.recipe'
--- resources/recipes/nikkei_sub_life.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/nikkei_sub_life.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,110 @@
+#!/usr/bin/env  python
+
+__license__   = 'GPL v3'
+__copyright__ = '2010, Hiroshi Miura <miurahr@xxxxxxxxx>'
+'''
+www.nikkei.com
+'''
+
+import string, re, sys
+from calibre import strftime
+from calibre.web.feeds.recipes import BasicNewsRecipe
+import mechanize
+from calibre.ptempfile import PersistentTemporaryFile
+
+
+class NikkeiNet_sub_life(BasicNewsRecipe):
+    title           = u'\u65e5\u7d4c\u65b0\u805e\u96fb\u5b50\u7248(\u751f\u6d3b)'
+    __author__      = 'Hiroshi Miura'
+    description     = 'News and current market affairs from Japan'
+    cover_url       = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    masthead_url    = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    needs_subscription = True
+    oldest_article  = 2
+    max_articles_per_feed = 20
+    language        = 'ja'
+    remove_javascript = False
+    temp_files = []
+
+    remove_tags_before = {'class':"cmn-section cmn-indent"}
+    remove_tags = [
+                       {'class':"JSID_basePageMove JSID_baseAsyncSubmit cmn-form_area JSID_optForm_utoken"},
+                       {'class':"cmn-article_keyword cmn-clearfix"},
+                       {'class':"cmn-print_headline cmn-clearfix"},
+                         ]
+    remove_tags_after = {'class':"cmn-pr_list"}
+
+    feeds = [  (u'\u304f\u3089\u3057', 	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=kurashi'), 
+		 (u'\u30b9\u30dd\u30fc\u30c4', 	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=sports'),
+		 (u'\u793e\u4f1a', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=shakai'), 
+		 (u'\u30a8\u30b3', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=eco'),
+		 (u'\u5065\u5eb7', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=kenkou'), 
+		 (u'\u7279\u96c6', 		u'http://www.zou3.net/php/rss/nikkei2rss.php?head=special'),
+		 (u'\u30e9\u30f3\u30ad\u30f3\u30b0', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=ranking')
+        ]
+    
+    def get_browser(self):
+        br = BasicNewsRecipe.get_browser()
+
+        cj = mechanize.LWPCookieJar()
+        br.set_cookiejar(cj)
+
+        #br.set_debug_http(True)
+        #br.set_debug_redirects(True)
+        #br.set_debug_responses(True)
+
+        if self.username is not None and self.password is not None:
+            #print "----------------------------get login form--------------------------------------------"
+            # open login form
+            br.open('https://id.nikkei.com/lounge/nl/base/LA0010.seam')
+            response = br.response()
+            #print "----------------------------get login form---------------------------------------------"
+            #print "----------------------------set login form---------------------------------------------"
+            # remove disabled input which brings error on mechanize
+            response.set_data(response.get_data().replace("<input id=\"j_id48\"", "<!-- "))
+            response.set_data(response.get_data().replace("gm_home_on.gif\" />", " -->"))
+            br.set_response(response)
+            br.select_form(name='LA0010Form01')
+            br['LA0010Form01:LA0010Email']   = self.username
+            br['LA0010Form01:LA0010Password'] = self.password
+            br.form.find_control(id='LA0010Form01:LA0010AutoLoginOn',type="checkbox").get(nr=0).selected = True
+            br.submit()
+            response1 = br.response()
+            #print "----------------------------send login form---------------------------------------------"
+            #print "----------------------------open news main page-----------------------------------------"
+            # open news site
+            br.open('http://www.nikkei.com/')
+            response2 = br.response()
+            #print "----------------------------www.nikkei.com BODY   --------------------------------------"
+            #print response2.get_data()
+            #print "-------------------------^^-got auto redirect form----^^--------------------------------"
+            # forced redirect in default
+            br.select_form(nr=0)
+            br.submit()
+            response3 = br.response()
+            # return some cookie which should be set by Javascript
+            #print response3.geturl()
+            raw = response3.get_data()
+            #print "---------------------------response to form --------------------------------------------"
+            # grab cookie from JS and set it
+            redirectflag = re.search(r"var checkValue = '(\d+)';", raw, re.M).group(1)
+            br.select_form(nr=0)
+
+            self.temp_files.append(PersistentTemporaryFile('_fa.html'))
+            self.temp_files[-1].write("#LWP-Cookies-2.0\n")
+
+            self.temp_files[-1].write("Set-Cookie3: Cookie-dummy=Cookie-value; domain=\".nikkei.com\"; path=\"/\"; path_spec; secure; expires=\"2029-12-21 05:07:59Z\"; version=0\n")
+            self.temp_files[-1].write("Set-Cookie3: redirectFlag="+redirectflag+"; domain=\".nikkei.com\"; path=\"/\"; path_spec; secure; expires=\"2029-12-21 05:07:59Z\"; version=0\n")
+            self.temp_files[-1].close()
+            cj.load(self.temp_files[-1].name)
+
+            br.submit()
+
+            #br.set_debug_http(False)
+            #br.set_debug_redirects(False)
+            #br.set_debug_responses(False)
+        return br
+
+
+
+

=== added file 'resources/recipes/nikkei_sub_main.recipe'
--- resources/recipes/nikkei_sub_main.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/nikkei_sub_main.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,103 @@
+#!/usr/bin/env  python
+
+__license__   = 'GPL v3'
+__copyright__ = '2010, Hiroshi Miura <miurahr@xxxxxxxxx>'
+'''
+www.nikkei.com
+'''
+
+import string, re, sys
+from calibre import strftime
+from calibre.web.feeds.recipes import BasicNewsRecipe
+import mechanize
+from calibre.ptempfile import PersistentTemporaryFile
+
+
+class NikkeiNet_sub_main(BasicNewsRecipe):
+    title           = u'\u65e5\u7d4c\u65b0\u805e\u96fb\u5b50\u7248(\u7dcf\u5408)'
+    __author__      = 'Hiroshi Miura'
+    description     = 'News and current market affairs from Japan'
+    cover_url       = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    masthead_url    = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    needs_subscription = True
+    oldest_article  = 2
+    max_articles_per_feed = 20
+    language        = 'ja'
+    remove_javascript = False
+    temp_files = []
+
+    remove_tags_before = {'class':"cmn-section cmn-indent"}
+    remove_tags = [
+                       {'class':"JSID_basePageMove JSID_baseAsyncSubmit cmn-form_area JSID_optForm_utoken"},
+                       {'class':"cmn-article_keyword cmn-clearfix"},
+                       {'class':"cmn-print_headline cmn-clearfix"},
+                         ]
+    remove_tags_after = {'class':"cmn-pr_list"}
+
+    feeds = [ (u'NIKKEI', 	u'http://www.zou3.net/php/rss/nikkei2rss.php?head=main')]
+    
+    def get_browser(self):
+        br = BasicNewsRecipe.get_browser()
+
+        cj = mechanize.LWPCookieJar()
+        br.set_cookiejar(cj)
+
+        #br.set_debug_http(True)
+        #br.set_debug_redirects(True)
+        #br.set_debug_responses(True)
+
+        if self.username is not None and self.password is not None:
+            #print "----------------------------get login form--------------------------------------------"
+            # open login form
+            br.open('https://id.nikkei.com/lounge/nl/base/LA0010.seam')
+            response = br.response()
+            #print "----------------------------get login form---------------------------------------------"
+            #print "----------------------------set login form---------------------------------------------"
+            # remove disabled input which brings error on mechanize
+            response.set_data(response.get_data().replace("<input id=\"j_id48\"", "<!-- "))
+            response.set_data(response.get_data().replace("gm_home_on.gif\" />", " -->"))
+            br.set_response(response)
+            br.select_form(name='LA0010Form01')
+            br['LA0010Form01:LA0010Email']   = self.username
+            br['LA0010Form01:LA0010Password'] = self.password
+            br.form.find_control(id='LA0010Form01:LA0010AutoLoginOn',type="checkbox").get(nr=0).selected = True
+            br.submit()
+            response1 = br.response()
+            #print "----------------------------send login form---------------------------------------------"
+            #print "----------------------------open news main page-----------------------------------------"
+            # open news site
+            br.open('http://www.nikkei.com/')
+            response2 = br.response()
+            #print "----------------------------www.nikkei.com BODY   --------------------------------------"
+            #print response2.get_data()
+            #print "-------------------------^^-got auto redirect form----^^--------------------------------"
+            # forced redirect in default
+            br.select_form(nr=0)
+            br.submit()
+            response3 = br.response()
+            # return some cookie which should be set by Javascript
+            #print response3.geturl()
+            raw = response3.get_data()
+            #print "---------------------------response to form --------------------------------------------"
+            # grab cookie from JS and set it
+            redirectflag = re.search(r"var checkValue = '(\d+)';", raw, re.M).group(1)
+            br.select_form(nr=0)
+
+            self.temp_files.append(PersistentTemporaryFile('_fa.html'))
+            self.temp_files[-1].write("#LWP-Cookies-2.0\n")
+
+            self.temp_files[-1].write("Set-Cookie3: Cookie-dummy=Cookie-value; domain=\".nikkei.com\"; path=\"/\"; path_spec; secure; expires=\"2029-12-21 05:07:59Z\"; version=0\n")
+            self.temp_files[-1].write("Set-Cookie3: redirectFlag="+redirectflag+"; domain=\".nikkei.com\"; path=\"/\"; path_spec; secure; expires=\"2029-12-21 05:07:59Z\"; version=0\n")
+            self.temp_files[-1].close()
+            cj.load(self.temp_files[-1].name)
+
+            br.submit()
+
+            #br.set_debug_http(False)
+            #br.set_debug_redirects(False)
+            #br.set_debug_responses(False)
+        return br
+
+
+
+
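Note on the cookie step in get_browser() above: the recipe reads a numeric checkValue out of the returned JavaScript and feeds it back as a redirectFlag cookie through an LWP-format cookie file. The following is a minimal standalone sketch of that idea, not the recipe's own code. It assumes Python 3 and swaps in the standard library's http.cookiejar.LWPCookieJar and tempfile for mechanize.LWPCookieJar and PersistentTemporaryFile; the helper names are hypothetical.

```python
import re
import tempfile
from http.cookiejar import LWPCookieJar

# Same pattern the recipe uses to find the value the page's JavaScript would set.
CHECK_VALUE_RE = re.compile(r"var checkValue = '(\d+)';")


def extract_check_value(html):
    """Return the digits assigned to checkValue, or None if the page lacks them."""
    match = CHECK_VALUE_RE.search(html)
    return match.group(1) if match else None


def build_cookie_file(redirect_flag):
    """Write a one-cookie LWP cookie file and return its path (hypothetical helper)."""
    lines = [
        "#LWP-Cookies-2.0",
        'Set-Cookie3: redirectFlag=%s; path="/"; domain=".nikkei.com"; '
        'path_spec; secure; expires="2029-12-21 05:07:59Z"; version=0' % redirect_flag,
    ]
    with tempfile.NamedTemporaryFile("w", suffix="_fa.txt", delete=False) as handle:
        handle.write("\n".join(lines) + "\n")
    return handle.name


def load_cookie_jar(path):
    """Load the file into a jar; expiry checks are skipped so the sketch is date-independent."""
    jar = LWPCookieJar()
    jar.load(path, ignore_discard=True, ignore_expires=True)
    return jar


if __name__ == "__main__":
    flag = extract_check_value("<script>var checkValue = '12345';</script>")
    jar = load_cookie_jar(build_cookie_file(flag))
    print([(cookie.name, cookie.value) for cookie in jar])
```

Returning None from extract_check_value when the pattern is missing lets a caller skip the cookie step instead of failing on .group(1), which is what would happen in the recipe if the site changed its markup.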

=== added file 'resources/recipes/nikkei_sub_sports.recipe'
--- resources/recipes/nikkei_sub_sports.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/nikkei_sub_sports.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,110 @@
+#!/usr/bin/env  python
+
+__license__   = 'GPL v3'
+__copyright__ = '2010, Hiroshi Miura <miurahr@xxxxxxxxx>'
+'''
+www.nikkei.com
+'''
+
+import re
+from calibre import strftime
+from calibre.web.feeds.recipes import BasicNewsRecipe
+import mechanize
+from calibre.ptempfile import PersistentTemporaryFile
+
+
+class NikkeiNet_sub_sports(BasicNewsRecipe):
+    title           = u'\u65e5\u7d4c\u65b0\u805e\u96fb\u5b50\u7248(\u30b9\u30dd\u30fc\u30c4)'
+    __author__      = 'Hiroshi Miura'
+    description     = 'News and current market affairs from Japan'
+    cover_url       = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    masthead_url    = 'http://parts.nikkei.com/parts/ds/images/common/logo_r1.svg'
+    needs_subscription = True
+    oldest_article  = 2
+    max_articles_per_feed = 20
+    language        = 'ja'
+    remove_javascript = False
+    temp_files = []
+
+    remove_tags_before = {'class':"cmn-section cmn-indent"}
+    remove_tags = [
+                       {'class':"JSID_basePageMove JSID_baseAsyncSubmit cmn-form_area JSID_optForm_utoken"},
+                       {'class':"cmn-article_keyword cmn-clearfix"},
+                       {'class':"cmn-print_headline cmn-clearfix"},
+                         ]
+    remove_tags_after = {'class':"cmn-pr_list"}
+
+    feeds = [
+        (u'\u30b9\u30dd\u30fc\u30c4\uff1a\u30d7\u30ed\u91ce\u7403', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=baseball'),
+        (u'\u30b9\u30dd\u30fc\u30c4\uff1a\u5927\u30ea\u30fc\u30b0', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=mlb'),
+        (u'\u30b9\u30dd\u30fc\u30c4\uff1a\u30b5\u30c3\u30ab\u30fc', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=soccer'),
+        (u'\u30b9\u30dd\u30fc\u30c4\uff1a\u30b4\u30eb\u30d5', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=golf'),
+        (u'\u30b9\u30dd\u30fc\u30c4\uff1a\u76f8\u64b2', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=sumou'),
+        (u'\u30b9\u30dd\u30fc\u30c4\uff1a\u7af6\u99ac', u'http://www.zou3.net/php/rss/nikkei2rss.php?head=keiba')
+        ]
+
+    def get_browser(self):
+        br = BasicNewsRecipe.get_browser()
+
+        cj = mechanize.LWPCookieJar()
+        br.set_cookiejar(cj)
+
+        #br.set_debug_http(True)
+        #br.set_debug_redirects(True)
+        #br.set_debug_responses(True)
+
+        if self.username is not None and self.password is not None:
+            #print "----------------------------get login form--------------------------------------------"
+            # open login form
+            br.open('https://id.nikkei.com/lounge/nl/base/LA0010.seam')
+            response = br.response()
+            #print "----------------------------get login form---------------------------------------------"
+            #print "----------------------------set login form---------------------------------------------"
+            # remove disabled input which brings error on mechanize
+            response.set_data(response.get_data().replace("<input id=\"j_id48\"", "<!-- "))
+            response.set_data(response.get_data().replace("gm_home_on.gif\" />", " -->"))
+            br.set_response(response)
+            br.select_form(name='LA0010Form01')
+            br['LA0010Form01:LA0010Email']   = self.username
+            br['LA0010Form01:LA0010Password'] = self.password
+            br.form.find_control(id='LA0010Form01:LA0010AutoLoginOn',type="checkbox").get(nr=0).selected = True
+            br.submit()
+            response1 = br.response()
+            #print "----------------------------send login form---------------------------------------------"
+            #print "----------------------------open news main page-----------------------------------------"
+            # open news site
+            br.open('http://www.nikkei.com/')
+            response2 = br.response()
+            #print "----------------------------www.nikkei.com BODY   --------------------------------------"
+            #print response2.get_data()
+            #print "-------------------------^^-got auto redirect form----^^--------------------------------"
+            # forced redirect in default
+            br.select_form(nr=0)
+            br.submit()
+            response3 = br.response()
+            # return some cookie which should be set by Javascript
+            #print response3.geturl()
+            raw = response3.get_data()
+            #print "---------------------------response to form --------------------------------------------"
+            # grab cookie from JS and set it
+            redirectflag = re.search(r"var checkValue = '(\d+)';", raw, re.M).group(1)
+            br.select_form(nr=0)
+
+            self.temp_files.append(PersistentTemporaryFile('_fa.html'))
+            self.temp_files[-1].write("#LWP-Cookies-2.0\n")
+
+            self.temp_files[-1].write("Set-Cookie3: Cookie-dummy=Cookie-value; domain=\".nikkei.com\"; path=\"/\"; path_spec; secure; expires=\"2029-12-21 05:07:59Z\"; version=0\n")
+            self.temp_files[-1].write("Set-Cookie3: redirectFlag="+redirectflag+"; domain=\".nikkei.com\"; path=\"/\"; path_spec; secure; expires=\"2029-12-21 05:07:59Z\"; version=0\n")
+            self.temp_files[-1].close()
+            cj.load(self.temp_files[-1].name)
+
+            br.submit()
+
+            #br.set_debug_http(False)
+            #br.set_debug_redirects(False)
+            #br.set_debug_responses(False)
+        return br
+
+
+
+
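Note on the login-form workaround in get_browser() above: the two replace() calls turn the span from the disabled j_id48 input up to the end of the gm_home_on.gif tag into an HTML comment, so the form parser never sees the control. A minimal sketch of that string rewrite on its own follows. It assumes Python 3 text strings rather than the raw response data, the function name is hypothetical, and the sample markup is invented for illustration.

```python
def comment_out_disabled_input(html):
    """Wrap the span from the j_id48 input to the gm_home_on.gif tag end in a comment."""
    # Open the comment where the problematic <input> starts ...
    html = html.replace('<input id="j_id48"', '<!-- ')
    # ... and close it where that tag's image reference ends.
    html = html.replace('gm_home_on.gif" />', ' -->')
    return html


if __name__ == "__main__":
    # Invented sample; the real login page markup is not shown in this thread.
    sample = ('<form name="LA0010Form01">'
              '<input id="j_id48" disabled="disabled" src="gm_home_on.gif" />'
              '<input name="LA0010Form01:LA0010Email" />'
              '</form>')
    print(comment_out_disabled_input(sample))
```

The rewrite depends on both markers being present in that order. If only one of them matched, the result would contain an unbalanced comment, so checking the marker counts before submitting the form may be worthwhile.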

=== added file 'resources/recipes/reuters_ja.recipe'
--- resources/recipes/reuters_ja.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/reuters_ja.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,37 @@
+from calibre.web.feeds.news import BasicNewsRecipe
+import re
+
+class ReutersJa(BasicNewsRecipe):
+
+    title = 'Reuters(Japan)'
+    description = 'Global news in Japanese'
+    __author__ = 'Hiroshi Miura'
+    use_embedded_content   = False
+    language = 'ja'
+    max_articles_per_feed = 10
+    remove_javascript = True
+
+    feeds = [ ('Top Stories', 'http://feeds.reuters.com/reuters/JPTopNews?format=xml'),
+                  ('World News', 'http://feeds.reuters.com/reuters/JPWorldNews?format=xml'),
+                  ('Business News', 'http://feeds.reuters.com/reuters/JPBusinessNews?format=xml'),
+                  ('Technology News', 'http://feeds.reuters.com/reuters/JPTechnologyNews?format=xml'),
+                  ('Oddly Enough News', 'http://feeds.reuters.com/reuters/JPOddlyEnoughNews?format=xml')
+         ]
+
+    remove_tags_before = {'class':"article primaryContent"}
+    remove_tags = [ dict(id="banner"),
+                    dict(id="autilities"),
+                    dict(id="textSizer"),
+                    dict(id="shareFooter"),
+                    dict(id="relatedNews"),
+                    dict(id="editorsChoice"),
+                    dict(id="ecArticles"),
+                    {'class':"secondaryContent"},
+                    {'class':"module"},
+                     ]
+    remove_tags_after = {'class':"assetBuddy"}
+
+    def print_version(self, url):
+        m = re.search(r'(.*idJPJAPAN-[0-9]+)', url)
+        return m.group(0)+'?sp=true' if m else url
+
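Note on print_version() above: it truncates the feed URL after the idJPJAPAN-<digits> article id and appends ?sp=true to request the single-page view. A standalone sketch of that rewrite, with a fallback for URLs that carry no such id, could look like this. The function is written outside the recipe class for brevity and the example URLs are hypothetical.

```python
import re

# Everything up to and including the numeric article id.
ARTICLE_ID_RE = re.compile(r'(.*idJPJAPAN-[0-9]+)')


def print_version(url):
    """Return the single-page URL, or the URL unchanged when no article id is found."""
    match = ARTICLE_ID_RE.search(url)
    if match is None:
        return url
    return match.group(1) + '?sp=true'


if __name__ == "__main__":
    # Hypothetical feed link, used only to show the shape of the rewrite.
    print(print_version('http://jp.reuters.com/article/topNews/idJPJAPAN-18281820101123?feedType=RSS'))
```

Falling back to the original URL means an unexpected link format results in the normal page being downloaded rather than an exception during the fetch.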

=== added file 'resources/recipes/the_h.recipe'
--- resources/recipes/the_h.recipe	1970-01-01 00:00:00 +0000
+++ resources/recipes/the_h.recipe	2010-11-23 07:00:03 +0000
@@ -0,0 +1,31 @@
+#!/usr/bin/env  python
+
+__license__   = 'GPL v3'
+__copyright__ = '2010, Hiroshi Miura <miurahr@xxxxxxxxx>'
+'''www.h-online.com'''
+
+from calibre.web.feeds.news import BasicNewsRecipe
+
+class TheHeiseOnline(BasicNewsRecipe):
+    title          = u'The H'
+    __author__     = 'Hiroshi Miura'
+    oldest_article = 3
+    description    = 'In association with Heise Online'
+    publisher      = 'Heise Media UK Ltd.'
+    category       = 'news, technology, security'
+    max_articles_per_feed = 100
+    language       = 'en'
+    encoding       = 'utf-8'
+    conversion_options = {
+                      'comment'  : description
+                     ,'tags'     : category
+                     ,'publisher': publisher
+                     ,'language' : language
+                     }
+    feeds          = [
+                      (u'The H News Feed', u'http://www.h-online.com/news/atom.xml')
+                     ]
+
+    def print_version(self, url):
+        return url + '?view=print'
+

Follow ups

[Merge] lp:~miurahr/calibre/experimental-recipes into lp:calibre
From: noreply, 2010-11-23