cairo-dock-team team mailing list archive
Message #01811
[Merge] lp:~eduardo-mucelli/cairo-dock-plug-ins-extras/WebSearch into lp:cairo-dock-plug-ins-extras
Eduardo Mucelli R. Oliveira has proposed merging lp:~eduardo-mucelli/cairo-dock-plug-ins-extras/WebSearch into lp:cairo-dock-plug-ins-extras.
Requested reviews:
Cairo-Dock Team (cairo-dock-team)
A heavy code refactoring: each search engine now has its own module. Webshots changed its HTML, so the whole fetching code was rewritten. The user-defined option for a pre-fetching limit was removed. Finally, thumbnail download is now fully functional and parallelized.
--
https://code.launchpad.net/~eduardo-mucelli/cairo-dock-plug-ins-extras/WebSearch/+merge/25200
Your team Cairo-Dock Team is requested to review the proposed merge of lp:~eduardo-mucelli/cairo-dock-plug-ins-extras/WebSearch into lp:cairo-dock-plug-ins-extras.
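
The headline item, the parallelized thumbnail download, boils down to spawning one thread per missing thumbnail and joining them all before the sub-icons are rebuilt (see construct_sub_icon_list in the diff below). A minimal standalone sketch of that pattern, using a hypothetical url list and a /tmp target directory rather than the applet's own paths:

require 'open-uri'

thumb_urls = ["http://i4.ytimg.com/vi/WwojCsQ3Fa8/default.jpg"] # hypothetical list of thumbnail urls
threads = thumb_urls.map do |url|
  Thread.new(url) do |u| # one thread per thumbnail, as in construct_sub_icon_list
    open(u) { |remote| File.open("/tmp/#{File.basename(u)}", "wb") { |f| f.write(remote.read) } }
  end
end
threads.each { |t| t.join } # wait for every download before building the sub-icon list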
=== added directory 'RubyBattery/emblems'
=== added file 'RubyBattery/emblems/charging.png'
Binary files RubyBattery/emblems/charging.png 1970-01-01 00:00:00 +0000 and RubyBattery/emblems/charging.png 2010-05-13 01:49:21 +0000 differ
=== added file 'RubyBattery/emblems/discharging.png'
Binary files RubyBattery/emblems/discharging.png 1970-01-01 00:00:00 +0000 and RubyBattery/emblems/discharging.png 2010-05-13 01:49:21 +0000 differ
=== modified file 'WebSearch/Changelog.txt'
--- WebSearch/Changelog.txt 2010-05-10 11:35:51 +0000
+++ WebSearch/Changelog.txt 2010-05-13 01:49:21 +0000
@@ -1,5 +1,6 @@
+1.0.0: (May/12/2010): A heavy code refactoring: each search engine now has its own module. Webshots changed its HTML, so the whole fetching code was rewritten. Removing the user-defined option for a pre-fetching limit. Finally, a fully functional parallelized thumbnail download.
0.7.3: (May/10/2010): Fixing the Google search since the engine changed the result stats HTML. Fixing the Bing result stats. Fixing the stats showing.
-0.7.0: (May/3/2010): WebSearch now fetch results from Wikipedia. Some code improvements. A new icon has been drawn, and a new preview also, both using Gimp.
+0.7.0: (May/3/2010): WebSearch now fetch results from Wikipedia. Some code improvements. A new icon has been drawn, and a new preview also, both using Gimp.
0.5.5: (May/1/2010): WebSearch now fetch results from Flickr and optionally it shows the image thumbnail as sub-icon. Thumbnail download is faster, it was parallelized with multi-threading. Fixed the removal of Google Images injected results within Google search. Adding Youtube, and Webshots result stats. Removed the log when downloading thumbnails. Any console output was removed.
0.4.1 (April/25/2010): WebSearch now fetch results from Webshots and optionally it shows the image thumbnail as sub-icon. Now thumbnails are downloaded only by each sub-icons pagination increasing the response time. Changing a show_youtube_video_preview parameter for a general one show_thumbnail_preview able for any search engine which gives thumbnail possibility.
0.3.0 (April/21/2010): WebSearch now fetch results from Youtube and optionally it shows the video thumbnail as sub-icon. New applet preview image that shows the results of a Youtube search.
=== modified file 'WebSearch/WebSearch'
--- WebSearch/WebSearch 2010-05-10 11:35:51 +0000
+++ WebSearch/WebSearch 2010-05-13 01:49:21 +0000
@@ -33,8 +33,8 @@
%w{rubygems open-uri nokogiri dbus parseconfig launchy}.each { |x| require x } # requirements
class Array
+ def fifth;self[4];end # defining the method "fifth" just for code readability
def third;self[2];end # defining the method "third" just for code readability
- def fifth;self[4];end
end
class String
@@ -43,505 +43,274 @@
end
end
-bus = DBus::SessionBus.instance # TODO: a module to encapsulate DBus-Dock connection
-applet_service = bus.service("org.cairodock.CairoDock")
-applet_name = File.basename(Dir.getwd) # nome do applet, neste caso é demo_ruby
-applet_path = "/org/cairodock/CairoDock/#{applet_name}" # caminho onde o objeto está guardado no bus
-
-applet_object = applet_service.object(applet_path)
-applet_object.introspect
-applet_object.default_iface = 'org.cairodock.CairoDock.applet' # list of icons contained in our sub-dock, or in our desklet
-
-applet_sub_icons_object = applet_service.object("#{applet_path}/sub_icons")
-applet_sub_icons_object.introspect
-applet_sub_icons_object.default_iface = "org.cairodock.CairoDock.subapplet" # list of icons contained in our sub-dock, or in our desklet
-
-class Link
- attr_accessor :url, :description, :id, :icon, :shortened_url
- @@next_id = 0 # sequential id "static"
-
- def initialize (url = "", description = "", icon = File.expand_path("./icon"))
- self.url = url
- self.description = description
- self.id = @@next_id += 1
- self.icon = icon
- self.shortened_url = shorten url
- end
-
- def shorten (string, count = 45) # TODO: count as a parameter in .conf file
- if string.length > count
- shortened = string.slice(0 .. count-1)
- shortened + "..." if shortened
- else
- string
- end
- end
-
- def self.reset_next_id
- class_variable_set(:@@next_id, 0) # metaprogramming to reset the instance counter
- end
-end
-
-class ThumbnailedLink < Link # a nice refactoring with the old YoutubeLink class
- attr_accessor :image_id, :thumb_url, :thumb_path, :downloaded_thumb
- @@next_image_id = 0
-
- def initialize(url = "", description = "", thumb_url = "")
- self.thumb_url = thumb_url
- self.image_id = @@next_image_id += 1
- self.thumb_path = define_thumbnail_path
- self.downloaded_thumb = false
- super(url, description)
- end
-
- def download_thumbnail
- # download thumb quietly (q), name it (O) '#{image_id}.jpg' and take it to the directory named as engine
- IO.popen("wget -q #{self.thumb_url} -O #{self.thumb_path}")
- self.downloaded_thumb = true
- self.icon = File.expand_path(self.thumb_path)
- end
-
- # Thumbnail path composed by the search engine and image id
- def define_thumbnail_path
- directory = extract_directory_from_url
- "./images/#{directory}/#{self.image_id}.jpg"
- end
-
- # Extract from the thumb_url the what is the search engine using the the core of the url
- def extract_directory_from_url
- directories = %w(youtube webshots flickr) # directories names like engines names
- found = directories.detect {|d| self.thumb_url.include?(d)} # search for engines names in thumb_url
- end
-
- def downloaded_thumb?
- self.downloaded_thumb
- end
-
- def self.reset_next_image_id
- class_variable_set(:@@next_image_id, 0) # metaprogramming to reset the instance counter
- end
-end
-
-class Applet
-
- attr_accessor :engine, :links, :query, :stats, :engines, :file_name,
- :number_of_fetched_links, :number_of_displayed_links, :page_of_displayed_links, # prefetch works only for google
- :show_current_page, :show_description_instead_url, :show_thumbnail_preview,
- :scroll_engine_index
-
- Google = "http://www.google.com/search?q=" # (10,20,30,50,100) results per page
- Bing = "http://www.bing.com/search?q=" # 10 results per page
- Yahoo = "http://search.yahoo.com/search?p=" # 10 results per page
- Teoma = "http://www.teoma.com/web?q=" # 10 results per page
- Youtube = "http://www.youtube.com"
- YoutubeQ = "#{Youtube}/results?search_query=" # 20 results per page
- Webshots = "http://www.webshots.com/search?query=" # 72 results per page
- Flickr = "http://www.flickr.com"
- FlickrQ = "#{Flickr}/search/?q=" # 28 results per page
- Wikipedia = "http://en.wikipedia.org"
- WikipediaQ = "#{Wikipedia}/w/index.php?title=Special:Search&search=" # parameter "limit" results per page
-
- DialogActiveTime = 5 # time in seconds the dialog window will be active
-
- def initialize applet, sub_icons, file_name
- self.query = ""
- self.file_name = file_name
- self.engines = [Google, Bing, Yahoo, Teoma, YoutubeQ, Webshots, FlickrQ, WikipediaQ]
- self.scroll_engine_index = 0 # current index when scrolling through search engines
- @icon = applet
- @sub_icons = sub_icons
- reset_search_settings
- set_configuration_parameters # setting the self.configuration content
- end
-
- def set_configuration_parameters
- conf = ParseConfig.new(File.expand_path("~/.config/cairo-dock/current_theme/plug-ins/#{self.file_name}/#{self.file_name}.conf"))
- # for parameters within a list, the value is the position in the options list, not the value by itself
- self.engine = self.engines.at(conf.params['Configuration']['engine'].to_i)
- inform_current_search_engine # inform in bottom of the icon what is the new engine
- self.number_of_fetched_links = [10, 20, 30, 50, 100].at(conf.params['Configuration']['number of fetched links'].to_i)
- self.number_of_displayed_links = conf.params['Configuration']['number of displayed links'].to_i # number of sub-icons to be shown
- self.show_current_page = conf.params['Configuration']['show current page'].to_b
- self.show_description_instead_url = conf.params['Configuration']['show description instead url'].to_b
- self.show_thumbnail_preview = conf.params['Configuration']['show thumbnail preview'].to_b
- end
+module WebSearch
+
+ def self.name
+ File.basename(Dir.getwd) # applet name, in this case the same as the directory name
+ end
+
+ def self.start
+ bus = DBus::SessionBus.instance # TODO: a module to encapsulate DBus-Dock connection
+ applet_service = bus.service("org.cairodock.CairoDock")
+ applet_path = "/org/cairodock/CairoDock/#{WebSearch.name}" # caminho onde o objeto está guardado no bus
+
+ applet_object = applet_service.object(applet_path)
+ applet_object.introspect
+ applet_object.default_iface = 'org.cairodock.CairoDock.applet' # list of icons contained in our sub-dock, or in our desklet
+
+ applet_sub_icons_object = applet_service.object("#{applet_path}/sub_icons")
+ applet_sub_icons_object.introspect
+ applet_sub_icons_object.default_iface = "org.cairodock.CairoDock.subapplet" # list of icons contained in our sub-dock, or in our desklet
+
+ applet = Applet.new applet_object, applet_sub_icons_object
+ applet.start
+ loop = DBus::Main.new
+ loop << bus
+ loop.run
+ end
+
+ class Applet
+
+ require './lib/Engine.rb'
+ # require './lib/Exceptions.rb'
+
+ attr_accessor :engine, :query, :engines,
+ :number_of_fetched_links, :number_of_displayed_links, :page_of_displayed_links,
+ :show_current_page, :show_description_instead_url, :show_thumbnail_preview,
+ :scroll_engine_index
+
+ DialogActiveTime = 5 # time in seconds the dialog window will be active
+
+ def initialize applet, sub_icons
+ self.query = ""
+ self.engines = %w(Google Bing Yahoo! Teoma Wikipedia Youtube Webshots Flickr)
+ self.engine = Engine.new
+ self.scroll_engine_index = 0 # current index when scrolling through search engines
+ @icon = applet
+ @sub_icons = sub_icons
+ reset_search_settings
+ set_configuration_parameters # setting the self.configuration content
+ end
+
+ def set_configuration_parameters
+ conf = ParseConfig.new(File.expand_path("~/.config/cairo-dock/current_theme/plug-ins/#{WebSearch.name}/#{WebSearch.name}.conf"))
+ # for parameters within a list, the value is the position in the options list, not the value by itself
+ self.number_of_fetched_links = [10, 20, 30, 50, 100].at(conf.params['Configuration']['number of fetched links'].to_i)
+ self.engine.name = self.engines.at(conf.params['Configuration']['engine'].to_i)
+ inform_current_search_engine # inform in bottom of the icon what is the new engine
+ self.number_of_displayed_links = conf.params['Configuration']['number of displayed links'].to_i # number of sub-icons to be shown
+ self.show_current_page = conf.params['Configuration']['show current page'].to_b
+ self.show_description_instead_url = conf.params['Configuration']['show description instead url'].to_b
+ self.show_thumbnail_preview = conf.params['Configuration']['show thumbnail preview'].to_b
+ end
- def start
- verify_user_action
- end
-
- def verify_user_action
- @icon.on_signal("on_build_menu") do |param| # right click signal
- action_on_build_menu
- end
- @icon.on_signal("on_menu_select") do |selected_menu|
- action_on_menu_select selected_menu
- end
- @icon.on_signal("on_answer") do |answer|
- action_on_answer answer
- end
- @icon.on_signal("on_scroll") do |scroll_up| # when the user scroll the mouse up or down on the icon
- action_on_scroll scroll_up # scroll down param = false, scroll up param = true
- end
- @icon.on_signal("on_middle_click") do |param|
- ask_for_search_query
- end
- @icon.on_signal("on_click") do |param|
- action_on_click
- end
- @icon.on_signal("on_reload_module") do |config_has_changed|
- action_on_reload_module config_has_changed
- end
- @sub_icons.on_signal("on_click_sub_icon") do |param, sub_icon_id|
- action_on_click_sub_icon sub_icon_id
- end
- @sub_icons.on_signal("on_middle_click_sub_icon") do |sub_icon_id|
- action_on_middle_click_sub_icon sub_icon_id
- end
- end
-
- def action_on_build_menu
- #if @icon.add_menu_items_available # Cairo-Dock > 2.1.4-0beta0
- # items = [{:type => 1, :label => 'Google', :menu => 0, :id => 1, :icon => './images/google.png', :tooltip => 'Google'}]
- # @icon.AddMenuItems(items)
- #else
- @icon.PopulateMenu(["Google", "Bing", "Yahoo!", "Teoma", "Youtube", "Webshots", "Flickr", "Wikipedia"])
- #end
- end
-
- def ask_for_search_query
- @icon.AskText("Search for:", "#{self.query}")
- end
-
- def action_on_answer answer
- unless answer.empty?
- reset_search_settings unless self.query.empty?
- self.query = answer
- fetch_next_resulting_page
- end
- end
-
- def reset_search_settings
- self.links = []
- self.stats = ""
- self.page_of_displayed_links = 0 # current pagination of displayed links
- Link.reset_next_id
- ThumbnailedLink.reset_next_image_id
- @sub_icons.RemoveSubIcon("any")
- end
-
- def action_on_click_sub_icon sub_icon_id
- Launchy.open self.links.at(sub_icon_id.to_i-1).url
- end
-
- def action_on_middle_click_sub_icon sub_icon_id
- text = ""
- if self.show_description_instead_url # sub-icons are entitled by description ...
- text = self.links.at(sub_icon_id.to_i-1).url # so URL will be shown in dialog
- else # sub-icons are entitled by url ...
- text = self.links.at(sub_icon_id.to_i-1).description # so description will be shown in dialog
- end
- @icon.ShowDialog(text, DialogActiveTime)
- end
-
- def action_on_click
- if self.stats.empty?
- ask_for_search_query
- else
- @icon.ShowDialog(self.stats, DialogActiveTime)
- end
- end
-
- # Changing the search engine by context menu
- def action_on_menu_select param
- switch_search_engine param
- end
-
- def action_on_reload_module config_has_changed
- set_configuration_parameters if config_has_changed
- end
-
- # Scrolling behavior can be switch the search engine, or fetch another resulting page
- def action_on_scroll scroll_up
- if self.query.empty? # before the first query it is possible scroll through engines
- if scroll_up
- switch_search_engine self.scroll_engine_index +=1 # drawback: user scrolls a lot for up/down and this variable
- else # gets a value far from (0..self.engines.length-1) limits.
- switch_search_engine self.scroll_engine_index -=1 # user need to scroll back a lot to get in these limits again
- end
- else # later the first query scroll through the resulting pages
- if scroll_up
+ def start
+ verify_user_action
+ end
+
+ def verify_user_action
+ @icon.on_signal("on_build_menu") do |param| # right click signal
+ action_on_build_menu
+ end
+ @icon.on_signal("on_menu_select") do |selected_menu|
+ action_on_menu_select selected_menu
+ end
+ @icon.on_signal("on_answer") do |answer|
+ action_on_answer answer
+ end
+ @icon.on_signal("on_scroll") do |scroll_up| # when the user scroll the mouse up or down on the icon
+ action_on_scroll scroll_up # scroll down param = false, scroll up param = true
+ end
+ @icon.on_signal("on_middle_click") do |param|
+ ask_for_search_query
+ end
+ @icon.on_signal("on_click") do |param|
+ action_on_click
+ end
+ @icon.on_signal("on_reload_module") do |config_has_changed|
+ action_on_reload_module config_has_changed
+ end
+ @sub_icons.on_signal("on_click_sub_icon") do |param, sub_icon_id|
+ action_on_click_sub_icon sub_icon_id
+ end
+ @sub_icons.on_signal("on_middle_click_sub_icon") do |sub_icon_id|
+ action_on_middle_click_sub_icon sub_icon_id
+ end
+ end
+
+ def action_on_build_menu
+ #if @icon.add_menu_items_available # Cairo-Dock > 2.1.4-0beta0
+ # items = [{:type => 1, :label => 'Google', :menu => 0, :id => 1, :icon => './images/google.png', :tooltip => 'Google'}]
+ # @icon.AddMenuItems(items)
+ #else
+ @icon.PopulateMenu(self.engines)
+ #end
+ end
+
+ def ask_for_search_query
+ @icon.AskText("Search for:", "#{self.query}")
+ end
+
+ def action_on_answer answer
+ unless answer.empty?
+ reset_search_settings unless self.query.empty?
+ self.query = answer
+ self.engine = self.engine.connect # the engine connection only happens when a fetch is imminent
fetch_next_resulting_page
+ end
+ end
+
+ def reset_search_settings
+ self.engine.links = []
+ self.engine.stats = ""
+ self.page_of_displayed_links = 0 # current pagination of displayed links
+ Link.reset_next_id
+ ThumbnailedLink.reset_next_image_id
+ @sub_icons.RemoveSubIcon("any")
+ end
+
+ def action_on_click_sub_icon sub_icon_id
+ Launchy.open self.engine.links.at(sub_icon_id.to_i-1).url
+ end
+
+ def action_on_middle_click_sub_icon sub_icon_id
+ text = ""
+ if self.show_description_instead_url # sub-icons are titled with the description ...
+ text = self.engine.links.at(sub_icon_id.to_i-1).url # so the URL will be shown in the dialog
+ else # sub-icons are titled with the url ...
+ text = self.engine.links.at(sub_icon_id.to_i-1).description # so the description will be shown in the dialog
+ end
+ @icon.ShowDialog(text, DialogActiveTime)
+ end
+
+ def action_on_click
+ if self.engine.stats.empty?
+ ask_for_search_query
else
- fetch_previous_resulting_page
- end
- end
- end
-
- def switch_search_engine index
- index = 0 if index < 0 # keep the lower limit
- index = self.engines.length - 1 if index > self.engines.length - 1 # keep the upper limit
- self.engine = self.engines.at index
- reset_search_settings # clean the previous search when choosing a new one
- inform_current_search_engine # inform in the bottom of the icon what is the new engine
- end
-
- def fetch_next_resulting_page
- offset = self.page_of_displayed_links * self.number_of_displayed_links # the position of the first link in the self.links array
- if self.links.size <= offset # user already scrolled by the fetched links, fetch more
+ @icon.ShowDialog(self.engine.stats, DialogActiveTime)
+ end
+ end
+
+ # Changing the search engine by context menu
+ def action_on_menu_select param
+ switch_search_engine param
+ end
+
+ def action_on_reload_module config_has_changed
+ set_configuration_parameters if config_has_changed
+ end
+
+ # Scrolling either switches the search engine or fetches another result page
+ def action_on_scroll scroll_up
+ if self.query.empty? # before the first query it is possible to scroll through the engines
+ if scroll_up
+ switch_search_engine self.scroll_engine_index +=1 # drawback: if the user scrolls up/down a lot this variable
+ else # gets a value far from the (0..self.engines.length-1) limits,
+ switch_search_engine self.scroll_engine_index -=1 # so the user needs to scroll back a lot to get within these limits again
+ end
+ else # after the first query, scroll through the result pages
+ if scroll_up
+ fetch_next_resulting_page
+ else
+ fetch_previous_resulting_page
+ end
+ end
+ end
+
+ def switch_search_engine index
+ index = 0 if index < 0 # keep the lower limit
+ index = self.engines.length - 1 if index > self.engines.length - 1 # keep the upper limit
+ self.engine.name = self.engines.at(index)
+ reset_search_settings # clean the previous search when choosing a new one
+ inform_current_search_engine # inform in the bottom of the icon what is the new engine
+ end
+
+ def fetch_next_resulting_page
+ offset = self.page_of_displayed_links * self.number_of_displayed_links # the position of the first link in the self.engine.links array
+ if self.engine.links.size <= offset # user already scrolled by the fetched links, fetch more
+ inform_start_of_waiting_process
+ self.engine.links = self.engine.retrieve_links(self.query, offset) # receive the fetched links
+ inform_end_of_waiting_process
+ end
+ self.page_of_displayed_links += 1 # sequential page identification, lets go to the next
+ sub_icon_list = construct_sub_icon_list(offset)
+ refresh_sub_icon_list (sub_icon_list)
+ inform_current_page
+ end
+
+ # Since the previous results are already stored in self.engine.links, it is only necessary to
+ # select the interval of it that starts with the first link of the previous page.
+ # An easier approach would be to query the engine again with page-1, but that would send
+ # more queries to the site, with drawbacks such as increasing the probability of
+ # Google blocking the mechanized access, more bandwidth, etc.
+ def fetch_previous_resulting_page
+ if self.page_of_displayed_links > 1 # there is no previous page from the first one
+ self.page_of_displayed_links -= 1 # one page back
+ inicio = (self.page_of_displayed_links-1) * self.number_of_displayed_links # the first position of the link in the previous page
+ sub_icon_list = construct_sub_icon_list(inicio)
+ refresh_sub_icon_list (sub_icon_list)
+ end
+ inform_current_page
+ end
+
+ # Construct the menu using a set of fetched links
+ # Links can have thumbnails
+ def construct_sub_icon_list inicio
+ sub_icon_list =[]
inform_start_of_waiting_process
- case self.engine
- when Google; retrieve_links_from_google(self.query, offset)
- when Bing; retrieve_links_from_bing(self.query, offset)
- when Yahoo; retrieve_links_from_yahoo(self.query, offset)
- when Teoma; retrieve_links_from_teoma(self.query, self.page_of_displayed_links + 1) # first Teoma page is 1
- when YoutubeQ; retrieve_links_from_youtube(self.query, self.page_of_displayed_links + 1) # first Youtube page is 1
- when Webshots; retrieve_links_from_webshots(self.query, offset)
- when FlickrQ; retrieve_links_from_flickr(self.query, self.page_of_displayed_links + 1) # first Flickr page is 1
- when WikipediaQ; retrieve_links_from_wikipedia(self.query, offset)
+ threads =[]
+ self.engine.links[inicio, self.number_of_displayed_links].each do |link| # first let's download the thumbs if necessary
+ if link.instance_of?(ThumbnailedLink) and not link.downloaded_thumb? # class that provides thumbs and a not yet downloaded thumb
+ if self.show_thumbnail_preview # the user wants to see thumbs, so let's get them
+ threads << Thread.new {link.download_thumbnail}
+ end
+ end
+ end
+ threads.each {|t| t.join}
+ self.engine.links[inicio, self.number_of_displayed_links].each do |link| # later, get the rest of sub-icons data
+ if self.show_description_instead_url
+ sub_icon_list << link.description # the user prefers to see the description on the sub-icon
+ else
+ sub_icon_list << link.shortened_url # the user prefers to see the shortened url on the sub-icon
+ end
+ sub_icon_list << link.icon # the icon
+ sub_icon_list << link.id.to_s # the sequential id
end
inform_end_of_waiting_process
- end
- self.page_of_displayed_links += 1 # sequential page identification, lets go to the next
- sub_icon_list = construct_sub_icon_list(offset)
- refresh_sub_icon_list (sub_icon_list)
- inform_current_page
- end
-
- # Since the previous results are already stored in self.links, it is necessary just to
- # select the its interval that starts with the first link of the previous page.
- # An easier approach would be query google again with page-1 but it would result
- # more queries to the page, consequently it has some drawbacks such as increasing the
- # probability of Google block the mechanized access, more bandwith, etc.
- def fetch_previous_resulting_page
- if self.page_of_displayed_links > 1 # there is no previous page from the first one
- self.page_of_displayed_links -= 1 # one page back
- inicio = (self.page_of_displayed_links-1) * self.number_of_displayed_links # the first position of the link in the previous page
- sub_icon_list = construct_sub_icon_list(inicio)
- refresh_sub_icon_list (sub_icon_list)
- end
- inform_current_page
- end
-
- # Construct the menu using a set of fetched links
- # Links can have thumbnails
- def construct_sub_icon_list inicio
- sub_icon_list =[]
- inform_start_of_waiting_process
- threads =[]
- self.links[inicio, self.number_of_displayed_links].each do |link| # first let's download the thumbs if necessary
- if link.instance_of?(ThumbnailedLink) and not link.downloaded_thumb? # class that provides thumbs and a not yet downloaded thumb
- if self.show_thumbnail_preview # user want to see thumbs, so let's get it
-# threads << Thread.new(link) do |l| # parallelizing thumb download with multi-threading
-# l.download_thumbnail
-# end
- p link.download_thumbnail
- end
- end
- end
- threads.each { |t| t.join }
- self.links[inicio, self.number_of_displayed_links].each do |link| # later, get the rest of sub-icons data
- if self.show_description_instead_url
- sub_icon_list << link.description # user prefer see description with the sub-icon
- else
- sub_icon_list << link.shortened_url # user prefer see shortened url with the sub-icon
- end
- sub_icon_list << link.icon # the icon
- sub_icon_list << link.id.to_s # the sequential id
- end
- inform_end_of_waiting_process
- sub_icon_list
- end
-
- def refresh_sub_icon_list sub_icon_list
- @sub_icons.RemoveSubIcon("any") # remove all rendered sub-icons
- @sub_icons.AddSubIcons(sub_icon_list)
- end
-
- # Fetch a user-defined number links from Google with just one query. The parameter offset is the index of the first link.
- # It is better to fetch a higher amount of links in order to minimize the number of queries to be sent to google
- def retrieve_links_from_google(query, offset = 0)
- google = Nokogiri::HTML(open("#{Google}#{query}&start=#{offset}&num=#{self.number_of_fetched_links}"))
- self.stats = retrieve_google_result_stats(google)
- (google/"h3[@class='r']").search("a[@href]").each do |raw_link|
- url = raw_link['href']
- # Google "injects" its images results in the backlink-based results, desconsidering it
- unless url.include? "?q=#{query}"
- description = raw_link.inner_text
- self.links << Link.new(url, description)
- end
- end
- end
-
- # Retrieve informations from Google search stats
- # The stats array positions "Resultados first - second de aproximadamente third para fourth (fifth segundos)"
- def retrieve_google_result_stats google
- stats = (google/"div[@id='resultStats']")
- /^About ([\S]+) results \s\(([\S]+) seconds\)/.match(stats.inner_text)
- total, time = $1, $2
- "Search for #{self.query} returned #{total} results in #{time} seconds"
- end
-
- # Fetch links from Bing. Since Bing does not provide an in-url way to fetch more links than the 10
- # as Google does (&num=amount_to_fetch), this method will be called every time that 10 new results need to be shown
- def retrieve_links_from_bing(query, offset = 1)
- bing = Nokogiri::HTML(open("#{Bing}#{query}&first=#{offset}"))
- self.stats = retrieve_bing_result_stats(bing)
- (bing/"h3").search("a[@onmousedown]").each do |raw_link|
- url = raw_link['href']
- description = raw_link.inner_text
- self.links << Link.new(url, description)
- end
- end
-
- # Retrieve informations from Bing search stats
- # The stats array postions "first-second 'of' third 'results'"
- def retrieve_bing_result_stats bing
- stats = (bing/"span[@id='count']").inner_text
- total = stats.split.fifth
- "Search for #{self.query} returned #{total} results"
- end
-
- # Fetch links from Yahoo!. Since Yahoo! does not provide an in-url way to fetch more links than the 10
- # as Google does (&num=amount_to_fetch), this method will be called every time that 10 new results need to be shown
- def retrieve_links_from_yahoo(query, offset = 1)
- yahoo = Nokogiri::HTML(open("#{Yahoo}#{query}&b=#{offset}"))
- self.stats = retrieve_yahoo_result_stats(yahoo)
- (yahoo/"div[@class~='res']").each do |res| # divs are usually from 'res' class but some sub-results are 'res_indent' class
- url = (res/"span[@class='url']").inner_text
- description = (res/"h3/a").inner_text
- self.links << Link.new(url, description)
- end
- end
-
- # Retrieve informations from Yahoo! search stats
- def retrieve_yahoo_result_stats yahoo
- total = (yahoo/"strong[@id='resultCount']").inner_text
- "Search for #{self.query} returned #{total} results"
- end
-
- # Instead of the offset (the index of the first link), Teoma (ask.com) receives the offset with the *page* value
- # The href paremeter has the URL and the tag's content has the description.
- # Teoma results are placed in an <a> tag with id='r(digit)_t'.
- def retrieve_links_from_teoma(query, page = 1)
- teoma = Nokogiri::HTML(open("#{Teoma}#{query}&page=#{page}"))
- self.stats = retrieve_teoma_result_stats teoma
- (teoma/"a[@id$='_t']").each do |res| # any a tag with an id that ends with _t
- url = res['href']
- description = res.inner_text
- self.links << Link.new(url, description)
- end
- end
-
- def retrieve_teoma_result_stats teoma
- total = teoma.at("//span[@id='indexLast']").next.next.inner_text
- "Search for #{self.query} returned #{total} results"
- end
-
- # Fetch links from english Wikipedia. It is necessary to set user agent, or the connection is Forbidden (403)
- def retrieve_links_from_wikipedia(query, offset = 0)
- wikipedia = Nokogiri::HTML(open("#{WikipediaQ}#{query}&offset=#{offset}&limit=#{self.number_of_fetched_links}", 'User-Agent' => 'ruby'))
- self.stats = retrieve_webshots_result_wikipedia wikipedia
- (wikipedia/"ul[@class='mw-search-results']/li/a").each do |res|
- url = res['href']
- description = res['title']
- self.links << Link.new("#{Wikipedia}#{url}", description)
- end
- end
-
- def retrieve_webshots_result_wikipedia wikipedia
- total = wikipedia.at("div[@class='results-info']/ul/li/b").next.next.inner_text
- "Search for #{self.query} returned #{total} results"
- end
-
- # url, e.g, /watch?v=WwojCsQ3Fa8
- # thumb_url, e.g, "http://i4.ytimg.com/vi/WwojCsQ3Fa8/default.jpg"
- def retrieve_links_from_youtube(query, page = 1)
- youtube = Nokogiri::HTML(open("#{YoutubeQ}#{query}&page=#{page}"))
- self.stats = retrieve_youtube_result_stats youtube
- (youtube/"a[@id^='video-long-title-']").each do |res| # 'a' tag has id which starts with "video-long-title-"
- url = res['href']
- description = res.inner_text
- video_id = url.split('=').last # /watch?v=WwojCsQ3Fa8 => WwojCsQ3Fa8 => video_id
- thumb_url = "http://i4.ytimg.com/vi/#{video_id}/default.jpg"
- self.links << ThumbnailedLink.new("#{Youtube}#{url}", description, thumb_url)
- end
- end
-
- def retrieve_youtube_result_stats youtube
- total = youtube.at("div[@class='name']").inner_text.split.last
- "Search for #{self.query} returned #{total} results"
- end
-
- # url, e.g, http://good-times.webshots.com/photo/2500137270102572130
- # thumb_url, e.g, http://thumb10.webshots.net/t/24/665/1/37/27/2500137270102572130SmNoHt_th.jpg"
- def retrieve_links_from_webshots(query, offset = 0)
- webshots = Nokogiri::HTML(open("#{Webshots}#{query}&start=#{offset}"))
- self.stats = retrieve_webshots_result_stats webshots
- (webshots/"li[@onmouseover^='wsPopup']").each do |res| # li tag has onmouseover which starts with "wsPopup"
- url = thumb_url = description = ""
- (res/"div[@class='photo']/a/img").each do |raw_thumb|
- thumb_url = raw_thumb['src']
- end
- (res/"div[@class='title']/p/a").each do |raw_link|
- url = raw_link['href']
- description = raw_link.inner_text
- end
- self.links << ThumbnailedLink.new(url, description, thumb_url)
- end
- end
-
- def retrieve_webshots_result_stats webshots
- total = webshots.at("div[@id='resultCount']/strong").inner_text
- "Search for #{self.query} returned #{total} results"
- end
-
- # url, e.g., /photos/21078069@N03/2780732654/
- # thumb_url, e.g., http://farm4.static.flickr.com/3255/2780732654_b7cbb2fb98_t.jpg"
- def retrieve_links_from_flickr(query, page = 1)
- flickr = Nokogiri::HTML(open("#{FlickrQ}#{query}#page=#{page}"))
- (flickr/"span[@class='photo_container pc_t']/a").each do |res|
- url = res['href']
- description = res['title']
- thumb_url = res.at("img")['src']
- self.links << ThumbnailedLink.new("#{Flickr}#{url}", description, thumb_url)
- end
- end
-
- def inform_start_of_waiting_process
- @icon.SetQuickInfo("...")
- end
-
- def inform_end_of_waiting_process
- @icon.SetQuickInfo("")
- end
-
- def inform_current_page
- if self.show_current_page
- @icon.SetQuickInfo("#{self.page_of_displayed_links}")
- else
+ sub_icon_list
+ end
+
+ def refresh_sub_icon_list sub_icon_list
+ @sub_icons.RemoveSubIcon("any") # remove all rendered sub-icons
+ @sub_icons.AddSubIcons(sub_icon_list)
+ end
+
+ def inform_start_of_waiting_process
+ @icon.SetQuickInfo("...")
+ end
+
+ def inform_end_of_waiting_process
@icon.SetQuickInfo("")
end
- end
+
+ def inform_current_page
+ if self.show_current_page
+ @icon.SetQuickInfo("#{self.page_of_displayed_links}")
+ else
+ @icon.SetQuickInfo("")
+ end
+ end
- # Inform in the bottom of the icon what is new search engine
- def inform_current_search_engine
- current = case self.engine
- when Google; "Google"
- when Bing; "Bing"
- when Yahoo; "Yahoo!"
- when Teoma; "Teoma"
- when YoutubeQ; "Youtube"
- when Webshots; "Webshots"
- when FlickrQ; "Flickr"
- when WikipediaQ; "Wikipedia"
+ # Inform at the bottom of the icon what the new search engine is
+ def inform_current_search_engine
+ @icon.SetQuickInfo(self.engine.name)
end
- @icon.SetQuickInfo(current)
end
+
+ def self.log(msg)
+ $stderr.puts "WEBSEARCH_DEBUG: #{msg}"
+ end
+
end
-applet = Applet.new applet_object, applet_sub_icons_object, applet_name
-applet.start
-loop = DBus::Main.new
-loop << bus
-loop.run
+WebSearch.start
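
A note on set_configuration_parameters above: for list-type options the .conf file stores the selected index, not the value itself, so the applet maps that index back onto its own lists. A small sketch of that mapping, assuming the applet's config file is already installed (the example index value is made up):

%w{rubygems parseconfig}.each { |x| require x } # same gems the applet already uses

conf = ParseConfig.new(File.expand_path("~/.config/cairo-dock/current_theme/plug-ins/WebSearch/WebSearch.conf"))
engines = %w(Google Bing Yahoo! Teoma Wikipedia Youtube Webshots Flickr) # same order as Applet#initialize
index = conf.params['Configuration']['engine'].to_i # e.g. the file stores "4"
puts engines.at(index) # => "Wikipedia" for index 4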
=== modified file 'WebSearch/WebSearch.conf'
--- WebSearch/WebSearch.conf 2010-05-10 11:35:51 +0000
+++ WebSearch/WebSearch.conf 2010-05-13 01:49:21 +0000
@@ -1,4 +1,4 @@
-#!en;0.7.3
+#!en;1.0.0
#[gtk-about]
[Icon]
@@ -88,9 +88,6 @@
#i[5;10] Maximum number of results shown :
#{in sub-icons.}
number of displayed links = 10
-#l[10;20;30;50;100] Number of prefetched results :
-#{higher is better. Applicable for Google, and Wikipedia.}
-number of fetched links = 4
#b Show the current page as the icon label ?
show current page = true
#b Show the description of the result instead of its URL in the sub-icons ?
=== modified file 'WebSearch/auto-load.conf'
--- WebSearch/auto-load.conf 2010-05-10 11:35:51 +0000
+++ WebSearch/auto-load.conf 2010-05-13 01:49:21 +0000
@@ -10,4 +10,4 @@
category = 2
# Version of the applet; change it everytime you change something in the config file. Don't forget to update the version both in this file and in the config file.
-version = 0.7.3
+version = 1.0.0
=== added directory 'WebSearch/lib'
=== added file 'WebSearch/lib/Bing.rb'
--- WebSearch/lib/Bing.rb 1970-01-01 00:00:00 +0000
+++ WebSearch/lib/Bing.rb 2010-05-13 01:49:21 +0000
@@ -0,0 +1,29 @@
+class Bing < Engine
+
+ def initialize
+ self.base_url = "http://www.bing.com"
+ self.query_url = "#{self.base_url}/search?q=" # 10 results per page
+ super
+ end
+
+ # Fetch links from Bing. Since Bing does not provide an in-url way to fetch more links than the 10
+ # as Google does (&num=amount_to_fetch), this method will be called every time that 10 new results need to be shown
+ def retrieve_links(query, offset = 1)
+ bing = Nokogiri::HTML(open("#{self.query_url}#{query}&first=#{offset}"))
+ self.stats = retrieve_bing_result_stats(bing, query)
+ (bing/"h3").search("a[@onmousedown]").each do |raw_link|
+ url = raw_link['href']
+ description = raw_link.inner_text
+ self.links << Link.new(url, description)
+ end
+ self.links
+ end
+
+ # Retrieve information from the Bing search stats
+ # The stats array positions: "first-second 'of' third 'results'"
+ def retrieve_bing_result_stats bing, query
+ stats = (bing/"span[@id='count']").inner_text
+ total = stats.split.fifth
+ "Search for #{query} returned #{total} results"
+ end
+end
=== added file 'WebSearch/lib/Engine.rb'
--- WebSearch/lib/Engine.rb 1970-01-01 00:00:00 +0000
+++ WebSearch/lib/Engine.rb 2010-05-13 01:49:21 +0000
@@ -0,0 +1,25 @@
+class Engine # Factory + Inheritance
+
+ require './lib/Link.rb'
+
+ attr_accessor :name, :stats, :links, :base_url, :query_url, :engine
+
+ def initialize
+ self.links =[]
+ end
+
+ def connect
+ WebSearch.log "connecting to #{self.name}"
+ self.engine = case self.name
+ when "Google"; require './lib/Google.rb'; Google.new # lazy loading applied to engines libraries
+ when "Bing"; require './lib/Bing.rb'; Bing.new
+ when "Yahoo!"; require './lib/Yahoo.rb'; Yahoo.new
+ when "Teoma"; require './lib/Teoma.rb'; Teoma.new
+ when "Wikipedia"; require './lib/Wikipedia.rb'; Wikipedia.new
+ when "Youtube"; require './lib/Youtube.rb'; Youtube.new
+ when "Webshots"; require './lib/Webshots.rb'; Webshots.new
+ when "Flickr"; require './lib/Flickr.rb'; Flickr.new
+ end
+ end
+
+end
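
The Engine factory above can be exercised on its own, outside the dock, from the WebSearch directory. In the sketch below the one-line WebSearch module is only a stub standing in for the applet's logger; everything else uses the classes from this merge:

%w{rubygems open-uri nokogiri}.each { |x| require x }
module WebSearch; def self.log(msg); $stderr.puts msg; end; end # stub for the applet's logger
require './lib/Engine.rb'

engine = Engine.new
engine.name = "Google" # any name from the engines list in Applet#initialize
engine = engine.connect # lazily requires ./lib/Google.rb and returns a Google instance
engine.retrieve_links("cairo-dock", 0).each { |link| puts link.url }
puts engine.stats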
=== added file 'WebSearch/lib/Exceptions.rb'
--- WebSearch/lib/Exceptions.rb 1970-01-01 00:00:00 +0000
+++ WebSearch/lib/Exceptions.rb 2010-05-13 01:49:21 +0000
@@ -0,0 +1,11 @@
+module Exceptions
+ class UnknownEngineException < StandardError
+ def initialize(engine)
+ @engine = engine
+ end
+ def message
+ "Unknown search engine #{@engine}"
+ end
+ end
+end
+
=== added file 'WebSearch/lib/Flickr.rb'
--- WebSearch/lib/Flickr.rb 1970-01-01 00:00:00 +0000
+++ WebSearch/lib/Flickr.rb 2010-05-13 01:49:21 +0000
@@ -0,0 +1,22 @@
+class Flickr < Engine
+
+ def initialize
+ self.base_url = "http://www.flickr.com"
+ self.query_url = "#{self.base_url}/search/?q=" # 28 results per page
+ super
+ end
+
+ # url, e.g., /photos/21078069@N03/2780732654/
+ # thumb_url, e.g., http://farm4.static.flickr.com/3255/2780732654_b7cbb2fb98_t.jpg"
+ def retrieve_links(query, page = 1)
+ flickr = Nokogiri::HTML(open("#{self.query_url}#{query}#page=#{page}"))
+ (flickr/"span[@class='photo_container pc_t']/a").each do |res|
+ url = res['href']
+ description = res['title']
+ thumb_url = res.at("img")['src']
+ self.links << ThumbnailedLink.new("#{self.base_url}#{url}", description, thumb_url)
+ end
+ self.links
+ end
+
+end
=== added file 'WebSearch/lib/Google.rb'
--- WebSearch/lib/Google.rb 1970-01-01 00:00:00 +0000
+++ WebSearch/lib/Google.rb 2010-05-13 01:49:21 +0000
@@ -0,0 +1,37 @@
+class Google < Engine
+
+ #attr_accessor :name, :stats, :links, :base_url, :query_url, :number_of_fetched_links
+ attr_accessor :number_of_fetched_links
+
+ def initialize
+ self.number_of_fetched_links = 100
+ self.base_url = "http://www.google.com"
+ self.query_url = "#{self.base_url}/search?q=" # (10,20,30,50,100) results per page"
+ super
+ end
+
+ # Fetch a user-defined number of links from Google with just one query. The parameter offset is the index of the first link.
+ # It is better to fetch a higher number of links in order to minimize the number of queries sent to Google
+ def retrieve_links (query, offset)
+ google = Nokogiri::HTML(open("#{self.query_url}#{query}&start=#{offset}&num=#{self.number_of_fetched_links}"))
+ self.stats = retrieve_result_stats(google, query)
+ (google/"h3[@class='r']").search("a[@href]").each do |raw_link|
+ url = raw_link['href']
+ # Google "injects" its images results in the backlink-based results, desconsidering it
+ unless url.include? "?q=#{query}"
+ description = raw_link.inner_text
+ self.links << Link.new(url, description)
+ end
+ end
+ self.links
+ end
+
+ # Retrieve information from the Google search stats
+ # The stats line has the form "About <total> results (<time> seconds)", matched by the regexp below
+ def retrieve_result_stats(google, query)
+ stats = (google/"div[@id='resultStats']")
+ /^About ([\S]+) results \s\(([\S]+) seconds\)/.match(stats.inner_text)
+ total, time = $1, $2
+ "Search for #{query} returned #{total} results in #{time} seconds"
+ end
+end
=== added file 'WebSearch/lib/Link.rb'
--- WebSearch/lib/Link.rb 1970-01-01 00:00:00 +0000
+++ WebSearch/lib/Link.rb 2010-05-13 01:49:21 +0000
@@ -0,0 +1,63 @@
+class Link
+ attr_accessor :url, :description, :id, :icon, :shortened_url
+ @@next_id = 0 # sequential id "static"
+
+ def initialize (url = "", description = "", icon = File.expand_path("./icon"))
+ self.url = url
+ self.description = description
+ self.id = @@next_id += 1
+ self.icon = icon
+ self.shortened_url = shorten url
+ end
+
+ def shorten (string, count = 45) # TODO: count as a parameter in .conf file
+ if string.length > count
+ shortened = string.slice(0 .. count-1)
+ shortened + "..." if shortened
+ else
+ string
+ end
+ end
+
+ def self.reset_next_id
+ class_variable_set(:@@next_id, 0) # metaprogramming to reset the instance counter
+ end
+end
+
+class ThumbnailedLink < Link # a nice refactoring with the old YoutubeLink class
+ attr_accessor :image_id, :thumb_url, :thumb_path, :downloaded_thumb
+ @@next_image_id = 0
+
+ def initialize(url = "", description = "", thumb_url = "")
+ super(url, description)
+ self.thumb_url = thumb_url
+ self.image_id = @@next_image_id += 1
+ self.thumb_path = define_thumbnail_path
+ self.downloaded_thumb = false
+ end
+
+ def download_thumbnail # remember that this method is run inside a thread by the caller
+ # download the thumb quietly (-q), name it (-O) '#{image_id}.jpg' and put it in the directory named after the engine
+ IO.popen("wget -q #{self.thumb_url} -O #{self.thumb_path}") do |io| # open the pipe
+ IO.select([io], nil, nil, 0.5) # wait briefly on the pipe before closing it
+ end
+ self.downloaded_thumb = true
+ self.icon = File.expand_path(self.thumb_path)
+ end
+
+ # Thumbnail path composed of the search engine name and the image id.
+ # The engine name is detected by searching for known engine names in the link's url.
+ def define_thumbnail_path
+ directories = %w(youtube webshots flickr) # directory names match the engine names
+ directory = directories.detect {|d| self.url.include?(d)} # look for an engine name in the url
+ "./images/#{directory}/#{self.image_id}.jpg"
+ end
+
+ def downloaded_thumb?
+ self.downloaded_thumb
+ end
+
+ def self.reset_next_image_id
+ class_variable_set(:@@next_image_id, 0) # metaprogramming to reset the instance counter
+ end
+end
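
For reference, the two classes above behave roughly as follows; the Youtube result used here is hypothetical and the relative paths assume the applet's working directory (including its images/ tree for the wget call):

require './lib/Link.rb' # run from the WebSearch directory

link = ThumbnailedLink.new("http://www.youtube.com/watch?v=WwojCsQ3Fa8",
                           "Some video title",
                           "http://i4.ytimg.com/vi/WwojCsQ3Fa8/default.jpg")
puts link.id # sequential id shared with plain Links
puts link.thumb_path # => "./images/youtube/1.jpg" (engine name detected from the url)
puts link.shortened_url # urls longer than 45 characters are cut and suffixed with "..."
link.download_thumbnail # shells out to wget; expects ./images/youtube/ to exist
puts link.icon # now the absolute path of the downloaded thumbnail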
=== added file 'WebSearch/lib/Teoma.rb'
--- WebSearch/lib/Teoma.rb 1970-01-01 00:00:00 +0000
+++ WebSearch/lib/Teoma.rb 2010-05-13 01:49:21 +0000
@@ -0,0 +1,26 @@
+class Teoma < Engine
+
+ def initialize
+ self.base_url = "http://www.teoma.com"
+ self.query_url = "#{self.base_url}/web?q=" # 10 results per page
+ super
+ end
+ # Instead of an offset (the index of the first link), Teoma (ask.com) paginates by a *page* value
+ # The href parameter has the URL and the tag's content has the description.
+ # Teoma results are placed in an <a> tag with id='r(digit)_t'.
+ def retrieve_links(query, page = 1)
+ teoma = Nokogiri::HTML(open("#{self.query_url}#{query}&page=#{page}"))
+ self.stats = retrieve_teoma_result_stats(teoma, query)
+ (teoma/"a[@id$='_t']").each do |res| # any a tag with an id that ends with _t
+ url = res['href']
+ description = res.inner_text
+ self.links << Link.new(url, description)
+ end
+ self.links
+ end
+
+ def retrieve_teoma_result_stats(teoma, query)
+ total = teoma.at("//span[@id='indexLast']").next.next.inner_text
+ "Search for #{query} returned #{total} results"
+ end
+end
=== added file 'WebSearch/lib/Webshots.rb'
--- WebSearch/lib/Webshots.rb 1970-01-01 00:00:00 +0000
+++ WebSearch/lib/Webshots.rb 2010-05-13 01:49:21 +0000
@@ -0,0 +1,27 @@
+class Webshots < Engine
+
+ def initialize
+ self.base_url = "http://www.webshots.com"
+ self.query_url = "#{self.base_url}/search?querySource=community&query=" # 36 results per page
+ super
+ end
+
+ # url, e.g, http://good-times.webshots.com/photo/2500137270102572130
+ # thumb_url, e.g, http://thumb10.webshots.net/t/24/665/1/37/27/2500137270102572130SmNoHt_th.jpg"
+ def retrieve_links(query, offset = 0)
+ webshots = Nokogiri::HTML(open("#{self.query_url}#{query}&start=#{offset}"))
+ self.stats = retrieve_webshots_result_stats(webshots, query)
+ (webshots/"a[@class='searchListItemLink']").each do |res|
+ url = res['href']
+ description = res['title']
+ thumb_url = res.at("img[@class='searchListItemImg']")['src']
+ self.links << ThumbnailedLink.new(url, description, thumb_url)
+ end
+ self.links
+ end
+
+ def retrieve_webshots_result_stats(webshots, query)
+ total = webshots.at("span[@class='resultsNo']/strong").inner_text
+ "Search for #{query} returned #{total} results"
+ end
+end
=== added file 'WebSearch/lib/Wikipedia.rb'
--- WebSearch/lib/Wikipedia.rb 1970-01-01 00:00:00 +0000
+++ WebSearch/lib/Wikipedia.rb 2010-05-13 01:49:21 +0000
@@ -0,0 +1,28 @@
+class Wikipedia < Engine
+
+ attr_accessor :number_of_fetched_links
+
+ def initialize
+ self.number_of_fetched_links = 100
+ self.base_url = "http://en.wikipedia.org"
+ self.query_url = "#{self.base_url}/w/index.php?title=Special:Search&search=" # parameter "limit" results per page
+ super
+ end
+
+ # Fetch links from the English Wikipedia. It is necessary to set a user agent, otherwise the connection is refused with Forbidden (403)
+ def retrieve_links(query, offset = 0)
+ wikipedia = Nokogiri::HTML(open("#{self.query_url}#{query}&offset=#{offset}&limit=#{self.number_of_fetched_links}", 'User-Agent' => 'ruby'))
+ self.stats = retrieve_webshots_result_wikipedia(wikipedia, query)
+ (wikipedia/"ul[@class='mw-search-results']/li/a").each do |res|
+ url = res['href']
+ description = res['title']
+ self.links << Link.new("#{self.query_url}#{url}", description)
+ end
+ self.links
+ end
+
+ def retrieve_webshots_result_wikipedia (wikipedia, query)
+ total = wikipedia.at("div[@class='results-info']/ul/li/b").next.next.inner_text
+ "Search for #{query} returned #{total} results"
+ end
+end
=== added file 'WebSearch/lib/Yahoo.rb'
--- WebSearch/lib/Yahoo.rb 1970-01-01 00:00:00 +0000
+++ WebSearch/lib/Yahoo.rb 2010-05-13 01:49:21 +0000
@@ -0,0 +1,26 @@
+class Yahoo < Engine
+
+ def initialize
+ self.base_url = "http://search.yahoo.com"
+ self.query_url = "#{self.base_url}/search?p=" # 10 results per page
+ super
+ end
+ # Fetch links from Yahoo!. Since Yahoo! does not provide an in-url way to fetch more links than the 10
+ # as Google does (&num=amount_to_fetch), this method will be called every time that 10 new results need to be shown
+ def retrieve_links(query, offset = 1)
+ yahoo = Nokogiri::HTML(open("#{self.query_url}#{query}&b=#{offset}"))
+ self.stats = retrieve_yahoo_result_stats(yahoo, query)
+ (yahoo/"div[@class~='res']").each do |res| # divs are usually from 'res' class but some sub-results are 'res_indent' class
+ url = (res/"span[@class='url']").inner_text
+ description = (res/"h3/a").inner_text
+ self.links << Link.new(url, description)
+ end
+ self.links
+ end
+
+ # Retrieve information from the Yahoo! search stats
+ def retrieve_yahoo_result_stats (yahoo, query)
+ total = (yahoo/"strong[@id='resultCount']").inner_text
+ "Search for #{query} returned #{total} results"
+ end
+end
=== added file 'WebSearch/lib/Youtube.rb'
--- WebSearch/lib/Youtube.rb 1970-01-01 00:00:00 +0000
+++ WebSearch/lib/Youtube.rb 2010-05-13 01:49:21 +0000
@@ -0,0 +1,28 @@
+class Youtube < Engine
+
+ def initialize
+ self.base_url = "http://www.youtube.com"
+ self.query_url = "#{self.base_url}/results?search_query=" # 20 results per page
+ super
+ end
+
+ # url, e.g, /watch?v=WwojCsQ3Fa8
+ # thumb_url, e.g, "http://i4.ytimg.com/vi/WwojCsQ3Fa8/default.jpg"
+ def retrieve_links(query, page = 1)
+ youtube = Nokogiri::HTML(open("#{self.query_url}#{query}&page=#{page}"))
+ self.stats = retrieve_youtube_result_stats(youtube, query)
+ (youtube/"a[@id^='video-long-title-']").each do |res| # 'a' tag has id which starts with "video-long-title-"
+ url = res['href']
+ description = res.inner_text
+ video_id = url.split('=').last # /watch?v=WwojCsQ3Fa8 => WwojCsQ3Fa8 => video_id
+ thumb_url = "http://i4.ytimg.com/vi/#{video_id}/default.jpg"
+ self.links << ThumbnailedLink.new("#{self.base_url}#{url}", description, thumb_url)
+ end
+ self.links
+ end
+
+ def retrieve_youtube_result_stats (youtube, query)
+ total = youtube.at("div[@class='name']").inner_text.split.last
+ "Search for #{query} returned #{total} results"
+ end
+end
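
With this layout, adding another engine is mostly a matter of dropping one more subclass into WebSearch/lib and listing it in Applet#initialize and Engine#connect. A hypothetical skeleton following the pattern above (the site and the CSS selector are placeholders, not a working scraper):

# WebSearch/lib/Example.rb -- hypothetical engine following the pattern above
class Example < Engine

  def initialize
    self.base_url = "http://search.example.com" # placeholder site
    self.query_url = "#{self.base_url}/search?q="
    super
  end

  def retrieve_links(query, offset = 0)
    page = Nokogiri::HTML(open("#{self.query_url}#{query}&start=#{offset}"))
    self.stats = "Search for #{query}" # a real engine would also scrape the result stats here
    (page/"a[@class='result']").each do |res| # placeholder selector
      self.links << Link.new(res['href'], res.inner_text)
    end
    self.links
  end
end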