← Back to team overview

openlp-core team mailing list archive

Re: [Merge] lp:~trb143/openlp/duphash into lp:openlp

 

Review: Needs Fixing



Diff comments:

> 
> === modified file 'openlp/plugins/songs/forms/duplicatesongremovalform.py'
> --- openlp/plugins/songs/forms/duplicatesongremovalform.py	2017-06-09 06:06:49 +0000
> +++ openlp/plugins/songs/forms/duplicatesongremovalform.py	2017-08-08 19:40:26 +0000
> @@ -174,6 +174,17 @@
>                      self.duplicate_search_progress_bar.setValue(1)
>                      self.notify_no_duplicates()
>                      return
> +
> +                search_results = \
> +                    self.plugin.manager.session.query(func.count(Song.id), Song).group_by(Song.song_hash).all()
> +                for y, x in search_results:
> +                    hash = x.song_hash
> +                    if y > 1:
> +                        dupls = self.plugin.manager.session.query(Song).filter(Song.song_hash==hash).all()
> +                        for d in dupls:
> +                            print(d.title)
> +                        print("-------")

You know my "love" of print statements...

> +
>                  # With x songs we have x*(x - 1) / 2 comparisons.
>                  max_progress_count = max_songs * (max_songs - 1) // 2
>                  self.duplicate_search_progress_bar.setMaximum(max_progress_count)
> 
> === modified file 'openlp/plugins/songs/lib/__init__.py'
> --- openlp/plugins/songs/lib/__init__.py	2017-08-01 20:59:41 +0000
> +++ openlp/plugins/songs/lib/__init__.py	2017-08-08 19:40:26 +0000
> @@ -390,6 +390,11 @@
>          song.add_author(author)
>      if song.copyright:
>          song.copyright = CONTROL_CHARS.sub('', song.copyright).strip()
> +    import hashlib

Not sure why we have the import here...

> +    m = hashlib.md5()
> +    m.update(song.search_lyrics.encode('utf-8'))
> +    print("{a} {b}".format(a=str(m.digest_size), b=str(m.hexdigest())))
> +    song.song_hash = str(m.hexdigest())

Remove the print and condense this down to:
song.song_hash = hashlib.md5(song.search_lyrics.encode('utf-8')).hexdigest()

My comment still stands, deduplication is more than just comparing hashes, and there are cases where duplication is intentional.

>  
>  
>  def get_encoding(font, font_table, default_encoding, failed=False):


-- 
https://code.launchpad.net/~trb143/openlp/duphash/+merge/328736
Your team OpenLP Core is subscribed to branch lp:openlp.


References