linuxdcpp-team team mailing list archive
Message #08986
[Bug 1774502] Re: Free GeoIP Database Format Change
<eMTee> I've checked klondike's geoip lib and, as I perceive it, it does the lookups straight from the data file. That doesn't sound too efficient. Also I'm not sure the whole thing is thread-safe at all. So maybe Maxmind's lib is the only option...
<cologic> I agree with you (or, well, your conclusion -- whatever your reasoning was). klondike's design is, as he says, "This is a purposefully slow implementation for memory constrained environments", and it achieves that by fseek()ing and similar around the file in a thread-unsafe way, to avoid mmap() and other access methods. The mmdb_t data structure itself can only meaningfully be used by one thread at a time, so the choice ends up being either loading the files multiple times -- which, actually, sort of works with his design, since it is so low-memory-usage -- or keeping fewer mmdb_t objects around than threads and muxing access to them somehow (say, mutexes: "TODO: mutex handling").
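The "fewer objects than threads, mux access with a mutex" option could look roughly like the sketch below. It is only an illustration: `Guarded` and `SeekingReader` are hypothetical names, and `SeekingReader` is a stand-in for any reader that keeps a file position as mutable state, not klondike's actual mmdb_t.

```cpp
#include <mutex>
#include <utility>

// Hypothetical stand-in for a reader whose lookups mutate shared state
// (here: a remembered file position), so it is not safe to share unguarded.
struct SeekingReader {
    long position = 0;
    long seekAndRead(long offset) {
        position = offset;  // models an fseek()-style state change
        return position;
    }
};

// Wraps one object and only hands it out while a mutex is held.
template <typename T>
class Guarded {
public:
    explicit Guarded(T value) : value_(std::move(value)) {}

    // Run fn with exclusive access to the wrapped object.
    template <typename Fn>
    auto with(Fn&& fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        return fn(value_);
    }

private:
    std::mutex mutex_;
    T value_;
};
```

Every lookup then serializes on the one mutex, which is the cost of this option compared with loading a separate copy per thread.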
<cologic> It's a reasonable tradeoff for a different environment than DC++ lives in these days.
<cologic> He jumps through endless hoops just not to keep anything he doesn't need in memory, IMO to the code/design's detriment in this context.
<cologic> The whole dbip-country-lite-2020-05.mmdb is about 5MB, which seems completely acceptable to mmap() or just load completely into memory and not deal with all those seeks, mutex questions, etc.
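A minimal sketch of the "just load it completely into memory" option, assuming plain standard C++ file I/O rather than mmap(). The function name `loadWholeFile` and the `maxBytes` sanity cap are illustrative assumptions, not part of DC++ or any geoip library.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read an entire file into a byte buffer. maxBytes is an arbitrary
// (hypothetical) cap so a bogus file cannot exhaust memory.
std::vector<std::uint8_t> loadWholeFile(const std::string& path,
                                        std::size_t maxBytes = 64u * 1024u * 1024u) {
    // Open at the end so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) throw std::runtime_error("cannot open " + path);

    const std::streamoff size = in.tellg();
    if (size < 0 || static_cast<std::uintmax_t>(size) > maxBytes)
        throw std::runtime_error("unexpected size for " + path);

    std::vector<std::uint8_t> data(static_cast<std::size_t>(size));
    in.seekg(0);
    if (size > 0 && !in.read(reinterpret_cast<char*>(data.data()), size))
        throw std::runtime_error("short read on " + path);
    return data;
}
```

Once loaded, the buffer is never modified, so several threads can read it concurrently without any seek state or locking.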
<cologic> Just to take the most obvious option (the same overall approach, just, all in memory). Other options exist too.
<cologic> The completely unoptimized representation (text-based, no fancy reused DAG-like substructures, etc) from their CSV file is still only 17MB. So not saying this is a great option, but just a dumb CSV parser on that could work too.
<cologic> line by line of 2.17.115.0,2.17.115.255,GB -- trivial.
<cologic> The other concern I have with the style of klondike's code is that it's full of basically untrusted-data-driven pointer-chasing (via seeks at the moment, but still) which as klondike acknowledges can result in exponential blowup.
<cologic> It's not his fault -- he did what one could with the format -- but from working on other code dealing with a conceptually similar file format, I'm not a fan of it.
<cologic> There's something to be said for the relative simplicity/verifiability/fewer-weird-failure-modes of a constrained CSV parser which builds an in-memory representation with some reasonable std::foo data structure.
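As a rough illustration of such a constrained parser, the sketch below reads lines shaped like `2.17.115.0,2.17.115.255,GB` into a `std::map` keyed by the first address of each range. All names (`Range`, `GeoTable`, `parseIPv4`, `loadCsv`, `lookup`) are hypothetical, and it assumes IPv4-only handling: any line that does not parse strictly is skipped rather than interpreted.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <map>
#include <optional>
#include <sstream>
#include <string>

struct Range {
    std::uint32_t last;   // inclusive end of the range
    std::string country;  // two-letter code as found in the file
};
using GeoTable = std::map<std::uint32_t, Range>;  // keyed by first address

// Strictly parse a dotted-quad IPv4 address; reject anything else.
std::optional<std::uint32_t> parseIPv4(const std::string& s) {
    std::uint32_t value = 0;
    int octets = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] < '0' || s[i] > '9') return std::nullopt;
        std::uint32_t octet = 0;
        int digits = 0;
        while (i < s.size() && s[i] >= '0' && s[i] <= '9') {
            octet = octet * 10 + static_cast<std::uint32_t>(s[i] - '0');
            if (++digits > 3 || octet > 255) return std::nullopt;
            ++i;
        }
        value = (value << 8) | octet;
        ++octets;
        if (i == s.size()) break;
        if (s[i] != '.' || octets == 4) return std::nullopt;
        if (++i == s.size()) return std::nullopt;  // trailing dot
    }
    if (octets != 4) return std::nullopt;
    return value;
}

// Build the table, silently skipping every line that fails validation.
GeoTable loadCsv(std::istream& in) {
    GeoTable table;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        const auto c1 = line.find(',');
        if (c1 == std::string::npos) continue;
        const auto c2 = line.find(',', c1 + 1);
        if (c2 == std::string::npos) continue;
        const auto first = parseIPv4(line.substr(0, c1));
        const auto last = parseIPv4(line.substr(c1 + 1, c2 - c1 - 1));
        const std::string country = line.substr(c2 + 1);
        if (!first || !last || *first > *last || country.size() != 2) continue;
        table[*first] = Range{*last, country};
    }
    return table;
}

// Find the range starting at or before ip, then check that it covers ip.
std::optional<std::string> lookup(const GeoTable& table, std::uint32_t ip) {
    auto it = table.upper_bound(ip);
    if (it == table.begin()) return std::nullopt;
    --it;
    if (ip > it->second.last) return std::nullopt;
    return it->second.country;
}
```

There are no offsets or pointers read from the file here, so malformed input can at worst produce skipped lines or a missing lookup result.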
<cologic> But I suspect that if one fuzzed most of these geoip format parsers, the results would be dire.
--
You received this bug notification because you are a member of
Dcplusplus-team, which is subscribed to DC++.
https://bugs.launchpad.net/bugs/1774502
Title:
Free GeoIP Database Format Change
Status in DC++:
Confirmed
Bug description:
"Updated versions of the GeoLite Legacy databases are now only available to redistribution license customers, although anyone can continue to download the March 2018 GeoLite Legacy builds. Starting January 2, 2019, the last build will be removed from our website. GeoLite Legacy database users will need to switch to the GeoLite2 or commercial GeoIP databases and update their integrations by January 2, 2019."
https://dev.maxmind.com/geoip/legacy/geolite/
To manage notifications about this bug go to:
https://bugs.launchpad.net/dcplusplus/+bug/1774502/+subscriptions