Storing the output of 'curl' commands directly in shell variables is
very inefficient, and makes gravity.sh use much more RAM any time there
is an update to the block lists (and especially on the first run). Store
the raw blocklists in a temporary file on disk, and process those.
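
A minimal sketch of the file-based approach described above. The helper names (fetch_list, extract_domains) and the sample data are assumptions made for illustration, not the actual code in gravity.sh. The download call is left commented out so the sketch runs without a network connection.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one list straight to disk instead of
# capturing it in a shell variable.
fetch_list() {
    curl -s -o "$2" "$1"
}

# Hypothetical helper: stream the raw file through a pipeline, dropping
# carriage returns, comments and blank lines, and keeping the last
# field (the domain) of each remaining line.
extract_domains() {
    tr -d '\r' < "$1" | sed 's/#.*$//' | awk 'NF { print $NF }' | sort -u
}

raw=$(mktemp)
# fetch_list "https://example.com/hosts.txt" "$raw"   # assumed URL
# Sample data standing in for a downloaded list:
printf '%s\n' '# sample list' '127.0.0.1 ads.example.com' \
    '0.0.0.0 tracker.example.net' > "$raw"

extract_domains "$raw"
rm -f "$raw"
```

Because every stage reads from a pipe, memory use stays roughly constant no matter how large the raw list is.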
This will increase the swap file to 500MB before downloading the lists.
Most of the issue comes from the mahakala list, which is very large. If
no swap file is found, one is created.
Prepend "^" to start of latentWhitelist.txt lines.
The -x switch requires a full line match of the regexp, whereas -w
will try to find the match somewhere in the line, looking for word
breaks. Combined with turning the whitelist lines into full regexps,
this results in significantly faster parsing.
Having "^" prepended to the lines also keeps false whitelisting from
occurring, such as the following example:
If whitelist.txt contains "google.com" it would whitelist many other
sites that end in "google.com" as long as there is a non-word
character preceding the google (such as "-", or ".").
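
The difference can be reproduced with a small sketch. The file contents and variable names here are assumptions made for illustration. Only the -w and -x switches and the "^" prefix come from the description above.

```shell
#!/usr/bin/env bash
blocklist=$(mktemp)
whitelist=$(mktemp)
anchored=$(mktemp)

# Sample data (hypothetical): one whitelisted domain and a lookalike.
printf '%s\n' 'google.com' 'ads-google.com' 'tracker.example' > "$blocklist"
printf '%s\n' 'google.com' > "$whitelist"

# Old behaviour: -w matches "google.com" anywhere it sits between
# non-word characters, so "ads-google.com" is removed as well.
loose=$(grep -vwf "$whitelist" "$blocklist")

# New behaviour: prepend "^" to each whitelist line and require a
# full-line match with -x, so only the exact entry is removed.
sed 's/^/^/' "$whitelist" > "$anchored"
strict=$(grep -vxf "$anchored" "$blocklist")

printf 'remaining with -w:\n%s\n' "$loose"
printf 'remaining with -x:\n%s\n' "$strict"
rm -f "$blocklist" "$whitelist" "$anchored"
```

With -w the lookalike "ads-google.com" disappears from the blocklist, while with -x it stays blocked.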
manually running gravity.sh
This will print "Getting $domain list... " for each domain, followed
by either "Done" if data was received and validated, or "Skipping
list because it does not have any new entries" if no updates were
needed.
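
A rough sketch of that status output. The helper names are hypothetical, and a non-empty file is used as a stand-in for "data was received and validated", which is an assumption and not the script's real check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the domain out of a list URL
# (the third "/"-separated field).
domain_of() {
    echo "$1" | cut -d'/' -f3
}

# Hypothetical helper: print the status line for one list.
# $1 = domain, $2 = file holding whatever was fetched for it.
report_list() {
    printf 'Getting %s list... ' "$1"
    if [ -s "$2" ]; then
        echo "Done"
    else
        echo "Skipping list because it does not have any new entries"
    fi
}

fetched=$(mktemp)
echo '127.0.0.1 ads.example.com' > "$fetched"   # sample data
report_list "$(domain_of 'https://example.com/hosts.txt')" "$fetched"
rm -f "$fetched"
```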
I also wanted to replace the for loop iterating over indices with
something like:
`for url in "${sources[@]}"`
It made the use of `$i` in the save location more annoying though.
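
One possible middle ground, sketched under assumptions: in bash, `"${!sources[@]}"` expands to the array's indices, so the loop can still reach `$i` without a C-style counter. The URLs and the save-location naming below are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical source list.
sources=('https://example.com/hosts.txt'
         'https://example.org/list.txt')

locations=()
for i in "${!sources[@]}"; do
    url=${sources[$i]}
    domain=$(echo "$url" | cut -d'/' -f3)
    # Assumed naming scheme that needs both the index and the domain.
    saveLocation="list.$i.$domain.txt"
    locations+=("$saveLocation")
    echo "$url -> $saveLocation"
done
```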
Some people use a hostname other than raspberrypi, so their hostname
did not resolve to 127.0.0.1. I replaced that hardcoded value with a
variable so that does not happen.
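
A minimal sketch of the idea, assuming the hostname is looked up once and reused. The variable and helper names are illustrative and may differ from what the script uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the loopback entry for a given hostname.
local_entry() {
    printf '127.0.0.1 %s\n' "$1"
}

# Ask the system for its own name instead of assuming "raspberrypi".
# uname -n is used as a fallback where the hostname command is missing.
piholeHostname=$(hostname 2>/dev/null || uname -n)

local_entry "$piholeHostname"
```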
I also added a few comments and minor formatting adjustments.
Pushing files so they are available when the new article gets posted.
If the Pi's loopback is set in the hosts file, clients using it as a
DNS server will try to connect to their own loopback, which does not
have a Web server. So the real IP of the Pi is used. It is
recommended to use a static IP since this will be acting as a server.
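
As an illustration only, the hosts entries could be generated like this. The helper names and the 192.168.1.10 address are assumptions, and `hostname -I` is specific to Linux systems such as Raspbian.

```shell
#!/usr/bin/env bash
# Hypothetical helper: first address reported by the system.
# Not called below, so the sketch also runs where hostname -I is missing.
detect_ip() {
    hostname -I | awk '{ print $1 }'
}

# Hypothetical helper: read domains on stdin, write hosts-file lines
# that point each domain at the given address.
to_hosts_format() {
    awk -v ip="$1" 'NF { print ip, $1 }'
}

# piholeIP=$(detect_ip)
piholeIP=192.168.1.10   # assumed static address of the Pi
printf '%s\n' 'ads.example.com' 'tracker.example.net' | to_hosts_format "$piholeIP"
```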
Made one small change from some hardcoded values to a variable.
Originally, I had this set to /run/shm (in RAM) but ran into errors
when the list reached 900,000 entries.
Then I moved it to /tmp.
Finally, I decided to just put the files in the pihole dir so they are
available after reboots. This will help with only downloading the
lists when absolutely needed--respecting the bandwidth of the people
serving the lists.
It is also possible to add addn-hosts=/path/to/hosts.conf within the
dnsmasq.conf file if you don't want to use hosts. For simplicity and
speed, I just use the regular hosts file.
Still need to get lighttpd to use IPv6. I am doing this because some
ads can get through using IPv6 if only the IPv4 version is blocked.
Performance also seems fine even though it doubles the file size.
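
The doubling can be seen in a short sketch that writes one IPv4 and one IPv6 line per domain. Both addresses and the helper name are hypothetical, and this is not necessarily how the script builds its file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit an IPv4 and an IPv6 hosts entry for every
# domain read on stdin, which is why the output has twice the lines.
dual_stack_hosts() {
    awk -v v4="$1" -v v6="$2" 'NF { print v4, $1; print v6, $1 }'
}

# Assumed addresses of the Pi.
printf '%s\n' 'ads.example.com' 'tracker.example.net' |
    dual_stack_hosts 192.168.1.10 fd00::10
```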
Also added a few comments for better documentation.