Removes more duplicates that uniq could not find

http://jacobsalmela.com/raspberry-pi-ad-blocker-advanced-setup/#comment-1860675175

Thanks to napgravy for figuring this out.  It seems DOS-style line
endings (trailing carriage returns) prevented uniq from recognizing
the duplicates.  Stripping them before sorting reduces the ad domains
from ~140,000 to ~120,000, and the resulting list is much more accurate.
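
The effect described above can be reproduced with a small sketch. The sample domains and the temporary file are hypothetical, and the carriage return is built with printf so the snippet also runs in shells without bash's $'...' quoting (the commit itself uses sed $'s/\r$//').

```shell
# Compare bytes, not locale collation, so the result is the same everywhere
export LC_ALL=C

# Hypothetical sample: the same domain twice, once with a DOS (CRLF) ending
sample=$(mktemp)
printf 'ads.example.com\r\nads.example.com\ntracker.example.net\n' > "$sample"

# A literal carriage return, used in the sed expression below
cr=$(printf '\r')

# Before: "ads.example.com<CR>" and "ads.example.com" are different lines to uniq
before=$(sort "$sample" | uniq | wc -l | tr -d ' ')

# After: strip the trailing carriage return first, then de-duplicate
after=$(sed "s/${cr}\$//" "$sample" | sort | uniq | wc -l | tr -d ' ')

echo "before: $before, after: $after"   # prints "before: 3, after: 2"
rm -f "$sample"
```

The key point is the order: the line endings have to be normalized before sort and uniq see the data.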
jacobsalmela 2015-02-17 16:10:28 -06:00
parent b7573a533a
commit 47baa1a6fd


@@ -27,7 +27,7 @@ curl -A 'Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0' -
 # Sort the aggregated results and remove any duplicates
 echo "Removing duplicates and formatting to address=/<ad domain>/"$piholeIP
-cat /tmp/matter.txt | sort | uniq | sed '/^$/d' | awk -v "IP=$piholeIP" '{sub(/\r$/,""); print "address=/"$0"/"IP}' > /tmp/andLight.txt
+cat /tmp/matter.txt | sed $'s/\r$//' | sort | uniq | sed '/^$/d' | awk -v "IP=$piholeIP" '{sub(/\r$/,""); print "address=/"$0"/"IP}' > /tmp/andLight.txt
 # Count how many domains were added so it can be displayed to the user
 numberOfAdsBlocked=$(cat /tmp/andLight.txt | wc -l | sed 's/^[ \t]*//')
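
A side note on why the duplicates were hard to spot: in the old pipeline the awk step already removed the carriage return, but only after uniq had run, so the surviving duplicates came out as identical address= lines. The sketch below runs both orderings on a hypothetical two-line list. The IP address and sample domain are placeholders, and the carriage return is again built with printf instead of bash's $'...' quoting.

```shell
# Compare bytes, not locale collation, so the result is the same everywhere
export LC_ALL=C

piholeIP="192.168.1.2"   # hypothetical address, for illustration only
cr=$(printf '\r')        # a literal carriage return

# Hypothetical list: one domain, once with CRLF and once with LF
list=$(mktemp)
printf 'ads.example.com\r\nads.example.com\n' > "$list"

# Old ordering: uniq runs while the carriage return is still attached
old=$(sort "$list" | uniq | sed '/^$/d' \
  | awk -v "IP=$piholeIP" '{sub(/\r$/,""); print "address=/"$0"/"IP}')

# New ordering: the carriage return is stripped before sort and uniq
new=$(sed "s/${cr}\$//" "$list" | sort | uniq | sed '/^$/d' \
  | awk -v "IP=$piholeIP" '{sub(/\r$/,""); print "address=/"$0"/"IP}')

old_count=$(printf '%s\n' "$old" | wc -l | tr -d ' ')
new_count=$(printf '%s\n' "$new" | wc -l | tr -d ' ')
echo "old: $old_count line(s), new: $new_count line(s)"   # prints "old: 2 line(s), new: 1 line(s)"
rm -f "$list"
```

With the new ordering the sub(/\r$/,"") call in awk no longer has anything to remove, but it is harmless to leave in place.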