#1 2009-12-05 9:41 pm
- pedigree
- uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
- From: New Zealand
- Registered: 2008-04-16
- Posts: 7,056
Gmail goodness, well, from us anyway
As some of you are aware, Gmail sucks for spam. They don't seem to care about shutting down email accounts very quickly, and they breach RFC by allowing all sorts of random . and + in email addresses that all "normalize" down to the same account:
joe.blogs+spammer@gmail.com
j.o.e.blogs+spammer@gmail.com
j.oe.bl.ogs+spa.m.m.e.r@gmail.com
joeblog.....s@gmail.com
These are all the same email account. Until now, they "fooled" our database because they weren't identical to the stored entry.
Well, good news: I've put normalization into the API.
Now, the above email addresses will all give positive results to a query if joeblogs@gmail.com is listed.
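A minimal sketch of the normalization described above, in Python. This is an illustration, not the actual API code: the function name `normalize_gmail` is hypothetical, and it assumes only that Gmail ignores dots and anything from the first + onward in the local part.

```python
def normalize_gmail(address: str) -> str:
    """Collapse Gmail dot and plus variants to one canonical form (sketch)."""
    address = address.strip().lower()
    if "@" not in address:
        return address  # not an email address; leave it alone
    local, _, domain = address.rpartition("@")
    if domain != "gmail.com":
        return f"{local}@{domain}"  # other domains are untouched in this sketch
    local = local.split("+", 1)[0]  # drop the +tag
    local = local.replace(".", "")  # dots in the local part are ignored
    return f"{local}@{domain}"


variants = [
    "joe.blogs+spammer@gmail.com",
    "j.o.e.blogs+spammer@gmail.com",
    "j.oe.bl.ogs+spa.m.m.e.r@gmail.com",
    "joeblog.....s@gmail.com",
]
# Every variant listed in the post collapses to the same key.
canonical = {normalize_gmail(v) for v in variants}
```

With this, a lookup only has to compare the normalized query against normalized stored entries.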
The homepage will now show the original email address in [ ] under the normalized version of it. This is for your information, but mainly so that Google will index it and return search results to people getting spammed.
I know, I'm awesome, thanks.
Offline
#2 2009-12-06 2:17 am
- hhopper
- Member
- From: Florida
- Registered: 2008-11-20
- Posts: 151
- Website
Re: Gmail goodness, well, from us anyway
Actually, you ARE awesome!
Hop
Offline
#3 2009-12-06 2:13 pm
- MacHeadCase
- Member
- From: Montréal, Québec
- Registered: 2008-09-07
- Posts: 346
- Website
Re: Gmail goodness, well, from us anyway
Bravo, pedigree! Bravo!!!
Offline
#4 2009-12-06 2:46 pm
- TVwas.com
- Member
- Registered: 2009-10-24
- Posts: 6
Re: Gmail goodness, well, from us anyway
We are not worthy! We are not worthy!
But we'll use the function with gratitude...
Offline
#5 2009-12-06 2:48 pm
- Alessandra
- Member
- From: Chicago, Illinois, USA
- Registered: 2009-11-29
- Posts: 165
- Website
Re: Gmail goodness, well, from us anyway
Very nice!
Offline
#6 2009-12-06 11:31 pm
- techtirie
- Member
- From: Ireland
- Registered: 2009-10-20
- Posts: 56
- Website
Re: Gmail goodness, well, from us anyway
Thanks
you are a star.
Offline
#7 2009-12-13 6:31 am
- nolimitssjca
- Member
- From: San Jose
- Registered: 2009-05-31
- Posts: 6
- Website
Re: Gmail goodness, well, from us anyway
Back to normal
Offline
#8 2019-12-14 6:11 am
- Visman
- Member
- Registered: 2019-12-14
- Posts: 6
Re: Gmail goodness, well, from us anyway
Do you only do this for Google?
Here is some related information:
1. Addresses of this form, using various separators between the base name and the tag, are supported by several email services, including Runbox (plus), Gmail (plus),[16] Rackspace Email (plus), Yahoo! Mail Plus (hyphen),[17] Apple's iCloud (plus), Outlook.com (plus),[18] ProtonMail (plus),[19] FastMail (plus and Subdomain Addressing),[20] MMDF (equals), Qmail and Courier Mail Server (hyphen).[21][22] Postfix and Exim allow configuring an arbitrary separator from the legal character set.[23][24]
https://en.wikipedia.org/wiki/Email_address#Address_tags
2. username @ googlemail.com === username @ gmail.com
3. username @ yandex.ru === username @ yandex.com (I personally checked by sending myself an email on the com domain, although my mailbox is on the ru domain. Perhaps all Yandex domains are duplicates of the ru domain.)
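The three points above could be captured as two small lookup tables, sketched here in Python. The domain names attached to each service are assumptions made for illustration (the quoted text names services, not domains), and `strip_tag` is a hypothetical helper, not part of any existing API.

```python
# Domains that deliver to the same mailbox as another domain (points 2 and 3).
DOMAIN_ALIASES = {
    "googlemail.com": "gmail.com",
    "yandex.com": "yandex.ru",
}

# Tag separator per provider (point 1). The domains are assumed for illustration.
TAG_SEPARATORS = {
    "gmail.com": "+",
    "outlook.com": "+",
    "icloud.com": "+",
    "protonmail.com": "+",
    "fastmail.com": "+",
    "yahoo.com": "-",
}


def strip_tag(address: str) -> str:
    """Fold domain aliases and remove the provider-specific tag (sketch)."""
    local, _, domain = address.strip().lower().rpartition("@")
    domain = DOMAIN_ALIASES.get(domain, domain)
    separator = TAG_SEPARATORS.get(domain)
    if separator:
        local = local.split(separator, 1)[0]
    return f"{local}@{domain}"
```

Domains missing from `TAG_SEPARATORS` are returned with their local part unchanged, so an unknown provider is never over-normalized.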
Offline
#9 2019-12-16 12:58 am
- pedigree
- uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
- From: New Zealand
- Registered: 2008-04-16
- Posts: 7,056
Re: Gmail goodness, well, from us anyway
Thanks for this. I'll look at how I can update the code to handle this. The API strips plus tags (everything from the + up to the @) for all domains, but it's very useful to know which domains do further normalisation.
Offline
#10 2019-12-24 4:41 pm
- Visman
- Member
- Registered: 2019-12-14
- Posts: 6
Re: Gmail goodness, well, from us anyway
I made an address normalizer: https://github.com/MioVisman/NormEmail
Results so far:
// some string
ExampLe => example
exaMple.COM => example.com
.example.com => .example.com
@example.com => example.com
"example.com => "example.com
"USER+++NAME@EXAMpLE.com => "USER+++NAME@example.com
googlemail.com => gmail.com
pm.me => protonmail.com
yandex.tj => yandex.ru
ya.ru => yandex.ru
// Unicode
ПОЛЬЗОВАТЕЛЬ@домен.РУ => пользователь@xn--d1acufc.xn--p1ag
пользователь+тег@домен.ру => пользователь@xn--d1acufc.xn--p1ag
// Gmail
User.namE+tag@gmail.com => username@gmail.com
u.sern.ame+tag+tag+tag@googlemail.com => username@gmail.com
// Protonmail
u_s.e-rname+tag@pm.me => username@protonmail.com
user-name@protonmail.ch => username@protonmail.com
// Yahoo (.com, .ae, .at, ...)
username-tag@yahoo.com => username@yahoo.com
user+name-tag@yahoo.fr => user+name@yahoo.fr
// Yandex (13 domains)
user.name+tag@яндекс.рф => user-name@yandex.ru
user-name@yandex.com => user-name@yandex.ru
username@ya.ru => username@yandex.ru
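For the Unicode rows above, the domain part ends up in its ASCII (Punycode) form while the local part is only lowercased. A rough Python sketch of that one step, using the standard library's `idna` codec, is below. It is not a port of NormEmail, and `normalize_idn` is a hypothetical name.

```python
def normalize_idn(address: str) -> str:
    """Lowercase the address and convert a Unicode domain to ASCII (sketch)."""
    address = address.strip()
    local, separator, domain = address.rpartition("@")
    if not separator:
        return address.lower()  # no @ present; nothing to split
    # Lowercase first: the codec passes pure-ASCII labels through unchanged.
    ascii_domain = domain.lower().encode("idna").decode("ascii")
    return f"{local.lower()}@{ascii_domain}"
```

Tag stripping and provider-specific rules would run after this step, on the already lowercased address.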
Offline
#11 2019-12-25 12:51 am
- pedigree
- uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
- From: New Zealand
- Registered: 2008-04-16
- Posts: 7,056
Re: Gmail goodness, well, from us anyway
Awesome. I'm going to port all of this to the API.
I was about to ask if these are confirmed to be all the same, but you answered that:
| a.l.e.x.a.l.e.x.2.091@yandex.ru | 1 | 2019-08-01 13:40:03 |
| a.l.e.x.a.l.ex2.0.91@yandex.ru | 1 | 2019-08-12 16:35:04 |
| a.l.e.x.a.le.x.2091@yandex.ru | 1 | 2019-08-03 01:12:34 |
| a.l.e.x.al.ex.200.1@yandex.ru | 1 | 2019-08-07 15:05:16 |
| a.l.e.x.al.ex2.0.0.1@yandex.ru | 1 | 2019-08-04 13:10:19 |
| a.l.e.x.ale.x.2.0.0.1@yandex.ua | 2 | 2019-08-09 07:10:22 |
| a.l.e.x.alex.2.00.1@yandex.ru | 1 | 2019-08-07 15:05:06 |
| a.l.e.x.alex.200.1@yandex.ua | 1 | 2019-08-01 14:23:57 |
| a.l.e.x.alex2.091@yandex.ru | 1 | 2019-08-04 20:00:10 |
| a.l.e.xa.l.e.x.20.0.1@yandex.ua | 1 | 2019-08-10 12:35:14 |
| a.l.e.xa.le.x.2.0.0.1@yandex.com | 1 | 2019-08-08 23:05:06 |
| a.l.e.xa.le.x.2.0.0.1@yandex.ru | 1 | 2019-08-07 08:42:31 |
| a.l.e.xal.e.x2.0.91@yandex.ua | 1 | 2019-08-11 02:25:02 |
| a.l.e.xalex20.0.1@yandex.com | 2 | 2019-08-11 11:03:10 |
| a.l.ex.a.le.x.2.00.1@yandex.ru | 1 | 2019-08-12 07:20:44 |
| a.l.ex.a.le.x.20.0.1@yandex.ru | 1 | 2019-07-31 19:28:36 |
| a.l.ex.a.lex.20.0.1@yandex.ru | 1 | 2019-08-03 02:15:11 |
| a.l.ex.al.e.x200.1@yandex.com | 2 | 2019-08-11 05:51:33 |
| a.l.ex.alex20.91@yandex.com | 1 | 2019-08-08 12:10:19 |
| a.l.exa.l.e.x.2.0.0.1@yandex.ua | 1 | 2019-08-07 19:25:05 |
| a.l.exa.l.ex2.0.0.1@yandex.ru | 1 | 2019-08-04 06:50:12 |
| a.l.exa.le.x.2.00.1@yandex.ru | 1 | 2019-08-05 17:40:06 |
this is what I love about you guys
Offline
#12 2019-12-25 3:47 am
- Visman
- Member
- Registered: 2019-12-14
- Posts: 6
Re: Gmail goodness, well, from us anyway
These are different email addresses, except for
| a.l.e.xa.le.x.2.0.0.1@yandex.com | 1 | 2019-08-08 23:05:06 |
| a.l.e.xa.le.x.2.0.0.1@yandex.ru | 1 | 2019-08-07 08:42:31 |
which are one and the same address.
Yandex does not ignore dots in the local part of the address; it treats '.' and '-' as equivalent.
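To make the difference from Gmail concrete, here is a small Python sketch of the Yandex rule as described in this thread: dots become hyphens instead of being dropped, and alias domains fold to yandex.ru. The domain set contains only domains mentioned in this thread, not the full list of 13, and `normalize_yandex` is a hypothetical name.

```python
# Only domains mentioned in this thread; the real list is longer.
YANDEX_DOMAINS = {"yandex.ru", "yandex.com", "yandex.tj", "ya.ru"}


def normalize_yandex(address: str) -> str:
    """Apply the Yandex rule: '.' is the same as '-', dots are not ignored (sketch)."""
    local, _, domain = address.strip().lower().rpartition("@")
    if domain not in YANDEX_DOMAINS:
        return f"{local}@{domain}"
    local = local.split("+", 1)[0]   # drop the +tag
    local = local.replace(".", "-")  # keep the separator, just unify it
    return f"{local}@yandex.ru"


same_a = normalize_yandex("a.l.e.xa.le.x.2.0.0.1@yandex.com")
same_b = normalize_yandex("a.l.e.xa.le.x.2.0.0.1@yandex.ru")
other_a = normalize_yandex("a.l.e.x.alex.2.00.1@yandex.ru")
other_b = normalize_yandex("a.l.e.x.al.ex.200.1@yandex.ru")
```

Here `same_a` and `same_b` are equal, while `other_a` and `other_b` stay distinct because the dot positions differ. A Gmail-style rule that deletes dots would wrongly merge them.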
Offline
#13 2019-12-25 5:29 am
- pedigree
- uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
- From: New Zealand
- Registered: 2008-04-16
- Posts: 7,056
Re: Gmail goodness, well, from us anyway
I'll get started on porting this to Node and pulling your code into a test API. After that, it's a case of renormalising the existing database sets.
Offline
#14 2019-12-25 5:57 am
- pedigree
- uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
- From: New Zealand
- Registered: 2008-04-16
- Posts: 7,056
Re: Gmail goodness, well, from us anyway
I'm undecided on something here
Should I go back and renormalise the entire dataset for matching new entries, thus "removing" (for Yandex) all Yandex domains other than .ru, or should I leave them as is in the DB and normalise at the API level?
The first option will leave the original email in the database, more as a note than anything, which is visible on searches. However, searches will return incomplete, partial matches if the domain doesn't match (search is a LIKE search, not an = search). This option adds a tiny bit of SHA overhead on insert but makes for a cleaner update to the API datastore.
The second option could confuse people, who won't see @yandex.kz data that matches API data, as it will be .ru, and .kz simply won't appear on downloads/searches. The benefit here is that I don't have to touch the source DB when normalisation rules change as new ones are found, as the process is handled by the API by processing domains/settings on request.
A note on the API lookup process: all email addresses are normalised on insert into the DB and then stored in a hash table. Lookups are normalised and then checked against a matching hash. If the hashes are different, then no match is found. If asdf@yandex.com and asdf@yandex.ru need to be processed by the API, then another lookup is required (option two). Option one would insert them into the hash table with the same hash but requires a lot more SQL work on change (and for the initial normalisation).
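The lookup process in the note above can be sketched roughly as follows, in Python. This is an assumption-laden illustration, not the real datastore: SHA-256 is assumed (the post only says "SHA"), and `SpamStore`, `normalise` and the `fold_domains` switch are hypothetical names used to contrast hashing with and without domain folding.

```python
import hashlib

DOMAIN_ALIASES = {"yandex.com": "yandex.ru"}  # illustrative subset


def normalise(address: str, fold_domains: bool) -> str:
    """Lowercase, strip the plus tag, optionally fold alias domains."""
    local, _, domain = address.strip().lower().rpartition("@")
    local = local.split("+", 1)[0]
    if fold_domains:
        domain = DOMAIN_ALIASES.get(domain, domain)
    return f"{local}@{domain}"


class SpamStore:
    """Hash table of normalised addresses; lookups hash the same way."""

    def __init__(self, fold_domains: bool) -> None:
        self.fold_domains = fold_domains
        self._hashes = set()

    def _digest(self, address: str) -> str:
        key = normalise(address, self.fold_domains)
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def insert(self, address: str) -> None:
        self._hashes.add(self._digest(address))

    def contains(self, address: str) -> bool:
        return self._digest(address) in self._hashes


folded = SpamStore(fold_domains=True)
folded.insert("asdf@yandex.com")
unfolded = SpamStore(fold_domains=False)
unfolded.insert("asdf@yandex.com")
```

With folding, `folded.contains("asdf@yandex.ru")` is true after a single hash check. Without it, the same query misses, which is why a second lookup per alias domain would be needed.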
I'm thinking that maybe a hybrid would work: normalise the user part in the DB, and normalise addresses completely in the API. Any ideas and thoughts about how you would best use the information at hand?
I might try both in code and see which is going to be easier to maintain going forward.
Offline
#15 2019-12-25 6:28 pm
- Alex Kemp
- Moderator
- From: Nottingham, England
- Registered: 2009-12-02
- Posts: 2,423
- Website
Re: Gmail goodness, well, from us anyway
Smaller (almost always) = quicker. However, I've found that not to be true too often, so always now check.
How much disk & cpu do you have? If enough, construct 2 sets of DB and test. Now is the time to do it, whilst nice & quiet. And anyway, who wants a family life?
Offline
#16 2019-12-25 8:02 pm
- pedigree
- uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
- From: New Zealand
- Registered: 2008-04-16
- Posts: 7,056
Re: Gmail goodness, well, from us anyway
There is no real constraint on the insert in SQL. I just always try to keep it tight. I'll work with normalisation on insert into the API as this code is a lot easier to change than the SQL code or data
Offline
#17 2020-02-14 6:57 am
- pedigree
- uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
- From: New Zealand
- Registered: 2008-04-16
- Posts: 7,056
Re: Gmail goodness, well, from us anyway
I'm looking at your code visman, and I was wondering the reasoning behind
.ya.ru => .yandex.ru
dot at the start of a domain
Offline
#18 2020-02-14 9:22 am
- Visman
- Member
- Registered: 2019-12-14
- Posts: 6
Re: Gmail goodness, well, from us anyway
pedigree wrote:
I'm looking at your code visman, and I was wondering the reasoning behind
.ya.ru => .yandex.ru
dot at the start of a domain
I am going to use NormEmail to ban users on the forum.
When creating a ban, I will enter ".ru" (or ".fastmail.com"); then all mail domains in the ru zone (or all subdomains of fastmail.com and the fastmail.com domain itself) will be banned.
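One way such a leading-dot ban could be matched is sketched below in Python. This is a guess at the idea, not the forum's or NormEmail's actual code, and `is_banned` is a hypothetical helper. Prefixing the address's domain with a dot lets a single suffix test cover both the domain itself and its subdomains.

```python
def is_banned(address: str, ban_patterns: list[str]) -> bool:
    """Return True if the address's domain falls under a leading-dot ban (sketch)."""
    domain = address.strip().lower().rpartition("@")[2]
    dotted = "." + domain  # "fastmail.com" becomes ".fastmail.com"
    return any(dotted.endswith(pattern.lower()) for pattern in ban_patterns)


bans = [".ru", ".fastmail.com"]
```

Because the pattern keeps its leading dot, a lookalike such as notfastmail.com is not caught by ".fastmail.com". It also suggests why the normalizer would map ".ya.ru" to ".yandex.ru": the ban pattern has to be normalized the same way as the addresses it is compared against.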
Offline