
#1 2009-12-05 9:41 pm

pedigree
uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
From: New Zealand
Registered: 2008-04-16
Posts: 7,054

Gmail goodness, well, from us anyway

As some of you are aware, Gmail sucks for spam.  They don't seem to care about shutting down email accounts very quickly, and they breach RFC by allowing all sorts of random . and + in email addresses that all "normalize" down to the same account:

joe.blogs+spammer@gmail.com
j.o.e.blogs+spammer@gmail.com
j.oe.bl.ogs+spa.m.m.e.r@gmail.com
joeblog.....s@gmail.com

These are all the same email account.  Until now, they "fooled" our database because they weren't identical to the listed entry.

Well, good news: I've put normalization into the API.

Now, the above email addresses will all give positive results to a query if joeblogs@gmail.com is listed.
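For readers who want to reproduce this on their own side, here is a minimal sketch of the idea in Python. It is not the API's actual code; the function name `normalize_gmail` is hypothetical, and the only rules applied are the ones described in this post (dots are ignored and everything after a + is a tag).

```python
def normalize_gmail(address: str) -> str:
    """Collapse dot and plus-tag variants of a Gmail address (hypothetical helper)."""
    local, _, domain = address.strip().lower().rpartition("@")
    if domain != "gmail.com":
        # Other domains are left alone apart from lowercasing.
        return f"{local}@{domain}"
    # Drop the plus tag: everything from the first "+" onward.
    local = local.split("+", 1)[0]
    # Dots in the local part are ignored, so remove them all.
    local = local.replace(".", "")
    return f"{local}@{domain}"


variants = [
    "joe.blogs+spammer@gmail.com",
    "j.o.e.blogs+spammer@gmail.com",
    "j.oe.bl.ogs+spa.m.m.e.r@gmail.com",
    "joeblog.....s@gmail.com",
]
# Every variant collapses to the same listed address.
print({normalize_gmail(v) for v in variants})  # {'joeblogs@gmail.com'}
```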

The homepage will now show the original email address in [ ] under the normalized version of it.  This is for your information, but mainly so that Google will index it and return search results to people getting spammed.

I know, I'm awesome, thanks :)

Offline

#2 2009-12-06 2:17 am

hhopper
Member
From: Florida
Registered: 2008-11-20
Posts: 151
Website

Re: Gmail goodness, well, from us anyway

Actually, you ARE awesome!

Hop

Offline

#3 2009-12-06 2:13 pm

MacHeadCase
Member
From: Montréal, Québec
Registered: 2008-09-07
Posts: 346
Website

Re: Gmail goodness, well, from us anyway

Bravo, pedigree! Bravo!!! :D

Offline

#4 2009-12-06 2:46 pm

TVwas.com
Member
Registered: 2009-10-24
Posts: 6

Re: Gmail goodness, well, from us anyway

We are not worthy! We are not worthy!
But we'll use the function with gratitude...

Offline

#5 2009-12-06 2:48 pm

Alessandra
Member
From: Chicago, Illinois, USA
Registered: 2009-11-29
Posts: 165
Website

Re: Gmail goodness, well, from us anyway

Very nice!

Offline

#6 2009-12-06 11:31 pm

techtirie
Member
From: Ireland
Registered: 2009-10-20
Posts: 56
Website

Re: Gmail goodness, well, from us anyway

Thanks

you are a star.

Offline

#7 2009-12-13 6:31 am

nolimitssjca
Member
From: San Jose
Registered: 2009-05-31
Posts: 6
Website

Re: Gmail goodness, well, from us anyway

Back to normal

Offline

#8 2019-12-14 6:11 am

Visman
Member
Registered: 2019-12-14
Posts: 6

Re: Gmail goodness, well, from us anyway

Do you only do this for Gmail?

Here is some relevant information:

1. From Wikipedia:

Addresses of this form, using various separators between the base name and the tag, are supported by several email services, including Runbox (plus), Gmail (plus),[16] Rackspace Email (plus), Yahoo! Mail Plus (hyphen),[17] Apple's iCloud (plus), Outlook.com (plus),[18] ProtonMail (plus),[19] FastMail (plus and Subdomain Addressing),[20] MMDF (equals), Qmail and Courier Mail Server (hyphen).[21][22] Postfix and Exim allow configuring an arbitrary separator from the legal character set.[23][24]

https://en.wikipedia.org/wiki/Email_address#Address_tags

2. username @ googlemail.com === username @ gmail.com

3. username @ yandex.ru === username @ yandex.com (I checked this personally by sending myself an email at the .com domain, although my mailbox is on the .ru domain. Perhaps all Yandex domains are aliases of the .ru domain.)
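The three points above suggest a table-driven approach. Below is a rough Python sketch, assuming two hypothetical lookup tables (`DOMAIN_ALIASES` and `TAG_SEPARATORS`) filled only with examples taken from this post; a real list would need to be much longer and verified per provider.

```python
# Hypothetical tables; the entries come only from the points listed above.
DOMAIN_ALIASES = {
    "googlemail.com": "gmail.com",
    "yandex.com": "yandex.ru",
}
TAG_SEPARATORS = {
    "gmail.com": "+",
    "outlook.com": "+",
    "yahoo.com": "-",
}


def canonical(address: str) -> str:
    """Map an address to a canonical form using the tables above."""
    local, _, domain = address.strip().lower().rpartition("@")
    # Fold alias domains into one canonical domain first.
    domain = DOMAIN_ALIASES.get(domain, domain)
    # Then cut the tag off at the separator that provider uses, if known.
    separator = TAG_SEPARATORS.get(domain)
    if separator:
        local = local.split(separator, 1)[0]
    return f"{local}@{domain}"


print(canonical("UserName+tag@GoogleMail.com"))
print(canonical("username-tag@yahoo.com"))
```

Keeping the rules in data rather than code means a newly discovered alias or separator is a one-line change.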

Offline

#9 2019-12-16 12:58 am

pedigree
uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
From: New Zealand
Registered: 2008-04-16
Posts: 7,054

Re: Gmail goodness, well, from us anyway

Thanks for this.  I'll look at how I can update the code to handle this.  The API strips plus tags (from the + up to the @) for all domains, but it's very useful to know which domains do further normalisation.

Offline

#10 2019-12-24 4:41 pm

Visman
Member
Registered: 2019-12-14
Posts: 6

Re: Gmail goodness, well, from us anyway

I made an address normalizer: https://github.com/MioVisman/NormEmail

Here are the results so far:

// some string
ExampLe                               => example
exaMple.COM                           => example.com
.example.com                          => .example.com
@example.com                          => example.com
"example.com                          => "example.com
"USER+++NAME@EXAMpLE.com              => "USER+++NAME@example.com
googlemail.com                        => gmail.com
pm.me                                 => protonmail.com
yandex.tj                             => yandex.ru
ya.ru                                 => yandex.ru

// Unicode
ПОЛЬЗОВАТЕЛЬ@домен.РУ                 => пользователь@xn--d1acufc.xn--p1ag
пользователь+тег@домен.ру             => пользователь@xn--d1acufc.xn--p1ag

// Gmail
User.namE+tag@gmail.com               => username@gmail.com
u.sern.ame+tag+tag+tag@googlemail.com => username@gmail.com

// Protonmail
u_s.e-rname+tag@pm.me                 => username@protonmail.com
user-name@protonmail.ch               => username@protonmail.com

// Yahoo (.com, .ae, .at, ...)
username-tag@yahoo.com                => username@yahoo.com
user+name-tag@yahoo.fr                => user+name@yahoo.fr

// Yandex (13 domains)
user.name+tag@яндекс.рф               => user-name@yandex.ru
user-name@yandex.com                  => user-name@yandex.ru
username@ya.ru                        => username@yandex.ru
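The two Unicode rows in the listing above can be approximated in Python with the built-in `idna` codec. This is only a sketch and not how NormEmail itself does it (NormEmail is a separate project with its own code); the helper name `normalize_unicode` is hypothetical, and Python's codec follows the older IDNA 2003 rules.

```python
def normalize_unicode(address: str) -> str:
    """Lowercase the address, drop a plus tag, and punycode the domain."""
    local, _, domain = address.strip().rpartition("@")
    # Lowercase the local part and cut everything from the first "+".
    local = local.lower().split("+", 1)[0]
    # Convert the internationalised domain to its ASCII (xn--) form.
    ascii_domain = domain.lower().encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


print(normalize_unicode("ПОЛЬЗОВАТЕЛЬ@домен.РУ"))
print(normalize_unicode("пользователь+тег@домен.ру"))
```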

Offline

#11 2019-12-25 12:51 am

pedigree
uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
From: New Zealand
Registered: 2008-04-16
Posts: 7,054

Re: Gmail goodness, well, from us anyway

Awesome.  I'm going to port all of this to the API.

I was about to ask if these are all confirmed to be the same, but you answered that :)

| a.l.e.x.a.l.e.x.2.091@yandex.ru  |     1 | 2019-08-01 13:40:03 |
| a.l.e.x.a.l.ex2.0.91@yandex.ru   |     1 | 2019-08-12 16:35:04 |
| a.l.e.x.a.le.x.2091@yandex.ru    |     1 | 2019-08-03 01:12:34 |
| a.l.e.x.al.ex.200.1@yandex.ru    |     1 | 2019-08-07 15:05:16 |
| a.l.e.x.al.ex2.0.0.1@yandex.ru   |     1 | 2019-08-04 13:10:19 |
| a.l.e.x.ale.x.2.0.0.1@yandex.ua  |     2 | 2019-08-09 07:10:22 |
| a.l.e.x.alex.2.00.1@yandex.ru    |     1 | 2019-08-07 15:05:06 |
| a.l.e.x.alex.200.1@yandex.ua     |     1 | 2019-08-01 14:23:57 |
| a.l.e.x.alex2.091@yandex.ru      |     1 | 2019-08-04 20:00:10 |
| a.l.e.xa.l.e.x.20.0.1@yandex.ua  |     1 | 2019-08-10 12:35:14 |
| a.l.e.xa.le.x.2.0.0.1@yandex.com |     1 | 2019-08-08 23:05:06 |
| a.l.e.xa.le.x.2.0.0.1@yandex.ru  |     1 | 2019-08-07 08:42:31 |
| a.l.e.xal.e.x2.0.91@yandex.ua    |     1 | 2019-08-11 02:25:02 |
| a.l.e.xalex20.0.1@yandex.com     |     2 | 2019-08-11 11:03:10 |
| a.l.ex.a.le.x.2.00.1@yandex.ru   |     1 | 2019-08-12 07:20:44 |
| a.l.ex.a.le.x.20.0.1@yandex.ru   |     1 | 2019-07-31 19:28:36 |
| a.l.ex.a.lex.20.0.1@yandex.ru    |     1 | 2019-08-03 02:15:11 |
| a.l.ex.al.e.x200.1@yandex.com    |     2 | 2019-08-11 05:51:33 |
| a.l.ex.alex20.91@yandex.com      |     1 | 2019-08-08 12:10:19 |
| a.l.exa.l.e.x.2.0.0.1@yandex.ua  |     1 | 2019-08-07 19:25:05 |
| a.l.exa.l.ex2.0.0.1@yandex.ru    |     1 | 2019-08-04 06:50:12 |
| a.l.exa.le.x.2.00.1@yandex.ru    |     1 | 2019-08-05 17:40:06 |

this is what I love about you guys

Offline

#12 2019-12-25 3:47 am

Visman
Member
Registered: 2019-12-14
Posts: 6

Re: Gmail goodness, well, from us anyway

These are different email addresses, except for

| a.l.e.xa.le.x.2.0.0.1@yandex.com |     1 | 2019-08-08 23:05:06 |
| a.l.e.xa.le.x.2.0.0.1@yandex.ru  |     1 | 2019-08-07 08:42:31 |

which are one and the same address.

Yandex does not ignore dots in the local part of the address; it treats '.' and '-' as equivalent.
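To make the difference from Gmail concrete, here is a small Python sketch of the Yandex rule as described here. The function name and the `YANDEX_DOMAINS` set are hypothetical, and the set lists only domains that my results above map to yandex.ru; it is deliberately incomplete.

```python
# Incomplete, illustrative list: only domains mapped to yandex.ru in this thread.
YANDEX_DOMAINS = {"yandex.ru", "yandex.com", "yandex.tj", "ya.ru"}


def normalize_yandex(address: str) -> str:
    """Apply the Yandex rule: '.' is treated as '-', and dots are NOT dropped."""
    local, _, domain = address.strip().lower().rpartition("@")
    if domain not in YANDEX_DOMAINS:
        return f"{local}@{domain}"
    # Strip a plus tag, then fold '.' into '-' instead of removing it.
    local = local.split("+", 1)[0].replace(".", "-")
    return f"{local}@yandex.ru"


same_a = normalize_yandex("a.l.e.xa.le.x.2.0.0.1@yandex.com")
same_b = normalize_yandex("a.l.e.xa.le.x.2.0.0.1@yandex.ru")
other = normalize_yandex("a.l.e.x.alex.2.00.1@yandex.ru")
print(same_a == same_b)  # True: same mailbox on two domains
print(same_a == other)   # False: the dots sit in different places
```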

Offline

#13 2019-12-25 5:29 am

pedigree
uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
From: New Zealand
Registered: 2008-04-16
Posts: 7,054

Re: Gmail goodness, well, from us anyway

I'll get started on porting this to Node and pulling your code into a test API.  It's then a case of renormalising the existing datasets.

Offline

#14 2019-12-25 5:57 am

pedigree
uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
From: New Zealand
Registered: 2008-04-16
Posts: 7,054

Re: Gmail goodness, well, from us anyway

I'm undecided on something here

Should I go back and renormalise the entire dataset for matching new entries, thus "removing" (for Yandex) all Yandex domains other than .ru, or should I leave them as is in the DB and normalise at the API level?

The first will leave the original email in the database, more as a note than anything, which is visible on searches. However, searches will return incomplete, partial matches if the domain doesn't match (search is a LIKE search, not an = search).  This option adds a tiny bit of SHA overhead on insert but makes for a cleaner update to the API datastore.

The second option could confuse people, who won't see @yandex.kz data that matches API data because it will be .ru, and .kz simply won't appear on downloads/searches.  The benefit here is that I don't have to touch the source DB when changes to normalisation are found, as the API handles the process by applying domains/settings on request.

A note on the API lookup process.  All email addresses are normalised on insert into the DB, and then stored in a hash table.  Lookups are normalised and then checked against a matching hash.  If the hashes are different then no match is found.  If asdf@yandex.com and asdf@yandex.ru need to be processed by the API then another lookup is required (option two).  Option one would insert them into the hash table with the same hash but requires lots more SQL work on change (and the initial normalisation)
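A rough Python sketch of that lookup process follows. It is not the production code: the class and function names are made up for illustration, SHA-256 is an assumption (only "SHA" is mentioned above), and the normaliser is a placeholder that lowercases and strips a plus tag without folding domains.

```python
import hashlib


def normalise(address: str) -> str:
    """Placeholder normaliser: lowercase and drop any plus tag."""
    local, _, domain = address.strip().lower().rpartition("@")
    return f"{local.split('+', 1)[0]}@{domain}"


def digest(address: str) -> str:
    """Hash the normalised form (SHA-256 is an assumption)."""
    return hashlib.sha256(normalise(address).encode("utf-8")).hexdigest()


class HashStore:
    """Hypothetical stand-in for the API's hash table."""

    def __init__(self):
        self._hashes = set()

    def insert(self, address: str) -> None:
        # Addresses are normalised and hashed on insert.
        self._hashes.add(digest(address))

    def contains(self, address: str) -> bool:
        # Lookups are normalised the same way, then compared by hash.
        return digest(address) in self._hashes


store = HashStore()
store.insert("asdf@yandex.com")
print(store.contains("ASDF+tag@yandex.com"))  # True: same normalised form
print(store.contains("asdf@yandex.ru"))       # False: different hash
```

The last line is the crux of the two options: either the normaliser folds yandex.com into yandex.ru before hashing (option one), or the API has to perform a second lookup for each alias domain (option two).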

I'm thinking maybe a hybrid: normalise the user part in the DB, and normalise addresses completely in the API.  Any ideas or thoughts on how you would best use the information at hand?

I might try both in code and see which is going to be easier to maintain going forward.

Offline

#15 2019-12-25 6:28 pm

Alex Kemp
Moderator
From: Nottingham, England
Registered: 2009-12-02
Posts: 2,420
Website

Re: Gmail goodness, well, from us anyway

Smaller (almost always) = quicker. However, I've found that not to be true too often, so I now always check.

How much disk & cpu do you have? If enough, construct 2 sets of DB and test. Now is the time to do it, whilst nice & quiet. And anyway, who wants a family life?

Offline

#16 2019-12-25 8:02 pm

pedigree
uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
From: New Zealand
Registered: 2008-04-16
Posts: 7,054

Re: Gmail goodness, well, from us anyway

There is no real constraint on the insert in SQL; I just always try to keep it tight.  I'll work with normalisation on insert into the API, as this code is a lot easier to change than the SQL code or data.

Offline

#17 2020-02-14 6:57 am

pedigree
uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
From: New Zealand
Registered: 2008-04-16
Posts: 7,054

Re: Gmail goodness, well, from us anyway

I'm looking at your code, Visman, and I was wondering about the reasoning behind

.ya.ru                                => .yandex.ru

Why the dot at the start of the domain?

Offline

#18 2020-02-14 9:22 am

Visman
Member
Registered: 2019-12-14
Posts: 6

Re: Gmail goodness, well, from us anyway

pedigree wrote:
"I'm looking at your code visman, and I was wondering the reasoning behind

.ya.ru                                => .yandex.ru

dot at the start of a domain"

I am going to use NormEmail to ban users on the forum.
When creating a ban, I will enter ".ru" (or ".fastmail.com"), and then all mail domains in the .ru zone (or all subdomains of fastmail.com and the fastmail.com domain itself) will be banned.
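As an illustration of that matching rule, here is a short Python sketch. It is not taken from NormEmail or from the forum software; `is_banned` is a hypothetical helper that treats a ban entry with a leading dot as "this domain and everything under it".

```python
def is_banned(email_domain: str, ban: str) -> bool:
    """Check an already-normalised email domain against one ban entry."""
    email_domain = email_domain.lower()
    ban = ban.lower()
    if ban.startswith("."):
        # ".fastmail.com" covers fastmail.com itself and any subdomain of it.
        return email_domain == ban[1:] or email_domain.endswith(ban)
    # Without a leading dot, only an exact domain match counts.
    return email_domain == ban


print(is_banned("yandex.ru", ".ru"))
print(is_banned("user.fastmail.com", ".fastmail.com"))
print(is_banned("notfastmail.com", ".fastmail.com"))
```

Keeping the leading dot in the comparison is what stops "notfastmail.com" from matching a ban on ".fastmail.com".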


Offline
