#1 2012-02-27 9:07 pm

pedigree
uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
From: New Zealand
Registered: 2008-04-16
Posts: 7,054

New "Confidence" Feature is Now Live

I've just added a new feature to the API, called confidence.  Using some serious maths, the API now reports a score, as a floating-point percentage, for how likely any queried field is to belong to a spammer.  It's not an exact science, but it will give you a better idea of the data and how to handle it.

It is only supported in the structured output formats (JSON, XML, serial) and is not available in the standard API call.  If you append &f=json (or xml/serial etc.) to each call, you will see a new field called 'confidence'.

e.g.

http://www.stopforumspam.com/api?ip=1.2.3.4&f=json

will show

{"success":1,"ip":{"lastseen":"2012-01-31 16:17:22","frequency":35,"appears":1,"confidence":54.12}}

This example shows that the estimated chance of this IP being used to spam or post unsolicited commercial adverts, based on the number of reports versus the days since we last saw it, is about 54%.  It's an estimate based on the Wilson scoring function.
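
A minimal Python sketch of reading the new field from a JSON response like the one above. The helper names `build_query_url` and `extract_confidence` are made up for illustration, and the handling of a missing 'confidence' key is an assumption rather than documented API behaviour. No network call is made; the sample string is the response quoted in this post.

```python
import json
from urllib.parse import urlencode

API_BASE = "http://www.stopforumspam.com/api"  # URL taken from the post above


def build_query_url(ip: str) -> str:
    """Build an API URL that requests JSON output so 'confidence' is included."""
    return f"{API_BASE}?{urlencode({'ip': ip, 'f': 'json'})}"


def extract_confidence(response_text: str, field: str = "ip"):
    """Return the confidence percentage for a field, or None if it is absent."""
    data = json.loads(response_text)
    if data.get("success") != 1:
        return None
    entry = data.get(field, {})
    # Assumption: an entry without a 'confidence' key is treated as "no score".
    return entry.get("confidence")


sample = (
    '{"success":1,"ip":{"lastseen":"2012-01-31 16:17:22",'
    '"frequency":35,"appears":1,"confidence":54.12}}'
)
print(build_query_url("1.2.3.4"))
print(extract_confidence(sample))
```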

I hope that people start using it to help decide what to do with those entries that you could otherwise only reject or accept.
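
One way a client might act on the score is a three-way decision instead of a plain reject/accept. This is only a sketch: the function name `decide` and both thresholds are hypothetical and would need tuning against your own traffic.

```python
def decide(confidence: float, reject_at: float = 90.0, accept_below: float = 10.0) -> str:
    """Map a confidence percentage to 'reject', 'accept' or 'review'.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    if confidence >= reject_at:
        return "reject"
    if confidence < accept_below:
        return "accept"
    # Everything in between goes to a moderation queue or an extra check.
    return "review"


for score in (2.0, 54.12, 95.0):
    print(score, decide(score))
```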

As usual, if you find bugs, please post here, use the contact form, or, if you can keep the swearing to under 140 characters, use Twitter :)

Offline

#2 2012-02-27 9:36 pm

TheVisitors
Member
Registered: 2010-10-15
Posts: 13
Website

Re: New "Confidence" Feature is Now Live

Such a cool little feature.

12 GB of RAM, huh?  (As you said on Twitter.)

Maybe that's something worth contributing towards.

Offline

#3 2012-02-27 10:51 pm

Katana
Member
Registered: 2009-08-18
Posts: 1,886

Re: New "Confidence" Feature is Now Live

Squee, my scoring code's live! :3


Shut up, shut up, shut up!

Offline

#4 2012-02-27 10:53 pm

pedigree
uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
From: New Zealand
Registered: 2008-04-16
Posts: 7,054

Re: New "Confidence" Feature is Now Live

The server has 24 GB now, which is enough to move forward with reducing the MySQL footprint and moving a massive chunk of processing into MongoDB.

Offline

#5 2012-02-27 11:15 pm

kpatz
Member
Registered: 2008-10-09
Posts: 1,437

Re: New "Confidence" Feature is Now Live

I think Ped is trying to get the entire SFS database loaded into RAM. :D


Spam happens when greed meets stupidity.

Offline

#6 2012-02-27 11:33 pm

pedigree
uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
From: New Zealand
Registered: 2008-04-16
Posts: 7,054

Re: New "Confidence" Feature is Now Live

Would need more RAM for that :)  48 GB

Offline

#7 2012-02-27 11:39 pm

Katana
Member
Registered: 2009-08-18
Posts: 1,886

Re: New "Confidence" Feature is Now Live

pedigree wrote:

Would need more RAM for that :)  48 GB

100+ considering how it's constantly growing.


Shut up, shut up, shut up!

Offline

#8 2012-02-28 1:27 pm

kpatz
Member
Registered: 2008-10-09
Posts: 1,437

Re: New "Confidence" Feature is Now Live

Wow... 48 GB? That's a lot of spammers. :)

Well, with 7 billion "humans" on this planet, 16,402,099 complete retards is probably a conservative estimate. ;)


Spam happens when greed meets stupidity.

Offline

#9 2013-07-16 3:10 pm

banp
Member
Registered: 2013-07-16
Posts: 2

Re: New "Confidence" Feature is Now Live

Hey,

Thanks for this feature; I believe it could evolve into something really useful. To really assess the confidence and use the score, though, I think it is necessary to understand how it is derived. Could you share some details? Do you fit it to some standard distribution, or do you use nonparametric methods?

Offline

#10 2013-07-16 3:15 pm

Alex Kemp
Moderator
From: Nottingham, England
Registered: 2009-12-02
Posts: 2,420
Website

Re: New "Confidence" Feature is Now Live

Hi banp, welcome to SFS.

Search for `Wilson scoring'.

Offline

#11 2013-07-16 8:36 pm

pedigree
uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
From: New Zealand
Registered: 2008-04-16
Posts: 7,054

Re: New "Confidence" Feature is Now Live

It's not a perfect use of the scoring system, but it's better than a binary yes/no.

Offline

#12 2013-07-24 9:45 am

banp
Member
Registered: 2013-07-16
Posts: 2

Re: New "Confidence" Feature is Now Live

Hello guys, and thanks for your quick answers!

I have now had time to have a look at Wilson Score and I have to say I do not fully understand how it is used here.

As I understand it, the Wilson score is (like any binomial proportion confidence interval) a confidence interval for a proportion. So it should read something like "With 95% probability the real proportion belongs to the interval (0.4; 0.6)", whereas you supply just one number, like "54.12%".
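
For reference, here is the textbook Wilson score interval as a short Python sketch. It only illustrates the interval itself; nothing in this thread says how the API collapses it into a single number. Reporting the lower bound is one common practice elsewhere, but whether that is done here is an assumption. The function name `wilson_interval` is mine.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Return (lower, upper) of the Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half_width, centre + half_width


lower, upper = wilson_interval(50, 100)
print(round(lower, 4), round(upper, 4))
```

With 50 positives out of 100 trials this gives an interval of roughly (0.40; 0.60), which is the kind of output I expected rather than a single percentage.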

In the "API usage" page one can find that the score is "based on the last seen date and the number of sightings". So is the proportion for a given email defined as "unclean sightings / (clean sightings + unclean sightings)" or as "time periods when the email was unclean / all time periods checked"? I am also not sure that the distribution is binomial. It seems to me that it is quite autoregressive: once an email becomes a spammer, it is more probable that it will stay that way, and from that moment almost all posts from it will be spam posts...

If it is not a secret, please reveal a little more :)

Kind regards,
banp

Offline

#13 2013-07-24 2:53 pm

Alex Kemp
Moderator
From: Nottingham, England
Registered: 2009-12-02
Posts: 2,420
Website

Re: New "Confidence" Feature is Now Live

As pedigree says, it is far from perfect; we wanted a way to try to express the reliability of those db results numerically, and a former contributor suggested `Wilson scoring' since algorithms were available for that. You are not the first to cast doubts on its suitability but, at this moment, it is all that we have.

The discussions at the point of its consideration & incorporation are all available to Registered members like yourself; just search to discover them. Then, offer something better. ped's time is *very* limited, so do not expect him to do all your work for you. Put together a better algorithm & it will be snapped up, with grateful thanks.

Offline

#14 2013-07-24 3:13 pm

pedigree
uıɐbɐ ʎɐqǝ ɯoɹɟ pɹɐoqʎǝʞ ɐ buıʎnq ɹǝʌǝu ɯ,ı
From: New Zealand
Registered: 2008-04-16
Posts: 7,054

Re: New "Confidence" Feature is Now Live

Yes, it's not perfect; it is weighted heavily towards the number of listings and leans away from the lastseen.  If you do have other suggestions, I would love to look into implementing something more suitable... with credit given on the site :)

The data you have to work with is:

1. Number of times we've seen the listing
2. The lastseen date for the listing.

That's all that is stored in the caches.  The MySQL database provides a lot more, but the API does not access that.

Considerations

1. It has to be quick
2. It cannot access MySQL
3. It has to change with time as active records become older.
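
To make the constraints concrete, here is a toy Python score that uses only the two cached values and drifts downwards as a record ages. It is NOT the algorithm the API uses and is not a proposal with any statistical backing; the function name `toy_confidence`, the half-life and the constant `k` are all invented for illustration.

```python
from datetime import datetime


def toy_confidence(frequency: int, lastseen: datetime, now: datetime,
                   half_life_days: float = 30.0, k: float = 10.0) -> float:
    """Hypothetical score in [0, 100) from sighting count and lastseen only.

    The count is halved for every half_life_days of age, then squashed
    with a saturating curve so more sightings give a higher score.
    """
    age_days = max((now - lastseen).total_seconds() / 86400.0, 0.0)
    effective = frequency * 0.5 ** (age_days / half_life_days)
    return round(100.0 * effective / (effective + k), 2)


now = datetime(2012, 2, 27)
seen = datetime(2012, 1, 31, 16, 17, 22)
print(toy_confidence(35, now, now))   # record seen just now
print(toy_confidence(35, seen, now))  # same count, last seen weeks ago
```

It is pure arithmetic on two fields, so it is quick, needs no database access, and the same record scores lower each day it is not seen again.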

I can expand the data in the caches to include other fields, but cache size needs to be considered.  Currently the API data is running at 1 GB of RAM, about 16 bytes per record, chunked into slabs.  The API cannot use stupid amounts of memory, as the mirror nodes have minimal RAM.
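
As a rough illustration of why record size matters, the Python below works through the arithmetic. It assumes "1 GB" means 1 GiB and that the cache is exactly full of 16-byte records, both simplifications; the extra 4 bytes per record is a hypothetical addition, not something planned.

```python
GIB = 1024 ** 3


def records_that_fit(cache_bytes: int, bytes_per_record: int) -> int:
    """Number of whole records that fit in a cache of the given size."""
    return cache_bytes // bytes_per_record


def cache_size_gib(records: int, bytes_per_record: int) -> float:
    """Cache size in GiB for a given record count and record size."""
    return records * bytes_per_record / GIB


records = records_that_fit(1 * GIB, 16)
print(records)                           # records at 16 bytes each
print(cache_size_gib(records, 16 + 4))   # same records with 4 extra bytes each
```

Under those assumptions, adding a single 4-byte field to every record grows the cache by a quarter.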

Offline

#15 2017-11-09 3:27 am

mcserverstore
Member
Registered: 2017-11-08
Posts: 6

Re: New "Confidence" Feature is Now Live

This confidence feature is a great idea; I just developed an API client that uses it.

Offline
