[pLog-svn] r6088 - plog/branches/lifetype-1.2/class/security

Thu Nov 29 13:13:48 EST 2007

On Fri, 30 Nov 2007, Mark Wu wrote:
> What you do makes the bayesian filter work worse, not better.
>
> There are several plugins that use the pipeline filter; take authimage,
> host-block, anti-dns-spam, content-filter as examples:
>
> If one of them fails, you ask bayesianFilter to train the comment as
> spam. It is not a good idea.
>
> 1. Authimage failed: it is due to the user entering the wrong code, but he is not a spammer.
 	I disagree.  What percentage of the auth image failures are due to 
spammers versus real users?  If users are failing a lot, this plugin 
should be removed from the system.  The whole point is that spammers fail 
the auth image, right?

> 2. host-block: someone keeps posting from some IP, but the content is
> not spam, only the "IP" is ... and our bayesian filter does not train on IPs
> 3. anti-dns: the same as host block
 	I disagree again.  I guess if you are trying to block your friends 
rather than spammers, you would be correct.  I think the point of the host 
block is to try to catch the cases where the bayesian filter missed the 
spam.
 	I guess it is a disagreement about how the bayesian filter should 
work.  For my email, I have trained on every single spam and non-spam 
message I have received over the last three years.  I am not really 
affected by the so-called image spams, with random text, etc.  My bayesian 
filter happily trains away, and blocks pretty much all of those sorts of 
spams.  Bayesian filtering works better if it is helped, not worse.  It is 
a matter of statistics.

> 4. content filter: We just don't want some "word" posted in our comments,
> but it is not spam.
 	This one is your best argument, but I still say that you should 
let the bayesian filter train on everything and it will come out the best. 
Your method has the filter trained the opposite way - the bayesian filter 
takes a comment that fails the auth image and trains it as non-spam. 
Which is worse?
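
 	To make the statistics concrete, here is a toy illustration in Python 
(not LifeType's PHP, and not its actual bayesian filter).  The class 
TinyBayes, its method names, and the sample comments are all made up for 
this example.  It compares training a comment that another filter rejected 
as spam against training the same comment as non-spam.

```python
import math
from collections import Counter


class TinyBayes:
    """Hypothetical minimal naive Bayes classifier over whitespace tokens."""

    def __init__(self):
        self.tokens = {True: Counter(), False: Counter()}  # keyed by is_spam
        self.messages = {True: 0, False: 0}

    def train(self, text, is_spam):
        self.messages[is_spam] += 1
        self.tokens[is_spam].update(text.lower().split())

    def spam_probability(self, text):
        vocab = set(self.tokens[True]) | set(self.tokens[False])
        total = self.messages[True] + self.messages[False]
        log_score = {}
        for label in (True, False):
            # Add-one smoothed prior and token likelihoods
            # (one extra slot in the denominator for unseen tokens).
            score = math.log((self.messages[label] + 1) / (total + 2))
            n = sum(self.tokens[label].values())
            for tok in text.lower().split():
                score += math.log((self.tokens[label][tok] + 1) / (n + len(vocab) + 1))
            log_score[label] = score
        # Normalise the two log scores into a probability of spam.
        m = max(log_score.values())
        spam = math.exp(log_score[True] - m)
        ham = math.exp(log_score[False] - m)
        return spam / (spam + ham)


def compare_policies():
    """Return (p_if_trained_as_spam, p_if_trained_as_non_spam)."""
    rejected = "buy cheap watches"      # comment rejected by another filter
    later = "cheap watches online"      # similar comment arriving afterwards
    results = []
    for train_rejected_as_spam in (True, False):
        bayes = TinyBayes()
        bayes.train("cheap pills online", True)
        bayes.train("nice post thanks", False)
        bayes.train(rejected, train_rejected_as_spam)
        results.append(bayes.spam_probability(later))
    return tuple(results)
```

Starting from identical data, the filter that learned the rejected comment 
as spam scores the similar later comment as more spam-like than the filter 
that learned it as non-spam.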

> And, if you really DO want to solve the order problem, we should add a
> priority to register filter, like
>
> $registerFilter( $filter, $priority )
>
> And, before we run the process(), we need to sort the global filter array()
> by priority. That's the right method.
>
> Because then we can always put the bayesianFilter last every time.
 	Yes, but I think there could be other plugins that would want to 
be treated "special" and run last.  Maybe plugins could request to be run 
at the end or something.
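
 	As a rough sketch of how priority ordering and a "run me last" request 
could fit together (in Python rather than LifeType's PHP; FilterPipeline, 
register_filter, ordered_filters and RUN_LAST are hypothetical names, and 
"lower priority number runs first" is an assumption):

```python
RUN_LAST = float("inf")  # hypothetical marker: "run me after everything else"


class FilterPipeline:
    """Hypothetical registry that orders filters by priority."""

    def __init__(self):
        # Each entry is (priority, registration order, filter).
        self._entries = []

    def register_filter(self, flt, priority=0):
        # The registration order breaks ties, so filters with the same
        # priority (including several RUN_LAST filters) keep a stable order.
        self._entries.append((priority, len(self._entries), flt))

    def ordered_filters(self):
        # Sort on (priority, registration order) only, so the filter
        # objects themselves never need to be comparable.
        ordered = sorted(self._entries, key=lambda entry: entry[:2])
        return [flt for _, _, flt in ordered]


pipeline = FilterPipeline()
pipeline.register_filter("bayesian", RUN_LAST)
pipeline.register_filter("authimage", 10)
pipeline.register_filter("hostblock", 5)
print(pipeline.ordered_filters())
```

More than one plugin can ask for RUN_LAST here; among those, the one 
registered first runs first.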

> Not just run them twice, and add a "hacky" secondRun flag there to stop
> the filter from doing something.
 	I think the $secondRun flag could be replaced with 
$previouslyRejected, if you would like.  Since the bayesian filter always 
runs first, the only time previouslyRejected will be set to true is on 
the second run.
 	By the way, if we go with your code, you should probably use the 
previouslyRejected flag.
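
 	A minimal sketch of that control flow, in Python rather than LifeType's 
PHP.  RecordingBayesianFilter and process are hypothetical names, the 
filter only records how it was called instead of classifying anything, and 
skipping the second run when the bayesian filter itself rejects the comment 
is an assumption:

```python
class RecordingBayesianFilter:
    """Hypothetical stand-in that records calls instead of classifying."""

    def __init__(self):
        self.calls = []

    def filter(self, comment, previously_rejected):
        self.calls.append((comment, previously_rejected))
        if previously_rejected:
            # Another filter already rejected the comment, so a real
            # filter would only train it as spam here.
            return False
        # Toy behaviour: accept every comment on the first run.
        return True


def process(comment, bayesian, other_filters):
    """Run the bayesian filter first, then the rest, then train on rejects."""
    accepted = bayesian.filter(comment, previously_rejected=False)
    if accepted:
        accepted = all(flt(comment) for flt in other_filters)
        if not accepted:
            # Second run: the flag is true only on this call.
            bayesian.filter(comment, previously_rejected=True)
    return accepted
```

Because the first call always passes False, a true previously_rejected 
value identifies the second run without a separate flag.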