[pLog-svn] r6088 - plog/branches/lifetype-1.2/class/security
Jon Daley
plogworld at jon.limedaley.com
Thu Nov 29 13:13:48 EST 2007
On Fri, 30 Nov 2007, Mark Wu wrote:
> What you do is make the bayesian filter work worse not better.
>
> Several plugins use the pipeline filter; take authimage, host-block,
> anti-dns-spam, and content-filter as examples:
>
> If one of them fails, you ask bayesianFilter to train the comment as
> spam. That is not a good idea.
>
> 1. authimage fails: the user typed the wrong code, but he is not a spammer.
I disagree. What percentage of auth image failures come from spammers
versus real users? If users are failing a lot, this plugin should be
removed from the system. The whole point is that spammers fail the auth
image, right?
> 2. host-block: someone keeps posting from some IP; the content is not
> spam, only the "IP" is ... and our bayesian filter does not train on IPs.
> 3. anti-dns-spam: the same as host-block.
I disagree again. If you were trying to block your friends rather
than spammers, you would be correct. I think the point of host-block is
to catch the spam that the bayesian filter missed.
I guess it is a disagreement about how the bayesian filter should
work. For my email, I have trained on every single spam and non-spam
message I have received over the last three years. I am not really
affected by the so-called image spams with random text, etc. My bayesian
filter happily trains away and blocks pretty much all of those sorts of
spam. Bayesian filtering works better, not worse, when it is given more
training data. It is a matter of statistics.
> 4. content-filter: we just don't want certain "words" posted in our
> comments, but such a comment is not spam.
This one is your best argument, but I still say that you should
let the bayesian filter train on everything, and it will come out best.
Your method trains the filter the opposite way: the bayesian filter takes
a comment that fails the auth image and trains it as non-spam. Which is
worse?
> And if you really want to solve the ordering problem, we should add a
> priority when registering a filter, like
>
> $registerFilter( $filter, $priority )
>
> and, before we run process(), sort the global filter array by
> priority. That's the right method.
>
> Because then we can always put bayesianFilter last, every time.
Yes, but I think there could be other plugins that would want to
be treated "special" and run last. Maybe plugins could request to be run
at the end or something.
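A minimal sketch of the priority idea, assuming PHP 8; `CommentFilter`, `NamedFilter`, `FilterRegistry`, `ordered()`, `run()`, and the default priority of 50 are all hypothetical names and values, not LifeType's actual API. Lower numbers run earlier, so a plugin that wants to run "at the end" simply registers with a high priority; ties keep their registration order.

```php
<?php
// Hypothetical sketch: these names are illustrative, not LifeType's API.

interface CommentFilter
{
    // Returns true if the comment passes, false if the filter rejects it.
    public function process(string $comment): bool;
}

// Trivial filter used only to demonstrate ordering.
class NamedFilter implements CommentFilter
{
    public function __construct(public string $name, private bool $passes = true) {}

    public function process(string $comment): bool
    {
        return $this->passes;
    }
}

class FilterRegistry
{
    /** @var array<int, array{int, int, CommentFilter}> [priority, sequence, filter] */
    private array $entries = [];
    private int $sequence = 0;

    // Lower priority values run earlier; equal priorities keep registration order.
    public function registerFilter(CommentFilter $filter, int $priority = 50): void
    {
        $this->entries[] = [$priority, $this->sequence++, $filter];
    }

    /** @return CommentFilter[] filters sorted into execution order */
    public function ordered(): array
    {
        $entries = $this->entries;
        usort($entries, fn(array $a, array $b): int => [$a[0], $a[1]] <=> [$b[0], $b[1]]);
        return array_map(fn(array $e): CommentFilter => $e[2], $entries);
    }

    // Run filters in priority order, stopping at the first rejection.
    public function run(string $comment): bool
    {
        foreach ($this->ordered() as $filter) {
            if (!$filter->process($comment)) {
                return false;
            }
        }
        return true;
    }
}

$registry = new FilterRegistry();
$registry->registerFilter(new NamedFilter('bayesian'), 100); // asks to run last
$registry->registerFilter(new NamedFilter('authimage'), 10);
$registry->registerFilter(new NamedFilter('host-block'), 10);
```

With this ordering, a comment rejected by an earlier filter never reaches the bayesian filter at all, so it is neither trained as spam nor as non-spam; that is exactly the training trade-off being argued in this thread.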
> Not just run them twice and add a "hacky" $secondRun flag to skip some
> of the filtering.
I think the $secondRun flag could be replaced with
$previouslyRejected, if you like: since the bayesian filter always runs
first, the only time $previouslyRejected will be true is on the second
run.
By the way, if we go with your code, you should probably use the
previouslyRejected flag.
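A minimal sketch of the two-pass flow with a `$previouslyRejected` flag, assuming PHP 8. `ToyBayesianFilter`, `runPipeline()`, and the ham/spam token counts are hypothetical illustrations, not LifeType's code. The bayesian filter runs first and provisionally trains the comment as non-spam; if a later filter rejects it, the bayesian filter runs a second time with `$previouslyRejected = true`, undoes that training, and trains the comment as spam instead.

```php
<?php
// Hypothetical sketch: class and function names are illustrative only.

class ToyBayesianFilter
{
    /** @var array<string, array<string, int>> token counts per class */
    public array $counts = ['spam' => [], 'ham' => []];

    private static function tokens(string $text): array
    {
        return preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }

    private function adjust(string $text, string $class, int $delta): void
    {
        foreach (self::tokens($text) as $t) {
            $this->counts[$class][$t] = ($this->counts[$class][$t] ?? 0) + $delta;
        }
    }

    // First run ($previouslyRejected = false): train as non-spam
    // (a real filter would score the comment first; this toy skips that).
    // Second run ($previouslyRejected = true): a later filter rejected the
    // comment, so undo the non-spam training and train it as spam.
    public function process(string $comment, bool $previouslyRejected): void
    {
        if (!$previouslyRejected) {
            $this->adjust($comment, 'ham', 1);
        } else {
            $this->adjust($comment, 'ham', -1);
            $this->adjust($comment, 'spam', 1);
        }
    }
}

/** @param callable[] $otherFilters each returns true if the comment passes */
function runPipeline(ToyBayesianFilter $bayes, array $otherFilters, string $comment): bool
{
    $bayes->process($comment, false);          // bayesian filter always runs first
    foreach ($otherFilters as $filter) {
        if (!$filter($comment)) {
            $bayes->process($comment, true);   // second run: retrain as spam
            return false;
        }
    }
    return true;
}
```

Because the first run is always the one with `$previouslyRejected = false`, the flag carries the same information as a `$secondRun` flag would.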