[pLog-svn] r6088 - plog/branches/lifetype-1.2/class/security

Paul Westbrook paul at westbrooks.org
Thu Nov 29 14:41:31 EST 2007


Hello,
   Sorry that I am joining this discussion late.

As mentioned above, the reason I made this change was that the bayesian
database was getting poorly trained.  I had cases where the same message was
not caught by the bayesian filter and was trained as ham, and then
failed some other spam filter, which deleted the message.  This was
happening several hundred times with the same message, so the bayesian
database was getting really screwed up.

  The root of this problem was that the message got removed from the
database, so I (the user) had no way to help fix the database.

  Here is a potential proposal that would solve this problem without
requiring the filters to be run twice:

1) Make it possible for a plugin to report that it has persisted some state.
2) Set this as a property of the comment.
3) If a later plugin in the pipeline wants to delete the comment, it calls
the Delete() method.
4) That method checks whether the "something has been persisted" bit is set;
if so, it doesn't actually delete the comment, but just hides it.
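The four steps above can be sketched roughly as follows. LifeType itself is PHP; this uses Python only as a language-neutral illustration, and every name in it (FilterResult, persisted_state, has_persisted_state, run_pipeline, and the two toy filters) is hypothetical, not LifeType's actual API. The "bayesian" filter here does no real classification; it just reports that it persisted something.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FilterResult:
    valid: bool                    # did the comment pass this filter?
    persisted_state: bool = False  # step 1: the filter stored something


@dataclass
class Comment:
    comment_id: int
    text: str
    has_persisted_state: bool = False  # step 2: set by the pipeline
    status: str = "visible"

    def delete(self, store: Dict[int, "Comment"]) -> None:
        # Step 4: if some filter persisted state about this comment, only
        # hide it, so the user can still find it and fix the training later.
        if self.has_persisted_state:
            self.status = "hidden"
        else:
            store.pop(self.comment_id, None)
            self.status = "deleted"


FilterFn = Callable[[Comment], FilterResult]


def run_pipeline(comment: Comment, filters: List[FilterFn],
                 store: Dict[int, Comment]) -> bool:
    for f in filters:
        result = f(comment)
        if result.persisted_state:
            comment.has_persisted_state = True
        if not result.valid:
            comment.delete(store)  # step 3: a later filter rejects it
            return False
    return True


def bayesian_filter(comment: Comment) -> FilterResult:
    # Hypothetical stand-in: pretend the comment was trained as ham.
    return FilterResult(valid=True, persisted_state=True)


def content_filter(comment: Comment) -> FilterResult:
    # Hypothetical stand-in: reject comments containing a banned word.
    return FilterResult(valid="pills" not in comment.text)


store: Dict[int, Comment] = {}
trained = Comment(1, "buy cheap pills")
untrained = Comment(2, "cheap pills again")
store[1], store[2] = trained, untrained

run_pipeline(trained, [bayesian_filter, content_filter], store)
run_pipeline(untrained, [content_filter], store)
print(trained.status, untrained.status)  # hidden deleted
```

The comment the bayesian filter trained on survives as "hidden", so the user can still retrain it; the one nothing persisted state for is really removed.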


An alternative to this is to have the comment keep a reference to the
filters that have persisted something.  In the delete method, before the
comment is deleted, all of the filters are given a chance to clean up their
state.
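The alternative could look something like this sketch, again in Python for illustration only. The names (persisted_by, process, cleanup, BayesianFilter) are hypothetical, and "training" is faked by recording comment ids in a set, so cleaning up simply forgets that id.

```python
from typing import Dict, List, Set


class Comment:
    def __init__(self, comment_id: int, text: str) -> None:
        self.comment_id = comment_id
        self.text = text
        # Filters that stored state derived from this comment.
        self.persisted_by: List["BayesianFilter"] = []

    def delete(self, store: Dict[int, "Comment"]) -> None:
        # Give every filter that persisted something a chance to clean up
        # before the comment disappears.
        for f in self.persisted_by:
            f.cleanup(self)
        self.persisted_by.clear()
        store.pop(self.comment_id, None)


class BayesianFilter:
    """Hypothetical stand-in: 'training' just records the comment id."""

    def __init__(self) -> None:
        self.trained_as_ham: Set[int] = set()

    def process(self, comment: Comment) -> bool:
        self.trained_as_ham.add(comment.comment_id)
        comment.persisted_by.append(self)  # remember who persisted state
        return True

    def cleanup(self, comment: Comment) -> None:
        # Undo the training this comment caused.
        self.trained_as_ham.discard(comment.comment_id)


def content_filter(comment: Comment) -> bool:
    return "pills" not in comment.text


bayes = BayesianFilter()
store: Dict[int, Comment] = {}
spam = Comment(1, "buy cheap pills")
store[spam.comment_id] = spam

if not (bayes.process(spam) and content_filter(spam)):
    spam.delete(store)

print(bayes.trained_as_ham, store)  # set() {}
```

Unlike the hide-instead-of-delete approach, this keeps the database clean without leaving hidden comments around, at the cost of every persisting filter needing an undo operation.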


--Paul

On 11/29/07, Jon Daley <plogworld at jon.limedaley.com> wrote:
>
> On Fri, 30 Nov 2007, Mark Wu wrote:
> > What you did makes the bayesian filter work worse, not better.
> >
> > Several plugins use the pipeline filter; take authimage, host-block,
> > anti-dns-spam, and content-filter as examples:
> >
> > If one of them fails and you then ask the bayesian filter to train the
> > comment as spam, that is not a good idea.
> >
> > 1. Authimage failed: the user entered the wrong code, but he is not a
> > spammer.
>         I disagree.  What percentage of the auth image failures are due to
> spammers versus users?  If users are failing a lot, this plugin
> should be removed from the system.  The whole point is that spammers fail
> the auth image, right?
>
> > 2. host-block: someone keeps posting from some IP, but the content is
> > not spam, only the "IP" is .... and our bayesian filter does not train on IPs.
> > 3. anti-dns: the same as host-block.
>         I disagree again.  I guess if you are trying to block your friends
> rather than spammers, you would be correct.  I think the point of the host
> block is to try to catch the cases where the bayesian filter missed the
> spam.
>         I guess it is a disagreement about how the bayesian filter should
> work.  For my email, I have trained on every single spam and non-spam I
> have received over the last three years.  I am not really affected by the so-called image
> spams, with random text, etc.  My bayesian filter happily trains away, and
> blocks pretty much all of those sorts of spams.  Bayesian filtering works
> better if it is helped, not worse.  It is a matter of statistics.
>
> > 4. content filter: We just don't want certain "words" posted in our
> > comments, but that doesn't make the comment spam.
>         This one is your best argument, but I still say that you should
> let the bayesian filter train on everything and it will come out the best.
> Your method has the filter trained the opposite way - the bayesian filter
> takes a comment that fails the auth image and trains it as non-spam.
> Which is worse?
>
> > And if you DO really want to solve the ordering problem, we should add a
> > priority when registering a filter, like
> >
> > $registerFilter( $filter, $priority )
> >
> > And before we run process(), we need to sort the global filter array
> > by priority. That's the right method,
> >
> > because then we can always put the bayesian filter last every time.
>         Yes, but I think there could be other plugins that would want to
> be treated "special" and run last.  Maybe plugins could request to be run
> at the end or something.
>
> > Not just run them twice and add a "hacky" secondRun flag there to keep
> > the filter from doing something.
>         I think the $secondRun flag could be replaced with
> $previouslyRejected, if you would like: since the bayesian filter always
> runs first, the only time previouslyRejected will be set to true is
> during the second run.
>         By the way, if we go with your code, you should probably use the
> previouslyRejected flag.
> _______________________________________________
> pLog-svn mailing list
> pLog-svn at devel.lifetype.net
> http://limedaley.com/mailman/listinfo/plog-svn
>
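Mark's priority-based registration and Jon's previouslyRejected flag could be combined roughly as in this sketch. It is Python for illustration, and the names (FilterPipeline, register_filter, the toy filters) and the convention that lower priority numbers run first are assumptions, not LifeType code. It also assumes the pipeline keeps running after a rejection so the bayesian filter, registered with the highest number, sees every comment and can train on it.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical filter signature: (comment text, previously_rejected) -> valid?
FilterFn = Callable[[str, bool], bool]


class FilterPipeline:
    def __init__(self) -> None:
        self._filters: List[Tuple[int, str, FilterFn]] = []

    def register_filter(self, name: str, fn: FilterFn, priority: int) -> None:
        self._filters.append((priority, name, fn))

    def process(self, comment: str) -> Tuple[bool, List[str]]:
        previously_rejected = False
        order: List[str] = []
        # Lower numbers run first; sorted() is stable, so filters with equal
        # priority keep their registration order.
        for _, name, fn in sorted(self._filters, key=lambda entry: entry[0]):
            order.append(name)
            if not fn(comment, previously_rejected):
                previously_rejected = True
        return not previously_rejected, order


trained: Dict[str, List[str]] = {"spam": [], "ham": []}


def bayesian(comment: str, previously_rejected: bool) -> bool:
    # Runs last and trains on everything: as spam if an earlier filter
    # rejected the comment, as ham otherwise (classification omitted here).
    trained["spam" if previously_rejected else "ham"].append(comment)
    return True


pipeline = FilterPipeline()
pipeline.register_filter("bayesian", bayesian, priority=100)
pipeline.register_filter("content-filter", lambda c, _: "pills" not in c, priority=10)
pipeline.register_filter("authimage", lambda c, _: True, priority=10)

valid, order = pipeline.process("buy cheap pills")
print(valid, order)  # False ['content-filter', 'authimage', 'bayesian']
```

Because ordering comes from the registered priority rather than from running the pipeline twice, no secondRun flag is needed; whether the bayesian filter should then train on rejected comments is exactly the point being debated above.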