[pLog-svn] r6088 - plog/branches/lifetype-1.2/class/security

Oscar Renalias oscar at renalias.net
Wed Dec 5 10:14:17 EST 2007


Is anyone taking care of making these changes so that the Bayesian
filter runs last?

On Dec 1, 2007 11:36 AM, Oscar Renalias <oscar at renalias.net> wrote:
> The bayesian filter needs to perform additional checks on the incoming
> comment because if it's going to end up being saved in the database
> (marked as spam), we first need to make sure that things like the blog
> id and the article id are correct. But it's not strictly necessary
> that it runs first, and in fact it doesn't really matter, so I guess
> making it run last is still good enough for now.
>
> Oscar
>
>
> On Dec 1, 2007, at 12:19 AM, Jon Daley wrote:
>
> >       Should we have a small filter at the front that does these sorts
> > of checks, and then have the Bayesian filter at the end?  Or perhaps the
> > real reason is that since the Bayesian filter actually saves the comment,
> > it needs to have additional checks, no matter where in the order it falls?
> >       There are two filters before the Bayesian filter, and maybe that
> > logic could go in there?
> >       It would be nice to have filters be able to lower the CPU usage on
> > comments that have invalid article ids, etc., since presumably that is
> > spammers trying to mess with the system.
> >
> > ------------------------------------------------------------------------
> > r5918 | oscar | 2007-09-07 17:38:00 -0400 (Fri, 07 Sep 2007) | 5 lines
> >
> > This should solve issues http://bugs.lifetype.net/view.php?id=1386
> > ("Spammers are able to post comments even if comments are disabled for a
> > particular post") and http://bugs.lifetype.net/view.php?id=1387
> > ("comments with article_id = 0 created by some spam bots")
> >
> > The problem here was that since the Bayesian filter is run *before* any
> > application logic is run, it should also check things like whether
> > comments are enabled or not and if the article is found at all or not,
> > even though these same checks are applied later on in the
> > AddCommentAction class. The articleId parameter was taken as is from the
> > request, without performing any check other than checking if it is an
> > integer, so this caused some comments to point to an article with an id
> > of '0' because we did not check if the article really existed before
> > saving the spam comment. And the same applies to the other situation,
> > with the toggle for enabling and disabling comments.
> >
> > The solution was to add some additional logic to the BayesianFilter
> > filter class and perform these checks. That does indeed duplicate some of
> > the logic found later in the process flow, but I did not find a more
> > elegant solution for this (at least not without a redesign of the whole
> > filter architecture anyway).
> > ------------------------------------------------------------------------
> >
> >
> > On Fri, 30 Nov 2007, Paul Westbrook wrote:
> >> Hello,
> >>  That should be fine.  But in revision 5918 it looks like it is
> >> intentional that the Bayesian filter runs first.
> >>
> >> --Paul
> >>
> >> On 11/30/07, Oscar Renalias <oscar at renalias.net> wrote:
> >>>
> >>> So can this issue be closed by placing the Bayesian filter at the end
> >>> of the pipeline chain?
> >>>
> >>> On Nov 30, 2007, at 6:48 AM, Jon Daley wrote:
> >>>
> >>>> On Fri, 30 Nov 2007, Mark Wu wrote:
> >>>>> Why can't we just put the Bayesian filter last in the order? It seems
> >>>>> to solve this problem more easily.
> >>>>      Does that fix everything?  It is certainly the easiest, both
> >>>> coding-wise and performance-wise.
> >>>>      With my thinking it seems like that fixes it - at least for now,
> >>>> because we don't have any other plugins that would use the inputs of
> >>>> others.  And we can maybe do Mark's priority idea if we ever need
> >>>> that sort of thing.
> >>>>      As long as it works for Paul's stuff, I think that sounds good.
> >>>> So, then we should take Mark's rev 6088 or whatever it is and use that,
> >>>> but modify it to pass in the previouslyRejected flag, and then put
> >>>> the Bayesian filter at the end.
> >>>>
> >>>>> BTW, most LifeType installations on CJK sites do rely on the
> >>>>> Bayesian filter for protection against spam attacks, because the
> >>>>> tokenizing algorithm can't separate CJK text into atomic tokens. We
> >>>>> don't use stop words and "white space" to separate a paragraph into
> >>>>> "words".
> >>>>      I am not sure what you are saying.  It seems like you are saying
> >>>> the tokenizer doesn't work, so then it seems that the Bayesian
> >>>> filter wouldn't be very good at all...
> >>>>
> >>>>      Well, it's been 10 minutes since I read your idea of simply
> >>>> putting the Bayesian filter at the end, and I haven't come up with a
> >>>> reason why it won't work.  So, probably good.  Do you want to do it,
> >>>> or me?
> >>>>
> >>>> --
> >>>> Jon Daley
> >>>> http://jon.limedaley.com/
> >>>>
> >>>> Whenever people agree with me I always feel I must be wrong.
> >>>> -- Oscar Wilde
> >>>> _______________________________________________
> >>>> pLog-svn mailing list
> >>>> pLog-svn at devel.lifetype.net
> >>>> http://limedaley.com/mailman/listinfo/plog-svn
> >>>
> >>>
> >>
> >
> > --
> > Jon Daley
> > http://jon.limedaley.com/
> >
> > All who would win joy, must share it; happiness was born a twin.
> > -- Lord Byron
>
>
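
For illustration only, the ordering discussed in this thread (cheap request
checks at the front, the Bayesian filter at the end, each filter told whether
an earlier one already rejected the comment) could be sketched roughly as
below. This is not LifeType code: LifeType is written in PHP, and every name
here (RequestValidationFilter, run_pipeline, previously_rejected, the is_spam
callable, saved_spam) is hypothetical. How the real previouslyRejected flag is
meant to behave is not spelled out in the thread, so the "skip when already
rejected" behaviour is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Article:
    id: int
    comments_enabled: bool = True


@dataclass
class Comment:
    article_id: int
    text: str


@dataclass
class Result:
    accepted: bool
    reason: str = ""


class RequestValidationFilter:
    """Cheap checks only: the article exists and accepts comments."""

    def __init__(self, articles: Dict[int, Article]) -> None:
        self.articles = articles

    def run(self, comment: Comment, previously_rejected: bool) -> Result:
        article = self.articles.get(comment.article_id)
        if article is None:
            return Result(False, "unknown article")
        if not article.comments_enabled:
            return Result(False, "comments disabled")
        return Result(True)


@dataclass
class BayesianFilter:
    """Stand-in for the expensive filter that also stores spam comments."""

    is_spam: Callable[[str], bool]  # hypothetical classifier hook
    saved_spam: List[Comment] = field(default_factory=list)

    def run(self, comment: Comment, previously_rejected: bool) -> Result:
        # Assumption: if an earlier filter already rejected the comment,
        # neither classify it nor save it.
        if previously_rejected:
            return Result(True)
        if self.is_spam(comment.text):
            self.saved_spam.append(comment)
            return Result(False, "spam")
        return Result(True)


def run_pipeline(filters: list, comment: Comment) -> Result:
    """Run every filter in order and report the first rejection, if any."""
    first_rejection: Optional[Result] = None
    for f in filters:
        result = f.run(comment, previously_rejected=first_rejection is not None)
        if not result.accepted and first_rejection is None:
            first_rejection = result
    return first_rejection if first_rejection is not None else Result(True)
```

With the Bayesian filter listed last, a comment posted against a non-existent
article id is rejected by the first filter and never reaches the spam store,
which is the situation bugs 1386 and 1387 describe.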

