[pLog-svn] r6088 - plog/branches/lifetype-1.2/class/security

Sat Dec 1 04:36:29 EST 2007

The Bayesian filter needs to perform additional checks on the incoming
comment because, if the comment is going to end up saved in the database
(marked as spam), we first need to make sure that things like the blog
id and the article id are correct. It is not strictly necessary that
the filter runs first, though, and in fact its position doesn't really
matter, so I guess making it run last is still good enough for now.
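
To make the idea concrete, here is a rough sketch in Python rather than
the actual PHP. Every name in it (Comment, Article, is_valid_target,
bayesian_filter, run_pipeline, spam_store, looks_like_spam) is made up
for illustration and is not LifeType code; it only shows the filter
validating the ids before it saves anything, wherever it sits in the
chain.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    blog_id: int
    article_id: int
    text: str


@dataclass
class Article:
    blog_id: int
    comments_enabled: bool


def is_valid_target(comment, articles):
    """True only if the article exists, belongs to the blog, and accepts comments."""
    article = articles.get(comment.article_id)
    return (
        article is not None
        and article.blog_id == comment.blog_id
        and article.comments_enabled
    )


def bayesian_filter(comment, articles, spam_store, looks_like_spam):
    """Return True to accept the comment, False to reject it.

    Nothing is written to spam_store unless is_valid_target passes, so a
    rejected comment is never saved against a missing article (such as
    article_id = 0) or a post that has comments disabled.
    """
    if not is_valid_target(comment, articles):
        return False  # reject without saving anything
    if looks_like_spam(comment.text):
        spam_store.append(comment)  # safe: the ids were checked above
        return False
    return True


def run_pipeline(comment, filters):
    """Run the filters in order and stop at the first rejection."""
    return all(f(comment) for f in filters)
```

With the Bayesian filter placed last in the list passed to
run_pipeline, any cheaper filter ahead of it gets the chance to reject
the comment first, and the validation inside bayesian_filter still
protects the database either way.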

Oscar

On Dec 1, 2007, at 12:19 AM, Jon Daley wrote:

> 	Should we have a small filter at the front that does these sort of
> checks, and then have the bayesian filter at the end?  Or perhaps the real
> reason is that since the bayesian filter actually saves the comment, it
> needs to have additional checks, no matter where in the order it falls?
> 	There are two filters before the bayesian filter, and maybe that
> logic could go in there?
> 	It would be nice to have filters be able to lower the cpu usage on
> comments that have invalid article ids, etc. since presumably that is
> spammers trying to mess with the system.
>
> ------------------------------------------------------------------------
> r5918 | oscar | 2007-09-07 17:38:00 -0400 (Fri, 07 Sep 2007) | 5 lines
>
> This should solve issues http://bugs.lifetype.net/view.php?id=1386
> ("Spammers are able to post comments even if comments are disabled for a
> particular post") and http://bugs.lifetype.net/view.php?id=1387 ("comments
> with article_id = 0 created by some spam bots")
>
> The problem here was that since the bayesian filter is run *before* any
> application logic is run, it should also check things like whether
> comments are enabled or not and if the article is found at all or not,
> even though these same checks are applied later on in the AddCommentAction
> class. The articleId parameter was taken as is from the request, without
> performing any check other than checking if it is an integer, so this
> caused some comments to point to an article with an id of '0' because we
> did not check if the article really existed before saving the spam
> comment. And the same applies to the other situation, with the toggle for
> enabling and disabling comments.
>
> The solution was to add some additional logic to the BayesianFilter filter
> class and perform these checks. That does indeed duplicate some of the
> logic found later in the process flow, but I did not find a more elegant
> solution for this (at least not without a redesign of the whole filter
> architecture anyway)
> ------------------------------------------------------------------------
>
>
> On Fri, 30 Nov 2007, Paul Westbrook wrote:
>> Hello,
>>  That should be fine.  But in revision 5918 it looks like it is
>> intentional that the Bayesian filter runs first.
>>
>> --Paul
>>
>> On 11/30/07, Oscar Renalias <oscar at renalias.net> wrote:
>>>
>>> So can this issue be closed by placing the Bayesian filter at the end
>>> of the pipeline chain?
>>>
>>> On Nov 30, 2007, at 6:48 AM, Jon Daley wrote:
>>>
>>>> On Fri, 30 Nov 2007, Mark Wu wrote:
>>>>> Why can't we just put the Bayesian filter last in the order? It
>>>>> seems to solve this problem more easily.
>>>>      Does that fix everything?  It is certainly the easiest, coding-
>>>> and performance-wise.
>>>>      With my thinking it seems like that fixes it - at least for now,
>>>> because we don't have any other plugins that would use the inputs of
>>>> others.  And we can maybe do Mark's priority idea if we ever need
>>>> that sort of thing.
>>>>      As long as it works for Paul's stuff, I think that sounds good.  So,
>>>> then we should take Mark's rev 6088 or whatever it is and use that,
>>>> but modify it to pass in the previouslyRejected flag, and then put
>>>> the bayesian at the end.
>>>>
>>>>> BTW, most LifeType installations on CJK sites do rely on the
>>>>> Bayesian filter to protect against spam attacks, because the
>>>>> tokenizing algorithm can't separate CJK text into atomic tokens.
>>>>> We don't use stop words and "white space" to separate a paragraph
>>>>> into "words".
>>>>      I am not sure what you are saying.  It seems like you are saying
>>>> the tokenizer doesn't work, so then it seems that the bayesian
>>>> filter wouldn't be very good at all...
>>>>
>>>>      Well, it's been 10 minutes since I read your idea of simply putting
>>>> the bayesian filter at the end, and haven't come up with a reason
>>>> why it won't work.  So, probably good.  Do you want to do it, or me?
>>>>
>>>> --
>>>> Jon Daley
>>>> http://jon.limedaley.com/
>>>>
>>>> Whenever people agree with me I always feel I must be wrong.
>>>> -- Oscar Wilde
>>>> _______________________________________________
>>>> pLog-svn mailing list
>>>> pLog-svn at devel.lifetype.net
>>>> http://limedaley.com/mailman/listinfo/plog-svn
>>>
>>>
>>
>
> -- 
> Jon Daley
> http://jon.limedaley.com/
>
> All who would win joy, must share it; happiness was born a twin.
> -- Lord Byron