[pLog-svn] r5925 - in plugins/branches/lifetype-1.2/related: . class/action class/view locale templates

Paul Westbrook paul at westbrooks.org
Wed Sep 12 04:40:13 EDT 2007


Hello,
   This plugin uses a brute-force algorithm.  To calculate the list of
related articles for a given post, it first gets a list of all of the
unique words in the post.  Then, for each keyword, it uses LifeType's
built-in search engine to find the posts that contain that keyword.
After it has done this for all of the keywords, it ranks the returned
articles by the number of times each article was returned by the search
engine.  Finally, the plugin returns the top x articles.
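A rough PHP sketch of that ranking loop follows. `$searchFn` is a hypothetical callable standing in for LifeType's search engine (it maps one keyword to the IDs of matching articles), and the word splitting with `str_word_count()` plus the `$minWordLength`/`$maxResults` parameters are illustrative assumptions, not the plugin's actual code.

```php
<?php
// Sketch of the brute-force ranking described above. $searchFn is a
// hypothetical stand-in for LifeType's search engine: given one keyword,
// it returns the IDs of the articles that contain it.
function relatedArticles(string $text, int $articleId, callable $searchFn,
                         int $minWordLength = 4, int $maxResults = 5): array
{
    // 1. Collect the unique words of the post, lower-cased, skipping short ones.
    $words = array_unique(array_map('strtolower', str_word_count($text, 1)));
    $keywords = array_filter($words, fn($w) => strlen($w) >= $minWordLength);

    // 2. One search per keyword; count how often each article comes back.
    $hits = [];
    foreach ($keywords as $word) {
        foreach ($searchFn($word) as $id) {
            if ($id === $articleId) {
                continue; // never list the post as related to itself
            }
            $hits[$id] = ($hits[$id] ?? 0) + 1;
        }
    }

    // 3. Sort by hit count, highest first, and keep the top $maxResults IDs.
    arsort($hits);
    return array_slice(array_keys($hits), 0, $maxResults);
}

// Usage with a tiny in-memory "search engine".
$index = ['apple' => [1, 2, 3], 'banana' => [2, 3], 'cherry' => [3]];
$search = fn(string $w): array => $index[$w] ?? [];
$related = relatedArticles("Apple banana cherry", 1, $search); // [3, 2]
```

The number of searches grows with the number of unique keywords, which is why long posts are the expensive case.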

   Since this is expensive, the plugin implements a cache.  The list of
related articles for a post is cached in the file system, so the queries
do not have to be rerun the next time the plugin runs.  The user can
specify the lifetime of that cache.  For example, I have mine set to
1 month, so the queries are only run once every 30 days for each article.
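A minimal sketch of such a time-limited file cache, assuming one serialized file per article and a lifetime in seconds. The `cachedRelated()` name and the `related_<id>.cache` file naming are hypothetical, not the plugin's real layout.

```php
<?php
// Sketch of a per-article file cache with a configurable lifetime.
// The function name, the related_<id>.cache naming and the use of
// serialize() are assumptions made for illustration.
function cachedRelated(string $cacheDir, int $articleId, int $lifetime,
                       callable $compute): array
{
    $file = rtrim($cacheDir, '/') . "/related_{$articleId}.cache";

    // Reuse the stored list while it is younger than $lifetime seconds.
    if (is_file($file) && (time() - filemtime($file)) < $lifetime) {
        $data = unserialize(file_get_contents($file), ['allowed_classes' => false]);
        if (is_array($data)) {
            return $data;
        }
    }

    // Cache miss or expired entry: run the expensive queries, then store them.
    $result = $compute($articleId);
    file_put_contents($file, serialize($result), LOCK_EX);
    return $result;
}

// A lifetime of one month, as in the example above (30 days, in seconds).
$oneMonth = 30 * 24 * 60 * 60;
```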

   There are several things that could be done to improve performance:

   1. Implement or use a library that will actually summarize the article
   text.  This shouldn't simply take the nth sentence, but should generate a
   representative summary that contains the important keywords.  This would
   reduce the number of search engine queries that are run.
   2. Change the search engine so that it can return a list of articles
   that contain at least one of a specified list of words.  This would allow
   the search engine to be run only once for each article.
   3. Use native tag support.  When LifeType natively supports tags, the
   plugin won't have to use all of the text in the article; it can just
   find posts that are tagged with the same tags.
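Suggestion 2 could look roughly like this: one query with an `IN (...)` list, letting the database count how many of the keywords each article matched. The `article_words(article_id, word)` table is a hypothetical word index (one row per article and word), not LifeType's real schema, and `buildRelatedQuery()` is likewise illustrative.

```php
<?php
// Sketch of suggestion 2: a single query per article instead of one query
// per keyword. `article_words(article_id, word)` is a hypothetical word
// index with one row per (article, word) pair, not LifeType's schema.
function buildRelatedQuery(array $keywords, int $articleId, int $limit): array
{
    $keywords = array_values(array_unique($keywords));
    if ($keywords === []) {
        throw new InvalidArgumentException('at least one keyword is required');
    }

    // One "?" placeholder per keyword, so the words are bound, not inlined.
    $placeholders = implode(', ', array_fill(0, count($keywords), '?'));

    // COUNT(*) is the number of keywords an article matched, which plays the
    // role of "how many times the search engine returned it".
    $sql = "SELECT article_id, COUNT(*) AS hits FROM article_words"
         . " WHERE word IN ($placeholders) AND article_id <> ?"
         . " GROUP BY article_id ORDER BY hits DESC LIMIT " . $limit;

    return [$sql, array_merge($keywords, [$articleId])];
}
```

With PDO, `$pdo->prepare($sql)` followed by `execute($params)` would run it, so the counting happens in one database round trip instead of one search per keyword.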


--Paul



On 9/12/07, Ayalon <ayalon at blog.nl> wrote:
> Hi There,
>
> This is really a great plugin, but there are some problems with it.
>
> When I switch on this plugin, the memory use of my Apache process rises
> to an incredible level. Is there something I can change? My database is
> pretty large, with a lot of articles, and I have a lot of reads on the
> blogs. Does anybody have an idea how to optimize it?
>
> I know this is not really something for the list. I tried to reprogram
> the plugin to use a different way of searching, but I got stuck.
>
> Regards
>
>
> -----Original Message-----
> From: plog-svn-bounces at devel.lifetype.net
> [mailto:plog-svn-bounces at devel.lifetype.net] On Behalf Of
> pwestbro at devel.lifetype.net
> Sent: Tuesday, 11 September 2007 7:10
> To: plog-svn at devel.lifetype.net
> Subject: [pLog-svn] r5925 - in plugins/branches/lifetype-1.2/related: . class/action class/view locale templates
>
> Author: pwestbro
> Date: 2007-09-11 01:09:42 -0400 (Tue, 11 Sep 2007)
> New Revision: 5925
>
> Modified:
>    plugins/branches/lifetype-1.2/related/class/action/pluginrelatedupdateconfigaction.class.php
>    plugins/branches/lifetype-1.2/related/class/view/pluginrelatedconfigview.class.php
>    plugins/branches/lifetype-1.2/related/locale/locale_en_UK.php
>    plugins/branches/lifetype-1.2/related/pluginrelated.class.php
>    plugins/branches/lifetype-1.2/related/templates/related.template
> Log:
> Added a setting for the minimum number of keywords that are required to
> generate the list of related articles.
>
>
> Modified: plugins/branches/lifetype-1.2/related/class/action/pluginrelatedupdateconfigaction.class.php
> ===================================================================
> --- plugins/branches/lifetype-1.2/related/class/action/pluginrelatedupdateconfigaction.class.php        2007-09-10 19:45:42 UTC (rev 5924)
> +++ plugins/branches/lifetype-1.2/related/class/action/pluginrelatedupdateconfigaction.class.php        2007-09-11 05:09:42 UTC (rev 5925)
> @@ -18,43 +18,45 @@
>      Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
>      */
>
> -       lt_include( PLOG_CLASS_PATH."class/action/admin/adminaction.class.php" );
> +    lt_include( PLOG_CLASS_PATH."class/action/admin/adminaction.class.php" );
>      lt_include( PLOG_CLASS_PATH."plugins/related/class/view/pluginrelatedconfigview.class.php" );
>
>      $apiKeyValid = true;
>
>      class PluginRelatedUpdateConfigAction extends AdminAction
> -       {
> -
> -               var $_pluginEnabled;
> -               var $_numRelatedArticles;
> -               var $_minWordLength;
> -               var $_refreshInterval;
> -               var $_parseBody;
> -               var $_bannedKeywords;
> +        {
> +
> +                var $_pluginEnabled;
> +                var $_numRelatedArticles;
> +                var $_minWordLength;
> +                var $_minNumKeywords;
> +                var $_refreshInterval;
> +                var $_parseBody;
> +                var $_bannedKeywords;
>
>
> -       /**
> +        /**
>           * Constructor. If nothing else, it also has to call the constructor of the parent
>           * class, BlogAction with the same parameters
>           */
>          function PluginRelatedUpdateConfigAction( $actionInfo, $request )
>          {
> -               $this->AdminAction( $actionInfo, $request );
> +                $this->AdminAction( $actionInfo, $request );
>          }
> -
> -               function validate()
> -               {
> +
> +       function validate()
> +       {
>
>              $this->_pluginEnabled = $this->_request->getValue( "pluginEnabled" );
> -            $this->_pluginEnabled = ($this->_pluginEnabled != "" );
>
> +            $this->_pluginEnabled = ($this->_pluginEnabled != "" );
>
>
>
>              $this->_numRelatedArticles = $this->_request->getValue( "numArticles" );
>              $this->_minWordLength = $this->_request->getValue( "minWordLength" );
> +            $this->_minNumKeywords = $this->_request->getValue( "minNumKeywords" );
>              $this->_refreshInterval = $this->_request->getValue( "interval" );
>              $this->_parseBody = $this->_request->getValue( "parseBody" );
> -            $this->_parseBody = ($this->_parseBody != "" );
>
> +            $this->_parseBody = ($this->_parseBody != "" );
>
>              $this->_bannedKeywords = $this->_request->getValue( "bannedKeywords" );
>
>
> @@ -98,28 +100,49 @@
>                          return false;
>                      }
>                  }
> +
> +
> +                if( $this->_minNumKeywords == "" ) {
> +                    $this->_view = new PluginRelatedConfigView( $this->_blogInfo );
> +                    $this->_view->setErrorMessage( $this->_locale->tr("related_missing_num_keywords"));
> +                    $this->setCommonData();
> +
> +                    return false;
> +                }
> +                else {
> +                    $val3 = new IntegerValidator();
> +                    if( !$val3->validate( $this->_minNumKeywords )) {
> +                        $this->_view = new PluginRelatedConfigView( $this->_blogInfo );
> +                        $this->_view->setErrorMessage( $this->_locale->tr("related_invalid_num_keywords"));
> +                        $this->setCommonData();
> +
> +                        return false;
> +                    }
> +                }
> +
>              }
> -                       return true;
> -               }
> -
> +            return true;
> +        }
> +
>          /**
>           * Carries out the specified action
>           */
>          function perform()
>          {
>              // update the plugin configurations to blog setting
> -                       $blogSettings = $this->_blogInfo->getSettings();
> +                        $blogSettings = $this->_blogInfo->getSettings();
>              $blogSettings->setValue( "plugin_related_enabled", $this->_pluginEnabled );
>              $blogSettings->setValue( "plugin_related_num_articles", $this->_numRelatedArticles );
>              $blogSettings->setValue( "plugin_related_min_word_length", $this->_minWordLength );
> +            $blogSettings->setValue( "plugin_related_min_num_keywords", $this->_minNumKeywords );
>              $blogSettings->setValue( "plugin_related_refresh_interval", $this->_refreshInterval );
>              $blogSettings->setValue( "plugin_related_extract_keywords_from_body", $this->_parseBody );
>              $blogSettings->setValue( "plugin_related_banned_keywords", $this->_bannedKeywords );
>
>              $this->_blogInfo->setSettings( $blogSettings );
> -
> -                       // save the blogs settings
> -                       $blogs = new Blogs();
> +
> +                        // save the blogs settings
> +                        $blogs = new Blogs();
>              if( !$blogs->updateBlog( $this->_blogInfo )) {
>                  $this->_view = new PluginRelatedConfigView( $this->_blogInfo );
>                  $this->_view->setErrorMessage( $this->_locale->tr("error_updating_settings"));
> @@ -127,20 +150,20 @@
>
>                  return false;
>              }
> -
> -                       // if everything went ok...
> +
> +                        // if everything went ok...
>              $this->_blogInfo->setSettings( $blogSettings );
>              $this->_session->setValue( "blogInfo", $this->_blogInfo );
>              $this->saveSession();
> -
> -                       $this->_view = new PluginRelatedConfigView( $this->_blogInfo );
> -                       $this->_view->setSuccessMessage( $this->_locale->tr("related_settings_saved_ok"));
> -                       $this->setCommonData();
> -
> -                       // clear the cache
> -                       CacheControl::resetBlogCache( $this->_blogInfo->getId());
> +
> +                        $this->_view = new PluginRelatedConfigView( $this->_blogInfo );
> +                        $this->_view->setSuccessMessage( $this->_locale->tr("related_settings_saved_ok"));
> +                        $this->setCommonData();
> +
> +                        // clear the cache
> +                        CacheControl::resetBlogCache( $this->_blogInfo->getId());
>
> -            return true;
> +            return true;
>          }
>      }
>
>
> Modified: plugins/branches/lifetype-1.2/related/class/view/pluginrelatedconfigview.class.php
> ===================================================================
> --- plugins/branches/lifetype-1.2/related/class/view/pluginrelatedconfigview.class.php  2007-09-10 19:45:42 UTC (rev 5924)
> +++ plugins/branches/lifetype-1.2/related/class/view/pluginrelatedconfigview.class.php  2007-09-11 05:09:42 UTC (rev 5925)
> @@ -38,6 +38,7 @@
>                         $pluginEnabled = $blogSettings->getValue( "plugin_related_enabled" );
>                         $numArticles = $blogSettings->getValue( "plugin_related_num_articles" );
>                         $minWordLength = $blogSettings->getValue( "plugin_related_min_word_length" );
> +                       $minNumKeyword = $blogSettings->getValue( "plugin_related_min_num_keywords" );
>                         $refreshInterval = $blogSettings->getValue( "plugin_related_refresh_interval" );
>                         $parseBody = $blogSettings->getValue( "plugin_related_extract_keywords_from_body" );
>                         $bannedKeywords = $blogSettings->getValue( "plugin_related_banned_keywords" );
> @@ -49,6 +50,7 @@
>                         $this->setValue( "pluginEnabled", $pluginEnabled );
>                         $this->setValue( "numArticles", $numArticles );
>                         $this->setValue( "minWordLength", $minWordLength );
> +                       $this->setValue( "minNumKeywords", $minNumKeyword );
>                         $this->setValue( "interval", $refreshInterval );
>                         $this->setValue( "parseBody", $parseBody );
>                         $this->setValue( "bannedKeywords", $bannedKeywords );
>
> Modified: plugins/branches/lifetype-1.2/related/locale/locale_en_UK.php
> ===================================================================
> --- plugins/branches/lifetype-1.2/related/locale/locale_en_UK.php  2007-09-10 19:45:42 UTC (rev 5924)
> +++ plugins/branches/lifetype-1.2/related/locale/locale_en_UK.php  2007-09-11 05:09:42 UTC (rev 5925)
> @@ -10,12 +10,15 @@
>  $messages["related_settings_saved_ok"] = "Related Posts settings saved successfully!";
>  $messages["related_missing_num_articles"] = "Number of articles needs to be specified";
>  $messages["related_invalid_num_articles"] = "Number of articles needs to be an integer";
> +$messages["related_missing_num_keywords"] = "Number of keywords needs to be specified";
> +$messages["related_invalid_num_keywords"] = "Number of keywords needs to be an integer";
>  $messages["related_missing_min_length"] = "Minimum keyword length needs to be specified";
>  $messages["related_invalid_min_length"] = "Minumum keyword length needs to be an integer";
>  $messages["related_banned_keywords"] = "Keywords that should not be used to find related posts (comma separated).";
>
>  $messages["related_articles"] = "Number of related articles to return.";
>  $messages["related_word_length"] = "Minimum length of keyword used to generate related article.";
> +$messages["related_num_keywords"] = "Minimum number of keywords required to determine list of related articles.";
>  $messages["related_cache"] = "Lifetime for the related article cache.";
>  $messages["parse_body"] = "Parse the body of articles to generate keywords. (This may cause generating related posts to take longer.)";
>
> @@ -24,6 +27,7 @@
>
>  $messages["related_max_articles"] = "Number Articles";
>  $messages["related_min_word_length"] = "Minimum Keyword Length";
> +$messages["related_min_num_keywords"] = "Minimum Number of Keywords";
>  $messages["related_cache_lifetime"] = "Cache Lifetime";
>  $messages["related_parse_body"] = "Parse Body";
>  $messages["banned_keywords"] = "Banned Keywords";
>
> Modified: plugins/branches/lifetype-1.2/related/pluginrelated.class.php
> ===================================================================
> --- plugins/branches/lifetype-1.2/related/pluginrelated.class.php  2007-09-10 19:45:42 UTC (rev 5924)
> +++ plugins/branches/lifetype-1.2/related/pluginrelated.class.php  2007-09-11 05:09:42 UTC (rev 5925)
> @@ -25,10 +25,11 @@
>                 var $pluginEnabled;
>                 var $numRelatedArticles;
>                 var $minWordLength;
> +               var $minNumKeywords;
>                 var $refreshInterval;
> -        var $cacheFolder;
> -        var $extractKeywordsFromBody;
> -        var $bannedWords;
> +               var $cacheFolder;
> +               var $extractKeywordsFromBody;
> +               var $bannedWords;
>
>                 function PluginRelated( $source = "" )
>                 {
> @@ -38,7 +39,7 @@
>                         $this->desc    = "The Related plugin will generate a list of related posts.";
>                         $this->author  = "Paul Westbrook";
>                         $this->locales = Array( "en_UK" );
> -            $this->version = "20070602";
> +                       $this->version = "20070910";
>
>
>                         if( $source == "admin" )
> @@ -62,6 +63,7 @@
>                         $this->pluginEnabled = $blogSettings->getValue( "plugin_related_enabled" );
>                         $this->numRelatedArticles = $blogSettings->getValue( "plugin_related_num_articles" );
>                         $this->minWordLength = $blogSettings->getValue( "plugin_related_min_word_length" );
> +                       $this->minNumKeywords = $blogSettings->getValue( "plugin_related_min_num_keywords" );
>                         $this->refreshInterval = $blogSettings->getValue( "plugin_related_refresh_interval" );
>                         $this->extractKeywordsFromBody = $blogSettings->getValue( "plugin_related_extract_keywords_from_body" );
>                     $this->bannedWords = $blogSettings->getValue( "plugin_related_banned_keywords" );
> @@ -116,8 +118,14 @@
>
>                  // Get the keywords
>                  $keywords = $this->getArticleKeywords($article);
> +
> +                // Make sure that there are enough keywords to make
> +                // generating the list of articles worth while
> +                if ($this->minNumKeywords != "" && count($keywords) < $this->minNumKeywords) {
> +                    return $relatedArticles;
> +                }
> +
>
> -
>                  foreach($keywords as $word) {
>                      // Build the list of articles that have this keyword
>                      lt_include( PLOG_CLASS_PATH."class/dao/searchengine.class.php" );
>
> Modified: plugins/branches/lifetype-1.2/related/templates/related.template
> ===================================================================
> --- plugins/branches/lifetype-1.2/related/templates/related.template  2007-09-10 19:45:42 UTC (rev 5924)
> +++ plugins/branches/lifetype-1.2/related/templates/related.template  2007-09-11 05:09:42 UTC (rev 5925)
> @@ -35,6 +35,17 @@
>    </div>
>
>    <div class="field">
> +   <label for="width">{$locale->tr("related_min_num_keywords")}</label>
> +   <span class="required">*</span>
> +   <div class="formHelp">{$locale->tr("related_num_keywords")}</div>
> +   <input class="text" type="text" name="minNumKeywords" id="minNumKeywords"
> +           {user_cannot_override
> +               key=plugin_related_min_num_keywords}readonly="readonly"
> +           {/user_cannot_override}
> +          value="{$minNumKeywords}" width="10" />
> +  </div>
> +
> +  <div class="field">
>     <label for="size">{$locale->tr("related_cache_lifetime")}</label>
>     <span class="required">*</span>
>     <div class="formHelp">{$locale->tr("related_cache")}</div>
>
> _______________________________________________
> pLog-svn mailing list
> pLog-svn at devel.lifetype.net
> http://limedaley.com/mailman/listinfo/plog-svn
>
>
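The two checks that r5925 adds (the empty/integer validation of `minNumKeywords` in the config action, and the early return in pluginrelated.class.php when a post yields too few keywords) can be condensed into two plain functions for reference. This is a sketch outside LifeType's class hierarchy: the function names are hypothetical, and the regular expression is an assumed stand-in for `IntegerValidator`, whose exact rules are not shown in the diff.

```php
<?php
// Standalone sketch of the two checks added in r5925. The plugin uses
// LifeType's IntegerValidator; the regular expression below is an assumed
// stand-in that accepts non-negative whole numbers only.
function validateMinNumKeywords(string $value): ?string
{
    if ($value === '') {
        return 'related_missing_num_keywords'; // locale key: "needs to be specified"
    }
    if (!preg_match('/\A\d+\z/', $value)) {
        return 'related_invalid_num_keywords'; // locale key: "needs to be an integer"
    }
    return null; // valid
}

// Mirrors the guard added to pluginrelated.class.php: an empty setting
// disables the check; otherwise too few keywords means no list is built.
function enoughKeywords(array $keywords, string $minNumKeywords): bool
{
    return $minNumKeywords === '' || count($keywords) >= (int) $minNumKeywords;
}
```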

