[pLog-svn] db optimization

Oscar Renalias oscar at renalias.net
Thu Apr 13 19:31:12 GMT 2006


We can use devel.lifetype.net for data generation purposes and get a
big db there. But I am not sure we can use the server for stress test
and load purposes since it's the same server used for forums, wiki and
the main site.

I can provide local accounts if needed too.

Oscar

On 4/13/06, Christoph Feddersen <feddersen at herr-der-ringe-film.de> wrote:
> Yes, the data I worked with probably isn't very realistic for an
> installation with several hundred blogs and thousands of articles and
> comments. That's why I asked to enhance the generateData script to
> generate a more realistic test scenario.
> What I've in mind is a installation with about 10.000 users, 15.000
> blogs, 300.000 articles, 500.000 comments (some of them spam),
> the bayesian tables need to be filled accordingly etc, standard plugins
> installed and used etc.
>
> That would be step one. Step two would be to simulate traffic to the
> installation. There are testing tools that can use apache log entries to
> simulate users. That would allow us to get some realistic data to
> identify the current bottlenecks.
>
> So my current suggestions are just some general hints. A detailed
> analysis will require a good test environment, as described above.
>
> Jon Daley wrote:
> >     That looks like an excellent analysis, and some good tips - I didn't
> > know any of that.
> >     One thing to keep in mind is that Christoph only looked at Oscar's
> > log, so there are some features that presumably aren't used, and might
> > affect other people's blogs negatively, say, (I'm guessing) if they
> > don't use custom URLs, or other things like that?
> >
> >
> > On Thu, 13 Apr 2006, Christoph Feddersen wrote:
> >
> >> Hi there,
> >>
> >> I examined the provided logfiles and came up with some changes to the
> >> db-indices. They just improve the performance of several queries without
> >> changing the db-schema or code.
> >> I dropped several indices, because there's no need to two indices for a
> >> single column. It just adds an overhead when inserting/updating/deleting
> >> records.
> >>
> >>
> >> ALTER TABLE `plog_articles_comments` ADD INDEX ( `blog_id` , `type` );
> >> ALTER TABLE `plog_articles_comments` DROP INDEX `blog_id`;
> >> ALTER TABLE `plog_articles_comments` DROP INDEX `article_id`;
> >> ALTER TABLE `plog_articles_comments` ADD INDEX ( `article_id` , `type`);
> >> ALTER TABLE `plog_articles` DROP INDEX `blog_id`;
> >> ALTER TABLE `plog_gallery_albums` DROP INDEX `owner_id`;
> >> ALTER TABLE `plog_gallery_resources` DROP INDEX `album_id`;
> >> ALTER TABLE `plog_referers` DROP INDEX `blog_id`;
> >> ALTER TABLE `plog_users_permissions` DROP INDEX `user_id`;
> >> ALTER TABLE `plog_articles` DROP INDEX `blog_id_slug`;
> >> ALTER TABLE `plog_articles` ADD INDEX ( `blog_id` , `status` , `date` );
> >>
> >> Some general advices
> >> ====================
> >>
> >> Don't use COUNT(*), use COUNT(Primary key field) instead. There's no
> >> need to load complete record sets when you only want to know the number
> >> of rows.
> >>
> >>
> >> If possible, the conditions of a queries should be in the same order and
> >> have an index in exactly that order. There are several queries that use
> >> the same conditions, but they appear in a different order. Example:
> >>
> >> SELECT date  FROM plog_articles  WHERE status = 1  AND blog_id = 1 AND
> >> date >= 20060300000000 AND date <= NOW();
> >>
> >> SELECT * FROM plog_articles WHERE blog_id = 1 AND date >
> >> '20060311172502' AND status = 1 ORDER BY DATE ASC LIMIT 0,1;
> >>
> >> You see that I added a multi column index with:
> >> ADD INDEX ( `blog_id` , `status` , `date` );
> >>
> >> So the conditions should be order that way:
> >> SELECT date  FROM plog_articles  WHERE  blog_id = 1 AND status = 1 AND
> >> date >= 20060300000000 AND date <= NOW();
> >>
> >> SELECT * FROM plog_articles WHERE blog_id = 1  AND status = 1 AND date >
> >> '20060311172502' ORDER BY DATE ASC LIMIT 0,1;
> >>
> >>
> >> Avoid like% statements, they are very slow and can't make use of indices:
> >> SELECT a.id as id, a.id, a.date,
> >> a.user_id,a.blog_id,a.status,a.properties, a.num_reads, a.slug, 1 AS
> >> relevance, a.num_comments AS num_comments, a.num_nonspam_comments AS
> >> num_nonspam_comments, a.num_trackbacks AS num_trackbacks,
> >> a.num_nonspam_trackbacks AS num_nonspam_trackbacks,
> >> a.global_category_id AS global_category_id, a.in_summary_page AS
> >> in_summary_page
> >> FROM plog_articles a, plog_articles_categories c,
> >> plog_article_categories_link l
> >> WHERE  a.blog_id = 1
> >> AND FROM_UNIXTIME(UNIX_TIMESTAMP(a.date)+0)+0 LIKE '200601%'
> >> AND c.id = l.category_id AND a.id = l.article_id  AND a.status = '1' AND
> >> a.date <= '20060402144438'
> >> GROUP BY a.id
> >> ORDER BY a.date DESC LIMIT 0, 5;
> >>
> >> Use fulltext index with match against instead of like
> >> SELECT article_id FROM plog_articles_text WHERE (((normalized_topic LIKE
> >> 'S') OR (normalized_text LIKE 'S')) OR ((normalized_topic LIKE 'S') OR
> >> (normalized_text LIKE 'S')));
> >>
> >>
> >> I'll take a closer look at queries in the code in the next week. I'll
> >> need to fill me database with some more data to get better test results.
> >> That's why I ported the generateData.php to 1.1. Maybe someone can
> >> enhance it even more so that all database tables will be filled with
> >> realistic values.
> >>
> >> The overall db-structure is looking good. Regarding the MyISAM vs.
> >> InnoDB question, there's an article that compares these storage engines:
> >> http://dev.mysql.com/tech-resources/articles/storage-engine/part_3.html
> >>
> >> I'd stick with MyISAM in the standard distribution. As most people will
> >> run small installations, it's the simplest and fastest way and you can
> >> use the FULLTEXT search. Installations with a large database may benefit
> >> from using InnoDB but I'd leave that up to the admin. Maybe add a notice
> >> to the documentation that this can be done (beware: InnoDB doesn't
> >> support fulltext).
> >>
> >> Christoph
> >>
> >> _______________________________________________
> >> pLog-svn mailing list
> >> pLog-svn at devel.lifetype.net
> >> http://devel.lifetype.net/mailman/listinfo/plog-svn
> >>
> >
> > **************************************
> > Jon Daley
> > http://jon.limedaley.com/
> >
> > Our software is not 'released'.
> > It *escapes*, leaving a trail of mangled QA engineers in its path.
> > _______________________________________________
> > pLog-svn mailing list
> > pLog-svn at devel.lifetype.net
> > http://devel.lifetype.net/mailman/listinfo/plog-svn
> _______________________________________________
> pLog-svn mailing list
> pLog-svn at devel.lifetype.net
> http://devel.lifetype.net/mailman/listinfo/plog-svn
>


More information about the pLog-svn mailing list