[pLog-svn] db optimization
Christoph Feddersen
feddersen at herr-der-ringe-film.de
Thu Apr 13 17:55:57 GMT 2006
Hi there,
I examined the provided logfiles and came up with some changes to the
db-indices. They just improve the performance of several queries without
changing the db-schema or code.
I dropped several indices, because there's no need to two indices for a
single column. It just adds an overhead when inserting/updating/deleting
records.
ALTER TABLE `plog_articles_comments` ADD INDEX ( `blog_id` , `type` );
ALTER TABLE `plog_articles_comments` DROP INDEX `blog_id`;
ALTER TABLE `plog_articles_comments` DROP INDEX `article_id`;
ALTER TABLE `plog_articles_comments` ADD INDEX ( `article_id` , `type`);
ALTER TABLE `plog_articles` DROP INDEX `blog_id`;
ALTER TABLE `plog_gallery_albums` DROP INDEX `owner_id`;
ALTER TABLE `plog_gallery_resources` DROP INDEX `album_id`;
ALTER TABLE `plog_referers` DROP INDEX `blog_id`;
ALTER TABLE `plog_users_permissions` DROP INDEX `user_id`;
ALTER TABLE `plog_articles` DROP INDEX `blog_id_slug`;
ALTER TABLE `plog_articles` ADD INDEX ( `blog_id` , `status` , `date` );
Some general advices
====================
Don't use COUNT(*), use COUNT(Primary key field) instead. There's no
need to load complete record sets when you only want to know the number
of rows.
If possible, the conditions of a queries should be in the same order and
have an index in exactly that order. There are several queries that use
the same conditions, but they appear in a different order. Example:
SELECT date FROM plog_articles WHERE status = 1 AND blog_id = 1 AND
date >= 20060300000000 AND date <= NOW();
SELECT * FROM plog_articles WHERE blog_id = 1 AND date >
'20060311172502' AND status = 1 ORDER BY DATE ASC LIMIT 0,1;
You see that I added a multi column index with:
ADD INDEX ( `blog_id` , `status` , `date` );
So the conditions should be order that way:
SELECT date FROM plog_articles WHERE blog_id = 1 AND status = 1 AND
date >= 20060300000000 AND date <= NOW();
SELECT * FROM plog_articles WHERE blog_id = 1 AND status = 1 AND date >
'20060311172502' ORDER BY DATE ASC LIMIT 0,1;
Avoid like% statements, they are very slow and can't make use of indices:
SELECT a.id as id, a.id, a.date,
a.user_id,a.blog_id,a.status,a.properties, a.num_reads, a.slug, 1 AS
relevance, a.num_comments AS num_comments, a.num_nonspam_comments AS
num_nonspam_comments, a.num_trackbacks AS num_trackbacks,
a.num_nonspam_trackbacks AS num_nonspam_trackbacks,
a.global_category_id AS global_category_id, a.in_summary_page AS
in_summary_page
FROM plog_articles a, plog_articles_categories c,
plog_article_categories_link l
WHERE a.blog_id = 1
AND FROM_UNIXTIME(UNIX_TIMESTAMP(a.date)+0)+0 LIKE '200601%'
AND c.id = l.category_id AND a.id = l.article_id AND a.status = '1' AND
a.date <= '20060402144438'
GROUP BY a.id
ORDER BY a.date DESC LIMIT 0, 5;
Use fulltext index with match against instead of like
SELECT article_id FROM plog_articles_text WHERE (((normalized_topic LIKE
'S') OR (normalized_text LIKE 'S')) OR ((normalized_topic LIKE 'S') OR
(normalized_text LIKE 'S')));
I'll take a closer look at queries in the code in the next week. I'll
need to fill me database with some more data to get better test results.
That's why I ported the generateData.php to 1.1. Maybe someone can
enhance it even more so that all database tables will be filled with
realistic values.
The overall db-structure is looking good. Regarding the MyISAM vs.
InnoDB question, there's an article that compares these storage engines:
http://dev.mysql.com/tech-resources/articles/storage-engine/part_3.html
I'd stick with MyISAM in the standard distribution. As most people will
run small installations, it's the simplest and fastest way and you can
use the FULLTEXT search. Installations with a large database may benefit
from using InnoDB but I'd leave that up to the admin. Maybe add a notice
to the documentation that this can be done (beware: InnoDB doesn't
support fulltext).
Christoph
More information about the pLog-svn
mailing list