Some months ago I set up MediaWiki with the latest Wikipedia dump. It was the most horrible experience. It's easier to work with Linux kernel code than to install MediaWiki and set up Wikipedia. So here are the steps:
1. Use BitNami MediaWiki
The BitNami MediaWiki stack bundles MediaWiki, Apache, MySQL, PHP and phpMyAdmin, so everything you need for Wikipedia gets installed in one shot.
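On Windows the BitNami stack is a normal click-through installer; on Linux it ships as a self-extracting .run file, something like this (the file name is illustrative, use whichever version you downloaded):

    chmod +x bitnami-mediawiki-linux-installer.run    # make the installer executable
    sudo ./bitnami-mediawiki-linux-installer.run      # walks you through the whole stack setup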
2. Drop and recreate all the tables in the MediaWiki database in MySQL with ENGINE=MyISAM, and change every character set from latin1 to utf8 (a sketch of the statements follows the note below). Here is my SQL file to recreate the tables.
IMPORTANT: The MyISAM engine is optimized for reading from the database, so if you want to allow Wikipedia edits, MyISAM is not for you. But it's rare to allow editing on a duplicate Wikipedia site.
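To give a flavour of what that SQL file does: the same effect can be had with in-place ALTERs, one per table. A minimal sketch, assuming the default (unprefixed) MediaWiki table names; the real file has to cover every table in the database:

    -- Illustrative only: switch engine and character set in one statement per table.
    ALTER TABLE text     ENGINE=MyISAM, CONVERT TO CHARACTER SET utf8;
    ALTER TABLE page     ENGINE=MyISAM, CONVERT TO CHARACTER SET utf8;
    ALTER TABLE revision ENGINE=MyISAM, CONVERT TO CHARACTER SET utf8;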
3. Change the MySQL configuration file to this. Basically you increase the memory limits, as Wikipedia is massive.
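The gist of the changes looks something like this (the variable names are real MySQL settings, but the values are illustrative; scale them to your RAM):

    [mysqld]
    key_buffer_size         = 1024M  # MyISAM index cache; the big win for a read-heavy wiki
    max_allowed_packet      = 128M   # long article revisions need large packets
    table_open_cache        = 512
    read_buffer_size        = 8M
    myisam_sort_buffer_size = 256M   # speeds up index builds during the import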
4. Use mwdumper to import the Wikipedia dump into MediaWiki. All the instructions and troubleshooting tips are listed at that link. But the key advice is to read each and every instruction and follow it exactly, including the troubleshooting options.
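The canonical invocation streams the dump straight into MySQL (the database name, user and dump file below are whatever you chose):

    # Pipe the dump through mwdumper into the MediaWiki database
    java -jar mwdumper.jar --format=sql:1.5 enwiki-latest-pages-articles.xml.bz2 \
        | mysql -u wikiuser -p wikidb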
5. MediaWiki is very, very slow with Wikipedia, so you need a caching mechanism within MediaWiki itself. Follow the steps listed here. Installing a PHP cache engine (an opcode cache such as APC) is a must.
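Once the accelerator is installed, the MediaWiki side is a few lines in LocalSettings.php. A sketch, assuming an opcode/object cache like APC:

    // LocalSettings.php additions (adjust to the cache you actually installed)
    $wgMainCacheType      = CACHE_ACCEL;  // use the PHP accelerator's object cache
    $wgParserCacheType    = CACHE_ACCEL;  // cache parsed article HTML
    $wgUseFileCache       = true;         // serve whole rendered pages to anonymous readers
    $wgCacheDirectory     = "$IP/cache";  // where the file cache lives
    $wgEnableSidebarCache = true;         // the sidebar rarely changes; stop re-rendering it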
6. Next, install a reverse proxy cache (caching outside MediaWiki) like Squid if you will have a large number of hits. Wikipedia itself uses Squid extensively. But this is optional and only for the highest level of optimization; I just used the reverse proxy caching in IIS7.
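If you do go the Squid route, the reverse-proxy ("accelerator") setup is only a handful of squid.conf lines. A sketch with made-up host names:

    # Squid listens on port 80 and forwards cache misses to the web server
    # MediaWiki runs on (host names and ports are illustrative)
    http_port 80 accel defaultsite=wiki.example.com
    cache_peer 127.0.0.1 parent 8080 0 no-query originserver name=mediawiki
    acl our_site dstdomain wiki.example.com
    http_access allow our_site
    cache_peer_access mediawiki allow our_site

MediaWiki also needs to know the proxy is there ($wgUseSquid = true; and $wgSquidServers in LocalSettings.php) so it can purge cached pages when they change.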
That should be it. I'm sure you will still break your head even with these steps, but at least the wall you're banging your head on is no longer made of titanium.
I am going to revisit this soon and see what other optimizations I can do, as it's still slow. That can be another Wikipedia optimization post.
Thanks for reading!