Guide to blogging in Dhivehi using WordPress

Here is a short guide to using Thaana to make posts in Dhivehi using the free WordPress blogging service. I regularly get emails from different people asking me for help on this topic, so I hope this is helpful to all such people :-).

The solution presented here can be used on the free WordPress blogging accounts available from WordPress.com and also on custom installs of WordPress. Please be aware, however, that there are much better ways to set up WordPress for Dhivehi posting if you have a custom install of WordPress or a paid WordPress account.

Requirements

You need to have the Dhivehi keyboard installed on your computer. If you already type Dhivehi, say using MS Word or OpenOffice, then you have this already. See this post by Fayid if you need help with getting the Dhivehi keyboard installed.

Steps

1. Log in to WordPress and click on the "Write" tab to start a new post.


2. In the post editor area, find and click the "HTML" tab to switch the view to HTML mode.


3. Copy and paste the following code into the editing area. It contains the bare minimum HTML/CSS needed to correctly display Dhivehi on all browsers supporting right-to-left text display in Unicode.
Test



4. Tinker around with the font family and font size settings if you know some CSS (or are feeling adventurous enough!). Faruma is probably the most decent Dhivehi Unicode font and is installed on most computers - hence is listed as the first preference. Different people like different font sizes for Dhivehi but 14px and 16px, I think, tend to be the easiest on the eye.

5. Click the "Visual" tab to switch the view back to the normal WYSIWYG mode.


6. Switch the keyboard to "Divehi" on the computer.


7. Select or delete the "Test" text in the post editor and start writing in Dhivehi.


That's all it takes!

Notes:
Unfortunately, there is no way to apply proper formatting to the post title with the free accounts on WordPress, so you will have to settle for the out-of-alignment title display.

Video guide:
I thought it'd be fun to make a video guide for this, and here is what materialized: View good quality | View crap quality (YouTube)

Good luck with the blogging!

Thaana date formatting for PHP 5

Here is a PHP 5 class that provides a drop-in function replacement/equivalent for the built-in PHP date() function to output formatted dates in Thaana/Dhivehi. It follows the standard method of writing Gregorian dates in Thaana by using transliterations of the English month names and using the native Dhivehi names for the week days. It accepts all the usual formatting arguments permitted by the original date() function thus allowing the same degree of formatting freedom as the original. The output returned from the function uses ASCII Thaana and, if needed, can then be converted to Unicode/UTF-8 by using the Thaana Conversions class. This class does not support Hijri dates (yet).

The class is being released under the Open Source MIT License.

Functions exposed

format()
Returns a Dhivehi date string formatted according to the given format string using the given integer timestamp

Usage

<?php
// Load class include
require 'thaana_date.obj.php';

// Format date
$thaanatoday = Thaana_Date::format('j M Y', time());
?>

Download

- Thaana_Date.zip (v0.2, 1.4KB)

Drop me a line if you have comments/queries. Enjoy :-)

Thaana conversions class for PHP 5 - v0.2

Here is an update to the Thaana conversions class I released in Nov 2007. This new version 0.2 release expands the varieties of conversions available and should be more than adequate for almost all uses. This version, most importantly, adds solid UTF-8 conversion functions allowing for more flexibility in PHP-based Unicode/UTF-8 Thaana handling. Further, the class is now licensed under the pretty liberal Open Source MIT License. The code still relies solely on core PHP 5 functions and does not demand any extra PHP extensions to be installed.

Functions exposed by the class

- convertUtf8ToUnicodeIntegers()
Convert UTF-8 data to Unicode character integer representations

- convertUtf8ToAscii()
Convert UTF-8 data to Ascii

- convertEntitiesToUnicodeIntegers()
Convert HTML Unicode entitied string to Unicode Integer characters array

- convertEntitiesToUtf8()
Convert HTML Unicode entities to UTF-8

- convertEntitiesToAscii()
Convert HTML Unicode entities to Dhivehi Ascii equivalents

- convertUnicodeIntegersToUtf8()
Convert Unicode Integer array to UTF-8

- convertUnicodeIntegersToEntities()
Convert Unicode char integers to HTML entities

- convertUnicodeIntegersToAscii()
Convert Unicode char integers to Ascii

- convertAsciiToUtf8()
Convert Ascii Thaana to UTF-8

- convertAsciiToUnicodeEntities()
Convert Ascii Thaana to Unicode HTML entities

- convertAsciiToUnicodeIntegers()
Convert Ascii Thaana to an array of Unicode integers

Usage

<?php
// Load class and initialize object
require 'thaana_conversions.obj.php';
$thaana = new Thaana_Conversions();

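// Input is a string of HTML numeric entities for the Thaana characters
echo $thaana->convertEntitiesToAscii('&#1931;&#1960;&#1928;&#1964;&#1920;&#1960;');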
echo $thaana->convertAsciiToUtf8('rWacje');
?>
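To give a rough idea of what an ASCII-to-Unicode conversion involves, here is a minimal JavaScript sketch. It is not the code inside the class: the function names are made up for illustration, and the table only covers the handful of ASCII Thaana characters that can be read off the examples on this blog ("rWacje" and "rwacyituncge"). A real converter needs the complete table for the font in question.

```javascript
// Partial ASCII Thaana -> Unicode table (assumption: limited to characters
// that appear in this blog's own examples; not the table used by the PHP class).
const ASCII_TO_THAANA = new Map([
  ['r', '\u0783'], // raa
  ['a', '\u0787'], // alifu
  ['y', '\u0794'], // yaa
  ['t', '\u078C'], // thaa
  ['n', '\u0782'], // noonu
  ['g', '\u078E'], // gaafu
  ['j', '\u0796'], // javiyani
  ['w', '\u07A6'], // abafili
  ['W', '\u07A7'], // aabaafili
  ['i', '\u07A8'], // ibifili
  ['u', '\u07AA'], // ubufili
  ['e', '\u07AC'], // ebefili
  ['c', '\u07B0'], // sukun
]);

// Hypothetical helper: map each ASCII Thaana character to its Unicode
// equivalent. Characters without an entry (spaces, digits) pass through.
function asciiToUnicode(text) {
  return Array.from(text, (ch) => ASCII_TO_THAANA.get(ch) ?? ch).join('');
}

// Hypothetical helper: turn every non-ASCII character into a decimal HTML entity.
function unicodeToEntities(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    return cp > 127 ? `&#${cp};` : ch;
  }).join('');
}

console.log(asciiToUnicode('rWacje'));
console.log(unicodeToEntities(asciiToUnicode('rWacje')));
```

Going the other way (Unicode to ASCII) is just the same table read in reverse, which is why a single mapping can back most of the functions listed above.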

Download:

- Thaana_Conversions.zip (v0.2, 3KB)

Drop me a line if you run into trouble with any of the functionality or have comments/queries. Enjoy :-)

Update (11-Sep-2008): This version is now superseded by the v0.3 release.

Javascript Unicode Keyboard Handler for Thaana

Here's something that is probably going to be very useful to Maldivian web developers working on Unicode-based Thaana web pages. It is a Javascript utility function that translates keystrokes into the appropriate Unicode Thaana characters. Hence, it makes it possible for HTML text input and textarea fields (and similar) to accept Thaana without requiring the user to switch the keyboard language on their computer. Such a feature contributes to a better user experience as the user can simply enter Dhivehi without the extra hassle. The code has been tested with no problems found on Firefox 1/2/3 and Internet Explorer 5/6/7.

If you would like a demo, I recommend you check out the text entry box at Radheef.com and see the HTML behind it. A few developers seem to have already adopted the code used at Radheef.com in their own work: haamadaily.com, sangudaily.com, jazeera.com.mv and haveeru.com.mv are using it as far as I know.

I originally wrote this around 2002 while experimenting with different methods of Thaana entry for the web. The version I'm releasing here, marked as version 2.0, is a modified version from 2006. It is being released under the MIT License.

- Download unicodehandler-2.0.js

Usage

1. Link the file in the HEAD section of the page:
<script type="text/javascript" src="/unicodehandler.js"></script>

2. Attach the handler to any text INPUT, TEXTAREA or editable DIV tag:
<textarea rows="1" onkeypress="return juk_HandleKeyPress(event);"></textarea>

3. Set any Unicode-compatible Dhivehi font to be used for the field using CSS.

4. That's it!
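
For the curious, the general idea behind such a handler can be sketched in a few lines. This is not the code in unicodehandler-2.0.js: the names `KEY_TO_THAANA`, `applyKey` and `handleKeyPress` are hypothetical, the table is a small illustrative subset that assumes the usual phonetic layout, and it relies on `event.key`, which only newer browsers provide.

```javascript
// Illustrative subset of a phonetic key -> Thaana table (an assumption,
// not the table shipped with the actual handler).
const KEY_TO_THAANA = new Map([
  ['h', '\u0780'], // haa
  ['n', '\u0782'], // noonu
  ['r', '\u0783'], // raa
  ['a', '\u07A6'], // abafili
  ['i', '\u07A8'], // ibifili
  ['q', '\u07B0'], // sukun
]);

// Pure helper: given the field value, the current selection and the typed
// key, return the new value and caret position.
function applyKey(value, selStart, selEnd, key) {
  const out = KEY_TO_THAANA.get(key) ?? key;
  return {
    value: value.slice(0, selStart) + out + value.slice(selEnd),
    caret: selStart + out.length,
  };
}

// Hypothetical handler, attached as onkeypress="return handleKeyPress(event);"
function handleKeyPress(event) {
  const key = event.key;
  // Leave shortcuts and unmapped keys to the browser.
  if (event.ctrlKey || event.metaKey || event.altKey) return true;
  if (!KEY_TO_THAANA.has(key)) return true;

  const field = event.target;
  const result = applyKey(field.value, field.selectionStart, field.selectionEnd, key);
  field.value = result.value;
  field.setSelectionRange(result.caret, result.caret);
  return false; // suppress the Latin character the key would have produced
}
```

Keeping the string manipulation in a pure function like `applyKey` makes the mapping easy to check outside a browser, while the event wiring stays thin.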

Drop a line here if you use it and/or have problems. Enjoy.

Update (16-Aug-2008): This version is now superseded by the new and improved v3.0.

Dhivehi Radheef application for Facebook

I am launching a new Maldivian Facebook application which I had planned out as part of a scheduled series of feature updates to Radheef.com. This new Facebook application displays a random word, and its associated meaning, from the Dhivehi Radheef on your Facebook profile. Words are automatically updated every day so as to keep your profile fresh and, perhaps, even educational.

Give it a go if it catches your fancy.

- View the application's About page on Facebook


Fingerprinting Thaana

What is the frequency of characters in typical Dhivehi writing? What is the most commonly used Thaana akuru/fili in Dhivehi? Is there a general pattern of akuru and fili to be expected in any given Dhivehi document?

These questions, and especially the last, kindled my curiosity yesterday and had me off exploring a little. Although seemingly trivial and of no practical use, these are serious questions that probe the finer details of Dhivehi and help produce computational models of the language, which do have practical applications. Even the generalizations and patterns that result from the simplest statistical analysis transcend the (quirks of) individual writing and give a broader picture of what a language is really like. For example, I'm employing a statistical fingerprint of Dhivehi that was generated during this little exercise as part of an experimental procedure that identifies (the presence of Dhivehi) content in web pages. It takes advantage of the fact that the fingerprint for Dhivehi and that for English are dramatically different, thus allowing a computer program to discern the type of content it is dealing with - all without really "understanding" a language.

I conducted the analysis on a dataset consisting of ~5000 Dhivehi articles from Haveeru Daily and ~7000 Dhivehi articles from Jazeera Daily. They may not represent the whole variety of Dhivehi literature available, but I think they are a very good approximation - especially of Dhivehi web content, which is what I was mostly interested in. I focused on the individual character level and ran basic mean, mode, variance, standard deviation and frequency calculations, followed by a character correlation analysis. Despite these being quite simple analyses, I don't think anyone has explored this much before, and hence the following should make for (exciting!) new information.
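
The frequency part of such an analysis is simple enough to sketch. The snippet below is only an illustration, not the code used for the charts: the function name `thaanaFingerprint` is hypothetical, it works on Unicode Thaana text, and it assumes akuru are the code points U+0780 to U+07A5 plus U+07B1, with fili (sukun included) being U+07A6 to U+07B0.

```javascript
// Count akuru and fili in a piece of Unicode Thaana text and return the
// relative frequency of each character. Non-Thaana characters are ignored.
function thaanaFingerprint(text) {
  const counts = new Map();
  let akuru = 0;
  let fili = 0;

  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if ((cp >= 0x0780 && cp <= 0x07A5) || cp === 0x07B1) {
      akuru += 1; // consonant letters (assumed range)
    } else if (cp >= 0x07A6 && cp <= 0x07B0) {
      fili += 1; // vowel signs, with sukun counted as a fili
    } else {
      continue; // spaces, digits, Latin text and so on
    }
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }

  const total = akuru + fili;
  const frequencies = new Map();
  for (const [ch, n] of counts) {
    frequencies.set(ch, n / total);
  }
  return { akuru, fili, total, frequencies };
}

// Example: the word used in the search discussion elsewhere on this blog.
const sample = '\u0783\u07A6\u0787\u07B0\u0794\u07A8\u078C\u07AA\u0782\u07B0\u078E\u07AC';
console.log(thaanaFingerprint(sample));
```

Run over a whole corpus, the `frequencies` map is the kind of fingerprint that can be compared against the character distribution of an unknown page.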

Enjoy :-)


Mean fili usage in Dhivehi writing


Mean akuru usage in Dhivehi writing


Thaana character frequencies

Towards a (true) Dhivehi search engine

As much as I would like the Dhivehi language to die and rot away, it seems it won't happen, at least for a while. The (relatively) newly minted freedom to publish newspapers and the growth of web-based news sites may have poised the language for a serious revival. The revival probably isn't so much in terms of improvements in the vocabulary or other more linguistics-related changes but rather in terms of the amount of information now being pumped out in Dhivehi - and in my opinion, that's a great start.

A (if not THE) point worth noting here is that much of this new information is being produced - and published - by digital means. Most government authorities now have web portals and an increasing number of them maintain them diligently. Most, if not all, newspapers and magazines also seem to maintain web portals and make their content available online. This modern revival thus presents a very interesting and thoroughly modern problem (to geeks like me at least :-P): accessing all this information. It is probably the first time in Maldivian history that a "Dhivehi search engine" makes practical sense.

Now, I am aware that Google and other search engines can be used to search for Dhivehi, and I'm also aware that there are a few local operations that purport/aspire to be Maldivian search engines, but they all share important shortcomings. These shortcomings are mostly inherent to the various methods of writing Thaana used on the World Wide Web.

Say you want to search for the word "rayyithunge". Typing that into a search engine would bring up an entirely different set of results from typing in "rwacyituncge" or "ރައްޔިތުންގެ" - both of which are alternative ways of representing the same word in Dhivehi. The different sets of results arise because of the different representation schemes used on different sites. A search for "rayyithunge" brings up pages that seem to mostly contain English, because "rayyithunge" is Dhivehi Latinised so that standard English characters can be used to write Dhivehi words. People commonly use such Latinised Dhivehi when writing emails or chatting - say "haalu kihineththa" etc. Meanwhile, a search for "rwacyituncge" brings up content from sites like Haveeru and Miadhu, which use standard ASCII coupled with custom Dhivehi fonts that map the ASCII characters to Thaana glyphs. If you try copy-pasting something written on the Haveeru page, you'd see that it comes out as a seemingly meaningless jumble of letters. Lastly, a search for "ރައްޔިތުންގެ" brings up results from sites like Minivan Daily and Sangu Daily, which use Unicode to display Dhivehi. Anyway, technical explanations aside, the point is that Dhivehi search is (currently) a messy enterprise.

The solution to this problem can (seem to) be pretty simple. A custom search interface could be made to simply take the search query from a user, convert it into the three different representation schemes and then spawn a search for each representation on any of the existing search engines. This would work just fine... until you run into peculiar problems related to the Latinised and ASCII Dhivehi schemes. Take for example the word "ފަލަ" Latinised into "fala" - a search on the word would return almost entirely non-Dhivehi results totally unrelated to what we really want. Similarly, a search on the ASCII'ed phrase "Oled" (which is the word "ދެލޯ") would return a large number of non-Dhivehi results with no bearing on what we wanted. These problems occur because Latinised and ASCII Dhivehi representations can result in text that has meaning in English as well - as in the case of "Oled" above, which happens to be a popular technical term in English.
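
The query-expansion idea described above could be sketched roughly as follows. This is only an illustration under loud assumptions: `expandQuery` is a hypothetical name, the Unicode-to-ASCII table covers just the characters needed for the "rayyithunge" example, and the Latinised variant is left out entirely because producing it needs a separate (and lossy) transliteration step.

```javascript
// Partial Unicode Thaana -> ASCII Thaana table (assumption: just enough
// characters for the example word; a real table would cover all of Thaana).
const THAANA_TO_ASCII = new Map([
  ['\u0783', 'r'], // raa
  ['\u0787', 'a'], // alifu
  ['\u0794', 'y'], // yaa
  ['\u078C', 't'], // thaa
  ['\u0782', 'n'], // noonu
  ['\u078E', 'g'], // gaafu
  ['\u07A6', 'w'], // abafili
  ['\u07A8', 'i'], // ibifili
  ['\u07AA', 'u'], // ubufili
  ['\u07AC', 'e'], // ebefili
  ['\u07B0', 'c'], // sukun
]);

// Hypothetical helper: given a Unicode Thaana query, return the list of
// query strings to send to an existing search engine, one per scheme.
function expandQuery(unicodeQuery) {
  const ascii = Array.from(
    unicodeQuery,
    (ch) => THAANA_TO_ASCII.get(ch) ?? ch
  ).join('');
  // A Set drops the duplicate when the query contains no mapped characters.
  return [...new Set([unicodeQuery, ascii])];
}

const query = '\u0783\u07A6\u0787\u07B0\u0794\u07A8\u078C\u07AA\u0782\u07B0\u078E\u07AC';
console.log(expandQuery(query));
```

Each string in the returned list would then be submitted as its own search, with the results merged afterwards. The sketch does nothing about the "fala" and "Oled" collisions; those need more than a character table to solve.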

A more sophisticated approach to the search problem could probably iron out (most of) these quirks. An ideal solution would be to do away with the existing search engines such as Google, despite their awesomeness, and develop a custom search engine. A custom engine would allow for the recognition of the various representation schemes in use and the subtle differences between them. Such an engine would perhaps standardize each search phrase and look it up in a standardized index to return results that better mirror the Dhivehi content that is out there. It could also bundle in extra Dhivehi-related facilities, such as conversions to make up for the lack of the (particular) fonts used on sites, and spelling correction, among others.

So, perhaps the question now is, is there a real need for a Dhivehi search engine yet? When should a Maldivian "Google" be born?