Thaana Unicode<->Ascii conversions PHP class

Here is something that would probably be very handy to Maldivian web developers dabbling with Dhivehi sites. This PHP class addresses the need for converting text to and from Thaana in Ascii and Thaana in Unicode.

The class makes it easy to standardize text into one format irrespective of how it was/is written. This means that you can take text written in Accent, MS Word 97 (and prior) or written using Unicode as featured on recent MS Word editions and use the class to present output in the format of your choice without the need for imposing restrictions on the people who write the text. The class comes in even more handy when you have a form submission that takes input in Unicode but needs to be stored in the database or presented later as Ascii, or vice versa.

The class was something I originally wrote around 2001 and was used in the free Online Document Converter that featured on maldivianunderground.net. I rewrote it for PHP 5 recently for use in a project I am working on. The original class had support for Latin Dhivehi -> Unicode/Ascii conversions as well, which I haven't included in this release but will add in a future update.

Usage should be pretty straightforward but here is an example just to illustrate:

Example:
<?php
require 'thaana_conversions.obj.php';
$thaana = new Thaana_Conversions();

echo $thaana->convertUnicodeToAscii('&#1931;&#1960;&#1928;&#1964;&#1920;&#1960;');
echo $thaana->convertAsciiToUnicode('rWacje');
?>
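
For the curious, the core idea behind a conversion like this is a character-for-character lookup. Here is a minimal sketch of that idea, written in JavaScript rather than PHP. The table is a small illustrative subset that I have inferred from the two example strings above (assuming they spell "dhivehi" and "raajje"); it is not the class's actual table, and the function names are made up for the illustration.

```javascript
// Assumed, illustrative subset of a legacy-Ascii-to-Unicode Thaana table.
// The real class covers the whole alphabet; these pairs are inferred only.
const ASCII_TO_UNICODE = new Map([
  ["h", "\u0780"], // haa
  ["r", "\u0783"], // raa
  ["a", "\u0787"], // alifu
  ["v", "\u0788"], // vaavu
  ["d", "\u078B"], // dhaalu
  ["j", "\u0796"], // javiyani
  ["W", "\u07A7"], // aabaafili
  ["i", "\u07A8"], // ibifili
  ["e", "\u07AC"], // ebefili
  ["c", "\u07B0"], // sukun
]);

// Build the reverse table by swapping each pair.
const UNICODE_TO_ASCII = new Map(
  Array.from(ASCII_TO_UNICODE, ([ascii, unicode]) => [unicode, ascii])
);

// Replace every character found in the table; leave everything else alone.
function convert(text, table) {
  return Array.from(text, (ch) => (table.has(ch) ? table.get(ch) : ch)).join("");
}

function asciiToUnicode(text) {
  return convert(text, ASCII_TO_UNICODE);
}

function unicodeToAscii(text) {
  return convert(text, UNICODE_TO_ASCII);
}
```

Characters that are not in the table (spaces, digits, punctuation) pass through unchanged, which is probably what you want for mixed text.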

Download:
- Thaana_Conversions.zip (v0.1, 2KB)

Enjoy :-)

Update (7-May-2008): This version is now superseded by v0.2.

Guide to using Thaana on the WWW

Developing Dhivehi web pages is pretty easy and there are quite a few methods to do it. However, information on how to go about it seems to be lacking, leaving newbies stumped. Here is a general overview on the various methods for displaying Thaana on the WWW and should contain enough information to help anyone, designer or programmer, get started.

1. CSS: rtl + bidi-override

This method is applicable only to non-Unicode text. It works on all modern browsers but requires the user to have at least one of the fonts specified in the page - otherwise the text would be displayed as a mostly meaningless jumble of English letters.

This is the least-effort route to getting any non-Unicode Thaana text (such as those written using MS Word 97/2000, Accent Express, MLS or Faseyha Thaana) on to the web. The websites of Haveeru and Miadhu currently take this approach.

Usage:
To use this method, apply the following CSS to any HTML elements that contain Thaana text. You may use inline style attributes or CSS class/ids to achieve this. You may change the font names to suit your needs but make sure you list several popular fonts and that the fonts specified are all non-Unicode fonts. You could, of course, also add further CSS styling (font size, font color, line height etc) but the following are the required minimum.
font-family: A_Ilham, A_Randhoo, A_Faruma, A_Waheed;
direction: rtl;
unicode-bidi: bidi-override;

Demo:
View example



2. Unicode Dhivehi

This method is applicable to text in Unicode. It works well on all modern browsers but requires the user to have at least one Unicode Thaana font. Unlike method (1), the system falls back to any Thaana font it does have if it cannot find the fonts named in the page.

This is the best method for any new and modern Thaana-based website. It is used in the online Radheef, Jazeera Daily and Haama Daily.

Usage:
To use this method, first add the following to the page's HTML HEAD section.


Next, apply the following CSS to any HTML elements that contain Thaana text. You may use inline style attributes or CSS class/ids to achieve this. You may change the font names to suit your needs but make sure that the fonts specified are all Unicode fonts. You could, of course, also add further CSS styling (font size, font color, line height etc) but the following are the required minimum.
font-family: Faruma, "MV Elaaf Normal";
direction: rtl;
text-align: right;

Demo:
View example



3. Image

This approach basically renders the Dhivehi text as an image. This is perhaps the most obvious method and was the only one available early on. However, it is still a pretty attractive solution, especially given that many computers just don't have the required fonts available. Using an image for the text removes the requirement for the client browser/computer to have the proper fonts available.

The basic approach of rendering the text into an image using Photoshop, MS Word etc is pretty tedious as the process is entirely manual. However, there is a more sophisticated approach that renders the Dhivehi text into an image on the fly on the web server side (perhaps coupled with caching to reduce load). A server-side scripting language such as PHP can be used to render text into an image using any font the designer/programmer chooses. The rendered images (typically PNGs) are very small and hence have a negligible effect on the page load time in most cases.

Refer to the imagettftext function for details on how to do it in PHP.



4. Flash

This method uses text loaded in Macromedia Flash with the required font(s) embedded in the Flash clip. ActionScript and/or Flash variables are used to load the text into text areas in the Flash file. This method has the advantage that it works whether or not the client computer/browser has a Dhivehi font available, but then again it does require the client to have Flash installed and enabled. If you are only looking for nice one-line headline sort of text in Dhivehi then you might consider using sIFR.

Refer to Font Embedding help page at Adobe LiveDocs for details on font embedding in Flash.



5. WEFT

Web Embedding Fonts Tool (WEFT) is an Internet Explorer-only solution offered by Microsoft. It involves using the Windows-only WEFT utility to create font "objects" that can then be placed on web pages. This method is not recommended unless your target audience uses only Internet Explorer.

Refer to Microsoft WEFT page for more information.



6. TrueDoc

TrueDoc is a solution offered by Bitstream Inc. It is similar to Microsoft's WEFT in that TrueDoc creates an embeddable font resource called a Portable Font Resource. Any font (i.e. a Dhivehi font) can be loaded once users install a custom font "viewer" (called the Character Shape Player by the company). This solution is NOT free and requires the purchase of special software from Bitstream to produce the custom embeddable font packages.

Refer to the TrueDoc site for more information.



Good luck ;-)

Update (24-Nov-2008): Methods 1 and 2 rewritten for clarity and demos added.

Accent2RTF Converter and MLS Converter

These are programs that I published on my (previous) digital playground at maldivianunderground.net. The site has been offline for ages, yet these two programs are something quite a lot of people ask me for. So here they are: the Accent2RTF Converter and MLS Converter.

Accent2RTF Converter
Accent and Accent Express have been used for creating Dhivehi documents for quite a while and remain in high use even now. Unfortunately, however, the Accent format is not supported in any of the Microsoft Word or OpenOffice versions and an independent converter is needed to convert Accent documents for use in these programs. This is where Accent2RTF comes into play: point it to an Accent (*.acc) file and it will spit out an RTF (*.rtf) file that can be opened and used in any text editing software that supports the RTF format (eg. MS Word, OpenOffice Writer). This is the original version as it was first released in 2001.
- Download Accent2RTF Converter 1.0 Installer (217 KB, MS Windows only)

MLS Converter 2
Multi Lingual Scholar is a text editing program that has been used for the creation of Dhivehi documents on computers since 1988, I believe. However, it requires MS DOS to run and the software was discontinued in 1998. The use of MLS has greatly decreased since the introduction of Dhivehi text entry using Unicode on Windows XP and later Microsoft operating systems. Despite this, there remain a large number of MLS format files that were created in its time. This converter takes in any MLS (*.mls) file and gives out an RTF file for use in Microsoft Word and all the other RTF-supporting text editing software. MLS Converter was first released in 2001 and updated to version 2 a year later.
- Download MLS Converter 2 Installer (221 KB, MS Windows only)

Enjoy!

Ajax flavoured Radheef released!

Oh you all know what "radheef" is right? (Psst. In case you didn't know, Radheef is the Dhivehi dictionary.)

I had (err unlawfully?) ripped off the data from the Radheef released by the National Centre for Linguistics and Historical Research when the software came out a couple of years ago. Those were the days when I was into the MaldivianUnderground project - and quite soon I had programmed an online radheef interface to do lookups. There have been various versions of the online radheef since then: one on MaldivianUnderground that relied on Dhivehi entry in Latin, another on Bichoo.net that sported a Flash front-end and yet another somewhere that used what I call "dynamic font rendering" to show the output in Dhivehi - which is neat as it shows up whether the computer has Dhivehi fonts installed or not. However, none of these radheef apps exist any longer, thanks to the disappearance of each of the projects that the radheef was released under, and so I decided to slap up yet another radheef!

The new radheef now resides at its own domain name at http://www.radheef.com/. A cool feature may be the ability to link to word definitions directly via special URLs like this.

The radheef will be kept alive this time, hopefully. Give it a spin. It will be useful if you work with Dhivehi and, like me, have a questionable command of Dhivehi vocabulary. Please note that it is at an "experimental" stage at the moment and might not work smoothly on all browsers/operating systems. I'd appreciate it if you let me know if that is the case - do mention the browser name/version and your operating system name/version.


Techie stuff:
This latest version of my online radheef uses AJAX technology - to suit the current ajax application craze. The new radheef also relies on Unicode Dhivehi and you should be able to enter and read the Dhivehi used on the radheef as long as you have a recent browser with Unicode support. Further options to enable you to use the radheef without having Dhivehi fonts installed will be made available later. The radheef does require that you have JavaScript enabled but that shouldn't be a problem for most. After all, almost all browsers these days come with JavaScript and unless you have turned it off manually, the radheef should work fine.

I should note that the Unicode text entry is a bit dodgy at the moment. The text entry relies on Unicode fonts coupled with a custom written keyboard handler (in JavaScript) to map the normal keycodes into Unicode. I shall be releasing the JavaScript keyboard handler script under GPL soon. The script is again something that I had written a couple of years ago but has now been rewritten to accommodate the browser advances and changes. I have tested the handler to work fine under IE 6, Firefox 1.5, K-Meleon 0.9.1, Safari and Opera 9.
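
Until the script itself is released, here is a rough sketch of the general idea of such a keyboard handler, not the actual code. It assumes the modern `event.key` property (browsers of the IE 6 era would need `keyCode`/`charCode` instead), and the key map is a small, assumed subset of a phonetic layout. The names `KEY_MAP`, `handleThaanaKey` and the `input.thaana` selector are all made up for the illustration.

```javascript
// Assumed subset of a phonetic key map: Latin key -> Thaana character.
const KEY_MAP = new Map([
  ["r", "\u0783"], // raa
  ["w", "\u0787"], // alifu
  ["j", "\u0796"], // javiyani
  ["a", "\u07A6"], // abafili
  ["A", "\u07A7"], // aabaafili
  ["e", "\u07AC"], // ebefili
  ["q", "\u07B0"], // sukun
]);

// Swap a mapped key press for its Thaana character at the caret position.
// Returns true if the key was handled, false if it was left to the browser.
function handleThaanaKey(event, field) {
  // Leave shortcuts such as Ctrl+C alone.
  if (event.ctrlKey || event.altKey || event.metaKey) return false;

  const thaana = KEY_MAP.get(event.key);
  if (thaana === undefined) return false;

  // Stop the Latin character from being typed, then insert our own.
  event.preventDefault();
  const start = field.selectionStart;
  const end = field.selectionEnd;
  field.value = field.value.slice(0, start) + thaana + field.value.slice(end);
  field.selectionStart = field.selectionEnd = start + thaana.length;
  return true;
}

// Browser wiring; skipped when the script runs outside a browser.
if (typeof document !== "undefined") {
  const field = document.querySelector("input.thaana"); // hypothetical selector
  if (field) {
    field.addEventListener("keydown", (event) => handleThaanaKey(event, field));
  }
}
```

Keeping the mapping in a table separate from the event handling means the layout can be swapped without touching the handler.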

Dhivehinnaai Portugeesun

I was rummaging through my backup disks when I stumbled across this PDF document that I produced in 2003. It is called "Dhivehinnaai Portugeesun" and is a digitized version of a similarly titled series of articles that was featured in the "Faiythoora" journal published by the National Center for Linguistic and Historical Research (NCLHR). It was authored by "Khaassa Musheer" Naseema Mohamed from the Center and details the interactions between the Portuguese and the Maldivians through the years 1479 to 1650 in Maldivian history. I thought I'd share it since this undoubtedly would be of much use to anyone looking for such material.

This series was contributed for distribution at the Book Fair 2003, organized by the NCLHR. It was part of a presentation I made at the fair and was used to demo how even old Dhivehi MLS documents may be converted to modern formats. Adobe Acrobat (PDF) format was chosen to show how Dhivehi can be used and displayed in this (mostly) universal format for distribution, as a step in embracing the digital revolution. I was hoping that this would encourage production of Dhivehi e-books and e-documents. The document was converted from the original MLS format to Rich Text format using a converter application that I had released on my then technology playground at bichoo.net. The Rich Text format file that was produced was then imported into Acrobat and the necessary pictures scanned in and inserted to prepare the final document. The font(s) used were embedded into the PDF document so that viewers do not need to have any special Dhivehi fonts installed.

Anyway, I hope someone finds this useful!

- Click here to download DhivehinaaiPortugesun1-3.zip (3.5 MB, Zip file).

JavaScript Dhivehi Character Recognition

Here is another of my pet projects brought back from the land of the deceased.

This one is called "JavaScript Dhivehi Character Recognition". It was created in early 2003 (or maybe late 2002) and made available on bichoo.net. Basically, it lets you draw a Thaana character using your mouse and then it "recognizes" what you have drawn. The purpose was mostly to satisfy my curiosity about artificial intelligence and pattern recognition at the time. However, it also hinted at a future where Dhivehi documents may be scanned in and processed by a computer to convert them to text, just as Optical Character Recognition technology has been doing for English documents. I think this rudimentary application was the first ever Dhivehi character recognition implementation released to the public. More interestingly, this seems to be the only character recognition implementation programmed in JavaScript floating around on the Internet even now. :-D

I spent a bit of time tonight reworking some bits of the code for clarity. The entire implementation is done using JavaScript and DHTML. You are welcome to study the code to see how it works. The code is well commented and may be a good starter into AI and pattern recognition basics. It uses a single layer single Perceptron model to really simplify things. However, it is a good enough practical implementation to work for characters drawn on a 10x10 grid. The grid makes up the input data to the neural network. The neural network is hard-coded into the page and has definitions for each character in the alphabet. I do hope you are surprised by the accuracy of the recognition of this little application.
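
To give a flavour of the technique, here is a minimal perceptron sketch over a 10x10 grid. It is not the code from the page: it assumes one perceptron unit per shape, trained with the plain perceptron learning rule, and it trains on two toy shapes (a vertical and a horizontal bar) instead of real Thaana characters. All function names here are made up for the illustration.

```javascript
const GRID = 10;
const SIZE = GRID * GRID;

// Turn a list of [row, col] marks into a flat 0/1 input vector.
function gridFromCells(cells) {
  const input = new Array(SIZE).fill(0);
  for (const [row, col] of cells) input[row * GRID + col] = 1;
  return input;
}

// Weighted sum of the inputs plus a bias (stored as the last weight).
function score(weights, input) {
  let sum = weights[SIZE];
  for (let i = 0; i < SIZE; i++) sum += weights[i] * input[i];
  return sum;
}

// Train one perceptron per label with the classic perceptron rule:
// nudge the weights towards the input whenever the unit answers wrongly.
function train(samples, epochs = 20) {
  const labels = [...new Set(samples.map((s) => s.label))];
  const net = {};
  for (const label of labels) net[label] = new Array(SIZE + 1).fill(0);

  for (let epoch = 0; epoch < epochs; epoch++) {
    for (const { label, input } of samples) {
      for (const name of labels) {
        const target = name === label ? 1 : 0;
        const output = score(net[name], input) > 0 ? 1 : 0;
        const error = target - output;
        if (error === 0) continue;
        for (let i = 0; i < SIZE; i++) net[name][i] += error * input[i];
        net[name][SIZE] += error;
      }
    }
  }
  return net;
}

// Pick the label whose perceptron gives the highest score.
function recognize(net, input) {
  let best = null;
  let bestScore = -Infinity;
  for (const [label, weights] of Object.entries(net)) {
    const s = score(weights, input);
    if (s > bestScore) {
      best = label;
      bestScore = s;
    }
  }
  return best;
}

// Toy training data: a vertical bar in column 4, a horizontal bar in row 4.
const rows = [...Array(GRID).keys()];
const samples = [
  { label: "vertical", input: gridFromCells(rows.map((r) => [r, 4])) },
  { label: "horizontal", input: gridFromCells(rows.map((c) => [4, c])) },
];
const net = train(samples);
```

Once trained, the weights could be written out and hard-coded into a page, so that only `score` and `recognize` need to run in the browser.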

Have a look at it HERE. Let me know if you find it amusing... or not.

My company - Technova Pvt Ltd - is currently working on bringing full-fledged Dhivehi OCR software to the Maldivian public. It will probably be made available in early 2006, as a service for customers requiring bulk OCR processing. We shall be releasing Windows, Linux and Mac versions of the software for home and business use around mid 2006.