Thaana conversions class for PHP 5 - v0.4

Here is a major update to the Thaana Conversions class for PHP 5 that I have been maintaining. This new version adds two new functions, convertLatinToAscii() and its counterpart convertAsciiToLatin(), which transliterate text to and from the latinized/romanized Thaana form. That means you can pass in text like "miadhakee reethi dhuvahekeve" and have it converted to "މިއަދަކީ ރީތި ދުވަހެކެވެ".

This lets Thaana websites add interesting new features such as, say, offering to display a news article in latinized Thaana when the user does not have the required Thaana fonts installed or cannot install them!

Info

The Thaana Conversions class for PHP provides a number of useful functions for the conversion and transliteration of text between various Thaana representation formats.

Functions exposed

- convertUtf8ToUnicodeIntegers()
- convertUtf8ToAscii()
- convertUtf8ToEntities()
- convertEntitiesToUnicodeIntegers()
- convertEntitiesToUtf8()
- convertEntitiesToAscii()
- convertUnicodeIntegersToUtf8()
- convertUnicodeIntegersToEntities()
- convertUnicodeIntegersToAscii()
- convertAsciiToUtf8()
- convertAsciiToEntities()
- convertAsciiToUnicodeIntegers()
- convertLatinToAscii()
- convertAsciiToLatin()
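
To make the representation formats in the list above a little more concrete, here is a rough sketch of the same text as a Unicode string, as a list of Unicode integers and as HTML entities. It is written in JavaScript purely for illustration and is not the PHP class itself. The helper names are hypothetical, and it assumes that "entities" means decimal numeric character references, which the class may format differently.

```javascript
// Hypothetical helpers, for illustration only; not part of the PHP class.

// Text -> array of Unicode code points (the "Unicode integers" form).
function toUnicodeIntegers(text) {
  return Array.from(text, (ch) => ch.codePointAt(0));
}

// Code points -> decimal numeric character references
// (assumed here to be what the "entities" form looks like).
function toEntities(codePoints) {
  return codePoints.map((cp) => '&#' + cp + ';').join('');
}

// Decimal numeric character references -> text.
function fromEntities(entities) {
  return entities.replace(/&#(\d+);/g, (match, digits) =>
    String.fromCodePoint(Number(digits)));
}

const word = '\u0789\u07A8'; // meemu + ibifili, the "mi" of "miadhakee"
const integers = toUnicodeIntegers(word); // [1929, 1960]
const entities = toEntities(integers);    // "&#1929;&#1960;"
console.log(integers, entities, fromEntities(entities) === word);
```

The ASCII form is a different kind of representation: it maps each Thaana character to a single keyboard character, as in the 'miawdwkI' string used in the usage example further down.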

Requirements

PHP 5

License

This script is released under the Open Source MIT License, allowing its use in both personal and commercial applications as long as the copyright and license permission notice remains intact.

Usage

<?php
// Load the class
require 'thaana_conversions.obj.php';

// Initialize the Thaana object
$thaana = new Thaana_Conversions();

// Example: Converting latin to ascii
echo $thaana->convertLatinToAscii('miadhakee haadha reethi dhuvahekeve.');

// Example: Converting ascii to latin
echo $thaana->convertAsciiToLatin('miawdwkI hWdw rIti duvwhekeve.');
?>

Download

- Thaana_Conversions.zip (v0.4, 4.8KB)

Enjoy :-)

Javascript Thaana Keyboard version 4.2

Here's a minor update to my Javascript Thaana Keyboard library. This release, version 4.2, amounts to a single bugfix addressing an error in key translation for some keys when operating under the "Phonetic-HH" keyboard mode. Everything else remains as per the earlier v4 series releases.

Thanks go to Nattu for bringing the bug to my attention.

Changelog:

+ Fixed handling of keys when in Phonetic-HH keyboard mode

Usage:

Usage remains the same as before. Please refer to my detailed post on the 4.0 release.

Demo:

Check out the demonstration and testing page here.

Download:

- full source version (5.51 KB)
- packed version (2.46 KB) [recommended]

Update (13-Apr-2009): This version is now superseded by release v4.2.1.

Guide to using Thaana on the WWW - updated

I published an article last year, titled "Guide to using Thaana on the WWW", with the aim of presenting a quick overview of the various approaches to developing Thaana-based websites. It introduced 6 different methods and included enough implementation details to help a beginner get started. I've now rewritten parts of the article for clarity and added some examples to reinforce the usage instructions.

Click here to read the updated article.

Athuliyun: Thaana Handwriting Recognition demo

I dug up this old project from my backup disks today and worked a little magic to bring it back to life. This was and still is among my favourite experiments. I developed this software, named "Athuliyun", shortly after I bought my first PDA around 2005, with the goal of getting Thaana handwriting recognition on the platform. I didn't have much experience with software development for Windows CE (a.k.a. Windows Mobile) and so it ended up being a Windows application. The project got binned when my interests moved to Optical Character Recognition for document scanning...

As it stands now, Athuliyun supports the Thaana letters but not the fili (diacritics). This of course severely limits its practical use, but I reckon adding support for fili would be a relatively trivial task. I will be releasing this publicly, hopefully later this month, after adding that functionality and also retraining the recognition neural networks used in the software for improved performance.

Anyway, below is a short screencast of the application where you can see me scribble Thaana letters quite clumsily using the touchpad on my laptop - let's call it a software/technology preview ;-)



[An alternative lower-quality version can be found on Youtube]

Latin Thaana Converter 2.0

Latin Thaana Converter is a small, simple program for Microsoft Windows that transliterates latinized (i.e. romanized) Thaana back into the Thaana script. This is a tool I originally released in 2003 under the name "Latin Dhivehi Converter"/"Lat2Dhiv". This new release carries a new name (which I think is a more technically correct name for what it does) and sports a few aesthetic changes but is functionally almost exactly the same as the original. It is basically a recompile of my old code within the .NET framework.

Automated transliteration of Latin Thaana is not an entirely easy task. Lookup-table-based algorithms are simple to implement but are unable to correctly handle cases of sukun, present issues with most other fili and generally have a host of other problems as well. Latin Thaana Converter uses a finite state machine, and its transliteration mappings are based on a more extensive scheme extracted from an analysis of a body of Latin Thaana-to-Thaana sample data. It may be worth mentioning that the analysis revealed that up to 4 Latin characters were being used (and needed) for some Thaana transliterations. However, it must be said that the quality of the transliteration is limited by the accuracy and diversity of the sample data I used and hence is by no means perfect.
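
To give a feel for why a plain lookup table falls short, below is a toy sketch of a stateful transliterator in JavaScript. It is not the converter's actual state machine or mapping scheme. The table is a hypothetical, heavily reduced one that covers only the letters of the example sentence from the PHP class post, it targets the ASCII Thaana form rather than Thaana script, and it does not attempt sukun at all. The one piece of state it keeps is whether the previous token was a consonant, which decides whether a vowel needs an alifu carrier.

```javascript
// Hypothetical toy mapping from Latin Thaana to the ASCII Thaana form.
// Only the letters of the example sentence are covered.
const CONSONANTS = new Map([
  ['dh', 'd'], ['th', 't'], ['m', 'm'], ['r', 'r'],
  ['k', 'k'], ['h', 'h'], ['v', 'v'],
]);
const VOWELS = new Map([
  ['aa', 'W'], ['ee', 'I'], ['a', 'w'], ['i', 'i'], ['u', 'u'], ['e', 'e'],
]);
const ALIFU = 'a';   // carrier letter for a vowel with no preceding consonant
const MAX_TOKEN = 2; // longest Latin token in this toy table

function latinToAscii(latin) {
  let out = '';
  let afterConsonant = false; // the only state this sketch tracks
  let pos = 0;
  while (pos < latin.length) {
    let consumed = 0;
    // Greedy longest match: try the longest token first.
    for (let len = MAX_TOKEN; len >= 1 && consumed === 0; len--) {
      const token = latin.slice(pos, pos + len);
      if (token.length < len) continue; // ran past the end of the input
      if (CONSONANTS.has(token)) {
        out += CONSONANTS.get(token);
        afterConsonant = true;
        consumed = len;
      } else if (VOWELS.has(token)) {
        // A vowel that does not follow a consonant sits on an alifu.
        out += (afterConsonant ? '' : ALIFU) + VOWELS.get(token);
        afterConsonant = false;
        consumed = len;
      }
    }
    if (consumed === 0) {
      // Unknown character (space, punctuation): copy it and reset the state.
      out += latin[pos];
      afterConsonant = false;
      consumed = 1;
    }
    pos += consumed;
  }
  return out;
}

console.log(latinToAscii('miadhakee haadha reethi dhuvahekeve.'));
```

Even in this tiny example the second "a" of "miadhakee" cannot be mapped without knowing what came before it. A realistic scheme would need tokens of up to 4 characters and far more states.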

Since writing this program in 2003, I have experimented with probabilistic FSMs and also put machine learning techniques to the task with better results. I plan to write more extensively on Thaana transliteration algorithms at a later time...

Usage

1. Copy-paste or type the Latin Thaana text into the "Text in Latin Thaana" box.
2. Click "Convert".
3. The converted text appears in the "Text in Thaana" box.

Download

- Latin Thaana Converter 2.0 Installer (126KB, MS Windows)
- Latin Thaana Converter 2.0 Executable only (22.8KB, MS Windows)

Hope someone finds it useful :-)

Javascript Thaana Keyboard version 4.1

I released Javascript Thaana Keyboard v4.0 only 10 days ago but I've since been made aware of a few bugs in the script that had gone unnoticed during testing back then. I decided to cut another release to fix those bugs which, although minor, could potentially be annoying to end-users. This new release also crams in a few tweaks to improve performance.

Changelog:

+ Fixed handling of Delete key and other special keys
+ Added correct handling for Thaana brackets "()"
+ Improved performance

Usage:

Usage remains the same as before, so please refer to my post on the 4.0 release.

Demo:

Check out the demonstration and testing page here.

Download:

- full source version (10.6 KB)
- packed version (2.46 KB) [recommended]

Update (19-Dec-2008): This version is now superseded by release v4.2.

Javascript Thaana Keyboard version 4.0

Here is an update to my Javascript Thaana Keyboard (JTK). This 4.0 release packs in a bunch of new features, making JTK much more powerful and more flexible than any of the earlier releases.

Keyboard support:

Most notable in this new release is the introduction of support for the different types of Thaana keyboard in use. JTK now supports the following keyboard layouts:

Phonetic (Segha version): This keyboard is perhaps the most popular Thaana keyboard layout. JTK identifies this keyboard as 'phonetic'.

Phonetic (Hassan Hameed version): This keyboard is similar to the one above but notably differs in its mapping of alifu, abafili, aabaafili, gaafu and the sukun. JTK identifies this keyboard as 'phonetic-hh'.

Typewriter: This is the standardized Thaana layout used on typewriters. JTK identifies this keyboard as 'typewriter'.

Browser support:

JTK 4.0 adds support for IE 5.5, which still has a very significant market share. JTK should now work well on Microsoft Internet Explorer 5.5+, Mozilla Firefox 1+, Opera 9+, Apple Safari 2+ and Google Chrome 0.1+.

Basic usage:

The basic usage allows for fast and easy integration of JTK on your Thaana web pages.

1. Link the JTK script file in the HEAD section of the page.

2. For any element accepting input (i.e. INPUTs, TEXTAREAs, content-editable DIVs), assign it the special class name "thaanaKeyboardInput". JTK will automatically handle text entry to any element with that class name. You can assign further classes to the elements without ill effect, if needed.

3. There are two ways to set the keyboard used for an element.
defaultKeyboard method: This method allows setting a default keyboard to be used on all elements handled by JTK. To do this, set the defaultKeyboard property of the "thaanaKeyboard" object in a script in the HEAD section of your web page, making sure it is placed after the script link from step 1 above.

thaanaKeyboardState method: This method allows per-element control over the type of keyboard used by an element handled by JTK. To do this, add a form element (it can be a radio, checkbox, select, hidden or text field) with its name set to the id of the text entry element suffixed with the string "_thaanaKeyboardState". The value of this field should be one of 'off', 'phonetic', 'phonetic-hh' or 'typewriter', indicating the status and the keyboard in use.

So, if you had a text entry field with the id "fullname", then the keyboard could be specified using a hidden field named "fullname_thaanaKeyboardState" whose value is one of the keyboard names listed above.



4. Make sure that the text direction for the Thaana fields is set to "rtl". This can be easily achieved using CSS, by adding a class definition for the "thaanaKeyboardInput" class or by any other method of your choice. Adding the following to your CSS definition should suffice for most uses:
.thaanaKeyboardInput {
    font-family: faruma, 'mv iyyu nala', 'mv elaaf normal';
    direction: rtl;
}

If the above instructions are followed correctly, the JTK Thaana functionality will be in effect as soon as the page has loaded!
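
The naming rule for the thaanaKeyboardState field from step 3 can be sketched as a pair of small JavaScript helpers. These helpers are hypothetical and are not shipped with JTK. They only show how the field name is derived from the element id and which values are accepted, and they assume the id contains no characters that need HTML escaping.

```javascript
// Hypothetical helpers; JTK itself does not provide these.
const KEYBOARD_STATES = ['off', 'phonetic', 'phonetic-hh', 'typewriter'];

// Name of the form field that controls the keyboard for a given element id.
function stateFieldName(elementId) {
  return elementId + '_thaanaKeyboardState';
}

// Markup for a hidden state field; throws on an unknown keyboard name.
// Assumes elementId is a plain id that needs no HTML escaping.
function stateFieldHtml(elementId, keyboard) {
  if (!KEYBOARD_STATES.includes(keyboard)) {
    throw new Error('Unknown keyboard state: ' + keyboard);
  }
  return '<input type="hidden" name="' + stateFieldName(elementId) +
    '" value="' + keyboard + '">';
}

console.log(stateFieldHtml('fullname', 'phonetic-hh'));
// prints <input type="hidden" name="fullname_thaanaKeyboardState" value="phonetic-hh">
```

A select or a group of radio buttons with the same name would work the same way and would let the visitor switch layouts.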

Advanced Usage: The JTK object, methods and properties

For developers who want finer control over its behavior, JTK now makes itself available as a public object named "thaanaKeyboard".

The following properties and methods are exposed by the "thaanaKeyboard" object:

defaultKeyboard: [property] The Thaana keyboard layout type to default to when JTK-enabled elements do not have a keyboard specified.
Valid values are: 'off' to keep Thaana disabled, 'phonetic' to use the standard phonetic layout, 'phonetic-hh' to use the phonetic layout by Dr. Hassan Hameed and 'typewriter' to use the typewriter layout.

setHandlerById ( id, action ): [method] Sets the state of the JTK handler for a page element.
The id argument should be a string containing the id of any content-editable element. The action argument should specify either "enable" or "disable" depending on whether input handling for Thaana should be enabled or disabled, respectively.

setHandlerByClass ( class, action ): [method] Sets the state of the JTK handler for a set of page elements.
The class argument should be a string containing the class name of the content-editable elements (e.g. input, textarea). The action argument should specify either "enable" or "disable" depending on whether input handling for Thaana should be enabled or disabled, respectively.
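
As a rough sketch of how the property and the two methods above might be used together, the wrapper below sets a default layout and then switches handlers on and off. The function configureThaana, the id "comment" and the class name "latinOnly" are hypothetical. Only defaultKeyboard, setHandlerById and setHandlerByClass come from JTK, and the sketch assumes the JTK script has already loaded.

```javascript
// Hypothetical wrapper around the JTK object described above.
// On a real page, `jtk` would be the global `thaanaKeyboard` object.
function configureThaana(jtk, options) {
  const layouts = ['off', 'phonetic', 'phonetic-hh', 'typewriter'];
  if (!layouts.includes(options.defaultKeyboard)) {
    throw new Error('Unknown keyboard: ' + options.defaultKeyboard);
  }
  // Layout used by JTK-enabled elements that have no keyboard specified.
  jtk.defaultKeyboard = options.defaultKeyboard;
  // Turn Thaana input handling on for individual elements...
  for (const id of options.enableIds || []) {
    jtk.setHandlerById(id, 'enable');
  }
  // ...and off for whole groups of elements.
  for (const className of options.disableClasses || []) {
    jtk.setHandlerByClass(className, 'disable');
  }
}

// In a browser, after the JTK script has loaded:
if (typeof thaanaKeyboard !== 'undefined') {
  configureThaana(thaanaKeyboard, {
    defaultKeyboard: 'phonetic',
    enableIds: ['comment'],        // hypothetical element id
    disableClasses: ['latinOnly'], // hypothetical class name
  });
}
```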

License:

JTK 4.0 is released under the MIT License, allowing its use in both personal and commercial applications as long as the copyright and license permission notice remains intact.

Demo:

Check out the demonstration and testing page here.

Download:

- original full source version (10.0 KB)
- packed version (2.33 KB) [recommended]

As always, drop a line here if you use it and/or have problems or suggestions. Enjoy. :-)

Update (31-Oct-2008): This version is now superseded by release v4.1.