Tamilnet'97 - International Tamil Conference on the use of Tamil in Information Technology

Tamils - a Trans State Nation..

"To us all towns are one, all men our kin.
Life's good comes not from others' gift, nor ill
Man's pains and pains' relief are from within.
Thus have we seen in visions of the wise !."
- Tamil Poem in Purananuru, circa 500 B.C


The need for a new Standard arises from the use of computers for applications other than word processing. The major segment where computers could be used extensively is the Government sector, where Indian languages (Tamil) could be used as the medium in which databases are maintained for various applications. Linguistic research institutes are also interested in using computers for the linguistic study of the language. The existing ISCII Standard needs to be modified because it lacks proper support for sorting, indexing and character identification in analysis. The purpose of this paper is to point out the deficiencies in the draft ISCII Standard and to suggest suitable modifications. It focuses mainly on the codification of the basic characters of Tamil to overcome these problems. The Standardisation Committee set up by the Tamil Nadu Government may take up these issues with the DoE and persuade it to rectify the deficiencies. If these views are not presented properly and at the earliest opportunity, the Tamil language is likely to suffer setbacks and may be isolated from the other Indian languages.

INTRODUCTION

The scripts of all Indian languages are believed to have originated from the ancient Brahmi script. Although the characters of the Indian languages have varying shapes and forms, they share a common phonetic nature, which is the basis for ISCII. The Department of Official Languages and the Department of Electronics have been evolving Standards for character codes and keyboarding that could cater to all Indian languages.

Even though these Standards are meant for all Indian languages, there are some discrepancies in them that prevent Tamil from being accommodated in full.

The current approach of the DoE, which standardises only the character codes for Indian languages and leaves display rendering and keyboarding mechanisms to developers, seems to be the correct one.

Initially, Tamil was used merely as sound to convey feelings (unarchigal) and to communicate with one another. Later, a script (eluthu) came to be used to represent the sounds. The script underwent various changes at different stages (Kodugal, Pada eluthukkal, Vatteluthukkal, Chadhura eluthukkal) before reaching its present form. Today, we are discussing how best Tamil can be represented on computers. Again, it is the Tamil sounds that are being coded, not the script (eluthu).

The Tamil language has 30 basic characters (sounds): 12 vowels (uyir eluthukkal) and 18 consonants (mei eluthukkal).

Consonants : k, ng, c, nj, t, n, th, N, p, m, y, r, l, v, zh, L, R, nn
Composite consonants (uyirmei eluthukkal) are formed when a consonant and a vowel join together.

Note the order: the vowel comes after the consonant and combines with it to form a composite consonant. When a vowel follows a consonant, it always joins the consonant and is represented by an auxiliary sign called a vowel sign. In this way, the 12 vowels and 18 consonants combine to form 12 x 18 = 216 composite consonants.
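The combination rule can be illustrated with a short Python sketch. It uses Unicode code points for the Tamil letters purely for illustration (Unicode is not the ISCII draft discussed here), and the romanised vowel labels in the comments are an assumption of this example. Each composite consonant is stored as the consonant followed by a vowel sign, with the vowel 'a' inherent in the consonant and needing no sign.

```python
# Sketch: build the 18 x 12 = 216 composite consonants (uyirmei eluthukkal)
# from Unicode Tamil code points (illustration only, not ISCII codes).

# The 18 consonants in Tamil alphabetical order:
# k, ng, c, nj, t, n, th, N, p, m, y, r, l, v, zh, L, R, nn
CONSONANTS = [
    "\u0b95", "\u0b99", "\u0b9a", "\u0b9e", "\u0b9f", "\u0ba3",
    "\u0ba4", "\u0ba8", "\u0baa", "\u0bae", "\u0baf", "\u0bb0",
    "\u0bb2", "\u0bb5", "\u0bb4", "\u0bb3", "\u0bb1", "\u0ba9",
]

# Vowel signs for the 12 vowels a, aa, i, ii, u, uu, e, ee, ai, o, oo, au.
# The vowel 'a' is inherent in the consonant, so its "sign" is empty.
VOWEL_SIGNS = [
    "", "\u0bbe", "\u0bbf", "\u0bc0", "\u0bc1", "\u0bc2",
    "\u0bc6", "\u0bc7", "\u0bc8", "\u0bca", "\u0bcb", "\u0bcc",
]

def composite_consonants() -> list[str]:
    """Every consonant + vowel-sign pair, consonant by consonant."""
    return [c + v for c in CONSONANTS for v in VOWEL_SIGNS]

if __name__ == "__main__":
    table = composite_consonants()
    print(len(table))             # number of composite consonants
    print(" ".join(table[:12]))   # the 12 composite forms of k
```

The vowel always follows the consonant in storage, matching the order described above.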

The Tamil number system is not the same as the English number system. There is no separate symbol for zero, while numbers such as 10, 100 and 1000 have their own distinctive symbols. In practice, for example in Government calendars, these numbers are written in the same way as English numbers.
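To show why no zero symbol is needed, here is a minimal Python sketch that writes 1 to 9999 in traditional Tamil numerals. It assumes the common multiplicative-additive convention (a digit before the 10, 100 or 1000 symbol multiplies it, and a multiplier of 1 is omitted) and uses Unicode code points only for illustration; the function name `to_tamil_numeral` is hypothetical.

```python
# Sketch: traditional Tamil numerals for 1..9999, assuming the usual
# multiplicative-additive convention. Unicode code points for illustration.

DIGITS = [chr(0x0BE7 + i) for i in range(9)]                   # one .. nine
PLACES = [(1000, "\u0bf2"), (100, "\u0bf1"), (10, "\u0bf0")]   # 1000, 100, 10

def to_tamil_numeral(n: int) -> str:
    if not 1 <= n <= 9999:
        raise ValueError("this sketch handles 1..9999 only")
    parts = []
    for value, symbol in PLACES:
        count, n = divmod(n, value)
        if count > 1:
            parts.append(DIGITS[count - 1])   # multiplier before the place symbol
        if count:
            parts.append(symbol)              # an empty place is simply skipped
    if n:
        parts.append(DIGITS[n - 1])           # remaining units
    return "".join(parts)

print(to_tamil_numeral(1997))
print(to_tamil_numeral(205))
```

For 205 the empty tens place is just left out, so the result is "2, 100, 5" with no placeholder; this is why the traditional system never needed a zero.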

Sorting and indexing are basic necessities of any database management system. To illustrate their importance, consider releasing the list of candidates for an examination: the names must be listed in alphabetical order so that any name can be located easily. This type of sorting is used in many real-life applications.

Tamil words and names are sorted in Tamil alphabetical order. It is therefore important that the encoding does not alter this ordering.

For the past few years, the Department of Electronics (DoE) has been sponsoring various projects using ISCII-91. Following requests from different developers for a revision of ISCII-91, the DoE set up a committee in November 1996 to look into the problems faced with the present ISCII-91 and to recommend the necessary revisions. The draft of ISCII-97 is the outcome of this Committee's recommendations.

Observation : The proposed ISCII code is based on the ANSI Standard (note that the ASCII and ANSI character sets differ). Windows-based software follows ANSI, while DOS follows ASCII with extended characters (called graphics characters).
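The difference shows up directly in the upper half of the 8-bit table. The Python sketch below assumes that "ASCII with graphics characters" means DOS code page 437 and that Windows "ANSI" means code page 1252 (the usual Western settings). Both codecs ship with Python, and the helper name `interpretations` is hypothetical.

```python
# Sketch: one byte, two meanings. Code page 437 (DOS, with graphics
# characters) vs code page 1252 (Windows "ANSI", Western settings assumed).

def interpretations(byte: int) -> tuple[str, str]:
    raw = bytes([byte])
    return raw.decode("cp437"), raw.decode("cp1252")

for b in (0xB3, 0xC4, 0xC9):
    dos, win = interpretations(b)
    print(f"0x{b:02X}: DOS {dos!r}  Windows {win!r}")
```

On DOS these bytes draw lines and frame corners on the screen. Under Windows the same bytes are accented or superscript letters, so a single 8-bit layout cannot be correct on both.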

The phonetic nature of Indian languages : Because the Indian languages are phonetic in nature, a common alphabet code is possible. All vowels, consonants, graphic signs, punctuation marks, special symbols and extenders are coded, giving a single (common) encoding for all Indian languages.

ISCII coding : The revised Indian Standard Code for Information Interchange (ISCII) is a common encoding for all Indian languages. Table 1 shows this encoding alongside ASCII.

The ISCII code contains only the basic alphabet. All syllables are formed by combining these basic characters, and rendering the shape of each character is the job of the rendering software.
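The split between stored characters and rendered shapes can be seen in the syllable "ko". The Python sketch below uses Unicode (which follows the same principle) for illustration. The syllable is stored in phonetic order, consonant first and then the vowel sign. The vowel sign itself is a two-part sign that the rendering software draws partly to the left and partly to the right of the consonant.

```python
import unicodedata

# Sketch (Unicode, for illustration): storage order vs visual order.
KO = "\u0b95\u0bca"   # consonant k followed by the vowel sign o

def describe(text: str) -> list[str]:
    """Character names, in storage order."""
    return [unicodedata.name(ch) for ch in text]

print(describe(KO))
# The o sign canonically decomposes into the e sign (drawn left of the
# consonant) and the aa sign (drawn right of it). The renderer places them.
print(describe(unicodedata.normalize("NFD", KO)))
```

The stored sequence never changes with the glyph shapes, so sorting and searching can work on the codes while display is left to the renderer.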

ISCII provides for future expansion, such as the inclusion of a new language or of new vowels or consonants, through special codes (VMI and VML for vowels, CMI and CML for consonants). This makes it possible to encode newer Indian languages that may receive official recognition in future. With these special codes, ISCII can be extended to support 12 x 3 = 36 vowels and 34 x 3 = 102 consonants.

Study of the ISCII : A detailed study of the Standard shows how little care has been taken over Tamil. It suggests that Tamil is not considered a basic language: the Standard accommodates it according to the phonetic (Sanskrit) order instead of its own order. There are also some ordering problems in general, which affect all Indian languages.

Ordering of alphabets : The ordering of the basic characters itself has to be changed to allow proper sorting. The combinations of Anuswar and Visarg with a consonant should come after the combinations of that consonant with the vowels. In the Standard, however, Visarg and Anuswar precede the vowels, so a sorted list will have the combinations of Visarg and Anuswar with consonants at the beginning rather than at the end. The new order could be vowels, graphic signs, consonants and special symbols.
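The effect can be reproduced with Unicode Devanagari, whose layout largely follows ISCII and places Anuswar (U+0902) and Visarg (U+0903) before the vowel signs. The Python sketch below compares default code-point sorting with a hypothetical key, `proposed_key`, that moves only Anuswar and Visarg to just after the last vowel sign (AU, U+094C). This is one possible reading of the order proposed above, not a published collation.

```python
# Sketch: code-point order vs a key that places consonant + Anuswar/Visarg
# after the consonant + vowel combinations (hypothetical proposed order).

ANUSVARA, VISARGA = "\u0902", "\u0903"
LAST_VOWEL_SIGN = 0x094C   # DEVANAGARI VOWEL SIGN AU

def proposed_key(word: str) -> list[tuple[int, int]]:
    key = []
    for ch in word:
        if ch == ANUSVARA:
            key.append((LAST_VOWEL_SIGN, 1))   # just after the last vowel sign
        elif ch == VISARGA:
            key.append((LAST_VOWEL_SIGN, 2))
        else:
            key.append((ord(ch), 0))           # everything else: code-point order
    return key

# ka + Anuswar, ka + Visarg, ka + aa sign, ka + au sign
words = ["\u0915\u0902", "\u0915\u0903", "\u0915\u093e", "\u0915\u094c"]
print(sorted(words))                     # Anuswar/Visarg forms come first
print(sorted(words, key=proposed_key))   # vowel combinations come first
```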

How Tamil is affected : The coding for Tamil does not follow the Tamil alphabetical order, because Sanskrit has been treated as the primary language. The Committee has tried to code the Tamil sounds according to the Sanskrit sounds. Surprisingly, three Tamil characters are missing from the ISCII table: L, R and nn. These characters are supposed to be derived from other characters (l, r, n) using consonant extenders (special codes for consonant sounds whose pronunciation differs from Sanskrit). These characters are thus treated not as basic consonants but as representations of sounds borrowed from other languages, and are ordered according to the phonetic sounds of Sanskrit. This affects not only sorting but also the linguistic analysis of the language.

These characters have to be restored to their original positions to allow proper sorting and indexing.
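Unicode's Tamil block inherits its layout from ISCII and shows the same problem. It puts nn right after N, R right after r, and L and zh between l and v. Plain code-point sorting therefore does not give Tamil order. The Python sketch below compares that order with a hypothetical sort key, `tamil_key`, that ranks the 18 consonants in Tamil order. It handles native Tamil letters only. Grantha letters, the aytham and the question of where the pulli sorts are out of its scope. The sample words are ordinary Tamil words chosen to differ in their second letter.

```python
# Sketch: code-point order vs Tamil alphabetical order for consonants.

# Consonants in Tamil order: k ng c nj t n th N p m y r l v zh L R nn
TAMIL_CONSONANTS = (
    "\u0b95\u0b99\u0b9a\u0b9e\u0b9f\u0ba3\u0ba4\u0ba8\u0baa"
    "\u0bae\u0baf\u0bb0\u0bb2\u0bb5\u0bb4\u0bb3\u0bb1\u0ba9"
)
RANK = {ch: i for i, ch in enumerate(TAMIL_CONSONANTS)}

def tamil_key(word: str) -> list[tuple[int, int]]:
    """Vowels first (their code points are already in Tamil order), then the
    18 consonants in Tamil order, then everything else (vowel signs, pulli)."""
    key = []
    for ch in word:
        if "\u0b85" <= ch <= "\u0b94":
            key.append((0, ord(ch)))
        elif ch in RANK:
            key.append((1, RANK[ch]))
        else:
            key.append((2, ord(ch)))
    return key

# In Tamil order: palam (strength), pavaLam (coral), pazham (fruit),
# paLu (weight), paRavai (bird), panai (palmyra)
WORDS = [
    "\u0baa\u0bb2\u0bae\u0bcd", "\u0baa\u0bb5\u0bb3\u0bae\u0bcd",
    "\u0baa\u0bb4\u0bae\u0bcd", "\u0baa\u0bb3\u0bc1",
    "\u0baa\u0bb1\u0bb5\u0bc8", "\u0baa\u0ba9\u0bc8",
]
print(sorted(WORDS))                  # code-point order: panai first
print(sorted(WORDS, key=tamil_key))   # Tamil order restored
```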

Variants of varg consonants : The ISCII code does not provide Tamil codes for the variants of the primary varg consonants (k, c, t, th, p). Although Tamil writes each varg group with a single consonant, the phonetic variation is retained in speech. ISCII is phonetically based and is meant to be a common code for all Indian languages that facilitates transliteration. I therefore suggest adding codes for all the varg consonants that have been left out.
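The consequence for transliteration can be seen in Unicode. It lays out each Indic script in a parallel 128-code-point block modelled on ISCII, so a naive transliteration just shifts by the block offset. The Python sketch below does this from Devanagari to Tamil and reports the letters whose slot is empty in the Tamil block, such as kha, ga and gha. The function name `devanagari_to_tamil` is hypothetical, and real transliteration needs more than an offset.

```python
import unicodedata

# Sketch: offset-based Devanagari -> Tamil transliteration (Unicode blocks).
DEVANAGARI_BASE, TAMIL_BASE = 0x0900, 0x0B80

def devanagari_to_tamil(text: str) -> tuple[str, list[str]]:
    out, missing = [], []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_BASE <= cp < DEVANAGARI_BASE + 0x80:
            target = chr(cp - DEVANAGARI_BASE + TAMIL_BASE)
            if unicodedata.name(target, None) is None:   # slot unassigned in Tamil
                missing.append(unicodedata.name(ch))
                continue
            out.append(target)
        else:
            out.append(ch)                               # leave other text alone
    return "".join(out), missing

print(devanagari_to_tamil("\u0915\u092e\u0932"))        # ka ma la: all map
print(devanagari_to_tamil("\u0915\u0916\u0917\u0918"))  # ka kha ga gha
```

The second call keeps only ka and reports kha, ga and gha as having no Tamil target, which is the gap the suggestion above would close.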

Problems in transliteration : Transliteration is one of the good features of ISCII. However, transliteration is also affected by the ISCII code, for the reasons mentioned below:

Hence, it is not possible to achieve even a simple 'one to one' transliteration.

For a better understanding of ISCII, let me discuss it in more detail. ISCII is meant for adapting Indian languages to computers in general. Unfortunately, even for English the standards vary between platforms.

The popular and widely used IBM PC range of computers runs different operating systems, such as DOS and Windows 95 (Windows 3.x is not an operating system by itself; it relies on DOS). As DOS is still in active use and will remain so for some more years, ISCII also has to cater to DOS.

We cannot use the proposed ISCII on PCs running DOS, because the extended characters of ASCII and ANSI differ. An appropriate arrangement of ISCII for ASCII (with extended graphics characters) is therefore needed. As a software developer for Indian languages, I have observed a few things while implementing ISCII-91 in my software. Different programs, even on the same operating system, do not accept all the character codes, and even when they do, they may interpret them differently. Therefore, a study of a fairly large number of software packages will help us decide on the codes.

Multilingual systems : Indian language numerals are coded in place of English numerals; no separate code is defined for language numerals. Nowadays there is a need for multilingual, or at least trilingual, software in which two Indian languages coexist with English. Such coding can be implemented only on systems like Windows. It is not possible to provide multilingual facilities with different number systems on DOS and DOS-based application software.

Compatibility with DOS : Application software developed for DOS in text mode relies heavily on graphics characters for good-looking screen designs, and all leading software uses them. Therefore, we cannot have a common ISCII code for both ANSI and ASCII, and I suggest having a separate coding for ASCII as well.
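The size of the clash can be estimated with a short Python sketch. It assumes that ISCII characters occupy bytes 0xA1 to 0xFA, as in ISCII-91 (the draft ISCII-97 layout may differ). It counts how many of those bytes DOS code page 437 displays as box-drawing or block graphics, that is, characters in the Unicode Box Drawing and Block Elements ranges U+2500 to U+259F. The helper name `dos_graphics_bytes` is hypothetical.

```python
# Sketch: bytes in the assumed ISCII range (0xA1-0xFA, as in ISCII-91) that
# DOS code page 437 shows as box-drawing or block graphics.

def dos_graphics_bytes(lo: int = 0xA1, hi: int = 0xFA) -> list[int]:
    hits = []
    for b in range(lo, hi + 1):
        ch = bytes([b]).decode("cp437")
        if "\u2500" <= ch <= "\u259f":   # Box Drawing + Block Elements blocks
            hits.append(b)
    return hits

clashes = dos_graphics_bytes()
print(len(clashes), hex(clashes[0]), hex(clashes[-1]))
```

Every ISCII character placed on one of these bytes would also appear as a line, corner or shaded block in DOS screen designs. A separate ASCII-oriented arrangement would avoid such clashes.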

I request the Department of Tamil and Culture and the Tamil Nadu Government Standardisation Committee to look into this subject and forward the necessary changes to the DoE.

In the interest of Tamil and the Tamil people, I request the DoE to reconsider the draft and make the required changes. If the changes are not made, there is every possibility that the Standard will remain a Standard in name only, unused by Tamils. By accommodating these changes, ISCII will bring Tamil into the mainstream. If not, it may be construed that the Tamil language and its cherished richness have been denied just consideration.

Designation : Chief Executive Officer
Organisation : APPLESOFT
Address for Paper Mail : No.39, 1st Cross, 1st Main, Shivanagar, W.O.C. Road, BANGALORE - 560 010.
Telephone : 080 3357167 (Office) 080 3424765 (Residence)

Technology Developed : (1) A graphical interface that enables Indian languages to be used on DOS.

Software Developed : A series of software packages catering to various Indian language requirements, as listed below.