Home > Tamil Digital Renaissance > Tamilnet'97 > ISCII And Tamil - A Perspective - N. Anbarasan
ISCII And Tamil - A Perspective
N. Anbarasan
Abstract
The need for a new Standard arises from the use of computers for requirements beyond word processing. The major segment where computers could be used on a large scale is the Government sector, where Indian languages (here, Tamil) could serve as the medium in which databases are maintained for various applications. Linguistic research institutes are also interested in using computers for the linguistic study of the language. The existing ISCII Standard needs to be modified because it lacks proper support for sorting, indexing and character identification in analysis. The purpose of this paper is to point out the deficiencies in the draft ISCII Standard and to propose suitable modifications to it. It focuses mainly on the codification of the basic characters of Tamil to overcome these problems. The Standardisation Committee set up by the Tamil Nadu Government may take these issues to the DoE and persuade it to rectify the deficiencies. If these views are not presented properly and at the earliest possible opportunity, the Tamil language is likely to suffer setbacks and may be isolated from the other Indian languages.
INTRODUCTION
The scripts of all the Indian languages are believed to have originated from the ancient Brahmi script. Their characters vary in shape and form but share a common phonetic basis, which is the foundation of ISCII. The Department of Official Languages and the Department of Electronics (DoE) have been evolving Standards for character codes and keyboarding that could cater to all Indian languages.
Even though these Standards are meant for all Indian languages, there are some discrepancies in them, and they do not accommodate Tamil in full.
The current approach of the DoE, which is to standardise only the character codes for Indian languages and leave display rendering and keyboarding mechanisms to developers, seems to be the correct one.
TAMIL SCRIPT
Initially, Tamil existed only as sound, used to convey feelings (unarchigal) and to communicate with each other. Later, a script (eluthu) came to be used to represent the sounds. The script has undergone various changes at various stages (Kodugal, Pada eluthukkal, Vatteluthukkal, Chadhura eluthukkal) before reaching its present form. Today, we are discussing how best Tamil can be represented on the computer medium. Once again, it is the Tamil sounds that are being coded, not the script (eluthu).
Tamil Alphabets :
The Tamil language has 30 basic characters (sounds): 12 vowels (uyir eluthukkal) and 18 consonants (mei eluthukkal).
Vowels : a, A, i, I, u, U, e, E, ai, o, O, au
Consonants : k, ng, c, nj, t, n, th, N, p, m, y, r, l, v, zh, L, R, nn
Composite consonants (uyirmei eluthukkal) are formed when a vowel and a consonant join together.
For example : ka = k + a , ki = k + i etc.
Note the order: the vowel comes after the consonant and combines with it to form a composite consonant. When a vowel follows a consonant, it always joins the consonant and is represented by an auxiliary sign called a vowel sign. In this way, the 12 vowels and 18 consonants combine to form 12 x 18 = 216 composite consonants.
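The count of 216 can be checked with a short sketch. It uses only the romanised labels listed above; the function name `compose` and the plain string concatenation are illustrative assumptions, not part of any Standard.

```python
# Romanised labels exactly as listed in the text above.
VOWELS = ["a", "A", "i", "I", "u", "U", "e", "E", "ai", "o", "O", "au"]
CONSONANTS = ["k", "ng", "c", "nj", "t", "n", "th", "N", "p",
              "m", "y", "r", "l", "v", "zh", "L", "R", "nn"]


def compose(consonant: str, vowel: str) -> str:
    """Join a consonant and the vowel that follows it into one composite label."""
    return consonant + vowel


# Consonant-major order: all 12 forms of k, then all 12 forms of ng, and so on.
COMPOSITES = [compose(c, v) for c in CONSONANTS for v in VOWELS]

print(len(COMPOSITES))   # 216
print(COMPOSITES[:3])    # ['ka', 'kA', 'ki']
```

The consonant is the outer loop because the text describes the vowel as joining a consonant that is already there.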
Tamil script is taught in schools using this method only.
Number system :
The Tamil number system is not the same as the English number system. There is no separate symbol for the number 0. Numbers such as 10, 100 and 1000 have their own distinctive symbols. In practice, however, English numerals are used in their place, for example in Government calendars.
Sorting and Indexing :
Sorting and indexing are among the basic necessities of a database management system. Let me explain in detail to convey their importance. Suppose, for example, that we have to release the list of candidates for an examination. We list the names in alphabetical order, which makes it easy to locate a name. This type of sorting is used in real-life applications.
Tamil words and names are sorted using the Tamil alphabetical order. Therefore, it is important not to alter this ordering.
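To illustrate why the alphabetical order matters, the sketch below sorts a few romanised sample strings in two ways: by raw character code, and by the position of each basic character in the Tamil alphabet as listed earlier. The sample strings, the greedy two-character tokenizer and the names `tokenize` and `tamil_key` are assumptions made for illustration. The sketch ranks words purely by their sequence of basic characters and does not try to model dictionary conventions for pure consonants versus composite consonants.

```python
VOWELS = ["a", "A", "i", "I", "u", "U", "e", "E", "ai", "o", "O", "au"]
CONSONANTS = ["k", "ng", "c", "nj", "t", "n", "th", "N", "p",
              "m", "y", "r", "l", "v", "zh", "L", "R", "nn"]

# Rank of every basic character: vowels first, then consonants, in the listed order.
RANK = {label: i for i, label in enumerate(VOWELS + CONSONANTS)}


def tokenize(word: str) -> list[str]:
    """Split a romanised word into basic characters, preferring two-letter labels."""
    tokens, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            tokens.append(pair)
            i += 2
        elif word[i] in RANK:
            tokens.append(word[i])
            i += 1
        else:
            raise ValueError(f"unknown character {word[i]!r} in {word!r}")
    return tokens


def tamil_key(word: str) -> list[int]:
    """Sort key: the alphabet position of each basic character in the word."""
    return [RANK[t] for t in tokenize(word)]


names = ["vAni", "aruL", "kumAr", "Uma", "mani", "iLa"]  # hypothetical samples

print(sorted(names))                 # ['Uma', 'aruL', 'iLa', 'kumAr', 'mani', 'vAni']
print(sorted(names, key=tamil_key))  # ['aruL', 'iLa', 'Uma', 'kumAr', 'mani', 'vAni']
```

Sorting by raw code puts "Uma" first only because of where its first character happens to sit in the code table. The same effect appears whenever code positions do not follow the alphabet.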
ISCII STANDARD
A Brief History :
For the past few years the Department of Electronics (DoE) has been sponsoring various projects using ISCII-91. Based on requests from different developers for a revision of ISCII-91, the DoE set up a committee in November 1996 to look into the problems faced in using the present ISCII-91 and to recommend the necessary revisions. The draft copy of ISCII-97 is the outcome of the recommendations of this Committee.
Observation : The proposed ISCII code is based on the ANSI Standard (please note that there are differences between the ASCII and ANSI Standards). Windows-based software follows ANSI, while DOS follows ASCII with extended characters (called graphics characters).
The Phonetic nature of Indian languages : Based on the phonetic nature of the Indian languages, a common alphabet code is made possible. All vowels, consonants, graphic signs, punctuation marks, special symbols and extenders are coded. It provides a unique (common) encoding for all Indian languages.
ISCII Coding : The revised Indian Standard Code for Information Interchange (ISCII) is a common encoding for all Indian languages. Table 1 shows this encoding along with ASCII.
Table 1 ISCII-97 along with ASCII
  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
0 | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT | LF | VT | FF | CR | SO | SI |
1 | DLE | DC1 | DC2 | DC3 | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS | RS | US |
2 | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3 | 0/0 | 1/1 | 2/2 | 3/3 | 4/4 | 5/5 | 6/6 | 7/7 | 8/8 | 9/9 | : | ; | < | = | > | ? |
4 | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5 | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6 | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7 | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | DEL |
8 | | | | | | | | | | | | | | | | |
9 | | | | | | | | | | | | | | | | |
A | Om | VM1 | VM2 | Chandrabindu | Anuswar | Visarg | Ayudham | a | A | i | I | u | U | ru | Ru | |
B | lu | lRu | e | E | ai | o | O | au | CM1 | CM2 | ka | Ka | ga | Ga | nga | cha |
C | Cha | ja | Ja | nja | ta | Ta | da | Da | Na | tha | Tha | dha | Dha | na | pa | pha |
D | ba | bha | ma | ya | ra | la | La | va | Sha | sha | sa | ha | dot | avagrah | viram | halant |
E | addak | | | | | | | | | | | | | | | |
F | | | | | | | | | | | | | | | | |
The ISCII code contains only the basic alphabets. All syllables are formed through combinations of these basic characters. Rendering the shape of a character is the job of the rendering software.
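As a rough illustration of how Table 1 assigns codes, the sketch below rebuilds the Indian-language part of the table as a lookup from entry name to byte value, where the row gives the high hex digit and the column the low one. The names `ROWS`, `CODE` and `encode` are hypothetical. The two adjacent "I" entries in row A are read here as short i followed by long I, which is an assumption. How the draft combines a consonant with a following vowel is not spelled out in this paper, so the encoded sequence is only an example of looking up codes, not of a confirmed syllable format.

```python
# Entries of rows A to E of Table 1, in column order 0..F (None = unassigned).
ROWS = {
    0xA: ["Om", "VM1", "VM2", "Chandrabindu", "Anuswar", "Visarg", "Ayudham",
          "a", "A", "i", "I", "u", "U", "ru", "Ru", None],
    0xB: ["lu", "lRu", "e", "E", "ai", "o", "O", "au", "CM1", "CM2",
          "ka", "Ka", "ga", "Ga", "nga", "cha"],
    0xC: ["Cha", "ja", "Ja", "nja", "ta", "Ta", "da", "Da",
          "Na", "tha", "Tha", "dha", "Dha", "na", "pa", "pha"],
    0xD: ["ba", "bha", "ma", "ya", "ra", "la", "La", "va",
          "Sha", "sha", "sa", "ha", "dot", "avagrah", "viram", "halant"],
    0xE: ["addak"],
}

# Code value = row in the high nibble, column in the low nibble.
CODE = {
    name: (row << 4) | col
    for row, names in ROWS.items()
    for col, name in enumerate(names)
    if name is not None
}


def encode(names: list[str]) -> bytes:
    """Turn a sequence of Table 1 entry names into their byte codes."""
    return bytes(CODE[n] for n in names)


print(hex(CODE["a"]), hex(CODE["ka"]), hex(CODE["halant"]))  # 0xa7 0xba 0xdf
print(encode(["ka", "i"]).hex())                             # baa9
```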
ISCII has provision for future expansion, that is, for the inclusion of a new language or of new vowels or consonants, by providing special codes (VM1 and VM2 for vowels, CM1 and CM2 for consonants). This gives it the extensibility to encode newer Indian languages that may get official recognition in future. With these special codes, ISCII can be extended to support 12x3 = 36 vowels and 34x3 = 102 consonants.
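The factor of 3 in these figures presumably arises because each base character can be used on its own or together with either of the two special codes. That reading is an assumption, and the helper name `extended_capacity` is hypothetical.

```python
def extended_capacity(base_count: int, special_codes: int = 2) -> int:
    """Characters reachable when each base may stand alone or take one special code."""
    return base_count * (1 + special_codes)


print(extended_capacity(12))  # 36 vowels
print(extended_capacity(34))  # 102 consonants
```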
Study on the ISCII :
When we study the Standard in detail, we see how little care has been taken for Tamil. It leads one to think that Tamil is not considered a basic language; rather, the Standard tries to accommodate the language according to the phonetic order instead of its own order. There is also some disorder in the arrangement of alphabets in general, which affects all Indian languages.
Ordering of alphabets :
The ordering of the basic syllables itself has to be changed to allow proper sorting. Combinations of Anuswar and Visarg with consonants should always come after the vowel and consonant combinations. In the Standard, because Visarg and Anuswar are placed ahead of the vowels themselves, a sorted list will have the combinations of Visarg and Anuswar with consonants at the beginning rather than at the end. The new order could be vowels, graphic signs, consonants and special symbols.
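The effect can be seen with the code positions from Table 1 (Anuswar at A4, Visarg at A5, vowels from A7, ka at BA). The sketch compares a plain sort by code value with a sort that follows the suggested order of vowels, then graphic signs, then consonants. Treating A3 to A6 as the graphic signs, and storing a combination as the consonant code followed by the second code, are assumptions made for illustration; `group` and `proposed_key` are hypothetical names.

```python
KA, ANUSWAR, VISARG, A_LONG, I_SHORT = 0xBA, 0xA4, 0xA5, 0xA8, 0xA9

# Hypothetical two-byte combinations: consonant code followed by a second code.
sequences = {
    "ka+A": bytes([KA, A_LONG]),
    "ka+Anuswar": bytes([KA, ANUSWAR]),
    "ka+i": bytes([KA, I_SHORT]),
    "ka+Visarg": bytes([KA, VISARG]),
}


def group(code: int) -> int:
    """Class of a code in the suggested order: vowels, signs, consonants, rest."""
    if 0xA7 <= code <= 0xB7:
        return 0  # vowels
    if 0xA3 <= code <= 0xA6:
        return 1  # graphic signs (assumed range)
    if 0xBA <= code <= 0xDB:
        return 2  # consonants
    return 3      # everything else


def proposed_key(seq: bytes) -> list[tuple[int, int]]:
    """Compare by class first, then by code value within the class."""
    return [(group(b), b) for b in seq]


by_code = sorted(sequences, key=sequences.get)
by_proposed = sorted(sequences, key=lambda name: proposed_key(sequences[name]))

print(by_code)      # ['ka+Anuswar', 'ka+Visarg', 'ka+A', 'ka+i']
print(by_proposed)  # ['ka+A', 'ka+i', 'ka+Anuswar', 'ka+Visarg']
```

If the code positions themselves were rearranged as suggested, the plain sort by code value would already give the second list and no special key would be needed.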
How Tamil is affected : The coding for Tamil does not follow the Tamil alphabetical order. The coding is affected simply because Sanskrit has been treated as the primary language: the Committee has tried to code the Tamil sounds according to the Sanskrit sounds. If you examine the ISCII table, you will, surprisingly, not find three Tamil characters: L, R and nn. These characters are supposed to be derived from other characters (l, r, n) using consonant extenders (special codes for variations in consonant sounds that differ in pronunciation from Sanskrit). One can see that these characters were not considered basic consonants but representations of sounds borrowed from other languages, and hence were ordered according to the phonetic sounds of Sanskrit. This affects not only sorting but also linguistic analysis of the language.
These characters have to be restored to their original positions to allow proper sorting and indexing.
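A small sketch shows how a derived character ends up in the wrong place. The codes for pa, ma and va are taken from Table 1. The representation of nn as the code for na followed by CM1 is purely hypothetical, since the exact base and extender assignments are not given here.

```python
# Tamil consonant order as listed earlier in this paper.
TAMIL_ORDER = ["k", "ng", "c", "nj", "t", "n", "th", "N", "p",
               "m", "y", "r", "l", "v", "zh", "L", "R", "nn"]

ENCODED = {
    "p": bytes([0xCE]),         # pa in Table 1
    "m": bytes([0xD2]),         # ma in Table 1
    "v": bytes([0xD7]),         # va in Table 1
    "nn": bytes([0xCD, 0xB8]),  # hypothetical: na followed by extender CM1
}

by_code = sorted(ENCODED, key=ENCODED.get)
by_alphabet = sorted(ENCODED, key=TAMIL_ORDER.index)

print(by_code)      # ['nn', 'p', 'm', 'v']
print(by_alphabet)  # ['p', 'm', 'v', 'nn']
```

Because the derived character inherits the code of its base, it sorts next to that base instead of at the end of the alphabet where it belongs.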
Variants of varg consonants :
When we examine the ISCII code, we see that the variants of the primary varg consonants (k, c, t, th, p) are not coded for Tamil. In Tamil, even though all the varg consonants of a group are represented by a single consonant in written form, the phonetic variation is retained. Since ISCII is phonetic-based and is a common code for all Indian languages that facilitates transliteration, I suggest having codes for all the left-out varg consonants.
Problems in transliteration :
One of the good features of ISCII is transliteration. Transliteration is also affected by the ISCII code for the reasons mentioned below :-
Hence, it is not possible to achieve even a simple-minded 'one to one' transliteration.
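The varg consonant issue described earlier gives one concrete way to see this. In the sketch below, the four variants of a varg in Table 1 are mapped to the single Tamil consonant that stands for them in writing. The mapping covers only two groups and is my reading of the varg discussion, not a table taken from the Standard; `invert` and `is_one_to_one` are hypothetical helpers.

```python
# Table 1 varg variants mapped to the single written Tamil consonant (illustrative).
VARG_TO_TAMIL = {
    "ka": "k", "Ka": "k", "ga": "k", "Ga": "k",
    "pa": "p", "pha": "p", "ba": "p", "bha": "p",
}


def invert(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Collect, for every target, all the sources that map to it."""
    inverse: dict[str, list[str]] = {}
    for source, target in mapping.items():
        inverse.setdefault(target, []).append(source)
    return inverse


def is_one_to_one(mapping: dict[str, str]) -> bool:
    """True only if no two sources share the same target."""
    return all(len(sources) == 1 for sources in invert(mapping).values())


TAMIL_TO_VARG = invert(VARG_TO_TAMIL)

print(TAMIL_TO_VARG["k"])            # ['ka', 'Ka', 'ga', 'Ga']
print(is_one_to_one(VARG_TO_TAMIL))  # False
```

Going from another language into Tamil loses the distinction between the variants, and going back from Tamil leaves four candidates for each consonant unless the variants have codes of their own.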
General Discussion on ISCII
In the interest of a better understanding of ISCII, let me discuss it in more detail. ISCII is meant for the adaptation of Indian languages to computers in general. Unfortunately, even for English the standards vary between platforms.
The popular and widely used IBM PC range of computers runs different operating systems, such as DOS and Windows 95 (Win3.x is not an operating system by itself; it relies on DOS). As DOS is still a living system and will continue to be so for some more years, ISCII also has to cater to DOS.
We cannot use the proposed ISCII on PCs running DOS because of the difference between the extended characters of ASCII and ANSI. An appropriate arrangement of ISCII for ASCII (with extended graphics characters) is therefore needed. As a developer of software for Indian languages, I have observed a few things while implementing ISCII-91 in my software. Different software packages, even when running on the same operating system, do not accept all the character codes. Even where a code is accepted, its interpretation may differ. Therefore, a study of a fairly large number of software packages will help us in deciding the codes.
Multilingual System :
Indian language numerals are coded in the place of English numerals; no separate code is defined for language numerals. Nowadays we see a requirement for multilingual software, or at least trilingual software in which two Indian languages have to coexist with English. This coding could be implemented only on systems like Windows. It is not possible to have a multilingual facility and different number systems on systems like DOS and in DOS-based application software.
Compatibility with DOS :
Application software developed for DOS in text mode relies heavily on graphics characters for good-looking screen designs, and all leading software uses graphics characters. Therefore, we cannot have a common ISCII code for both ANSI and ASCII, and I would like to suggest that we have a different coding for ASCII as well.
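The clash can be made concrete by checking which of the codes used in Table 1 (A0 to E0) are graphics characters on DOS. The sketch assumes that the DOS extended character set is IBM code page 437 and that the Windows 'ANSI' set is code page 1252; this paper does not name specific code pages, so both are assumptions. `is_dos_graphic` is a hypothetical helper.

```python
ISCII_CODES = range(0xA0, 0xE1)  # Om (A0) through addak (E0) in Table 1


def is_dos_graphic(byte: int) -> bool:
    """True if the byte is a box-drawing or block character in code page 437."""
    ch = bytes([byte]).decode("cp437")
    return 0x2500 <= ord(ch) <= 0x259F  # Unicode Box Drawing and Block Elements


clashes = [b for b in ISCII_CODES if is_dos_graphic(b)]

print(len(clashes), hex(clashes[0]), hex(clashes[-1]))  # 48 0xb0 0xdf
print(bytes([0xBA]).decode("cp437"))   # the code of 'ka' is a double vertical line on DOS
print(bytes([0xBA]).decode("cp1252"))  # the same code is an ordinary symbol in the ANSI set
```

Under these assumptions, every code from B0 to DF, which covers most of the vowels and all of the consonants in Table 1, is already in use for screen drawing in DOS text-mode software.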
Conclusion
I request the Department of Tamil and Culture and the Tamil Nadu Government Standardising Committee to look into this subject and forward the necessary changes to the DoE.
In the interest of Tamil and the Tamil people, I request the DoE to reconsider the draft copy and make the required amendments. If changes are not effected in the Standard, there is every possibility that it will remain a Standard in name only, not used by Tamils. By accommodating these changes, ISCII will bring Tamil into the mainstream. If not, it may be construed that the Tamil language and its cherished richness have been denied just consideration.
Name : N. ANBARASAN
Designation : Chief Executive Officer
Organisation : APPLESOFT
Email address : [email protected]
Address for Paper Mail : No.39, 1st Cross, 1st Main, Shivanagar,
W.O.C. Road, BANGALORE - 560 010.
Telephone : 080 3357167 (Office)
080 3424765 (Residence)
Technology Developed: (1) A graphical interface for DOS to have Indian languages on DOS.
Software Developed : A series of software packages catering to the requirements of various Indian languages, as listed below.
SURABHI - 'Software only' solution for all Indian languages on DOS for text based software.
SIP - SURABHI Inscript Processor, a bilingual user friendly Inscript Word Processor having Wordstar compatible commands.
SUBASE VER 2.00 - Bilingual, general purpose database management software. It is software only solution.
SURABHI GEM - Bilingual and Multilingual software working on DOS, Ventura.
SURABHI SDK - Software Development Kit, to develop any bilingual Indian language software.
SURABHI UTILS - For day to day needs of the computer users.
SURABHI PRO - An interface software for Windows and Windows 95 to have all the facilities available.
AKSHARAM - Regional language learning Tutor.
JANANI - Interactive user friendly software to learn vernacular typing.