Size of the manuscript collections

The Archive comprises 1455 manuscript collections of language material, totalling more than 280,000 pages of transcribed oral dialectal speech. Collection began a century ago; however, some of these manuscripts were written during the second half of the 19th century, and the language they record is even older. The Archive is constantly being enriched and currently grows by approximately 10 collections a year.

Origin of the manuscripts

In order of importance, the manuscripts originate from:

  • the annual oral-speech transcription assignments carried out by the Center's researchers who have taken part in the Historical Dictionary project since 1908. These manuscripts contain scientifically documented language material and constitute the richest surviving record and heritage of the Modern Greek dialects, both in Greece and abroad.
  • donations by “The Language Society in Athens”, which organises competitions for the collection of dialectal material and subsequently donates the collected material to the Center.
  • the donation of 80 manuscripts by the “Greek Philological Association of Constantinople”. The Association donated 191 manuscripts in total; 80 of these have been copied and are included in the Archive of 1455 manuscripts used by ILNE. The remaining 111 manuscripts are dilapidated, so they are not among the manuscripts used by ILNE and are kept in a special archive for protection.
  • amateur collectors, mainly teachers or other intellectuals, who record in the local idiom the customs and traditions, toponyms, miscellaneous information, stories, fables, songs and, in general, the language monuments of their region. This material often does not meet the required standards of scientific rigour and reliability; however, it has been collected with love and selflessness.


Significance of the material rescued in the Manuscript Archive

  • The Archive comprises transcripts of oral speech from all areas within Greece and abroad where the Greek language was or is still spoken.
  • It covers the Modern Greek language over the last 170 years, in its standard form and, above all, in its dialectal varieties.

Current status and rescue projects of the Manuscript Archive

The oldest ILNE manuscript was written in 1854, and 103 manuscripts date from the 19th century. The fragility of the paper, the age of the manuscripts and their frequent use put the Archive at clear risk of destruction, which calls for its rescue and preservation. In the 1970s an effort was made to rescue the material by photographing it, and a film archive covering approximately 650 of the manuscripts was created. In recent years, rescue and preservation efforts have turned to new technologies. An effort is now under way to digitise the whole material gradually and to improve the condition of the originals. In addition, the manuscripts are being converted into electronic text files.


Content of the Card Archive

The Card Archive contains the recorded language material of the manuscript collections, together with dialectal material from other sources (magazines, glossaries, studies, literary texts, etc.). Every card records the form of the word (e.g. γ’ρούν’), the region where the form is used and the lemma to which it belongs (e.g. γουρούνι). The source is also cited (e.g. the number and page of the manuscript, or full bibliographic details in the case of a printed source). Cards usually also contain information on the etymology, meaning and use of the word, as well as typical examples of use.

The cards are filed alphabetically by lemma. The lemma γουρούνι, for instance, gathers all the recorded dialectal forms of the word (e.g. ’γ’ρούν’, γουρούν’, ’ουρούν’, γκουρούνι, etc.).

Size of the Card Archive

The Card Archive has been in operation since the foundation of the Center and comprises three parts:

  1. the part holding the material for the lemmas already printed in the Dictionary (α-δαχτυλωτός), approximately 325,000 cards.
  2. the Annex, which holds supplementary material for the printed lemmas, collected after their publication. It contains approximately 400,000 cards and is constantly being enriched.
  3. the main part, which covers the lemmas from “δε” onwards. It contains approximately 3,000,000 cards and continues to grow with new material.


Condition and digitisation of the Card Archive

The only way to rescue the Archive is to digitise the cards; however, scanning and documenting them is not an easy process. The main difficulties stem from the cards' heterogeneous sizes, the varying quality of the paper, the handwritten forms, the great diversity of handwriting styles and of ways of coding information, and, most importantly, the ravages of time. Tellingly, many of the older cards, particularly those written during the German occupation, were written on small scraps of paper, in newspaper margins, on ballot papers, on already printed paper, etc.



Since 1984, a digital toponymic database has been under development in the Archive of Toponyms and Proper Nouns, into which the toponyms appearing in the Center's language manuscript collections (approximately 200,000) are being integrated. This material has not been published and is perhaps the largest collection of its kind to date; it is being enriched with some 3,000 toponyms per year. So far, approximately 46,000 toponyms have been entered in the database. Future plans include adding toponyms that appear in written sources, in order to create a comprehensive toponymic corpus.



The Center preserves a range of magnetic media (such as reel-to-reel tapes and audio and video cassettes) containing recordings of dialectal speech, derived from the staff's annual assignments and from other sources. There are also some 78 rpm records from the 1930s with recordings of dialectal speech; these records have been damaged, however, and an effort is being made to rescue their content. In total, this audio material runs to hundreds of hours.