E-Processing of Historical Manuscript Collections: A Project of the French Research Organization CNRS

This paper focuses on a project of the CNRS (Centre National de la Recherche Scientifique, France) concerning a collection that is more than 60 years old. The collection, Sources Chrétiennes, assembles in 520 volumes various series of highly valuable texts: early Christian, Greek Byzantine, medieval Latin, etc. The content of the texts is theological, historical or even philosophical. It is the only bilingual collection among the various similar collections; that is, each manuscript is translated into French, whatever its original language. We can state, without reservation, that this project is unique, not only within French studies but also from a global perspective. Among the various innovative results of our work, we note in particular the structural analysis of the collection, the hierarchical permutation of the collection's content, the reordered representation of related information, etc. The information contained in this series is thus displayed in a way that demonstrates its great worth and its impact on French, European and international studies.


Introduction
Sources Chrétiennes is a multilingual French editorial project under the auspices of the CNRS (Centre National de la Recherche Scientifique) (Note 1). It consists of more than 500 volumes. The Sources Chrétiennes collection is one of the most important and respected collections worldwide for the critical edition of manuscripts of theological sources. Consequently, it presents in French milestone texts of Christian literature as well as of the human spirit (Mondesert, 1988). That is, all works are translated into French. The main languages of the original manuscripts are Greek, Latin, Armenian, Syriac, etc.
The collection ranks alongside other collections such as the Patrology of Migne (Migne, 1857-1866) and the Philokalia (Kalliakmanis, 2009) of St. Nikodimos of the Holy Mountain. It is being completed gradually, and its development has so far spanned 65 years (Table 1-a). The rate of progress of the series is shown in Fig. 1, which depicts the pace of the work per year. Each period has its own indicative average rate of progress. The overall average is about 10 volumes per year [straight line in Fig. 1].

Manuscripts and feature elements
Before enumerating particular features of the collection, it is necessary to set out the key elements of the manuscripts in the Sources Chrétiennes anthology. These are also the coordinates of the collection, namely Author Name, Text Title and Volume Number, which are the fundamental attributes of a database as well.
We now focus on the texts of the Sources Chrétiennes collection written in Greek.
It is therefore appropriate to develop a kind of statistical analysis that reveals the inherent structure of the entire collection. Initially, we should note that there are five fundamental areas of interest (The Sources Chrétiennes Collection, 1943-). The key feature of each region is briefly identified respectively as:
The analysis of the statistical distribution of the collection's volumes among the aforementioned periods is given in Tables 1-a and 1-b, for the first 500 volumes.
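As an illustration of how the three coordinates (author name, text title, volume number) can serve as database attributes, the following minimal Python sketch defines a record type and sorts a few entries by those coordinates. The class name `Entry`, its field names and the sample values are hypothetical placeholders, not data taken from the collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One record, identified by the three coordinates of the collection."""
    author_name: str
    text_title: str
    volume_number: int


# Hypothetical placeholder records (not actual entries of the collection).
entries = [
    Entry("Author B", "Title Y", 12),
    Entry("Author A", "Title X", 7),
    Entry("Author A", "Title Z", 3),
]


def coordinates(entry):
    """Return the (author, title, volume) triple used as a sort key."""
    return (entry.author_name, entry.text_title, entry.volume_number)


# Order the records by author, then by title, then by volume number.
ordered = sorted(entries, key=coordinates)
for entry in ordered:
    print(entry)
```

A frozen dataclass is used here so that each record is immutable and hashable, which makes it usable as a dictionary key or set member; this is a design choice of the sketch, not something prescribed by the project.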

Topology of the project
We now present the modular backbone of the collection, which amounts to a listing of the concise components of the project. We characterize as concise those elements which describe the complete work in the form of anchors or 'topographical signs'. Such elements (by volume) are given in the column 'Field' of Table 2.
Thus the fundamental elements of the project can form a database (DB) whose distinct fields are these elements. A scheme of this DB (Myridis, 2009) can be defined.
The DB is formed by instances of the Cartesian product of its fields. Thus the nine remaining fields (omitting the date of writing) form the graph (Gross & Yellen, 2004) shown in Fig. 2, where L stands for the length of strings (names, titles, etc.). For the fields 'author', 'area' and 'project category', an indirection (lookup table) between numbers and strings is used. The field value of 'project category' is infinite (∞), as we assume an infinite number of possible thematic subjects.
We identify the values of the fields in detail in Table 2. We also include a second column in Table 2, in which the specific values of the fields for the Sources Chrétiennes collection are given.
The 'project category' value for the Sources Chrétiennes case is 65, equal to the number of distinct thematic categories in the collection (Table 4).
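The two ideas used in this section, records as elements of the Cartesian product of the field ranges and a lookup table between numbers and strings, can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the tables `AUTHORS` and `AREAS`, the three-field layout and all values are hypothetical and much smaller than the actual nine-field scheme.

```python
from itertools import product

# Hypothetical lookup tables: integer codes stored in the DB -> display strings.
AUTHORS = {1: "Author A", 2: "Author B"}
AREAS = {1: "Area I", 2: "Area II"}
VOLUMES = range(1, 4)  # hypothetical volume numbers 1..3

# The domain of the scheme is the Cartesian product of the field ranges.
domain = set(product(AUTHORS, AREAS, VOLUMES))

# The DB itself is a set of instances (tuples) drawn from that domain.
db = [(1, 2, 3), (2, 1, 1)]


def decode(record):
    """Resolve the numeric codes of a record through the lookup tables."""
    author_code, area_code, volume = record
    return (AUTHORS[author_code], AREAS[area_code], volume)


# Every stored record must belong to the Cartesian product of the fields.
valid = all(record in domain for record in db)
decoded = [decode(record) for record in db]
print(len(domain), valid, decoded)
```

Storing integer codes instead of strings keeps the records compact and lets a name be corrected in one place, at the cost of one extra lookup when a record is displayed.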

E-Keyword
The work performed during this study leads to the construction of an electronic 'keyword' (e-Keyword) for the Sources Chrétiennes collection (Note 2). An excerpt from the printed version of this e-Keyword is shown in Table A [Appendix] (in alphabetical order by author). Such a (digital) table is the best resource for contemporary research on the statistical and structural organization of the series. This is particularly important: the content and functionality of the collection can be better evaluated, and the complete operation of this multi-annual critical editorial work can be understood in depth. We present, by way of indication, some of the main results that follow from the hierarchical classification in the e-Keyword.
It should be noted that both the analyses referred to hereinafter and those not listed (e.g. for reasons of space) are significantly accelerated and facilitated by the use of the e-Keyword and, more generally, of Information Technology (IT) resources.
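To indicate how an alphabetical, author-ranked index of this kind can be produced and queried with IT resources, here is a minimal Python sketch. It is not the actual implementation of the e-Keyword; the function name `build_index` and the sample rows are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical rows (author, title, volume number); not data from the collection.
rows = [
    ("Author B", "Title Y", 12),
    ("Author A", "Title X", 7),
    ("Author A", "Title Z", 3),
]


def build_index(rows):
    """Group rows by author; authors alphabetical, volumes ascending."""
    grouped = defaultdict(list)
    for author, title, volume in rows:
        grouped[author].append((volume, title))
    # Plain dicts preserve insertion order, so inserting authors in sorted
    # order yields an alphabetically ranked index.
    return {author: sorted(grouped[author]) for author in sorted(grouped)}


index = build_index(rows)
volumes_per_author = {author: len(items) for author, items in index.items()}

for author, items in index.items():
    print(author, items)
print(volumes_per_author)
```

Once the rows are grouped in this way, simple statistics such as the number of volumes per author follow directly from the index.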

Greek authors in Sources Chrétiennes
The table of the volumes published so far (1-519) indicates that there are 61 authors of Greek texts in this collection. We now give the final list of authors in Greek Literature (Table 3) (Note 3).

Analysis of Subjects
It is certainly difficult to define precisely all the various subjects dealt with by the Greek library of Sources Chrétiennes. However, in a general effort to analyse and determine the subject matter of Greek Literature, we find that there are at least three hundred (300) different subjects in the Greek texts of this collection.
The reader can find more on the e-processing of text collections in Myridis (2006).

Subjects & Conclusion
We have previously reported about 300 subjects in the Greek texts of the Sources Chrétiennes collection. Given the excellent facility provided by the e-Keyword, we construct a general thematic classification of the subjects with which the Greek manuscripts of Sources Chrétiennes deal. This classification is given in Table 4. We observe that about 65 different topics can be identified. Further analysis and the consequent formulation of new categories would certainly result in a larger table of thematic units.
Note 4. The symbol > is used to denote a very large number, while >> denotes a huge number.

Table A.
An excerpt of the printed version of the e-keyword of Sources Chrétiennes

Table 2.
Generalized ranges of values for the fields of Sources Chrétiennes database (Note 4)

Table 4.
Thematic catalogue of Greek Literature in Sources Chrétiennes