Quantified Grapho-Phonemic Systematicity in Korean Hangeul


Introduction
Hangeul, the Korean orthography, is renowned for the availability of information about its origins. It is the only orthography that a king himself designed for the illiterate among his people. Its 28 letters, named Hunmin Jeongeum (the Standard Sounds for the Instruction of the People), were created in 1443 and promulgated in 1446. Hangeul has been highly appreciated by linguists and others worldwide. It has been dubbed "the most scientific system of writing" (Reischauer & Fairbank, cited in Hyun, 1981) and "unquestionably one of the great intellectual achievements of humankind" (Sampson, 1985). The reasons for these commendations are: (i) it is orthographically shallow; (ii) it expresses fine phonemic distinctions; (iii) its letter shapes visualize articulation; and (iv) its letter shapes are consistent with the corresponding phonemes. Orthographic depth is defined by the extent to which the letter-sound association is transparent and predictable (Seymour et al., 2003). A shallow orthography facilitates reading (Martin et al., 2016; Paulesu et al., 2001; Spencer, 2001), and reading it is less effortful (Paulesu et al., 2000). Dyslexics experience fewer difficulties when reading a shallow orthography than when reading a deep one (Paulesu et al., 2001). Hangeul is a shallow orthography, along with Finnish, Italian, and Turkish.
Despite the unique compositionality of hangeul, there has been little or no attempt to quantify the relation between its letters and phonemes. Can this be done? Is letter-sound systematicity in hangeul indeed greater than that in other orthographies?

Procedure
Our grapho-phonemic analysis follows the principles of phono-semantic research (Dautriche et al., 2017; Monaghan et al., 2014; Tamariz, 2008). We measured all possible pairwise visual distances between letter shapes and all the corresponding pairwise phonological distances between phonemes. Then we measured the correlation between these two long lists of corresponding distances. Finally, to verify its statistical significance, we conducted a Monte-Carlo permutation test, as in the literature on sound-meaning systematicity.
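The procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names (`pairwise`, `pearson_r`, `permutation_test`), the number of permutations, the one-sided comparison, and the add-one correction to the p-value are all choices made here for illustration.

```python
import itertools
import math
import random


def pairwise(items, dist):
    """Distances for every unordered pair of items, in a fixed order."""
    return [dist(a, b) for a, b in itertools.combinations(items, 2)]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def permutation_test(shapes, sounds, shape_dist, sound_dist,
                     n_perm=1000, seed=0):
    """Correlate shape and sound distances, then shuffle the pairing.

    shapes[i] and sounds[i] describe the same letter. Shuffling the
    sounds breaks the letter-sound pairing while leaving both sets of
    distances unchanged, which gives a null distribution for r.
    """
    rng = random.Random(seed)
    shape_d = pairwise(shapes, shape_dist)
    observed = pearson_r(shape_d, pairwise(sounds, sound_dist))
    shuffled = list(sounds)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(shape_d, pairwise(shuffled, sound_dist)) >= observed:
            hits += 1
    # Add-one correction (an assumption of this sketch) so that the
    # estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling the items rather than the distance list keeps the dependency among pairwise distances intact, which is why this style of test is preferred over a standard significance test on r.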
We expected hangeul to return a robust, positive correlation between orthographical distances and phonological distances, considering the principles of its creation. A positive correlation indicates that similar letters tend to have similar sounds. We also expected that the level of systematicity in hangeul would be higher than in other less insightfully created orthographies.
The phonological distance between two phonemes was defined as the distance between their feature vectors (Monaghan et al., 2010). We used different distance measures to ensure robustness: feature edit distance, the number of features that differ between two vectors; Euclidean distance, the straight-line distance between two vectors; cosine distance, one minus the cosine of the angle between the two vectors; and Jaccard distance, one minus the number of shared features divided by the total number of features present in either vector. The first two were used by Monaghan et al. (2014).
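For concreteness, the four measures can be written out for binary feature vectors as follows. This is a sketch using the standard textbook definitions; the function names are ours, and the example vectors are arbitrary rather than taken from the feature tables of the study.

```python
import math


def feature_edit_distance(u, v):
    """Number of positions at which the two feature vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def euclidean_distance(u, v):
    """Straight-line distance between the two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine of the angle between the vectors.

    Undefined for an all-zero vector (division by zero).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def jaccard_distance(u, v):
    """One minus (features active in both) / (features active in either).

    Undefined when both vectors are all zeros.
    """
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 1.0 - both / either
```

For the arbitrary vectors `[1, 1, 0, 0]` and `[1, 0, 1, 0]`, the feature edit distance is 2, the Euclidean distance is about 1.414, the cosine distance is 0.5, and the Jaccard distance is about 0.667.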

Stroke Share Rate
Comparing salient sub-letter features to measure the visual difference between letters is not a new idea (Briggs & Hocevar, 1975; Geyer & DeWald, 1973; Watt, 1979). However, there is little or no research on hangeul letters from this perspective. We designed a novel method specifically for hangeul. First, we decomposed the letters into strokes and defined them topologically (Figure 1). We then re-defined each letter as a binary vector (Table 2). Thus, the distance between two letters now equals the distance between two 19-place vectors (12 places for the consonants, 7 for the vowels).
As with phonological distances, the orthographic distances were measured by four different metrics: feature edit distance, Euclidean distance, Cosine distance, and Jaccard distance.
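The encoding step can be illustrated with a toy example. Everything in the sketch below is hypothetical: the stroke labels and the letters "A", "B", and "C" are placeholders invented for illustration and do not reproduce the stroke inventory of Figure 1 or the vectors of Table 2.

```python
# Hypothetical stroke inventory (placeholder labels, not those of Figure 1).
STROKES = ["horizontal", "vertical", "corner", "circle", "diagonal"]

# Hypothetical letters, each described by the set of strokes it contains.
LETTER_STROKES = {
    "A": {"horizontal", "vertical"},
    "B": {"horizontal", "vertical", "corner"},
    "C": {"circle"},
}


def to_binary_vector(strokes, inventory=STROKES):
    """Return 1 where the letter contains the stroke and 0 elsewhere."""
    return [1 if s in strokes else 0 for s in inventory]


def differing_places(u, v):
    """Feature edit distance: how many places of the two vectors differ."""
    return sum(a != b for a, b in zip(u, v))


vectors = {name: to_binary_vector(s) for name, s in LETTER_STROKES.items()}
```

Here "A" and "B" differ in one place, whereas "A" and "C" share no strokes and differ in three places, so "A" is visually closer to "B" under this encoding.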

Hausdorff Distance
We present stroke share rate, above, as a point of comparison with the detailed quantitative measure of Hausdorff distance (Huttenlocher et al., 1993). Unlike the hangeul-specific stroke share rate, Hausdorff distance can be applied to any script system because it treats the letters as images. Each letter image is first converted into a black and white raster graphic. Given two sets of black pixels, $X = \{x_1, \dots, x_n\}$ and $Y = \{y_1, \dots, y_n\}$, the directed Hausdorff distance is calculated as follows:

$$d(X, Y) = \max_{x \in X} \min_{y \in Y} |x - y| \tag{1}$$

where $|x - y|$ is the Euclidean distance between two individual points. Because the directed distance is fundamentally asymmetric (in general, $d(X, Y) \neq d(Y, X)$), the larger of the two directed values is returned. Because Hausdorff distance treats letters as images, different fonts return different values. We examined 88 available Korean fonts. We used scipy.spatial.distance.directed_hausdorff (SciPy ver. 1.3.1) with Python 3.6.1 (Note 1).
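The calculation can be made concrete with a small pure-Python sketch. It is a brute-force illustration of the definition on tiny hand-made bitmaps, not the SciPy routine used in the study; the helper names and the bitmaps are assumptions made here.

```python
import math


def black_pixels(bitmap):
    """Collect (row, col) coordinates of the black (truthy) cells."""
    return [(r, c)
            for r, row in enumerate(bitmap)
            for c, value in enumerate(row) if value]


def directed_hausdorff(xs, ys):
    """Largest distance from any point in xs to its nearest point in ys."""
    return max(
        min(math.hypot(x[0] - y[0], x[1] - y[1]) for y in ys)
        for x in xs
    )


def hausdorff(xs, ys):
    """Symmetric Hausdorff distance: the larger of the two directed values."""
    return max(directed_hausdorff(xs, ys), directed_hausdorff(ys, xs))


# Two toy 3x3 "glyphs": a horizontal bar and a single dot.
bar = black_pixels([[1, 1, 1], [0, 0, 0], [0, 0, 0]])
dot = black_pixels([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
```

In this toy case the directed distance from the bar to the dot is 2.0 and from the dot to the bar is 0.0, so `hausdorff(bar, dot)` is 2.0, which shows the asymmetry. The SciPy function takes two coordinate arrays and returns a tuple whose first element is the directed distance, so the symmetric value is likewise obtained by calling it in both directions and taking the maximum.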

Results
We calculated the correlation between the two lists of corresponding visual and phonological distances using Pearson's r, to quantify grapho-phonemic systematicity. The results are presented separately below according to the orthographic distance measure. Table 3 shows the correlation coefficients (Pearson's r) between the orthographic distances and the phonological distances when stroke share rate was used to measure the orthographic distances.

Grapho-Phonemic Systematicity (Stroke Share Rate)
The generally positive correlation coefficients mean that similar letter shapes tend to have similar sounds, and vice versa, which quantitatively confirms the principle on which hangeul was created. The very low p-values indicate that these correlations are statistically significant.

Grapho-Phonemic Systematicity (Hausdorff Distance)
The orthographic distances were also measured by Hausdorff distance. Table 4 shows the grapho-phonemic systematicity from 88 Korean fonts. The majority displayed highly significant correlations between letters and sounds, although the correlation coefficients are not as high as those in Table 3. The results are robust across the phonological distance measures. We further investigated each letter's contribution to the overall systematicity by excluding each letter in turn and re-running the correlation test. Table 5 shows how the correlation changed relative to the overall value of .3 when individual letters were removed: a decrease means the removed letter supported the systematicity, and an increase means it worked against it. The consonants individually tend to contribute positively to the overall grapho-phonemic systematicity, whereas the vowels tend to hinder it. For example, without ㄸ and ㅃ, the correlation decreased to r = .27 (p < .001), whereas excluding ㅡ increased the coefficient to r = .4 (p < .001).
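The exclusion analysis can be sketched as a leave-one-out loop. This is an illustrative sketch only: the data layout (a dictionary mapping each letter to a shape and a sound representation) and the function names are assumptions made here, and the significance test is omitted for brevity.

```python
import itertools
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def systematicity(letters, shape_dist, sound_dist):
    """Correlation of shape and sound distances over all letter pairs.

    letters maps a letter name to a (shape, sound) tuple.
    """
    pairs = list(itertools.combinations(sorted(letters), 2))
    shape_d = [shape_dist(letters[a][0], letters[b][0]) for a, b in pairs]
    sound_d = [sound_dist(letters[a][1], letters[b][1]) for a, b in pairs]
    return pearson_r(shape_d, sound_d)


def leave_one_out(letters, shape_dist, sound_dist):
    """Change in r when each letter is excluded in turn.

    A negative change means removing the letter lowered r, i.e. the
    letter supported the overall systematicity; a positive change means
    the letter worked against it.
    """
    baseline = systematicity(letters, shape_dist, sound_dist)
    changes = {}
    for name in letters:
        rest = {k: v for k, v in letters.items() if k != name}
        changes[name] = systematicity(rest, shape_dist, sound_dist) - baseline
    return baseline, changes
```

Sorting the changes from most negative to most positive gives a ranking of letters by their contribution, which is the kind of ordering a table such as Table 5 reports.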

Discussion
Artificially designed with an explicit pedagogical aim, hangeul has a widely known intrinsic systematicity between letter shapes and their pronunciations. We quantified this systematic relation between Korean letter shapes and their sounds and defined it as grapho-phonemic systematicity. Predictably, stroke share rate returned the highest correlation values; strokes reflect higher-level, consciously appreciated structure. However, Hausdorff distance is more cross-linguistically applicable; although it returned slightly reduced correlation values, they remained robust, and the measure has the advantage of being able to reveal unappreciated contributions to systematicity. Hangeul, the result of deliberate cultural invention, is the gold standard of grapho-phonemic systematicity among scripts.
In recent findings, Chinese characters also showed a positive syllable-character systematicity (Du et al., 2022; Jee et al., 2022b). However, none of them showed higher systematicity than hangeul.
There are two ways to vary letter shape consistently at the level of the whole alphabetic system: (i) add or subtract letter elements; or (ii) change the orientation of an otherwise identical letter. In conventional orthographies (Hebrew, Burmese, Runic, and even cuneiform) the former is preferred. In some artificial scripts (e.g., the Shavian alphabet) the latter also occurs. There seem to be several reasons for this, all concerned with kinetic efficiency and least effort (Zipf, 1949/2016). First, adding a stroke or dot may be kinetically easier than changing orientation. Efficiency is particularly important for high frequency letters. Just as frequent phonemes have reduced distinctiveness (Gahl et al., 2012; Meylan & Griffiths, 2017; Shi et al., 1998), we assume that high frequency letters have simpler letter shapes. This selective pressure is realized, for example, in the diacritics used for Arabic and Hebrew vowels, and even in the omission of vowels in unpointed script.
Second, changing orientation affects the general direction of letter faces of an alphabet set (Watt, 1994). Facing direction is defined as the direction of ornaments and headings: for example, Arabic numerals mostly face left. Watt (1994) claims that we are sensitive to the particular asymmetry of the set of letters; children often reverse "b" until they understand that asymmetric Roman letters generally face rightwards.
Finally, letter shapes have implications for writing position and writing time. With pens and pencils, there exist opposing pressures between the three writing fingers and the two supporting fingers. Depending on writing direction, these two pressures alternate between active and passive roles (Watt, 1994). Therefore, it is plausible that cursive scripts optimize the balance between these kinetic forces for the sake of writing speed. Orientational change of letters may hinder writing speed by varying the starting points of letters.
Some fonts returned higher correlation coefficients than others, which indicates that they may emphasize the phonemic regularity. This may have pedagogical implications for beginning readers. We are currently investigating the behavioural consequences of grapho-phonemic systematicity at the letter level.

Conclusion
We have developed what we take to be the first method for quantifying in detail the systematicity between letter shapes and their corresponding sounds. The method is general enough to be applied to any phonographic orthography. It compares well with a hangeul-specific method based on shared strokes. The method allows us to begin studying the behavioural consequences of grapho-phonemic systematicity, with hangeul emerging as the writing system in which this systematicity is clearest: the gold standard of systematicity.