Corpus
Text Types in the Corpus subsystem
The Corpus subsystem contains texts of the following types.
Type [T]. Poetic texts which are translations into Russian of (usually poetic) originals written in French, Italian, Spanish or Portuguese (other Romance languages are beyond the scope of the project at the first stage). By translations we mean, among other things, works that somehow reproduce a non-Russian-language original, yet do not necessarily meet todayʼs criteria of fidelity to the source (free translations, imitations, etc.).
We will be concerned with texts that have served as models for Type T (Russian-language) texts. Three types of such models are distinguished:
Type [O]. Non-Russian-language originals of Russian-language texts (Type T).
Type [I]. Intermediary translations, either poetic or prosaic (if the text was not translated directly from the original; cf. translations from Italian, etc. through French; a Russian translation may be an intermediary for another Russian translation).
Between [T] and [O] there can be more than one [I]. Intermediaries can enter the [O]–[I1 ... In]–[T] chain either “concurrently” (the translator used several intermediaries at once) or “serially” (an intermediary translation is itself based on an earlier intermediary translation).
Type [S]. “Proto-sources”, i.e. non-Romance sources of Romance originals (if the latter are, in their turn, imitations or translations). These could be Ancient Greek or Roman texts (e.g. many of La Fontaineʼs fables are adaptations of Aesopʼs fables).
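The dependency chain described above can be pictured as a small directed graph in which every text points to the texts it is modelled on. The following Python sketch is only an illustration, not the project's implementation: the names `Work`, `models` and `chains` are hypothetical, and the sample works are placeholders rather than real corpus entries.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    """One text in the corpus (hypothetical structure)."""
    title: str
    kind: str  # "T", "O", "I" or "S"
    models: list["Work"] = field(default_factory=list)  # texts it is based on


def chains(work: Work) -> list[list[str]]:
    """Return every path from `work` back to a text with no recorded model."""
    if not work.models:
        return [[work.kind]]
    paths = []
    for model in work.models:
        for tail in chains(model):
            paths.append([work.kind] + tail)
    return paths


# Placeholder data: the translation uses two intermediaries at once
# ("concurrently"); one of them is itself based on an earlier
# intermediary ("serially").
source = Work("proto-source", "S")
original = Work("original", "O", [source])
i1 = Work("intermediary 1", "I", [original])
i2 = Work("intermediary 2", "I", [i1])
i3 = Work("intermediary 3", "I", [original])
translation = Work("translation", "T", [i2, i3])

for path in chains(translation):
    print(" <- ".join(path))
```

Storing only the immediate models of each text keeps both the concurrent and the serial case in one structure; the full chains are derived on demand.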
At the subsequent stages of the project, in addition to translations, i.e. works that reproduce the content of a non-Russian model and (in many cases, but not always) its form, we will be interested in works reproducing only the form of a non-Russian model:
Type [X]. “Non-translated” poetic works in Russian, having a stanzaic form of Romance origin:
- distinctively Romance stanzas (tercet, ottava rima, Gilbertʼs stanza, Ronsardʼs stanza...);
- distinctively Romance “fixed forms” (sonnet, rondeau, triolet ...).
While the models for the “translated” work created under Romance influence are constituted by the original, the intermediary translation(s) and the proto-source, the model for the “non-translated” work created under Romance influence is an entire class of metrical and stanzaic models, represented by a particular stanzaic scheme and the scheme of a particular poetic meter.
Languages
The language of a [T] type text may only be Russian.
The language of an [O] type text may be French, Italian, Spanish or Portuguese.
The language of an [I] type text may be any language (French, English, German, Polish, Russian...).
The language of an [S] type text may be any language (Ancient Greek, Latin, Hebrew, Arabic...).
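The four language rules above amount to a simple validity check. A minimal sketch in Python follows; the function name `language_allowed` and the plain-string language names are assumptions made for illustration only.

```python
# Languages admitted for originals at the first stage of the project.
ROMANCE_LANGUAGES = {"French", "Italian", "Spanish", "Portuguese"}


def language_allowed(text_type: str, language: str) -> bool:
    """Check a (type, language) pair against the rules listed above."""
    if text_type == "T":
        return language == "Russian"
    if text_type == "O":
        return language in ROMANCE_LANGUAGES
    if text_type in ("I", "S"):
        # Intermediaries and proto-sources may be in any language.
        return True
    raise ValueError(f"unknown text type: {text_type!r}")


print(language_allowed("T", "Russian"))   # a Russian translation
print(language_allowed("O", "German"))    # not an admitted original language
print(language_allowed("I", "Russian"))   # a Russian intermediary
```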
Poetry and prose
Our study is concerned with linguo-poetic forms, viz. Russian poems and their sources. Why do we need prosaic translations? Without them, we would not be able to establish dependency chains. Oftentimes, intermediary translations are prosaic (this is the norm for nineteenth- and twentieth-century French translations); moreover, sometimes prosaic translations into Russian can serve as intermediary translations themselves. In addition, there are historically and culturally important instances of prose translated into verse, e.g. Trediakovskyʼs “Telemakhida” (a translation of Fénelonʼs prose novel “Les Aventures de Télémaque”), Batiushkovʼs “Istochnik” (a translation of Parnyʼs poème en prose titled “Le Torrent”). Thus, some originals are in fact prosaic. Finally, some cultural epochs feature mixed speech forms (e.g., the eighteenth century in French literature): a prosaic narrative with poetic insertions (the converse is possible as well: a poetic narrative with prosaic insertions).
Metrical and Stanzaic Forms
Metrics describe the properties of the poetic verse (verse meters) and are constituted by two parameters: “Meter” and “Line Length”. Stanza refers to the properties of verse sequences (i.e. properties of stanzas and fixed forms) and is constituted by three parameters: “Stanza or fixed form”, “Clausula” (the sequence of line endings in a stanza or a stanza-like pattern) and “Rhyme” (the rhyme sequence in a stanza or a stanza-like pattern). Other aspects of catalexis and rhyming do not apply to the stanza.
The metrical and stanzaic structure of the work is described in the Corpus subsystem using distinctive features of poetic meters (an approach similar to that implemented in the Poetic Subcorpus of the National Corpus of the Russian Language, see http://ruscorpora.ru/mycorpora-poetic.html). The following features are taken into account:
- 1. Meter. The nomenclature:
  - (i) Syllabic-accentual verse:
    - trochee
    - iamb
    - dactyl
    - amphibrach
    - anapaest
    - ternary meter with variable anacrusis
  - (ii) Accentual verse and verse with variable inter-ictus intervals:
    - dolnik
    - taktovik
    - accentual verse
  - (iii) Logaoedic verse:
    - logaoed
    - hexameter
  - (iv) Other versification systems:
    - vers libre
    - syllabic verse
    - quantitative verse
Explanations on paragraph 1 (on the nomenclature of meters)
Both logaoedic lines and logaoedic stanzas are referred to as “logaoeds”.
The meter of all types of tonic verse is defined as “accentual verse”.
The meter of all types of syllabic verse is defined as “syllabic verse”.
Unrhymed unmetrical verse is defined as “vers libre”.
The meter of quantitative verse (e.g., of ancient proto-sources) is defined as “quantitative verse” and will not be specified any further at the first stage of the project.
- 2. Line length in the “counting units” of the particular versification system (feet, ictuses, syllables). We distinguish monometric texts (with constant line length), regular heterometric texts (with regular alternation of lines of different lengths: formulas “4+3+4+3”, “4+4+4+2”, etc.), and free heterometric texts (with irregular alternation of lines of different lengths: formulas “4/5/6”, “1/2/3/4/5/6”, etc.; the frequency of lines of various lengths is not taken into account).
A note to paragraph 2 (on mono- and heterometricity)
Both monometric and heterometric texts are non-polymetric. For polymetric texts see below, paragraph 6.
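The three cases of paragraph 2 can be told apart mechanically from the line-length formula. The sketch below is a hypothetical helper, not part of the Corpus subsystem; it assumes that “+” separates the lengths of a regular heterometric pattern, that “/” separates the lengths occurring in a free heterometric text, and that a formula with a single distinct length describes a monometric text.

```python
def classify_line_length(formula: str) -> str:
    """Classify a formula such as "4", "4+3+4+3" or "4/5/6"."""
    if "/" in formula:
        lengths = [int(part) for part in formula.split("/")]
        kind = "free heterometric"
    else:
        lengths = [int(part) for part in formula.split("+")]
        kind = "regular heterometric"
    # Only one distinct length means the line length is constant.
    if len(set(lengths)) == 1:
        return "monometric"
    return kind


for formula in ("4", "4+3+4+3", "4+4+4+2", "4/5/6"):
    print(formula, "->", classify_line_length(formula))
```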
A note to paragraph 2 (on line length)
The length of syllabic-accentual verse lines is measured in feet (the clausula does not count; in the case of a ternary meter with variable anacrusis, neither the clausula nor the anacrusis counts). Line length for dolnik, taktovik and accentual verse is measured in metrically stressed syllables, or ictuses (stress in the clausula or anacrusis does not count, nor does extrametrical stress; unstressed ictuses do count).
When computing the length of a syllabic verse line, the syllables of the clausula and expanded caesura do not count. Therefore, some traditional national designations of syllabic meters require a “syllable recount”. Thus, while the “length” of French decasyllable verse with the formula “10+(0/1)” is considered equal to 10 syllables, the length of Italian hendecasyllable verse with the formula “10+1” (in low genres: “10+(0/1/2)”) is also considered equal to 10 syllables. Similarly, the length of French alexandrine (dodecasyllable verse) with the formula “6+6+(0/1)” is considered equal to 12 syllables. But the length of Spanish alexandrine with the formula “6+(0/1)+6+1” is also considered equal to 12 syllables, although its traditional representation in the case of the verse with a female caesura and (obligatory) female clausula looks like “7+7” (tetradecasyllable, fourteener). In other words, to determine the length of syllabic verses, only “metric” syllables are counted (as is customary in the French tradition, but not in the Italian, Spanish or Portuguese traditions).
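The “syllable recount” can be checked against the formulas quoted above. The following Python sketch rests on two assumptions that are ours, not the project's: the last term of a formula always describes the clausula, and any parenthesized term elsewhere describes optional syllables of an expanded caesura. The function name `metric_length` is hypothetical.

```python
def metric_length(formula: str) -> int:
    """Count only the "metric" syllables of a syllabic-verse formula."""
    terms = formula.split("+")
    body = terms[:-1]  # assumption: the final term is the clausula
    # Parenthesized terms such as "(0/1)" mark optional caesura syllables.
    return sum(int(term) for term in body if not term.startswith("("))


examples = {
    "French decasyllable": "10+(0/1)",
    "Italian hendecasyllable": "10+1",
    "French alexandrine": "6+6+(0/1)",
    "Spanish alexandrine": "6+(0/1)+6+1",
}
for name, formula in examples.items():
    print(f"{name}: {formula} -> {metric_length(formula)}")
```

Under these assumptions the French decasyllable and the Italian hendecasyllable both come out at 10 syllables, and both alexandrines at 12, which matches the recount described in the text.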
- 3. Clausula (masculine, feminine, dactylic, hyperdactylic), given by a formula [mfmf, ffmffm, dmdm, etc.].
- 4. Rhyme, given by a formula [abab, abba, etc.].
- 5. Some combinations of parameters 1 to 4 that produce traditional stanzas and fixed forms are provided with appropriate meta-information (ottava rima, Ronsardʼs stanza, tercets, sonnet, triolet, etc.). For other stanzaic texts the stanza length (in lines) is indicated. This parameter may have a zero value (for astrophic verse).
- 6. Polymetric compositions (PMC). This metrical structure is characteristic of a number of Romance verse forms, in particular the cantata. Meter and stanza vary across the fragments of a PMC, which nevertheless remains a single work.
A note to paragraph 6 (on polymetricity)
For PMC, the field must be filled with multiple values: each monometric fragment is described by its own parameters; the whole work is findable by searching for parameters corresponding to one of the monometric fragments. In addition, a separate “Polymetricity” parameter is introduced, making it possible to find, in the Corpus subsystem, PMC of any structure (without specifying the structure). At the first stage of the project, PMCs are defined only by the latter parameter.
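The two ways of finding a PMC described in this note can be sketched as follows. This is an illustration under stated assumptions, not the actual search implementation: the classes `Fragment` and `CorpusText`, the derived `polymetric` flag and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Metrical description of one monometric fragment."""
    meter: str
    line_length: str


@dataclass
class CorpusText:
    title: str
    fragments: list[Fragment]

    @property
    def polymetric(self) -> bool:
        # Assumption: a text described by several fragments is a PMC.
        return len(self.fragments) > 1


def find_by_fragment(texts, meter, line_length):
    """A PMC is found if any one of its fragments matches the query."""
    return [t.title for t in texts
            if Fragment(meter, line_length) in t.fragments]


def find_polymetric(texts):
    """Find PMCs of any structure, without specifying the structure."""
    return [t.title for t in texts if t.polymetric]


# Placeholder entries, not real corpus data.
texts = [
    CorpusText("cantata (placeholder)",
               [Fragment("iamb", "4"), Fragment("trochee", "4")]),
    CorpusText("lyric (placeholder)", [Fragment("iamb", "4")]),
]
print(find_by_fragment(texts, "trochee", "4"))
print(find_polymetric(texts))
```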
Metadata (attributes)
Texts found in the Corpus subsystem are supplied with the following attributes:
- Short title of work (Author + Title)
- Bibliographic description of work (according to the edition from which the text is extracted for the Corpus)
- Author of work (author of the original if a translated work)
- Translator (if a translated work)
- Publication date (according to bibliographic description)
- Date of composition (if the precise date is unknown, a range is given)
- Date of the first publication (if known)
- Language of work
- Type of work in the Corpus (O, T, I or S)
- Speech form (verse; prose; mixed: verse + prose)
- Metrical and stanzaic parameters of work:
- meter
- line length
- clausula
- rhyme
- stanza or fixed form
- polymetricity
- Notes (this field contains data which is kept informal at this stage)
This metadata set accompanies works of all types. This approach lets us obtain the accompanying information (metadata) when working with any text, makes it easier and more intuitive to establish links between texts, and makes attribute search more flexible.
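Because one attribute set is shared by works of all types, it can be modelled as a single record in which most fields are optional. The Python sketch below is a hypothetical rendering of the attribute list, not the system's actual schema; all field names and the `missing` helper are assumptions, and the sample values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metadata:
    """One attribute set, shared by works of all types (O, T, I, S)."""
    short_title: str
    bibliographic_description: Optional[str] = None
    author: Optional[str] = None          # author of the original if translated
    translator: Optional[str] = None
    publication_date: Optional[str] = None
    composition_date: Optional[str] = None   # a precise date or a range
    first_publication_date: Optional[str] = None
    language: Optional[str] = None
    work_type: Optional[str] = None       # "O", "T", "I" or "S"
    speech_form: Optional[str] = None     # "verse", "prose" or "mixed"
    meter: Optional[str] = None
    line_length: Optional[str] = None
    clausula: Optional[str] = None
    rhyme: Optional[str] = None
    stanza: Optional[str] = None
    polymetric: Optional[bool] = None
    notes: str = ""

    def missing(self) -> list[str]:
        """Names of the attributes that have not been filled in yet."""
        return [name for name, value in vars(self).items() if value is None]


# Placeholder record with deliberately incomplete metadata.
record = Metadata(short_title="(placeholder)", language="French", work_type="I")
print(record.missing())
```

Keeping unfilled attributes as explicit empty values, rather than omitting them, makes incomplete records easy to list.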
The metadata are invoked by clicking the (i) button on the toolbar.
Titles and texts
Titles found in the Corpus subsystem are represented by full texts and metadata. However, the full text may be temporarily unavailable (to be added later). Metadata can be incomplete (e.g., when the author of an intermediary translation is known, but no other data on the title is at hand, be it temporarily or irremediably).
The Corpus subsystem may contain individual works that are not represented by full texts and simultaneously have unspecified attribute values (e.g. “unknown original”, “unidentified intermediary translation”). Such potential texts are in fact desiderata, as they stimulate research.
Example 1. Alexander Tokarev, a member of the “Green Lamp”, left behind only two poems (1819). One, as Boris Tomashevsky established, is a translation of Paul Scarronʼs comic sonnet “Superbes monuments de lʼorgueil des humains...” The second one is a comic eulogy to tobacco, also, most likely, a translation, but its source is still unknown.
Example 2. Most translations from Italian, Spanish and Portuguese, made in the eighteenth century, when these languages were poorly known in Russia, are based on French and German intermediaries, not all of which have been identified.
A work of any type [O, T, I or S] is represented by one text. Russian-language works of the type [T] are represented by one of the authoritative texts from a publication selected on the basis of expert opinion. Originals [O] and intermediary translations [I] are represented by the edition that was presumably the source of the translation, or that reproduces the text of this source. In the absence of access to the actual text, it is represented by metadata only.
Modern authoritative editions of originals and translated works are considered metatexts and hence are found in the Library subsystem along with other metatexts (research texts). Such publications are reproduced in their entirety, that is, together with the accompanying articles, comments, etc., as in the edition.
On the structure of the Library subsystem see the appropriate section of this description. The issue of the relationship between objects in the two subsystems and switching between them is discussed below.
Originals and intermediary translations
In rare cases, a translated work may have more than one original. Firstly, there are (semi)translated works such as the “excerpts” from Baratynskyʼs poem “Воспоминания” (Recollections), where some of the text is translated from the abbé Delille, some from Gabriel Legouvé, while other parts are translations of a still unknown original (or may be original themselves). Secondly, there are works translated from several editions of the original, e.g. several Russian translations of Millevoyeʼs elegy “La Chute des feuilles” (available in five redactions, quite divergent from each other). In the second case, the problem of versions arises again, a problem that we temporarily set aside for the case of several versions of a translation.
One translated work may be based on more than one intermediary translation. An intermediary translation may be hypothetical, i.e. not known for certain (for example, if the translator did not know the original language).
The issues of variability and fragmentation of texts
Variability. The example of the Russian translations of Millevoyeʼs elegy “La Chute des feuilles” (see above) brings us to the extremely important philological issue of the instability and variation in the text of a literary work. This applies in equal measure to original and translated works. Thus, out of Petrarchʼs four sonnets translated by Osip Mandelshtam (1933–1934), the first translation is known in three redactions, the second one in three, the third in two, and the fourth in five.
At the first stage of the project, we do not attempt an overall solution to the complex problem of text variants and the relationship between them. Where a translator has created several significantly different versions (“redactions”) of a translation, they can be represented in the system as separate translations of the same original by the same author.
Another problem is fragmentation. A translated work can be the translation not of the entire original, but rather of some of its fragments. Even a small poem can be shortened in translation, sometimes considerably. At times, such fragmentation is caused by critical articles, anthologies, textbooks and other texts, reproducing only one, the most significant or popular, fragment of the work. Thus, the imitation of Petrarch by Ivan Dmitriev (1797) is, in fact, only the fourth stanza from Petrarchʼs canzone “Di pensier in pensier...”, read by Dmitriev in Italian with a parallel French translation by Pierre-Charles Levesque, who printed this stanza as a separate work. Konstantin Batiushkovʼs poem “Рыдайте амуры и нежные грации...” (“Weep, Amours, tender Graces, weep...”, 1810–1811) is a translation of three initial and three final lines from Paolo Rolliʼs poem: only these lines were reproduced in Antonio Scoppaʼs poetic treatise, in which Batiushkov had found Rolliʼs poem.
Quite frequent are incomplete translations of long poems, e.g. epics. Several cantos and even fragments of some cantos from Danteʼs Divine Comedy, Ariostoʼs Orlando Furioso, Tassoʼs Jerusalem Delivered have been translated into Russian a number of times. The first full verse translations of these works appeared quite late (Orlando Furioso appeared in Russian as late as 1993, in Mikhail Gasparovʼs free verse translation; a complete translation reproducing the rhymed ottavas is still lacking). Also common are translations of poetic insets from large prosaic works (see below for an example of translations of a romance from Don Quixote by Cervantes).
At the first stage, the issue of fragmentation will not be addressed.
The issue of metadata hierarchy
The problem of metadata hierarchy is directly related to that of fragmentation. Priority should be given to structurally simple works, so at the first stage we confine ourselves mainly to small lyrical genres and monometric texts.
However, we have already included a few large texts consisting of structural units of lower levels (for example, Divine Comedy > Hell, Purgatory, Paradise > individual cantos from Hell, Purgatory and Paradise). At the first stage, such texts are identified as separate titles (works). The translation of any of their fragments is identified “simply” as a translation of such works. The next step should be to work out ways of linking fragmentary translations to the corresponding fragments of the original. In other words, we will need to introduce a two-level hierarchy into the metadata (the whole work and its parts).
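The difference between the first-stage treatment and the planned hierarchy can be sketched with a parent link between titles. This is one possible design, given purely for illustration; the class `Title`, its `parent` field and the `whole_work` method are hypothetical, and the canto label is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Title:
    """A work or one of its parts; `parent` is None for a whole work."""
    name: str
    parent: Optional["Title"] = None

    def whole_work(self) -> "Title":
        """Walk up the parent links to the whole work."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


comedy = Title("Divine Comedy")
hell = Title("Hell", parent=comedy)
canto = Title("Hell, canto (placeholder)", parent=hell)

# First stage: a translation of a fragment is linked to the whole work.
first_stage_link = canto.whole_work()
# Later stage: the link can point at the fragment itself, and the whole
# work remains reachable through the parent links.
later_stage_link = canto

print(first_stage_link.name)
print(later_stage_link.name, "->", later_stage_link.whole_work().name)
```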