The Salamanca Corpus



    The Salamanca Corpus (SC) aims to provide a sizeable sample of regional linguistic features as used in the pre-1974 English counties throughout history. It is not our intention to replace any other kind of evidence; yet literary data may prove particularly useful for periods in which regional documents are scant. The corpus has been compiled according to specific criteria.

    Chronologically, the corpus extends from the 1500s to the twentieth century. As can be seen in the present database, texts are arranged into distinct chronological sections designed to enable diachronic research. Three broad periods have been established: 1500-1700, 1700-1800 and 1800-1950. This allows specific features to be compared across time, for example dialect spelling practices in the rendition of a particular variety, syntactic markers or morphological patterns. How these diachronic facilities are applied will naturally depend on the demands of a particular search or scholar.
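
    As a minimal sketch of how a researcher might bucket texts into the three periods for comparison, the Python below maps a year of composition to a period label. The function and variable names are hypothetical and not part of the corpus. Because 1700 and 1800 appear in two ranges in the text above, the sketch assumes half-open ranges, so that a boundary year falls into the later period; the corpus itself may treat such years differently.

```python
# The three periods named in the text. The upper bound of each tuple is
# exclusive (an assumption made here to resolve the shared boundary years).
PERIODS = [
    (1500, 1700, "1500-1700"),
    (1700, 1800, "1700-1800"),
    (1800, 1951, "1800-1950"),  # exclusive bound of 1951 keeps 1950 inside
]


def period_for_year(year):
    """Return the period label containing `year`, or None if out of range."""
    for start, end, label in PERIODS:
        if start <= year < end:
            return label
    return None


def group_by_period(texts):
    """Group (title, year) pairs by period label, keeping input order.

    Texts whose year falls outside 1500-1950 are skipped.
    """
    groups = {label: [] for _, _, label in PERIODS}
    for title, year in texts:
        label = period_for_year(year)
        if label is not None:
            groups[label].append(title)
    return groups
```

    Grouping titles this way makes it straightforward to run the same feature search over each period and compare the results side by side.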

    Typologically, the corpus is restricted to literary texts. It is worth emphasising that, as shown in the present database, different genres (drama, prose and verse) have been considered irrespective of their literary merit. Furthermore, the documents selected have been classified according to the type of dialect representation, namely literary dialect and dialect literature (see links on the left). It is virtually impossible to offer a balanced number of texts which are, for example, representative of dialect literature, since regionally anchored material dating from the early modern period is very scarce compared with that of the nineteenth century. The same holds true for prose specimens of literary dialect.

    Diatopically, texts representative of the dialects of the pre-1974 English counties have been selected. The long-standing literary pedigree of counties such as Yorkshire and Lancashire has made it possible to find many documents representative of these varieties. Others, such as Essex or Buckinghamshire, suffer from a relative lack of vernacular literature, which makes it more difficult to retrieve historical data from these areas. Unbalanced as it might seem, the selection of texts has not been made at random, but according to the availability of material, which in turn depends on the literary practices of each period.

Overall Structure

    Accordingly, the Salamanca Corpus is organised into two broad sections corresponding to the type of dialect representation (dialect literature and literary dialect). Within each section, texts are classified by the period in which they were written, the county to which they are ascribed, genre and authorship. The structure can be illustrated as follows:

            Type of dialect representation

                    Time span
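
    The classification described in this section can also be sketched as a nested index. The Python below is an illustration only: the class, field and function names are hypothetical, and the nesting order below the time span (county, then genre, then author) is an assumption, since the text lists these criteria without ranking them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusText:
    """One text, described by the classification criteria named above."""
    representation: str  # "dialect literature" or "literary dialect"
    period: str          # e.g. "1800-1950"
    county: str          # pre-1974 English county
    genre: str           # "drama", "prose" or "verse"
    author: str
    title: str


def build_index(texts):
    """Nest titles as representation -> period -> county -> genre -> author."""
    index = {}
    for text in texts:
        node = index
        # Walk down (creating levels as needed) to the genre level.
        for key in (text.representation, text.period, text.county, text.genre):
            node = node.setdefault(key, {})
        # The innermost level maps each author to a list of titles.
        node.setdefault(text.author, []).append(text.title)
    return index
```

    With such an index, a lookup like `index["dialect literature"]["1800-1950"]["Yorkshire"]` would return every genre and author recorded for that county and period, provided matching entries had been added.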

    The compilation of texts has been facilitated by the availability of general and county bibliographies, which can be accessed by clicking on the links on the left. They have been supplemented by individual searches in The British Library, The Bodleian Library, Cambridge University Library, John Rylands Library, The Brotherton Library and The Folger Shakespeare Library, to which we are most grateful. A full catalogue of the compiled texts can also be accessed via the link on the left. Texts have been transcribed in MS Word format so that they can be easily manipulated for analysis with corpus software. Most of them have been transcribed in full so that the dialect may be analysed in context. An MS Word and a PDF version of each text can be downloaded from the present database, and detailed information on the source and e-texts is given. It is worth indicating that some texts which are already available in other databases, such as Literature Online or Project Gutenberg, have not been included in this digital archive, but have been considered in our compilation. It is also worth noting that canonical writers, Elizabeth Gaskell for example, have been neither included nor considered in the compilation, for obvious reasons.



Copyright © 2011-DING, The Salamanca Corpus,

Universidad de Salamanca



Structure and criteria of compilation