Computers & Texts No. 12 | July 1996
Michael Fraser
CTI
...
I knew that the work in which I engaged is generally considered as
drudgery for the blind, as the proper toil of artless industry; a task that
requires neither the light of learning, nor the activity of genius, but may be
successfully performed without any higher quality than that of bearing burdens
with dull patience, and beating the track of the alphabet with sluggish
resolution.
So Samuel Johnson commenced his letter to the Earl of Chesterfield on The
Plan of an English Dictionary (1747). Some (who know no better) might be
tempted to suggest that nearly 250 years later this description might be better
applied to those who spend their time encoding electronic texts so that the
rest might easily navigate and search their contents. But, of course, as the
Wife of Bath's Prologue amply demonstrates, this is most certainly not the case.
The encoding of an electronic edition so that the structure is made apparent,
the content easily searchable, and the whole attractively presented is not a
task performed without the light of learning, even if a considerable portion is
fairly harmless drudgery.
Two editions of Samuel Johnson's Dictionary of the English Language have been published on CD-ROM by Cambridge University Press: the first, produced by Johnson in 1755, and the fourth, revised and published by Johnson in 1773. Entries from both editions can be viewed simultaneously on the screen. The electronic edition, like the Wife of Bath's Prologue, is encoded in TEI-SGML and presented with DynaText. This gives the CD-ROM a similar appearance to the Wife of Bath's Prologue, and indeed it is only necessary to have installed one DynaText reader, together with the specific fonts, in order to view any one of the three CD-ROMs reviewed here.
The content of Johnson's dictionary is divided between the transcriptions and the digitized images of each page of each edition. Although the dictionary can be navigated through the transcription, moving, for example, from the letter A to ABE... to Abecedary (Belonging to the Alphabet), it is more useful to locate words using the search forms provided.
The value of a work must be estimated by its use; it is not enough that
a dictionary delights the critick, unless, at the same time, it instructs the
learner; as it is to little purpose that an engine amuses the philosopher by
the subtilty of its mechanism, if it requires so much knowledge in its
application as to be of no advantage to the common workman.
The subtilty of the underlying encoding system might well amuse the inclined
philosopher. However, the common academic is not required to understand more
than the basics in order to make good use of it. Readers who have been duly
impressed by the search capabilities of the Oxford English Dictionary on CD-ROM
will be pleased to know that similar searches can be carried out on the OED's
illustrious predecessor. Such searches are only possible because the editor,
Anne McDermott, included the encoding of many of the elements identified by the
TEI's Guidelines for print dictionaries (headword, part of speech, etymology,
usage, sense, definition etc.).
Johnson's Dictionary: Entry, transcription, and digitized image from the first edition.
The forms interface gives the option of searching the complete dictionary for a keyword or limiting the search to the headword, definition, quotation, first or fourth edition, quoted author or title. If that is not sufficient, more complex searches can be entered using the underlying markup. This is particularly useful for proximity or Boolean searching, but also for giving access to the additional features encoded in the dictionary.
Barbarous, or impure, words and expressions, may be branded with some
note of infamy, as they are carefully to be eradicated wherever they are found;
and they occur too frequently, even in the best writers.
One of the pleasures afforded this common workman in the review of Johnson's Dictionary
was attempting to reveal the voice of Johnson beneath the dull (as, to make
dictionaries is dull work) defining of everyday words. Often cited, before
even inspecting the electronic edition, are Johnson's definitions of
lexicographer (a harmless drudge), oats (a grain, which in England
is generally given to horses, but in Scotland supports the people), or to
worm (to deprive a dog of something, nobody knows what, under his tongue,
which is said to prevent him, nobody knows why, from running mad).
One of Johnson's primary concerns in compiling his dictionary was for the purity of the English language. A substantial number of 'barbarous' words are to be found in both editions of the dictionary. They were placed there not, one suspects, because the dictionary was intended to be a snapshot of eighteenth-century English usage, but rather because such words, being offensive to Johnson's ideal of purity through etymology, would indicate to the common workman precisely which words he should not be using. In total, 49 words are described by Johnson as 'barbarous'. A search specified in the form '<entryfree> cont (<note> with type=usg cont barbarous) and (<author> cont shakespeare)' will find, amongst others, those occurrences where Shakespeare himself employed such words (vastidity, worser).
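The logic of the search quoted above can be imitated outside DynaText. The following is only a minimal sketch in Python, not the CD-ROM's search engine: the element names (entryfree, note with type=usg, author) are taken from the query itself, while the nesting of the sample, the form and quote elements, the placeholder headwords and the note texts are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature sample. Element names follow the query quoted in the
# review; the nesting, <form>, <quote> and all note texts are placeholders,
# not Johnson's wording or the publisher's actual markup.
SAMPLE = """
<dictionary>
  <entryfree><form>vastidity</form>
    <note type="usg">placeholder: barbarous</note>
    <quote><author>Shakespeare</author></quote>
  </entryfree>
  <entryfree><form>worser</form>
    <note type="usg">placeholder: barbarous</note>
    <quote><author>Shakespeare</author></quote>
  </entryfree>
  <entryfree><form>placeholder-a</form>
    <note type="usg">placeholder: low</note>
    <quote><author>Shakespeare</author></quote>
  </entryfree>
  <entryfree><form>placeholder-b</form>
    <note type="usg">placeholder: barbarous</note>
    <quote><author>Swift</author></quote>
  </entryfree>
</dictionary>
"""


def contains(element, word):
    """Case-insensitive test that `word` occurs in the element's text content."""
    return word.lower() in "".join(element.itertext()).lower()


def search(root, usage_word, author_word):
    """Return headwords of entries whose usage note contains `usage_word`
    AND which quote an author whose name contains `author_word`."""
    hits = []
    for entry in root.iter("entryfree"):
        usage_ok = any(
            contains(note, usage_word)
            for note in entry.iter("note")
            if note.get("type") == "usg"
        )
        author_ok = any(contains(a, author_word) for a in entry.iter("author"))
        if usage_ok and author_ok:
            hits.append(entry.findtext("form"))
    return hits


root = ET.fromstring(SAMPLE)
print(search(root, "barbarous", "shakespeare"))
```

Both conditions must hold within the same entry, which is why the placeholder entry quoting Swift and the one marked 'low' are excluded.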
Far more common are instances of 'low' (258) or 'cant' (154) words. Cant is defined by Johnson as 'a corrupt dialect used by beggars and vagabonds', 'barbarous jargon' or 'a whining pretension to goodness, in formal and affected terms'. Examples of cant include 'black-guard' (a cant word amongst the vulgar), 'confounded' (hateful; detestable; odious as in 'He was a most confounded Tory' -Swift), 'mundungus' (stinking tobacco) and 'slim' (slender, thin of shape; a cant word as it seems, and therefore not to be used). The last is an example of Johnson's attempt to educate by proscription.
Johnson's aim to purge the English language of cant or spurious words peaks in the few instances where he presents the headword, then the definition, followed by the comment that 'in this sense it is not used'. One can perhaps understand this where 'not used' is an addendum to a word in the fourth edition previously defined without comment in the first edition (Calmy: calm; peaceful or Preach: noun, a discourse, religious oration). This is not the case with the first definition given for 'snuff' (Snot. In this sense it is not used), which appears in both the first and the fourth editions. On finding this one immediately desires to consult the Oxford English Dictionary, which has duly taken note of Johnson's claim and not included 'snot' among the definitions given for 'snuff'. Unfortunately, it is nearly impossible to search for all occurrences of 'not used' in the dictionary because 'not' has been designated a stopword and is thus ignored in all searches, while, as one might expect in a work of this nature, the form 'used' occurs with great frequency. One thus tends to stumble on Johnson's proclamations quite by accident. The work of purifying the English language, however, continues even if, on occasion, literature can impede its progress (Primal: First. A word not in use, but very commodious for poetry).
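The effect of the stopword on a search for 'not used' can be shown with a small sketch in Python. It rests on assumptions: how DynaText actually tokenizes text and applies its stopword list is not documented in this review, and 'not' is the only stopword confirmed here. The two test phrases are taken from the definitions quoted above.

```python
# Only 'not' is confirmed as a stopword by the review; a real list is longer.
STOPWORDS = {"not"}


def index_terms(text):
    """Lower-case the words, strip surrounding punctuation, drop stopwords."""
    words = [w.strip(".,;:()'").lower() for w in text.split()]
    return [w for w in words if w and w not in STOPWORDS]


def matches(query, text):
    """Phrase match on whatever terms survive stopword removal."""
    q = index_terms(query)
    t = index_terms(text)
    n = len(q)
    return n > 0 and any(t[i:i + n] == q for i in range(len(t) - n + 1))


print(index_terms("not used"))
print(matches("not used", "Snot. In this sense it is not used"))
print(matches("not used", "a corrupt dialect used by beggars and vagabonds"))
```

Once 'not' is discarded the query is reduced to the single term 'used', so the proscription under 'snuff' and the unrelated definition of 'cant' are matched alike.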
So that in search of the progenitors of our speech, we may wander from
the tropick to the frozen zone, and find some in the valleys of Palestine, and
some upon the rocks of Norway.
Picking out the Norwegian, the Indian, the Icelandic, the Irish and the
Saxon, the Greek and the Hebrew words is, at first sight, easy enough.
Searching with the '<etym>' tag containing some specified language shows 4131
words with some reference to Saxon etymology, 9763 from Latin, 5655 from
French, and only 433 with reference to German. However, one cannot be sure of
the accuracy of this method of searching. Greek words serve to demonstrate the
point. One can search for '<etym> cont Greek' and retrieve a paltry 42 words. A
browse through the dictionary shows that Johnson was not consistent in his
specification of etymology. He also uses the abbreviation Gr. or, on most
occasions, leaves it to the reader to recognise Greek on sight. The advantage,
however, with Greek words in the electronic edition of the Dictionary is that a
separate character set is required. Thus, there is an extra tag within
'<etym>' which specifies Greek (<lang="gk">). Searching for all instances of
Greek within the etymology retrieves a far more realistic 4307 words. It is
unfortunate that a similar tag was not used for all languages, thereby
regularizing the etymological entries. One could then ensure that a search for
'latin' would also pick up lat. (a total of 19035 entries) and instances where
neither form is used. I was rather disappointed to discover that '<lang>'
had not been used for Hebrew words. I assume this is because no separate
character set was defined and the publishers decided instead to insert graphic
images of each Hebrew word.
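The difference between the two Greek searches can be sketched in Python. This is an illustration under stated assumptions, not the publisher's markup: the element name 'foreign', the placeholder headwords and the sample Greek word are hypothetical, and only '<etym>' and the attribute value lang="gk" come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: three ways an etymology might point to Greek, plus one
# Latin entry. The <foreign> element name and all content are placeholders.
SAMPLE = """
<dictionary>
  <entryfree><form>entry-1</form>
    <etym>Greek <foreign lang="gk">λόγος</foreign></etym></entryfree>
  <entryfree><form>entry-2</form>
    <etym>Gr. <foreign lang="gk">λόγος</foreign></etym></entryfree>
  <entryfree><form>entry-3</form>
    <etym><foreign lang="gk">λόγος</foreign></etym></entryfree>
  <entryfree><form>entry-4</form>
    <etym>Latin</etym></entryfree>
</dictionary>
"""


def by_text(root, word):
    """Entries whose etymology text contains `word` (like '<etym> cont Greek')."""
    return [
        e.findtext("form")
        for e in root.iter("entryfree")
        if any(word.lower() in "".join(et.itertext()).lower()
               for et in e.iter("etym"))
    ]


def by_lang(root, code):
    """Entries whose etymology holds any element marked with lang=`code`."""
    return [
        e.findtext("form")
        for e in root.iter("entryfree")
        if any(el.get("lang") == code
               for et in e.iter("etym") for el in et.iter())
    ]


root = ET.fromstring(SAMPLE)
print(by_text(root, "greek"))
print(by_lang(root, "gk"))
```

The text search depends on Johnson having spelled out the language name, whereas the attribute search depends only on the encoder's markup, which is why the latter is the more reliable of the two.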
Finally, one word was recognised by Johnson as predating the Flood and the fall of the Tower of Babel. That word was 'sack', to be 'found in all languages, and it is therefore conceived to be antediluvian'. The Oxford English Dictionary very nearly agrees with Johnson on this point, but confines itself to referring to the word as having a prehistoric type.
This, my Lord, is my idea of an English dictionary; a dictionary by
which the pronunciation of our language may be fixed, and its attainment
facilitated; by which its purity may be preserved, its use ascertained, and its
duration lengthened.
An idea hardly fulfilled by Johnson's Dictionary.
What was cant then is elegant today and Johnson's refinements are today's
slang. The English language evolves as it ever did. Yet the ease and variety of
ways in which an eighteenth-century representation of English can be consulted
on a twentieth-century spinning mechanical disk do for the dictionary much of
what Johnson had hoped his dictionary would do for the language of England. He
would surely have approved of this and of its future provision on
something reticulated or decussated, at equal distances with interstices
between the intersections.
...
Computers & Texts 12 (1996), 21. Not to be republished in any form
without the author's permission.
HTML Author: Michael Fraser (mike.fraser@oucs.ox.ac.uk)
Document Created: 22 August 1996
The URL of this document is
http://info.ox.ac.uk/ctitext/publish/comtxt/ct12/fraser.html