
Steps in practical lexicography

Practical lexicon development can be located along a scale from large-scale software engineering projects at one end to on-demand creation of prototypes for empirical work on linguistic questions at the other. Many software tools have been developed over the years for working with text databases, and several generations of programs can be observed, from mainframe languages for string processing through PC-based tools to tools with graphical user interfaces (GUIs), both locally in stand-alone and client-server applications, and globally on the World Wide Web. The lexicographic tools include string processing languages and systems such as SNOBOL and TUSTEP, UNIX shell tools for processing text streams, proprietary or local DOS tools, and more recently tools based on GUIs in windowing environments on PCs, Macs, and UNIX workstations, or platform-independent applications such as hyperlexica for the World Wide Web.

As noted in Section 1.2, the professional software developer follows established engineering procedures, including the basic steps (varying somewhat from one method to another) of requirements specification, software design, and implementation, all guided by general principles of modular development. In a computational linguistic environment, development is often more informal; for one thing, the comprehensive range of project support tools available to the professional software developer is generally not available.

A frequent strategy is to use the UNIX operating system (or a non-proprietary variety such as Linux), which includes a set of powerful and easy-to-use tools for text data processing, statistical analysis and formatting. UNIX tools are routinely used in spoken language technology labs and computational lexicography projects for prototyping lexicons. These tools have provided an entry point into computational linguistics for many generations of students and researchers. However, even though the deployment of UNIX tools can become quite sophisticated, it must be remembered that this type of programming is a long way from professional software development methodology.
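As a rough illustration of the kind of prototyping meant here, the following sketch builds a word frequency list from running text with standard tools. The function name `wordfreq` and the sample sentence are invented for this illustration, and the tokenisation (any run of non-alphabetic characters separates words) is a simplifying assumption rather than a recommended lexicographic practice.

```sh
# Hypothetical helper (name invented for this sketch): read text on
# standard input and print a word frequency list, most frequent first.
wordfreq() {
    tr -cs '[:alpha:]' '\n' |        # put each alphabetic word on its own line
    tr '[:upper:]' '[:lower:]' |     # fold case so "The" and "the" are counted together
    sort |                           # bring identical words next to each other
    uniq -c |                        # prefix each distinct word with its count
    sort -k1,1nr -k2                 # highest count first, ties in alphabetical order
}

# Sample input, invented for this sketch.
printf 'The cat saw the dog. The dog ran.\n' | wordfreq
```

In this sample, "the" occurs three times and therefore heads the list, followed by "dog" with two occurrences. A real corpus would need more careful handling of hyphens, apostrophes, digits and non-ASCII characters than this pipeline provides.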

Introductions to UNIX tools for linguists are provided in regular computational linguistics courses and at summer schools, but accessible practical reference materials are hard to come by; for this reason, a small selection of UNIX techniques and tools for lexicography is introduced here. There is no space for much explanation, so only hints and ideas can be communicated; the reader is referred to UNIX online documentation (`man pages' or `texinfo') and handbooks for more details.





Dafydd Gibbon
Thu Nov 19 10:12:05 MET 1998