Language theory and language technology; competence and performance
Summary

The current generation of language processing systems is based on linguistically motivated competence models of natural languages. The problems encountered with these systems suggest the need for performance models of language processing, which take into account the statistical properties of actual language use. This article describes the overall set-up of such a model. The system I propose employs an annotated corpus; in analysing new input it tries to find the most probable way to reconstruct this input from fragments that are already contained in the corpus. This perspective on language processing also has interesting consequences for linguistic theory; some of these are briefly discussed.

1. Introduction.

The starting point for this article was the question: what significance can language technology have for language theory? The usual answer to this question is that the application of the methods and insights of theoretical linguistics in working computer programs is a good way to test and refine these theoretical ideas. I agree with this answer, and I will emphatically reiterate it here. But most of this article is devoted to a somewhat more speculative train of thought which shows that language-technological considerations can have important theoretical implications.

My considerations focus on a fundamental problem which is faced by current language-processing systems: the problem of ambiguity. To solve the ambiguity problem it is necessary to put linguistic insights about the structure and meaning of language utterances under a common denominator with statistical data about actual language use. I will sketch a technique which might be able to do this: data-oriented parsing, by means of pattern-matching with an annotated corpus. This parsing technique may be of more than technological interest: it suggests a new and attractive perspective on language and the language faculty.

First a warning.
The following discussion concentrates almost exclusively on the problem of syntactic analysis. Of course this is only a sub-problem -- both in language theory and in language technology. But this problem already turns out to yield so much food for thought that it does not seem useful to complicate the discussion by addressing the integration with phonetics, phonology, morphology, semantics, pragmatics and discourse-processing. How the different kinds of linguistic knowledge in a language-processing system ought to be distributed over the modules of the algorithm is a question which will be left out of consideration completely.

2. Linguistics and language technology.

To be able to turn linguistics into a hard science, Chomsky [1957] assigned a mathematical correlate to the intuitive idea of a "language". He proposed to identify a language with a set of sentences: with the set of grammatically correct utterance forms that are possible in the language. The goal of descriptive linguistics is then to characterise, for individual languages, the set of grammatical sentences explicitly, by means of a formal grammar. And the goal of explanatory linguistic theories should then be to determine the universal properties which the grammars of all languages share, and to give a psychological account of these universals.

In this view, linguistic theory is not immediately concerned with describing the actual language use in a language community. Although we may assume that there is a relation between the language users' grammaticality intuitions and their actual language behaviour, we must make a sharp distinction between these: on the one hand the language system may offer possibilities which are rarely or never used; on the other hand the actual language use involves mistakes and sloppinesses which a linguistic theory should not necessarily account for.
In Chomsky's terminology: linguistics is concerned with the linguistic competence rather than the actual performance of the language user. Or, in the words of Saussure, who had emphasized this distinction before: with langue rather than parole.

Chomsky's work has constituted the methodological paradigm for almost all linguistic theory of the last few decades. This comprises not only the research tradition that is explicitly aiming at working out Chomsky's syntactic insights. The perspective summarized above has also determined the goals and methods of the most important alternative approaches to syntax, and of the semantic research traditions which have grown out of Richard Montague's work. Now we may ask: how does language technology relate to this language-theoretical paradigm?