Hugendubel.info - The B2B Online Bookshop


Practical Corpus Linguistics

E-Book, EPUB (DRM: Adobe)
240 pages
English
John Wiley & Sons, published 03.12.2015, 1st edition
This is the first book of its kind to provide a practical and student-friendly guide to corpus linguistics that explains the nature of electronic data and how it can be collected and analyzed.
- Designed to equip readers with the technical skills necessary to analyze and interpret language data, both written and (orthographically) transcribed
- Introduces a number of easy-to-use, yet powerful, free analysis resources consisting of standalone programs and web interfaces for use with Windows, Mac OS X, and Linux
- Each section includes practical exercises, a list of sources and further reading, and illustrated step-by-step introductions to analysis tools
- Requires only a basic knowledge of computer concepts in order to develop the specific linguistic analysis skills required for understanding/analyzing corpus data


Martin Weisser is a Professor in the National Key Research Center for Linguistics and Applied Linguistics at Guangdong University of Foreign Studies, China. He is the author of Essential Programming for Linguistics (2009), and has published numerous articles and book chapters, including contributions to The Encyclopedia of Applied Linguistics (Wiley, 2012) and Corpus Pragmatics: A Handbook (2014).
Available formats
Book, paperback
EUR 54.00
Book, hardback
EUR 110.50
E-Book, EPUB (DRM: Adobe)
EUR 46.99
E-Book, PDF (DRM: Adobe / Adobe Ebook Reader)
EUR 46.99

Product

Details
Further ISBN/GTIN: 9781118831908
Product type: E-Book
Binding: E-Book
Format: EPUB
Format note: DRM Adobe / EPUB
Format: reflowable (automatic page breaks)
Year of publication: 2015
Publication date: 03.12.2015
Edition: 1st edition
Pages: 240
Language: English
File size: 3321 KB
Article no.: 3232172
Categories
Genre: 9201

Contents/Reviews

Reading sample
CHAPTER 1
Introduction

This textbook aims to teach you how to analyse and interpret language data in written or orthographically transcribed form (i.e. represented as if it were written, if the original data is spoken). It will do so in a way that should not only provide you with the technical skills for such an analysis for your own research purposes, but also raise your awareness of how corpus evidence can be used in order to develop a better understanding of the forms and functions of language. It will also teach you how to use corpus data in more applied contexts, such as identifying suitable materials/examples for language teaching, investigating sociolinguistic phenomena, or even trying to verify existing linguistic theories, as well as to develop your own hypotheses about the many different aspects of language that can be investigated through corpora. The focus will primarily be on English-language data, although we may occasionally, whenever appropriate, refer to issues that could be relevant to the analysis of other languages. In doing so, we'll try to stay as theory-neutral as possible, so that no matter which flavour(s) of linguistics you may have been exposed to before, you should always be able to understand the background to all the exercises or questions presented here.

The book is aimed at a variety of readers, ranging mainly from linguistics students at senior undergraduate, Masters, or even PhD levels who are still unfamiliar with corpus linguistics, to language teachers or textbook developers who want to create or employ more real-life teaching materials. As many of the techniques we'll be dealing with here also allow us to investigate issues of style in both literary and non-literary text, and much of the data we'll initially use actually consists of fictional works because these are easier to obtain and often don't cause any copyright issues, the book should hopefully also be useful to students of literary stylistics. To some extent, I also hope it may be beneficial to computer scientists working on language processing tasks, who, at least in my experience, often lack some crucial knowledge in understanding the complexities and intricacies of language, and frequently tend to resort to mathematical methods when more linguistic (symbolic) ones would be more appropriate, even if these may make the process of writing elegant and efficient algorithms more difficult.

You may also be asking yourself why you should still be using a textbook at all in this day and age, when there are so many video tutorials available, and most programs offer at least some sort of online help to get you started. Essentially, there are two main reasons for this: a) such sources of information are only designed to provide you with a basic overview, but don't actually teach you, simply demonstrating how things are done. In other words, they may do a relatively good job in showing you one or more ways of doing a few things, but often don't really allow you to use a particular program independently and for more complex tasks than the author of the tutorial/help file may actually have envisaged. And b) online tutorials, such as the ones on YouTube, may not only take a rather long time to (down)load, but might not even be (easily) accessible in some parts of the world at all, due to internet censorship.

If you're completely new to data analysis on the computer and working with - as opposed to simply opening and reading - different file types, some of the concepts and methods we'll discuss here may occasionally make you feel like you're doing computer science instead of working with language. This is, unfortunately, something you'll need to try and get used to, until you begin to understand the intricacies of working with language data on the computer better, and, by doing so, will also develop your understanding of the complexity inherent in language (data) itself. This is by no means an easy task, so working with this book, and thereby trying to develop a more complete understanding of language and how we can best analyse and describe it, be it for linguistic or language teaching purposes, will often require us to do some very careful reading and thinking about the points under discussion, so as to be able to develop and verify our own hypotheses about particular language features. However, doing so is well worth it, as you'll hopefully realise long before reaching the end of the book, as it opens up possibilities for understanding language that go far beyond a simple manual, small-scale, analysis of texts.

In order to achieve the aims of the book, we'll begin by discussing which types of data are already readily available, exploring ways of obtaining our own data, and developing an understanding of the nature of electronic documents and what may make them different from the more traditional types of printed documents we're all familiar with. This understanding will be developed further throughout the book, as we take a look at a number of computer programs that will help us to conduct our analyses at various levels, ranging from words to phrases, and to even larger units of text. At the same time, of course, we cannot ignore the fact that there may be issues in corpus linguistics related to lower levels, such as that of morphology, or even phonology. Having reached the end of the book, you'll hopefully be aware of many of the different issues involved in collecting and analysing a variety of linguistic - as well as literary - data on the computer, which potential problems and pitfalls you may encounter along the way, and ideally also how to deal with them efficiently. Before we start discussing these issues, though, let's take a few minutes to define the notion of (linguistic) data analysis properly.
1.1 Linguistic Data Analysis
1.1.1 What's data?

In general, we can probably see all different types of language manifestation as language data that we may want/need to investigate, but unfortunately, it's not always possible to easily capture all such available material for analysis. This is why, apart from the armchair data available through introspection (cf. Fillmore 1992: 35), we usually either have to collect our materials ourselves or use data that someone else has previously collected and provided in a suitable form, or at least a form that we can adapt to our needs with relative ease. In both of these approaches, there are inherent difficulties and problems to overcome, and therefore it's highly important to be aware of these limitations in preparing one's own research, be it in order to write a simple assignment, a BA dissertation, MA/PhD thesis, research paper, etc.

Before we move on to a more detailed discussion of the different forms of data, it's perhaps also necessary to clarify the term data itself a little more, in order to avoid any misunderstandings. The word itself originally comes from the plural of the Latin word datum, which literally means (something) given, but can usually be better translated as fact. In our case, the data we'll be discussing throughout this book will therefore represent the facts of language we can observe. And although the term itself, technically speaking, is originally a plural form referring to the individual facts or features of language (and can be used like this), more often than not we tend to use it as a singular mass noun that represents an unspecified amount or body of such facts.
1.1.2 Forms of data

Essentially, linguistic data comes in two general forms, written or spoken. However, there are also intermediate categories, such as texts that are written to be spoken (e.g. lectures, plays, etc.), and which may therefore exhibit features that are in between the two clear-cut variants. The two main media types often require rather radically different ways of recording and analysis, although at least some of the techniques for analysing written language can also be used for analysing transliterated or (orthographically) transcribed speech, as we'll see later when looking at some dialogue data. Beyond this distinction based on medium, there are of course other classification systems that can be applied to data, such as according to genre, register, text type, etc., although these distinctions are not always very clearly formalised and distinguished from one another, so that different scholars may sometimes be using distinct, but frequently also overlapping, terminology to represent similar things. For a more in-depth discussion of this, see Lee (2002).

To illustrate some of the differences between the various forms of language data we might encounter, let's take a look at some examples, taken from the Corpus of English Novels (CEN) and Corpus of Late Modern English Texts, version 3.0 (CLMET3.0; De Smet, 2005), respectively. To get more detailed information on these corpora, you can go to https://perswww.kuleuven.be/~u0044428/, but for our purposes here, it's sufficient for you to know that these are corpora that are mainly of interest to researchers engaged in literary stylistic analyses or historical developments within the English language. However, as previously stated, throughout the book, we'll often resort to literary data to illustrate specific points related to both the mechanics of processing language and as...