Corpus Linguistics
What is a corpus?
Corpus (pl. corpora)
A collection of texts assumed to be representative of a given language, dialect, or other subset of a language, to be used for linguistic analysis.
(Francis 1982)
2
What is a corpus?
Three basic facts about a corpus:
A large, principled collection of natural texts.
Natural texts: language that has been collected from naturally occurring sources.
A basic resource for storing language knowledge.
Only after being processed can it become a usable resource.
3
Part one - What is corpus linguistics?
As a means - exploring actual patterns of language use
As a tool - developing materials for classroom language instruction
It also uses large collections of both spoken and written natural texts that are stored on computers.
4
What is corpus linguistics?
Corpus linguistics is an approach to investigating language that is characterized by the use of large collections of texts (spoken, written, or both) and computer-assisted analysis methods.
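To make "computer-assisted analysis" concrete, here is a minimal sketch of two common corpus operations: a word-frequency count and a keyword-in-context (KWIC) concordance. The two-sentence corpus and the function names (`tokenize`, `word_frequencies`, `concordance`) are illustrative assumptions, not part of any particular corpus toolkit.

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real corpus would hold far more text.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(texts):
    """Count how often each word occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def concordance(texts, keyword, width=2):
    """Return keyword-in-context lines with `width` words on either side."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left} [{tok}] {right}".strip())
    return lines

freqs = word_frequencies(corpus)
kwic = concordance(corpus, "cat")
```

The regular-expression tokenizer is deliberately simple; work on a real corpus would likely need more careful handling of punctuation, numbers, and non-English text.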
Contributions of corpus linguistics
· One of the major contributions of corpus linguistics is in the area of exploring patterns of language use.
It provides an extremely powerful tool for the analysis of natural language.
It provides tremendous insights into how language use varies in different situations.
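One way to see how language use varies across situations is to compare how often a word occurs per 1,000 tokens in different sub-corpora. The sketch below assumes two tiny made-up samples and a hypothetical helper, `per_thousand`; it only illustrates the idea of a normalized frequency and is not a full register analysis.

```python
import re

# Hypothetical miniature sub-corpora standing in for two situations of use.
registers = {
    "conversation": "well you know I think it is fine you know",
    "academic": "the results indicate that the method is reliable",
}

def per_thousand(text, word):
    """Frequency of `word` per 1,000 tokens of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1000 * tokens.count(word) / len(tokens)

# Normalized rate of "you" in each sample.
rates = {name: per_thousand(text, "you") for name, text in registers.items()}
```

Normalizing by sample size lets sub-corpora of different lengths be compared directly.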
The development of corpora
In 1964, the first corpus was built at Brown University in the United States, storing one million words and sampling the various styles of American English.
In 1978, Britain built the LOB (Lancaster-Oslo-Bergen) corpus, storing one million words drawn from British English sources.
In the 1980s, Birmingham University built BCET (Birmingham Collection of English Texts), storing 7.3 million words for use in dictionary compilation.
In 1996, its storage was extended to 32 million words and it was renamed BE (Bank of English).
In 1980, COBUILD (Collins Birmingham University International Language Database) was built, inclu