Electronic language resources in Oxford
The Corpus Linguistics course is held each year in Hilary Term at OUCS, Thursdays 12:30 to 13:30 - more details here.
OUCS is helping to co-ordinate access to electronic language resources across the University of Oxford via a small working group with participation from several departments. Please get in touch if you want to get involved! We have termly meetings to discuss resources and projects, usually on Friday of week 4. The next meeting will be held at OUCS on 17th May 2012, and the new Oxford-based BNCWeb service will be demonstrated.
- BNCWeb beta service, hosted at the Oxford e-Research Centre. Available to University of Oxford Users.
- British National Corpus via OXLIP (search in title list under 'b'). Available to all Oxford Users.
- IVIE Corpus of English dialects
- Oxford English Corpus, access available on demand to Oxford researchers - please apply via the website
- Literary and linguistic electronic resources on OXLIP, now with a category for Linguistics. For further information contact Johanneke Sytsema
The University of Oxford has licences for 2008, 2009, 2010 and 2013 for the Linguistic Data Consortium (LDC). Take a look at their catalogue, and if there is something there that you are interested in, get in touch with Martin Wynne. Thanks to OUP, who paid for the 2009 licence in full for the University; ComLab, who are paying for the 2010 licence; and the Phonetics Laboratory, for the 2013 licence.
The following resources have been downloaded from the LDC and are now available online from IT Services for Oxford users. Consult the LDC catalogue for the full list of what is available, and get in touch with martin.wynne at oucs.ox.ac.uk. Please note that you are bound by the terms and conditions of the user agreements associated with each of these resources, which can be found on the LDC website.
- LDC2002L49 Buckwalter Arabic Morphological Analyzer Version 1.0
- LDC2004L02 Buckwalter Arabic Morphological Analyzer Version 2.0
- LDC2009T03 GALE Phase 1 Arabic Newsgroup Parallel Text - Part 1
- LDC2009T09 GALE Phase 1 Arabic Newsgroup Parallel Text - Part 2
- LDC2009T10 Language Understanding Annotation Corpus
- LDC2009T12 2008 CoNLL Shared Task Data
- LDC2009T13 English Gigaword Fourth Edition: disk 1 (4 GB) and disk 2 (4.5 GB)
- LDC2009T22 Arabic Newswire English Translation Collection
- LDC2009T23 FactBank 1.0
- LDC2009T24 OntoNotes Release 3.0
- LDC2009T30 Arabic Gigaword Fourth Edition (2.5 GB)
We are also assembling a list of corpora, copies of which may be available in Oxford, but under a variety of different licensing and access arrangements. Please get in touch to add to the list. For these resources, contact Martin Wynne unless otherwise stated.
- BNC XML version, BNC Baby (sampler on one CD)
- Corpus of Spoken Dutch
- IPI-PAN corpus of Polish
- COLT Corpus of London Teenagers' Speech
- Gesprochenes Jiddisch Textzeugen einer Europäisch-jüdischen Kultur
- ICAME corpus collection
- East meets West: a compendium of multilingual resources (the TELRI CD, parallel aligned corpora in many European languages)
- Discovering, creating and using digital literary and linguistic resources
- Licensing issues relating to digital literary and linguistic resources
- Connecting your project or resources with national and international infrastructures
- Writing the Technical Appendix of an AHRC Research Grant application
- Planning a digital project in the humanities
- Digital preservation of research data
- Electronic Text Encoding
- Corpus linguistics
See the research support pages at OUCS for more information.