Improving Langscape's Text-based Language Identification Tool

Date
2016
Department
Swarthmore College. Dept. of Linguistics
Type
Thesis (B.A.)
Language
en_US
Terms of Use
Full copyright to this work is retained by the student author. It may only be used for non-commercial, research, and educational purposes. All other uses are restricted.
Abstract
Text-based language identification (LID) is the task of determining the language in which a piece of text is written. Although modern LID tools achieve high accuracy using the widely accepted n-gram method, several areas of LID remain more difficult, particularly the task of distinguishing between closely related languages. Langscape, a project of the University of Maryland's Language Science Center, has an LID tool that uses a variation on the n-gram method. In this thesis, I propose and test a modification to Langscape's LID tool to improve its ability to distinguish between closely related languages.