Improving Langscape's Text-based Language Identification Tool
Date
2016
Department
Swarthmore College. Dept. of Linguistics
Type
Thesis (B.A.)
Language
en_US
Terms of Use
Full copyright to this work is retained by the student author. It may only be used for non-commercial, research, and educational purposes. All other uses are restricted.
Abstract
Text-based language identification (LID) is the task of determining the language in which a piece of text
is written. Although modern LID tools achieve high accuracy using the widely accepted
n-gram method, several areas of LID remain difficult, particularly the task
of distinguishing between closely related languages. Langscape, a project of the University
of Maryland's Language Science Center, has an LID tool that uses a variation on the n-gram
method. In this thesis, I propose and test a modification to Langscape's LID tool to improve its
ability to distinguish between closely related languages.
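The abstract names the n-gram method without describing it. The sketch below is a minimal, generic illustration of that idea, not Langscape's variation, which the abstract does not specify. It assumes character trigrams, raw frequency profiles, and cosine similarity as the scoring function. The names `char_ngrams`, `cosine`, `train`, `identify`, and `TOY_SAMPLES` are hypothetical, and the training text is invented toy data.

```python
from collections import Counter
from typing import Dict
import math

# Hypothetical toy training data for illustration only; real LID profiles
# are built from much larger corpora.
TOY_SAMPLES: Dict[str, str] = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat with the hat",
    "es": "el rapido zorro marron salta sobre el perro perezoso y el gato se sienta en la alfombra",
}


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the lowercased text.

    The text is padded with one space on each side so that its first and
    last characters also appear in n-grams.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency profiles."""
    # Counter returns 0 for missing keys, so n-grams absent from b add nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(samples: Dict[str, str], n: int = 3) -> Dict[str, Counter]:
    """Build one n-gram frequency profile per language from its training text."""
    return {lang: char_ngrams(text, n) for lang, text in samples.items()}


def identify(text: str, profiles: Dict[str, Counter], n: int = 3) -> str:
    """Return the language whose profile is most similar to the input text."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


if __name__ == "__main__":
    profiles = train(TOY_SAMPLES)
    print(identify("the dog and the cat", profiles))
    print(identify("el perro y el gato", profiles))
```

With languages as distinct as English and Spanish, the query shares almost no trigrams with the wrong profile, so the decision is easy. Closely related languages share many of their frequent n-grams, so their profiles score much more similarly against a given input. This plausibly explains why distinguishing between them remains the harder case that the thesis targets.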