ANALYZING LEXICAL COMPLEXITY IN LEARNER CORPORA: A CORPUS-DRIVEN APPROACH USING PART-OF-SPEECH TAGGING AND DEPENDENCY PARSING
DOI: https://doi.org/10.63878/cjssr.v3i4.1556
Abstract
Lexical complexity is a crucial component of second language (L2) proficiency, encompassing the range, sophistication, and density of vocabulary used by learners. This study examines lexical complexity in learner corpora through a corpus-driven methodology that combines part-of-speech (POS) tagging and dependency parsing. The objectives are to (a) operationalize lexical complexity as measurable indices (lexical density, diversity, and sophistication), (b) use POS tagging to automatically identify and categorize lexical items, and (c) apply dependency parsing to incorporate syntactic context into lexical complexity analysis. The study analyzes a corpus of 300 L2 English essays written by intermediate and advanced learners. Results show that advanced learners use a higher proportion of low-frequency "sophisticated" words and exhibit greater lexical diversity than intermediate learners, although lexical density (content word ratio) remains comparable across proficiency levels. Dependency-based metrics (e.g., average dependency length, noun modifier counts) provide additional insight into how learners deploy complex lexico-syntactic structures. The findings show a positive correlation between lexical complexity indices and writing quality scores, with lexical sophistication and diversity emerging as significant predictors of human-rated proficiency. The study fills a methodological gap by integrating NLP tools into learner corpus research to yield a multi-dimensional profile of lexical complexity. Implications are discussed for L2 writing pedagogy, automated writing evaluation, and the development of hybrid computational models of L2 complexity. This research underscores the value of combining POS tagging and dependency parsing in corpus analyses to obtain granular, robust measures of lexical complexity in learner language.
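To illustrate how the indices named in the abstract could be computed once a text has been POS-tagged and dependency-parsed, the following is a minimal sketch in Python. It is not the study's implementation: the abstract does not specify the tag set, the diversity index, or the frequency list used. The sketch assumes Universal POS tags, treats nouns, verbs, adjectives, and adverbs as content words, uses a plain type-token ratio as the diversity measure, and stands in for a frequency list with a small hypothetical set of common words. The `Token` class and the sample sentence are likewise illustrative.

```python
from dataclasses import dataclass

# Assumption: Universal POS tags, with these four classes counted as content words.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}


@dataclass
class Token:
    form: str  # surface form of the word
    upos: str  # part-of-speech tag
    head: int  # 1-based position of the syntactic head; 0 marks the root


def lexical_density(tokens):
    """Share of content words among all non-punctuation tokens."""
    words = [t for t in tokens if t.upos != "PUNCT"]
    if not words:
        return 0.0
    return sum(t.upos in CONTENT_TAGS for t in words) / len(words)


def type_token_ratio(tokens):
    """Simple diversity measure: distinct lowercased forms over total forms."""
    forms = [t.form.lower() for t in tokens if t.upos != "PUNCT"]
    return len(set(forms)) / len(forms) if forms else 0.0


def lexical_sophistication(tokens, common_words):
    """Share of content words that are absent from a list of common words."""
    content = [t.form.lower() for t in tokens if t.upos in CONTENT_TAGS]
    if not content:
        return 0.0
    return sum(w not in common_words for w in content) / len(content)


def mean_dependency_length(tokens):
    """Average linear distance between each non-root word and its head."""
    distances = [
        abs(position - t.head)
        for position, t in enumerate(tokens, start=1)
        if t.head != 0 and t.upos != "PUNCT"
    ]
    return sum(distances) / len(distances) if distances else 0.0


# Hand-annotated sample sentence (illustrative, not taken from the study's corpus).
SAMPLE = [
    Token("The", "DET", 2),
    Token("students", "NOUN", 3),
    Token("wrote", "VERB", 0),
    Token("remarkably", "ADV", 5),
    Token("sophisticated", "ADJ", 6),
    Token("essays", "NOUN", 3),
    Token(".", "PUNCT", 3),
]

# Hypothetical stand-in for a high-frequency word list.
COMMON_WORDS = {"the", "students", "wrote", "essays"}

if __name__ == "__main__":
    print("density:", lexical_density(SAMPLE))
    print("diversity (TTR):", type_token_ratio(SAMPLE))
    print("sophistication:", lexical_sophistication(SAMPLE, COMMON_WORDS))
    print("mean dependency length:", mean_dependency_length(SAMPLE))
```

In practice the `Token` records would come from an automatic tagger and parser rather than hand annotation. Because the plain type-token ratio is sensitive to text length, a length-corrected diversity index would likely be preferable when comparing essays of different sizes.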
License
Copyright (c) 2025 Contemporary Journal of Social Science Review

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
