ANALYZING LEXICAL COMPLEXITY IN LEARNER CORPORA: A CORPUS-DRIVEN APPROACH USING PART-OF-SPEECH TAGGING AND DEPENDENCY PARSING

Authors

  • Menahil, MS Scholar, Applied Linguistics, National University of Computer and Emerging Sciences, Lahore, Pakistan.
  • Saadia Khan, PhD Scholar, English (Linguistics), University of Education, Lahore, Pakistan.

DOI:

https://doi.org/10.63878/cjssr.v3i4.1556

Abstract

Lexical complexity is a crucial component of second language (L2) proficiency, encompassing the range, sophistication, and density of the vocabulary learners use. This study examines lexical complexity in learner corpora through a corpus-driven methodology that combines part-of-speech (POS) tagging and dependency parsing. The objectives are to (a) operationalize lexical complexity as measurable indices (lexical density, diversity, and sophistication), (b) use POS tagging to automatically identify and categorize lexical items, and (c) apply dependency parsing to incorporate syntactic context into the analysis of lexical complexity. The study analyzes a corpus of 300 L2 English essays written by intermediate and advanced learners. Results show that advanced learners use a higher proportion of low-frequency “sophisticated” words and exhibit greater lexical diversity than intermediate learners, although lexical density (the ratio of content words to all words) remains comparable across proficiency levels. Dependency-based metrics (e.g., average dependency length, noun modifier counts) provide additional insight into how learners deploy complex lexico-syntactic structures. Lexical complexity indices correlate positively with writing quality scores, with lexical sophistication and diversity emerging as significant predictors of human-rated proficiency. The study fills a methodological gap by integrating NLP tools into learner corpus research to yield a multi-dimensional profile of lexical complexity. Implications are discussed for L2 writing pedagogy, automated writing evaluation, and the development of hybrid computational models of L2 complexity. This research underscores the value of combining POS tagging and dependency parsing in corpus analyses to obtain granular, robust measures of lexical complexity in learner language.
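
The indices named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not give the exact formulas, so the definitions below are common textbook choices and should be read as assumptions. The sketch assumes Universal Dependencies-style POS tags and relations, treats nouns, verbs, adjectives, and adverbs as content words, uses a plain type-token ratio for diversity, and approximates sophistication as the share of content words missing from a hypothetical high-frequency word list (COMMON_WORDS). The Token class, the function names, and the hand-annotated sentence are all illustrative.

```python
from dataclasses import dataclass

# Assumption: content words are UD tags NOUN, VERB, ADJ, ADV.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}
# Assumption: these UD relations count as noun modifiers.
NOUN_MODIFIER_DEPS = {"amod", "nmod", "compound"}
# Hypothetical stand-in for a high-frequency word list.
COMMON_WORDS = {"student", "wrote", "essay"}


@dataclass
class Token:
    text: str
    pos: str   # coarse POS tag
    head: int  # 1-based position of the syntactic head, 0 for the root
    dep: str   # dependency relation to the head


def _words(tokens):
    """All tokens except punctuation."""
    return [t for t in tokens if t.pos != "PUNCT"]


def lexical_density(tokens):
    """Content words divided by all words."""
    words = _words(tokens)
    content = [t for t in words if t.pos in CONTENT_TAGS]
    return len(content) / len(words) if words else 0.0


def type_token_ratio(tokens):
    """Distinct word forms divided by all word tokens (case-insensitive)."""
    forms = [t.text.lower() for t in _words(tokens)]
    return len(set(forms)) / len(forms) if forms else 0.0


def lexical_sophistication(tokens, common_words=COMMON_WORDS):
    """Share of content words that are absent from the common-word list."""
    content = [t.text.lower() for t in tokens if t.pos in CONTENT_TAGS]
    rare = [w for w in content if w not in common_words]
    return len(rare) / len(content) if content else 0.0


def mean_dependency_length(tokens):
    """Average linear distance between each word and its head (root skipped)."""
    distances = [
        abs(position - t.head)
        for position, t in enumerate(tokens, start=1)
        if t.head != 0 and t.pos != "PUNCT"
    ]
    return sum(distances) / len(distances) if distances else 0.0


def noun_modifier_count(tokens):
    """Number of tokens attached by a noun-modifier relation."""
    return sum(1 for t in tokens if t.dep in NOUN_MODIFIER_DEPS)


# Hand-annotated example: "The diligent student wrote a remarkable essay."
SENTENCE = [
    Token("The", "DET", 3, "det"),
    Token("diligent", "ADJ", 3, "amod"),
    Token("student", "NOUN", 4, "nsubj"),
    Token("wrote", "VERB", 0, "root"),
    Token("a", "DET", 7, "det"),
    Token("remarkable", "ADJ", 7, "amod"),
    Token("essay", "NOUN", 4, "obj"),
    Token(".", "PUNCT", 4, "punct"),
]

if __name__ == "__main__":
    print("lexical density:", lexical_density(SENTENCE))
    print("type-token ratio:", type_token_ratio(SENTENCE))
    print("sophistication:", lexical_sophistication(SENTENCE))
    print("mean dependency length:", mean_dependency_length(SENTENCE))
    print("noun modifiers:", noun_modifier_count(SENTENCE))
```

In practice the tags and heads would come from an automatic tagger and parser rather than hand annotation, and a raw type-token ratio is sensitive to text length, so a length-corrected diversity measure may be preferable when essays differ in size.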

Published

2025-11-26

How to Cite

ANALYZING LEXICAL COMPLEXITY IN LEARNER CORPORA: A CORPUS-DRIVEN APPROACH USING PART-OF-SPEECH TAGGING AND DEPENDENCY PARSING. (2025). Contemporary Journal of Social Science Review, 3(4), 1143-1170. https://doi.org/10.63878/cjssr.v3i4.1556