Computational comparative analysis of global water legislation: an NLP and LLM-based framework for cross-jurisdictional policy assessment

Publisher

Nazarbayev University School of Engineering and Digital Sciences

Abstract

The research outlined in this dissertation provides an approach to analyzing international water legislation using a computational pipeline that processes water legislation from 165 countries, written in over 35 languages and represented by over 10 writing systems. The pipeline comprised seven steps: extracting the text from documents, translating the extracted text into English, evaluating the quality of those translations against multiple metrics, using a large language model to extract legal information from the translated text, calculating the similarities between pieces of legislation using embedded representations of the text, and finally clustering similar pieces of legislation to identify patterns of similarity among them. The pipeline shows how automated methods can extend the existing manual comparative tradition in water law research, allowing researchers to analyze volumes of data that would be impossible to compare manually.
The main findings were: (1) translation quality was sufficient to allow meaningful comparison across the majority of the sample (the mean COMET reference-free quality estimation score was 0.83); however, a phenomenon referred to as “contextual flattening” was observed, in which text from low-resource languages was reduced to a flat context that did not reflect the linguistic complexity of the original language; (2) the large language model-based extraction pipeline was able to extract all relevant information regarding three dimensions of water law policy (groundwater regulation, river basin management, and the polluter-pays principle) with 100% compliance with the schema; (3) cluster analysis revealed five distinct typologies of water law that corresponded to some extent with traditional classifications of legal families but also indicated cross-traditional convergence in basin-based governance practices; and (4) the polluter-pays principle was the most frequently used implementation mechanism, although it was never explicitly mentioned in any of the examined country profiles. The methodology presented in this dissertation will serve as the basis for future research involving the use of computational comparative law in areas outside of the water sector.

Citation

Alikhanov, A. (2026). Computational comparative analysis of global water legislation: An NLP and LLM-based framework for cross-jurisdictional policy assessment. Nazarbayev University School of Engineering and Digital Sciences.

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 United States