Computational comparative analysis of global water legislation: an NLP and LLM-based framework for cross-jurisdictional policy assessment

dc.contributor.advisor: Siamac, Fazli
dc.contributor.author: Alikhanov, Adilkhan
dc.date.accessioned: 2026-05-26T12:27:13Z
dc.date.issued: 2026-05-08
dc.description.abstract: This dissertation presents an approach to analyzing international water legislation with a computational pipeline that processes water legislation from 165 countries, written in over 35 languages and represented by more than 10 writing systems. The seven-step pipeline includes extracting the text from documents, translating the extracted text into English, evaluating the quality of the translations against multiple metrics, using a large language model to extract legal information from the translated text, calculating the similarity between pieces of legislation from embedded representations of the text, and finally clustering similar pieces of legislation to identify patterns among them. The pipeline shows how automated methods may extend the existing manual comparative tradition in water law research, allowing researchers to analyze volumes of data that would be impossible to compare manually.
Important findings from this project were: (1) translation quality was sufficient for meaningful comparison in the majority of the sample set (the mean COMET reference-free quality estimation score was 0.83); however, a phenomenon referred to as “contextual flattening” was observed, in which text in low-resource languages was reduced to a flat context that did not reflect the linguistic complexity of the original language; (2) the large language model-based extraction pipeline extracted all relevant information on three dimensions of water law policy (groundwater regulation, river basin management, and the polluter-pays principle) with 100% compliance with the schema; (3) cluster analysis revealed five distinct typologies of water law that corresponded to some extent with traditional classifications of legal families but also indicated cross-traditional convergence in basin-based governance practices; and (4) the polluter-pays principle was the most frequently used mechanism of implementation, although it was never explicitly mentioned in any of the examined country profiles. The methodology presented in this dissertation will serve as a basis for future research involving the use of computational comparative law in areas outside the water sector.
dc.identifier.citation: Alikhanov, A. (2026). Computational comparative analysis of global water legislation: An NLP and LLM-based framework for cross-jurisdictional policy assessment. Nazarbayev University School of Engineering and Digital Sciences.
dc.identifier.uri: https://nur.nu.edu.kz/handle/123456789/18745
dc.language.iso: en
dc.publisher: Nazarbayev University School of Engineering and Digital Sciences
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/us/
dc.subject: computational comparative law
dc.subject: water legislation
dc.subject: natural language processing
dc.subject: large language models
dc.subject: machine translation evaluation
dc.subject: hierarchical clustering
dc.subject: text embeddings
dc.subject: policy analysis
dc.title: Computational comparative analysis of global water legislation: an NLP and LLM-based framework for cross-jurisdictional policy assessment
dc.type: Master's thesis

Files

Original bundle

Name: Adilkhan_Alikhanov_MSc_Thesis.pdf
Size: 1.44 MB
Format: Adobe Portable Document Format
Description: Master's thesis