TITLE

Dataset Generation for Korean Urban Parks Analysis with Large Language Models


CONFERENCE

The 33rd ACM International Conference on Information and Knowledge Management 


ABSTRACT

Understanding how urban parks are utilized and perceived by the public is crucial for effective urban planning and management. This study introduces a novel dataset derived from Instagram, using 42,187 images tagged with #Seoul and #Park hashtags from 2017 to 2023. These images were filtered using InternLM-XComposer2, a Multimodal Large Language Model (MLLM), to confirm they depicted park scenes. GPT-4 then annotated the filtered images, resulting in 29,866 valid image annotations of physical elements, human activities, animals, and emotions. The dataset is publicly available at https://huggingface.co/datasets/RedBall/seoul-urbanpark-analysis-by-llm.


KEYWORDS 

Datasets, Urban park, Large language models, Image annotation


CITATION

Kim, H., Kang, M., Choi, H., & Cheong, Y. G. (2024, October). Dataset Generation for Korean Urban Parks Analysis with Large Language Models. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (pp. 5375-5379).