TITLE
Dataset Generation for Korean Urban Parks Analysis with Large Language Models
CONFERENCE
The 33rd ACM International Conference on Information and Knowledge Management
ABSTRACT
Understanding how urban parks are utilized and perceived by the public is crucial for effective urban planning and management. This
study introduces a novel dataset derived from 42,187 Instagram images tagged with the hashtags #Seoul and #Park from 2017
to 2023. These images were filtered using InternLM-XComposer2, a Multimodal Large Language Model (MLLM), to confirm that they
depicted park scenes. GPT-4 then annotated the filtered images, resulting in 29,866 valid image annotations of physical elements,
human activities, animals, and emotions. The dataset is publicly available at
https://huggingface.co/datasets/RedBall/seoul-urbanpark-analysis-by-llm.
KEYWORDS
Datasets, Urban park, Large language models, Image annotation
CITATION
Kim, H., Kang, M., Choi, H., & Cheong, Y. G. (2024, October). Dataset Generation for Korean Urban Parks Analysis with Large Language Models. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (pp. 5375-5379).