Optimization of metadata volume in modern information systems: methods, tools and algorithmic approaches
DOI: https://doi.org/10.17721/AIT.2025.1.01
Keywords: metadata, optimization, management, information, dynamism, adaptability, relevance, performance, quality, algorithm, compression
Abstract
Background. Metadata volume optimization seeks a balance between descriptive sufficiency and system efficiency. Excessive metadata can overload systems, while insufficient metadata complicates data access. As data volumes grow, metadata becomes harder to manage, because creating, storing and processing it requires significant resources. Optimizing metadata volume has therefore become an important task for organizations that want to manage information effectively. The main challenges associated with metadata volume are redundancy, insufficiency, duplication, data dynamics and non-compliance with standards.
Methods. The paper considers the theoretical foundations of metadata volume optimization, practical methods, tools (Collibra, Apache Atlas, Talend Metadata Manager, AI algorithms) and the benefits of optimization (cost reduction, improved productivity, improved data quality, flexibility and scalability, improved analytics).
Results. The paper proposes a comprehensive metadata optimization strategy adapted to the IT environment. It shows that a systematic approach combining analysis, standardization, automation and integration of the latest technologies makes it possible to significantly reduce costs and improve data management. An algorithm for optimizing metadata volume is presented that can be adapted to various application areas, such as databases, content management systems or big data. The algorithm comprises: assessment of metadata usefulness, using metric normalization to unify evaluation scales and score each metadata element; metadata selection (filtering, clustering); metadata compression; automatic optimization using machine learning models and dynamic tuning; and verification and adaptation. The algorithm can be expanded or modified depending on the specifics of the task.
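The first two steps named above (usefulness assessment with metric normalization, then selection by filtering) can be illustrated with a minimal Python sketch. The abstract does not specify which metrics, weights or thresholds the paper uses, so everything concrete here is an assumption made for illustration: the three metrics (`access_count`, `fill_rate`, `size_bytes`), the weights, the threshold of 0.4, and the choice of min-max normalization as the way to unify the scales.

```python
from dataclasses import dataclass


@dataclass
class MetadataField:
    """One metadata element with hypothetical raw metrics (not from the paper)."""
    name: str
    access_count: int   # assumed metric: how often the field is read
    fill_rate: float    # assumed metric: share of records where the field is set
    size_bytes: int     # assumed metric: average storage cost per record


def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def usefulness_scores(fields, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of normalized metrics; storage cost counts against a field."""
    access = min_max([f.access_count for f in fields])
    fill = min_max([f.fill_rate for f in fields])
    cost = min_max([f.size_bytes for f in fields])
    w_access, w_fill, w_cost = weights
    return {
        f.name: w_access * a + w_fill * r + w_cost * (1.0 - c)
        for f, a, r, c in zip(fields, access, fill, cost)
    }


def select_fields(fields, threshold=0.4):
    """Filtering step: keep the names of fields whose score reaches the threshold."""
    scores = usefulness_scores(fields)
    return [f.name for f in fields if scores[f.name] >= threshold]


fields = [
    MetadataField("title", access_count=900, fill_rate=1.0, size_bytes=40),
    MetadataField("legacy_code", access_count=5, fill_rate=0.1, size_bytes=200),
    MetadataField("created_at", access_count=500, fill_rate=1.0, size_bytes=8),
]
kept = select_fields(fields)
```

Normalizing each metric before weighting keeps a large-scale metric such as an access count from dominating a ratio such as a fill rate. In practice the weights and threshold would have to be tuned for the application area.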
Conclusions. The proposed algorithm takes into account metadata usefulness assessment, using metric normalization to unify evaluation scales and determine the usefulness of each metadata element; metadata selection (filtering, clustering); metadata compression; automatic optimization using machine learning models and dynamic tuning; and testing and adaptation. The algorithm can be expanded or modified depending on the specifics of the task.
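The abstract does not say which compression technique the metadata compression step uses. As one possible illustration, the sketch below combines dictionary encoding of repeated values with general-purpose compression from Python's standard `zlib` module. The function names (`dictionary_encode`, `pack`, `unpack`) and the JSON layout are hypothetical, and the sketch assumes that all metadata values are strings.

```python
import json
import zlib


def dictionary_encode(records):
    """Replace each string value with an integer id into a shared value table."""
    table, index, encoded = [], {}, []
    for record in records:
        row = {}
        for key, value in record.items():
            if value not in index:
                index[value] = len(table)
                table.append(value)
            row[key] = index[value]
        encoded.append(row)
    return table, encoded


def dictionary_decode(table, encoded):
    """Inverse of dictionary_encode: look every id up in the value table."""
    return [{key: table[i] for key, i in row.items()} for row in encoded]


def pack(records):
    """Dictionary-encode, serialize as compact JSON, then compress with zlib."""
    table, encoded = dictionary_encode(records)
    payload = json.dumps({"table": table, "rows": encoded}, separators=(",", ":"))
    return zlib.compress(payload.encode("utf-8"))


def unpack(blob):
    """Decompress and rebuild the original list of metadata records."""
    data = json.loads(zlib.decompress(blob).decode("utf-8"))
    return dictionary_decode(data["table"], data["rows"])


# Example: many records that share the same few values.
records = [
    {"format": "application/pdf", "language": "en", "owner": "library"}
    for _ in range(200)
]
blob = pack(records)
restored = unpack(blob)
```

This approach is lossless, so it reduces storage volume without affecting the sufficiency of the description. It helps most when many records repeat the same few values, which is the duplication case the abstract lists among the challenges.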
Copyright (c) 2025 Advanced Information Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This work is licensed under a Creative Commons Attribution 4.0 International License