
Decision Tree Algorithm Moderately Coupled to PostgreSQL DBMS

Abstract

Applying machine learning to data management gives organizations an opportunity to move toward an information-based leadership model that supports the success of each initiative. However, adopting these technologies usually entails high economic and administrative costs, which limits their implementation in micro, small, and medium-sized enterprises (MSMEs). This paper proposes integrating supervised machine learning techniques into the PostgreSQL DBMS in a moderately coupled architecture, giving it the capability to discover knowledge in databases. Classification and regression algorithms were coupled by developing extensions in one of the procedural languages supported by PostgreSQL. Initially, the C4.5 decision tree classification algorithm was implemented in the PL/pgSQL procedural language. The main advantage of this strategy is that it takes into account the scalability, administration, and data manipulation capabilities of the DBMS. Because PostgreSQL is open source, organizations such as MSMEs will have a free tool for predictive analysis that can improve their decision-making by anticipating future consumer behavior and acting rationally on those findings.
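As a rough illustration of the split criterion C4.5 is known for, the following Python sketch computes the gain ratio of a categorical attribute. It is not the PL/pgSQL implementation described in the paper; the function names, the row format (a list of dictionaries), and the sample data are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of `attribute` with respect to `target`.

    `rows` is a list of dicts (a hypothetical stand-in for a table).
    Returns 0.0 when the attribute takes a single value (no split).
    """
    total = len(rows)
    # Group the target labels by each distinct value of the attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])

    # Information gain: entropy before the split minus weighted entropy after.
    labels = [row[target] for row in rows]
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder

    # Split information penalizes attributes with many distinct values.
    split_info = -sum(
        len(p) / total * log2(len(p) / total) for p in partitions.values()
    )
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical sample data, invented for illustration only.
sample_rows = [
    {"segment": "a", "region": "north", "buys": "yes"},
    {"segment": "a", "region": "north", "buys": "yes"},
    {"segment": "b", "region": "north", "buys": "no"},
    {"segment": "b", "region": "north", "buys": "no"},
]

best = max(["segment", "region"], key=lambda a: gain_ratio(sample_rows, a, "buys"))
print(best)
```

In a moderately coupled setting, the per-value class counts used here would presumably come from the DBMS (for example via grouped queries) rather than from in-memory lists, but that detail is an assumption and not something the abstract specifies.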

Keywords

classification techniques, C4.5 algorithm, moderately coupled architecture, PostgreSQL DBMS


Author Biography

Ricardo Timarán-Pereira

PhD in Engineering with an emphasis in Computer Science, Master of Science in Engineering, Specialist in Educational Multimedia, Systems and Computing Engineer. Full Professor in the Department of Systems of the Faculty of Engineering at Universidad de Nariño. Director of the GRIAS research group.

 


