Adaptive Model of Classification of Professions in Vocational Guidance Systems

Vocational guidance is part of psychosocial development and is understood as a method that helps determine the most appropriate profession according to the student's aptitudes and abilities. Vocational guidance processes are dynamic and focus on educating and supporting decision-making in the choice of a professional learning pathway throughout the student's life, which benefits society in the long run. Most current solutions, both theoretical and applied, come from Europe and North America and are aimed mainly at adults; when used in the Colombian context, their classification of professions is neither accurate nor precise. In addition, secondary education institutions have various educational projects and evaluation systems. At this level, students' vocational choices change over time, which requires taking into account the specific characteristics of the context as well as the vocational determinants in the student profile. The objective of this article is to describe the adaptive model of occupational classification integrated into the Intelligent Web Platform used in educational institutions in the Department of Cauca. Applying the CRISP-DM methodology identified the Naive Bayes and Deep Learning algorithms as those with the best performance in the classification of professions.


I. INTRODUCTION
Vocational orientation is a process that allows a person to find the profession that will bring the greatest satisfaction and benefit; the choice of career is one of the most important decisions in life [1]. There are several determinants of a student's vocation, including (a) personality, (b) interests, (c) self-concept, (d) cultural identity, (e) social connections, (f) role models, (g) economic resources, and (h) parents' educational level, which, when recognized, become fundamental elements for guiding the student in their learning and contributing to an appropriate career choice [2].
The process of vocational and career guidance has evolved worldwide according to the economic schemes specific to each era [3]. Many solutions have been proposed globally, with the main objective of facilitating career choice for students [4]. [5] highlights the importance of vocational guidance models that link different instruments for collecting information, addressing dimensions such as (i) self-knowledge, (ii) the world of work, and (iii) academic offerings.
Additionally, it is considered essential to link artificial intelligence (AI) techniques to classify professions according to the student's profile and to provide adequate feedback on career decisions [6].
In [7], a knowledge base is designed to store information about careers, experiences, and skills, and [8] uses fuzzy logic to select the most suitable professions according to the profile. Other solutions [7], [9] use recommender systems that incorporate AI techniques to offer careers based on the students' interests and skills.
In Colombia, the Ministry of National Education has published methodological orientation guides [10] as an "informative" process or guidance for students facing a conflictive situation regarding their professional choice; however, no results on the effectiveness of this strategy are reported [11].
Other works have focused on neural networks with multiple intelligences as inputs and professions as network outputs [12], [13]. In addition to multiple intelligences, [13] uses differential aptitude and professional interest tests. [14] presents an expert system whose knowledge base uses interests, aptitudes, and skills; however, no results on the classification of professions are reported. In [8], three algorithms are used: 1. decision trees, 2. K-nearest neighbors (KNN), and 3. support vector machine (SVM); SVM showed the best result in classifying science, technology, engineering, and mathematics (STEM) professions, with a classification percentage of 72% on a dataset of 142 records. In [15], the authors build a dataset from academic aspects and use SVM and artificial neural networks (ANN) to classify professions.
They achieved 84% accuracy in the classification, with the SVM algorithm offering the best results. Despite the many solutions developed to support vocational guidance processes, limitations and challenges remain that new research should address. These include: 1. scarce implementation of vocational guidance models in real contexts; 2. the absence of vocational guidance models that consider the dynamic character of the student's profile; 3. a high level of uncertainty in the vocational decision due to low accuracy in the prediction of professions; and 4. a lack of vocational guidance solutions for the high school context. Therefore, given the importance of guaranteeing effective orientation for students in the transition from high school to higher education, counseling mechanisms and effective tools are needed that recognize competencies and skills, in order to provide adequate guidance that contributes to a better career choice and, ultimately, to the personal and professional development of students.
Thus, [16] presents in general terms the development of an intelligent web platform to support vocational guidance in the context of secondary education. This article describes the adaptive model of profession classification developed and implemented for use on that intelligent web platform to support vocational guidance processes in some educational institutions of the Department of Cauca.
The structure of this article is as follows: Section I presents related work, highlighting mainly its strengths and weaknesses. Section II describes the methodological process for developing the proposed solution. Section III describes the results obtained following the stages of the selected methodology.
Finally, conclusions and future work are presented.
In recent years, several computational tools have been developed to facilitate the vocational guidance process. The mechanisms described in [17], [18], [19], and [20] use playful elements to motivate students to carry out the vocational orientation process autonomously. However, the results delivered to the student are very broad, which does not help reduce their uncertainty about their vocational decision.
In [21], speech and text recognition were used to facilitate interaction with the user of the vocational guidance system. However, the evaluation shows an accuracy of only 37% in predicting professions. [22] uses the Internet of Things (IoT) to propose the design of a guidance system in enterprises which, using radio frequency, allows students to see how well their skills match a specific job. This work does not report the accuracy achieved during the guidance process.
In [6], a chatbot is used for virtual vocational guidance, supported by Dialogflow technology through a Google Cloud application programming interface. It uses a dataset with attributes for activities, skills, occupations, and personality, and gives feedback to the student through Facebook chat. Similarly, in [23], Facebook and Facebook's Wit.ai application interface were used to convert voice responses and store them in the dataset. However, neither proposal demonstrates the effectiveness of the automated guidance process.
Other studies focus on the use of AI to support vocational guidance processes.
In [24], fuzzy logic was used; however, the authors focus mainly on IT career classification. In [7], a vocational guidance prototype is presented that uses cloud computing to integrate educational institutions and companies. In [25], ANNs are used to support career decision-making, and in [14], a recurrent neural network with long short-term memory (LSTM) blocks predicts new jobs for adults.
Ontologies and expert systems are used in [26] and [27], which describe an expert system based on fuzzy logic, as does [28]. These three works report low classification percentages, with a maximum accuracy of 70% [29], [30]. In [9], [26], and [27], statistical and data mining models are used, with fixed architectures tailored to specific characteristics of the vocational guidance process; [26] and [27] describe the use of Bayesian networks (Naïve Bayes, NB), the One Rule (OneR) algorithm, the JRip rule learner (repeated incremental pruning), and a case-based reasoning (CBR) classifier. In [9], a mobile application that uses NB is proposed and validated in the middle school context. [31] proposes a profile evaluation related to college students' career choices, using knowledge discovery in databases (KDD) to identify patterns in large amounts of data, and evaluates the algorithms' performance on datasets with different numbers of records.
Finally, in [6], multiclass neural networks (MNN) were the best classification algorithm, with an accuracy of 90% on a dataset of four attributes: activity, occupations, skills, and personality.
It is worth mentioning that online vocational guidance is a complement to face-to-face guidance [32]. Online vocational guidance communities help students validate the results of face-to-face guidance, with the advantages of easy access and the diversity of knowledge received from other people. In [33], a web interface for vocational guidance is presented with different guidance alternatives and age ranges; however, the tool's accuracy in career classification is not reported.
In [34], a web platform is presented that uses games to motivate students to identify their skills and profiles through challenges that allow them to know themselves (weaknesses, strengths, and fears), project their career aspirations, and share and compare their results. Like the playful web platforms [15], [17], [19], this platform does not report results on the accuracy of its users' orientation.
A general review of several vocational guidance web platforms in [35] shows that these platforms adopt features such as text-based content, videos of professionals from various areas or occupations, online tests, social networks, vacancy search agents, and job recommendations. However, most of these publications lack an internal evaluation in which users can validate the platform's contribution to their vocational orientation.
[36] identifies essential elements of the vocational orientation process that are used in different research projects, mainly: 1. professional interests, 2. personality, and 3. skills. In addition to these characteristics, this research includes social, economic, academic, and family aspects. These elements are adopted in the web platform proposal. [37] suggests general characteristics that a vocational guidance website should include, grouped into quality of service and perceived quality of information. These factors were taken into account in the proposed platform and validated through a usability survey conducted with a sample of students.

II. METHODOLOGY
The aim is to describe the aspects of the adaptive model for occupational classification presented in [16] and to identify the best algorithms to include in the intelligent adaptive web platform for vocational guidance. The associated methodological processes are described below. The data mining methodology Cross Industry Standard Process for Data Mining (CRISP-DM) was used [38], which comprises the following stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Holland's validated theory of career environments, RIASEC [39], was also used to identify students' career interests and to formalize the gathering and analysis of information related to students' vocational decisions.
The CRISP-DM methodology, together with the RIASEC theory, was used to collect information on students' professional profiles through a web interface and then consolidate it into a dataset of 628 records, which made it possible to identify the best classification models for the particular characteristics of the student population.

A. Business Understanding
The first phase proposed in the CRISP-DM methodology, business understanding, was carried out by systematically reviewing policies, standards, theories, methodologies, and tools. This review established the RIASEC theory as the most practical and understandable option and identified desirable characteristics to include in a vocational guidance platform.

B. Understanding and Preparing the Data
Using a web survey and a dataset designed around the identified characteristics of the vocational orientation process, we proceeded to the second phase, data understanding and preparation. An initial dataset was created according to the RIASEC theory, with one variable for each professional and occupational environment in which a student can be classified: Realistic, Investigative, Artistic, Social, Enterprising, and Conventional. The students' age, gender, and professional expectations were also included. Figure 1 shows the degree of correlation of the RIASEC variables, age, and sex with the binomial classification variable (engineering profession).
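The degree of correlation in Figure 1 can be illustrated with a minimal sketch, assuming numeric attributes and a binomial class encoded as 1 (YES) and 0 (NO); these encodings, the column names (`R`, `I`, `C`, `sex`, `eng`), and the toy records are hypothetical. Against a 0/1 target, the Pearson coefficient equals the point-biserial correlation; the paper does not state which coefficient its tools reported.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; against a 0/1 target it equals the point-biserial coefficient."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (n >= 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column has no defined correlation; report 0
    return cov / (sd_x * sd_y)


def correlations_with_target(records: list[dict], features: list[str], target: str) -> dict[str, float]:
    """Correlate each feature with a binary target column (1 = YES, 0 = NO)."""
    ys = [float(r[target]) for r in records]
    return {f: pearson([float(r[f]) for r in records], ys) for f in features}


if __name__ == "__main__":
    # Toy records for illustration only (not survey data); sex coded 0/1 hypothetically.
    toy = [
        {"R": 24, "I": 21, "C": 20, "sex": 1, "eng": 1},
        {"R": 27, "I": 18, "C": 25, "sex": 1, "eng": 1},
        {"R": 9, "I": 12, "C": 8, "sex": 0, "eng": 0},
        {"R": 6, "I": 15, "C": 11, "sex": 0, "eng": 0},
    ]
    ranked = sorted(correlations_with_target(toy, ["R", "I", "C", "sex"], "eng").items(),
                    key=lambda kv: abs(kv[1]), reverse=True)
    print(ranked)
```

Ranking the coefficients by absolute value, as in the example, is one simple way to single out the attributes most associated with the class.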

C. Modeling
Initially, a Naive Bayes (NB) algorithm was evaluated to create the first model; it performed poorly in both accuracy and training time on datasets with a multi-valued classification variable. The results improved when the classification variable was changed to a binomial one. Later, in a comparison of up to five models (NB, a linear model (generalized linear model, logistic regression), fast large margin, deep learning, and decision tree), Naive Bayes again showed the highest accuracy and the lowest training time. Two tools, Weka and RapidMiner, were used in the comparison to determine the stability of the model on datasets of different characteristics and sizes, starting from a stable model and using accuracy, precision, and completeness (recall), as well as classification error and classification time, as criteria. Additional datasets were then generated to evaluate how the algorithms respond depending on the characteristics of the initial data.
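The following is a minimal sketch of a Naive Bayes classifier for numeric attributes such as the RIASEC scores and age, assuming Gaussian class-conditional likelihoods; the paper does not say which Naive Bayes variant Weka or RapidMiner applied, and the class name `GaussianNaiveBayes` is hypothetical. Training is a single pass that computes per-class priors, means, and variances, which is consistent with the short training times reported for NB.

```python
import math
import time
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Naive Bayes for numeric attributes: each attribute is modeled as a
    normal distribution per class, and attributes are assumed conditionally
    independent given the class."""

    def __init__(self, var_floor: float = 1e-6):
        self.var_floor = var_floor  # keeps variances positive for constant columns
        self.params = {}

    def fit(self, X: list[list[float]], y: list[str]) -> "GaussianNaiveBayes":
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        n, d = len(X), len(X[0])
        self.params = {}
        for label, rows in by_class.items():
            means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
            variances = [
                sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + self.var_floor
                for j in range(d)
            ]
            self.params[label] = (math.log(len(rows) / n), means, variances)
        return self

    def _log_joint(self, row: list[float], label: str) -> float:
        # log P(class) + sum_j log N(x_j | mean_j, var_j): the log posterior
        # up to the normalizer log P(x), which is shared by all classes.
        log_prior, means, variances = self.params[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )

    def predict(self, X: list[list[float]]) -> list[str]:
        return [max(self.params, key=lambda c: self._log_joint(row, c)) for row in X]


if __name__ == "__main__":
    # Toy rows: [Realistic, Investigative] scores with binomial labels (illustrative).
    X = [[25, 20], [24, 22], [26, 21], [5, 4], [6, 3], [4, 5]]
    y = ["YES", "YES", "YES", "NO", "NO", "NO"]
    start = time.perf_counter()
    model = GaussianNaiveBayes().fit(X, y)
    print(f"training time: {time.perf_counter() - start:.6f} s")
    print(model.predict([[25, 21], [5, 4]]))
```

Working in log space avoids numerical underflow when many per-attribute likelihoods are multiplied together.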

D. Evaluation
In the evaluation phase, the first dataset was analyzed using NB, Linear Regression, Logistic Regression, and Deep Learning models. The results are described in Table 1. In this initial evaluation, the classification variable was "the profession", with ten different values reflecting the students' expectations regarding their professional choice. Because of the low accuracy obtained, this multi-valued variable was changed to a binomial (YES/NO) variable specifically for the area of engineering, architecture, and urban planning professions, and the NB, Generalized Linear Model, Logistic Regression, Deep Learning, and Decision Tree algorithms were applied. The results are reported in Table 2.
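The switch from a ten-valued profession label to a YES/NO label, together with the criteria used to compare the models, can be sketched as below. The set `ENGINEERING_AREA` and its values are hypothetical placeholders, since the ten original profession values are not listed; "completeness" is computed here as recall for the positive (YES) class.

```python
# Hypothetical labels for the engineering, architecture, and urban planning area;
# the paper does not list its ten original profession values.
ENGINEERING_AREA = {"engineering", "architecture", "urban planning"}


def to_binomial(profession: str) -> str:
    """Collapse the multi-valued profession label into YES/NO for the target area."""
    return "YES" if profession.strip().lower() in ENGINEERING_AREA else "NO"


def binary_metrics(y_true: list[str], y_pred: list[str], positive: str = "YES") -> dict[str, float]:
    """Accuracy, precision, recall ("completeness"), and classification error."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {
        "accuracy": accuracy,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "error": 1.0 - accuracy,
    }


if __name__ == "__main__":
    truth = [to_binomial(p) for p in ["Engineering", "Medicine", "Architecture", "Law"]]
    print(truth, binary_metrics(truth, ["YES", "NO", "NO", "NO"]))
```

With a binomial class, precision and recall are well defined for the single positive category, which is one reason the binary formulation is easier to evaluate than a ten-way label.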
With this new dataset, the results were analyzed in terms of accuracy, precision, and completeness; NB showed a shorter training time and acceptable performance, which allowed professions to be classified based on the students' professional interests (Table 2). Subsequently, given the data-independence assumption that the NB model presupposes, it was decided to generate a dataset with a larger number of unique records (38,399) over the variables used in the classification process. These data were generated within the score range of each environment (from 0 to 30) and the age range of the surveyed students in the initial dataset (12 to 23 years). The results described in Table 3 show that accuracy, precision, and completeness decreased. To improve these results, a new dataset was created based on the more stable model in Table 2: in addition to the restrictions on data ranges, it followed the production model and the influence of each variable in the dataset on the classification variable (the profession), shown in Table 4. The strongest correlations were taken into account, as well as the continuity of the highest values across groups of three contiguous environments of the RIASEC hexagon [40]. With these considerations, a new dataset of 1,048,561 records was created; the results are shown in Table 5.
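A minimal sketch of the constrained generation described above, assuming uniform random draws: six RIASEC scores in [0, 30], age in [12, 23], duplicates rejected, and, optionally, only profiles whose three highest scores sit on adjacent vertices of Holland's hexagon (our reading of "three contiguous environments"). It does not assign class labels, because the paper does not detail how the production model labeled generated records; `generate_records` and `top_three_contiguous` are hypothetical names.

```python
import random
from typing import Optional

# Holland's hexagon order: neighboring letters are adjacent environments.
RIASEC = ("R", "I", "A", "S", "E", "C")


def top_three_contiguous(scores: dict) -> bool:
    """True if the three highest-scoring environments are adjacent on the hexagon.

    Ties are broken by hexagon order, since sorted() is stable even with reverse=True.
    """
    top = sorted(RIASEC, key=lambda k: scores[k], reverse=True)[:3]
    positions = {RIASEC.index(k) for k in top}
    # On a 6-cycle, three contiguous vertices are {s, s+1, s+2} (mod 6) for some s.
    return any(positions == {s, (s + 1) % 6, (s + 2) % 6} for s in range(6))


def generate_records(n: int, seed: int = 0, require_contiguous: bool = False,
                     max_attempts: Optional[int] = None) -> list[dict]:
    """Draw n unique synthetic profiles within the observed ranges."""
    rng = random.Random(seed)
    limit = max_attempts if max_attempts is not None else 100 * n
    seen: set = set()
    records: list[dict] = []
    for _ in range(limit):
        if len(records) == n:
            break
        scores = {k: rng.randint(0, 30) for k in RIASEC}  # score range per environment
        age = rng.randint(12, 23)                           # age range of surveyed students
        if require_contiguous and not top_three_contiguous(scores):
            continue
        key = (tuple(scores[k] for k in RIASEC), age)
        if key in seen:
            continue  # reject duplicates so every record is unique
        seen.add(key)
        records.append({**scores, "age": age})
    if len(records) < n:
        raise RuntimeError("not enough unique records within max_attempts")
    return records
```

Because the search space (31^6 score combinations times 12 ages) is far larger than a million records, rejection of duplicates rarely triggers; the contiguity filter keeps roughly six of the twenty possible top-three triples.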
Re-evaluated on this dataset of 1,048,561 records built according to the model's characteristics, the model's classification accuracy increased to 83%. To determine its stability, further attributes (personality and socioeconomic) were added to the dataset and their impact on classification was verified. The values in Table 5 show that Naive Bayes remains a stable algorithm with a shorter training time. They also suggest that, for larger volumes of data, Deep Learning is a better alternative for classifying the data of students in this specific context. Accordingly, the intelligent web platform uses the best-performing algorithm for the available number of records, allowing more efficient results during the adaptation process.

E. Deployment
In the deployment phase, the evaluated classification model was enriched by integrating two models, NB and Deep Learning, linked to the web platform. Figure 2 gives an abstract view of its operation, and Figure 3 describes its components. This configuration makes the platform intelligently adaptive: the classification model is updated automatically, taking into account the students' previous choices and the growth of the dataset, ensuring optimal levels of accuracy, precision, and completeness.
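The adaptive behavior can be sketched as a small selection policy. This is a hypothetical illustration: the switching `threshold` and the `retrain_every` interval are assumptions (the paper reports only that Deep Learning becomes preferable for larger volumes of data), and the actual training of the chosen model is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveModelSelector:
    """Hypothetical policy: Naive Bayes for smaller datasets, Deep Learning for
    larger ones, retraining whenever enough new records have accumulated."""
    threshold: int        # hypothetical record count at which Deep Learning takes over
    retrain_every: int    # hypothetical number of new records that triggers retraining
    trained_on: int = 0   # dataset size at the last (re)training
    active: str = ""      # algorithm chosen at the last (re)training

    def choose(self, n_records: int) -> str:
        return "naive_bayes" if n_records < self.threshold else "deep_learning"

    def update(self, n_records: int) -> bool:
        """Call when the dataset grows; returns True if a retraining round is due."""
        first_time = self.trained_on == 0
        if not first_time and n_records - self.trained_on < self.retrain_every:
            return False
        self.active = self.choose(n_records)
        self.trained_on = n_records
        return True
```

For example, with `threshold=100_000`, a dataset of 628 records would be served by Naive Bayes, while one of 1,048,561 records would switch to Deep Learning. Separating the policy from the models keeps the platform free to swap either algorithm without changing the update logic.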
Finally, the web application was developed taking into account features found in other vocational guidance tools, in terms of both functionality and usability, and adaptability functionalities that use the mining module and the administration module were added. In terms of functionality, the web platform allows users to register a new account by entering their data or by using their Facebook account; users can also be created directly by the system administrator. Once the profile is completed with family, demographic, and economic information, the user can access the survey module, where professional interests, personality, learning styles, multiple intelligences, and aptitudes (as determinants of vocation) are evaluated.
The mining module then performs the adaptive classification process and generates the corresponding prediction for the assessed profile. The statistics module integrated into the platform allows users and the administrator to view results and changes over time, enabling comparisons and sequential, online monitoring within the institution (Figure 4).
a. Participants: 628 students from an official secondary education institution in the city of Popayán, with the following sociodemographic characteristics: 48.7% female (n = 306) and 51.3% male (n = 322); ages between 12 and 23 years (mean 16.03 years, standard deviation 1.94); socioeconomic strata 1, 2, and 3.
b. Measures: accuracy, precision, and recall were used as criteria to compare six models and choose the best ones for classifying and predicting professions.
c. Design: the data mining methodology Cross Industry Standard Process for Data Mining (CRISP-DM) was used [38].

Figure 1 shows that the variable with the highest correlation with the choice of the engineering profession is gender, and that the RIASEC environments contributing most to this professional decision are the Conventional, Realistic, and Investigative types.

Fig. 4. Statistics by institution of subject preferences.

The adaptive model considers the characteristics of the student profile related to the determinants of vocation, including 1. sociodemographic characteristics and 2. aptitudes, as well as previous study expectations and subject preferences; Figure 4 shows the statistical results for one of the preferred subjects per institution. For the prediction of professions, the professional branches suggested by the Ministry of National Education [41] are used: 1. Engineering, architecture, urbanism and related; 2. Agronomy, veterinary and related; 3. Fine arts, health sciences; 4. Social and human sciences; 5. Economics, administration, accounting and related; 6. Mathematics and natural sciences; and 7. Education sciences, among others. The model includes two algorithms, Naive Bayes and Deep Learning, which adapt automatically according to the number of records. The model uses supervised learning with a classification variable whose categories are the professions, in order to achieve a better classification percentage.
To ensure the student's attention during the surveys, the platform integrates playful challenges: images associated with the test questions of the underlying theory are presented for selection. Based on the average time that attentive students take to answer each section of questions, the student earns a point when selecting the images within that time, or loses it if they take longer than the average. This also encourages attention and motivation, making the process more objective. Figure 7 shows the results after the classification process has been carried out. In the example, a positive prediction is shown for the branch of engineering, architecture, and related fields, with an accuracy of over 80%.
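The timing rule of the playful challenges can be sketched as follows, assuming one point gained for each image selected within the section's average time and one point lost otherwise. The function names, the per-selection comparison, and the `reward` and `penalty` values are hypothetical, since the paper does not give the exact scoring formula.

```python
from statistics import mean


def section_average(attentive_times: list[float]) -> float:
    """Average time (s) that attentive students spent on a section of questions."""
    return mean(attentive_times)


def challenge_score(selection_times: list[float], average_time: float,
                    reward: int = 1, penalty: int = 1) -> int:
    """+reward per image selected within the average time, -penalty per slower selection."""
    return sum(reward if t <= average_time else -penalty for t in selection_times)


if __name__ == "__main__":
    avg = section_average([10.0, 12.0, 14.0])       # illustrative section times
    print(avg, challenge_score([8.0, 12.0, 15.0], avg))
```

Setting `penalty=0` turns the rule into a pure reward scheme, which is one way to read "loses" as simply not earning the point.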

Fig. 8. Results of self-knowledge and career suggestion tests.

Table 1. Evaluation of algorithms with a multi-valued classification variable.

Table 2. Evaluation of algorithms with a binomial classification variable.

Table 3. Evaluation of algorithms on a dataset of randomly generated data.

Table 4. Correlation of variables with the class variable "profession" (binomial classification).

Table 5. Evaluation of algorithms on a dataset built from the production model, correlations, and three contiguous RIASEC environments.