Posted by Admin: System Admin
Gender information is very important for recommendation systems on online shopping websites. However, gender data often suffer from missing and incorrect labels because consumers are unwilling to actively disclose personal information, so gender estimation results cannot meet the needs of the product recommendation system. To discover customers' gender, we explore their online shopping behavior, especially the items viewed in a shopping session, using the dataset provided by Vietnam FPT Group. The dataset is highly imbalanced: the number of female samples is three times the number of male samples. To address the imbalance, we cluster the female samples into three subsets and then train a two-layer classifier model to estimate customers' gender. Experimental results demonstrate that the proposed method achieves a combined accuracy of 78% on average and takes less than 6 seconds on average. As a data mining model for gender prediction, our approach has a lightweight network structure and low time consumption.
Machine learning is an important component of the growing field of data science. Through the use of statistical methods, different types of algorithms are trained to make classifications or predictions and to uncover key insights in this project. These insights subsequently drive decision making within applications and businesses, ideally impacting key growth metrics. Machine learning algorithms build a model from this project's data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. They are used on a wide variety of datasets where it is difficult or infeasible to develop conventional algorithms for the needed tasks.
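The imbalance handling described in the abstract (clustering the female samples into three subsets before training) can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny k-means routine, the function names `kmeans` and `balanced_subsets`, the 2-D toy feature vectors, and the choice to pair every female cluster with all male samples are assumptions made only for illustration.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means on lists of floats; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def balanced_subsets(majority, minority, k=3, seed=0):
    """Split the majority class into k clusters and pair each with the minority class.

    Assumption: label 1 marks the majority (female) class, label 0 the minority (male).
    """
    labels = kmeans(majority, k, seed=seed)
    subsets = []
    for c in range(k):
        cluster = [p for p, lab in zip(majority, labels) if lab == c]
        X = cluster + minority
        y = [1] * len(cluster) + [0] * len(minority)
        subsets.append((X, y))
    return subsets


# Hypothetical toy data: six "female" vectors, two "male" vectors.
female = [[0.1, 0.2], [0.0, 0.1], [5.0, 5.1], [5.2, 4.9], [9.8, 0.1], [10.0, 0.3]]
male = [[3.0, 3.0], [3.2, 2.8]]

subsets = balanced_subsets(female, male)
for X, y in subsets:
    print(len(X), "samples:", y.count(1), "female,", y.count(0), "male")
```

Each resulting subset could then be used to train one base classifier. Note that k-means does not guarantee equally sized clusters, so the balance obtained this way is only approximate.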
Chen et al. [11] analyzed the moderating effect of gender on customers' shopping behavior based on a benefit-risk paradigm model and found that gender has a significant impact on online shopping willingness. Sohab et al. [12] studied the moderating effect of consumer cognitive innovation on how iTrust (interpersonal trust) influences online purchase intention for new products, and found that gender information is helpful for the product display design of online websites. Lin et al. [13] studied gender differences in customers' online shopping psychology and behavior and showed that gender information can improve online shopping websites and their benefits. Because gender information is essential to product recommendation performance, some researchers have proposed personalized recommendation algorithms or techniques based on gender information to improve online shopping recommendation systems, for example Liu et al. [14], Karthik and Ganapathy [15], Hammou et al. [16], Wu and Yu [17], and Liu and Wei [18]. These algorithms and techniques give online shopping companies many references for improving their recommendation systems in a timely manner. Although little research addresses mining customers' gender when the gender data in the online shopping system are unreliable, many models that mine customers' browsing and purchase logs have been proposed to estimate gender. Zhou et al. [36] used the RFMT model to derive 7 characteristic customer clusters from a large dataset retrieved from a global retailer's website, and estimated customers' gender and personalized product preferences through cluster analysis. Wan et al. [37] modeled large-scale online shopping transaction logs to mine consumers' personalized preferences for gender estimation.
However, these approaches mainly rely on analysis of users' click behavior and ignore both the personality diversity within female and male customers and the sample imbalance issue, so they may not be reliable or accurate.
Disadvantages
- The system does not implement CLUSTERING BASED ON PERSONALITY DIVERSITY.
- The system does not implement an SVM Classifier, which is a more accurate and exact data measuring method.
- The proposed system discovers the correlation between personality diversity and gender in online shopping behavior, and explains the characteristics of customer shopping behavior in a specific web browsing log dataset. These features are assembled into candidate feature combinations for gender classification.
- The proposed system uses personality diversity and data visualization to solve the sample imbalance problem in FPT Group's online shopping behavior dataset. Based on the balanced sample set, the optimal classifier is selected for each layer of the designed gender classification network to improve the performance of gender classification.
- Experiments on a large-scale dataset provided by FPT Group achieve an estimation accuracy of 78% in less than 6 seconds. The results demonstrate that the proposed gender classification model is lightweight and efficient.
Advantages
- The proposed system implements A TWO-LAYER GENDER CLASSIFICATION MODEL, which is more effective and accurate.
- It is recommended that future researchers adopt the effective technique used in the proposed system, FEATURE EXTRACTION AND CANDIDATE FEATURE COMBINATIONS.
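To make the idea of feature extraction and candidate feature combinations concrete, here is a small sketch. The session layout (a dict with `start`, `end`, and `items`), the slash-separated item identifiers, and the three features are hypothetical and chosen only for illustration; the source does not specify which features the proposed system extracts.

```python
from itertools import combinations


def extract_features(session):
    """Derive simple per-session features from the items viewed.

    Assumption: each item id looks like "A1/B3/C7", where the first
    segment is a top-level category. This format is hypothetical.
    """
    items = session["items"]
    return {
        "n_items": len(items),
        "n_categories": len({item.split("/")[0] for item in items}),
        "duration_s": session["end"] - session["start"],
    }


def candidate_combinations(feature_names, min_size=1):
    """Enumerate every subset of features with at least min_size members."""
    names = sorted(feature_names)
    return [
        combo
        for r in range(min_size, len(names) + 1)
        for combo in combinations(names, r)
    ]


# Hypothetical session: three viewed items over 95 seconds.
session = {"start": 0, "end": 95, "items": ["A1/B3/C7", "A1/B4/C9", "A2/B8/C2"]}

features = extract_features(session)
combos = candidate_combinations(features)
for combo in combos:
    print(combo)
```

Each candidate combination would then be scored with a classifier on held-out data, and the best-scoring combination kept. Exhaustive enumeration grows exponentially with the number of features, so it is only practical for a small feature set.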