In this section, we present the proposed decision support system for the designers’ creative processes. The system is designed to model the designer’s preferences automatically while remaining user-friendly, so that it can be easily operated by individuals without knowledge of the action planning research field. The system is composed of two interconnected components:
1. Offline component: This component performs (a) data collection from internal and external sources, (b) data storage and management in databases, and (c) data analysis processes that produce the artificial intelligence models which provide personalized recommendations to the end-users.
2. Online component: This component mainly comprises the user interface (UI). The users, who are typically fashion designers with limited technical experience, can easily set their parameters via the graphical UI, visualize their results, and provide feedback on the system’s output.
The overall system architecture is depicted in Fig. 1, whereas the major subsystems/processes are further analyzed in the following subsections.
2.1 Data Collection
Two different types of data sources are used for training, as well as for the recommendation process: internal and external data sources.
Internal Data.
Each company has its own production line, rules, and designing styles that are influenced by fashion trends. The creative team usually takes clothes from the company’s previous collections as an inspiration or starting point and adapts them to the new fashion trends. The internal data are usually organized in relational databases and can be accessed by the Data Collection subsystem.
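As a simple illustration of this access path, the sketch below queries a relational catalog of previous collections; the database file, table, and column names are hypothetical placeholders, not the actual company schema.

```python
# Illustrative read access to internal data in a relational database;
# the database file, table, and column names are hypothetical.
import sqlite3

conn = sqlite3.connect("company_catalog.db")  # stand-in for the company DB
rows = conn.execute(
    "SELECT product_id, category, season, image_path "
    "FROM previous_collections WHERE season = ?",
    ("FW2020",),  # hypothetical season code
).fetchall()
conn.close()
```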
External Data.
The most common designers’ source of new ideas is browsing the collections of other popular online stores. To this end, the system includes a web crawler, the e-shops crawler, which retrieves clothing information, i.e. clothing images accompanied by their meta-data. The online shops supported so far are Asos, Shutterstock, Zalando and s.Oliver.
Another important source of inspiration for designers is social media platforms, especially Pinterest and Instagram. To this end, a second web crawler, the social-media crawler, was implemented, which utilizes existing APIs to retrieve information from the aforementioned platforms, including clothing images, post titles, descriptions and associated tags.
The infrastructure of both crawlers is extensible, so that they can easily be adapted to other online shops or social media platforms in the future.
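To make the crawling step concrete, the following minimal sketch fetches and parses a product-listing page; the URL handling and CSS selectors are hypothetical, since each supported shop needs its own extraction rules (the social-media crawler would instead call the platforms’ APIs).

```python
# Minimal e-shop crawler sketch; the page layout and selectors below are
# hypothetical, not the actual structure of any supported shop.
import requests
from bs4 import BeautifulSoup

def crawl_listing(url: str) -> list[dict]:
    """Fetch one listing page and return clothing images with meta-data."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    products = []
    for card in soup.select("article.product"):  # hypothetical selector
        image = card.select_one("img")
        title = card.select_one(".title")
        products.append({
            "image_url": image["src"] if image else None,
            "title": title.get_text(strip=True) if title else "",
        })
    return products
```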
2.2 Data Preprocessing
This subsystem is responsible for extracting the clothing attributes from the meta-data accompanying every clothing image. Some of the attributes extracted from the available meta-data, together with examples of valid values, are presented below:
1. Product category: dress, overall, pajamas, shorts, skirts.
2. Product subcategory: jacket, coat, T-shirt, leggings.
3. Length: short, long, knee.
4. Sleeve: short, ¾ length, sleeveless.
5. Collar design: shirt collar, peter pan, mao collar.
6. Neck design: V-neck, square neck.
7. Fit: regular, slim.
For each attribute there is a dictionary, created by experienced fashion designers, that contains all the accepted values, including synonyms and abbreviations. NLP techniques are used for word-based preprocessing of all meta-data text. The attributes are extracted through a mapping process between the meta-data and the original attributes: the meta-data are searched for occurrences of the words contained in the dictionaries, and whenever a match is found, the corresponding word is assigned as a label to the respective attribute.
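The sketch below illustrates this dictionary-based matching on normalized text; the dictionary contents are invented examples, and the real system additionally handles synonyms and abbreviations through its NLP preprocessing.

```python
# Minimal sketch of dictionary-based attribute extraction; the dictionary
# entries are invented examples, not the designers' actual vocabularies.
DICTIONARIES = {
    "sleeve": {"short", "3/4 length", "sleeveless"},
    "fit": {"regular", "slim"},
    "neck design": {"v-neck", "square neck"},
}

def extract_attributes(meta_text: str) -> dict[str, str]:
    """Map free-text meta-data onto attribute labels via dictionary look-up."""
    text = meta_text.lower()  # basic word-level normalization
    attributes = {}
    for attribute, labels in DICTIONARIES.items():
        for label in labels:
            if label in text:  # successful match -> mark label for attribute
                attributes[attribute] = label
                break
    return attributes

print(extract_attributes("Slim-fit cotton shirt with V-neck and short sleeves"))
# {'sleeve': 'short', 'fit': 'slim', 'neck design': 'v-neck'}
```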
2.3 Data Annotation
The Data Annotation process complements the Data Collection and Data Preprocessing modules. It is used to enrich the extracted data with common clothing features that can be derived from images using computer vision techniques. Examples of clothing attributes that can be extracted from images include color, fabric and neck design.
Color arguably has the biggest impact on clothing, as it is related to location, occasion, season, and many other factors. Taking its importance into consideration, an intelligent computer vision component was implemented, capable of distinguishing and extracting the five most dominant colors of each clothing image. More specifically, each color of a clothing image is represented by its RGB channel values and coverage percentage, its ranking as specified by that percentage, and the general color label most relevant to the respective RGB value. The rest of the clothing attributes are extracted using deep learning techniques, with each attribute represented by a single value from a set of predefined labels.
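A common way to implement such dominant-color extraction is k-means clustering over the image pixels; the sketch below follows this approach, although the paper does not specify the exact method used. Mapping each RGB centroid to its nearest general color label can then be done via a look-up against a table of named reference colors.

```python
# Dominant-color extraction via k-means over pixels; a common approach,
# not necessarily the exact algorithm used by the described system.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def dominant_colors(image_path: str, k: int = 5) -> list[dict]:
    """Return the k most dominant colors with RGB values, percentage, and rank."""
    pixels = np.asarray(Image.open(image_path).convert("RGB")).reshape(-1, 3)
    kmeans = KMeans(n_clusters=k, n_init=10).fit(pixels)

    counts = np.bincount(kmeans.labels_, minlength=k)
    order = np.argsort(counts)[::-1]  # rank colors by pixel share
    return [
        {
            "rgb": tuple(int(c) for c in kmeans.cluster_centers_[i]),
            "percentage": round(100 * counts[i] / len(pixels), 2),
            "rank": rank + 1,
        }
        for rank, i in enumerate(order)
    ]
```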
2.4 Clustering Based on Meta-Data
After the Data Collection and Annotation processes, all the data are available in a common tabular format (one row per product) that can be analyzed using well-known state-of-the-art techniques. A common technique to organize data into groups of similar products is clustering. Clustering can speed up the recommendation process by making the look-up subprocess quicker when dealing with significant amounts of data. A practical example is a user search in the online phase: the system can limit the data used for product recommendation to the clusters whose labels relate to the user’s search, as illustrated in the sketch below.
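The following sketch shows this narrowing step with a hypothetical, pre-computed cluster assignment; the cluster labels and product identifiers are invented for illustration.

```python
# Illustrative narrowing of recommendation candidates to relevant clusters;
# the cluster labels and product identifiers are hypothetical.
clusters = {
    "summer dresses": ["p01", "p07", "p19"],
    "winter coats": ["p03", "p11"],
    "casual shirts": ["p05", "p08", "p21"],
}

def candidate_products(query: str) -> list[str]:
    """Restrict the look-up to clusters whose labels match the search terms."""
    terms = set(query.lower().split())
    return [
        product
        for label, members in clusters.items()
        if terms & set(label.split())  # any query term matches the cluster label
        for product in members
    ]

print(candidate_products("red summer dress"))  # -> ['p01', 'p07', 'p19']
```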
Several clustering algorithms can be used depending on the type of the data. In general, clothing data can be characterized by both numerical (e.g. product price) and categorical features (e.g. product category). A detailed review of the algorithms used for mixed-type data clustering can be found in [7]. The algorithms can be divided into three major categories: (a) partition-based algorithms, which build clusters and update centers based on partition; (b) hierarchical clustering algorithms, which create a hierarchical structure that merges (agglomerative algorithms) or divides (divisive algorithms) the data elements into clusters based on the elements’ similarities; and (c) model-based algorithms, which use either neural network methods or statistical learning methods, choosing a detailed model for every cluster and discovering the most appropriate one. The algorithms that we use in this paper are as follows (a brief usage sketch is given after the list):
1. Kmodes: A partition-based algorithm, which aims to partition the objects into k groups such that the distance from the objects to the assigned cluster modes is minimized. The distance, i.e. the dissimilarity between two objects, is determined by counting the number of mismatches across all attributes. The number of clusters is set by the user.
2. Pam: A partition-based clustering algorithm, which partitions the data into k clusters around medoids. The similarities between the objects are obtained using Gower’s dissimilarity coefficient [8]. The goal is to find k medoids, i.e. representative objects, that minimize the sum of the dissimilarities of the objects to their closest representative object. The number of clusters is set by the user.
3. HAC: A hierarchical agglomerative clustering algorithm, which is based on the pairwise object similarity matrix calculated using Gower’s dissimilarity coefficient. At the beginning of the process, each object forms its own cluster; the clusters are then merged iteratively until all elements belong to one cluster. The clustering results are visualized as a dendrogram. The number of clusters is set by the user.
4. FBHC: A frequency-based hierarchical clustering algorithm [9], which utilizes the frequency of each label occurring in each product feature to form the clusters. Instead of performing pairwise comparisons between all elements of the dataset to determine object similarities, this algorithm builds a low-dimensionality frequency matrix for the root cluster, which is split recursively as one goes down the hierarchy, overcoming limitations in memory usage and computational time. The number of clusters can be set by the user or by a branch-breaking algorithm, which iteratively compares the parent clusters with their child nodes using evaluation metrics and user-selected thresholds.
5. VarSel: A model-based algorithm, which performs variable selection and maximum likelihood estimation of the latent class model. Variable selection is performed using the Bayesian information criterion. The number of clusters is determined by the model.
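As a concrete example, the sketch below clusters categorical clothing attributes with the k-modes implementation from the open-source kmodes Python package; the records and the choice of k are illustrative, not the system’s actual data.

```python
# Minimal k-modes example on categorical clothing attributes; the records
# and the number of clusters are illustrative.
import numpy as np
from kmodes.kmodes import KModes

data = np.array([
    # [category, length, sleeve, fit]
    ["dress", "long",  "sleeveless", "slim"],
    ["dress", "knee",  "short",      "regular"],
    ["shirt", "short", "3/4 length", "slim"],
    ["shirt", "short", "short",      "regular"],
])

km = KModes(n_clusters=2, init="Huang", n_init=5)
labels = km.fit_predict(data)   # mismatch-count distance to cluster modes
print(labels)                   # e.g. [0 0 1 1]
print(km.cluster_centroids_)    # one representative mode per cluster
```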
2.5 Clothing Recommender and User Feedback
The Clothing Recommender is the most important component of our system, since it combines all the aforementioned analysis results to create models that make personalized predictions and product recommendations. The internal and external data, the user’s preferences, and the company’s rules are all taken into consideration.
Moving on to the online component, the UI enables the designer to search for products using keywords. The returned results can then be evaluated by the designer, and the preferred products can be saved to their dashboard, both over time and per product search. If the user is not satisfied with the recommendations, they can either update their preferences or request new recommendations.
The offline and the online components are interconnected by a subsystem responsible for the models’ feedback process. The user can approve or disapprove the proposed products based on their preferences, and this information is fed as input to a state-of-the-art Deep Reinforcement Learning algorithm, which assesses the end user’s choices and re-trains the personalized user model. This additional learning mechanism evolves the original models over time, making new search results more relevant and personalized.
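The paper does not detail the reinforcement learning formulation; as one plausible interface between the UI and the learning algorithm, the sketch below converts approve/disapprove feedback into reward-labelled transitions that could later be replayed during re-training. All names here are hypothetical.

```python
# Hypothetical mapping of user feedback to reward transitions for re-training;
# the actual deep reinforcement learning algorithm is not specified here.
from dataclasses import dataclass, field

@dataclass
class FeedbackBuffer:
    """Collects (search context, recommended product, reward) transitions."""
    transitions: list[tuple[str, str, float]] = field(default_factory=list)

    def record(self, query: str, product_id: str, approved: bool) -> None:
        reward = 1.0 if approved else -1.0  # approve -> positive reward
        self.transitions.append((query, product_id, reward))

buffer = FeedbackBuffer()
buffer.record("red summer dress", "p07", approved=True)
buffer.record("red summer dress", "p03", approved=False)
# The buffered transitions would then be passed to the RL algorithm that
# re-trains the personalized user model.
```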