1 Introduction

With the considerable improvement in material living standards in recent years, designers have begun to pay more attention to the emotional and spiritual elements of their products and services. The main concern of user experience design (UED) is to create satisfying, aesthetic and innovative products that consistently meet users’ needs and even lead trends in modern lifestyle. It is therefore important for designers to understand user needs and translate them into appropriate products. In the age of the Internet, blogs, forums, wikis, SNS and RSS, together with newly developed theories such as Six Degrees of Separation and the Long Tail, have turned user knowledge into an open, complex and adaptive system. In the current web environment, user knowledge is represented in increasingly diverse forms, and users usually adapt to this situation easily. The problem is left to designers, who must both acquire user knowledge and construct the corresponding systems.

The key to user research is to uncover the needs buried deep in users’ minds through their language and daily behavior. Traditional methods, including questionnaires, interviews, observation, focus groups and personas, achieve this goal through behavioral observation and carefully designed conversation. Designers are required to show empathy and keep an open mind throughout the process. Otherwise, poorly worded questions may lead to different or even opposite answers that deviate from users’ reality.

To a certain extent, traditional methods reveal user needs, but they suffer from poor efficiency and from the non-negligible influence of mood and environment. Hence, they are not suitable for research on large numbers of users. On the other hand, the original knowledge produced by users themselves better expresses what they really think. Big data technology has made it possible, and cheaper, to study large groups of users. So far, it has been used frequently in fields such as finance, online business, healthcare, social security and smart cities, but comparatively rarely in design.

Data mining can open a new direction for extending the study of user experience and user knowledge. This paper describes how to mine user knowledge and understand user needs through large-scale data searching and image content analysis, and finally how to construct a user knowledge system that ensures excellent user experience. The methods described here can also serve as a reference for other design research.

2 Methodology Description

2.1 Overview

This paper mainly elucidates how we apply image feature recognition and content analysis technologies to obtain research variables, which are later evaluated by statistical calculation, in order to acquire user knowledge and construct the corresponding system. The research process is as follows:

  • How to acquire user knowledge? When using certain products or services, users exchange information (namely words, images and voice), and this information can be regarded as “user knowledge” since it directly reflects users’ demands. For instance, users of photo-sharing social websites interact with each other by uploading images, clicking “like”, commenting and reposting. In the course of these interactions, users leave “internet footprints” as a part of user knowledge, which reveal their attention and preferences.

  • How to acquire users’ footprints? In short, one can apply the appropriate techniques to collect the footprints left by users. For example, with the public programming interfaces exposed by relevant websites (e.g. the WeChat API) and web crawler programs, one can obtain users’ information such as images, text and voice, subject to the applicable privacy agreements. Emerging technologies also make image analysis feasible, broadening the scope of information capture and analysis.

  • Analysis methodology and tools. Three main methods are used: image feature identification, content analysis and statistical calculation.

2.2 Details of Three Methods

Image Feature Recognition. Three particular tools fall into this category.

Analyzing tools for color spatial distribution. Based on the pixel RGB values of the sample images, this tool generates points in color space and performs clustering and dimension reduction through vector calculation and principal component analysis. The result helps researchers analyze how color characteristics vary among samples from different users.
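The dimension-reduction step can be illustrated with a minimal sketch. It assumes NumPy and a randomly generated 4×4 image; the function name `pca_project` is hypothetical and is not part of the tool described above. Only the principal component projection is shown, not the clustering step.

```python
import numpy as np

def pca_project(points, n_components=2):
    """Project row vectors onto their top principal components."""
    points = np.asarray(points, dtype=float)
    centered = points - points.mean(axis=0)
    # The rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical sample: a 4x4 RGB image with random pixel values.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3))
pixels = image.reshape(-1, 3)   # one color-space point per pixel
coords = pca_project(pixels)    # 16 points reduced to 2 dimensions
```

Each pixel becomes one point in RGB space, and the projection keeps the two directions along which the colors of the sample vary most.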

Extracting tools for sample dominant color tone. Based on the calculation of pixel color features, this tool generates the overall color composition of each sample, from which the dominant colors that account for 80% of the raw sample can be represented (Fig. 2). It then runs in batch mode and generates a table for each sample showing its dominant color tone, for use in the subsequent analysis of multi-dimensional color deviation (Fig. 1).
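One possible way to obtain the colors that cover 80% of a sample is sketched below. This is an assumption-laden illustration, not the tool's actual algorithm: colors are quantized into a coarse grid (`levels` bins per channel, a hypothetical parameter), and the most frequent bins are kept until their cumulative share reaches the coverage target.

```python
import numpy as np

def dominant_colors(pixels, coverage=0.8, levels=4):
    """Return quantized colors that together cover `coverage` of the pixels."""
    pixels = np.asarray(pixels, dtype=np.int64)
    step = 256 // levels
    bins = pixels // step                           # quantize each channel
    colors, counts = np.unique(bins, axis=0, return_counts=True)
    order = np.argsort(-counts, kind="stable")      # most frequent color first
    shares = counts[order] / counts.sum()
    # Smallest prefix whose cumulative share reaches the coverage target.
    k = int(np.searchsorted(np.cumsum(shares), coverage)) + 1
    k = min(k, len(shares))
    centers = colors[order][:k] * step + step // 2  # bin centers as RGB values
    return centers, shares[:k]

# Hypothetical sample: 6 red, 3 blue and 1 green pixel.
pixels = [[255, 0, 0]] * 6 + [[0, 0, 255]] * 3 + [[0, 255, 0]]
centers, shares = dominant_colors(pixels)
```

In this toy sample, red and blue together cover 90% of the pixels, so the green pixel is not part of the dominant set.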

Fig. 1. Extracting tools for sample dominant color (Color figure online)

Fig. 2. Analyzing tools for the similarity of sample dominant colors (themeDistComputingTool_v1)

Analyzing tools for the similarity of sample dominant color tone. Using the dominant color tone data of the samples, this tool calculates the dominant color tone similarity between each pair of the 574 samples and generates CSV files as the input for the statistical calculations in the MDS analysis.
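A pairwise comparison of this kind might look like the following sketch. It assumes that each sample is summarized by a single representative RGB tone and that similarity is measured by Euclidean distance; both are illustrative assumptions, since the metric used by the tool is not stated. The sample labels and values are hypothetical.

```python
import csv
import io
import numpy as np

def pairwise_distances(tones):
    """Euclidean distance between every pair of row vectors."""
    tones = np.asarray(tones, dtype=float)
    diff = tones[:, None, :] - tones[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def to_csv(matrix, labels):
    """Serialize a labeled square matrix as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["sample"] + list(labels))
    for label, row in zip(labels, matrix):
        writer.writerow([label] + [f"{value:.2f}" for value in row])
    return buffer.getvalue()

# Hypothetical representative tones for three samples.
tones = [[224, 32, 32], [32, 32, 224], [224, 32, 32]]
dist = pairwise_distances(tones)
csv_text = to_csv(dist, ["s1", "s2", "s3"])
```

The resulting square matrix is symmetric with zeros on the diagonal, which is the form that distance-based scaling procedures typically expect as input.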

Content Analyzing Technology. Content analysis examines the content of samples and generates a structured system of variables that describes these samples by means of tags. The tags give categorical and ordinal descriptions of the samples, in order to support later statistical analysis and the search for similarities or differences.

Based on an overall analysis of the samples, several descriptive variables were proposed and labeled. Within the scope of this research, every label falls into one of the following six categories: picture type, picture theme, composition, means of expression, light and shade, and image style.

Next, we introduce the matrix of metrical data, which is by definition a table for managing the samples and their corresponding variable labels. All values assigned to variables result from a combination of image features and manual labeling. Based on this matrix, all data are imported into SPSS after the necessary normalization for descriptive statistical analysis and further calculation.
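As a rough illustration of such a matrix, the sketch below codes categorical labels as integers, one row per sample and one column per variable. The coding scheme, the function name `build_matrix` and the three sample records are assumptions made for this example; the coding actually used for the SPSS import is not described here.

```python
def build_matrix(samples, variables):
    """Code each categorical label as a 1-based integer, one row per sample."""
    codebook = {
        var: {
            label: code + 1
            for code, label in enumerate(sorted({s[var] for s in samples}))
        }
        for var in variables
    }
    rows = [[codebook[var][s[var]] for var in variables] for s in samples]
    return rows, codebook

# Hypothetical labeled samples using two of the six label categories.
samples = [
    {"type": "landscape", "composition": "symmetry"},
    {"type": "portrait", "composition": "diagonal"},
    {"type": "landscape", "composition": "diagonal"},
]
rows, codebook = build_matrix(samples, ["type", "composition"])
```

The codebook keeps the mapping from integer codes back to labels, so that results from the statistical package can be read in terms of the original tags.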

Statistical Calculation. Statistical calculation provides a way to discover the internal relation between the objective elements shown in pictures and the subjective perception of users, by means of clustering, multi-dimensional analysis and other tools.

Correspondence analysis is the main statistical method used in this research. The connections between variables are represented graphically by means of an interaction summary table. The technique is suitable for situations with many qualitative variables, in which connections between the categories of different variables are to be established. SPSS is a widely used software package for this kind of analysis.
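For readers without SPSS, the core computation of simple correspondence analysis can be sketched with NumPy as follows. This is the textbook formulation (singular value decomposition of the standardized residuals of a contingency table), not a reproduction of the SPSS procedure, and the 2×2 table is made up for illustration.

```python
import numpy as np

def correspondence_analysis(table):
    """Return chi-square, principal inertias, and row/column coordinates."""
    n = np.asarray(table, dtype=float)
    total = n.sum()
    p = n / total
    r = p.sum(axis=1)                 # row masses
    c = p.sum(axis=0)                 # column masses
    expected = np.outer(r, c)
    # Standardized residuals; their squared sum times N is Pearson's chi-square.
    s = (p - expected) / np.sqrt(expected)
    u, sv, vt = np.linalg.svd(s, full_matrices=False)
    chi2 = total * (sv ** 2).sum()
    row_coords = (u * sv) / np.sqrt(r)[:, None]     # principal coordinates
    col_coords = (vt.T * sv) / np.sqrt(c)[:, None]
    return chi2, sv ** 2, row_coords, col_coords

# Hypothetical cross-tabulation of two categorical variables.
table = [[20, 5], [5, 20]]
chi2, inertia, row_coords, col_coords = correspondence_analysis(table)
```

Plotting the first two columns of `row_coords` and `col_coords` on the same axes gives the kind of location map on which categories that co-occur often appear close together.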

Nowadays correspondence analysis is widely used in early-stage concept design, in areas such as new product development, market positioning and advertising. It has become an important tool for designers and market researchers for evaluating product properties, competitors and target markets.

3 Case Study of Photo Sharing Websites

Benefiting from massive data mining technology, we selected a popular use case for our study, which concentrated on constructing the user knowledge of photo-sharing websites and further analyzing the needs and psychological features of their active users.

Many user actions can be regarded as producing user knowledge, including uploading photos and social operations such as clicking “like”, commenting and reposting. In this scenario, user knowledge lies in the images, text and user actions. Although text usually indicates users’ exact thoughts, understanding its meaning programmatically is very hard and, most importantly, text cannot reflect the relation between the image itself and users’ judgement of it.

After careful consideration, the popular images on photo-sharing websites were chosen as the main object of study, in order to mine information about the images themselves, user preferences and the relation between the two.

3.1 Selecting Target Website

There are many well-known photo-sharing websites, including Instagram, Lofter and Yahoo’s Flickr. We chose Flickr after comparing founding dates, numbers of users and other aspects. Flickr is an image and video hosting website; its web services suite was created by Ludicorp in 2004 and acquired by Yahoo in 2005. Its main services include picture uploading and storage, classification, tagging and searching. Users need to fill in their profiles after registration, and these profiles can help us in future studies.

In the uploading process, users are required to give each picture a title, a description and some tags. To manage photos more effectively, users can create a “set”, which is similar to a photo album.

Flickr users have various backgrounds, from professional photographers to PS amateurs. All of them enjoy uploading their favorite photos, adding tags and descriptions and creating sets for them. Social operations are even more popular, since users enjoy discovering beautiful pictures and attracting the attention of others, as reflected by the number of likes and comments. The characteristics of a particular user can be revealed by the pictures he or she likes, and the most popular pictures show the inclination of most users. As a result, these most popular pictures provide an effective way of obtaining the features we are studying, analyzing user disposition and finally constructing the user knowledge system of the website. The purpose of this study is to explore the types and features of popular pictures shared by Flickr users and to describe their behavior on Flickr (Fig. 3).

Fig. 3. Flickr website

3.2 Process of Research

Flickr holds an annual show named “best shot”, which selects the most popular pictures of the year. We selected pictures from “2015 best shot” to narrow down the sample domain. In total, 574 pictures were selected by our crawler programs because they had received more than 99 comments or likes.
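The selection rule can be written as a simple filter. The sketch below assumes that crawled records are dictionaries with `comments` and `likes` counts (hypothetical field names) and reads the criterion as "more than 99 comments, or more than 99 likes"; the three records are invented for illustration.

```python
def select_popular(photos, threshold=99):
    """Keep photos with more than `threshold` comments or likes."""
    return [
        photo for photo in photos
        if photo["comments"] > threshold or photo["likes"] > threshold
    ]

# Hypothetical crawled records.
photos = [
    {"id": "a", "comments": 120, "likes": 40},
    {"id": "b", "comments": 12, "likes": 99},
    {"id": "c", "comments": 3, "likes": 450},
]
popular_ids = [photo["id"] for photo in select_popular(photos)]
```

Under this reading, a photo with exactly 99 likes and few comments is excluded, because the threshold must be exceeded.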

Based on previous state-of-the-art studies, we divided all labels into 6 categories.

  • Picture type: daily; documentary; black and white; art; portrait; landscape; abstract; report;

  • Picture theme: natural scenery; animals and insects; flowers and plants; still-life objects; character portrait; cultural construction; scene of stories; light rhythm;

  • Composition: nine-squared; diagonal; symmetry; frame; guide line; dynamic; triangle; photographic subtraction; special angle; repetition; vertical; curve; slash; centripetal; change; S-shape; open type; balance;

  • Means of expression: simplification; choice; comparison; contrast; scenery depth; background; lines; balance; motion; perspective; reflection;

  • Light and shade: backlight; soft light; capture light; appropriate exposure; contrast of exposure level; low angle light source; regional exposure; multicolor contrast;

  • Image style: traditional nostalgic; romantic; solemn and elegant; deep and solemn; easy dial; decorative arts; comparison of cool & warm; open magic; scarce unique; novel and creative; human sensations; rhythm; non-mainstream.

To consolidate the tag information, the matrix was transformed into a questionnaire. Experts in both design and photography then assigned the tags shown above to the 574 samples, based on principles explored in previous studies.

With the 574 samples and their tags, the matrix of metrical data described earlier was established. The matrix was later imported into SPSS (Fig. 4).

Fig. 4. Matrix of metrical data

4 Result

4.1 Result Evaluation of Image Feature Identification

According to the research design described above, the study of image features mainly involves feature extraction from the samples. The extraction work includes the following steps:

Quantitative analysis based on the color attributes of the sample (pixel RGB values). The main step is extracting the dominant color tones. Depending on the specific features of a sample, the color composition of the picture differs in many ways: some samples have a conspicuous dominant color tone, while others are composed of many colors. In either case, the set of dominant color tones of a sample represents 80% of its color information.

The representative color tone of each sample is derived from all of its dominant color tones and is used to analyze the similarity between samples.

The distance between the color tones that occupy relatively large proportions of the dominant color tones is calculated based on the composition of each sample.

Figure 5 illustrates the similarity of the positioning in color space, based on our calculation and analysis.

Fig. 5. The similarity of the positioning of color space (Color figure online)

Figure 6 illustrates the similarity analysis of dominant color tones, produced by the multi-dimensional scaling (MDS) function of themeDistComputingTool_v1.

Fig. 6. Theme Color Position-1
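To make the MDS step concrete, the following sketch implements classical (Torgerson) MDS with NumPy. Which MDS variant themeDistComputingTool_v1 uses is not stated, so the choice of classical MDS is an assumption, and the three-point distance matrix is a toy example.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Embed points so that Euclidean distances approximate `dist`."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double centering turns squared distances into inner products.
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy example: distances among three points of a 3-4-5 right triangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
coords = classical_mds(dist)
```

The embedding is determined only up to rotation and reflection, so the recovered coordinates need not equal the original points, although the pairwise distances are preserved.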

In Fig. 7, all of the samples show a remarkable pattern in the positional distribution of dominant color tone similarity. Based on the distribution of the scatter plot, a regression equation in two variables is obtained by second-order (quadratic) curve fitting:

Fig. 7. Theme Color Position-2 (Color figure online)

$$ y = -0.2 - 0.27x + 0.53x^{2} $$
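Second-order curve fitting of this kind can be reproduced with `numpy.polyfit`. In the sketch below, the points are synthetic: they are generated from the reported equation itself purely to show the procedure, and they are not the scatter data of the 574 samples.

```python
import numpy as np

# Synthetic points generated from the reported equation (not the study data).
x = np.linspace(-1.0, 1.0, 9)
y = -0.2 - 0.27 * x + 0.53 * x ** 2

# polyfit returns coefficients from the highest degree down: a*x^2 + b*x + c.
a, b, c = np.polyfit(x, y, 2)
```

With real scatter coordinates in place of the synthetic `x` and `y`, the same call returns the least-squares quadratic through the points.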

To make the distribution pattern of the result easier to identify, we supplement Fig. 8 with additional information: the 574 dominant color tone palettes are placed at their corresponding scatter positions.

Fig. 8. Picture theme

We found that, despite the differences in properties and content among the 574 samples, a significant pattern exists in the visual perception of dominant color tones. The pattern appears as a mild brightness gradient, from dark on the left to bright on the right. However, no obvious pattern was recognized in the vertical dimension. In addition, saturation is more pronounced in the center and center-right areas of the U-shaped curve than in other areas.

To sum up, the results of the color space positioning analysis and the MDS analysis of dominant color tone similarity suggest that, in terms of color properties, the 574 samples primarily reflect differences in saturation and color temperature.

4.2 Result Evaluation of Statistical Calculation

As discussed earlier, correspondence analysis is the main method in this research. The location map analysis resulting from the 574 samples in all dimensions is discussed below. Among all the dimensions, the abundance of color tones is particularly interesting, so the first part of this section compares it with the other dimensions, while the second part discusses results within the other dimensions.

Abundance of Color Tones Compared to Other Dimensions

Picture Theme. The sig value is 1.000, which indicates that there is no significant relation between picture theme and tone abundance. No typical pattern is recognized in the distribution of samples from different themes. In addition, the theme of still-life objects is rare in the sample.

Composition. The sig value is 1.000. Most composition types are relatively concentrated, while the diagonal and curve types are relatively rare (Fig. 9).

Fig. 9. Composition

Means of Expression. In this figure, with the exception of the lines type, the majority of the sample behaves similarly (Fig. 10).

Fig. 10. Means of expression

Light and Shade. The sig value is 1.000. There is no obvious correlation between lighting and tone abundance in this dimension. Meanwhile, low angle light source stands out because of its special angle (Fig. 11).

Fig. 11. Light and shade

Image Style. The sig value is 1.000. Image style and tone abundance have no significant correlation. However, the rhythm style is relatively rare (Fig. 12).

Fig. 12. Image style

Results Within Other Dimensions. Overall, three common features were found across all 574 samples. Firstly, in terms of type, pictures of scenery or daily life ranked highest, followed by art, documentary and portrait; report and abstract were the least numerous. Secondly, in terms of composition, most samples used nine-squared or symmetric composition, which is associated with the physiological characteristics of human aesthetics. People like pictures that are concisely composed with some guidance or restriction, such as a radial line, leading line, diagonal or frame. The third common feature lies in image style. The most popular pictures are usually unique and relaxing. Nostalgic, romantic, solemn, aesthetic and novel elements are welcome as well. In contrast, popular pictures rarely feature the styles of rhythm, contrast or humanity.

The results of four specific analyses are shown in the following figures.

Picture Type Compared to Image Style. The correspondence analysis of picture type and image style, with 574 valid samples and a sig value of zero, indicates a significant correlation between type and style.

The common aesthetic preference for scenery and daily-life pictures was very likely developed along with the evolution of human beings. Analysis of this type indicates that ancient prairie scenery, composed of fresh grass, low jungle and winding streams, gives comfortable and harmonious feelings to people living in nearly all places. People often find a sense of identity in documentary and portrait pictures, making them the second most popular type. Abstract pictures are appreciated only by a small group of people (Fig. 13).

Fig. 13. Picture type & Image style

The result also shows a common mapping between image content type and means of expression. Scenery is normally expressed through romantic, solemn, elegant or temperature-contrasting styles; portraits through nostalgic and black-and-white styles; and artistic pictures through decorative, novel and open magic styles.

Composition Compared to Image Style. In the correspondence analysis of this comparison, 562 valid samples led to a sig value of 0.005, suggesting a significant connection between image style and composition (Fig. 14).

Fig. 14. Composition & Image style

In the history of human aesthetics, nine-squared and symmetric composition have long held their place. Famous historical buildings, from Gothic to Chinese styles, are designed to be strictly symmetric. Centripetal, guide-line, diagonal and frame compositions are also prevailing variations of symmetry.

The pairing of romantic with symmetric, traditional with vertical, and nine-squared with temperature contrast can serve as a good reference for future composition design.

Light and Shade Compared to Image Style. Scarce unique and easy dial are the two most welcome styles. The pessimistic nature of deep and solemn and the very definition of non-mainstream explain their lack of appeal to the majority (Fig. 15).

Fig. 15. Light and shade & Image style

Considering both dimensions, there are significant relations between backlight and solemn, capture light and temperature contrast, and regional exposure and elegant. Appropriate exposure suits many styles, including romantic, human sensations, traditional nostalgic and easy dial.

Composition Compared to Light and Shade. Soft-light pictures typically adopt S-shape, triangle, open type and centripetal compositions. Diagonal and guide-line are mostly used in photographic subtraction, while appropriate exposure is mostly used with balance. Soft light and contrast of exposure level appear at opposite positions in the figure, indicating a thorough difference between them (Fig. 16).

Fig. 16. Composition & Light and shade

5 Conclusion

By extracting features of the sample images, analyzing the contents of semantic tags, looking for common features in popular images that attract a relatively high degree of user attention, and studying the correspondence between labels, this paper seeks to explain why users pay more attention to landscape images. In addition, users favor balanced composition and the nine-squared format, with proper exposure, backlight or captured light. Users also prefer traditional nostalgic and deep, dignified black-and-white photos or portraits. The photos they like range from lyrical, romantic, lively and unique landscapes to daily themes. Furthermore, users are interested in innovative photos as well as open magic art photos.

These findings are significant for the construction of user knowledge for photo-sharing sites. In the future, for users who like sharing photos on these websites, one can understand the relationships among the key themes of their favorite pictures, the composition and expression, light and shade, style and tone. Designers can learn the preferences and needs of such users through detailed and reliable first-hand data and apply them to other designs intended for this kind of user.

The method used in this study is the construction of a user knowledge system by analyzing the behavior of users who like sharing pictures. This method can also be applied to many other aspects of keyword behavior, for example in advertising communication, product packaging design and all other areas of user knowledge mining related to pictures.

The user knowledge mining method constructed in this study differs from traditional user experience methods. It can be used in many fields to establish a user knowledge system based on the general characteristics of different users’ needs and concerns, thus facilitating designers’ work. Once a certain feature of a user’s keyword behavior has been identified, designers can quickly draw on the user knowledge bank to find effective and usable research data as a reference for their design decisions.

Mining and constructing such a user knowledge system can be time-consuming in the early stage. However, once the user knowledge bank has been set up, it not only helps designers understand the needs of users effectively and supports decision-making, but also makes it easier for multiple designers on a single design project to understand the common goal. In this way, design consistency among several designers can be ensured, designers save time through reduced communication costs, and in the end communication quality is greatly improved.

This study mainly introduces an image mining method for user knowledge. What remains to be analyzed is the construction of other parts of user knowledge, such as text and sound. This area is still worth further study in order to form general research methods and theories. These aspects can serve as subsequent supplementary research for the construction of the user knowledge system.

A well-established user database is built on both the traditional methods and the new one. Understanding users’ needs from the multi-dimensional perspective of big data methods, together with the traditional ways of conducting interviews, surveys and focus groups, seems to be the new trend. This paper holds that the new method of construction is fundamental to this trend, and that combining it with the traditional methods will make it better.