MRTensorCube: tensor factorization with data reduction for context-aware recommendations

Context information can be an important factor of user behavior modeling and various context recognition recommendations. However, state-of-the-art context modeling methods cannot deal with contexts of other dimensions such as those of users and items and cannot extract special semantics. On the other hand, some tasks for predicting multidimensional relationships can be used to recommend context recognition, but there is a problem with the generation recommendations based on a variety of context information. In this paper, we propose MRTensorCube, which is a large-scale data cube calculation based on distributed parallel computing using MapReduce computation framework and supports efficient context recognition. The basic idea of MRTensorCube is the reduction of continuous data combined partial filter and slice when calculating using a four-way algorithm. From the experimental results, it is clear that MRTensor is superior to all other algorithms.


Introduction
Recommendation systems are powerful techniques by providing users behavior recommendation of various types of potentially interesting products and service.The system's ability to gather information has been enhanced, and recommendation systems based on contextual modeling approach have become popular [1][2][3][4].Most systems require to analyze and manage a large amount of large volume data such as Web and social contextual information over multidimensional.It is becoming increasingly important to make a rapid analysis of large context datasets to enable rapid decision making.
Recent research has focused on integrating contextual information on user-itemcontext (User × Item × Context) and building multidimensional models based on tensor decomposition model [5,6].The tensor factorization is a multidimensional array of conventional matrix factorization techniques by considering interactions between users, items, and context.Many recommendation systems based on tensor decomposition techniques have been proposed to better support situation recognition recommendations and other related analysis tasks [7][8][9].However, since the previous research still operates in main memory, it cannot be extended to compute large tensor data.Each of the multidimensional data holds values aggregated by all possible combinations of dimensional attributes.It takes a lot of time to compute the data cube.
In this paper, we propose MRTensorCube (MapReduce Tensor Cube) designed which reduced cube computation time by sharing sorts cost and input data scan and/or by reducing data computation.To provide faster prediction result, the MRTensorCube consists of MR(MapReduce) framework to constructing distributed parallel computing environments for large tensor data.MRTensorCube is optimized for the MapReduce framework consisting of two MapReduce phases (MRSpread and MRAssemble).The main feature of this algorithm is the continuous data reduction by the partial rectangular parallelepiped and partial cell combination released when the calculation goes through these two stages.This type of data reduction technology enables rapid and efficient calculation of large data cubes.In fact, the distributed parallel processing has a priori to solve problems, such as interconnection of distributed resources, communication processing, and means of fast expansion [10,11,16].
The rest of paper is organized as follows.Section 2 presents the preliminaries to the tensor and motivation of this study.The structure and concepts of the MRTensor-Cube algorithm are description in Sect.3. Section 4 presents experimental results and concludes in Sect. 5.

Tensor data cube
The Tensor data cube is a generic model for multidimensional systems, such as fast prediction computations and simple optimization techniques.The multidimensional rating provides more information about user's preferences that a single-rating system.There are different types of tensor decomposition models, such as the PARAFAC, also 123 Fig. 1 Example of the tensor data cube called CANDECOMP(Canonical Decomposition), Tucker, and HOSVD (higher-order singular value decomposition).In our approach, we follow the HOSVD formulation shown in Fig. 1.
The HOSVD is based on a successive application of the matrix SVD decomposition to the flattened matrices of a given tensor [17].In this respect, HOSVD is one of the most powerful tensor decomposition methods.HOSVD can be used to build orthogonal spaces which can be then used for reduction analysis in a way similar to the subspace projection method.Figure 1 is an illustration of a threedimensional tensor.The three-dimensional tensor is factorization into three matrices A ∈ R n×d a , B ∈ R m×d b , and C ∈ R c×d c and one central tensor S ∈ R D A ×D B ×D C .The number of A is a user, the B is item, and C is a number of contexts where c i ∈ {1, . .., c}.The number of the user is n, the d is a number of dimensional, and m is a number an items.The rating id gives on five-star scale R ∈ {0, . . ., 5}.In this case, the decision function for a single user a, item b, context c combination becomes: TopK abc = S × A i * × B i * × C i * .This is composed the values of the measures precomputed by applying dimension attributes as analysis perspectives is well as various aggregate operations.A tensor data cube performs aggregate operations to extract all possible combinations of the dimension attributes, given factors extracted for the users, items, and context to provide rapid responses to any analysis queries.For details, refer to [10,12].
Tensor Fibers The tensor data cube performs aggregate operations to extract each attribute combination; a generation of matrix rows and columns to a higher-order case is called a fiber.Fiber represents a sequence of elements along a fixed mode when all but one indices are fixed.For example, ABC fiber can be expressed by the following SQL: SELECT A, B, C FROM R GROUP BY A, B, C; the fibers are ABCD, ABC, ABD, ACD, BCD, AB, AC, AD, BC, BD, CD, A, B, C, D, and a total of 2 4 = 16 are extracted for all that denotes dimensionless combinations.Fiber C is expressed in the (A 1 , A 2 , . . ., A n , M) form, where the dimension attribute Ai has n-dimension attributes and M denotes measure.
Tensor Slices The slices constituting a fiber represent the values stored in the given fiber and can be expressed in the form of (a 1 , a 2 , . . ., a n , m), where m denotes the 123 aggregate measure of a fiber slices.Tensor data cube takes an architecture comprising such slices.To take an example of slices, those in the fiber AC are expressed as (a 1 , *, c 1 , *, 3) and (a 1 , *, c 2 , *, 3), where * denotes the empty dimension (attribute) of a fiber.

Motivation
The increased size of the input data for tensor data cube computation is requiring a significant number of processing steps and high storage costs.The tuples number T of the input data and the number of dimensions D are required the maximum cost of T × 2 D .As the size of the data to be analyzed continues to grow, faster calculations are required for faster decision making.The MR is a distributed parallel framework to fast data computation.The most important issue pertaining to the computation tensor data cube based on MR is "high processing cost," "computing time," and "increased number of phases."

Problem 1 (High processing cost)
The problem of the size-dependent high processing costs is defined as the cost-intensive shuffle phase.

Problem 2 (Computing time)
The problem of excessive processing time for computing large-scale data is defined as the failure of single-node computing.

Problem 3 (Increased number of phases)
The increase in input-output costs due to the increased number of phases when large-scale data are divided to solve the problem is described in problem 2.
An efficient MR-based tensor data cube computation can be ensured by reducing the output amount of the map function and the data processing amount of the shuffle phase.The computation models that emit data amounting to 2 D are prone to disruption due to excessive data sizes.This problem can be addressed by increasing the number of the MR phases, of which the costs can be computed as follows: 1.As a mapper emits data only for one fiber, the costs for increasing the number of mappers can be expressed by the equation M R2 D = T /m. 2. The costs of reduction induced by the combination of the values sharing the same key can be expressed by M R2 D combine = αT/m, where α denotes the rate of reduction induced by the combination of the same-key values.3. The shuffle phase can be expressed by M R2 D shuffle = αβT , where β denotes the rate of reduction induced by the merging of the same-key values in the shuffle phase.4. In the reduce phase, as αβT is processed after being split by the number of reducers r , the costs for each reduce function can be expressed by the equation M R2 D reduce = αβT /r .However, as the whole phases are processed 2 D times, the equation for computing the total costs is In this paper, we introduce the MRTensorCube, a new algorithm capable of computing data cube at low-cost potential using the MR mechanism, as shown in Fig. 2.

MRTensorCube
The MRTensorCube consists of two basic phases, MRSpread and MRAssemble, as shown in Fig. 3.The MRSpread has the function of emitting all partial slices the tensor data need.To enable data cube calculation, the MRSpread consists of two preparatory: (1) processing steps of data reduction and (2) partial fibers generation.The result of the MRSpread phase is generated top fibers and partial fibers.The MRAssemble collects partial slices and aggregates them to generate a complete tensor data cube.
These two phases compute and construct cube according to the definition of MRTen-sorCube that is described as follows: Definition 1 (Partial slices) Partial slices p1 = (a 1 , a 2 , ..., a 3 , m 1 ) may share the same set of slices attributes with another partial slices p 2 = (a 1 , a 2 , ..., a 3 , m 2 ), where m 1 and m 2 are partial measures.The condition for the status of a complete slice is having only one slice that shares the same attribute.A complete slice has the form of (a 1 , a 2 , a 3 , M), where its measure M has the final aggregate result using the function F(m1, m2) that is same as sum ().
Definition 2 (Partial fibers) Partial fibers are a set of partial slices to be computed.Partial fibers P j = {p 1 , p 2 , ..., p n } for attribute j = {A 1 , A 2 , ..., A 3 } are partial fibers with one or more partial slices p ( p ⊆ P).Here, n denotes the total number of slices within P j (1 < n.P j ).P j contains one or more identical partial slices.If all partial slices p 1 , p 2 , ..., p n are completely computed, their values are the same as that of fibers C j (P j = C j ).
The process of MRTensorCube includes two phases for two reasons: (1) maximum use the combine performance of MR; (2) minimization of data amount to be computed in the shuffle phase.First, the MRSpread mappers receive split files as input data, with T , the number of tuples of the entire data, split by the number of mappers; this can be expressed by MRSpread map = T /m, where m is the number of mappers.The combined process, in which the data with the same key are grouped together, can be expressed by MRSpread combine = αT /m, where α denotes the rate of reduction (α ≤ 1) attained by combining the same-key values.In the shuffle process, the data split by each mapper are collected and the collected data, amounting to the number of tuples of the entire data αT , are processed, whereby the data with overlapping key values are merged together; this process can be expressed by MRSpread shuffle = αβT , where β denotes the rate of data merged by the overlapping keys.Finally, reducers divide αβT by the number of reducers (r ) for processing.As each reducer emits partial fibers amounting to 2 D , its costs can be expressed by MRSpread reduce = αβT /r × 2 D .In other words, the closer the values of α and β approach 0, the greater the cost reduction becomes in the shuffle and reduce phases.The cost-effectiveness of MRTensorCube is due to the data reduction depending on α and β values in MRSpread and χ and δ values in MRAssemble.Thus, the cost-effectiveness can be maximized by maximizing the amount of data reduction in MRSpread and MRAssemble, which can be achieved my minimizing the values of α, β, χ , and δ.For example, if T = 100,000 and α = 0.9, then αT = 90,000 in the MRSpread combine phase.If β = 0.8.If χ = 0.9 and in the MRAssemble combine and shuffle phases, respectively, then αβ χ T = 64,800 and αβ χ δT = 51,840.In other words, the advantage of MRTensorCube algorithm is the progressive reduction of processing loads as its processing steps advance.For details on cost model and data reduction, refer to [10,12].

Experimental evaluation 4.1 Experimental setup and dataset
A total 21 computers have been used for the experiments in this study.One of them was used for the 1 NameNode of Hadoop, and the rest for 20 DataNode.The specifications of NameNode are dual core CPU 3.0GHz, RAM 1GB, HDD 400GB, and those of DataNode were dual core CPU 3.0GHz, RAM 512MB, HDD 200GB.All 21 computers are interlinked with 100Mbps Ethernet.We used Ubuntu Server 11.04 as OS, Linux kernel 2.6.38 version, and Hadoop 0.20.2version.All algorithms were generated using java 1.6 versions.
The datasets are similar to real-world datasets because they are produced using Zipf's law [17].We synthesized datasets with the maximum magnitude of 1,000,000,000 tuples and six dimensions.Zipf distribution satisfies Formula 1, where values vary according to discrete variables (n = i) and the frequency of each variable (f) varies according to α value.In other words, distribution bias increases in proportion to the increase in α.In the Zipf distribution is reflect real-world data such as word and population prediction dictionary input frequency in the model.In this study, assuming that data are generated based on the real-world data distribution, the value of α, which is the expected value of the Zipf distribution, is set formula 1.

123
S. Kim et al.

Existing algorithm
We performed comparative experiments to prove the superiority of the proposed algorithm over several comparable ones.Using MR2 D , MRNaive, MRGBLP, and MRPipeSort, we examined each algorithm's cube computation procedure and characteristics and explored the difference from the proposed algorithm [13][14][15].MRPipeSort is an algorithm extended from the PipeSort algorithm.PipeSort takes advantage of the fact that fibers under a parent-child relationship sharing the same prefix can be computed without requiring additional arrays.

Experimental result
We experimentally compared the MRTensorCube algorithm proposed in this study with the four different algorithms described above under the same conditions.We performed three types of experiment: (i) comparing running time increasing the number of tuples; (ii) comparing increasing the number of dimensions; (iii) comparing decreasing the number of nodes.

Varying number of tuples
In this experiment, we varied the tuple-dependent data size.Figure 4a, b shows the results of comparing the running time varying the number of tuples from 10,000,000 to 100,000,000 and from 100,000,000 to 1,000,000,000.The experimental results show the proposed algorithm MRTensorCube faster than other all four algorithms.

Varying size of dimension
In this experiment, the performance of the proposed algorithm was compared with those of four other algorithms increasing the dimension of data from three to seven.Figure 5a shows the results of comparative experiments, whereby the number of 123 tuples was set at 10,000,000 and (b) set at 100,000,000.MR2 D showed the lowest performance.MRNaive computed rapidly at lower dimensions but was surpassed by MRGBLP at the seventh dimension.MRPipeSort demonstrated relatively superior performance; it was outperformed by MRTensorCube.MRTensorCube outperformed all other algorithms in this experiment as well.

Varying number of nodes
In this experiment, the running times of the five algorithms were tested as the number of nodes was increased from 4 through 20.The number of tuples and dimensions were set 50,000,000 and 5, respectively.Figure 6 shows that all algorithms operate increasingly faster as the number of nodes increases, with MRGBLP showing the smallest change rate depending on the number of nodes.Even MR2 D and MRNaive, which require much running time by nature, demonstrated increased computation speed with the 123 increase in the number of nodes.MRPipeSort and MRTensorCube algorithms were also found to compute more rapidly as the number of nodes increased.

Conclusion
In this paper, we proposed MRTensorCube, a new data cube computation algorithm based on the MR mechanism for rating large contextual data.The particularity of MRTensorCube is a two-phase computation: MRSpread phase, in which partial fibers slices are emitted, and MRAssemble that generates fibers by computing all partial slices.MRTensorCube takes maximum advantage of the MR combine function using the concepts of partially computed partial slices and partial fibers composed of partial slices.Its additional advantage is the progressive reduction of data size as each MR operational step.
MRTensorCube is a distributed parallel data cube algorithm that takes maximum benefits from the basic mechanism of MR framework by optimally using their advantages.We performed experiments for a more detailed and accurate comparison of related algorithms by extending representative data cube algorithms to incorporate the MR paradigm.The results of our various experiments verified the superiority in MRTensorCube over all existing data cube algorithms.That is, MRTensorCube showed a much higher cost-effectiveness and computation speed compared to other algorithms under the same computer environments.MRTensorCube is being used for multipredicting analysis in recommendations systems, adopted by increasing number of users for large-scale data processing.
Furthermore, we have seen that the relative gain comparing to other methods is proportional to the amount of contextual information available.and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 123

Fig. 2
Fig. 2 MRTensorCube for TopK recommendation itemAs a result, the MRTensorCube can perform rapid data cube computation based on the MR mechanism.For details of cost model and data reduction, refer to[10,12]. 123

Fig. 3
Fig. 3 MRTensorCube architecture and algorithm The total costs of MRSpread can be hence expressed by the equation MRSpread total = MRSpread map + MRSpread combine + MRSpread shuffle + MRSpread reduce .Second, in MRAssemble the input data which are the MRSpread outputs with already reduced data depending on the α and β values (αβT ), are split again by a number of mappers m, which can be expressed by MRAssemble map = αβT /m.The data size is further reduced by combiners that combine the overlapping values of partial slices, as expressed by MRAssemble combine = αβ χ T /m, where χ denotes the rate of the data combined with the same partial slices values.The overlapping partial slices are merged again in the shuffle phase, as shown in the equation MRAssemble shu f f le = αβ χ δ T , where δ denotes the rate of overlapping partial slices.Finally, MRAssemble reduce = αβ χ δT /r expresses the output of a reducer, where r denotes the number of reducers operating in parallel.The total costs of MRAssemble can be expressed by the equation MRAssemble total = MRAssemble map + MRAssemble combine + MRAssemble shuffle + MRAssemble reduce .Thus, the total costs of MRTensorCube is MRDataCube total = MRSpread total + MRAssemble total .

Fig. 4 aFig. 5 a
Fig. 4 a Elapsed time by varying the number of tuples (10,000,000-100,000,000) and b elapsed time by varying the number of tuples (100,000,000-1,000,000,000)

Fig. 6
Fig. 6 Elapsed time by varying the number of nodes (4-20) D algorithm performs MR phases in multiple operations in correspondence with the number of fibers to distribute the processing costs of the shuffle phase, thus compensating for the main problem of MRNaive.It differs from MRNaive in that it computes each fiber (n = 2 D ) in separate MR phase.MRNaive is the basic MR-based data cube computation algorithm.It computes fibers (n = 2 D ) in one operation, and for cube computation, tenor data are scanned only once.In other words, MRNaive processes all fibers computation in one MR operation.It tends to incur high processing costs in the shuffle phase due to excessive generation of intermediate outcomes in the map phase.MRGBLP is a distributed parallel algorithm extended from the MR-based GBLP algorithm.It excludes top fibers computation and does not use tensor data as input data when computing the remaining fibers.