Social media filtering based on collaborative tagging in semantic space

Multimedia Tools and Applications

Abstract

We propose a semantic collaborative filtering method that improves the quality of recommendations derived from user-generated tags. Social tagging is used to capture and filter users’ preferences for items. In addition, we explore the advantages of semantic tagging for handling ambiguity, synonymy, and semantic interoperability, which are notable challenges in information filtering. The proposed approach first identifies semantically similar users from their social tags and then discovers semantically relevant items for each user. Experimental results show that our method offers significant advantages both in improving recommendation quality and in dealing with ambiguity, synonymy, and interoperability issues.

Notes

  1. http://ieml.org

  2. http://linkeddata.org/

  3. The star before an English expression marks a *tag, a natural language descriptor of an IEML expression. A *tag holds the place of an IEML expression by suggesting its meaning rather than uttering the IEML expression

  4. http://www.opencalais.com/

  5. Detailed formal models are presented in Appendix A

  6. In IEML notation, the former “Java” can be expressed as (l.i.-k.i.-’)[Java] which means “Java as a geographic unit” whereas the latter “Java” is (b.-’ b.e.-t.u.-wa.e.-’ E:T:.p.-’,)[Java] which means “Java as a programming language”

  7. http://bibsonomy.org

  8. The Cartesian product of two sets X and Y is written as follows: X × Y = {(x, y) | x ∈ X, y ∈ Y}.

  9. A powerset of S is the set of all subsets of S, including the empty set ∅.

Acknowledgment

Since 2009, this work has been funded mainly by the Canada Research Chair in Collective Intelligence at the University of Ottawa.

Author information

Corresponding author

Correspondence to Heung-Nam Kim.

Appendix A

1.1 IEML language model

We present the model of the IEML language, along with the model of semantic variables. Let $\Sigma$ be a nonempty, finite set of symbols, $\Sigma = \{S, B, T, U, A, E\}$. A string $s$ is a finite sequence of symbols chosen from $\Sigma$, and its length is denoted by $|s|$. The empty string $\varepsilon$ is the string with zero occurrences of symbols, so $|\varepsilon| = 0$. The set of all strings of length $k$ composed of symbols from $\Sigma$ is defined as $\Sigma^{k} = \{s \mid |s| = k\}$. Note that $\Sigma^{0} = \{\varepsilon\}$ and $\Sigma^{1} = \{S, B, T, U, A, E\}$. Although $\Sigma$ and $\Sigma^{1}$ contain exactly the same members, the former contains symbols and the latter strings. The set of all strings over $\Sigma$ is defined as $\Sigma^{*} = \Sigma^{0} \cup \Sigma^{1} \cup \Sigma^{2} \cup \Sigma^{3} \cup \cdots$

A useful operation on strings is concatenation. For any $s_i = a_1 a_2 a_3 a_4 \ldots a_i \in \Sigma^{*}$ and $s_j = b_1 b_2 b_3 b_4 \ldots b_j \in \Sigma^{*}$, $s_i s_j$ denotes their concatenation, such that $s_i s_j = a_1 a_2 a_3 a_4 \ldots a_i b_1 b_2 b_3 b_4 \ldots b_j$ and $|s_i s_j| = i + j$. The IEML language over $\Sigma$ is a subset of $\Sigma^{*}$, $L_{IEML} \subseteq \Sigma^{*}$:

$$ L_{IEML} = \left\{ s \in \Sigma^{*} \;\middle|\; |s| = 3^{l},\ 0 \leqslant l \leqslant 6 \right\} $$
(9)
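
As an illustration of Eq. 9, the following minimal Python sketch checks membership in $L_{IEML}$ and recovers the layer of a sequence. It assumes that strings over $\Sigma$ are represented as plain Python `str` values; the names `PRIMITIVES`, `layer_of`, `is_semantic_sequence`, and `count_sequences` are hypothetical and are not taken from any IEML tooling.

```python
from typing import Optional

# The alphabet Σ of six primitive symbols, as listed in the text.
PRIMITIVES = frozenset("SBTUAE")
MAX_LAYER = 6  # Eq. 9 restricts layers to 0 <= l <= 6


def layer_of(s: str) -> Optional[int]:
    """Return l such that |s| == 3**l with 0 <= l <= 6, or None if no such l exists."""
    for l in range(MAX_LAYER + 1):
        if len(s) == 3 ** l:
            return l
    return None


def is_semantic_sequence(s: str) -> bool:
    """True if s is a string over Σ whose length is 3**l for some valid layer l."""
    return all(ch in PRIMITIVES for ch in s) and layer_of(s) is not None


def count_sequences(l: int) -> int:
    """Number of strings of length 3**l over a six-symbol alphabet."""
    return len(PRIMITIVES) ** (3 ** l)
```

Under these assumptions, `"S"` is a sequence of layer 0 and `"SBT"` a sequence of layer 1, whereas `"SB"` (length 2) and the empty string (length 0) are not members of the language.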

1.2 Model of semantic sequences

Definition 3 (Semantic sequence)

A string $s$ is called a semantic sequence if and only if $s \in L_{IEML}$.

To denote the $n$-th primitive of a sequence $s$, we use a superscript $n$, where $1 \leq n \leq 3^{l}$, and write $s^{n}$. Note that for any sequence $s$ of layer $l$, $s^{n}$ is undefined for any $n > 3^{l}$. Two semantic sequences are distinct if and only if at least one of the following holds: (i) their layers are different, (ii) they are composed of different primitives, or (iii) their primitives do not follow the same order. For any $s_a$ and $s_b$,

$$ {s_a} = {s_b} \Leftrightarrow \forall n,s_a^n = s_b^n \wedge |{s_a}| = |{s_b}| $$
(10)

Let us now consider binary relations between semantic sequences in general. These are obtained by performing a Cartesian product of two sets (see Note 8). For any sets of semantic sequences $X$, $Y$ where $s_a \in X$ and $s_b \in Y$, and using Eq. 2, we define four binary relations $\text{whole} \subseteq X \times Y$, $\text{substance} \subseteq X \times Y$, $\text{attribute} \subseteq X \times Y$, and $\text{mode} \subseteq X \times Y$ as follows:

$$ \begin{array}{l} \text{whole} = \left\{ (s_a, s_b) \mid s_a = s_b \right\} \\ \text{substance} = \left\{ (s_a, s_b) \mid s_a^{n} = s_b^{n} \wedge |s_a| = 3|s_b|,\ 1 \leqslant n \leqslant |s_b| \right\} \\ \text{attribute} = \left\{ (s_a, s_b) \mid s_a^{n + |s_b|} = s_b^{n} \wedge |s_a| = 3|s_b|,\ 1 \leqslant n \leqslant |s_b| \right\} \\ \text{mode} = \left\{ (s_a, s_b) \mid s_a^{n + 2|s_b|} = s_b^{n} \wedge |s_a| = 3|s_b|,\ 1 \leqslant n \leqslant |s_b| \right\} \end{array} $$
(11)
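
The four relations can be sketched in Python as follows. This is only an illustration under stated assumptions: sequences are plain `str` values, and the substance, attribute, and mode relations are read as "$s_b$ equals the first, middle, or last third of $s_a$", respectively. The helper names `_third` and `roles` are hypothetical.

```python
def whole(s_a: str, s_b: str) -> bool:
    """The two sequences are equal."""
    return s_a == s_b


def _third(s_a: str, s_b: str, index: int) -> bool:
    """True if |s_a| == 3|s_b| and s_b equals the index-th third of s_a (0, 1, or 2)."""
    k = len(s_b)
    return len(s_a) == 3 * k and s_a[index * k:(index + 1) * k] == s_b


def substance(s_a: str, s_b: str) -> bool:
    return _third(s_a, s_b, 0)  # first third of s_a


def attribute(s_a: str, s_b: str) -> bool:
    return _third(s_a, s_b, 1)  # middle third of s_a


def mode(s_a: str, s_b: str) -> bool:
    return _third(s_a, s_b, 2)  # last third of s_a (assumed reading)


def roles(s_a: str, s_b: str) -> set:
    """Names of all relations in which the pair (s_a, s_b) stands."""
    checks = {"whole": whole, "substance": substance,
              "attribute": attribute, "mode": mode}
    return {name for name, rel in checks.items() if rel(s_a, s_b)}
```

For example, with `s_a = "SBT"`, the layer-0 sequence `"S"` stands only in the substance relation, `"B"` only in the attribute relation, and `"T"` only in the mode relation; with `s_a = "SSS"`, the sequence `"S"` stands in all three.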

Any two semantic sequences that are equal are in a whole relationship. In addition, any two semantic sequences that share specific subsequences may be in a substance, attribute, or mode relationship. For any two semantic sequences $s_a$ and $s_b$, if they are in one of the above relations, then we say that $s_b$ plays a role with respect to $s_a$, and we call $s_b$ a seme of the sequence.

Definition 4 (Seme of a sequence)

For any semantic sequences $s_a$ and $s_b$, if $(s_a, s_b) \in \text{whole} \cup \text{substance} \cup \text{attribute} \cup \text{mode}$, then $s_b$ plays a role with respect to $s_a$, and $s_b$ is called a seme.

We can now group distinct semantic sequences together into sets. A useful grouping is based on the layer of those semantic sequences.

1.3 Model of semantic categories

A category of $L_{IEML}$ is a subset such that all strings of that subset have the same length:

$$ c \subseteq L_{IEML} \quad \text{such that} \quad \forall s_i, s_j \in c:\ |s_i| = |s_j| $$
(12)

Definition 5 (Semantic category)

A semantic category c is a set containing semantic sequences at the same layer.

The layer of any category $c$ is exactly the same as the layer of the semantic sequences included in that category. The set of all categories of layer $l$ is given as the powerset (see Note 9) of the set of all strings of layer $l$ of $L_{IEML}$:

$$ {C_l} = Powerset\left( {\left\{ {s \in {L_{IEML}}\,where\,\left| s \right| = {3^l}} \right\}} \right) $$
(13)

Two categories are distinct if and only if they differ by at least one element. For any $c_a$ and $c_b$:

$$ {c_a} = {c_b} \Leftrightarrow {c_a} \subseteq {c_b} \wedge {c_b} \subseteq {c_a} $$
(14)

A weaker condition can be applied to categories of distinct layers (since two categories are different if their layers are different) and is written as:

$$ l({c_a}) \ne l({c_b}) \Rightarrow {c_a} \ne {c_b} $$
(15)

where $l(c_a)$ and $l(c_b)$ denote the layers of categories $c_a$ and $c_b$, respectively. Analogously to sequences, we consider binary relations between any categories $c_i$ and $c_j$ where $l(c_i), l(c_j) \geq 1$. For any sets of categories $X$, $Y$ where $c_a \in X$ and $c_b \in Y$, we define four binary relations $\text{whole}_C \subseteq X \times Y$, $\text{substance}_C \subseteq X \times Y$, $\text{attribute}_C \subseteq X \times Y$, and $\text{mode}_C \subseteq X \times Y$ as follows:

$$ \begin{array}{*{20}c} {{{\text{whole}}_{{\text{C}}} = {\left\{ {{\left( {c_{a} ,c_{b} } \right)}\left| {c_{a} = c_{b} } \right.} \right\}}}} \\ {{{\text{substance}}_{{\text{C}}} = {\left\{ {{\left( {c_{a} ,c_{b} } \right)}\left| {\forall s_{a} \in c_{a} ,\exists s_{b} \in c_{b} ,{\left( {s_{a} ,s_{b} } \right)} \in {\text{substance}}} \right.} \right\}}}} \\ {{{\text{attribute}}_{{\text{C}}} = {\left\{ {{\left( {c_{a} ,c_{b} } \right)}\left| {\forall s_{a} \in c_{a} ,\exists s_{b} \in c_{b} ,{\left( {s_{a} ,s_{b} } \right)} \in {\text{attribute}}} \right.} \right\}}}} \\ {{{\text{mode}}_{{\text{C}}} = {\left\{ {{\left( {c_{a} ,c_{b} } \right)}\left| {\forall s_{a} \in c_{a} ,\exists s_{b} \in c_{b} ,{\left( {s_{a} ,s_{b} } \right)} \in {\text{mode}}} \right.} \right\}}}} \\ \end{array} $$
(16)
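
Eq. 16 lifts each sequence-level relation to categories with a "for all $s_a$, there exists $s_b$" pattern. The sketch below shows that lifting in Python under the assumption that a category is a `frozenset` of `str` sequences and that the substance, attribute, and mode relations pick out the first, middle, and last third of $s_a$. The names `third_relation` and `lift_to_categories` are hypothetical.

```python
from typing import Callable, FrozenSet

Category = FrozenSet[str]
SequenceRelation = Callable[[str, str], bool]


def third_relation(index: int) -> SequenceRelation:
    """Sequence-level relation: s_b equals the index-th third of s_a."""
    def rel(s_a: str, s_b: str) -> bool:
        k = len(s_b)
        return len(s_a) == 3 * k and s_a[index * k:(index + 1) * k] == s_b
    return rel


def lift_to_categories(rel: SequenceRelation) -> Callable[[Category, Category], bool]:
    """Lift rel to categories: every s_a in c_a has some s_b in c_b with rel(s_a, s_b)."""
    def category_rel(c_a: Category, c_b: Category) -> bool:
        # Note: all() over an empty c_a is vacuously True.
        return all(any(rel(s_a, s_b) for s_b in c_b) for s_a in c_a)
    return category_rel


def whole_c(c_a: Category, c_b: Category) -> bool:
    """Category-level whole relation: plain set equality."""
    return c_a == c_b


substance_c = lift_to_categories(third_relation(0))
attribute_c = lift_to_categories(third_relation(1))
mode_c = lift_to_categories(third_relation(2))
```

For instance, with `c_a = frozenset({"SBT", "SAT"})`, the category `frozenset({"S"})` is a substance seme of `c_a`, while `frozenset({"B"})` is not an attribute seme, because `"SAT"` has no matching middle third in it.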

For any two categories $c_a$ and $c_b$, if they are in one of the above relations, that is, $(c_a, c_b) \in \text{whole}_C \cup \text{substance}_C \cup \text{attribute}_C \cup \text{mode}_C$, then we say that $c_b$ plays a role with respect to $c_a$, and $c_b$ is called a seme of the category.

1.4 Model of catsets

A catset is a set of distinct categories of the same layer, as formalized in Definition 6.

Definition 6 (Catset)

A catset $\kappa$ is a set containing categories such that $\kappa = \{c_n \mid \forall i, j:\ c_i \neq c_j,\ l(c_i) = l(c_j)\}$.

The layer of a catset is given by the layer of any of its members: if some $c \in \kappa$, then $l(\kappa) = l(c)$. Note that a category $c$ can be written as $c \in C_l$, while a catset $\kappa$ can be written as $\kappa \subseteq C_l$. All standard set operations, such as union ($\cup$) and intersection ($\cap$), can be performed on catsets of the same layer.
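
A minimal Python sketch of catsets follows, assuming that a category is a `frozenset` of `str` sequences and a catset is a `frozenset` of categories, so that Python's built-in `|` and `&` act as union and intersection. The function names are hypothetical. As a limitation of this sketch, the empty category is rejected, because its layer cannot be inferred from its members.

```python
from typing import FrozenSet

Category = FrozenSet[str]
Catset = FrozenSet[Category]


def category_layer(c: Category) -> int:
    """Layer l of a non-empty category whose sequences all have length 3**l."""
    lengths = {len(s) for s in c}
    if len(lengths) != 1:
        raise ValueError("category must be non-empty with sequences of one length")
    length = lengths.pop()
    for l in range(7):  # layers 0..6
        if 3 ** l == length:
            return l
    raise ValueError("sequence length is not 3**l for 0 <= l <= 6")


def catset_layer(kappa: Catset) -> int:
    """Layer of a catset: the common layer of all of its categories."""
    layers = {category_layer(c) for c in kappa}
    if len(layers) != 1:
        raise ValueError("catset must be non-empty with categories of one layer")
    return layers.pop()


# Two layer-0 catsets that share the category {"S"}.
k1: Catset = frozenset({frozenset({"S"}), frozenset({"B", "T"})})
k2: Catset = frozenset({frozenset({"S"}), frozenset({"U"})})

union = k1 | k2          # three distinct categories
intersection = k1 & k2   # only the shared category {"S"}
```

Because `frozenset` already discards duplicates, the requirement that the categories of a catset be distinct holds by construction in this representation.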

1.5 Model of uniform semantic locator

A USL is composed of up to seven catsets of different layers as follows:

Definition 7 (Uniform Semantic Locator, USL)

A USL $\upsilon$ is a set containing catsets of different layers such that $\upsilon = \{\kappa_n \mid \forall i, j,\ i \neq j:\ l(\kappa_i) \neq l(\kappa_j)\}$.

Note that since there are seven distinct layers, a USL can have at most seven members. All standard set operations on USLs, such as union ($\cup$) and intersection ($\cap$), are always performed on sets of categories (and therefore on sets of sequences), layer by layer. Since at each layer $l$ there are $|C_l|$ distinct catsets, the whole semantic space is defined by the tuple: Ł $= C_0 \times C_1 \times C_2 \times C_3 \times C_4 \times C_5 \times C_6$.
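
The layer-by-layer behaviour of USL operations can be sketched as follows. This is an illustration rather than a reference implementation: it assumes a USL is stored as a dictionary from layer number to catset (which enforces at most one catset per layer), and that union and intersection are applied to the catsets as sets of categories. The names `usl_union` and `usl_intersection` are hypothetical.

```python
from typing import Dict, FrozenSet

Category = FrozenSet[str]
Catset = FrozenSet[Category]
USL = Dict[int, Catset]  # layer (0..6) -> catset of that layer


def usl_union(u: USL, v: USL) -> USL:
    """Union of two USLs, taken separately at each layer present in either one."""
    return {l: u.get(l, frozenset()) | v.get(l, frozenset())
            for l in sorted(u.keys() | v.keys())}


def usl_intersection(u: USL, v: USL) -> USL:
    """Intersection of two USLs, keeping only layers where shared categories remain."""
    result: USL = {}
    for l in sorted(u.keys() & v.keys()):
        common = u[l] & v[l]
        if common:
            result[l] = common
    return result


# Example USLs with catsets at layers 0 and 1.
u: USL = {0: frozenset({frozenset({"S"})}),
          1: frozenset({frozenset({"SBT"})})}
v: USL = {0: frozenset({frozenset({"S"}), frozenset({"U"})})}

merged = usl_union(u, v)
shared = usl_intersection(u, v)
```

Here `merged` has catsets at layers 0 and 1, while `shared` keeps only layer 0 with the single common category `{"S"}`.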

Cite this article

Kim, HN., Roczniak, A., Lévy, P. et al. Social media filtering based on collaborative tagging in semantic space. Multimed Tools Appl 56, 63–89 (2012). https://doi.org/10.1007/s11042-010-0557-4
