Comparing student and expert-based tagging of recorded lectures


In this paper we analyse the way students tag recorded lectures. We compare their tagging strategy and the tags that they create with tagging done by an expert. We look at the quality of the tags students add, and we introduce a method of measuring how similar the tags are, using vector space modelling and cosine similarity. We show that the quality of tagging by students is high enough to be useful. We also show that there is no generic vocabulary gap between the expert and the students. Our study shows no statistically significant correlation between the tag similarity and the indicated interest in the course, the perceived importance of the course, the number of lectures attended, the indicated difficulty of the course, the number of recorded lectures viewed, the indicated ease of finding the needed parts of a recorded lecture, or the number of tags used by the student.


  1. Abowd, G., Atkeson, C., Brotherton, J., Enqvist, T., Gulley, P., & LeMon, J. (1998). Investigating the capture, integration and access problem of ubiquitous computing in an educational setting. Paper presented at the SIGCHI conference on Human factors in computing systems, Los Angeles, California, United States.

  2. Abowd, G., Brotherton, J., & Bhalodia, J. (1998). Classroom 2000: a system for capturing and accessing multimedia classroom experiences. Paper presented at the CHI 98 conference on Human factors in computing systems, Los Angeles, California, United States.

  3. Bateman, S., Brooks, C., McCalla, G., & Brusilovsky, P. (2007). Applying collaborative tagging to e-learning. Proc. of ACM WWW, 3(4).

  4. Bligh, D. (1998). What is the use of lectures? (5th ed.). Bristol: Intellect Books.

  5. Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: Handbook I: Cognitive domain. New York: David McKay, 19, 56.

  6. Brotherton, J., & Abowd, G. (2004). Lessons learned from eClass: assessing automated capture and access in the classroom. ACM Transactions on Computer-Human Interaction (TOCHI), 11(2), 121–155.

  7. De Backer, G. (2012). PHP Dutch Stemmer Retrieved 10-12-2012, from Delicious. Retrieved 21-1-2013, from

  8. Exley, K., & Dennick, R. (2004). Giving a lecture: From presenting to teaching. New York: Routledge/Falmer.

  9. Golder, S. A., & Huberman, B. A. (2005). The structure of collaborative tagging systems:

  10. Golder, S. A., & Huberman, B. A. (2006). Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2), 198–208.

  11. Gorissen, P., van Bruggen, J. M., & Jochems, W. (2012a). Analysing students' use of recorded lectures through methodological triangulation. Paper presented at the Workshop on Learning Technology for Education in Cloud (LTEC'12), Salamanca, Spain.

  12. Gorissen, P., van Bruggen, J. M., & Jochems, W. (2012b). Students and recorded lectures: survey on current use and demands for higher education. Research in Learning Technology, 20(3), 297–311.

  13. Guy, M., & Tonkin, E. (2006). Tidying up tags. D-Lib Magazine, 12(1), 1082–9873.

  14. John, A., & Seligmann, D. (2006). Collaborative tagging and expertise in the enterprise. Paper presented at the Collab. Web Tagging Workshop in conj. with WWW2006.

  15. Kraaij, W., & Pohlmann, R. (1994). Porter’s stemming algorithm for Dutch. Informatiewetenschap, 167–180.

  16. Longmire, W. (2000). A primer on learning objects. Learning Circuits, 1(3).

  17. Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. MIT Press.

  18. Markines, B., Cattuto, C., Menczer, F., Benz, D., Hotho, A., & Stumme, G. (2009). Evaluating similarity measures for emergent semantics of social tagging.

  19. Marlow, C., Naaman, M., Boyd, D., & Davis, M. (2006). Position paper, tagging, taxonomy, flickr, article, toread. In: Collaborative Web Tagging Workshop at WWW2006, 31-41, Edinburgh, Scotland.

  20. Mathes, A. (2004). Folksonomies-cooperative classification and communication through shared metadata. Computer Mediated Communication, 47(10).

  21. O’Reilly, T. (2005). What is Web 2.0: design patterns and business models for the next generation of software. Communications & Strategies, (65), 17.

  22. Porter, M. F. (1997). An algorithm for suffix stripping. In K. Sparck Jones & P. Willett (Eds.), Readings in information retrieval (pp. 313–316). Morgan Kaufmann Publishers.

  23. Porter, M. F. (2001). Snowball: A language for stemming algorithms. Retrieved 10-12-2012, from

  24. Srinivas, G., Tandon, N., & Varma, V. (2010). A weighted tag similarity measure based on a collaborative weight model. Paper presented at the Proceedings of the 2nd international workshop on Search and mining user-generated contents.

  25. Vander Wal, T. (2007). Folksonomy. Retrieved 21-1-2013, from

  26. Velsen, L., & Melenhorst, M. (2008). User Motives for Tagging Video Content.

  27. Verburg, J. (2010). Live & social tagging van weblectures [live and social tagging of weblectures]. Utrecht: Hogeschool Utrecht.

  28. Voss, J. (2007). Tagging, Folksonomy & Co-Renaissance of Manual Indexing? Paper presented at the 10th International Symposium for Information Science, Cologne.

Author information



Corresponding author

Correspondence to Pierre Gorissen.


Appendix 1 Tagging protocol

To determine where to place the tags, we used signals, or signposts, provided by the structure of the lecture. Exley and Dennick (2004) describe a number of possible lecture structures a lecturer can choose from. They distinguish a number of different types of statements a lecturer can use to inform students about the lecture organization: signposts, frames, foci, and links. Besides these statements by the lecturer, the structure of the slides also provides a signal for the lecture organization. The slides contain titles, lists of important topics, schemas with the structure of the topics, etc. Based on these signals, we created the following tagging protocol for our experiments:

  1. Examine the lecture structure (see Table 6). This gives an indication of the kinds of indicators that signal useful tags.

    Table 6 Lecture structures

  2. Play back the recorded lecture and, while it plays, listen for oral signals by the lecturer that indicate signposts, frames, foci, or links.

  3. Mark potential tags. Pause the recording and write down the time code along with a potential tag title and a short description of the tag.

  4. After completing the tagging process, add the tags, descriptions, and time codes to the tagged player system.

  5. Always add a tag at 00:00:00, indicating the beginning of the recording. This gives the student an easy way to return to the beginning of the recording.
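As an illustration of steps 3 to 5, the notes taken during tagging can be thought of as small records of time code, title, and description. The following Python sketch is not the tagged player system itself: the `Tag` class, the `with_start_tag` helper, and the default title "Start" are hypothetical names introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One tag noted during playback (hypothetical record layout)."""
    time_code: str          # position in the recording, as "HH:MM:SS"
    title: str
    description: str = ""


def with_start_tag(tags, title="Start"):
    """Return the tags ordered by time code, adding a 00:00:00 tag if missing."""
    tags = list(tags)
    if not any(tag.time_code == "00:00:00" for tag in tags):
        tags.append(Tag("00:00:00", title, "Beginning of the recording"))
    # Zero-padded "HH:MM:SS" strings sort correctly as plain text.
    return sorted(tags, key=lambda tag: tag.time_code)


notes = [Tag("00:12:30", "Homeostasis", "Lecturer introduces the topic")]
for tag in with_start_tag(notes):
    print(tag.time_code, tag.title)
```

Keeping the time code as a zero-padded string is a simplification that makes the records easy to write down by hand and still sortable.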

Appendix 2 Vector space modelling

To analyze the similarity of the tags added by the students and the expert, we converted the tags into vectors. We will use an example to explain the steps taken to create the vectors during the analysis. Table 7 shows an example for a (short) video where four students (student 1 through student 4) have assigned a total of five different multi-word tags to that video:

Table 7 Raw frequencies for the tags used by the students

The table shows that student 1 used the tags “Intra cellular” and “Cells” once each and that he/she did not use any of the other tags. It also shows that student 3 used “Intra cellular” and “Function of a cell” once and “Homeostasis” twice, etc. To calculate the similarity between the tags assigned by the students, we use a method from the information retrieval and natural language processing domain (Manning and Schütze 1999). The foundation of this method is that the tags are translated into vectors. This translation results in a vector space where the tags are the axes of the space and the students who added the tags are the points or vectors in this space. We can then compare those vectors to calculate a score for their similarity. We will use the example in Table 7 to clarify the steps taken.

The term frequency \( \mathrm{tf}_{t,s} \) of a tag is defined as the number of times that tag \( t \) has been used by student \( s \). So in our example, \( \mathrm{tf}_{1,1}=1 \), \( \mathrm{tf}_{2,3}=0 \) and \( \mathrm{tf}_{5,4}=2 \). We can represent this table as a set of four vectors, one for each student, where each vector has five dimensions (one for each tag). For example, the vectors for student 1 and student 4 look like this:

$$ \overrightarrow{s1}=\left[\begin{array}{c}1\\ 0\\ 0\\ 0\\ 1\end{array}\right],\quad \overrightarrow{s4}=\left[\begin{array}{c}0\\ 1\\ 0\\ 0\\ 2\end{array}\right] $$
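As a minimal sketch of this step, the raw-frequency vectors can be built by counting how often each student used each tag. The positions of “Intra cellular” (first) and “Cells” (last) follow the vectors above; the order of the remaining tags and the placeholder name "Tag 2" for the tag whose name is not given in the text are assumptions made for illustration.

```python
from collections import Counter

# The tag order defines the vector dimensions. "Tag 2" is a placeholder name
# and the order of the middle tags is an assumption, not taken from Table 7.
VOCABULARY = ["Intra cellular", "Tag 2", "Function of a cell", "Homeostasis", "Cells"]


def tf_vector(tags_used, vocabulary=VOCABULARY):
    """Return the raw term frequencies for one student, in vocabulary order."""
    counts = Counter(tags_used)
    return [counts[tag] for tag in vocabulary]


student_1 = tf_vector(["Intra cellular", "Cells"])
student_4 = tf_vector(["Tag 2", "Cells", "Cells"])
print(student_1)  # [1, 0, 0, 0, 1]
print(student_4)  # [0, 1, 0, 0, 2]
```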

Student 1 and student 4 have one tag in common (“Cells”), but student 4 used it twice. This tag is more important to student 4, and we want to take that into account when we calculate the similarity of tag usage between the two students. However, a tag used twice is not necessarily twice as relevant, so we use the log frequency weight rather than the raw frequency of the term, which makes relevance grow less than linearly with frequency:

$$ w_{t,s}=\left\{\begin{array}{ll}1+\log_{10}\mathrm{tf}_{t,s} & \text{if } \mathrm{tf}_{t,s}>0\\ 0 & \text{otherwise}\end{array}\right. $$
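The weighting can be sketched in a few lines of Python. The function names `log_weight` and `weight_vector` are chosen here for illustration only. Applied to the vector of student 4, the tag used twice receives a weight of about 1.301 instead of 2.

```python
import math


def log_weight(tf):
    """Log frequency weight: 1 + log10(tf) when tf > 0, otherwise 0."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


def weight_vector(tf_vector):
    """Apply the log frequency weight to every component of a vector."""
    return [log_weight(tf) for tf in tf_vector]


student_4 = [0, 1, 0, 0, 2]
print([round(w, 3) for w in weight_vector(student_4)])
```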

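To calculate the similarity of the tag usage, we calculate the similarity of the vectors. We cannot simply take the Euclidean distance between the vectors, because that distance depends on the lengths of the vectors and not only on their direction: two students who use the same tags in the same proportions, but a different number of times, would appear dissimilar. Instead, we calculate the cosine similarity of the students' vectors. To do that, we first length-normalize each vector by dividing each of its components by the vector's length. For the length we use the L2 norm: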

$$ \Vert \overrightarrow{x} \Vert_2=\sqrt{\sum_i x_i^2} $$

So if student 1 is represented by the vector \( \overrightarrow{s1} \) and student 2 is represented by the vector \( \overrightarrow{s2} \), then the cosine similarity for the tags used by the two students, taking into account length normalization of the vectors, is as follows, where \( |V| \) is the number of distinct tags (the number of dimensions of the vectors):

$$ \cos \left(\overrightarrow{s1},\overrightarrow{s2}\right)=\frac{\sum_{i=1}^{|V|} s1_i \, s2_i}{\sqrt{\sum_{i=1}^{|V|} s1_i^2}\,\sqrt{\sum_{i=1}^{|V|} s2_i^2}} $$
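The formula can be sketched as a small Python function that length-normalizes both vectors and then sums the products of their components. The function names are illustrative, and returning a zero vector for a student without tags is an assumption made here, since the cosine is undefined in that case.

```python
import math


def l2_norm(vector):
    """Length of a vector under the L2 norm."""
    return math.sqrt(sum(x * x for x in vector))


def normalize(vector):
    """Divide every component by the vector's length.

    Assumption: a vector of only zeros (no tags used) is returned unchanged.
    """
    length = l2_norm(vector)
    if length == 0:
        return [0.0 for _ in vector]
    return [x / length for x in vector]


def cosine_similarity(a, b):
    """Cosine similarity: dot product of the two length-normalized vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


# Two weighted vectors that share two of their tags.
print(round(cosine_similarity([1, 0, 0, 0, 1], [1, 0, 1, 1, 1]), 4))  # 0.7071
```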

If we apply both the weighting factor and the normalization to the vectors and represent the result in our original table again, it looks like Table 8:

Table 8 Normalized vectors

We can now calculate the similarity between the tag usages of two students as the dot product of their normalized vectors, that is, by multiplying the corresponding components of the vectors and summing the products. In our example, the similarity of the tag usage of student 1 and student 2 can be calculated as: (0.7071 * 0.5) + (0 * 0) + (0 * 0.5) + (0 * 0.5) + (0.7071 * 0.5) = 0.7071

This then results in a 4 × 4 table for the mutual similarities between the four students given the example set of five tags (see Table 9).

Table 9 Similarity scores for the students

The diagonal cells in Table 9 are one, indicating that if we compare a vector with itself (e.g., vector \( \overrightarrow{s1} \) with vector \( \overrightarrow{s1} \), or compare the tags used by student 1 with student 1), there is a perfect match. Also, each value is present twice in the table, because comparing vector \( \overrightarrow{s1} \) with vector \( \overrightarrow{s2} \) gives the same similarity score as when we compare vector \( \overrightarrow{s2} \) with vector \( \overrightarrow{s1} \). Table 9 shows that student 3 and student 2 have the highest similarity in tags (similarity score is 0.8589), while student 3 and student 4 have no tag similarity (similarity score is 0). If we remove the duplicate information from the table, the resulting table looks like Table 10.

Table 10 Similarity scores for the students (optimized)
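The whole procedure of this appendix can be sketched end to end in Python. The vectors for student 1 and student 4 are the ones given in the text. The vectors for student 2 and student 3 are assumptions reconstructed from the prose and the worked calculation, as is the order of the third and fourth tag. The function names are illustrative. The sketch keeps each pair of students once and omits the diagonal; whether Table 10 keeps the diagonal is not known here.

```python
import math

# Raw tag frequencies per student, one entry per tag (five tags in total).
# Students 2 and 3 are reconstructed assumptions, not copied from Table 7.
RAW = {
    "student 1": [1, 0, 0, 0, 1],
    "student 2": [1, 0, 1, 1, 1],
    "student 3": [1, 0, 1, 2, 0],
    "student 4": [0, 1, 0, 0, 2],
}


def log_weight(tf):
    """Log frequency weight: 1 + log10(tf) when tf > 0, otherwise 0."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


def normalize(vector):
    """Length-normalize a vector with the L2 norm (zero vectors stay zero)."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector] if length > 0 else list(vector)


def similarity_table(raw):
    """Full table of cosine similarities between all students."""
    unit = {name: normalize([log_weight(tf) for tf in vec])
            for name, vec in raw.items()}
    return {(a, b): sum(x * y for x, y in zip(unit[a], unit[b]))
            for a in unit for b in unit}


def upper_triangle(table, names):
    """Keep each unordered pair of students once, without the diagonal."""
    return {(a, b): table[(a, b)]
            for i, a in enumerate(names) for b in names[i + 1:]}


table = similarity_table(RAW)
pairs = upper_triangle(table, list(RAW))
for (a, b), score in pairs.items():
    print(f"{a} vs {b}: {score:.4f}")
```

Under these assumptions the sketch reproduces the scores mentioned in the text: 0.7071 for student 1 and student 2, 0.8589 for student 2 and student 3, and 0 for student 3 and student 4.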

About this article

Cite this article

Gorissen, P., van Bruggen, J. & Jochems, W. Comparing student and expert-based tagging of recorded lectures. Educ Inf Technol 20, 161–181 (2015).


  • Tagging
  • Recorded lectures
  • Cosine similarity
  • Vector space modelling