Traditionally, direct marketing companies have relied on pre-testing to select the best offers to send to their audience. Companies systematically dispatch the offers under consideration to a limited sample of potential buyers, rank them with respect to their performance and, based on this ranking, decide which offers to send to the wider population. Though this pre-testing process is simple and widely used, recently the industry has been under increased pressure to further optimize learning, in particular when facing severe time and learning space constraints. The main contribution of the present work is to demonstrate that direct marketing firms can exploit the information on visual content to optimize the learning phase. This paper proposes a two-phase learning strategy based on a cascade of regression methods that takes advantage of the visual and text features to improve and accelerate the learning process. Experiments in the domain of a commercial Multimedia Messaging Service (MMS) show the effectiveness of the proposed methods and a significant improvement over traditional learning techniques. The proposed approach can be used in any multimedia direct marketing domain in which offers comprise both a visual and text component.
Click-through rate (CTR) is a common measure of success for advertising campaigns targeted at mobile devices. For the scope of our paper, it is measured as the ratio of the number of users who clicked a specific offer to the total number of users who were exposed to that offer.
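The CTR definition above can be sketched in a few lines of Python. This is only an illustration: the `OfferStats` record, the offer names, and the counts are hypothetical and do not come from the dataset used in the paper. Returning 0.0 for an offer with no exposures is likewise an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OfferStats:
    """Exposure and click counts for one offer (hypothetical record type)."""
    offer_id: str
    exposed_users: int   # users who were exposed to the offer
    clicking_users: int  # users who clicked the offer


def click_through_rate(stats: OfferStats) -> float:
    """CTR = users who clicked / users exposed (0.0 if nobody was exposed)."""
    if stats.exposed_users == 0:
        return 0.0
    return stats.clicking_users / stats.exposed_users


def rank_by_ctr(offers):
    """Return the offers sorted from highest to lowest CTR."""
    return sorted(offers, key=click_through_rate, reverse=True)


# Made-up counts, for illustration only.
sample = [
    OfferStats("ringtone", exposed_users=1000, clicking_users=30),
    OfferStats("wallpaper", exposed_users=800, clicking_users=40),
    OfferStats("game", exposed_users=500, clicking_users=5),
]
ranking = [offer.offer_id for offer in rank_by_ctr(sample)]
```

Ranking offers by this ratio mirrors the pre-testing step described in the abstract, where offers sent to a limited sample are ordered by their observed performance.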
For mobile operators, sending commercial messages to their customers is very cost-effective: operators can easily reach millions of potential buyers at little cost, making the profit potential of these advertising-related services very high. In addition, for mobile phone operators, market saturation and fierce competition have turned value added services (VAS), like the ones these commercial messages advertise, into a significant revenue source and, in some cases, the only opportunity for revenue growth. Because these services are now central to profitability, mobile phone operators and independent production companies are becoming increasingly creative in generating and proposing new services and offers. The result is a rapidly growing set of available services.
In the following section we explain in more detail what commercial mobile multimedia messages are and present several examples.
Given the speed of offer production in our application, even with daily contact (e.g., daily messages sent to mobile phone users), the number of offers to be tested grows at a faster pace than the rate at which a traditional pre-testing system is able to learn (while at the same time keeping enough potential customers for optimized delivery).
The real targeting system could reach millions of users, but large segments of users would have to receive the same message. Only a maximum of 20 messages could be sent daily.
By definition, a holistic cue is one that is processed over the entire human visual field and does not require attention to analyze local features.
Using the Bayesian classifier, one can infer the presence of faces in an image from skin appearance in the pixel domain; likewise, an outdoor context can be inferred from sky and/or vegetation appearance. We used these three types of visual information (skin, sky, and vegetation) in our system, following an earlier proposal, and described each image by the percentage of pixels that the Bayesian classifier assigned to each of these appearance classes. The disadvantage of this method is that it requires hand-labeling of a training set.
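The per-image descriptor described above (percentage of pixels assigned to each appearance class) can be sketched as follows. This is a minimal illustration under several assumptions that the text does not confirm: pixels are represented as raw RGB triples, each class is modeled with independent Gaussians per channel (a Gaussian naive Bayes classifier), and an extra "other" class absorbs pixels that match none of the three appearance classes. The function names and the tiny hand-labeled training set are hypothetical.

```python
import math
from collections import Counter


def fit_gaussian_nb(labeled_pixels):
    """Estimate log-prior, per-channel mean and variance for each class.

    labeled_pixels maps a class name to a list of hand-labeled (r, g, b) tuples.
    """
    total = sum(len(pixels) for pixels in labeled_pixels.values())
    model = {}
    for label, pixels in labeled_pixels.items():
        n = len(pixels)
        means = [sum(p[c] for p in pixels) / n for c in range(3)]
        # A floor of 1.0 on the variance avoids division by zero (assumption).
        variances = [
            max(sum((p[c] - means[c]) ** 2 for p in pixels) / n, 1.0)
            for c in range(3)
        ]
        model[label] = (math.log(n / total), means, variances)
    return model


def classify_pixel(model, pixel):
    """Return the class with the highest log-posterior for one pixel."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(pixel, means, variances)
        )
    return max(model, key=log_posterior)


def class_percentages(model, image_pixels):
    """Describe an image by the percentage of pixels assigned to each class."""
    counts = Counter(classify_pixel(model, p) for p in image_pixels)
    n = len(image_pixels)
    return {label: 100.0 * counts[label] / n for label in model}


# Tiny made-up training set standing in for the hand-labeled pixels.
training = {
    "skin": [(224, 172, 150), (198, 134, 110), (240, 190, 170)],
    "sky": [(120, 170, 235), (135, 190, 250), (100, 150, 220)],
    "vegetation": [(40, 120, 50), (60, 150, 60), (30, 100, 40)],
    "other": [(20, 20, 20), (128, 128, 128), (250, 250, 250)],
}
model = fit_gaussian_nb(training)

# A five-pixel "image", flattened to a list of RGB tuples.
image = [(230, 180, 160), (125, 175, 240), (45, 125, 55),
         (50, 130, 50), (128, 128, 128)]
features = class_percentages(model, image)
```

The resulting dictionary plays the role of the image description: one number per appearance class. The need for the `training` dictionary also makes the stated drawback concrete, since every pixel in it has to be labeled by hand.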
Taking into account the overall simulation settings, 30 offers per day is an arrival rate comparable to the mean arrival rate observed in the real system.
The authors would like to thank Daniele Ravì for helping in the implementation of the simulation studies. The authors would also like to thank Neodata Group for giving access to the mobile messaging dataset, and for helping in the implementation and testing of the proposed approach.
Battiato, S., Farinella, G.M., Giuffrida, G. et al. Using visual and text features for direct marketing on multimedia messaging services domain. Multimed Tools Appl 42, 5–30 (2009). https://doi.org/10.1007/s11042-008-0250-z
- Visual and text features
- Learning in time and space constrained domains
- Multimedia messaging services
- Direct marketing