1 Introduction

A trademark may be a recognizable word, phrase, symbol, sound, color, scent, or design, or a combination of these, that identifies the products or services of an individual, an organization, or a particular source and distinguishes them from those of others. The two primary purposes of a trademark are to (1) protect brand names and logos used on goods and services and give the trademark owner the exclusive right to use the mark, and (2) act as a source indicator so that consumers can be confident that the products and services they use under a particular brand emanate from the source they expect [1]. Selecting a mark is the first step in the overall trademark application and registration process. One of the key factors in choosing a mark and filing it for registration is determining whether a “likelihood of confusion” [2] exists between the mark being filed and any mark that has already been registered or filed. The USPTO examines every trademark application for compliance with federal rules and laws and grants registration when, among a host of other factors, no likelihood of confusion exists. In fact, a likelihood of confusion between the mark being filed and a mark that is already registered or in a pending application is the most common reason for refusal of a trademark application. Therefore, before filing, each trademark applicant is strongly encouraged, though not required, to conduct a thorough trademark search to determine whether the proposed mark is likely to cause confusion with any existing registered trademarks or pending trademark applications [1].

Currently, trademark applicants and/or their attorneys and representatives manually search the USPTO’s database of active and inactive trademark registrations and applications using the Trademark Electronic Search System (TESS) search engine. This search engine provides access to crucial information such as the text and images of registered marks and of marks in pending and abandoned applications. During this search phase, the applicants or their attorneys or representatives visually determine whether any identical or similar marks for related goods and/or services have already been registered or are pending. Furthermore, a thorough study of each mark is required to determine that the goods and services are not related [1]. Once the trademark application has been filed, it is forwarded to a trademark examining attorney for legal review. During the review phase, the USPTO examining attorneys also manually search existing USPTO records of registered trademarks and prior pending applications to determine potential likelihood of confusion, using the USPTO search system known as X-search, which uses the same database as TESS through a different interface. Overall, manually researching and identifying marks with similar text and image characteristics is a complex task that often takes a substantial amount of time.

Recently, Content-Based Image Retrieval (CBIR) systems have advanced image retrieval and recognition methods by finding and retrieving images independently of their metadata. In CBIR, global and local low-level features are extracted from the visual content of an image, such as shape, texture, and color, or from any other information that can be derived from the image itself. Similarly, Convolutional Neural Networks (CNNs) [3] have achieved great success in the field of computer vision and demonstrated excellent performance in large-scale image classification [4] and object detection [5]. Moreover, in the last few years, CNNs have emerged as a methodology for extracting features [6, 7] such as basic shapes, textures, and colors from unlabeled data. Most notably, deep learning-based methods advanced significantly after Krizhevsky et al. [8] won the ILSVRC 2012 challenge with a CNN-based entry that achieved a top-5 test error rate of 15.3%, after reporting top-1 and top-5 error rates of 37.5% and 17.0% on the ILSVRC 2010 test set. This was made possible by the rapid growth in the amount of annotated data [9], powerful graphics processing units (GPUs) [10], and advancements in computing architecture. Additionally, in the last few years, the depth of CNNs has grown greatly, from 8 layers (AlexNet) [8] to 19 layers (VGGNet) [4], 22 layers (GoogLeNet) [11], and even 152 layers (ResNet) [12], improving overall classification accuracy. Furthermore, numerous deep learning libraries and platforms, such as TensorFlow [13], Theano [14], Caffe [15], Torch [16], and the Computational Network Toolkit [17], have been developed and released as open source, enabling further research and simplifying the development of deep neural networks.

In this paper, we address the problem of searching for trademarks similar to a chosen mark using a neural network pre-trained on the trademark dataset. The TensorFlow-Slim high-level neural network API was first used to extract image features from the pre-trained Inception-ResNet-v2 [18] network. An approximate nearest neighbor algorithm was then used to identify the “nearest neighbors”, that is, the trademarks most similar to the input mark.

2 Approaches

2.1 Content-Based Image Retrieval Approach

Lucene Image Retrieval (LIRE) [21], an open source Java library, was used to extract the global and local features of the downloaded trademark images. The global features extracted were the Joint Composite Descriptor (JCD), the Pyramid Histogram of Oriented Gradients (PHOG), the MPEG-7 Scalable Color descriptor, the Color and Edge Directivity Descriptor (CEDD), and the Fuzzy Color and Texture Histogram (FCTH). In addition, local features were extracted using the OpenCV implementations of SIFT and SURF. The extracted global and local image features were then stored in a Lucene index for later retrieval. To identify similar images, LIRE either took the input query feature directly or extracted the feature from the input image. A linear search was then performed by reading the images from the stored Lucene index sequentially and comparing each with the input image, returning a ranked list of the n best-matching candidates.
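The index-then-scan procedure described above can be illustrated with a minimal sketch. This is not LIRE or its API: the coarse RGB histogram is a simple stand-in for a global descriptor, the L1 distance is an assumed similarity measure, and all names (`color_histogram`, `linear_search`, the toy logos) are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]


def color_histogram(pixels: List[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized joint RGB histogram (assumed stand-in for a global descriptor)."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1.0
    total = float(len(pixels)) or 1.0  # avoid dividing by zero for an empty image
    return [count / total for count in hist]


def l1_distance(a: List[float], b: List[float]) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def linear_search(index: Dict[str, List[float]], query: List[float], n: int) -> List[Tuple[str, float]]:
    """Compare the query with every stored feature and return the n closest."""
    scored = [(name, l1_distance(feature, query)) for name, feature in index.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:n]


# Toy "images" given as flat pixel lists (hypothetical data).
images = {
    "red_logo": [(250, 10, 10)] * 16,
    "mixed_logo": [(120, 30, 30)] * 12 + [(250, 10, 10)] * 4,
    "blue_logo": [(10, 10, 250)] * 16,
}
index = {name: color_histogram(pixels) for name, pixels in images.items()}

query = color_histogram([(240, 20, 20)] * 16)
results = linear_search(index, query, n=2)
print(results)
```

Because every stored feature is compared with the query, the cost of one search grows linearly with the size of the index.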

2.2 CNN Based Image Search

A total of 100,000 trademark images (examples are shown in Fig. 1) were downloaded from the USPTO database. Image features were then extracted by passing the images through the neural network that was pre-trained on the trademark dataset. These extracted features were used to perform image search with the approximate nearest neighbor (ANN) variant of the nearest neighbor search (NNS) algorithm. NNS is a form of proximity search that computes the distance from the query point to every point in the target dataset and returns the data points closest to the query point. This technique has been applied successfully in numerous fields, including computer vision, pattern recognition, and content-based image retrieval.
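As an illustration of exact nearest neighbor search as defined above, the following minimal sketch computes the distance from the query to every point and keeps the k closest. The Euclidean metric, the function names, and the toy points are assumptions made for the example only.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(
    points: List[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Exact NNS: measure the distance to every point, then keep the k closest."""
    distances = [(i, euclidean(p, query)) for i, p in enumerate(points)]
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]


# Hypothetical 2-D feature vectors.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
neighbors = nearest_neighbors(points, (1.0, 1.0), k=2)
print(neighbors)
```

Each query touches all N points in all d dimensions, which is the cost that approximate methods trade a small amount of accuracy to avoid.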

Fig. 1. Example images from USPTO trademark dataset.

For the image search feature, the Approximate Nearest Neighbors Oh Yeah (ANNOY) [19] and NearPy [20] libraries were used to identify the nearest neighbors. Each image in the trademark dataset was passed through the trained Inception-ResNet-v2 neural network, as depicted in Fig. 2, to extract the intermediate representation (feature vector) of the image. These image vectors were then saved in a binary format and used to search for and identify the nearest neighbors. Finally, the cosine distance between the query image and each of its nearest neighbors was computed, and the neighbors were sorted by distance to return the top K nearest neighbors.
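The two stages described above, approximate candidate retrieval followed by cosine-distance ranking, can be sketched as follows. This is a toy illustration only: it does not use the ANNOY or NearPy APIs, the single-table random-hyperplane hash is an assumed stand-in for their index structures, and the class name, parameters, and three-dimensional "feature vectors" are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class HyperplaneIndex:
    """Toy approximate index: vectors are bucketed by the signs of random projections."""

    def __init__(self, dim: int, n_planes: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets: Dict[Tuple[bool, ...], List[str]] = defaultdict(list)
        self.vectors: Dict[str, List[float]] = {}

    def _key(self, vec: Sequence[float]) -> Tuple[bool, ...]:
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in self.planes)

    def add(self, item_id: str, vec: Sequence[float]) -> None:
        self.vectors[item_id] = list(vec)
        self.buckets[self._key(vec)].append(item_id)

    def query(self, vec: Sequence[float], k: int) -> List[Tuple[str, float]]:
        # Only the query's bucket is scanned, so true neighbors can be missed.
        candidates = self.buckets.get(self._key(vec), [])
        scored = [(i, cosine_distance(self.vectors[i], vec)) for i in candidates]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]


# Hypothetical feature vectors standing in for CNN embeddings.
features = {
    "mark_a": [0.9, 0.1, 0.0],
    "mark_b": [0.8, 0.2, 0.1],
    "mark_c": [-0.7, 0.6, 0.2],
}
index = HyperplaneIndex(dim=3, n_planes=4, seed=42)
for name, vec in features.items():
    index.add(name, vec)

top_k = index.query(features["mark_a"], k=2)
print(top_k)
```

Scanning a single bucket instead of the full dataset is what makes the search approximate; production libraries typically combine several trees or hash tables to recover neighbors that one partition would miss.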

Fig. 2. Image search using CNN and NNS.

3 Infrastructure

Amazon Web Services (AWS) cloud infrastructure and Docker [22] were used to perform the trademark image search. An AWS m4.16xlarge spot instance (64-core Intel Xeon E5-2676 v3 Haswell processor and 256 GB DDR3 RAM) was used to extract image features and then identify the nearest neighbors of the input mark. For the machine learning approach, AWS EC2 spot instances were chosen because they offer surplus computing capacity at a lower price than on-demand instances. In addition, lightweight Docker containers were used to simplify the configuration and setup of the TensorFlow framework.

4 Results

Using a simplistic test data set (variations of trademark images from the same owner), we validated the results of the CNN image search approach. The test yielded a Mean Average Precision (MAP) score of 0.69 (Fig. 3). Although comparing variations of, say, Puma® marks is a good starting point, this sample set does not approach the complexity of the cases faced by a trademark examining attorney. We are currently obtaining a more realistic test data set curated by trademark experts and plan to test against it.
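For reference, a MAP score of this kind can be computed as in the minimal sketch below. It assumes the common definition in which the average precision of a query is the mean of the precision values at the ranks of its relevant results, normalized by the number of relevant items; the source does not state which variant was used, and the function names and toy rankings are hypothetical.

```python
from typing import List, Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@rank taken at each rank where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings: List[Sequence[str]], relevants: List[Set[str]]) -> float:
    """Average of the per-query average precision values."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


# Two hypothetical queries: returned rankings and their ground-truth relevant marks.
rankings = [["a", "x", "b", "y"], ["x", "c", "y", "z"]]
relevants = [{"a", "b"}, {"c"}]

map_score = mean_average_precision(rankings, relevants)
print(round(map_score, 4))
```

In this toy case the first query scores (1/1 + 2/3) / 2 = 5/6 and the second scores 1/2, giving a MAP of 2/3.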

Fig. 3. MAP scores of CNN image search.

5 Conclusion

The current mechanism for searching trademarks, which depends on meta-tags such as trademark design codes, remains the more comprehensive method of searching for likelihood of confusion in trademark images, but it is time consuming. By taking advantage of recent advances in Convolutional Neural Networks (CNNs), we have been able to provide an alternative way to search for trademarks based on the image itself.