Abstract
This chapter introduces Huawei CLOUD Enterprise Intelligence (EI), covering the Huawei CLOUD EI service family, with a focus on the Huawei ModelArts platform and Huawei EI solutions.
8.1 Huawei CLOUD EI Service Family
The Huawei CLOUD EI service family is composed of EI big data, the EI basic platform, conversational bot, natural language processing (NLP), speech interaction, video analysis, image recognition, content review, image search, face recognition, optical character recognition (OCR) and EI agent, as shown in Fig. 8.1.
1. EI big data provides services such as data access, cloud data migration, real-time streaming computing, MapReduce, Data Lake Insight and table store.
2. The EI basic platform provides services such as the ModelArts platform, deep learning, machine learning, HiLens, graph engine service and video access.
3. Conversational bot provides intelligent QABot, TaskBot, intelligent quality inspection bot and customized conversational bot services.
4. Natural language processing provides natural language processing fundamentals, content review (text), language understanding, language generation, NLP customization and machine translation.
5. Speech interaction provides speech recognition, speech synthesis and real-time speech transcription.
6. Video analysis provides video content analysis, video editing, video quality detection and video tagging.
7. Image recognition provides the services of image tagging and celebrity recognition.
8. Content review provides the review of text, images and videos.
9. Image search means searching images with images, helping customers find the same or similar images in a designated image library.
10. Face recognition provides face recognition and body analysis.
11. OCR provides character recognition for the general, certificate, bill, industry and customized template classes.
12. EI agent is composed of the transportation AI engine, industrial AI engine, park AI engine, network AI engine, auto AI engine, medical AI engine and geographic AI engine.
8.1.1 Huawei CLOUD EI Agent
EI agent brings AI technology into the application scenarios of all walks of life. By combining various technologies to mine data value in depth, it produces scenario-based solutions that improve efficiency and user experience. EI agent is composed of the transportation AI engine, industrial AI engine, park AI engine and network AI engine, as shown in Fig. 8.2. In addition, Huawei has also launched the auto AI engine, medical AI engine and geographic AI engine.
1. Transportation AI Engine
The transportation AI engine delivers products and solutions such as all-around road network analysis, traffic prediction, traffic incident monitoring and control, traffic light optimization, traffic parameter perception and situation evaluation, ensuring efficient, green and safe traveling. The transportation AI engine is shown in Fig. 8.3.
Transportation AI engine has the following advantages.
(a) It realizes comprehensive and in-depth data mining, fully integrating Internet and transportation big data and deeply mining their value.
(b) It provides all-around and pedestrian-vehicle collaboration, maximizing the traffic flow of the whole region and minimizing the waiting time of vehicles in the region. It also coordinates the traffic demands of vehicles and pedestrians so as to realize their orderly passage.
(c) It provides real-time traffic light scheduling, and is the first in the industry to formulate a standard for a secure communication interface between the transportation AI engine and the traffic light control platform.
(d) It can accurately predict driving path demand so as to plan routes in advance.
Transportation AI engine is characterized as follows.
(a) Full time: It realizes 7 × 24 h whole-area, full-time perception of traffic incidents.
(b) Intelligence: It achieves regional traffic light optimization.
(c) Completeness: It can identify key congestion points and key congestion paths, and analyze congestion diffusion.
(d) Prediction: It predicts crowd density and obtains the traffic regularity of crowd migration.
(e) Accuracy: It achieves comprehensive and accurate control of traffic conditions 7 × 24 h.
(f) Convenience: It realizes real-time traffic light scheduling and traffic clearance on demand.
(g) Visuality: It displays the live traffic situation on a large screen.
(h) Fineness: It realizes key vehicle control and fine-grained management.
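As a toy illustration of the regional traffic light optimization idea above (a hypothetical sketch, not Huawei's actual algorithm), the green time of one signal cycle can be allocated across approaches in proportion to their queue lengths, so the busiest directions wait less:

```python
# Hypothetical sketch: proportional green-time allocation for one
# intersection cycle. All names and numbers are illustrative.

def allocate_green_time(queues, cycle=120, min_green=10):
    """Split `cycle` seconds of green time across approaches.

    queues: dict mapping approach name -> queued vehicle count.
    Each approach gets at least `min_green` seconds; the remaining
    time is shared in proportion to queue length.
    """
    remaining = cycle - min_green * len(queues)
    if remaining < 0:
        raise ValueError("cycle too short for the minimum green times")
    total = sum(queues.values())
    return {
        name: min_green + (remaining * q / total if total else remaining / len(queues))
        for name, q in queues.items()
    }

# The approach with the longest queue receives the most green time.
plan = allocate_green_time({"north": 30, "south": 10, "east": 20}, cycle=120)
```

A real engine would add constraints such as pedestrian phases and coordination with neighboring intersections, as the feature list above suggests.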
2. Industrial AI Engine
Relying on big data and artificial intelligence, the industrial AI engine provides full-chain services in the fields of design, production, logistics, sales and service. It mines data value, helping enterprises take the lead with new technologies. The industrial AI engine is shown in Fig. 8.4.
The industrial AI engine drives three major changes in existing industry.
(a) Transformation from artificial experience to data intelligence: Based on data mining and analysis, new experience in efficiency promotion and product quality improvement can be obtained from data.
(b) Transition from digital to intelligent: The ability of intelligent analysis has become the new driving force of enterprise digitization.
(c) Transition from product manufacturing to product innovation: Data collaboration from product design to sales within enterprises, as well as between the upstream and downstream of the industrial chain, brings new competitive advantages.
The applications of the industrial AI engine are as follows.
(a) Product quality optimization and improvement: Based on customer feedback, Internet comment analysis, competitor analysis, maintenance records and after-sales historical data, a classified analysis is carried out to find key product problems, so as to guide new product improvement and quality promotion.
(b) Intelligent equipment maintenance: According to the past and present status of the system, predictive maintenance is carried out through predictive inference methods such as time-series prediction, neural network prediction and regression analysis. It can predict whether the system will fail in the future, when it will fail and the type of failure, so as to improve service operation and maintenance efficiency, reduce unplanned equipment downtime and save the labor cost of on-site service.
(c) Production material estimation: Based on historical material data, the materials needed for production are accurately predicted so as to shorten the storage cycle and improve efficiency. The algorithm is deeply optimized on the basis of industry time-series models, combined with Huawei's supply-chain optimization experience.
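The predictive-maintenance idea in (b) can be sketched with the simplest of time-series predictors, exponential smoothing over a sensor reading, raising an alert when the one-step forecast crosses a failure threshold. All names and numbers below are illustrative; real systems use the richer methods named above, such as neural network prediction and regression analysis.

```python
# Illustrative sketch of predictive maintenance via time-series
# prediction; not any vendor's actual implementation.

def forecast_next(readings, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing."""
    level = readings[0]
    for x in readings[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def maintenance_alert(readings, threshold):
    """True when the forecast sensor value reaches the failure threshold."""
    return forecast_next(readings) >= threshold

vibration = [1.0, 1.2, 1.5, 1.9, 2.4]   # steadily rising sensor values
needs_service = maintenance_alert(vibration, threshold=1.8)
```

Scheduling service when `needs_service` turns true, rather than after a breakdown, is exactly the reduction in unplanned downtime the text describes.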
3. Park AI Engine
The park AI engine applies artificial intelligence to the management and monitoring of industrial, residential and commercial parks, providing a convenient and efficient environment through technologies such as video analysis and data mining. The park AI engine is shown in Fig. 8.5.
The park AI engine brings the following three changes.
(a) From manual defense to intelligent defense: Intelligent security based on artificial intelligence can relieve the pressure on security personnel.
(b) From card swiping to face scanning: Face scanning clocks people in automatically, so it is no longer necessary to carry an entrance card.
(c) From worry to reassurance: With powerful tracking and analysis capabilities for lost persons and objects, artificial intelligence makes employees and property owners feel more at ease.
The applications of the park AI engine are as follows.
(a) Park entrance control: Face recognition technology can accurately identify visitors and quickly return results, achieving high-throughput entrance control and automatic park management.
(b) Safety monitoring: Through intelligent technologies such as intrusion detection, loitering detection and abandoned object detection, the area can be monitored to ensure its safety.
(c) Smart parking: Through license plate recognition and trajectory tracking, services such as vehicle access control, driving route control, illegal parking management and parking space management can be realized.
4. Network AI Engine
The network AI engine (NAIE) introduces AI into the network field to address network service prediction and the repetitive, complex work of network operations. It improves network resource utilization, operation and maintenance (O&M) efficiency, energy efficiency and service experience, making the autonomous driving network possible. The network AI engine is shown in Fig. 8.6.
The network AI engine has the following commercial values.
(a) Improved resource utilization: AI is introduced to predict network traffic, and network resources are managed in a balanced way according to the prediction results, improving the utilization of network resources.
(b) Improved O&M efficiency: AI is introduced to compress a great deal of repetitive work, predict faults and carry out preventive maintenance, improving the operation and maintenance efficiency of the network.
(c) Improved energy efficiency: AI technology predicts the service status in real time and dynamically adjusts energy consumption according to the service volume, improving the efficiency of energy utilization.
The technical advantages of the network AI engine are as follows.
(a) Secure data ingestion: It supports rapid collection of various types of data, such as network parameters, performance data and alarms, into the data lake. A large number of tools are provided to improve data governance efficiency, while multi-tenant isolation and encrypted storage ensure whole-lifecycle security of the ingested data.
(b) Embedded network experience: It offers a guided model development environment and presets multiple AI model development templates for the network domain. It provides different services for different developers, such as a training service, model generation service and communication model service, helping developers quickly complete model and application development.
(c) Rich application services: It provides application services for various network business scenarios, such as wireless access, fixed network access, transmission load, core network, DC and energy. It can effectively solve specific problems of O&M efficiency, energy efficiency and resource utilization in network services.
8.1.2 EI Basic Platform: Huawei HiLens
Huawei HiLens is a multimodal AI development and application platform with end-cloud collaboration, composed of end-side computing devices and a cloud platform. It provides a simple development framework, an out-of-the-box development environment, a rich AI skill market and a cloud management platform. It connects to a variety of end-side computing devices, supporting visual and auditory AI application development, online deployment of AI applications and massive device management. HiLens helps users develop multimodal AI applications and distribute them to end-side devices to realize intelligent solutions for multiple scenarios. HiLens is shown in Fig. 8.7.
HiLens products have the following features.
1. End-cloud collaborative inference, balancing low computing delay and high accuracy.
2. End-side data analysis, minimizing cloud storage costs.
3. One-stop skill development, shortening the development cycle.
4. A skill market with rich preset skills, online training and one-click deployment.
1. HiLens Product Advantages
(a) End-cloud collaborative inference.
- End-cloud collaboration can work over unstable networks and saves user bandwidth.
- End-side devices can cooperate with cloud-side devices to update models online, quickly improving end-side accuracy.
- The end side can analyze local data, greatly reducing cloud data traffic and saving storage costs.
(b) Unified skill development platform.
With collaborative optimization of software and hardware, HiLens products use a unified skill development framework, encapsulate basic components and support common deep learning models.
(c) Cross-platform design.
- HiLens products support the Ascend AI processor, HiSilicon 35xx series chips and other mainstream chips on the market, covering the needs of mainstream monitoring scenarios.
- HiLens products provide model conversion and algorithm optimization for end-side chips.
(d) Rich skill market.
- A variety of skills, such as human shape detection and crying detection, are preset in the HiLens skill market. Users can select the required skills directly from the market and quickly deploy them on the end side without any development steps.
- HiLens has made extensive algorithm optimizations to address the small memory and low precision of end-side devices.
- Developers can also develop custom skills through the HiLens management console and publish them to the skill market.
(a)
-
2.
HiLens Application
(a) From the perspective of users, Huawei HiLens mainly serves three types of users: ordinary users, AI developers and camera manufacturers.
- Ordinary users: Ordinary users are skill users, who may be family members, supermarket owners, parking-lot attendants or site managers. HiLens Kit can achieve functions such as home security improvement, customer flow counting, identification of vehicle attributes and license plates, and detection of safety helmet wearing. Users only need to register with the HiLens management console, purchase or customize appropriate skills (such as license plate recognition or safety helmet recognition) in the platform skill market, and install them on HiLens Kit with one click to meet their needs.
- AI developers: AI developers are usually technicians or college students engaged in AI development who want to gain income or knowledge from AI. These users can develop AI skills in the HiLens management console and easily deploy them to devices to see how the skills work in real time.
HiLens integrates the HiLens framework on the end side, which encapsulates basic components, simplifies the development process and provides a unified API, with which developers can easily complete the development of a skill. After development is completed, users can deploy the skill to HiLens Kit with one click to view its running effect. Skills can also be released to the skill market for other users to buy and use, or shared as templates for other developers to learn from.
- Camera manufacturers: Manufacturers of camera products based on HiSilicon 35xx series chips. Because this series of cameras may have weak or even no AI capability, the manufacturers expect their products to become more competitive by obtaining stronger AI capabilities.
(b) In terms of application scenarios, Huawei HiLens can be applied in various fields, such as home intelligent surveillance, park intelligent surveillance, supermarket intelligent surveillance and intelligent in-vehicle scenarios.
- Home intelligent surveillance: Home intelligent cameras and smart home products integrated with Huawei HiSilicon 35xx series chips, as well as the high-performance HiLens Kit integrated with the D chip, can be used to improve home video intelligent analysis. It can be applied to the following scenarios.
- Human shape detection: It detects human figures in home surveillance and records the time of appearance, or sends an alert to the user's mobile phone when a figure is detected during periods when no one should be at home.
- Fall detection: When someone is detected falling down, an alert is sent. It is mainly used for elderly care.
- Cry detection: When a baby's crying is detected, an alert is sent to the user's mobile phone. It is used for child care.
- Vocabulary recognition: It can be customized for specific words, such as "help", and gives an alert when the word is detected.
- Face attribute detection: Face attributes are detected, including gender, age and smiling, which can be used for door security, video filtering and so on.
- Time album: Detected video clips of children are combined into a time album to record their growth.
- Park intelligent surveillance: Through the HiLens management console, AI skills are distributed to intelligent edge stations integrated with Ascend chips, so that edge devices can process part of the data locally. It can be applied to the following scenarios.
- Face recognition gate: Based on face recognition technology, face recognition can be realized at the entrance and exit gates of the park.
- License plate/vehicle identification: At the entrances and exits of the park or garage, license plate and vehicle type identification can be carried out to realize authorization and certification by license plate and vehicle type.
- Safety helmet detection: Workers not wearing safety helmets are found in video surveillance, and an alert is initiated on the specified equipment.
- Track restoration: The faces of the same person or vehicle identified by multiple cameras are analyzed to restore the path of the pedestrian or vehicle.
- Face retrieval: In surveillance, face recognition is used to identify specified faces, which can be used for blacklist recognition.
- Abnormal sound detection: When abnormal sounds such as glass breaking or an explosion are detected, an alert is reported.
- Intrusion detection: An alert is sent when a human shape is detected in the specified surveillance area.
- Shopping mall intelligent surveillance: Terminal devices applicable to shopping malls include HiLens Kit, intelligent edge stations and commercial cameras. Small supermarkets can integrate HiLens Kit, which supports 4-5 channels of video analysis and, thanks to its small size, can be placed in an indoor environment. It can be applied to the following scenarios.
- Customer flow statistics: Through the surveillance of stores and supermarkets, intelligent customer flow statistics at entrances and exits can be realized, which can be used to analyze customer flow changes over different periods.
- VIP identification: Through face recognition, VIP customers can be accurately identified to help formulate marketing strategies.
- Statistics of new and old customers: Through face recognition, the numbers of new and old customers can be counted.
- Pedestrian counting heat map: Through the analysis of the pedestrian counting heat map, crowd density can be identified, which is beneficial for commodity popularity analysis.
- Intelligent in-vehicle: Intelligent in-vehicle equipment based on the Android system can realize real-time intelligent analysis of conditions inside and outside the vehicle. It is applicable to scenarios such as driving behavior detection and the surveillance of "two kinds of passenger coaches and one kind of hazardous chemical truck". It can be applied to the following scenarios.
- Face recognition: By checking whether the driver's face matches the owner's pre-stored photo library, the driver's authority is confirmed.
- Fatigue driving detection: The driver's state is monitored in real time, and an intelligent warning is sent if fatigue driving is detected.
- Posture analysis: The driver's distracted behaviors, such as making phone calls, drinking water, gazing around and smoking, are detected.
- Vehicle and pedestrian detection: It can be used for pedestrian detection in blind areas.
8.1.3 EI Basic Platform: Graph Engine Service
Huawei Graph Engine Service (GES) is the first commercial distributed native graph engine in China with independent intellectual property rights. It is a service for querying and analyzing graph-structured data on the basis of "relations".
Adopting EYWA, a high-performance graph engine developed by Huawei, as its core, GES holds a number of independent patents. Massive, complex associated data such as social relations, transaction records and transportation networks are naturally graph data, and GES is widely used in such relation-rich scenarios as social applications, enterprise relationship analysis, logistics distribution, shuttle bus route planning, enterprise knowledge graphs, risk control, recommendation, public opinion analysis and fraud prevention.
In individual analysis, GES conducts user portrait analysis on individual nodes according to the number and characteristics of their neighbors. It can also mine and identify opinion leaders according to the characteristics and importance of nodes. For example, considering the quantity factor, the more attention a user receives from others, the more important the user is. On the other hand, considering the quality transfer factor based on the transfer characteristics of the graph, the quality of followers is transferred to the followed: when a user is followed by high-quality followers, that user's own quality increases.
In group analysis, with the label propagation algorithm and community discovery algorithm, GES divides nodes with similar characteristics into one class, so that it can be applied to node classification scenarios such as friend recommendation, group recommendation and user clustering. For example, in a social circle, if two people have a mutual friend, they may become friends in the future; the more mutual friends they have, the stronger their potential relationship. This makes it possible to recommend friends based on the number of mutual friends.
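The mutual-friend heuristic just described can be sketched in a few lines (a minimal illustration of the idea, not the GES implementation; the people and graph below are invented):

```python
# Minimal sketch: recommend candidate friends ranked by the number of
# mutual friends, as described in the text above.

from collections import Counter

def recommend_friends(graph, user):
    """graph: dict mapping person -> set of friends (undirected)."""
    candidates = Counter()
    for friend in graph[user]:
        for fof in graph[friend]:          # friends of a friend
            if fof != user and fof not in graph[user]:
                candidates[fof] += 1       # one more mutual friend
    return candidates.most_common()        # strongest candidates first

graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}
ranked = recommend_friends(graph, "alice")   # dave shares two mutual friends
```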
In link analysis, GES can use link analysis algorithm and relationship prediction algorithm to predict and identify hot topics, so as to find “the tipping point”, as shown in Fig. 8.8.
It can be seen that the application scenarios of GES in the real world are rich and extensive. There will be more industries and application scenarios in the future worthy of in-depth exploration.
The product advantages of GES are as follows.
1. Large scale: GES provides efficient data organization, which can effectively query and analyze data with 10 billion nodes and 100 billion edges.
2. High performance: GES provides a deeply optimized distributed graph computing engine, through which users can obtain high-concurrency, second-level, multi-hop real-time query capability.
3. Integration of query and analysis: GES provides a wealth of graph analysis algorithms, which offer a variety of analysis capabilities for business scenarios such as relationship analysis, route planning and precision marketing.
4. Ease of use: GES provides a guided, easy-to-use visual analysis interface where what you see is what you get, and supports the Gremlin query language, which is compatible with user habits.
The functions provided by GES are as follows.
1. Rich domain algorithms: GES supports many algorithms, such as PageRank, k-core, shortest path, label propagation, triangle count and interaction prediction.
2. Visual graph analysis: GES provides a guided exploration environment, which supports visualization of query results.
3. Query and analysis APIs: GES provides APIs for graph query, graph index statistics, Gremlin query, graph algorithms, graph management, backup management, etc.
4. Compatibility with the open-source ecosystem: GES is compatible with Apache TinkerPop Gremlin 3.3.0.
5. Graph management: GES provides graph engine services such as overview, graph management, graph backup and metadata management.
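To make the algorithm list concrete, here is a minimal pure-Python PageRank, one of the domain algorithms listed above. This is only an illustration of the idea on a toy graph; GES runs such algorithms on a distributed engine at billion-node scale.

```python
# Illustrative power-iteration PageRank on a small directed graph.
# Dangling nodes spread their rank evenly so the total stays 1.

def pagerank(edges, nodes, damping=0.85, iters=50):
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for dst in out[n]:
                    new[dst] += share
            else:  # dangling node: distribute rank evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
        rank = new
    return rank

ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")],
                 nodes=["a", "b", "c", "d"])
```

Node "a" ends up with the highest rank because both "c" and "d" point to it, while "d", which nothing points to, ends up lowest: the importance-transfer intuition described in the individual-analysis paragraph above.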
8.1.4 Introduction to Other Services Provided by EI Family
1. Conversational Bot
Conversational Bot Service (CBS) is composed of QABot, TaskBot, the intelligent quality inspection bot and the customized conversational bot. The conversational bot is shown in Fig. 8.9.
(a) QABot: It helps enterprises quickly build, release and manage intelligent question-answering bot systems.
(b) TaskBot: It accurately understands the intention of a conversation and extracts key information. It can be used for intelligent telephone traffic and intelligent hardware.
(c) Intelligent quality inspection bot: It uses natural language algorithms and user-defined rules to analyze conversations between customer service agents and customers in call center scenarios, helping enterprises improve service quality and customer satisfaction.
(d) Customized conversational bot: It builds AI bots with various capabilities according to customer needs, including knowledge base and knowledge graph QA, task-based conversation, reading comprehension, automatic text generation and multimodality, serving customers in different industries.
2. Natural Language Processing
Natural language processing (NLP) provides the services a bot needs to realize semantic understanding. It is composed of four sub-services: NLP fundamentals, language understanding, language generation and machine translation. The NLP service is shown in Fig. 8.10.
NLP fundamentals provides users with natural-language-related APIs, including word segmentation, named entity recognition, keyword extraction and text similarity, which can be applied to scenarios such as intelligent question answering, conversational bots, public opinion analysis, content recommendation, and e-commerce evaluation and analysis.
Language understanding provides users with APIs related to language understanding, such as sentiment analysis, opinion extraction, text classification and intention understanding, which can be applied to scenarios such as comment opinion mining, public opinion analysis, intelligent assistants and conversational bots.
Based on advanced language models, language generation produces readable text from input information, including text, data or images. It can be applied to human-computer interaction scenarios such as intelligent Q&A and conversation, news summarization and report generation.
NLP customization builds a unique natural language processing model according to the specific needs of a customer, such as a customized automatic classification model for legal documents, a customized automatic generation model for medical reports or a customized public opinion analysis model for specific fields, aiming to provide unique competitiveness for enterprise applications.
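As a hedged sketch of what a text-similarity capability computes conceptually, cosine similarity over bag-of-words counts is the classic baseline; a production NLP service uses far richer models, and the sentences below are invented for illustration.

```python
# Baseline text similarity: cosine similarity over word-count vectors.
# Only an illustration of the concept, not the service's actual model.

import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    va, vb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(va[w] * vb[w] for w in va)            # shared-word overlap
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

s1 = cosine_similarity("the delivery was fast", "the delivery was slow")
s2 = cosine_similarity("the delivery was fast", "great camera quality")
```

Sentences sharing most of their words score high (`s1`), while unrelated sentences score zero (`s2`), which is the behavior a similarity API exposes to scenarios such as e-commerce evaluation analysis.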
3. Speech Interaction
Speech interaction is composed of speech recognition, speech synthesis and real-time speech transcription, as shown in Fig. 8.11.
The main applications of speech recognition are as follows.
(a) Speech search: Search content is entered directly by speech, which makes searching more efficient. Speech recognition supports speech search in various scenarios, such as map navigation and web search.
(b) Human-computer interaction: Through speech wake-up and speech recognition, speech commands are sent to terminal devices and the devices are operated in real time, improving the human-computer interaction experience.
The applications of speech synthesis are as follows.
(a) Speech navigation: Speech synthesis can convert on-board navigation data into speech material, providing users with accurate speech navigation. Using the personalized customization capability of speech synthesis, it provides rich navigation speech services.
(b) Audio books: Speech synthesis can transform the text content of books, magazines and news into realistic human voice, so that people can free their eyes while obtaining information and having fun in scenarios such as taking the subway, driving or physical training.
(c) Telephone follow-up: In customer service systems, follow-up content is converted into human voice through speech synthesis, and the user experience is improved by direct speech communication with customers.
(d) Intelligent education: Speech synthesis can turn the text content of books into speech. Pronunciation close to a real person can simulate live teaching scenes, so as to realize reading aloud and guided reading of texts, helping students better understand and master the teaching content.
The applications of real-time speech transcription are as follows.
(a) Live subtitles: Real-time speech transcription converts the audio of live video or live broadcasts into subtitles in real time, providing a more efficient viewing experience for the audience and facilitating content monitoring.
(b) Real-time conference recording: Real-time speech transcription converts the audio of video or teleconferences into text in real time; the transcribed content can be checked, modified and retrieved in real time, improving conference efficiency.
(c) Instant text entry: Real-time speech transcription on mobile apps can be used to record and provide transcribed text in real time, as in speech input methods. It is convenient for post-processing and content archiving, saving the manpower and time cost of recording and greatly improving conversion efficiency.
4. Video Analysis
Video analysis provides services such as video content analysis, video editing and video tagging.
The applications of video content analysis are as follows.
(a) Surveillance management: Video content analysis conducts real-time analysis on all videos in a shopping mall or park in order to extract key episodes, such as warehouse monitoring, cashier compliance issues and fire exit blockage. It also conducts intruder detection, loitering detection and abandoned object detection in high-security areas, as well as intelligent loss prevention, such as portrait surveillance and theft detection.
(b) Park pedestrian analysis: Through real-time analysis of active pedestrians in the park, video content analysis identifies and tracks high-risk persons once a pedestrian blacklist is configured, sending a warning. It counts pedestrian flow at key intersections to help formulate park management strategies.
(c) Video character analysis: Through analysis of public figures in media videos, video content analysis accurately identifies political figures, movie stars and other celebrities in the video.
(d) Motion recognition: Video content analysis detects and recognizes motions in the video by analyzing the information of preceding and following frames, optical flow motion information and scene content information.
The applications of video editing are as follows.
(a) Highlight clip extraction: Based on the content relevance and highlights of a video, video editing extracts scene segments to make a video summary.
(b) News video splitting: Video editing splits complete news into news segments with different themes based on the analysis of characters, scenes, voices and character recognition in the news.
The applications of video tagging are as follows.
(a) Video search: Based on the analysis of video scene classification, person recognition, speech recognition and character recognition, video tagging forms hierarchical classification tags to support accurate and efficient video search and improve the search experience, as shown in Fig. 8.12.
(b) Video recommendation: Based on the analysis of scene classification, person recognition, speech recognition and OCR, video tagging forms hierarchical classification tags for personalized video recommendation.
-
5.
Image Recognition
Image recognition, based on deep learning technology, can accurately identify the visual content in an image, providing tens of thousands of object, scene and concept tags. It has target detection and attribute recognition capabilities, helping customers accurately identify and understand image content. Image recognition provides such functions as scene analysis, smart photo album, target detection and image search, as shown in Fig. 8.13.
-
(a)
Scene analysis: A lack of content tags leads to low retrieval efficiency. Image tagging can accurately identify image content and improve retrieval efficiency and accuracy, making personalized recommendation, content retrieval and distribution more effective.
-
(b)
Smart photo album: Based on the tens of thousands of tags identified from images, a smart photo album can be organized into custom categories, such as “plants”, “food” and “work”, making it convenient for users to manage their photos.
-
(c)
Target detection: On a construction site, a target detection system based on customized image recognition can monitor in real time whether on-site staff are wearing safety helmets, reducing safety risks.
-
(d)
Image search: Searching a massive image database is troublesome. Image search technology, based on image tags, can quickly find the desired image whether the user inputs a keyword or an image.
-
6.
Content Review
Content review is composed of text review, image review and video review. Based on leading detection technologies for text, image and video, it can automatically detect content concerning pornography, advertising, terrorism and politics, helping customers reduce the risk of business violations. Content review is shown in Fig. 8.14.
Content review includes the following applications.
-
(a)
Pornography identification: Content review can judge the pornographic degree of a picture, giving three confidence scores: pornographic, sexy and normal.
-
(b)
Terrorism detection: Content review can quickly detect whether a picture contains content concerning fire, guns, knives, bloodiness, terrorist flags, etc.
-
(c)
Sensitive figures involved in politics: Content review can judge whether content involves sensitive political figures.
-
(d)
Text content detection: Content review can detect whether text content involves pornography, politics, advertising, abuse, spam or contraband.
-
(e)
Video review: Content review can judge whether a video carries violation risks and provide violation information along the dimensions of picture, sound and subtitles.
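The three confidence scores returned by pornography identification can be turned into a moderation decision with simple thresholds. The sketch below is illustrative only; the field names and threshold values are assumptions made here, not the actual Huawei Cloud Content Review API.

```python
# Illustrative only: turn the three confidence scores (pornographic,
# sexy, normal) into a moderation decision. Field names and thresholds
# are assumptions, not the real Huawei Cloud Content Review contract.
def moderate(scores, block_threshold=0.8, review_threshold=0.5):
    """scores: dict with keys 'pornographic', 'sexy', 'normal'."""
    risky = scores["pornographic"] + scores["sexy"]  # combined risk mass
    if scores["pornographic"] >= block_threshold:
        return "block"          # clearly pornographic
    if risky >= review_threshold:
        return "manual_review"  # uncertain, escalate to a human
    return "pass"               # predominantly normal content

print(moderate({"pornographic": 0.05, "sexy": 0.10, "normal": 0.85}))  # pass
```

In production such thresholds would be tuned per business scenario; the point is only that the service returns scores, and the decision policy is the customer's.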
-
7.
Image Search
Image search means searching for images with an image. Based on deep learning and image recognition technology, it uses feature vectorization and search capabilities to help customers find the same or similar pictures in a specified library.
The applications of image search are as follows.
-
(a)
Commodity picture search: Image search can match a picture taken by the user against a commodity library. Through similar-picture search, the same or similar commodities are pushed to the user in order to sell or recommend related commodities, as shown in Fig. 8.15.
-
(b)
Picture copyright search: Picture copyright is an important asset of photography and design websites. Image search can quickly locate infringing pictures in massive image databases, defending the rights of image resource websites.
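The feature-vectorization idea behind image search (and face search, which works analogously) can be sketched in a few lines: each image is reduced to a feature vector, and a query returns the library entries with the highest cosine similarity. The tiny vectors and image IDs below are made up for illustration; a real system would produce the vectors with a deep network.

```python
import math

# Minimal sketch of "search image with image": each image is represented
# by a feature vector (hand-made here; a real system would use a CNN),
# and a query returns the most similar entries by cosine similarity.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec, library, top_n=2):
    """library: {image_id: feature_vector}. Returns (id, score) pairs."""
    scored = [(img_id, cosine(query_vec, vec))
              for img_id, vec in library.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:top_n]

library = {
    "shoe_red": [0.9, 0.1, 0.0],
    "shoe_blue": [0.8, 0.2, 0.1],
    "hat_green": [0.0, 0.1, 0.9],
}
print(search([0.85, 0.15, 0.05], library))  # the two shoe images rank first
```

The same top-N retrieval pattern underlies the face search described in Sect. 8.1: the N most similar faces are returned together with their similarity scores.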
-
8.
Face Recognition
Face recognition can quickly identify faces in images, analyze key information and obtain face attributes, so that accurate face comparison and retrieval can be achieved.
The applications of face recognition are as follows.
-
(a)
Identity authentication: Face identification and comparison can be used for identity authentication, which is suitable for authentication scenarios such as airports and customs.
-
(b)
Electronic attendance: Face identification and comparison are applicable to electronic attendance for enterprise employees as well as to security monitoring.
-
(c)
Trajectory analysis: Face search can retrieve the N face images in an image database that are most similar to the input face, together with their similarity scores. Using the time, place and behavior information of the returned pictures, it helps customers perform trajectory analysis.
-
(d)
Customer flow analysis: Customer flow analysis is of great value to shopping malls. Based on face recognition, comparison and search technology, it can accurately analyze customer information such as age and gender and distinguish new customers from regular ones, helping businesses market efficiently. Customer flow analysis is shown in Fig. 8.16.
-
9.
Optical Character Recognition
Optical Character Recognition (OCR) converts the text in a picture or scanned copy into editable text. OCR can replace manual input and improve business efficiency. It supports character recognition in such scenarios as ID cards, driver’s licenses, vehicle licenses, invoices, English customs documents, common forms and common characters, as shown in Fig. 8.17.
OCR supports the character recognition of general class, certificate class, bill class, industry class and customized template class.
General OCR supports automatic recognition of text information in pictures of arbitrary format, such as forms, documents and web images. It can analyze various layouts and forms, quickly digitizing all kinds of documents.
The applications of general OCR are as follows.
-
(a)
Electronic filing of enterprise historical documents and reports: It can identify the character information in documents and reports and establish electronic files, facilitating rapid retrieval.
-
(b)
Automatic filling in the sender information of express delivery: It can identify the contact information in the picture and automatically fill in the express delivery form, reducing manual input.
-
(c)
Efficiency improvement of contract handling: It can automatically identify structured information and extract the signature and seal areas, helping with rapid audits.
-
(d)
Electronic customs documents: As many companies have overseas business, general OCR can automatically structure and digitize customs document data, improving the efficiency and accuracy of information entry.
Certificate character recognition supports automatic identification of valid information on ID cards, driving licenses, vehicle licenses and passports, with structured extraction of key fields.
The applications of certificate character recognition are as follows.
-
(a)
Fast authentication: It can quickly complete the real-name authentication of mobile phone account opening and other scenes, so as to reduce the cost of user identity verification.
-
(b)
Automatic information entry: It can identify key information in certificates so as to save manual entry and improve efficiency.
-
(c)
Verification of identity information: It can verify whether the user is the genuine holder of a certificate.
Bill type character recognition supports automatic recognition and structured extraction of valid information on various invoices and forms, such as VAT invoice, motor vehicle sales invoice, medical invoice, etc.
The applications of bill character recognition are as follows.
-
(a)
Automatic entry of reimbursement document information: It can quickly identify the key information in the invoice and effectively shorten the reimbursement time.
-
(b)
Automatic entry of document information: It can quickly enter motor vehicle sales invoice and contract information, so as to improve the efficiency of vehicle loan processing.
-
(c)
Medical insurance: It can automatically identify key fields such as drug details, age and gender of medical documents before entering them into the system. Combined with ID card and bank card OCR, it can quickly complete insurance claims business.
Industry type character recognition supports the extraction and recognition of structured information of various industry-specific pictures, such as logistics sheets and medical test documents, which helps to improve the automation efficiency of the industry.
The applications of industry type character recognition are as follows.
-
(a)
Automatic filling in the sender’s information of express delivery: It can identify the contact information in the picture before automatically filling in the express delivery form so as to minimize manual input.
-
(b)
Medical insurance: It can automatically identify key fields such as drug details, age and gender of medical documents and enter them into the system. Combined with ID card and bank card OCR, it can quickly complete insurance claims business.
Customized template type character recognition supports user-defined recognition templates. It can specify the key fields to be recognized, so as to realize the automatic recognition and structural extraction of user-specific format images.
-
(a)
Identification of various certificates: For card images of various formats, it can be used to make templates to realize automatic identification and extraction of key fields.
-
(b)
Recognition of various bills: For various bill images, it can be used to make templates to realize automatic recognition and extraction of key fields.
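Conceptually, a customized template maps user-defined key field names to where (or how) each field appears on the document. The toy sketch below illustrates that idea over recognized text using regular expressions; the real service matches fields by position on the card or bill image, and the template and field names here are invented for illustration.

```python
import re

# Toy sketch of template-based extraction: a "template" maps field names
# to regular expressions applied to already-recognized text. The actual
# customized-template OCR service locates fields by image region; this
# only illustrates the idea of user-defined key fields.
TEMPLATE = {
    "name": re.compile(r"Name[:：]\s*(\S+)"),
    "number": re.compile(r"No[.:：]\s*([A-Z0-9-]+)"),
}

def extract(recognized_text, template=TEMPLATE):
    """Return {field: value} for every template field found in the text."""
    fields = {}
    for field, pattern in template.items():
        m = pattern.search(recognized_text)
        if m:
            fields[field] = m.group(1)
    return fields

print(extract("Membership Card  Name: Li  No: AB-1234"))
```

Defining a new "template" for another card format is then just a matter of supplying a different field-to-pattern mapping, which mirrors how the service lets users specify the key fields to be recognized.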
8.2 ModelArts
As the EI basic platform in the EI service family, ModelArts is a one-stop development platform for AI developers. It provides massive data preprocessing and semi-automatic annotation, large-scale distributed training, automatic model generation, and on-demand deployment of models on end, edge and cloud, helping users quickly create and deploy models and manage the full-cycle AI workflow.
“One-stop” means that all aspects of AI development, including data processing, algorithm development, model training and model deployment, can be completed on ModelArts. Technically, it supports various heterogeneous computing resources, so that developers can choose to use flexibly according to their needs, regardless of the underlying technologies. At the same time, ModelArts supports mainstream open source AI development frameworks such as TensorFlow and MXNet, as well as self-developed algorithm frameworks to match the usage habits of developers.
Aiming to make AI development easier and more convenient, ModelArts provides AI developers with a convenient and easy-to-use workflow. For example, business-oriented developers can use the automatic learning process to quickly build AI applications without focusing on models or coding; AI beginners can use preset algorithms to build AI applications without much concern for model development; and AI engineers, provided with a variety of development environments, operation processes and modes by ModelArts, can easily extend code and quickly build models and applications.
8.2.1 Functions of ModelArts
ModelArts enables developers to complete all tasks in one stop, from data preparation to algorithm development and model training, and finally to deploy models and integrate them into the production environment. The function overview of ModelArts is shown in Fig. 8.18.
ModelArts features as follows.
-
1.
Data governance: ModelArts supports data processing such as data filtering and annotation, and provides version management of data sets, especially large data sets for deep learning, so that training results can be reproduced.
-
2.
Extremely “fast” and “simple” training: The MoXing deep learning framework developed for ModelArts is more efficient and easier to use, greatly improving training speed.
-
3.
Multi-scenario deployment on end, edge and cloud: ModelArts supports deploying models to a variety of production environments; a model can be deployed as a cloud online inference service or for batch inference, or deployed directly to end and edge devices.
-
4.
Automatic learning: ModelArts supports a variety of automatic learning capabilities. By training models through “automatic learning”, users can complete automatic modeling and one-click deployment without writing code.
-
5.
Visual workflow: ModelArts uses GES to manage the metadata of the development process and automatically visualizes the relationships between workflows and version evolution, realizing model traceability.
-
6.
AI Market: ModelArts presets common algorithms and data sets, and supports sharing models within an enterprise or publicly.
8.2.2 Product Structure and Application of ModelArts
As a one-stop development platform, ModelArts supports the whole development process of developers from data to AI application, including data processing, model training, model management, model deployment and other operations. It also provides AI market functions, and can share models with other developers in the market. The product structure of ModelArts is shown in Fig. 8.19.
ModelArts supports the whole AI development process from data preparation to model deployment, and a variety of AI application scenarios, as detailed below.
-
1.
Image recognition: It can accurately identify the object classification information in the picture, such as animal identification, brand logo recognition, vehicle identification, etc.
-
2.
Video analysis: It can accurately analyze key information in video, such as face recognition and vehicle feature recognition.
-
3.
Speech recognition: It enables machines to understand speech signals, assisting in processing speech information, which is applicable to intelligent customer service QA, intelligent assistant, etc.
-
4.
Product recommendation: It can provide personalized business recommendation for customers according to their own properties and behavior characteristics.
-
5.
Anomaly detection: In the operation of network equipment, it can use an automated network detection system to make real-time analysis according to the traffic situation so as to predict suspicious traffic or equipment that may fail.
-
6.
In the future, ModelArts will continue to invest in data augmentation, model training speed and weakly supervised learning, further improving the efficiency of AI model development.
8.2.3 Product Advantages of ModelArts
The product advantages of ModelArts are reflected in the following four aspects.
-
1.
One-stop: Out of the box, it covers the whole process of AI development, including data processing, model development, training, management and deployment. One or more of these functions can be used flexibly.
-
2.
Easy to use: It offers a variety of preset models, and open-source models can be used at any time; model hyperparameters are optimized automatically, which is simple and fast; zero-code development makes it simple for users to train their own models; and one-click deployment of models to end, edge and cloud is supported.
-
3.
High performance: The MoXing deep learning framework developed for ModelArts improves algorithm development efficiency and training speed, and optimized GPU utilization for deep model inference accelerates online inference on the cloud. Models can also be generated to run on the Ascend chip for efficient device-side inference.
-
4.
Flexibility: It supports a variety of mainstream open-source frameworks (TensorFlow, Spark MLlib, etc.). It supports mainstream GPUs and the self-developed Ascend chip. It supports exclusive use of dedicated resources. It supports custom images to meet the needs of user-defined frameworks and operators.
In addition, ModelArts has the following advantages.
-
1.
Enterprise level: It supports massive data preprocessing and version management. It supports multi-scene model deployment of end, edge and cloud, to realize visual management of the whole process of AI development. It also provides AI sharing platform, assisting enterprises to build internal and external AI ecology.
-
2.
Intellectualization: It supports automatic model design, which can train models automatically according to deployment environment and inference speed requirements. It also supports automatic modeling for image classification and object detection scenarios, as well as automatic feature engineering and automatic modeling for structured data.
-
3.
Data preparation efficiency improved 100-fold: It has a built-in AI data framework that improves the efficiency of data preparation by combining automatic pre-annotation with hard-case annotation.
-
4.
Great reduction of model training time: It provides the high-performance MoXing distributed framework developed by Huawei, which adopts core technologies such as cascaded hybrid parallelism, gradient compression and convolution acceleration to greatly reduce model training time.
-
5.
The model can be deployed to the end, edge and cloud with one click.
-
6.
AI model deployment: It provides edge inference, online inference and batch inference.
-
7.
Accelerating AI development process with AI method—Automatic Learning: It provides UI guide and adaptive training.
-
8.
Whole-process management: It realizes automatic visualization of the development process, resumption of training from breakpoints, and easy comparison of training results.
-
9.
AI Sharing—assisting developers to realize AI resource reuse: It realizes intra-enterprise sharing so as to improve efficiency.
8.2.4 Approaches of Visiting ModelArts
The Huawei cloud service platform provides a web-based service management platform, namely the management console, as well as an Application Programming Interface (API) mode based on HTTPS requests. ModelArts can be accessed in the following three ways.
-
1.
Management Console Mode
ModelArts provides a simple and easy-to-use management console, including the functions such as automatic learning, data management, development environment, model training, model management, deployment online, AI market, which can complete AI development end-to-end in the management console.
To use ModelArts management console, you need to register a Huawei cloud account first. After registering the Huawei cloud account, you can click the hyperlink of “EI Enterprise Intelligence → AI Services → EI Basic Platform → AI Development Platform ModelArts” on the Huawei cloud home page, and click the “enter console” button in the page that appears to log in to the management console directly.
-
2.
SDK Mode
If you need to integrate ModelArts into a third-party system for secondary development, you can choose to call ModelArts SDK. ModelArts SDK is a Python encapsulation of REST API provided by ModelArts service, which simplifies the user’s development work. For the specific operation of calling ModelArts SDK and the detailed description of SDK, please refer to the product help document “SDK Reference” on the official website of ModelArts.
In addition, when writing code in the Notebook of the management console, you can directly call ModelArts SDK.
-
3.
API Mode
If you need to integrate ModelArts into a third-party system for secondary development, you can also access ModelArts by calling the ModelArts API. For detailed operations and API descriptions, please see the product help document “API Overview” on the official website of ModelArts.
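The API mode boils down to issuing authenticated HTTPS requests. The sketch below only illustrates that request-building pattern with the standard library; the endpoint URL, token value and payload field are placeholders, not the actual ModelArts API, whose real contract is documented in “API Overview”.

```python
import json
import urllib.request

# Sketch of calling a cloud inference API over HTTPS. The endpoint URL,
# token value and payload field below are placeholders, not the actual
# ModelArts API; consult the "API Overview" document for the real contract.
def build_inference_request(endpoint, token, payload):
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # authentication token (placeholder value)
        },
        method="POST",
    )

req = build_inference_request(
    "https://example.com/v1/infer",  # placeholder endpoint
    "my-token",                      # placeholder token
    {"image_base64": "..."},
)
print(req.get_method(), req.full_url)
# Sending the request would then be: urllib.request.urlopen(req)
```

The SDK mode described above wraps exactly this kind of REST call in Python functions so that third-party systems do not have to assemble requests by hand.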
8.2.5 How to Use ModelArts
ModelArts is a one-stop development platform for AI developers. Through the whole process management of AI development, it helps developers create AI models intelligently and efficiently and deploy them to the end, edge and cloud with one click.
ModelArts not only supports automatic learning function, but also presets a variety of trained models, integrating Jupyter Notebook to provide online code development environment.
According to different groups of users, different use-patterns of ModelArts are to be selected.
For business developers without AI development experience, ModelArts provides an automatic learning function that can build an AI model with zero foundation. Developers do not need to focus on development details such as model development and parameter tuning; just three steps (data annotation, automatic training, online deployment) are needed to complete an AI development project. The product help document “Best Practice” on the official website of ModelArts provides a sample called “Find Yunbao” (Yunbao is the mascot of Huawei Cloud), which helps business developers quickly get familiar with the workflow of ModelArts automatic learning. This example is an “object detection” scenario project: using the preset Yunbao image data set, a detection model is automatically trained and generated, and the generated model is deployed as an online service. After deployment, users can use the online service to identify whether an input image contains Yunbao.
For AI beginners with a certain AI foundation, ModelArts provides preset algorithms based on mainstream engines in the industry. Learners do not need to pay attention to the model development process; they can directly use a preset algorithm to train on existing data and quickly deploy the result as a service. The preset algorithms provided by ModelArts in the AI market can be used for object detection, image classification and text classification.
The product help document “Best Practice” on the official website of ModelArts provides an example of a flower image classification application, which helps AI beginners quickly get familiar with the process of using ModelArts preset algorithms to build models. This example annotates the preset flower image data set and then trains a model with the preset ResNet_v1_50 algorithm. Finally, the model is deployed as an online service. After deployment, users can identify the flower species in an input image through the online service.
For AI engineers who are familiar with code writing and debugging, ModelArts provides one-stop management capability, through which AI engineers can complete the whole AI process in one stop from data preparation, model development, model training and model deployment. ModelArts is compatible with mainstream engines in the industry and user habits. At the same time, it provides a self-developed MoXing deep learning framework to improve the development efficiency and training speed of the algorithm.
The product help document “Best Practice” on the official website of ModelArts provides an example of using MXNet and NoteBook to realize the application of handwritten digital image recognition, which helps AI engineers quickly comb the whole process of AI development of ModelArts.
MNIST is a handwritten digit recognition data set, often used as an example in deep learning. This example uses the MXNet native interface and a Notebook model training script (provided by ModelArts by default) to train on the MNIST data set and deploy the model as an online service. After deployment, users can recognize the digits in an input picture through the online service.
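Once such a model is deployed, the online service returns one score per digit class 0–9 for each input picture. The pure-Python sketch below shows the usual final step of turning raw class scores (logits) into a predicted digit; the logits are made-up values, not output from a real MNIST model.

```python
import math

# How raw per-class scores from a digit-recognition model are typically
# turned into a prediction: softmax to probabilities, then argmax.
# The logits below are invented for illustration.
def softmax(logits):
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_digit(logits):
    """Return (predicted digit, its probability) from 10 class scores."""
    probs = softmax(logits)
    digit = max(range(10), key=lambda i: probs[i])
    return digit, probs[digit]

logits = [0.1, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 9.0, 0.1, 0.1]
digit, confidence = predict_digit(logits)
print(digit)  # 7
```

In the ModelArts workflow this post-processing normally happens inside the deployed service, so the caller simply receives the predicted digit and its confidence.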
8.3 Huawei CLOUD EI Solutions
This section mainly introduces the application cases and solutions of Huawei Cloud EI.
8.3.1 OCR Service Enabling Whole-Process Automated Reimbursement
Huawei Cloud OCR service can be applied to financial reimbursement scenarios. It automatically extracts the key information from bills, helping employees fill in reimbursement forms automatically. Combined with Robotic Process Automation (RPA), it can greatly improve the efficiency of financial reimbursement. Huawei Cloud bill OCR supports recognition of various bills such as VAT invoices, taxi invoices, train tickets, itinerary sheets and shopping receipts. It can correct skew and distortion in pictures and effectively remove the impact of seals on character recognition, improving recognition accuracy.
In financial reimbursement, it is very common to have multiple bills in one image. Generally, an OCR service can only identify one kind of bill; for example, the VAT invoice service can only identify a single VAT invoice. However, Huawei Cloud OCR service, an online intelligent classification and identification service, supports segmentation of invoices and cards in multiple formats. It can recognize one image with multiple tickets, one image with multiple cards, or mixed cards and tickets, with combined billing. Combined with the individual OCR services, it can recognize various kinds of invoices and cards, including but not limited to air tickets, train tickets, medical invoices, driver’s licenses, bank cards, identity cards, passports and business licenses.
Financial personnel need to manually input invoice information into the system after receiving a batch of financial invoices. Even with Huawei Cloud OCR service, each financial invoice must be photographed and uploaded to a computer or server. Huawei Cloud can provide a batch-scanning OCR recognition solution that only needs a scanner and a PC: invoices are scanned in batches through the scanner to generate color images, and the Huawei Cloud OCR service is automatically called in batches to quickly extract invoice information and visually compare the recognition results. The recognition results can also be exported in batches to Excel or a financial system, greatly simplifying the data entry process.
The solution has the following characteristics.
-
1.
Multiple access methods: automatic connection to scanners for batch image acquisition; image capture via document cameras or mobile phones.
-
2.
Flexible deployment modes: supporting public cloud, HCS, all-in-one machine and other deployment modes, with a unified standard API.
-
3.
Applicable to all kinds of invoices: VAT general/special/electronic/ETC/voucher, taxi fare/train ticket/itinerary sheet/quota invoice/toll, etc.
-
4.
Support for one image with multiple invoices: automatic classification and recognition of multiple invoices in a single image.
-
5.
Visual comparison: location information return, Excel format conversion, easy for statistics and analysis.
The invoice reimbursement solution is shown in Fig. 8.20. The advantages of the solution are as follows: improving efficiency and reducing cost, optimizing operation, simplifying processes and enhancing compliance.
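The batch-export step of this solution can be sketched with the standard library: recognition results for a batch of scanned invoices become rows of a CSV file that Excel or a financial system can import. Here `recognize_invoice` is a stand-in stub for the actual OCR service call, and the field names are invented for illustration.

```python
import csv
import io

# Sketch of the batch data-entry step: results for a batch of scanned
# invoices are exported as rows of a CSV (Excel-compatible) file.
def recognize_invoice(image_path):
    # Placeholder for the Huawei Cloud OCR call; in reality this would
    # send the scanned image to the service and parse its response.
    return {"file": image_path, "invoice_no": "12345678", "amount": "980.00"}

def export_batch(image_paths, out_stream):
    """Write one CSV row per scanned invoice image."""
    writer = csv.DictWriter(out_stream,
                            fieldnames=["file", "invoice_no", "amount"])
    writer.writeheader()
    for path in image_paths:
        writer.writerow(recognize_invoice(path))

buf = io.StringIO()
export_batch(["scan_001.jpg", "scan_002.jpg"], buf)
print(buf.getvalue())
```

In the real solution the loop would also collect per-field positions returned by the service so that results can be visually compared against the scanned image before export.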
8.3.2 OCR Supporting Smart Logistics
Couriers can take pictures of ID cards through mobile terminals (such as a mobile app) when picking up items. With the Huawei Cloud ID identification service, the identity information is recognized automatically. When filling in express information, the sender can complete automatic entry by uploading pictures such as address screenshots or chat-record screenshots, because OCR automatically extracts information such as name, telephone and address. During express transportation, OCR can also extract waybill information to complete automatic sorting of express deliveries and judge whether the information on the express waybill is complete. Huawei Cloud OCR service supports recognition of complex pictures taken from any angle, under uneven illumination or with incomplete content, with a high recognition rate and good stability, which can greatly reduce labor costs and enhance user experience. The smart logistics solution is shown in Fig. 8.21.
8.3.3 Conversational Bot
Usually, a single-function robot cannot solve all the problems in a customer's business scenario. By integrating multiple robots with different functions, a joint conversational bot solution is created and presented as a single service interface. Customers need only call a single interface to solve different business problems. The functional characteristics of each robot are as follows.
-
1.
Applicable Scenarios of Intelligent QABot
-
(a)
Intelligent QABot can solve common types of problems such as consultation, help seeking in the fields of IT, e-commerce, finance, government. In these scenarios, users frequently consult or seek help.
-
(b)
Intelligent QABot has knowledge reserves, such as a QA knowledge base, FAQs or similar documents, as well as work orders and customer service QA data.
-
2.
Applicable Scenarios of TaskBot
-
(a)
TaskBot has clear conversational tasks. It can flexibly configure the conversational process (multi-round interaction) according to the actual business scenario. After loading the script template, TaskBot conducts multiple rounds of dialogue with customers based on speech or text in the corresponding scene, understanding and recording customers’ wishes at the same time.
-
(b)
Outbound Bot: This kind of TaskBot can complete various tasks such as return visit of business satisfaction, verification of user information, recruitment appointment, express delivery notice, sales promotion, screening of high-quality customers.
-
(c)
Customer Service: This kind of TaskBot can complete various tasks such as hotel reservation, air ticket reservation, credit card activation.
-
(d)
Intelligent Hardware: This kind of TaskBot can serve in many fields such as speech assistant, smart home.
-
3.
Applicable Scenarios of Knowledge Graph QABot
-
(a)
Complex knowledge system.
-
(b)
Answer requiring logical inference.
-
(c)
Multiple rounds of interaction.
-
(d)
Factual questions involving entity attribute values or relationships between entities that cannot be exhausted by enumeration.
The features of the conversational bot are as follows.
-
(a)
Multi-robot intelligent integration, more comprehensive: multiple robots each have their own strengths, with self-learning and self-optimization, so that the best answer can be recommended to customers.
-
(b)
Multi-round intelligent guidance, better understanding: through multiple rounds of natural dialogue, the bot can accurately identify the user's intention and understand the user's underlying semantics.
-
(c)
Knowledge graph, smarter: a general-domain language model plus a domain knowledge graph; graph content is updated dynamically; the bot becomes more intelligent based on the graph. The architecture of the conversational bot is shown in Fig. 8.22.
An intelligent QABot based on a knowledge graph can conduct accurate knowledge Q&A. For example, a vehicle conversational bot can be used to query the price and configuration of a specific vehicle model, recommend vehicles by price and class, and compare vehicles, offering the corresponding information as text, tables and pictures. The vehicle conversational bot is shown in Fig. 8.23.
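The vehicle QABot idea can be sketched with a toy knowledge graph: entities (vehicle models) carry attributes, and both factual queries ("what is the price of X?") and conditional queries ("recommend an SUV under 300,000") become lookups over the graph. All model names, prices and attributes below are made up for illustration.

```python
# Toy knowledge graph for the vehicle QABot idea: entities with attributes
# stored as nested dicts. Model names and prices are invented.
GRAPH = {
    "Model-A": {"type": "SUV", "price": 250000, "seats": 5},
    "Model-B": {"type": "sedan", "price": 180000, "seats": 5},
    "Model-C": {"type": "SUV", "price": 320000, "seats": 7},
}

def attribute_query(model, attribute):
    """Factual question, e.g. 'What is the price of Model-A?'"""
    return GRAPH[model][attribute]

def recommend(vehicle_type, max_price):
    """Conditional question, e.g. 'Recommend an SUV under 300,000.'"""
    return [m for m, attrs in GRAPH.items()
            if attrs["type"] == vehicle_type and attrs["price"] <= max_price]

print(attribute_query("Model-A", "price"))  # 250000
print(recommend("SUV", 300000))             # ['Model-A']
```

A production graph engine adds what this toy omits: a schema over entity and relation types, multi-hop traversal for comparisons, and dynamic updates as the lines above describe.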
8.3.4 A Case of Enterprise Intelligent Q&A in a Certain District
The enterprise intelligent question answering system in a district of Shenzhen provides business-related robots with automatic responses. Questions that are not directly answered by the robots are automatically recorded, and the follow-up manual answers are pushed to the questioner. The system provides a complete closed-loop solution for unsolved problems, realizing a continuous optimization process of recording unsolved problems, forming knowledge through the manual closed loop, and annotating and optimizing the model, making the robot more intelligent. The enterprise intelligent question answering system is shown in Fig. 8.24.
Business related to enterprise intelligent QA system mainly includes the following three categories.
-
1.
Policy consultation (frequent policy changes).
-
2.
Enterprise matters in office hall (more than 500 items).
-
3.
Appeals (various types).
8.3.5 A Case in Genetic Knowledge Graph
A genetic knowledge graph includes various types of entities, such as genes, mutations, diseases and drugs, as well as the complex relationships between genes and mutations, mutations and diseases, and diseases and drugs. Based on this graph, the following functions can be realized.
-
1.
Entity query: Based on genetic knowledge graph, the information of an entity (gene, mutation, disease, drug) can be quickly searched.
-
2.
Auxiliary diagnosis: Based on genetic testing results, possible mutations or diseases can be inferred from the graph, giving diagnosis and treatment suggestions and recommending drugs.
-
3.
Gene testing report: Based on the structured or semi-structured data of gene entities and their association knowledge with mutations and diseases, a readable gene testing report is generated automatically.
Genetic knowledge graph is shown in Fig. 8.25.
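The entity query and auxiliary diagnosis functions above can be sketched with an in-memory triple store. All entity, mutation and drug names below are illustrative stand-ins, not real medical data, and a production graph engine would of course use a proper graph database rather than a Python list.

```python
# Minimal in-memory knowledge graph: (subject, relation, object) triples.
# All entity and relation names below are illustrative, not real medical data.

TRIPLES = [
    ("BRCA1", "has_mutation", "c.68_69delAG"),
    ("c.68_69delAG", "associated_with", "breast cancer"),
    ("breast cancer", "treated_by", "olaparib"),
    ("EGFR", "has_mutation", "L858R"),
    ("L858R", "associated_with", "lung cancer"),
    ("lung cancer", "treated_by", "gefitinib"),
]

def query(subject=None, relation=None, obj=None):
    """Entity query: return all triples matching a (possibly partial) pattern."""
    return [t for t in TRIPLES
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)]

def suggest_drugs(mutation):
    """Auxiliary diagnosis: mutation -> diseases -> candidate drugs."""
    drugs = []
    for _, _, disease in query(subject=mutation, relation="associated_with"):
        drugs += [d for _, _, d in query(subject=disease, relation="treated_by")]
    return drugs

print(suggest_drugs("L858R"))  # -> ['gefitinib']
```

The two-hop traversal in `suggest_drugs` is exactly the "mutation to disease to drug" path the text describes.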
8.3.6 Policy Query Based on Knowledge Graph
The state government often issues incentive policies to enterprises, such as tax reduction and tax rebate policies. The content of these policies is highly specialized, so ordinary people find it hard to understand and need professional interpretation.
There are many kinds of policies and reward categories, with more than 300 recognition conditions under which enterprises can qualify, and the conditions of a single policy are connected by logical relations such as AND, OR and NOT. It is therefore very difficult for enterprises to quickly find out which policies they can enjoy.
By constructing a policy knowledge graph, all sorts of policy incentives and their recognition conditions are structured. In addition, a knowledge graph of enterprise information can be built. Then, given only an enterprise name, the values of the enterprise's recognition conditions, such as type, tax amount and scale, are obtained automatically from the enterprise graph. Based on these conditions, all the policies and rewards that the enterprise can enjoy are finally obtained. The policy query based on the knowledge graph is shown in Fig. 8.26.
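The matching step, checking an enterprise's attribute values against the logically connected recognition conditions of each policy, can be sketched as follows. The policy names, condition fields and thresholds are all hypothetical examples, not real policy rules.

```python
# Sketch of policy eligibility matching: each policy's recognition conditions
# are a boolean combination (AND / OR / NOT) over enterprise attributes.
# Policy names, condition fields and thresholds are hypothetical.

POLICIES = {
    "High-tech enterprise tax reduction": {
        "all": [
            lambda e: e["type"] == "high-tech",
            lambda e: e["annual_tax"] >= 1_000_000,
        ],
    },
    "Small business rebate": {
        "all": [
            lambda e: e["employees"] < 100,
            lambda e: e["type"] != "high-tech",  # a NOT condition
        ],
    },
}

def eligible_policies(enterprise):
    """Return all policies whose conditions the enterprise satisfies."""
    return [name for name, rule in POLICIES.items()
            if all(cond(enterprise) for cond in rule["all"])]

company = {"type": "high-tech", "annual_tax": 2_500_000, "employees": 300}
print(eligible_policies(company))  # -> ['High-tech enterprise tax reduction']
```

In the described system, the attribute values in `company` would be pulled automatically from the enterprise knowledge graph rather than typed in by hand.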
8.3.7 A Case in Smart Park
Tian’an Cloud Valley is located in Banxuegang Science and Technology City, the central area of Shenzhen, covering a site of 760,000 square meters with a total floor area of 2.89 million square meters. It focuses on new-generation information technology such as cloud computing and the mobile Internet, as well as leading industries such as robotics and intelligent device R&D, around which related modern service and productive service industries are developed. To meet the needs of these leading industries, Tian’an Cloud Valley provides open, shared space and an intelligent environment, creating a smart industry city ecosystem fully connected with enterprises and talents.
This project adopts an edge-cloud collaborative video analysis scheme. Video analysis models such as face recognition, vehicle recognition and intrusion detection are distributed to a local GPU inference server in the park. After the real-time video stream is analyzed locally, the analysis results can be uploaded to the cloud or passed to the local upper-layer application systems.
By adopting the edge-cloud collaborative video analysis scheme, the park realizes intelligent analysis of surveillance video and real-time perception of abnormal events such as intrusion and heavy pedestrian flow, reducing the park's labor cost. At the same time, the park's existing IPC cameras can be turned into intelligent cameras through edge-cloud collaboration, which protects the users' existing assets. The smart park is shown in Fig. 8.27.
The end-side is an ordinary high-definition IPC camera, while the edge adopts a hardware GPU server. The competitiveness and value of edge video analysis are as follows.
-
1.
Business value: The park conducts intelligent analysis of surveillance video, detecting abnormal events such as intrusion and heavy pedestrian flow in real time, so as to reduce the park's labor cost.
-
2.
Edge-cloud collaboration: Edge applications have full life-cycle management and seamless upgrading.
-
3.
Cloud model training: Models are trained automatically in the cloud, with good algorithm scalability and easy updating.
-
4.
Good compatibility: The park's existing IPC cameras can be turned into intelligent cameras through edge-cloud collaboration.
8.3.8 A Case in Pedestrian Counting and Heat Map
Pedestrian counting and heat maps are mainly used to identify crowd information in the frame, including the number of people and the heat distribution of people in each region. User-defined time settings and result-sending intervals are supported. They are mainly applied to pedestrian counting, visitor counting and heat identification in commercial areas, as shown in Fig. 8.28.
The following improvements can be achieved by using pedestrian counting and heat map.
-
1.
Strong anti-interference: It supports pedestrian counting in complex scenes, such as a face or body being partially occluded.
-
2.
High scalability: It supports the simultaneous sending of pedestrian crossing statistics, regional statistics and heat map statistics.
-
3.
Usability improvement: It can be connected to ordinary 1080P surveillance cameras.
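The core of pedestrian crossing statistics can be sketched as a line-crossing count over tracked centroids. In a real system the per-frame positions would come from a person detector and tracker; the tracks and the counting line below are hand-made illustrative values.

```python
# Sketch of pedestrian crossing statistics: count tracked centroids that
# cross a virtual counting line between consecutive frames. In practice the
# positions come from a person detector; here they are hand-made values.

LINE_Y = 100  # horizontal counting line (pixel row); illustrative value

def crossed(prev_y, cur_y, line_y=LINE_Y):
    """True if a centroid moved from one side of the line to the other."""
    return (prev_y - line_y) * (cur_y - line_y) < 0

def count_crossings(tracks):
    """tracks: {track_id: [y0, y1, ...]} centroid y-coordinate per frame."""
    total = 0
    for ys in tracks.values():
        for prev_y, cur_y in zip(ys, ys[1:]):
            if crossed(prev_y, cur_y):
                total += 1
    return total

tracks = {1: [80, 95, 110], 2: [120, 105, 90], 3: [60, 70, 80]}
print(count_crossings(tracks))  # -> 2 (tracks 1 and 2 cross; track 3 does not)
```

Regional statistics and heat maps follow the same pattern: accumulate centroid positions per grid cell instead of testing against a line.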
8.3.9 A Case in Vehicle Recognition
Vehicle recognition is shown in Fig. 8.29. With vehicle recognition, the following improvements can be achieved.
-
1.
Comprehensive scene coverage: It supports recognition of vehicle type, body color and license plate in various scenes such as traffic enforcement cameras and checkpoints.
-
2.
High ease of use: Connected to an ordinary 1080P surveillance camera, it can identify the vehicle information in the picture, including license plate and vehicle attribute information. It can recognize vehicle types such as cars and medium-sized vehicles, vehicle colors, and license plate types including blue plates and new-energy plates. It is mainly used in scenarios such as park vehicle management, parking lot vehicle management and vehicle tracking.
8.3.10 A Case in Intrusion Identification
Intrusion identification is mainly used to identify illegal intrusion behavior in the frame. It supports extracting moving targets in the camera's field of vision; when a target crosses the designated area, an alarm is triggered. It also supports setting the minimum number of people in the alarm area, the alarm trigger time and the algorithm detection cycle. Intrusion detection is mainly used to identify illegal entry into key or dangerous areas and illegal climbing. Intrusion identification is shown in Fig. 8.30.
Using intrusion detection, the following improvements can be achieved.
-
1.
High flexibility: It supports flexible alarm target sizes and category settings.
-
2.
Low false alarm rate: It supports intrusion alarm based on person/vehicle, filtering interference from other objects.
-
3.
Usability improvement: It can be connected to ordinary 1080P surveillance cameras.
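The "designated area plus minimum number of people" alarm logic can be sketched with a point-in-polygon test. The region coordinates and the threshold are illustrative; real detections would come from the person detector mentioned above.

```python
# Sketch of region intrusion alarm: ray-casting point-in-polygon test plus
# a minimum-people threshold before the alarm fires. Region coordinates and
# the threshold are illustrative values.

REGION = [(0, 0), (10, 0), (10, 10), (0, 10)]  # alarm area polygon
MIN_PEOPLE = 2  # minimum number of people inside before alarming

def inside(point, polygon=REGION):
    """Ray-casting point-in-polygon test."""
    x, y = point
    hit = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                hit = not hit
    return hit

def should_alarm(detections):
    """detections: list of (x, y) person centroids for one frame."""
    return sum(inside(p) for p in detections) >= MIN_PEOPLE

print(should_alarm([(5, 5), (3, 7), (20, 20)]))  # -> True (two people inside)
```

Filtering by detection class (person/vehicle only, as in the low-false-alarm feature above) happens before this step, by discarding detections of other object categories.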
8.3.11 CNPC Cognitive Computing Platform: Reservoir Identification for Well Logging
With the completion and improvement of the integrated system, CNPC has accumulated a large amount of structured data and unstructured data. The structured data has been well used, but the unstructured data has not been fully applied. Moreover, the relevant knowledge accumulation and expert experience have not been fully exploited, and the intelligent analysis and application ability of data is insufficient.
The unstructured data features large data capacity, numerous varieties and low value density.
Cognitive computing represents a new computing mode, which is the advanced stage of artificial intelligence development. It contains a lot of technological innovation in the fields of information analysis, natural language processing and machine learning, which can help decision makers to obtain valuable information from a large number of unstructured data.
By using Huawei Cloud knowledge graph and NLP technology, CNPC has constructed a knowledge graph of the oil and gas industry and built upper-layer business applications on top of it (reservoir identification for well logging is one of the business scenarios; others include seismic horizon interpretation, water content prediction and working condition diagnosis). Finally, the following functions are realized.
-
1.
Knowledge aggregation: The knowledge graph of the oil and gas industry accumulates and consolidates the industry's professional knowledge.
-
2.
Cost reduction and efficiency enhancement: Based on the knowledge graph of the oil and gas industry, upper-layer business applications can simplify business processes and shorten working time.
-
3.
Reserve growth and production improvement: Based on the knowledge graph of the oil and gas industry, upper-layer business applications can help increase proved reserves and ensure energy security.
The solution for reservoir identification in well logging has the following features.
-
1.
Key links such as ontology, data sources, information extraction, knowledge mapping and knowledge fusion can be flexibly modified and manually adjusted.
-
2.
Simple knowledge reuse: New pipeline tasks can be created quickly, and graphs can be built from existing ontologies and data sources.
-
3.
Flexible modification and one-click effect: Frequent, fast testing improves efficiency. In the end, working time is shortened by 70% and the coincidence rate is increased by 5%. Reservoir identification for well logging is shown in Fig. 8.31.
8.4 Chapter Summary
This chapter first introduces the Huawei Cloud EI ecosystem and explains its related services. It then focuses on the Huawei EI basic platform, ModelArts, whose services users can explore further through the listed experiments. Finally, relevant cases of enterprise intelligence in practical applications are discussed.
It is worth noting that Huawei is committed to lowering the threshold of AI application. To help AI enthusiasts better understand the Huawei Cloud EI application platform, the Huawei Cloud official website has set up an EI experience space and an EI course training camp, as shown in Figs. 8.32 and 8.33.
8.5 Exercises
-
1.
Huawei cloud EI is an enterprise intelligence enabling agent. Based on AI and big data technology, it provides an open, trusted and intelligent platform through cloud services (public cloud, dedicated cloud, etc.). What services does Huawei cloud EI service family currently include?
-
2.
In Huawei cloud EI service family, the solutions for large-scale scenarios are called EI agents. What are they?
-
3.
In Huawei cloud EI service family, what does the EI basic platform consist of?
-
4.
ModelArts belongs to EI basic platform in Huawei cloud EI service family. It is a one-stop development platform for AI developers. What functions does it have?
-
5.
As a one-stop AI development platform, what are the advantages of ModelArts products?
Appendices
Appendix 1: Introduction to the API of HiAI Engine
8.1.1 Face Recognition
-
1.
Face Comparison
The face comparison API performs a precise comparison of face images by recognizing and extracting facial features, giving a confidence score that determines whether two images show the same person. Face comparison technology can be applied to intelligent classification in photo galleries. Based on an advanced end-side intelligent image recognition algorithm, it achieves high face recognition accuracy and an excellent application experience.
The algorithm is not recommended for authentication scenarios such as mobile phone unlocking and secure payment. It can be used wherever an app needs face comparison, such as comparing the similarity between two people, or between an ordinary person and a celebrity in an entertainment app.
When two photos of the same person are compared, the result indicates the same person with a high confidence score; when the photos show different people, the result indicates different people with a low confidence score.
With this API, algorithm development time can be greatly reduced, and the ROM space occupied by the algorithm model is saved, making the application lighter. Data can be processed locally without a network connection.
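The underlying idea, extract a feature vector per face and score their similarity against a threshold, can be sketched as follows. The feature vectors and the threshold are toy values; a real engine computes the embeddings with a deep network, and this is not the actual HiAI interface.

```python
# Sketch of the idea behind face comparison: each face is reduced to a
# feature vector (embedding), and a similarity score between two vectors
# decides "same person". Vectors and threshold below are illustrative; a
# real engine computes embeddings with a deep network.

import math

SAME_PERSON_THRESHOLD = 0.8  # hypothetical confidence cutoff

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def compare_faces(feat_a, feat_b):
    score = cosine_similarity(feat_a, feat_b)
    return {"same_person": score >= SAME_PERSON_THRESHOLD, "confidence": score}

alice_photo_1 = [0.9, 0.1, 0.3]
alice_photo_2 = [0.85, 0.15, 0.35]
bob_photo     = [0.1, 0.9, 0.2]

print(compare_faces(alice_photo_1, alice_photo_2)["same_person"])  # -> True
print(compare_faces(alice_photo_1, bob_photo)["same_person"])      # -> False
```

The confidence score returned alongside the boolean mirrors the "same person with a high confidence score" behavior described above.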
-
2.
Face Detection
The face detection API detects faces in an image and returns high-precision rectangular face coordinates, serving as a key module for functions such as screen wake-up and shutdown. By locating facial features and positions, it enables beautification of specific parts of the face. Face detection is widely used in face recognition scenes such as face unlocking, face clustering and beautification.
Adapting to common lighting conditions, various head postures and occlusion, face detection supports multi-ethnic and multi-face detection, achieving a high detection rate and a low false detection rate.
-
3.
Face Analysis
Face analysis decomposes the human head (including facial features) into areas such as hair, facial skin, eyes, eyebrows, nose, mouth and ears. Its main function is to analyze the face in an input picture and provide analysis results for each facial area, including background, facial skin, left/right eyebrow, left/right eye, nose, upper lip/mouth interior/lower lip, left/right ear, neck, spectacles and sunglasses. Different parts are marked with different colors.
-
4.
Face Attributes
Face attributes are a series of biological characteristics that represent face features. They are highly stable for an individual yet differ between individuals, so they can be used to identify a person. Face attributes include gender, skin color, age and facial expression.
The main function of this API is to recognize the face attributes in an input picture. It supports seven facial expressions (joy, grief, astonishment, anger, pout, grimace and neutral) and three character attributes (gender, age and wearing, i.e., glasses, hat or beard). It also supports expression and attribute recognition for multiple faces.
-
5.
Face Orientation Recognition
Face orientation recognition can check whether there is a person in the field of view of a mobile phone camera and identify the orientation of the face, providing important information for the decision-making system of the smartphone. For example, face orientation recognition is applied in scenes such as intelligent screen-on, intelligent screen-off, intelligent rotation and image rotation control. Face orientation means the direction of the intermediate datum line (pointing to the top of the head) in the facial plane, divided into five categories: none (no person), up, right, down and left.
Based on visible-light image recognition technology, face orientation recognition detects five categories in the image plane: face up, face right, face down, face left and no face. Through this API, the specific category and confidence of the face orientation in an image can be obtained.
The algorithm checks the face orientation information in an image. It can be applied to scenes such as detecting the presence of a person and judging the face orientation.
-
6.
Facial Feature Detection
The facial feature detection API can detect facial features in an input image and return the coordinates of facial landmarks (currently 276), which represent the contour positions of the facial features. It can provide input for subsequent processing such as beautification, face modeling and facial expression recognition.
8.1.2 Human Recognition
-
1.
Pose Estimation
Pose estimation is the basis of many computer vision tasks, such as motion classification, abnormal behavior detection and autonomous driving. It plays an important role in posture description and behavior prediction. In recent years, with the development of deep learning technology, it has been widely used in related fields of computer vision.
Pose estimation mainly detects key points of the human body, such as joints and facial features, from which the human skeleton information is described.
-
2.
Video Portrait Segmentation
The video portrait segmentation API supports real-time processing of live video streams (such as from a mobile camera). The developer passes each frame of the video stream to HiAI Engine; the algorithm then segments the portrait in the image and returns the mask result to the user as a byte array.
Through video portrait segmentation, users can render the foreground (the person), for example with blurring and beautification, and process the background, for example with background replacement or removal.
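Consuming the returned mask for background replacement can be sketched as a per-pixel composite. The tiny nested lists stand in for real pixel buffers, and the mask values are illustrative; a real mask comes from the segmentation engine.

```python
# Sketch of using a segmentation mask for background replacement: keep the
# foreground (person) pixel where the mask is 1, take the new background
# elsewhere. Tiny nested lists stand in for real pixel buffers.

def replace_background(frame, mask, background):
    """Composite: mask==1 keeps the frame pixel, mask==0 uses background."""
    return [
        [frame[r][c] if mask[r][c] == 1 else background[r][c]
         for c in range(len(frame[0]))]
        for r in range(len(frame))
    ]

frame      = [["p", "p"], ["p", "b"]]   # "p" = person pixel, "b" = original bg
mask       = [[1, 1], [1, 0]]
background = [["X", "X"], ["X", "X"]]

print(replace_background(frame, mask, background))
# -> [['p', 'p'], ['p', 'X']]
```

Background blurring works the same way, substituting a blurred copy of the frame for `background`.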
8.1.3 Image Recognition
-
1.
Aesthetic Score
Implementing an advanced multi-dimensional scoring technology of video AI, the aesthetic engine comprehends complex subjective aspects in images, making high-level judgments related to the attractiveness, memorability and engaging nature of an image. It can be applied in various video intelligent scenarios, such as auxiliary photography, photo group auxiliary, auxiliary video editing and video splitting.
This API can be used in photography or photo management apps, such as those for personal photo album management, automatic photo editing, and auxiliary photo shooting. The aesthetic engine's algorithms help realize the multi-dimensional evaluation of images, from aesthetic, technology, and compositional perspectives.
-
2.
Picture Classification Tagging
Based on deep learning method, picture classification tagging API identifies the information in the picture such as object, scene and behavior. The corresponding tag information, such as flowers, birds, fish, insects, cars and buildings can be returned. This API can be applied in various intelligent scenarios concerning picture content understanding, such as automatic classification and sorting of picture library, social picture recognition and sharing. Rich in tag information, this API supports the recognition of 100 kinds of common objects, scenes and behaviors. It creates a leading end-side intelligent image recognition algorithm, which has strong classification tag recognition and high accuracy.
-
3.
Image Super-resolution
Image super-resolution builds on the extensive application of deep learning in computer vision. It can intelligently enlarge pictures, or remove compression noise at constant resolution, to obtain clearer, sharper and cleaner photos than traditional image processing.
The algorithm, based on a deep neural network and the NPU chip of Huawei mobile phones, is nearly 50 times faster than pure CPU computing. This API is built into Huawei mobile phones with low ROM and RAM consumption, resulting in a smaller and lighter application.
-
4.
Scene Detection
By identifying the scene of the image content, the scene detection API can quickly classify the input image and currently supports multiple types of scene recognition. The recognized scenes cover a variety of categories with high accuracy, including animals, plants, food, buildings and cars. Adding intelligent classification tags to images through scene recognition can be widely used in scenarios such as creating intelligent photo albums and image classification management.
Generally, different scenes require different preferences or strategies for the photographic effect. This API can provide decision-making basis, so that the image rendering effect can choose a better strategy for each characteristic scene.
-
5.
Document Detection and Correction
Document detection and correction can realize the auxiliary enhancement in the process of document reproduction. It can automatically identify the document in the picture, returning the position information of the document in the original picture. Documents here generally indicate square-shaped items, such as books, photos, and picture frames. This function contains two sub functions: document detection and document correction.
Document detection: It identifies the document in the picture and returns the position information of the document in the original picture.
Document correction: According to the position information of the document in the original picture, it can correct the shooting angle of the document (the correction area can be customized), while automatically adjusting the shooting angle to the angle facing the document. This function works well in situations where old paper photos, letters, or paintings are reproduced into electronic versions.
-
6.
Text Image Super-resolution
The text content in an image usually contains very important information. However, the text may be blurred due to shooting restrictions, low resolution or a distant subject. The text image super-resolution API can magnify an image containing text by nine times (three times in height and three times in width) while significantly enhancing the clarity of the text in the image.
In text file reproduction scenes, the boosted sharpness of the image makes the text more identifiable. The algorithm is developed on a deep neural network and makes full use of the NPU chip of Huawei mobile phones to accelerate the network, with a speed-up ratio of more than ten times.
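The "nine times" figure is just the size arithmetic: tripling both height and width multiplies the pixel count by 3 × 3 = 9. The nearest-neighbor sketch below shows only this size relationship; the real engine uses a deep network to add detail rather than repeating pixels.

```python
# Illustration of the "nine times" arithmetic: tripling height and width
# gives 3 x 3 = 9 times the pixels. Nearest-neighbor repetition is only a
# size demonstration, not the engine's actual algorithm.

def upscale_3x(image):
    """Repeat each pixel 3x horizontally and each row 3x vertically."""
    return [[px for px in row for _ in range(3)]
            for row in image for _ in range(3)]

img = [[1, 2], [3, 4]]  # a 2x2 "image"
big = upscale_3x(img)
print(len(big), len(big[0]))              # -> 6 6
print(len(big) * len(big[0]) // (2 * 2))  # -> 9 times the pixel count
```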
-
7.
Portrait Segmentation
Portrait segmentation refers to separating the portrait and the background into different areas, distinguished with different tags.
This API performs portrait segmentation on an input image containing a portrait, and the segmentation results distinguish the portrait from the background. It can be used for foreground replacement, background replacement and background blurring.
-
8.
Image Semantic Segmentation
The image is recognized and segmented at the pixel level, so as to obtain the category information and accurate position information of the object in the image. As the basic information of image semantic understanding, these contents can be used for subsequent image enhancement processing. This API currently supports the recognition and segmentation of ten types of objects, i.e., human, sky, plants (including grass and trees), food, pets, buildings, flowers, water, beaches and mountains.
This API is used for pixel-level recognition and segmentation of photographic images, which can be applied to the scenarios such as app auxiliary photography and street scene recognition.
8.1.4 Code Recognition
Code recognition obtains the information contained in QR codes and bar codes by identifying them, providing a service framework that can be integrated into applications.
This API covers the resolution of QR code/bar code image in 11 scenarios such as Wi-Fi and SMS. In addition to effective code detection, it also provides service capability based on detection results. It can be widely used in code scanning services of various applications, such as QR code and bar code recognition.
8.1.5 Video Technology
-
1.
Video Summary
The aesthetic engine comprehends complex subjective aspects in images, making high-level judgments related to the attractiveness, memorability and engaging nature of an image, based on the multi-dimensional comprehensive aesthetic scoring technology of video AI. It can be applied in various video intelligent scenarios, such as auxiliary photography, photo group auxiliary, auxiliary video editing and video splitting.
This API can be used in photography or photo management apps, such as those for personal photo album management, automatic photo editing, and auxiliary photo shooting. The aesthetic engine’s algorithms help realize the multi-dimensional evaluation of images, from aesthetic, technology, and compositional perspectives.
-
2.
Video Cover
Implementing the multi-dimensional comprehensive aesthetic scoring technology based on video AI, the aesthetic engine can comprehend complex subjective aspects in images, making high-level judgments related to the attractiveness, memorability and engaging nature of an image. It can be applied in various video intelligent scenarios, such as auxiliary photography, photo group auxiliary, auxiliary video editing and video splitting.
The API can be used in photography or photo management apps, such as those for personal photo album management, automatic photo editing and auxiliary photo shooting. The aesthetic engine's algorithms help realize the multi-dimensional evaluation of images from aesthetic, technical and compositional perspectives, so as to obtain the static and dynamic covers with the highest aesthetic scores.
8.1.6 Text Recognition
-
1.
General Character Recognition
The core of general character recognition is optical character recognition (OCR) technology, which converts the characters of bills, newspapers, books, manuscripts and other printed matter into image information through optical input such as scanning, and then converts that image information into text a computer can use. OCR plays an increasingly important role on smartphones, where it is used in more and more applications, such as recognition of documents, road signs, menus, business cards, certificates and scanned exam questions. The current end-side general character recognition interface is divided into focusing photography OCR and mobile phone screenshot OCR.
Focusing photography OCR API is applicable to various sources of image data such as cameras and galleries. It provides an open interface for automatic detection and recognition of text position and content in images. Focusing photography OCR API can support scenes such as text tilt, shooting angle tilt, complex lighting conditions and complex text background to a certain extent. It can be used for text detection and recognition of document reproduction and street view reproduction. It has such a wide range of applications and strong anti-interference ability that it can be integrated into other applications to provide text detection and recognition services, and related services based on the results.
Aiming at the characteristics of mobile phone screenshots, the mobile phone screenshot OCR API provides a light, fast text extraction function for screenshot pictures on the end side, which is convenient for subsequent processing and service docking, such as copying, editing, word segmentation and semantic analysis. The API supports customized hierarchical result return: the coordinates of text blocks, text lines and text characters in the screenshot can be returned according to the application's needs. The API provides an optimized text extraction algorithm: the average extraction time for a simple-background screenshot is less than 200 ms, while the average time of the general mobile phone screenshot OCR is less than 500 ms.
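The hierarchical block/line/character result idea can be sketched with nested structures, each carrying its own coordinates, so the caller picks the granularity it needs. The field names below are assumptions for illustration, not the real HiAI result schema.

```python
# Sketch of a hierarchical OCR result: blocks -> lines -> characters, each
# level with its own coordinates. Field names are illustrative assumptions,
# not the real engine's schema.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Char:
    text: str
    box: Tuple[int, int, int, int]  # x, y, width, height

@dataclass
class Line:
    chars: List[Char] = field(default_factory=list)

    @property
    def text(self):
        return "".join(c.text for c in self.chars)

@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)

    @property
    def text(self):
        return "\n".join(l.text for l in self.lines)

result = Block(lines=[
    Line([Char("H", (0, 0, 8, 12)), Char("i", (9, 0, 4, 12))]),
    Line([Char("o", (0, 14, 8, 12)), Char("k", (9, 14, 8, 12))]),
])

print(result.text)           # block-level text
print(result.lines[0].text)  # line-level text -> "Hi"
```

An app that only needs copyable text reads `result.text`; one that highlights words on screen drills down to the per-character boxes.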
-
2.
Table Recognition
Based on the ability of focusing photography OCR, table recognition can identify the text in the input picture and detect the structure information of the table, including the location information of the cell, the number of occupied rows and columns of the cell, and the text information in each cell. The table recognition API currently supports page scenarios (paper, printed pages, etc.) and projection scenarios (conference room presentation projection).
The table recognition API is applied to the content recognition of various table scenarios. The three-party app can use the results returned by the engine to generate Excel files so as to reduce the cost of manual input.
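Turning the recognized cell information into a row/column grid that could then be written to an Excel or CSV file might look like the sketch below. The cell fields (`row`, `col`, `row_span`, `col_span`, `text`) mirror the kind of structure described above, but their exact names are assumptions.

```python
# Sketch of assembling recognized table cells into a grid, ready to write
# out as CSV/Excel rows. Cell field names are illustrative assumptions.

def cells_to_grid(cells, n_rows, n_cols):
    """Place each recognized cell's text, expanding merged (spanned) cells."""
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        for r in range(cell["row"], cell["row"] + cell.get("row_span", 1)):
            for c in range(cell["col"], cell["col"] + cell.get("col_span", 1)):
                grid[r][c] = cell["text"]
    return grid

cells = [
    {"row": 0, "col": 0, "text": "Item"},
    {"row": 0, "col": 1, "text": "Qty"},
    {"row": 1, "col": 0, "text": "Total", "col_span": 2},  # merged cell
]
print(cells_to_grid(cells, 2, 2))
# -> [['Item', 'Qty'], ['Total', 'Total']]
```

The span expansion handles the "number of occupied rows and columns of the cell" information that the engine returns for merged cells.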
-
3.
Passport Recognition
Passport recognition is based on OCR technology. It extracts the text information from a passport photo taken with a mobile phone or stored in the gallery. With the general OCR text detection and recognition capability, it extracts the key information from the passport image and returns it to the user or a third-party application, helping users enter document information quickly and saving the trouble of manual input. Passport recognition works under various conditions, such as horizontal or vertical phone orientation and complex lighting, and can be integrated into a variety of applications to provide document recognition services for third-party applications.
A passport is a legal document issued by a country to prove the nationality and identity of a citizen when entering or leaving the country and traveling or residing abroad. Many apps currently require passport information entry. Because a passport carries a large amount of information, manual entry is inefficient and gives users a poor experience. To improve the speed and accuracy of passport information input on mobile terminals, Huawei has developed passport recognition OCR technology to meet the application needs of various industries and offer a better user experience. Developers only need to integrate the passport recognition SDK into an app to scan and recognize passport information through the mobile phone camera.
-
4.
ID Card Identification
ID card identification is an important application based on OCR technology. By calling the ability of focusing photography OCR, the mobile phone can directly take photos of the certificate and extract the key information on the certificate.
ID card identification can extract key information such as name, gender, birth, certificate number from ID card photos.
At present, customer identity verification is required in many apps, such as payment apps (UnionPay, XX bank, Huawei Pay), travel apps (Didi and 12306) and hotel apps (Huazhu). Customers are required to upload their ID card photos when using these apps. The ID card identification API helps such apps automatically identify the user's ID card information.
Huawei's ID card recognition function extracts important information on the ID card by calling the OCR ID card recognition capability and outputs it in JSON form.
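Consuming such a JSON result on the app side might look like the sketch below. The field names (`name`, `sex`, `birth`, `number`) and the sample values are hypothetical stand-ins for whatever keys the real recognition service returns.

```python
# Sketch of consuming an ID-card recognition result delivered as JSON.
# Field names and sample values are hypothetical stand-ins.

import json

raw = ('{"name": "Zhang San", "sex": "M", '
       '"birth": "1990-01-01", "number": "110101199001010011"}')

def parse_id_card(payload):
    """Decode the JSON result and check that the key fields are present."""
    data = json.loads(payload)
    required = ("name", "sex", "birth", "number")
    missing = [k for k in required if k not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data

info = parse_id_card(raw)
print(info["name"])  # -> Zhang San
```

Validating required fields before use guards against partial recognition results, for example when part of the card is occluded in the photo.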
-
5.
Driving License Identification
OCR driving license recognition API can be used to quickly identify key information on the license.
Through driving license identification, the information on the license is recognized and output as JSON, which is convenient for rapid information entry. The identification accuracy is more than 97%, and the recall rate is more than 97%.
-
6.
Vehicle License Recognition
OCR vehicle license recognition API can be used to quickly identify the key information on the license.
Through vehicle license recognition, the license information is identified as JSON, which is convenient for the rapid entry of license information. The recognition accuracy is more than 97%, and the recall rate is more than 97%.
-
7.
Document Conversion
The document conversion API can easily convert images to documents (such as presentations). This API identifies the document and its text, then returns them to the client side, which can restore the information to the presentation format.
By calling a single interface, developers can quickly obtain document detection and correction, text super-resolution and OCR detection results.
-
8.
Bank Card Identification
The function of the bank card identification API is to identify the bank card number in the input picture.
Through the identification of bank card, the card number information on the bank card is extracted and output in the form of corresponding card object.
The accuracy rate of bank card identification is more than 99%, and the recognition recall rate is more than 99%.
8.1.7 Speech Recognition
People have long dreamed of talking with machines and having machines understand what they say. SharKing IOT Circle vividly compared speech recognition to a “machine hearing system”. Speech recognition technology, also known as automatic speech recognition (ASR), enables a machine to transform speech signals into corresponding text or commands through a process of recognition and understanding.
The Huawei speech recognition engine is oriented to mobile terminals, providing developers with an AI application-layer API. It can transform speech files and real-time speech data streams into Chinese character sequences, with a recognition accuracy rate over 90% (95% for local recognition and 97% for cloud recognition), giving users a smooth application experience.
This API can be applied to develop third-party applications with speech recognition requirements in various scenarios, such as speech input methods, speech search, real-time subtitles, games, chatting, human-computer interaction and driving modes.
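Accuracy figures like those above are typically computed from the edit distance between the recognized text and a reference transcript. A minimal sketch of that metric (a standard measure, not Huawei's implementation):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[m][n]

def char_accuracy(ref, hyp):
    """1 minus the normalized edit distance: a common ASR accuracy measure."""
    return 1 - edit_distance(ref, hyp) / max(len(ref), 1)

print(char_accuracy("hello world", "hello word"))  # one deleted character
```

For Chinese output the same computation is applied per character rather than per word.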
8.1.8 Natural Language Processing
-
1.
Word Segmentation
With the development of information technology, network and text information has grown rapidly, and this geometric growth of information dominates today's society. To extract the key information of a text, word segmentation becomes particularly important in search engines and other fields. As basic research in the field of natural language processing, word segmentation has given rise to various applications related to text processing.
The word segmentation API provides an interface for automatic word segmentation of text. For a piece of input text, the API automatically segments it into words. It also provides different word segmentation granularities, which can be customized as needed.
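As an illustration of dictionary-based segmentation (a simple stand-in for the API, not Huawei's algorithm), the greedy forward maximum-matching method below also shows how the dictionary controls the segmentation granularity:

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy forward maximum matching: at each position take the
    longest dictionary word that matches, else a single character."""
    tokens, i = [], 0
    while i < len(text):
        match = text[i]  # fallback: one character
        for length in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + length] in vocab:
                match = text[i:i + length]
                break
        tokens.append(match)
        i += len(match)
    return tokens

# A richer dictionary yields coarser tokens; a smaller one, finer tokens.
print(forward_max_match("自然语言处理", {"自然", "语言", "处理", "自然语言"}))
print(forward_max_match("自然语言处理", {"自然", "语言", "处理"}))
```

The first call keeps “自然语言” as one token; the second splits it into “自然” and “语言”.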
-
2.
Part-of-speech Tagging
The part-of-speech tagging API provides an interface for automatic word segmentation and part-of-speech tagging. For a piece of input text, the API automatically segments the words and gives the corresponding part of speech for each. It also provides different word segmentation granularities, which can be customized as needed.
-
3.
Assistant Intention Recognition
With the popularity of human-computer interaction, a device needs to understand the various instructions issued by users to make operation easier. Assistant intention recognition uses machine learning technology to analyze and identify the text messages sent to the device. Based on semantic analysis, various intelligent application scenarios can be derived from assistant intention recognition, making smart devices more intelligent.
This API can be applied to a speech assistant. Through intelligent dialogue and instant question answering, the API helps users solve problems quickly.
-
4.
IM Intention Recognition
IM intention recognition uses machine learning technology to analyze and recognize the intention of text messages from a user's SMS or chat apps (such as WeChat and QQ). Based on semantic analysis, machine learning is used to identify and understand the intention of the user's message. Through IM intention recognition, a variety of intelligent application scenarios can be derived, making smart devices more intelligent.
This API provides an interface for identifying the intention of a user's SMS or text messages on chat apps. Through this API, the intention of text messages can be automatically analyzed and identified. At present, only three intentions of notification messages are supported: repayment reminder, successful repayment and missed call.
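Since only three notification intents are supported, the behavior can be illustrated with a toy keyword-rule classifier. The rules and keywords below are invented for illustration; they are not Huawei's implementation.

```python
# Hypothetical keyword rules for the three supported notification intents.
RULES = [
    ("repayment reminder", ["repayment due", "please repay", "bill due"]),
    ("successful repayment", ["repayment received", "repaid successfully"]),
    ("missed call", ["missed call", "called you"]),
]

def classify_intent(message):
    """Return the first intent whose keywords appear in the message."""
    text = message.lower()
    for intent, keywords in RULES:
        if any(k in text for k in keywords):
            return intent
    return "unknown"

print(classify_intent("You have 1 missed call from an unknown number"))
```

A production system would replace the keyword rules with a trained semantic classifier, which is what the passage above describes.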
-
5.
Keyword Extraction
In daily life we are surrounded by all kinds of information, carried by ever-changing language that draws on physics, mathematics, linguistics, computer science and other disciplines. As a carrier of information, language contains both useful and useless information. Keyword extraction is the task of quickly extracting the key information and core content from this vast sea of information.
The keyword extraction API provides an interface to extract keywords. It can be used to extract the core content of a text from a large amount of information; keywords can be entities with specific meaning, such as a person's name, a place or a movie, or basic but key words in the text. Through the API, the extracted keywords are sorted from high to low according to their weight in the text: the higher a keyword ranks, the more closely it reflects the core content of the text.
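A common way to weight and rank keywords is TF-IDF, sketched below as an assumption for illustration; the API's actual weighting scheme is not documented here.

```python
import math
from collections import Counter

def tfidf_keywords(doc, corpus, top_k=3):
    """Rank the words in `doc` by TF-IDF weight against a small corpus."""
    docs = [d.lower().split() for d in corpus]
    words = doc.lower().split()
    tf = Counter(words)
    n = len(docs)
    scores = {}
    for w, c in tf.items():
        df = sum(1 for d in docs if w in d)        # document frequency
        idf = math.log((n + 1) / (df + 1)) + 1     # smoothed IDF
        scores[w] = (c / len(words)) * idf
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "neural networks learn representations",
]
print(tfidf_keywords("neural networks recognize the cat", corpus))
```

Words that are frequent in the document but rare in the corpus (such as "recognize" here) rank highest, while common words like "the" are pushed down, matching the sorting behavior described above.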
-
6.
Entity Recognition
The entity recognition API can extract entities with specific meaning from natural language, and then complete a series of related operations and functions such as search.
It covers a wide range of entity types, meeting the needs of entity recognition in daily development and offering a better application experience. It has high accuracy in entity recognition: it can accurately extract entity information, which has a key impact on downstream information services.
Appendix 2: Key to Exercises
Chapter 1
-
1.
As long as the answer makes sense.
-
2.
Among the three, machine learning is an approach to, and a subset of, artificial intelligence, and deep learning is a special kind of machine learning. Artificial intelligence can be compared to the brain. Machine learning is the process of mastering cognitive ability, and deep learning is a very efficient teaching system in this process. Artificial intelligence is the purpose and the result, while deep learning and machine learning are methods and tools.
-
3.
As long as the answer makes sense. Take smart medicine as an example. By using artificial intelligence technology, we can let AI “learn” professional medical knowledge, “memorize” a large number of historical cases, and identify medical images with computer vision technology, thus equipping doctors with reliable and efficient intelligent assistants. For example, with today's widely used medical imaging technology, researchers can build models on past data to identify medical images, quickly locate a patient's lesions and improve the efficiency of consultation.
-
4.
The operator-level fusion engine FusionEngine, the CCE operator library, efficient and high-performance user-defined operator development tools, and a low-level compiler.
-
5.
Answer according to individual understanding.
Chapter 2
-
1.
For a certain class of tasks T and performance measure P, if the performance of a computer program, as measured by P on T, improves with experience E, then we say the computer program learns from experience E.
-
2.
Variance is the degree to which the prediction results deviate around their mean, while bias is the difference between the mean of the prediction results and the correct value. An over-fitted model generally features low bias and high variance.
-
3.
The calculation of precision and recall in Figs. 8.2–8.25 is given in this chapter; they are 0.875 and 0.824 respectively. According to the formula, F1 = 2 × 0.875 × 0.824/(0.875 + 0.824) ≈ 0.848.
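The F1 computation can be checked in a couple of lines. With the rounded inputs 0.875 and 0.824 the result comes out near 0.849; the chapter's 0.848 presumably uses the unrounded precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.875, 0.824), 3))
```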
-
4.
Validation sets can be used to help model search for hyperparameters, while test sets cannot participate in model training in any form. Validation sets are introduced for cross-validation.
-
5.
New features can be constructed from existing features, and then polynomial regression can be performed with a linear model. For example, the feature x of every sample is squared and x² is added to the dataset as a new feature.
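A minimal sketch of this trick with NumPy: adding x² as a feature lets ordinary least squares fit a quadratic exactly.

```python
import numpy as np

# Noise-free data from y = 2x^2 + 3x + 1.
x = np.linspace(-3, 3, 50)
y = 2 * x**2 + 3 * x + 1

# Add x**2 as a new feature, then solve a *linear* least-squares problem.
X = np.column_stack([x**2, x, np.ones_like(x)])  # features: x^2, x, bias
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 6))  # recovers [2, 3, 1]
```

The model stays linear in its parameters; only the feature space has been enlarged, which is exactly the idea in the answer above.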
-
6.
There are many methods to extend binary classification SVM to multi-classification problems; the one-against-one method is one of them. For each pair of categories in the data set, the one-against-one method builds a binary classification SVM, so there are \( {\mathrm{C}}_k^2 \) models to be trained, where k represents the number of categories. In prediction, each model gives a classification result for the new sample, which is equivalent to a vote on the category to which the sample belongs. Finally, the category with the most votes is taken as the classification result (one of them can be selected when there is a tie).
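The voting procedure can be sketched as follows; the pairwise rule here is a toy stand-in for the C(k, 2) trained SVMs.

```python
from itertools import combinations
from collections import Counter

def one_vs_one_predict(sample, classes, pairwise_predict):
    """Vote over C(k, 2) pairwise classifiers.
    `pairwise_predict(sample, a, b)` returns a or b for each trained pair."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_predict(sample, a, b)] += 1
    return votes.most_common(1)[0][0]

# Toy pairwise rule standing in for trained SVMs:
# prefer the class whose label is numerically closer to the sample.
classes = [0, 1, 2]
predict = lambda x, a, b: a if abs(x - a) < abs(x - b) else b
print(one_vs_one_predict(1.2, classes, predict))
```

With k = 3 classes only three pairwise models are needed, and the sample 1.2 collects two of the three votes for class 1.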
-
7.
The Gaussian kernel function does not explicitly map vectors to an infinite-dimensional space and then calculate the inner product, because this method is not feasible. In fact, it can be proved that evaluating the Gaussian kernel function on the difference between two vectors is equivalent to the above process. This is the principle of the Gaussian kernel function.
-
8.
The gradient descent algorithm is not the only way to train a model; other methods such as genetic algorithms and Newton's method can also be used. The disadvantages of the gradient descent algorithm are that it easily falls into local extrema, is only suitable for differentiable functions, and does not consider the differing sensitivities of the parameters.
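The update rule itself is simple; a minimal gradient descent loop on a convex one-dimensional function:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Plain gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # converges to 3.0
```

On a non-convex function the same loop could stall at a local extremum, which is the first drawback listed above.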
Chapter 3
-
1.
Traditional machine learning has little need for special computer hardware, while deep learning requires a large number of matrix operations and GPUs for parallel computing. Traditional machine learning is suitable for training on small amounts of data, while deep learning achieves high performance with massive training data. Traditional machine learning needs to decompose a problem layer by layer, while deep learning is end-to-end learning. Traditional machine learning requires manual feature selection, while deep learning uses algorithms to extract features automatically. The features of traditional machine learning are highly interpretable, while the features of deep learning are weakly interpretable.
-
2.
The activation function introduces nonlinearity into the neural network. Although the perceptron model is linear, a neural network with nonlinear activation functions is no longer linear, so it can solve nonlinear problems such as the XOR problem.
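That XOR becomes solvable once nonlinearity is added is easy to verify: the fixed two-layer ReLU network below computes XOR exactly, with hand-picked weights and no training.

```python
def relu(v):
    return max(0.0, v)

def xor_net(x1, x2):
    """A two-layer ReLU network computing XOR exactly:
    XOR(x1, x2) = ReLU(x1 + x2) - 2 * ReLU(x1 + x2 - 1)."""
    h1 = relu(x1 + x2)       # hidden unit 1
    h2 = relu(x1 + x2 - 1)   # hidden unit 2
    return h1 - 2 * h2       # output layer

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

No linear model (and hence no single perceptron) can produce this truth table, but two ReLU units suffice.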
-
3.
The output of the Sigmoid function is not centered on 0, and the function saturates easily. The tanh function corrects the output so that it is centered on 0, but it does not solve the saturation problem, which may cause the gradient to vanish.
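The saturation is easy to see numerically: the derivative of tanh, 1 - tanh(x)^2, collapses toward zero for large inputs, which is what starves the gradient.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1 - math.tanh(x) ** 2

# The gradient is 1 at the origin but essentially zero once tanh saturates.
for x in (0.0, 2.0, 10.0):
    print(x, tanh_grad(x))
```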
-
4.
The goal of regularization methods is to reduce the generalization error of the model. Dropout is a general regularization method with simple computation. Its principle is to construct a series of sub-networks with different structures and combine them in a certain way, which is equivalent to a form of ensemble learning.
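A sketch of dropout at training time, under the common "inverted dropout" convention of scaling the surviving units by 1/(1-p) so the expected activation is unchanged:

```python
import random

def inverted_dropout(x, p, rng):
    """Zero each unit with probability p and scale survivors by 1/(1-p),
    keeping the expected activation unchanged at training time."""
    keep = 1 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]

rng = random.Random(0)  # seeded for reproducibility
x = [1.0] * 10
print(inverted_dropout(x, p=0.5, rng=rng))
```

Each random mask effectively samples one of the sub-networks mentioned above; averaging over many masks is what gives dropout its ensemble flavor.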
-
5.
Compared with the Adam optimizer, the momentum optimizer is slower but less prone to overfitting.
-
6.
\( \left[\begin{array}{l}4\kern1em 3\kern1em 4\\ {}2\kern1em 4\kern1em 3\\ {}2\kern1em 3\kern1em 4\end{array}\right] \)
-
7.
The memory unit of a recurrent neural network realizes the memory function by taking its own output as input. But the memory of a recurrent neural network is very limited, and it cannot deal with long sequences effectively. Alternative models are LSTM and GRU.
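The limited memory can be illustrated with a one-unit recurrent step (toy weights chosen for illustration): feed an impulse followed by zeros and watch the trace of the first input fade.

```python
import math

def rnn_step(h, x, w=0.5, u=0.5):
    """One recurrent step: the new hidden state mixes the current
    input with the previous hidden state (the 'memory')."""
    return math.tanh(w * x + u * h)

# An impulse followed by zeros: the trace of the first input decays
# step by step, which is why a plain RNN struggles with long sequences.
h = 0.0
states = []
for x in [1.0] + [0.0] * 9:
    h = rnn_step(h, x)
    states.append(h)
print(states[0], states[-1])
```

LSTM and GRU add gated cell states precisely to preserve such traces over many steps.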
-
8.
In a generative adversarial network, the generator and the discriminator are trained alternately; this adversarial game improves the results of both.
-
9.
The problems of gradient vanishing and gradient explosion are caused by networks that are too deep and by unstable updating of network weights. The methods for dealing with gradient vanishing are pre-training, use of the ReLU activation function, LSTM networks, and residual modules. The main scheme for dealing with gradient explosion is gradient clipping.
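Gradient clipping by norm, the scheme mentioned above, takes only a few lines:

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds max_norm,
    the standard remedy for exploding gradients."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return grad
    scale = max_norm / norm
    return [g * scale for g in grad]

print(clip_by_norm([30.0, 40.0], max_norm=5.0))  # norm 50 scaled down to 5
```

The gradient's direction is preserved; only its magnitude is capped, so each update stays bounded.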
Chapter 4
-
1.
The mainstream development framework of artificial intelligence is as follows.
-
TensorFlow: Based on graph computation; the variables at each stage of training can be controlled through node variables on the graph. Especially for low-level operations, TensorFlow is easier to work with than other frameworks.
-
Keras: TensorFlow, CNTK, MXNet and other well-known frameworks all provide support for Keras call syntax, and this API style of building models has gradually become mainstream. Code written in Keras is much more portable.
-
PyTorch: The framework is also fairly extensible, but some interfaces are not comprehensive enough. The biggest drawback of its predecessor, Torch, was that it needed the support of LuaJIT and was programmed in the Lua language, which limited its appeal now that Python prevails; PyTorch addresses this with a Python interface.
-
-
2.
The main difference between TensorFlow 1.0 and TensorFlow 2.0 is that the former uses static graphs, which are more efficient, whereas the latter uses dynamic graphs (eager execution), which are easier to debug. Meanwhile, TensorFlow 2.0 has stronger cross-platform capability and can be deployed on various platforms such as Android, JavaScript and Java.
-
3.
tf.errors: the exception types for TensorFlow errors.
tf.data: implements operations on datasets. Input pipelines created with tf.data read the training data; it also supports convenient data input from memory (such as NumPy arrays).
tf.distributions: implements various statistical distributions.
-
4.
Characteristics of Keras: Keras itself is not a framework but a high-level API on top of other deep learning frameworks. At present it supports TensorFlow, Theano and CNTK, with good scalability, a simple API, user-friendliness and complete documentation. Therefore, Keras is widely used.
-
5.
Answer is omitted.
Chapter 5
-
1.
The features of the MindSpore architecture include a friendly development state (AI algorithm as code), an efficient running state (Ascend/GPU optimization is supported), and a flexible deployment state (on-demand collaboration across all scenarios).
-
2.
MindSpore proposed three technological innovations: a new programming paradigm, a new execution mode and a new collaboration mode.
-
3.
On-device execution is adopted and the whole graph is sunk to the device to give full play to the computing power of the Ascend AI processor. MindSpore maximizes the parallelism of data, computing and communication by using chip-oriented deep graph optimization technology, which minimizes synchronous waiting and sinks the entire data and computing graph into the Ascend chip for optimal effect.
-
4.
See Sect. 5.2.3.
Chapter 6
-
1.
GPU: mainly faces highly unified, interdependent large-scale data and a pure computing environment that does not need to be interrupted; thousands of cores; designed for high throughput; specializes in compute-intensive and parallel programs.
CPU: requires strong generality to handle different data types, and also needs logic judgment, which introduces a large number of branch jumps and interrupt handling; only a few cores; designed for low latency; specializes in logic control and serial operation.
-
2.
Computing unit, storage system and control unit.
-
3.
Cube unit, vector unit and scalar unit.
-
4.
The four layers are L3 application enabling layer, L2 execution framework layer, L1 chip enabling layer and L0 computing resource layer. The tool chain mainly provides auxiliary capabilities such as engineering management, compilation and debugging, matrix, log and profiling.
-
5.
Matrix, DVPP module, tensor boost engine, framework, runtime and task scheduler.
-
6.
Ascend 310 is used for inference, while Ascend 910 is mainly for training.
-
7.
The inference products are mainly composed of Atlas 200 AI acceleration module, Atlas 200 DK, Atlas 300 inference card, Atlas 500 intelligent edge station and Atlas 800 AI inference server.
The training products mainly include Atlas 300 AI training card, Atlas 800 AI server and Atlas 900 AI cluster.
-
8.
See Sect. 6.4.
Chapter 7
-
1.
The HUAWEI HiAI platform builds a three-tier ecosystem of "Service, Engine and Foundation": HiAI Foundation, HiAI Engine and HiAI Service. On the service side it supports rich mainstream front-end frameworks. On the engine side it provides rich upper-layer functional business APIs that can run efficiently on mobile devices. On the foundation side it flexibly schedules heterogeneous resources, meeting developers' needs to accelerate neural network model computation and operator computation.
-
2.
HiAI Foundation.
-
3.
HiAI Engine.
-
4.
Android Studio.
-
5.
The integration process of App is as follows.
-
Step 1: Project creation.
① Create an Android Studio project and check the “Include C++ support” option.
② Select C++ 11 in C++ Standard, check the “Exceptions Support (-fexceptions)” option, and check the “Runtime Type Information Support (-frtti)” option.
-
Step 2: JNI compilation.
① Implement the JNI, and write the Android.mk file.
② Write the Application.mk file, and copy the SDK .so libraries into the project.
③ Specify the NDK to compile the C++ files in the build.gradle file.
-
Step 3: Model integration.
① Model pre-processing: Application layer model pre-processing, JNI layer model pre-processing.
② Model inference.
-
Chapter 8
-
1.
Huawei CLOUD EI service family is composed of EI big data, EI basic platform, conversational bot, natural language processing (NLP), speech interaction, speech analysis, image recognition, content review, image search, face recognition, Optical Character Recognition (OCR) and EI agent.
-
2.
EI agent is composed of transportation AI agent, industrial AI agent, park AI agent, network AI agent, auto AI agent, medical AI agent and geographic AI agent.
-
3.
EI basic platform provides such services as ModelArts platform, deep learning, machine learning, HiLens, graph engine service and video access.
-
4.
The functions of ModelArts include data governance, extremely “fast” and “simple” model training, multi-scenario deployment of “end, edge and cloud”, automatic learning, visual workflow and AI market.
-
5.
The product advantages of ModelArts are reflected in four aspects: one-stop, user-friendliness, high performance and flexibility.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits any noncommercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if you modified the licensed material. You do not have permission under this license to share adapted material derived from this chapter or parts of it.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
About this chapter
Cite this chapter
Huawei Technologies Co., Ltd.. (2023). Huawei CLOUD Enterprise Intelligence Application Platform. In: Artificial Intelligence Technology. Springer, Singapore. https://doi.org/10.1007/978-981-19-2879-6_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2878-9
Online ISBN: 978-981-19-2879-6