Abstract
Internet sites such as social networks are a source of information for event detection, with particular reference to road traffic activity, accidents, and earthquake detection systems. Because of the rapid growth of the last 20 years, traffic congestion has become frequent in cities around the world. The increase in vehicles has caused a greater number of traffic events and, as a result, a shortage of shared resources. We present a methodology for the acquisition, processing, and classification of public tweets that applies Natural Language Processing (NLP) techniques and the Support Vector Machine (SVM) algorithm for text classification, using social network data to detect incidents. This approach can detect traffic-related tweets with an accuracy of 88.27%. In this paper, we also focus on a real-time monitoring system that detects traffic by analyzing the Twitter stream and classifying Twitter posts. This system can also distinguish whether traffic is caused by an external event or not, by solving a multiclass classification problem with an accuracy of 88.89%.
1 Introduction
Nowadays social media has become an important part of people's daily routines. It has also become a major source of real-time content. Timely and accurate information is very important in transportation systems. Social media can be used to analyze and detect traffic-related incidents such as congestion, accidents, natural disasters, and other transport-related events [1].
Among social network platforms, Twitter is a commonly used microblogging site. Organizations use Twitter to communicate with customers, since it provides a cost-effective and reliable method of sharing information [4]. Twitter has more than 200 million monthly active users [2], who currently post about 340,000,000 tweets per day. People mostly share traffic-related information through status update messages (SUMs) [5]. These messages carry information about the current traffic situation, often written while their authors are driving, so Intelligent Transportation Systems (ITSs) can use them to detect traffic-related events [14].
In this paper, we review two main techniques that use Twitter to handle traffic incidents. The first technique is an ITS [20] that combines machine learning algorithms and text mining techniques for real-time detection of incidents and events [3] from the Twitter stream [10]. The second technique is a methodology for retrieving, processing, and classifying public tweets that combines popular Natural Language Processing (NLP) techniques with a Support Vector Machine (SVM) algorithm for text classification. The paper is structured as follows. Section 2 describes the problem addressed by these systems, and Sect. 3 details both techniques and their implementation. Section 4 compares the results of the experiments. Lastly, Sect. 5 presents the conclusions.
2 Problem Statement
People report many different traffic-related problems, such as traffic jams, no-parking zones, heavy vehicles, and U-turns [1]. From one location to another, the causes of these problems may stay the same or vary. Moreover, a single tweet may mention several problems related to a particular scenario. The detection system must handle these cases in order to predict the traffic situation [18].
3 Methods
Two main techniques use Twitter to handle traffic incidents:
- Detection Using Data of Twitter
- Real-Time Detection Using Twitter Data
3.1 Detection Using Data of Twitter
This technique retrieves and classifies traffic-related tweets. An API is selected for real-time streaming of data, tweets are filtered by road names and keywords, and special characters are removed using NLP [7]. Lastly, an SVM algorithm classifies the tweets as traffic or non-traffic. The following paragraphs describe each step in detail.
Twitter Data Acquisition.
Twitter provides free access to publicly shared posts through two kinds of tools. With the REST API, users can query by keyword or location and retrieve popular tweets, but queries are limited to 350 every 15 min. We chose the Streaming API for real-time streaming of data [13]. The collected tweets are then filtered by road names and traffic-related keywords (for example, M6 and accidents). This filtering is done with regular expressions [6].
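The keyword and road-name filter described above can be sketched with Python's `re` module. This is only an illustration under assumptions: the source names just "M6" and "accidents" as examples, so the other terms in `ROAD_NAMES` and `KEYWORDS`, and the helper `is_candidate`, are hypothetical. Collecting tweets from the Streaming API is not shown.

```python
import re

# Hypothetical filter terms; the source only gives "M6" and "accidents" as examples.
ROAD_NAMES = ["M6", "M5", "A38"]
KEYWORDS = ["accident", "accidents", "congestion", "traffic", "queue"]

# Word boundaries (\b) stop "M6" from matching inside "M60".
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in ROAD_NAMES + KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def is_candidate(tweet_text):
    """Return True if the tweet mentions a listed road name or traffic keyword."""
    return PATTERN.search(tweet_text) is not None

tweets = [
    "Accident on the M6 northbound",
    "Stuck on the M60 again",
    "Great coffee this morning",
]
candidates = [t for t in tweets if is_candidate(t)]
```

Only tweets that pass this filter would be handed to the pre-processing step.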
Pre-processing.
People write social media posts, and tweets in particular, in very informal language that includes emoticons, special characters, hashtags, and so on. Cleaning the text with text mining techniques is therefore an important step before sending it to the classifier. The following steps are applied to the datasets to clean the text:
- (a) Tokenization: The text is broken down into tokens [7]. In this process non-alphanumeric characters such as emoticons, hashtags, and punctuation are removed, so the result is a set of words [19]. This task is carried out in Python.
- (b) Stop word removal: This step eliminates words that do not help characterize a text, such as conjunctions, prepositions, and articles. The full list of English stop words is obtained from the Natural Language Toolkit (NLTK), a popular Python library.
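The two cleaning steps above can be sketched as follows. To keep the example free of downloads, it uses a small hand-written stop word set, which is an assumption made here; the source takes the full English list from NLTK. The function names are illustrative.

```python
import re

# Small illustrative stop word set (an assumption); the source uses NLTK's English list.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "on", "in", "at",
              "of", "to", "is", "are", "near"}

def tokenize(text):
    """Lower-case the text and keep only runs of letters and digits,
    which drops emoticons, hash signs and punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word set."""
    return [tok for tok in tokens if tok not in STOP_WORDS]

def preprocess(text):
    """Tokenize a tweet and remove its stop words."""
    return remove_stop_words(tokenize(text))

tokens = preprocess("Accident on the #M6 near J10 :( !!")
```

The resulting token list is what the classifier receives in the next step.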
Classification.
The last step of detection classifies the pre-processed tweets as traffic or non-traffic. Many machine learning algorithms, such as SVM, naïve Bayes, and neural networks, can be used for classification; here it is implemented using SVM with the Scikit-learn library [9, 13].
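The source performs this step with the SVM implementation in Scikit-learn, without stating the estimator or its parameters. As a dependency-free illustration of the idea, and not the authors' implementation, the sketch below trains a small linear SVM on bag-of-words counts by subgradient descent on the regularised hinge loss. The toy tweets, learning rate, regularisation strength, and epoch count are all assumptions.

```python
from collections import Counter

def featurize(tokens, vocab):
    """Turn a token list into a bag-of-words count vector over vocab."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocab]

def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss; labels are +1/-1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularisation step: shrink the weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Sample is misclassified or inside the margin: move towards it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    """Return +1 (traffic) or -1 (non-traffic)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Toy, hand-labelled, already pre-processed tweets; purely illustrative.
TRAIN = [
    (["accident", "m6", "queue"], 1),
    (["traffic", "jam", "m6"], 1),
    (["road", "closed", "accident"], 1),
    (["lovely", "coffee", "morning"], -1),
    (["watching", "football", "tonight"], -1),
    (["new", "phone", "today"], -1),
]
VOCAB = sorted({tok for tokens, _ in TRAIN for tok in tokens})
X = [featurize(tokens, VOCAB) for tokens, _ in TRAIN]
y = [label for _, label in TRAIN]
w, b = train_linear_svm(X, y)
```

Words outside the training vocabulary are ignored by `featurize`, so a new tweet is scored only on the terms the model has seen.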
Implementation.
We describe the datasets used for the experiments. A total of 3,956,871 tweets were collected using the Twitter Streaming API from March 1st 2017 to May 31st 2017. These tweets were labeled and then divided into training and testing datasets.
- (a) Training Data Set: The filtered tweets are used to train the algorithm and are labeled as traffic (Good) or non-traffic (Bad). The set contains 870 or 871 traffic-related tweets and 870 or 871 non-traffic tweets. A 10-fold cross-validation is then performed on this set [6].
- (b) Test Data Set: The remaining tweets form the testing dataset, which contains 289 or 290 traffic-related tweets and 289 or 290 non-traffic tweets and is used to evaluate the model fitted on the training data [6].
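The 10-fold cross-validation mentioned for the training set can be sketched as an index split. This is a generic illustration, not the procedure of [6]: the helper name `k_fold_indices` is hypothetical, and the data are assumed to have been shuffled beforehand.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) for each of k folds.

    Fold sizes differ by at most one; samples are taken in order, so the
    data should be shuffled before calling this function.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size
```

Each fold serves once as the validation part while the other nine are used for training, and the per-fold scores are averaged.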
3.2 Real-Time Detection Using Twitter Data
The proposed system is based on machine learning algorithms and text mining for real-time detection of traffic events through Twitter stream analysis. The system is event-driven and based on a service-oriented architecture (SOA). It performs multiclass classification to distinguish traffic from non-traffic tweets [10, 14].
Pre-Processing and SUMs Fetching.
This module of the proposed system fetches and pre-processes SUMs. It retrieves raw posts from Twitter based on multiple search criteria, such as geographic coordinates and keywords appearing in the text of the tweet [5]. Once the SUMs matching the search criteria have been fetched, they are pre-processed: a regular expression filter is applied to the text of each raw tweet to remove the additional information associated with it [8, 10, 12].
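As a rough sketch of this module, the code below checks a tweet's coordinates against a bounding box and strips common extras from the raw text with regular expressions. The source does not list which pieces of additional information are removed, so the choice of links, retweet markers, mentions, and hash signs is an assumption, as are the function names and the example box.

```python
import re

def in_bounding_box(lat, lon, box):
    """box is (south, west, north, east) in decimal degrees."""
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east

def clean_text(raw):
    """Strip links, retweet markers, mentions and hash signs from a raw tweet."""
    text = re.sub(r"https?://\S+", " ", raw)   # links
    text = re.sub(r"\bRT\b", " ", text)        # retweet marker
    text = re.sub(r"@\w+:?", " ", text)        # user mentions
    text = text.replace("#", " ")              # keep the hashtag word, drop the sign
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace

# Hypothetical search area, for illustration only.
SEARCH_BOX = (52.3, -2.1, 52.7, -1.6)
cleaned = clean_text("RT @user: Accident on #M6 http://t.co/abc")
```

A SUM would be kept when its coordinates fall inside `SEARCH_BOX` or its text matches the search keywords, and `clean_text` would then be applied before the text mining steps.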
Elaboration of SUMs.
This module, named Elaboration of SUMs, applies text mining techniques to the SUMs in preparation for classification [12]. The text mining steps implemented in this module are described below [8, 10, 14].
- (a) Tokenization is a text mining process that transforms a stream of characters into a stream of processing units called tokens. It removes all punctuation marks and splits every SUM into tokens that correspond to words, so each SUM is represented as a sequence of words [8].
- (b) Stop-word filtering eliminates words that carry little information for text analysis, such as articles, conjunctions, prepositions, and pronouns. Other stop words are those that appear very often in the sentences of the language considered or in the domain being analyzed, and can therefore be regarded as noise [8].
- (c) Stemming reduces each token (word) to its stem or root form by removing its suffix. The aim of this step is to group words that share the same theme and have closely related semantics [10, 14].
- (d) Stem filtering reduces the number of stems per SUM. Each SUM is filtered by removing the stems that do not belong to the set of relevant stems [10, 14].
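Steps (c) and (d) can be illustrated with a deliberately crude suffix stripper followed by a filter on a set of relevant stems. Both are assumptions made for the example: the source does not name the stemming algorithm, and `SUFFIXES` and `RELEVANT_STEMS` are hypothetical. A real system would use an established stemmer.

```python
# Crude, illustrative suffix list; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical set of relevant stems; the source does not list the real one.
RELEVANT_STEMS = {"accident", "queu", "traffic", "clos", "jam"}

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def stem_filter(tokens):
    """Stem every token and keep only the stems in the relevant set."""
    return [s for s in (stem(t) for t in tokens) if s in RELEVANT_STEMS]

stems = stem_filter(["accidents", "queuing", "lovely", "closed"])
```

After this step each SUM is reduced to a short list of relevant stems, which is the representation passed to the classifier.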
Classification of SUMs.
The third module of the proposed system classifies each SUM and assigns it a class label related to traffic events [5]. The classification results can then be used by real-time traffic monitoring systems [14]: the system continuously monitors a specific area and reports the presence of a traffic event according to a set of rules defined by the system administrator [8, 12] (Fig. 1).
Implementation.
Three classes are used to classify the SUMs posted by users: traffic-related, non-traffic, and traffic due to an external event. Classification is performed with a Naive Bayes (NB) classifier. The dataset with the first two classes, traffic-related and non-traffic, is called 2Dataset [5], and the dataset with all three classes, traffic-related, non-traffic, and traffic due to an external event, is called 3Dataset. Here the SUMs are classified by applying the NB classifier, SVM, and text mining techniques [10] (Fig. 2).
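A three-class Naive Bayes classifier of the kind mentioned above can be sketched in plain Python. This is a minimal multinomial model with Laplace smoothing, written under assumptions: the class label strings, the toy SUMs, and the function names are invented for illustration and are not taken from the cited systems.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (tokens, label). Returns the counts the model needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify_nb(model, tokens):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, hand-labelled SUMs for the three classes; purely illustrative.
SAMPLES = [
    (["queue", "m6", "accident"], "traffic"),
    (["jam", "ring", "road"], "traffic"),
    (["match", "stadium", "queue"], "traffic_external"),
    (["concert", "road", "closed"], "traffic_external"),
    (["coffee", "morning", "sunny"], "non_traffic"),
    (["new", "phone", "today"], "non_traffic"),
]
model = train_nb(SAMPLES)
```

Restricting `SAMPLES` to the first and last label gives the two-class setting, while all three labels correspond to the three-class setting described above.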
4 Expected Results
The Detection Using Data technique works with a geographical filter and does not operate in real time. Its processing involves two steps, tokenization and stop word removal [6]. The real-time detection technique detects traffic events in real time. Its processing involves tokenization, stop word removal, stemming, and stem filtering [14] (Table 1).
With SVM as the classification model, the binary classification problem of traffic versus non-traffic tweets attained an accuracy of 95.75% [5]. A further important problem is multiclass classification, which distinguishes whether traffic is caused by an external event or not; it attained an accuracy of 88.89% [14]. The other technique, which combines popular NLP techniques with SVM for text classification, detects traffic-related tweets with an accuracy of 88.28% [6] (Tables 2 and 3).
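The source reports accuracy values without stating how they are computed. Assuming the standard definition is meant, accuracy is the share of tweets that receive the correct label. The symbols TP, TN, FP, and FN (true positives, true negatives, false positives, false negatives) are introduced here for the two-class case and do not appear in the source.

```latex
\[
\text{accuracy} = \frac{\text{number of correctly classified tweets}}{\text{total number of classified tweets}},
\qquad
\text{accuracy}_{\text{2-class}} = \frac{TP + TN}{TP + TN + FP + FN}
\]
```

The first form applies unchanged to the three-class problem, where a tweet counts as correct only if it is assigned to its own class.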
5 Conclusion
This paper presents the details of two detection methodologies for retrieving, processing, and classifying public tweets, which use popular NLP techniques in combination with the SVM algorithm for text classification. We reviewed a framework that uses data from Twitter to manage incident detection in transport networks. The detection system handles the reported problems to predict the traffic situation, and further uses an appropriate algorithm to identify the problems mentioned in a tweet and their causes.
References
Lv, Y., Chen, Y., Zhang, X., Duan, Y., Li, N.: Social media based transportation research: the state of the work and the networking. IEEE/CAA J. Autom. Sinica 4(1), 19–26 (2017)
Zhang, S., Tang, J., Wang, H., Wang, Y.: Enhancing traffic incident detection by using spatial point pattern analysis on social media. Transp. Res. Rec. J. Transp. Res. Board, September 2015
Kulkarni, R., Dhanawade, S., Raut, S., Lavhkarer, D.S.: Twitter stream analysis for traffic detection in real time. Int. J. Adv. Res. Ideas Innov. Technol. 2(5). ISSN: 2454-132X
Cottrill, C., Gault, P., Yeboah, G., Nelson, J.D., Anable, J., Budd, T.: Tweeting Transit: An examination of social media strategies for transport information management during a large event. Transp. Res. Part C 77, 421–432 (2017)
Hemalatha, K., Narasimha, V.: Real-time detection of traffic from Twitter stream analysis. ijatir 08(20), November 2016. ISSN 2348–2370
Salas, A., Georgakis, P., Petalas, Y.: Incident detection using data from social media. In: 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC) (2017)
Sawant, K., Pawar, S., Jadhav, P., Vidhate, S., Bule, N., Pati, S.: Traffic Detection from Real Time Twitter Stream Analysis and Navigation System. IJESC 7(5) (2017)
Sathyanandan, S., Sreedharan, D.: Traffic detection from user’s status update messages in twitter. Int. Res. J. Eng. Technol. (IRJET) 03(10), October 2016. e-ISSN: 2395-0056
Panchal, S., Apare, R.S.: Real time traffic detection using twitter tweets analysis. Int. J. Eng. Trends Technol. (IJETT) 47(8), May 2017
Kumari, S., Khan, F., Sultan, S., Khandge, R.: Real-time detection of traffic from Twitter stream analysis. Int. Res. J. Eng. Technol. (IRJET) 03(04) (2016). e-ISSN: 2395 -0056
Semwal, D., Patil, S., Galhotra, S., Arora, A., Unny, N.: STAR: real-time spatio-temporal analysis and prediction of traffic insights using social media. In: CODS-IKDD 2015, 20 March 2015, Bangalore, India (2015)
Bhosale, S., Kokate, S.: Traffic detection using tweets on Twitter social network. Int. J. Sci. Res. (IJSR) 4(12), December 2015. ISSN (Online): 2319-7064
(Sean) Qian, Z.: Real-time Incident Detection Using Social Media Data. Commonwealth of Pennsylvania Department of Transportation, 9 May 2016
D’Andrea, E., Ducange, P., Lazzerini, B., Marcelloni, F.: Real-time detection of traffic from Twitter stream analysis. IEEE Trans. Intell. Transp. Syst. (2015). ISSN: 1524-9050
Revathi, S., Sumithra, A., Hebziba, S., Rani, J., Vanitha, M.: Certain analysis on traffic dataset based on data mining algorithms. Int. Res. J. Eng. Technol. (IRJET) 04(12), December 2017. e-ISSN: 2395-0056
Minh, H.D.: Detection of Traffic Events from Finnish Social Media Data. University of Tampere School of Information Sciences Computer Science/Software Development, November 2016
Pathania, D., Karlapalem, K.: Social network driven traffic decongestion using near time forecasting. International Foundation for Autonomous Agents and Multiagent Systems (2015)
Elsafoury, F.A.: Monitoring Urban Traffic Status Using Twitter Messages. Faculty of GEO information, February 2013
Mulinge, M.J.: Visualizing Nairobi traffic from social media data. Degree of a Master of Science in Computer Science, July 2016
Singh, B., Gupta, A.: Recent trends in intelligent transportation systems: a review. J. Transp. Lit. 9(2), 30–34 (2015)
© 2019 Springer Nature Switzerland AG
Afzaal, M. et al. (2019). Real Time Traffic Incident Detection by Using Twitter Stream Analysis. In: Ahram, T., Karwowski, W., Taiar, R. (eds) Human Systems Engineering and Design. IHSED 2018. Advances in Intelligent Systems and Computing, vol 876. Springer, Cham. https://doi.org/10.1007/978-3-030-02053-8_95
DOI: https://doi.org/10.1007/978-3-030-02053-8_95
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02052-1
Online ISBN: 978-3-030-02053-8
eBook Packages: Intelligent Technologies and Robotics (R0)