## Abstract

Correctness guarantees are at the core of cyber-physical computing research. While prior research addressed correctness of timing behavior and correctness of program logic, this paper tackles the emerging topic of assessing correctness of input data. This topic is motivated by the desire to crowd-source sensing tasks, an act we henceforth call *social sensing*, in applications with humans in the loop. A key challenge in social sensing is that the reliability of sources is generally unknown, which makes it difficult to assess the correctness of collected observations. To address this challenge, we adopt a cyber-physical approach, where assessment of correctness of individual observations is aided by knowledge of physical constraints on sources and observed variables to compensate for the lack of information on source reliability. We cast the problem as one of maximum likelihood estimation. The goal is to jointly estimate both (i) the latent physical state of the observed environment, and (ii) the inferred reliability of individual sources such that they are maximally consistent with both provenance information (who reported what) and physical constraints. We also derive new analytic bounds that allow social sensing applications to accurately quantify the estimation error of source reliability for given confidence levels. We evaluate the framework through both a real-world social sensing application and extensive simulation studies. The results demonstrate significant gains in the estimation accuracy of the new algorithms and verify the correctness of the analytic bounds we derived.

## Notes

In practice, we can run the algorithm until the change in the estimated parameters between consecutive iterations becomes insignificant.

As stated in our application model, sources never report a variable to be false (e.g., cars never reported the absence of traffic lights).

In principle, there is no incentive for a source to lie more than 50% of the time, since negating their statements would then give a more accurate truth.

CabSense. http://www.cabsense.com.

## Acknowledgments

Research reported in this paper was sponsored by the National Science Foundation under Grant No. IIS-1447795 and the Army Research Laboratory under Cooperative Agreement W911NF-09-2-0053. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

## Appendix

### 1.1 Derivation of the E-step and M-step of OtO EM

Having formulated the new likelihood function to account for the source constraints in the previous subsection, we can now plug it into the Q function defined in Eq. (7) of Expectation Maximization. The E-step can be derived as follows:

where \(p(z_j=1|X_j,\theta ^{(n)})\) represents the conditional probability that the variable \(C_j\) is true given the observation matrix related to the *j*th observed variable and the current estimate of \(\theta \). We represent \(p(z_j=1|X_j,\theta ^{(n)})\) by *Z*(*n*, *j*) since it is only a function of *n* and *j*. *Z*(*n*, *j*) can be further computed as:

Note that, in the E-step, we continue to only consider sources who observe a given variable while computing the likelihood of reports regarding that variable.
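As an illustration only, the following Python sketch shows one way *Z*(*n*, *j*) might be computed when only observing sources contribute to the likelihood. It is not the paper's reference implementation. The function name `e_step`, the array layouts, and the reading of \(a_i\) and \(b_i\) as the probabilities that \(S_i\) reports an observed variable when it is true or false, and of \(d_j\) as the prior probability that \(C_j\) is true, are assumptions made for this example.

```python
import numpy as np

def e_step(SC, SK, a, b, d):
    """Posterior Z[j] = p(z_j = 1 | X_j, theta) under the assumed model.

    SC[i, j] = 1 if source i reported variable j (assumed layout).
    SK[i, j] = 1 if source i had the opportunity to observe variable j.
    a[i], b[i]: assumed report probabilities given j is true / false.
    d[j]: assumed prior probability that variable j is true.
    All of a, b must lie strictly between 0 and 1.
    """
    SC = np.asarray(SC, dtype=float)
    SK = np.asarray(SK, dtype=float)
    a = np.asarray(a, dtype=float)[:, None]  # column vector, broadcast over j
    b = np.asarray(b, dtype=float)[:, None]
    d = np.asarray(d, dtype=float)

    # Log-likelihood of each report (or silence) if the variable is true / false.
    log_true = SC * np.log(a) + (1 - SC) * np.log(1 - a)
    log_false = SC * np.log(b) + (1 - SC) * np.log(1 - b)

    # Only sources that observed variable j contribute (mask with SK).
    A = np.exp((SK * log_true).sum(axis=0))
    B = np.exp((SK * log_false).sum(axis=0))

    # Bayes' rule with prior d[j].
    return A * d / (A * d + B * (1 - d))
```

Under these assumptions, a variable that no source observed keeps its prior, because both likelihood products reduce to 1.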

In the M-step, we set the derivatives \(\frac{\partial Q}{\partial a_i}=0\), \(\frac{\partial Q}{\partial b_i}=0\), \(\frac{\partial Q}{\partial d_j}=0\). This gives us the \(\theta ^*\) (i.e., \(a_1^*,a_2^*,\ldots ,a_M^*\); \(b_1^*, b_2^*,\ldots ,b_M^*\); \(d_1^*,d_2^*,\ldots ,d_N^*\)) that maximizes the \(Q\left( \theta |\theta ^{(n)}\right) \) function in each iteration and is used as the \(\theta ^{(n+1)}\) of the next iteration.

where \(\mathcal {O}_i\) is the set of variables source \(S_i\) observes according to the knowledge matrix *SK* and *Z*(*n*, *j*) is defined in Eq. (23). \(SJ_i\) is the set of variables the source \(S_i\) actually reports in the observation matrix *SC*. We note that, in the computation of \(a_i\) and \(b_i\), the silence of source \(S_i\) regarding some variable \(C_j\) is interpreted differently depending on whether \(S_i\) *observed* it or not. This reflects that the opportunity to observe has been incorporated into the M-Step when the estimation parameters of sources are computed. The resulting OtO EM algorithm is summarized in the subsection below.
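The role of \(\mathcal {O}_i\) and \(SJ_i\) in the update can be illustrated with a short Python sketch. The closed-form ratios used here are an assumption, taken from the usual form of this kind of EM update, since the update equations are not reproduced in this text. The function name `m_step` and the array layouts are also hypothetical.

```python
import numpy as np

def m_step(SC, SK, Z):
    """One assumed M-step update given the posteriors Z[j] = Z(n, j).

    SC[i, j] = 1 if source i reported variable j (so SJ_i = {j : SC[i, j] = 1}).
    SK[i, j] = 1 if source i could observe variable j (O_i = {j : SK[i, j] = 1}).
    """
    SC = np.asarray(SC, dtype=float)
    SK = np.asarray(SK, dtype=float)
    Z = np.asarray(Z, dtype=float)

    # Assumption: reports are only counted for variables the source observed.
    reported = SC * SK
    eps = 1e-12  # guards against division by zero for sources with empty O_i

    # a_i: expected share of true observed variables that source i reported.
    a = (reported @ Z) / np.maximum(SK @ Z, eps)
    # b_i: expected share of false observed variables that source i reported.
    b = (reported @ (1 - Z)) / np.maximum(SK @ (1 - Z), eps)
    # d_j: assumed to be updated to the current posterior of variable j.
    d = Z.copy()
    return a, b, d
```

In this sketch, silence about a variable outside \(\mathcal {O}_i\) changes neither the numerator nor the denominator, whereas silence about an observed variable enlarges only the denominator and therefore lowers the estimate.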

### 1.2 Derivation of E-step and M-step of DV and OtO+DV EM

Given the new likelihood function of the DV EM scheme defined in Eq. (11), the E-step becomes:

where \(p(z_{g_1},\ldots ,z_{g_k}|X_g,\theta ^{(n)})\) represents the conditional joint probability of all variables in independent group *g* (i.e., \(g_1,\ldots ,g_k\)) given the observed data regarding these variables and the current estimation of the parameters. \(p(z_{g_1},\ldots ,z_{g_k}|X_g,\theta ^{(n)})\) can be further computed as follows:

We note that \(p(z_j=1|X_j,\theta ^{(n)})\) (i.e., *Z*(*n*, *j*)), defined as the probability that \(C_j\) is true given the observed data and the current parameter estimates, can be computed as the *marginal distribution* of the joint probability of all variables in the independent variable group *g* that variable \(C_j\) belongs to (i.e., \(p(z_{g_1},\ldots ,z_{g_k}|X_g,\theta ^{(n)})\) with \(j\in c_g\)). We also note that, in the worst case where all *N* variables fall into one independent group, the cost of computing this marginal grows exponentially with *N*. However, as long as the constraints on observed variables are localized, our approach stays scalable, independently of the total number of estimated variables.
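The marginalization described above can be sketched in Python by brute-force enumeration over one group. This is a hedged illustration, not the authors' implementation. The function name `group_marginals` is hypothetical, as are the representation of the constraints as a joint prior table over the group and the assumption that the report likelihood factorizes per variable given the joint assignment.

```python
import itertools
import numpy as np

def group_marginals(log_lik_true, log_lik_false, joint_prior):
    """Marginal posteriors Z for the k variables of one independent group.

    log_lik_true[i] / log_lik_false[i]: log-likelihood of the reports on the
        i-th variable of the group if it is true / false (assumed inputs).
    joint_prior: dict mapping assignments in {0, 1}^k to prior probabilities;
        missing assignments are treated as forbidden by the constraints.
    """
    k = len(log_lik_true)
    weights = {}
    # Enumerate all 2**k joint assignments: exponential in the group size.
    for z in itertools.product((0, 1), repeat=k):
        prior = joint_prior.get(z, 0.0)
        if prior == 0.0:
            continue
        ll = sum(log_lik_true[i] if z[i] else log_lik_false[i] for i in range(k))
        weights[z] = prior * np.exp(ll)

    total = sum(weights.values())
    Z = np.zeros(k)
    # Marginalize: add the normalized weight of every assignment with z_i = 1.
    for z, w in weights.items():
        Z += np.array(z) * (w / total)
    return Z
```

The loop over `itertools.product` visits \(2^k\) assignments for a group of size *k*, which is why the cost stays manageable only while the groups remain small.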

In the M-step, as before, we choose \(\theta ^*\) that maximizes the \(Q\left( \theta |\theta ^{(n)}\right) \) function in each iteration to be the \(\theta ^{(n+1)}\) of the next iteration. Hence:

where \(Z(n,j)=p(z_j=1|X_j,\theta ^{(n)})\). We note that for the estimation parameters, \(a_i\) and \(b_i\), we obtain the same expression as for the case of independent variables. The reason is that sources report variables independently of the form of constraints between these variables.

Next, we combine the two EM extensions (i.e., OtO EM and DV EM) derived so far to obtain a comprehensive EM scheme (OtO+DV EM) that considers constraints on both sources and observed variables. The corresponding E-Step and M-Step are shown below:

## About this article

### Cite this article

Wang, D., Abdelzaher, T., Kaplan, L. *et al.* Reliable social sensing with physical constraints: analytic bounds and performance evaluation. *Real-Time Syst* **51**, 724–762 (2015). https://doi.org/10.1007/s11241-015-9238-8

DOI: https://doi.org/10.1007/s11241-015-9238-8