Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: TensorFlow: A system for large-scale machine learning. In: USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2016)
Agrawala, A.K.: Learning with a probabilistic teacher. IEEE Trans. Inform. Theory 16, 373–379 (1970)
Alfonseca, E., Filippova, K., Delort, J.-Y., Garrido, G.: Pattern learning for relation extraction with a hierarchical topic model. In: Meeting of the Association for Computational Linguistics (ACL) (2012)
Bach, S., Rodriguez, D., Liu, Y., Luo, C., Shao, H., Xia, C., Sen, S., Ratner, A., Hancock, B., Alborzi, H., Kuchhal, R., Ré, C., Malkin, R.: Snorkel DryBell: A case study in deploying weak supervision at industrial scale. arXiv (2019)
Bach, S.H., He, B., Ratner, A., Ré, C.: Learning the structure of generative models without labeled data. In: International Conference on Machine Learning (ICML) (2017)
Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Workshop on Computational Learning Theory (COLT) (1998)
Bunescu, R.C., Mooney, R.J.: Learning to extract relations from the Web using minimal supervision. In: Meeting of the Association for Computational Linguistics (ACL) (2007)
Caspi, R., Billington, R., Ferrer, L., Foerster, H., Fulcher, C.A., Keseler, I.M., Kothari, A., Krummenacker, M., Latendresse, M., Mueller, L.A., Ong, Q., Paley, S., Subhraveti, P., Weaver, D.S., Karp, P.D.: The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44(D1), D471–D480 (2016)
Chapelle, O., Schölkopf, B., Zien, A. (eds.): Semi-Supervised Learning. Adaptive Computation and Machine Learning, MIT Press (2009)
Corney, D., Albakour, D., Martinez, M., Moussa, S.: What do a million news articles look like? In: Workshop on Recent Trends in News Information Retrieval (2016)
Dalvi, N., Dasgupta, A., Kumar, R., Rastogi, V.: Aggregating crowdsourced binary ratings. In: International World Wide Web Conference (WWW) (2013)
Davis, A.P. et al.: A CTD–Pfizer collaboration: Manual curation of 88,000 scientific articles text mined for drug–disease and drug–phenotype interactions. Database (2013)
Dawid, A.P., Skene, A.M.: Maximum likelihood estimation of observer error-rates using the EM algorithm. J. R. Stat. Soc. C 28(1), 20–28 (1979)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
Dong, X.L., Srivastava, D.: Big Data Integration. Synthesis Lectures on Data Management. Morgan and Claypool Publishers (2015)
Eadicicco, L.: Baidu’s Andrew Ng on the future of artificial intelligence. Time [Online; posted 11-January-2017] (2017)
Fries, J.A., Varma, P., Chen, V.S., Xiao, K., Tejeda, H., Saha, P., Dunnmon, J., Chubb, H., Maskatia, S., Fiterau, M., Delp, S., Ashley, E., Ré, C., Priest, J.: Weakly supervised classification of rare aortic valve malformations using unlabeled cardiac MRI sequences. bioRxiv (2018)
Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5), 602–610 (2005)
Gupta, S., Manning, C.D.: Improved pattern learning for bootstrapped entity extraction. In: CoNLL (2014)
Hancock, B., Varma, P., Wang, S., Bringmann, M., Liang, P., Ré, C.: Training classifiers with natural language explanations (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR, arXiv:1512.03385 (2015)
Hearst, M.A.: Automatic acquisition of hyponyms from large text corpora. In: Meeting of the Association for Computational Linguistics (ACL) (1992)
Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1771–1800 (2002)
Hoffmann, R., Zhang, C., Ling, X., Zettlemoyer, L., Weld, D.S.: Knowledge-based weak supervision for information extraction of overlapping relations. In: Meeting of the Association for Computational Linguistics (ACL) (2011)
Joglekar, M., Garcia-Molina, H., Parameswaran, A.: Comprehensive and reliable crowd assessment algorithms. In: International Conference on Data Engineering (ICDE) (2015)
Khandwala, N., Ratner, A., Dunnmon, J., Goldman, R., Lungren, M., Rubin, D., Ré, C.: Cross-modal data programming for medical images. NIPS ML4H Workshop (2017)
Kingma, D., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Ku, J.P., Hicks, J.L., Hastie, T., Leskovec, J., Ré, C., Delp, S.L.: The Mobilize center: an NIH big data to knowledge center to advance human movement research and improve mobility. J. Am. Med. Inf. Assoc. 22(6), 1120–1125 (2015)
Kuleshov, V., Hancock, B., Ratner, A., Ré, C., Batzoglou, S., Snyder, M.: A machine-compiled database of genome-wide association studies. NIPS ML4H Workshop (2016)
Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia–A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web Journal (2014)
Li, H., Yu, B., Zhou, D.: Error rate analysis of labeling by crowdsourcing. In: ICML Workshop: Machine Learning Meets Crowdsourcing. Atlanta, Georgia, USA (2013)
Li, Y., Gao, J., Meng, C., Li, Q., Su, L., Zhao, B., Fan, W., Han, J.: A survey on truth discovery. SIGKDD Explor. Newsl. 17(2), 1–6 (2015)
Liang, P., Jordan, M.I., Klein, D.: Learning from measurements in exponential families. In: International Conference on Machine Learning (ICML) (2009)
Mann, G.S., McCallum, A.: Generalized expectation criteria for semi-supervised learning with weakly labeled data. J. Mach. Learn. Res. 11, 955–984 (2010)
Metz, C.: Google’s hand-fed AI now gives answers, not just search results. Wired [Online; posted 29-November-2016] (2016)
Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: Meeting of the Association for Computational Linguistics (ACL) (2009)
Davis, A.P., Grondin, C.J., Johnson, R.J., Sciaky, D., King, B.L., McMorran, R., Wiegers, J., Wiegers, T., Mattingly, C.J.: The comparative toxicogenomics database: update 2017. Nucleic Acids Res. 45, D972–D978 (2016)
Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)
Parisi, F., Strino, F., Nadler, B., Kluger, Y.: Ranking and combining multiple predictors without labeled data. Proc. Natl. Acad. Sci. USA 111(4), 1253–1258 (2014)
Pochampally, R., Das Sarma, A., Dong, X.L., Meliou, A., Srivastava, D.: Fusing data with correlations. In: ACM SIGMOD International Conference on Management of Data (SIGMOD) (2014)
Quinn, A.J., Bederson, B.B.: Human computation: A survey and taxonomy of a growing field. In: ACM SIGCHI Conference on Human Factors in Computing Systems (CHI) (2011)
Ratner, A., Bach, S.H., Ehrenberg, H.R., Fries, J.A., Wu, S., Ré, C.: Snorkel: Rapid training data creation with weak supervision. CoRR, arXiv:1711.10160 (2017)
Ratner, A., De Sa, C., Wu, S., Selsam, D., Ré, C.: Data programming: Creating large training sets, quickly. In: Neural Information Processing Systems (NIPS) (2016)
Ratner, A., Hancock, B., Dunnmon, J., Goldman, R., Ré, C.: Snorkel MeTaL: Weak supervision for multi-task learning. In: Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning, p. 3. ACM (2018)
Ratner, A., Hancock, B., Dunnmon, J., Sala, F., Pandey, S., Ré, C.: Training complex models with multi-task weak supervision. AAAI (2019)
Ratner, A., Hancock, B., Ré, C.: The role of massively multi-task and weak supervision in software 2.0. In: Conference on Innovative Data Systems Research (2019)
Rekatsinas, T., Chu, X., Ilyas, I.F., Ré, C.: HoloClean: Holistic data repairs with probabilistic inference. PVLDB 10(11), 1190–1201 (2017)
Rekatsinas, T., Joglekar, M., Garcia-Molina, H., Parameswaran, A., Ré, C.: SLiMFast: Guaranteed results for data fusion and source reliability. In: ACM SIGMOD International Conference on Management of Data (SIGMOD) (2017)
Riedel, S., Yao, L., McCallum, A.: Modeling relations and their mentions without labeled text. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD) (2010)
Roth, B., Klakow, D.: Combining generative and discriminative model scores for distant supervision. In: Conference on Empirical Methods in Natural Language Processing (EMNLP) (2013)
Satopaa, V., Albrecht, J., Irwin, D., Raghavan, B.: Finding a “kneedle” in a haystack: Detecting knee points in system behavior. In: International Conference on Distributed Computing Systems Workshops (2011)
Scudder, H.J.: Probability of error of some adaptive pattern-recognition machines. IEEE Trans. Inform. Theory 11, 363–371 (1965)
Settles, B.: Active learning literature survey. Technical report, University of Wisconsin-Madison Department of Computer Sciences (2009)
Settles, B.: Active Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan and Claypool Publishers (2012)
Stewart, R., Ermon, S.: Label-free supervision of neural networks with physics and other domain knowledge. In: AAAI Conference on Artificial Intelligence (AAAI) (2017)
Sun, C., Shrivastava, A., Singh, S., Gupta, A.: Revisiting unreasonable effectiveness of data in deep learning era. arXiv preprint arXiv:1707.02968 (2017)
Takamatsu, S., Sato, I., Nakagawa, H.: Reducing wrong labels in distant supervision for relation extraction. In: Meeting of the Association for Computational Linguistics (ACL) (2012)
Varma, P., He, B., Bajaj, P., Khandwala, N., Banerjee, I., Rubin, D., Ré, C.: Inferring generative model structure with static analysis. In: Proceedings of NIPS (2017)
Varma, P., Ré, C.: Snuba: Automating weak supervision to label training data. In: Proceedings of VLDB (2019)
Wei, C.-H., Peng, Y., Leaman, R., Davis, A.P., Mattingly, C.J., Li, J., Wiegers, T., Lu, Z.: Overview of the BioCreative V chemical disease relation (CDR) task. In: BioCreative Challenge Evaluation Workshop (2015)
Worldwide semiannual cognitive/artificial intelligence systems spending guide. Technical report, International Data Corporation (2017)
Wu, S., Hsiao, L., Cheng, X., Hancock, B., Rekatsinas, T., Levis, P., Ré, C.: Fonduer: Knowledge base construction from richly formatted data. In: Proceedings of the 2018 International Conference on Management of Data, pp. 1301–1316. ACM (2018)
Yuen, M.-C., King, I., Leung, K.-S.: A survey of crowdsourcing systems. In: Privacy, Security, Risk and Trust (PASSAT) and International Conference on Social Computing (SocialCom) (2011)
Zaidan, O.F., Eisner, J.: Modeling annotators: A generative approach to learning from annotator rationales. In: Conference on Empirical Methods in Natural Language Processing (EMNLP) (2008)
Zhang, C., Ré, C., Cafarella, M., De Sa, C., Ratner, A., Shin, J., Wang, F., Wu, S.: DeepDive: Declarative knowledge base construction. Commun. ACM 60(5), 93–102 (2017)
Zhang, Y., Chen, X., Zhou, D., Jordan, M.I.: Spectral methods meet EM: a provably optimal algorithm for crowdsourcing. J. Mach. Learn. Res. 17, 1–44 (2016)
Zhao, B., Rubinstein, B.I., Gemmell, J., Han, J.: A Bayesian approach to discovering truth from conflicting sources for data integration. PVLDB 5(6), 550–561 (2012)