Talk Detail
Posted: comet.paws on  Mar 31 11:52:16 PM
Title: Accelerating Machine Learning with Training Data Management  
Alex Jason Ratner
Sponsor: Carnegie Mellon University  >  Tepper School of Business
Date: Apr 19, 2019 12:00 PM - 1:00 PM
URL: Jason
Location: TEP 4243
Groups Posted: Big Data, Intelligent Systems Program, Machine Learning Group
Detail: Abstract of Accelerating Machine Learning with Training Data Management

One of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today's models learn from. In this talk, I will describe my work on data management systems that let users specify training datasets in higher-level, faster, and more flexible ways, leading to applications that can be built in hours or days, rather than months or years.

I will start by describing Snorkel, an open-source system for programmatically labeling training data that has been deployed by major technology companies, academic labs, and government agencies. In Snorkel, rather than hand-labeling training data, users write labeling functions which label data using heuristic strategies such as pattern matching, distant supervision, and other models. These labeling functions can have noisy, conflicting, and correlated outputs, which Snorkel models and combines into clean training labels. We solve this novel data cleaning problem without any ground truth labels using a matrix-completion style approach, which we show has strong consistency guarantees, and demonstrate that Snorkel leads to impactful gains in applications ranging from knowledge base construction to medical imaging.
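
To make the idea of labeling functions concrete, here is a minimal sketch in plain Python. It does not use Snorkel's actual API: every function name, the toy knowledge base, and the example sentences are hypothetical. The combination step is a simple majority vote, used here only as a stand-in for the matrix-completion style approach described in the abstract.

```python
import re
from collections import Counter

# Label values; ABSTAIN means a labeling function has no opinion.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical toy knowledge base used for distant supervision.
KNOWN_PAIRS = {("smoking", "cancer")}


def lf_contains_causes(text):
    """Pattern matching: the word 'causes' suggests a causal relation."""
    return POSITIVE if re.search(r"\bcauses\b", text, re.IGNORECASE) else ABSTAIN


def lf_negation(text):
    """Pattern matching: a negation cue suggests no causal relation."""
    return NEGATIVE if re.search(r"\b(not|no evidence)\b", text, re.IGNORECASE) else ABSTAIN


def lf_known_pair(text):
    """Distant supervision: both terms of a known pair appear in the text."""
    lowered = text.lower()
    hit = any(a in lowered and b in lowered for a, b in KNOWN_PAIRS)
    return POSITIVE if hit else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_causes, lf_negation, lf_known_pair]


def apply_lfs(texts):
    """Build the label matrix: one row per text, one column per function."""
    return [[lf(t) for lf in LABELING_FUNCTIONS] for t in texts]


def majority_vote(votes):
    """Combine one row of votes; abstain on ties or when nobody voted."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    texts = [
        "Smoking causes cancer.",
        "There is no evidence that coffee causes cancer.",
        "The weather was nice.",
    ]
    for text, votes in zip(texts, apply_lfs(texts)):
        print(text, votes, majority_vote(votes))
```

The second sentence shows why a smarter combiner is needed: two functions conflict, and a majority vote can only abstain. A model that estimates how accurate and how correlated the functions are, as the abstract describes, could resolve such a conflict instead.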

I'll conclude by giving an overview of how different organizations have used Snorkel, and weak supervision ML techniques more broadly, and how this is changing the way these organizations are structured. I'll focus on an upcoming SIGMOD 2019 Industry Track paper describing our deployments of Snorkel at Google across several applications, using what we call organizational knowledge resources.

Bio: Alex Ratner is a 5th-year Ph.D. candidate advised by Christopher Ré in the Computer Science department at Stanford, where he is supported by a Stanford Bio-X fellowship. His research focuses on applying data management and statistical learning techniques to emerging machine learning workflows, such as creating and managing training data, and applying this to real-world problems in medicine, knowledge base construction, and more. He leads the Snorkel project, which has been deployed at large technology companies, academic labs, and government agencies, and his work has been recognized in VLDB 2018 ("Best Of").