DevConf.US 2018 is the first annual, free, Red Hat-sponsored technology conference for community project and professional contributors to Free and Open Source technologies, held at Boston University in the historic city of Boston, USA.

When: Friday, August 17 to Sunday, August 19, 2018

Venue: Boston University, George Sherman Union Building
Saturday, August 18 • 4:30pm - 5:05pm
Sketching Data Distributions With T-Digests

Algorithms for sketching distributions from large data sets are a building block of modern data science. Sketching plays a role in diverse applications, including visualization, optimizing data encodings, data synthesis, and imputation. The T-Digest is a highly versatile sketching data structure. It operates on any numeric data, models tricky distribution tails with high fidelity, and, most crucially, works smoothly with map-reduce and other aggregations.

T-Digest is a perfect fit for commodity parallelization: it is single-pass, and intermediate results can be aggregated across partitions. We describe a native Scala implementation of the T-Digest sketching algorithm and demonstrate its use for visualization, quantile estimation, and data synthesis.
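
To make the single-pass and mergeable properties concrete, here is a minimal sketch in Python. It is not the Scala implementation presented in the talk, and it is a simplified, t-digest-inspired structure rather than a faithful T-Digest: the class name `CentroidSketch`, the `compression` and `buffer_size` parameters, and the cluster size bound `4 * N * q * (1 - q) / compression` are assumptions chosen for illustration.

```python
import random
from functools import reduce


class CentroidSketch:
    """Simplified, t-digest-inspired sketch (illustrative, not a real T-Digest).

    The distribution is summarized by sorted (mean, weight) clusters.
    Clusters near the tails are kept small so extreme quantiles stay accurate.
    """

    def __init__(self, compression=100, buffer_size=500):
        self.compression = compression  # hypothetical accuracy/size knob
        self.buffer_size = buffer_size  # points held before compressing
        self.centroids = []             # sorted list of (mean, weight)
        self._buffer = []               # unmerged (value, weight) points

    def add(self, x, w=1.0):
        """Single-pass update: buffer the point, compress occasionally."""
        self._buffer.append((float(x), float(w)))
        if len(self._buffer) >= self.buffer_size:
            self._compress()

    def merge(self, other):
        """Combine two sketches, e.g. results from two data partitions."""
        out = CentroidSketch(self.compression, self.buffer_size)
        out._buffer = (self.centroids + self._buffer
                       + other.centroids + other._buffer)
        out._compress()
        return out

    def _compress(self):
        pts = sorted(self.centroids + self._buffer)
        self._buffer = []
        if not pts:
            return
        total = sum(w for _, w in pts)
        merged = []
        cur_mean, cur_w = pts[0]
        seen = 0.0  # total weight of clusters already emitted
        for mean, w in pts[1:]:
            new_w = cur_w + w
            q = (seen + new_w / 2) / total  # quantile at the cluster center
            limit = 4 * total * q * (1 - q) / self.compression
            if new_w <= limit:
                # Absorb the point: weighted running mean.
                cur_mean += (mean - cur_mean) * w / new_w
                cur_w = new_w
            else:
                merged.append((cur_mean, cur_w))
                seen += cur_w
                cur_mean, cur_w = mean, w
        merged.append((cur_mean, cur_w))
        self.centroids = merged

    def quantile(self, q):
        """Estimate the q-quantile by interpolating between cluster centers."""
        self._compress()
        if not self.centroids:
            raise ValueError("empty sketch")
        total = sum(w for _, w in self.centroids)
        target = q * total
        cum = 0.0
        prev_mid = prev_mean = None
        for mean, w in self.centroids:
            mid = cum + w / 2
            if target <= mid:
                if prev_mean is None:
                    return mean
                t = (target - prev_mid) / (mid - prev_mid)
                return prev_mean + t * (mean - prev_mean)
            prev_mid, prev_mean = mid, mean
            cum += w
        return self.centroids[-1][0]


# Usage: sketch four partitions independently, then reduce them to one sketch.
random.seed(0)
data = [random.random() for _ in range(20000)]
partitions = [data[i::4] for i in range(4)]


def sketch_partition(values):
    s = CentroidSketch()
    for v in values:
        s.add(v)
    return s


merged = reduce(lambda a, b: a.merge(b), map(sketch_partition, partitions))
median_estimate = merged.quantile(0.5)
p99_estimate = merged.quantile(0.99)
```

The `merge` step is what makes this shape of sketch suit map-reduce style aggregation: each partition is summarized on its own in one pass, and the per-partition summaries are then combined pairwise without revisiting the raw data.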

Erik Erlandson

Principal Software Engineer, Red Hat

Saturday August 18, 2018 4:30pm - 5:05pm EDT
Metcalf Small, Boston University, George Sherman Union Building