Presenter: Neal McBurnett
Independent Consultant in Data Science, Election Integrity, etc.

nealmcb@gmail.com


Slides.  

Apache Spark is a modern open source cluster computing platform. It is helping data scientists analyze and explore large datasets more effectively than ever before, in terms of both software development productivity and efficient use of hardware, and it scales from laptops to on-premises clusters to on-demand cloud computing.

Come see examples of Spark at work on scientific datasets, and learn how the largest open source project in data processing can help unify a variety of tasks, including ETL, machine learning, streaming data and SQL queries, using Python, Scala, Java or R.
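
As a small taste of what such examples look like, here is a minimal PySpark sketch that reads a CSV file into a DataFrame and runs a SQL query over it. The file name and column names (measurements.csv, station, temperature) are illustrative assumptions, not part of the talk materials.

    from pyspark.sql import SparkSession

    # Start (or reuse) a local Spark session; the same code scales out to a cluster.
    spark = SparkSession.builder.appName("scientific-data-demo").getOrCreate()

    # Hypothetical input: a CSV of instrument readings with 'station' and 'temperature' columns.
    df = spark.read.csv("measurements.csv", header=True, inferSchema=True)

    # Register the DataFrame as a temporary view so it can be queried with SQL.
    df.createOrReplaceTempView("measurements")

    # Average temperature per station, computed in parallel across the available workers.
    spark.sql("""
        SELECT station, AVG(temperature) AS avg_temp
        FROM measurements
        GROUP BY station
        ORDER BY avg_temp DESC
    """).show()

    spark.stop()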

We'll also briefly introduce TensorFlow, the hot new deep learning and numerical computation library from Google.  As Spark and TensorFlow are rolled out and adapted for supercomputing platforms, scientists will be able to leverage the enormous investment made by the "big data" community in these tools.
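
For a flavor of TensorFlow's numerical-computation side, the following minimal sketch uses the graph-and-session API of TensorFlow 1.x (the version current at the time of this talk); the values are arbitrary and chosen only for illustration.

    import tensorflow as tf

    # Define a small computation graph: y = x * W + b
    x = tf.constant([[1.0, 2.0]])    # 1x2 input
    W = tf.constant([[3.0], [4.0]])  # 2x1 weights
    b = tf.constant(0.5)             # scalar bias
    y = tf.matmul(x, W) + b

    # Run the graph in a session (TensorFlow 1.x style).
    with tf.Session() as sess:
        print(sess.run(y))           # [[ 11.5]]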


Computer scientist and former Distinguished Member of Technical Staff at Bell Labs. In 1993, introduced the World Wide Web to what is now Avaya and bootstrapped its participation in the IETF. Wrote the open source ElectionAudits software used in the groundbreaking 2008 and 2010 election audits in Boulder County, Colorado. Major contributor to the IEEE 1622 standards for election data interchange. With Internet2, co-organized the annual IDtrust symposium for 10 years. An Ubuntu Member on the Server Team and the Colorado Ubuntu Linux Team.

A long history of volunteering. Served on the Boulder Public Library Commission and Boulder's Energy Advisory Board. In 1993, co-founded the Boulder Community Network. Chaired the Information Technology team at KGNU for several years.


