Aug. 24
|
Shardha Jogee
University of Texas at Austin
|
Organizational Meeting
|
|
Aug. 31
|
Markus Kissler-Patig
Director, Gemini Observatory
Heidi Hammel
Executive Vice President, AURA
|
Gemini Observatory Community Event
|
|
Sep. 7
|
No talk scheduled
|
|
|
Sep. 14
|
Taft Armandroff
The University of Texas at Austin
|
Progress with McDonald Observatory Initiatives
|
|
Sep. 21
|
Anita Cochran
The University of Texas at Austin
|
** Starting at 3:00 p.m. this week only **
Update on GMT and Discussion of SAC Technical Document
|
|
Sep. 28
|
No talk scheduled
|
|
|
Oct. 5
|
Talk rescheduled to October 26
|
|
|
Oct. 12
|
Sean Wang
Director, Data Science at Fidelity Investments
|
It Doesn't Have to be Rocket Science - Non-academic careers for fun and profit
|
|
Oct. 19
|
Keely Finkelstein
The University of Texas at Austin
|
Teaching Tools, Tips, and Strategies: Low Stakes Testing, Immediate Feedback Assessment & More
|
|
Oct. 26
|
Niall Gaffney
Director for Data Intensive Computing, Texas Advanced Computing Center
Zhao Zhang
Research Associate, Texas Advanced Computing Center
|
Recent Developments at TACC
Processing Astronomy Imagery Using Big Data Technology
Scientific analyses commonly compose multiple single-process programs into a dataflow. An end-to-end dataflow of single-process programs is known as a many-task application. Typically, HPC tools are used to parallelize these analyses. In this work, we investigate an alternate approach that uses Apache Spark, a modern platform for data-intensive computing, to parallelize many-task applications. Using Apache Spark, we implement Kira, a flexible and distributed astronomy image processing toolkit. We then use the Kira toolkit to implement the Kira SE application for extracting sources from astronomy images. Using Kira SE as a case study, we study the programming flexibility, dataflow richness, scheduling capacity, and performance of Apache Spark running on the EC2 cloud. By exploiting data locality, Kira SE achieves a 4.1× speedup over an equivalent C program when analyzing a 1 TB dataset using 512 cores on the Amazon EC2 cloud. Furthermore, we show that by leveraging software originally designed for big data infrastructure, we are able to use the Amazon EC2 cloud to achieve a 1.8× speedup over the C implementation running on the NERSC Edison supercomputer, when holding core count constant. Using the same implementation of Kira SE, a 128-core EC2 cloud deployment that uses Spark Streaming can achieve second-scale latency with a sustained throughput of ~600 MB/s. Our experience with Kira demonstrates that data-intensive computing platforms like Apache Spark are a performant alternative for many-task scientific applications.
|
|
Nov. 2
|
Scott Acton
Ball Aerospace
|
JWST: An Observatory Beyond the Moon
|
|
Nov. 9
|
On hold
|
On hold
|
|
Nov. 16
|
Anna Quider
Director of Federal Relations, Northern Illinois University
|
The Federal Budget: A Primer for Scientists
|
|
Nov. 23
|
Thanksgiving Holiday
|
|
|
Nov. 30
|
On hold
|
On hold
|
|