
    Inspired by Archimedes...Counting Sand

    October 05, 2021

    About this Episode

    How much sand would it take to fill the universe? And what does this 2,000-year-old question have to do with a podcast on today’s big data challenges? In this kick-off episode of the Counting Sand podcast, host Angelo Kastroulis, CEO of Carrera Group, explains how an early research paper by Archimedes of Syracuse has much in common with his own approach to today’s big questions in data science and how the paper provides not only a metaphor for how we can meld research and practice in tackling today’s big problems but also the inspiration for the perfect podcast name.

    In order to explain the origin of the name of this podcast, Angelo starts with a little history on Archimedes, as both a practical designer and also a scientist interested in the theoretical underpinnings of mathematical principles.

    Angelo then talks about some important research by Archimedes but begins by explaining what a research paper is, what the history of research papers is, and why anyone undertakes writing one. He then spends time talking about Archimedes’ paper that attempts to spell out how many grains of sand would be needed to fill the universe. Of course, to answer this, Archimedes needed to approximate the size of the universe and, in order to do that, he had to develop a new number system.
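
    To get a feel for the scale involved, here is a rough, modern back-of-the-envelope version of the question (the grain size and universe size below are assumed illustrative values, not Archimedes' own figures):

```python
# Back-of-the-envelope Sand Reckoner, using assumed modern values rather than
# Archimedes' stadia and poppy-seed measurements.

LIGHT_YEAR_M = 9.46e15                    # metres in one light-year
universe_diameter_m = 2 * LIGHT_YEAR_M    # assume a universe ~2 light-years across
grain_diameter_m = 1e-4                   # assume a 0.1 mm grain of sand

# Treat both as spheres, so the count is simply the ratio of the cubed diameters.
grains = (universe_diameter_m / grain_diameter_m) ** 3
print(f"about {grains:.1e} grains")       # roughly 7e60
```

    With far finer grains and his own astronomical estimates, Archimedes bounded the answer at around 10^63 grains, within a couple of orders of magnitude of this quick estimate.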

    Angelo, who himself has both a Greek and an entrepreneurial heritage, draws parallels between Archimedes’ approach to the sand problem and his own approach to understanding and addressing big problems today. He talks about his journey to find the balance of the theoretical and the practical, just as Archimedes did: applying a rigorous methodology, dealing with disappointment, and exercising patience. Angelo shares his first operating axiom: “When the solution isn’t readily apparent, be patient, keep researching; the solution will present itself.”

    In his work as a data scientist and technologist best known for his high-performance computing and Health IT experience, Angelo uses this process time and again. In this episode, he gives examples from his own research career and the applications he has developed. Ultimately, he shares his axiom #2: “If you find yourself doing too much theory, do more application and it will make your theory better. If you find yourself doing too much application, do more theory and it will make your application better.”

    As Angelo says, Counting Sand will be a bit different from other podcasts. We will take on some big problems, discuss the theory behind potential solutions, and see how those solutions can be applied to real problems. We are excited to bring listeners along for the ride.

     

    Citations

    Bourne, S. (2004, December 6). A conversation with Bruce Lindsay. ACM Queue. Retrieved October 4, 2021, from https://queue.acm.org/detail.cfm?id=1036486.

    Heath, T.G. (2020). The Sand-Reckoner of Archimedes (Vol. 1). Library of Alexandria.

    Kastroulis, A. (2019). Towards Learned Access Path Selection: Using Artificial Intelligence to Determine the Decision Boundary of Scan vs Index Probes in Data Systems (Doctoral dissertation, Harvard University).

     

    Further Reading

    On Archimedes’ Sand Reckoner

    Angelo Kastroulis’ Harvard master’s thesis

    The Harvard Data Systems Lab

    “Publish or Perish”

     

     

    About the Host

    Angelo Kastroulis is an award-winning technologist, inventor, entrepreneur, speaker, data scientist, and author best known for his high-performance computing and Health IT experience. He is the principal consultant, lead architect, and owner of Carrera Group, a consulting firm specializing in software modernization, event streaming (Kafka), big data, analytics (Spark, Elasticsearch, and Graph), and high-performance software development on many technical stacks (Java, .NET, Scala, C++, and Rust). A data scientist at heart, trained at the Harvard Data Systems Lab, Angelo enjoys a research-driven approach to creating powerful, massively scalable applications and innovating new methods for superior performance. He loves to educate, discover, and then see the knowledge through to practical implementation.

    Host: Angelo Kastroulis

    Executive Producer: Kerri Patterson; Producer: Leslie Jennings Rowley; Communications Strategist: Albert Perrotta

    Music: All Things Grow by Oliver Worth

    © 2021, Carrera Group

     

     

    Recent Episodes from Counting Sand

    AI Hot Sauce Taste Test Challenge


    Key Topics

    AI-optimized vs Commercially Available Hot Sauce: Angelo and Petter perform a blind taste test with three different hot sauces, one of which is AI-optimized, to see if they can determine which one is created by AI.

    Background of the AI Hot Sauce Creators: A brief insight into the story of Shekeib and Shohaib, the two brothers who combined their passion for data science and business to create an AI-optimized hot sauce.

    Understanding Bayesian Optimization: A comprehensive discussion on Bayesian Optimization, a technique that uses previous knowledge to influence future decisions, perfect for creating unique hot sauce recipes.

    Discussion on Other Optimization Techniques: Petter invites Angelo to delve into the different types of optimization algorithms and their pros and cons.

    Understanding Gradient Descent: Angelo gives a brief introduction to the concept of Gradient Descent, a popular optimization algorithm, explaining it as akin to finding a valley when on a mountain.
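
    To make the mountain-and-valley picture concrete, here is a minimal gradient descent sketch (the toy function, starting point, and step size are invented for illustration and are not from the episode):

```python
# Minimal gradient descent: repeatedly step against the slope until the
# updates become tiny -- the "walking downhill into the valley" picture.

def f(x):                # a toy one-dimensional "mountain" profile (assumed)
    return (x - 3) ** 2 + 1

def grad(x):             # its slope at x
    return 2 * (x - 3)

x = 10.0                 # arbitrary starting point on the mountainside
learning_rate = 0.1      # how large each downhill step is

for _ in range(100):
    step = learning_rate * grad(x)
    x -= step            # move opposite the slope, i.e. downhill
    if abs(step) < 1e-6: # stop once the steps are negligible
        break

print(round(x, 4))       # settles at x = 3.0, the bottom of the valley
```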

    Recommendations

    • Check out the previous episode interviewing the creators of the AI-optimized hot sauce to understand their process better.
    • For tech enthusiasts interested in AI and its applications, further exploration into optimization techniques like Bayesian Optimization and Gradient Descent can be insightful.

    Episode Quotes

    "Hot sauces are part of my favorite start of the day, so it'd be interesting to see what AI could come up with here." - Petter Graff

    "Bayesian is an optimization technique that centers around using your previous knowledge to influence the future and that works really well." - Angelo Kastroulis

    "Bayesian can kind of skip a bunch of steps because you've got a better second try." - Angelo Kastroulis

    "The algorithm of gradient descent basically goes like this. If you're trying to find from where you are to where you should go, imagine that you're on a mountain trying to find the valley." - Angelo Kastroulis

    AI Hot Sauce Brothers Part 2


    Introduction:

    • Angelo and Shohaib discuss the inclusion of new ingredients in hot sauce batches.
    • Shohaib explains the process of introducing new ingredients and the excitement surrounding it.

    Incorporating New Ingredients:

    • Angelo asks about the approach to incorporating new ingredients: creating new models or expanding the feature space.
    • Shohaib suggests keeping the base model and increasing the search space for new ingredients.
    • Both options are considered, including transferring the optimal features to another model.
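
    As a rough illustration of the "keep the base model and grow the search space" option: one common trick is to treat every past batch as having contained none of the new ingredient, so the existing tasting data can be reused in the larger space. A minimal sketch with invented ingredient names, amounts, and ratings:

```python
import numpy as np

# Past batches: amounts of (vinegar, pepper, lime) per batch and their taste
# ratings. All numbers here are made up for illustration.
X_old = np.array([[40.0, 5.0, 10.0],
                  [35.0, 8.0, 12.0],
                  [45.0, 6.0,  8.0]])
y = np.array([6.5, 7.8, 7.1])

# Grow the feature space: old batches simply contained 0 g of the new
# ingredient (say, mango), so append a zero column instead of starting a
# separate model from scratch.
X_new = np.hstack([X_old, np.zeros((len(X_old), 1))])

# The optimizer's search bounds gain one extra dimension for the new ingredient.
bounds = [(20.0, 60.0), (2.0, 12.0), (5.0, 15.0), (0.0, 30.0)]
print(X_new.shape, len(bounds))  # (3, 4) and 4 -- same data, wider search space
```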

    Metaphorical Understanding:

    • Angelo highlights the advantage of using hot sauce as a metaphor for complex concepts.
    • Shekeib acknowledges the clarity provided by the hot sauce analogy and the opportunity to learn more.

    Engaging with Mathematics:

    • Angelo expresses his enthusiasm for discussing the mathematical side of AI.
    • Shekeib shares his brother's interest in math and how it goes beyond his own understanding.
    • Shohaib emphasizes the subset of AI concepts being discussed and the value of conceptualizing them through hot sauce.

    AI as an Expansive Field:

    • Angelo mentions that AI encompasses various subfields, such as machine learning, Bayesian optimization, and active learning.
    • Neural networks, deep learning, and reinforcement learning are discussed as additional branches of AI.
    • Shohaib highlights the similarities between Bayesian optimization and reinforcement learning.

    Reinforcement Learning:

    • Angelo mentions the significance of reinforcement learning in solving video games and its applicability to different domains.
    • Shohaib shares his experience with reinforcement learning in an AI class, specifically using it to make Pac-Man play autonomously.

    Specialization and Continuous Learning:

    • Angelo praises Shohaib's expertise in Bayesian optimization while acknowledging the vastness of AI knowledge.
    • The discussion emphasizes the complexity of AI and the continuous learning required to stay up to date.

    Generative Pre-trained Transformers:

    • Angelo brings up the popularity of generative pre-trained transformers like ChatGPT.
    • The ensemble nature of these models and their unique combination of techniques is highlighted.

    AI Hot Sauce Brothers - Part 1


    Introduction

    • Shekeib and Shohaib join the podcast as guests to talk about their experience with creating hot sauces using AI optimization.
    • They created a special hot sauce, named "Counting Sauce," specifically for the podcast hosts.

    The Making of Counting Sauce

    • This is a unique hot sauce that includes pineapple and mango flavors.
    • The sauce was created as a token of appreciation for being featured on the podcast.

    Journey through Different Versions of the Sauce

    • The hosts have tried versions 19, 20, 21, and they just received version 25.
    • There will be a blind taste test to determine if they can tell the difference between the different iterations and compare them with other sauces to tell which is AI-created.

    Optimization Process

    • The process involves optimizing the amount of each ingredient.
    • They use a Gaussian process regression model and an acquisition function called Expected Improvement for the optimization.
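
    For listeners curious what that loop looks like in code, here is a minimal, hedged sketch of Bayesian optimization with a Gaussian process and Expected Improvement (the ingredient names, bounds, and ratings are invented; this is not the brothers' actual pipeline):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Toy data: grams of (vinegar, pepper, lime) per batch and a 1-10 taste rating.
X = np.array([[40.0, 5.0, 10.0],
              [35.0, 8.0, 12.0],
              [45.0, 6.0,  8.0],
              [30.0, 4.0, 14.0]])
y = np.array([6.5, 7.8, 7.1, 6.0])

# 1) Fit a Gaussian process regression model to the batches tasted so far.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X, y)

# 2) Score untried recipes with the Expected Improvement acquisition function:
#    EI(x) = E[max(f(x) - best_so_far, 0)] under the GP's predictive distribution.
def expected_improvement(candidates, best):
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)          # avoid division by zero
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# 3) Propose the next batch to mix: the random candidate with the highest EI.
rng = np.random.default_rng(0)
candidates = rng.uniform([20, 2, 5], [60, 12, 15], size=(500, 3))
ei = expected_improvement(candidates, best=y.max())
print("next batch to mix:", candidates[np.argmax(ei)])
```

    Each new rating is appended to the data, the model is refit, and the cycle repeats; that feedback loop is what lets the optimizer adjust several ingredients at once instead of one at a time.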

    Choice of Ingredients

    • The base hot sauce is built around a handful of main ingredients, including vinegar, pepper, jalapeño, and lime.
    • After 25 iterations, the differences in taste become so minute that they are hard to detect.

    Subjective Taste Testing

    • Shekeib talks about how his taste tolerance changes after tasting hot sauces all day.
    • They involved family and friends in the tasting process and asked for ratings on a scale of one to ten.

    The Learning Curve of the AI

    • Early on, the AI would try extreme variations like too much or too little salt.
    • It learned quickly from feedback and adjusted accordingly.

    Strength of Bayesian Optimization

    • The AI can learn mathematically from feedback and apply the learnings, making the optimization process quicker and more efficient.
    • It was also able to tweak multiple ingredients simultaneously, unlike a human who might focus on one ingredient at a time.

    No Prior Experience in Hot Sauce Making

    • Both brothers had no prior experience or generational knowledge in hot sauce making.
    • The AI managed to create a decent hot sauce in just five iterations.

    Power of Bayesian Optimization with Human Expertise

    • The brothers emphasize the importance of having a human expert in the loop of Bayesian optimization.
    • The AI simulates the intuition and experience of a human expert, but having a real human guide the process further enhances the results.

    Application Beyond Hot Sauce

    • They discuss the potential of their Bayesian optimization process in other areas such as drug discovery.
    • The process can be guided by human experts in the respective fields for even better results.

    Bonus: Season 2 Recap


    In a time crunch? Check out the time stamps below:

    [00:45] - Moore's Law, where do we go from here?

    [03:00] - How do we improve data system efficiency?

    [10:30] - Purpose-built systems (FPGAs)

    [11:13] - Insights on FPGAs

    [13:32] - Event Streaming

    [17:50] - Data storage

    [18:34] - Google’s approach to data storage

    [19:00] - Downtime

    [21:06] - Servers' impact on the environment and ways to optimize

    [23:00] - Improving data systems, machine learning, artificial intelligence

    [24:06] - How do we regulate AI?

    [26:10] - Benefits of simulations through machine learning

    [28:38] - The impact computer science has on astrophysics

    [31:09] - How do we defy Moore's Law, the future of quantum computing


    Our Team:

    Host: Angelo Kastroulis

    Executive Producer: Náture Kastroulis

    Producer: Albert Perrotta

    Communications Strategist: Albert Perrotta

    Audio Engineer: Ryan Thompson

    Music: All Things Grow by Oliver Worth

    The End of Moore's Law Part 2


    The last time we had Manos on the program, we talked about Moore's Law coming to an end. We can no longer rely on sheer computing power doubling to meet our ever-increasing demand for data; we must find new and exciting ways to collect and compute large amounts of data. In this episode of Counting Sand, we dive deep into what a database actually does, what sits at the core of a data system, and, most importantly, how we can use algorithmic trickery to lighten the CPU's load.

     

    In a time crunch? Check out the time stamps below:

    [00:53] - Guest Intro 

    [01:30] - Intro to data systems

    [03:00] - Hardware types

    [05:00] - Why is it important to choose the right format

    [10:15] - What is column storage, and what are the benefits

    [16:30] - Injecting the CPU, the hierarchy of memory

    [20:00] - Why not just duplicate data

    [22:55] - ACID properties

     

    Notable references:

    Relational Memory: Native In-Memory Accesses on Rows and Columns

     

    Our Team:

    Host: Angelo Kastroulis

    Executive Producer: Náture Kastroulis

    Producer: Albert Perrotta

    Communications Strategist: Albert Perrotta

    Audio Engineer: Ryan Thompson

    Music: All Things Grow by Oliver Worth

    Dynamo: The Research Paper that Changed the World


    The cycle between research and application is often too long and can take decades to complete. We are often asked which piece of research or technology is the most important. Before we can answer that question, I think it's important to take a step back and share the story of why we believe The Dynamo Paper is so essential to our modern world and how we encountered it.

     

    Citations:

    DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., ... & Vogels, W. (2007). Dynamo: Amazon's highly available key-value store. ACM SIGOPS Operating Systems Review, 41(6), 205-220.

    Karger, D., Lehman, E., Leighton, T., Panigrahy, R., Levine, M., & Lewin, D. (1997, May). Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the twenty-ninth annual ACM symposium on Theory of Computing (pp. 654-663).

    Lamport, L. (2019). Time, clocks, and the ordering of events in a distributed system. In Concurrency: the Works of Leslie Lamport (pp. 179-196).

    Merkle, R. C. (1987). A digital signature based on conventional encryption. In Proceedings of the USENIX Secur. Symp (pp. 369-378).

     

    Our Team:

    Host: Angelo Kastroulis

    Executive Producer: Náture Kastroulis

    Producer: Albert Perrotta

    Communications Strategist: Albert Perrotta

    Audio Engineer: Ryan Thompson

    Music: All Things Grow by Oliver Worth

    The Promise of AI: Opportunities and Obstacles


    This show often discusses artificial intelligence and ideas to consider as technology progresses. We have discussed the deep tech of how it works and its implications for privacy. In this episode, we'll take on the complex and controversial topic of AI policy and talk about some of the things we should be worried about regarding its future.

     

    In a time crunch? Check out the time stamps below:

    [01:15] - Guest Intro 

    [03:38] - Western technology leadership

    [04:50] - Regulating AI

    [11:00] - The promise of self-driving cars

    [13:05] - AI data audition 

    [17:50] - Neural networks to train AI

    [19:00] - Reducing mathematical knowledge, AI bottleneck 

    [20:35] - What is in the way of the promise of AI

    [24:20] - Eric Daimler book

    [27:50] - The uses of trained AI models

    [29:30] - Health care industry data usage

    [33:25] - AI to speed up research

    [33:50] - What is rural AI?

     

    Guest Links:

    https://www.linkedin.com/in/ericdaimler/

    https://conexus.com/

     

    Our Team:

    Host: Angelo Kastroulis

    Executive Producer: Náture Kastroulis

    Producer: Albert Perrotta

    Communications Strategist: Albert Perrotta

    Audio Engineer: Ryan Thompson

    Music: All Things Grow by Oliver Worth

    Energy, Edge Computing, and Data Centers


    What if there were a way to reduce the amount of energy consumed by servers around the world? Would these new methods positively or negatively impact the environmental footprint of today’s big data ecosystems?

     

    In a time crunch? Check out the time stamps below:

    [02:15] - Research Paper 

    [05:55] - Power consumption of data centers and methods to save energy 

    [08:50] - Server cooling methods 

    [12:00] - Energy production from data transportation 

    [13:55] - The impact of location and climate on venting and cooling computers

    [15:38] - Edge devices and cloud computing 

    [20:47] - Cost and energy optimization

    [21:45] - Machine learning and AI for predictive maintenance

    [24:45] - Automobile processing unit, big data

     

    Our Team:

    Host: Angelo Kastroulis

    Executive Producer: Náture Kastroulis

    Producer: Albert Perrotta

    Communications Strategist: Albert Perrotta

    Audio Engineer: Ryan Thompson

    Music: All Things Grow by Oliver Worth

    Cutting-Edge Data Systems: Machine Learning


    Over the last couple of years, the Harvard Data Systems Lab has focused on cutting-edge research and applications of complex data systems, in areas such as artificial intelligence and machine learning pipelines. In this episode of Counting Sand, Angelo and Stratos dive deep into what they have learned and what’s next in these fields.

     

    In a time crunch? Check out the time stamps below:

    [01:00] - What’s new at Harvard Data Systems Lab?

    [08:20] - What are examples of general data structure applications?

    [14:13] - How do we decrease the time spent from research to application?

    [20:23] - What are the benefits of machine learning?

    [22:15] - What are some helpful tips when writing a thesis?

    [25:00] - How important is the creative process when writing a research paper?

     

    Helpful links:

    Harvard Data Systems Lab: http://daslab.seas.harvard.edu/

    Harvard Data Systems Lab Twitter: https://twitter.com/HarvardDASlab

     

    Our Team:

    Host: Angelo Kastroulis

    Executive Producer: Náture Kastroulis

    Producer: Albert Perrotta

    Communications Strategist: Albert Perrotta

    Audio Engineer: Ryan Thompson

    Music: All Things Grow by Oliver Worth