DATA Pill feed

DATA Pill #040 - Cloud edition! How to pay LESS for Cloud?


Our cloud spend in 2022 | 5 min | Fernando Álvarez | 37 Signals Blog
This one should catch your attention. When 37 Signals decided they were leaving the cloud, they received a lot of questions about their actual spending. Here is a quick update on where they stand and what their plans for 2023 are. Enjoy!
Cloud data warehouses have become extremely popular in recent years. Their low cost and fully managed services make it easy for businesses to get started and scale their data analysis as needed. However, their pricing models can be complicated, with many factors affecting the final bill. The choice between Snowflake and BigQuery depends on an organization's specific needs and usage patterns. Without further ado, let’s dig into Jakub Jurczak’s post and find out which solution you should choose.
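To make the difference between the two pricing models concrete, here is a minimal Python sketch of our own (not code from the post): BigQuery on-demand bills for the data a query scans, while Snowflake bills credits for the time a virtual warehouse runs, with each warehouse size doubling the credits per hour. The `Workload` class and function names are hypothetical, and the dollar rates in the usage lines are placeholders, not current list prices; storage, editions, and minimum billing increments are ignored.

```python
from dataclasses import dataclass

# Credits per hour for standard Snowflake warehouse sizes; each step up doubles the rate.
SNOWFLAKE_CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


@dataclass
class Workload:
    tib_scanned_per_month: float      # data BigQuery on-demand would bill, in TiB
    warehouse_size: str               # key into SNOWFLAKE_CREDITS_PER_HOUR
    warehouse_hours_per_month: float  # hours the Snowflake warehouse is running


def bigquery_on_demand_cost(w: Workload, usd_per_tib: float) -> float:
    # On-demand: pay per TiB scanned, regardless of how long queries run.
    return w.tib_scanned_per_month * usd_per_tib


def snowflake_compute_cost(w: Workload, usd_per_credit: float) -> float:
    # Virtual warehouse: pay for running time, regardless of how much data is scanned.
    credits = SNOWFLAKE_CREDITS_PER_HOUR[w.warehouse_size] * w.warehouse_hours_per_month
    return credits * usd_per_credit


# Placeholder rates for illustration only; check the current price lists.
w = Workload(tib_scanned_per_month=20, warehouse_size="S", warehouse_hours_per_month=100)
print(bigquery_on_demand_cost(w, usd_per_tib=5.0))    # 100.0
print(snowflake_compute_cost(w, usd_per_credit=3.0))  # 600.0
```

What matters is the shape of the formulas: a workload that scans little data but keeps a warehouse busy for hours leans toward per-scan pricing, while a scan-heavy workload that finishes quickly leans toward the warehouse model.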
FinOps: Four Ways to Reduce Your BigQuery Storage Cost | 9 min | Xiaoxu Gao | Towards Data Science
2022 was one of the hardest years ever to run a business. All sorts of challenges pushed engineers to look at their technical stacks from different perspectives, from how to scale the system to how to control costs and make the business more resilient. These four tips can noticeably cut your BigQuery storage bill and let the company allocate money to more critical domains.
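One storage lever BigQuery documents is long-term pricing: a table, or an individual partition of a partitioned table, that goes 90 consecutive days without modification is billed at a lower long-term rate (roughly half the active rate). The sketch below is our own illustration, not code from the article, and it may not match the author's four tips; the `Partition` class, the sizes, and the rates are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable

# BigQuery bills a table or partition at the long-term rate once it has gone
# 90 consecutive days without modification.
LONG_TERM_AFTER_DAYS = 90


@dataclass
class Partition:
    size_gib: float
    days_since_modified: int


def monthly_storage_cost(parts: Iterable[Partition],
                         active_usd_per_gib: float,
                         long_term_usd_per_gib: float) -> float:
    """Estimate a monthly storage bill; each partition is priced on its own age."""
    total = 0.0
    for p in parts:
        is_long_term = p.days_since_modified >= LONG_TERM_AFTER_DAYS
        rate = long_term_usd_per_gib if is_long_term else active_usd_per_gib
        total += p.size_gib * rate
    return total


# Hypothetical sizes and placeholder rates.
parts = [Partition(size_gib=100, days_since_modified=10),
         Partition(size_gib=300, days_since_modified=120)]
print(monthly_storage_cost(parts, active_usd_per_gib=0.02, long_term_usd_per_gib=0.01))
```

Because partitions age independently, appending new data to a date-partitioned table leaves older partitions untouched so they can keep the long-term rate, whereas rewriting the whole table resets the clock for all of it.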
How We Cut Our Databricks Costs by 50% | 6 min | Ran Sasportas | Similarweb Engineering Blog
What changed after the Similarweb team decided to use Databricks clusters as the compute for their Batch API? This is the story of how they reduced their monthly Databricks costs from $25,000 to just $12,500 by making a few key changes to their setup. You will also find a reminder of four things you should do:
  • Analysis
  • Initiative
  • Patience
  • Clicking (but what exactly? Check it out in the blogpost!)
How BookMyShow saved 80% in costs by migrating to an AWS modern data architecture | 8 min | Mahesh Vandi Chalil, Priya Jathar, Vatsal Shah | Amazon Web Services Blog
How a modern data architecture on AWS helped BookMyShow (BMS) share data easily across organizational boundaries. Read how the AWS and Minfy Technologies teams helped the India-based company choose the right technology services and complete the migration in four months. The solution overview, a walkthrough, and the benefits of a modern data architecture are waiting for you.
How Michelin Cut Kafka Costs by 35% with Confluent Cloud | 5 min | Erin Junio | Confluent Blog
Open source Kafka had helped Michelin jumpstart its event-driven transformation, so it was time for the company’s next bold move: a leap to the cloud. With Confluent Cloud, a fully managed, cloud-native Kafka service, the company began its cloud transition. Read why the #BeEvergreen and #BeDataDriven criteria mattered to them and what changed after the move.


Automate replication of relational sources into a transactional data lake with Apache Iceberg and AWS Glue | 12 min | Luis Gerardo Baeza, SaiKiran Reddy Aenugu, Narendra Merla | Amazon Web Services Blog
A tutorial on how to replicate all your relational data stores into a transactional data lake in an automated fashion, with a single ETL job using Apache Iceberg and AWS Glue. After deploying this solution, you will have automated the ingestion of the tables from a single relational data source.
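Under the hood, replicating into a transactional table comes down to applying change records (inserts, updates, deletes) to the target, which Iceberg on Spark expresses as `MERGE INTO`. The pure-Python sketch below models those semantics on an in-memory dictionary; it is not the AWS Glue job from the post, and the `Op` column with `I`/`U`/`D` codes is an assumption modeled on a common CDC export convention.

```python
from typing import Any, Dict, Iterable, Mapping

Row = Dict[str, Any]


def apply_cdc(table: Dict[Any, Row],
              changes: Iterable[Mapping[str, Any]],
              key: str,
              op_col: str = "Op") -> Dict[Any, Row]:
    """Apply ordered change records to a keyed snapshot, mimicking MERGE INTO.

    Assumed op codes: 'I' insert, 'U' update, 'D' delete. Changes must arrive
    in commit order, so a later change to the same key wins.
    """
    result = dict(table)  # shallow copy: the input snapshot is left unchanged
    for change in changes:
        op = change[op_col]
        row = {k: v for k, v in change.items() if k != op_col}
        pk = row[key]
        if op == "D":
            result.pop(pk, None)  # deleting a missing key is a no-op
        elif op in ("I", "U"):
            result[pk] = row      # upsert: insert a new key or overwrite the row
        else:
            raise ValueError(f"unknown operation {op!r}")
    return result


table = {1: {"id": 1, "name": "alpha"}}
changes = [
    {"Op": "U", "id": 1, "name": "alpha-v2"},
    {"Op": "I", "id": 2, "name": "beta"},
    {"Op": "D", "id": 1},
]
print(apply_cdc(table, changes, key="id"))  # {2: {'id': 2, 'name': 'beta'}}
```

In a Spark job the same idea is usually written as one statement per batch: reduce the batch to the latest change per key, then run a single `MERGE INTO` with `WHEN MATCHED ... THEN DELETE`, `WHEN MATCHED THEN UPDATE`, and `WHEN NOT MATCHED THEN INSERT` clauses.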
An Automated Guide to Distributed and Decentralized Management of Unity Catalog | 12 min | Vuong Nguyen, Zeashan Pappa, Mattia Zeni | Databricks Blog
This one answers the question: how can customers use the Unity Catalog object support in the Databricks Terraform provider to manage a distributed governance pattern on the lakehouse effectively?

You will find two solutions here:

  • One that completely delegates responsibilities to teams when it comes to creating assets in the Unity Catalog
  • One that limits which resources teams can create in the Unity Catalog


Build limitless workloads on BigQuery: New features beyond SQL | 7 min | Christopher Crosbie, Joe Malone | Google Cloud Blog
BigQuery is moving beyond its SQL-only interface and providing new developer extensions for workloads that require programming beyond SQL. These flexible programming extensions are all offered without the limitations of running virtual servers. What do they bring?

  • BigQuery Stored Procedures for Apache Spark
  • Google Colab Integration with BigQuery Console
  • Remote Functions now GA
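Remote Functions let a SQL query call an HTTP endpoint, such as a Cloud Function or Cloud Run service. BigQuery batches rows: the request body carries a `calls` list with one argument list per row, and the endpoint must answer with a `replies` list of the same length and order, or an `errorMessage` on failure. The framework-free Python sketch below shows that contract; `add_numbers` and `handle_remote_call` are hypothetical names, and returning HTTP 400 on errors is an assumption, so check the Remote Functions documentation for the status codes BigQuery expects.

```python
import json
from typing import Any, List, Optional, Tuple


def add_numbers(a: Optional[int], b: Optional[int]) -> Optional[int]:
    # Hypothetical UDF body. SQL NULL arrives as JSON null, i.e. None here.
    if a is None or b is None:
        return None
    return a + b


def handle_remote_call(body: str) -> Tuple[int, str]:
    """Handle one batched remote-function request and return (status, JSON body)."""
    try:
        request = json.loads(body)
        calls: List[List[Any]] = request["calls"]         # one argument list per row
        replies = [add_numbers(*args) for args in calls]  # same order as calls
        return 200, json.dumps({"replies": replies})
    except Exception as exc:  # malformed request or UDF failure
        # Assumption: report failures with a non-200 status and an errorMessage field.
        return 400, json.dumps({"errorMessage": str(exc)})


# Example request shaped like the JSON BigQuery sends (other fields omitted).
status, body = handle_remote_call(json.dumps({"requestId": "r1", "calls": [[1, 2], [None, 3]]}))
print(status, body)  # 200 {"replies": [3, null]}
```

Keeping the handler a plain function of the request body makes it easy to unit-test before wrapping it in whatever HTTP framework the endpoint uses.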
Databricks announced the General Availability (GA) of support for orchestrating dbt projects in Databricks Workflows. Since the start of the Public Preview, hundreds of customers have used this integration with dbt to collaboratively transform, test, and document data in Databricks SQL warehouses.


Day Two Cloud 176: Comparing Cloud Provider Network Performance | 39 min | Hosts: Ned Bellavance, Ethan Banks | Guest: Angelique Medina | Day Two Cloud Podcast
In this episode you will hear about the global network performance of some of the biggest public cloud providers. The sponsor, ThousandEyes (a Cisco company), has a worldwide network of sensors that measure performance to, from, and across AWS, Azure, and Google Cloud.

Subjects that were discussed:

  • Highlights of the 2022 report
  • Why small outages can be as impactful as bigger ones
  • Cloud performance is not a “steady state”
  • Why cloud-to-cloud performance looks pretty good
  • The role of networking in cloud design and application performance

…and more.


Subsurface Live 2023 | 1-2 March | Online and on-site
Hear firsthand from technology leaders at companies such as Apple, Uber, Bloomberg, AWS, and Microsoft about their experiences architecting and building modern cloud data lakes. Learn how to innovate with open source technologies such as Apache Arrow, Apache Iceberg, Nessie, Delta Lake, Airflow, Dagster, Apache Superset, Apache Druid, Apache Ranger and more.
AWS Innovate | 9 Mar | Online
Take your AI/ML skills to the next level today! Get hands-on, step-by-step architectural and deployment best practices to help you build better, innovate faster, and deploy at scale. Whether you are just getting started with AI/ML, an advanced user, or simply curious about AI/ML, there is a track for your level of experience and job role.
Lakehouse Days by Databricks | 14 Mar - 19 Apr | On-site
Discover the power of the Databricks Lakehouse at our series of live Lakehouse Days across EMEA. Join and find out how the lakehouse architecture unifies your data, analytics and AI, combining the best of data warehouses and data lakes on one simple platform. Built on an open and reliable data foundation that efficiently handles all data types, the lakehouse applies one common security and governance approach across all your data and cloud platforms.
Have any interesting content to share in the DATA Pill newsletter?
➡ Join us on GitHub
➡ Dig into previous editions of DATA Pill