A few takeaways:
For those who still feel sentimental about on-premise Hadoop and Spark: Uber still invests heavily in an open-source, on-premise Spark/YARN setup!
How to use eBPF syscall tracing creatively to detect anomalies in process behaviour at runtime, using an unsupervised learning model called an autoencoder. The presented technique can potentially detect process exploitation, denial-of-service and several other types of attacks.
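To make the idea concrete, here is a minimal sketch of reconstruction-error anomaly detection. It is not the article's implementation: it assumes syscall counts have already been collected (for example via eBPF) and normalised into per-process frequency vectors, and it swaps the neural network for the simplest possible autoencoder, a linear one with tied weights and a single hidden unit. The syscall columns, the sample values and the 3x threshold are all invented for illustration.

```python
# Toy per-process syscall frequency vectors: [read, write, openat, execve].
# The numbers are invented for illustration, not taken from real eBPF traces.
NORMAL = [
    [0.50, 0.40, 0.10, 0.00],
    [0.52, 0.38, 0.10, 0.00],
    [0.48, 0.41, 0.11, 0.00],
    [0.51, 0.39, 0.10, 0.00],
    [0.49, 0.40, 0.11, 0.00],
]


def encode(w, x):
    """Compress x into a single hidden value (the bottleneck)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def reconstruction_error(w, x):
    """Squared error between x and its decoded reconstruction w * h."""
    h = encode(w, x)
    return sum((wi * h - xi) ** 2 for wi, xi in zip(w, x))


def train(samples, epochs=500, lr=0.05):
    """Fit a tied-weight linear autoencoder with plain per-sample SGD."""
    w = [0.1] * len(samples[0])
    for _ in range(epochs):
        for x in samples:
            h = encode(w, x)
            r = [wi * h - xi for wi, xi in zip(w, x)]   # reconstruction residual
            rw = sum(ri * wi for ri, wi in zip(r, w))
            # Gradient of ||w * (w . x) - x||^2 with respect to w.
            w = [wi - lr * 2 * (ri * h + rw * xi) for wi, ri, xi in zip(w, r, x)]
    return w


weights = train(NORMAL)

# Assumed heuristic: flag anything reconstructed 3x worse than the worst
# training sample. A real deployment would tune this on held-out data.
threshold = 3 * max(reconstruction_error(weights, x) for x in NORMAL)


def is_anomalous(x):
    return reconstruction_error(weights, x) > threshold


# A process that suddenly spends most of its syscalls on execve.
suspicious = [0.10, 0.10, 0.10, 0.70]
print("suspicious flagged:", is_anomalous(suspicious))
```

The model only ever sees normal behaviour, so it learns to reconstruct that pattern well; a process whose syscall mix falls outside it reconstructs poorly, and the large error is the anomaly signal.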
Data engineering architecture on Google Cloud with dbt. This is part 1 with:
Plenty of automation hints drawn from a long journey that involved writing custom patches (shared in the article) to the open-source connectors.
Spark-Lineage provides a visual representation of the data’s journey, including all of the steps from origin to destination, with detailed information about where the data goes, who owns the data and how the data is processed and stored at each step.
Airflow was not built for modern development workflows. Data teams no longer need a tool for running tasks in ordered steps (aka, the infamous DAG); they now need control planes, coordination planes and broader, richer systems of orchestration.
Ben describes his experiences with reverse orchestration:
Because each model was configured independently, we could easily set different requirements for different tables. Important dimension tables often had tight guarantees of an hour or less; computationally expensive tables, like rollup tables that we used for reporting, were rebuilt once a day, or even once a week. This significantly lowered the burden we put on our database—and, had metered cloud databases like Snowflake and BigQuery existed at the time, would’ve lowered our costs.
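The per-model freshness idea in the quote can be sketched roughly as follows. This is an illustration only, not the system Ben describes: the model names, the freshness windows and the `stale_models` helper are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical per-model freshness requirements: how old a table may get
# before it must be rebuilt.
FRESHNESS = {
    "dim_customers": timedelta(hours=1),         # important dimension table
    "rollup_daily_revenue": timedelta(days=1),   # expensive reporting rollup
    "rollup_weekly_cohorts": timedelta(weeks=1),
}


def stale_models(last_built, now, freshness=FRESHNESS):
    """Return the models whose last build is older than their allowed age."""
    return sorted(
        name
        for name, max_age in freshness.items()
        if name not in last_built or now - last_built[name] > max_age
    )


now = datetime(2023, 1, 10, 12, 0)
last_built = {
    "dim_customers": now - timedelta(hours=2),
    "rollup_daily_revenue": now - timedelta(hours=5),
    "rollup_weekly_cohorts": now - timedelta(days=8),
}

to_rebuild = stale_models(last_built, now)
print(to_rebuild)  # ['dim_customers', 'rollup_weekly_cohorts']
```

Instead of a schedule pushing every table through the same pipeline, each table declares how fresh it needs to be and only the stale ones are rebuilt, which is where the savings on database load come from.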
Intro to Kafka and Cloud as a fairytale (for those who like analogies and beautiful things):
An insight into the Flybits platform: data physics, architecture and design patterns.
What mindset is expected from you as a Developer in a Scrum Team.
How a Scrum mindset helps to achieve better development results.
Just see what AI can do for photography and cinematography.