DATA Pill feed

DATA Pill #015 - Harvesting disaster, anomaly detection, dbt at scale and Blockchain conf


AI, ML, and Data Engineering InfoQ Trends Report | 8 min read | AI, ML / Data | Srini Penchikala & Dr Einat Orr | InfoQ Blog

A few takeaways:
  1. Natural Language Understanding (NLU) and Natural Language Generation (NLG) have been promoted to the early adopters category.
  2. Since last year, deep learning solutions and technologies have seen wider adoption in organizations, so we are moving deep learning from early adopters to the early majority category.
  3. Streaming data analytics and technologies like Spark Streaming have been moved to the late majority category.

Uber’s Highly Scalable and Distributed Shuffle as a Service | 20 min read | AI, ML / Data | Mayank Bansal, Bo Yang, Mayur Bhosale & Kai Jiang | Uber Engineering Blog

For those who still feel sentimental about on-premise Hadoop & Spark: Uber still invests heavily in an open-source, on-premise Spark/YARN setup!

Process behaviour anomaly detection using eBPF and Unsupervised-learning | 10 min read | AI | Simone Margaritelli |

How to use eBPF syscall tracing creatively to detect process behaviour anomalies at runtime with an unsupervised learning model called an autoencoder. The presented technique can potentially detect process exploitation, denial-of-service and several other types of attacks.
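To make the autoencoder idea concrete, here is a minimal, hypothetical sketch in pure Python. It is not the author's code. It assumes each process is summarised as the fraction of its syscalls that fall into a few buckets (the four buckets, the profiles and the "3x the worst training error" threshold are all invented for illustration). A tiny linear autoencoder is trained on normal behaviour only. Anything it reconstructs badly gets a high anomaly score.

```python
import random

# Hypothetical feature set: the fraction of a process's syscalls falling into each
# bucket. A real eBPF tracer would count many more distinct syscalls.
SYSCALLS = ["read", "write", "openat", "connect"]


def make_profile(rng, base, noise=0.03):
    """Simulate one observation window of a process around a base syscall profile."""
    return [max(0.0, b + rng.uniform(-noise, noise)) for b in base]


class LinearAutoencoder:
    """Tiny linear autoencoder (dim -> hidden -> dim) trained with SGD on squared error."""

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        self.enc = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
        self.dec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(dim)]

    def _forward(self, x):
        h = [sum(w * xi for w, xi in zip(row, x)) for row in self.enc]
        x_hat = [sum(w * hj for w, hj in zip(row, h)) for row in self.dec]
        return h, x_hat

    def score(self, x):
        """Anomaly score: squared reconstruction error."""
        _, x_hat = self._forward(x)
        return sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))

    def fit(self, data, epochs=100, lr=0.05):
        for _ in range(epochs):
            for x in data:
                h, x_hat = self._forward(x)
                err = [xh - xi for xh, xi in zip(x_hat, x)]  # loss = sum(err ** 2)
                # Gradient w.r.t. the hidden layer, computed before the decoder changes.
                grad_h = [2 * sum(self.dec[i][j] * err[i] for i in range(len(err)))
                          for j in range(len(h))]
                for i, row in enumerate(self.dec):
                    for j in range(len(row)):
                        row[j] -= lr * 2 * err[i] * h[j]
                for j, row in enumerate(self.enc):
                    for k in range(len(row)):
                        row[k] -= lr * grad_h[j] * x[k]
        return self


rng = random.Random(42)
normal_base = [0.50, 0.35, 0.10, 0.05]  # mostly file I/O, little networking
train = [make_profile(rng, normal_base) for _ in range(200)]

model = LinearAutoencoder(dim=len(SYSCALLS), hidden=1).fit(train)
threshold = 3 * max(model.score(x) for x in train)  # arbitrary safety margin

suspicious = [0.10, 0.10, 0.20, 0.60]  # e.g. a process that suddenly opens many connections
print("normal flagged:", model.score(normal_base) > threshold)
print("suspicious flagged:", model.score(suspicious) > threshold)
```

A single linear hidden unit can only reconstruct inputs that lie close to the "normal" direction it learned. A deep, non-linear autoencoder like the one in the article follows the same principle with far more capacity.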

dbt at scale on Google Cloud - Part 1 | 6 min read | dbt & Google Cloud | Charles Verleyen | Astrafy Blog

Data engineering architecture on Google Cloud with dbt. Part 1 covers:
  • Overall architecture
  • A brief word on ingestion
  • dbt project 
  • Cloud Composer as Orchestrator
  • Distribution of data

How HomeToGo connected dbt and Superset to make metadata more accessible and reduce analytical overhead | 6 min read | dbt | Agustin Figueroa | HomeToGo Engineering Blog

Plenty of automation tips from a long journey that included writing their own patches to the open-source connectors (the patches are shared in the article).
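Much of the documentation such a sync needs already lives in dbt's `manifest.json`, which dbt writes to the `target/` directory. Below is a minimal sketch, assuming you only want model-level and column-level descriptions. It is not HomeToGo's code. The `extract_descriptions` helper and the tiny inline manifest are hypothetical.

```python
import json
from typing import Dict


def extract_descriptions(manifest: dict) -> Dict[str, dict]:
    """Collect model- and column-level descriptions from a parsed dbt manifest.json."""
    docs = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, ...
        docs[node["name"]] = {
            "description": node.get("description", ""),
            "columns": {
                col.get("name", key): col.get("description", "")
                for key, col in node.get("columns", {}).items()
            },
        }
    return docs


# A heavily trimmed, hypothetical manifest; real ones contain many more fields.
manifest = json.loads("""
{"nodes": {
  "model.shop.orders": {"resource_type": "model", "name": "orders",
    "description": "One row per order.",
    "columns": {"order_id": {"name": "order_id", "description": "Primary key."}}},
  "test.shop.not_null_orders_order_id": {"resource_type": "test",
    "name": "not_null_orders_order_id"}
}}
""")
docs = extract_descriptions(manifest)
print(docs)
```

Pushing these descriptions into Superset datasets is not shown. That step depends on Superset's API or on connector patches like those described in the article.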

What is Spark-Lineage? | 6 min read | Spark | Swathi Vodela & Tien Nam Le | Yelp Blog

Spark-Lineage provides a visual representation of the data’s journey, including all of the steps from origin to destination, with detailed information about where the data goes, who owns the data and how the data is processed and stored at each step.
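Under the hood, lineage like this is a directed graph of tables connected by the jobs that read and write them. Here is a minimal, hypothetical sketch, not Yelp's implementation. The `Edge` record, the table, job and team names, and the `upstream`/`producers` helpers are invented. It shows how "where did this table come from, and who processed it?" becomes a breadth-first walk over that graph.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Edge:
    source: str  # input table
    target: str  # output table
    job: str     # Spark job that reads `source` and writes `target`
    owner: str   # team owning the job


def upstream(edges: List[Edge], table: str) -> Set[str]:
    """All tables that (transitively) feed `table`, found with a breadth-first walk."""
    parents: Dict[str, List[str]] = {}
    for e in edges:
        parents.setdefault(e.target, []).append(e.source)
    seen: Set[str] = set()
    queue = deque([table])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


def producers(edges: List[Edge], table: str) -> Set[Tuple[str, str]]:
    """(job, owner) pairs for every job that writes `table` or any of its ancestors."""
    tables = upstream(edges, table) | {table}
    return {(e.job, e.owner) for e in edges if e.target in tables}


edges = [
    Edge("raw.clicks", "staging.clicks", "clean_clicks", "ingest-team"),
    Edge("raw.users", "staging.users", "clean_users", "ingest-team"),
    Edge("staging.clicks", "mart.daily_engagement", "daily_rollup", "analytics"),
    Edge("staging.users", "mart.daily_engagement", "daily_rollup", "analytics"),
]
print(sorted(upstream(edges, "mart.daily_engagement")))
print(sorted(producers(edges, "mart.daily_engagement")))
```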


Down with the DAG: Reverse the ETL timeline | 7 min read | Data Lake | Benn Stancil | benn.substack

Airflow was not built for modern development workflows. Data teams no longer need a tool for running tasks in ordered steps (aka, the infamous DAG); they now need control planes, coordination planes and broader, richer systems of orchestration. 
Benn describes his experience with reverse orchestration:

Because each model was configured independently, we could easily set different requirements for different tables. Important dimension tables often had tight guarantees of an hour or less; computationally expensive tables, like rollup tables that we used for reporting, were rebuilt once a day, or even once a week. This significantly lowered the burden we put on our database—and, had metered cloud databases like Snowflake and BigQuery existed at the time, would’ve lowered our costs. 
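The per-model configuration in that quote can be pictured as each model declaring how stale it is allowed to get. A planner, run every few minutes, then rebuilds only the models that have exceeded their own limit. This is a minimal, hypothetical sketch, not Benn's system. The `Model` record, `due_for_rebuild`, and the model names are invented. It also ignores upstream dependencies, which a real orchestrator would have to respect as well.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Model:
    name: str
    max_staleness: timedelta               # freshness requirement, set per model
    last_built: Optional[datetime] = None  # None means "never built"


def due_for_rebuild(models: List[Model], now: datetime) -> List[str]:
    """Names of models whose data is older than their own freshness requirement."""
    return [
        m.name
        for m in models
        if m.last_built is None or now - m.last_built > m.max_staleness
    ]


now = datetime(2022, 8, 19, 9, 0)
models = [
    Model("dim_customers", timedelta(hours=1), now - timedelta(minutes=90)),
    Model("dim_products", timedelta(hours=1), now - timedelta(minutes=20)),
    Model("rollup_daily_revenue", timedelta(days=1), now - timedelta(hours=5)),
    Model("rollup_weekly_report", timedelta(weeks=1)),  # never built yet
]
print(due_for_rebuild(models, now))
```

Tight limits on dimension tables and loose ones on expensive rollups mean the expensive work runs rarely. That is where the database load (and, on metered warehouses, the cost) savings in the quote come from.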


Intro to Kafka and Cloud as a fairytale (for those who like analogies and beautiful things):

A gentle introduction to Apache Kafka | 10 min read and watch | Mitch Seymour |

A walk to the cloud. A gentle introduction to fully managed environments | 10 min read and watch | Mitch & Elyse Seymour



Data Infrastructure for Finance | 54 min | Software Engineering Daily | Hossein Rahnama, Petar Kramaric & Justin Lam

An insight into the Flybits platform: data physics, architecture and design patterns.




How to become a good Developer in a Scrum Team? | 30 min | 💪 Rafał Zalewski | GetInData

What mindset is expected from you as a Developer in a Scrum Team. 
How a Scrum mindset helps to achieve better development results.

Google’s New AI Learned To See In The Dark! | 9 min | AI | Two Minute Papers |

Just see what AI can do for photography and cinematography.


Blockchain Community Day 2022 - Online Free Tech Conference | 20 September | Online 

• Industry insights
• Intro to Web3 & NFTs

2022-08-19 08:39