A guide to preparing reliable data for GenAI by migrating to the cloud, adding semantic meaning, and implementing data quality and governance strategies for scalable use cases.
An exploration of how event-driven architectures with Kafka and Flink enable real-time GenAI use cases, combining large language models (LLMs) with vector databases for semantic search.
Doctolib's shift from a centralized monolithic platform to a data mesh architecture that supports scalable AI, analytics, and robust data governance.
An overview of trends in AI, highlighting agentic workflows, inference optimizations, and the societal impacts of AI-driven automation.
An overview of Delta Lake's RESTORE command, explaining how it reverts a table to an earlier version, records the restore as a new commit, and applies to production rollback scenarios.
An introduction to Amazon S3 Tables using Apache Iceberg, detailing table structure, access control, and benefits for scalable, secure analytics.
SemHash is a tool for deduplicating datasets using semantic similarity, combining fast embedding generation and efficient similarity search for text and multi-column data.
Learn how retrieval-augmented generation (RAG) integrates external data with LLMs to improve answer accuracy, reducing outdated responses and hallucinations in real-world AI applications.
Learn how to build ChatGPT-like models with this comprehensive Stanford lecture.
Master Git in this beginner-friendly series, starting with why Git is essential for developers.
Explore how AI models like OpenAI’s o3 and Google’s Gemini use Chain of Thought to push reasoning capabilities forward.
Compare top data quality tools to find the right solution for governance, observability, and no-code data preparation.
Create SQL bots using LLMs and Microsoft Fabric to turn natural language into actionable queries.