The study by Vela et al. showed that an ML model's performance doesn't remain static, even when the model achieves high accuracy at deployment time, and that different ML models age at different rates even when trained on the same datasets. Another relevant remark is that not all temporal drifts cause performance degradation. Therefore, the choice of model and its stability becomes one of the most critical factors in dealing with temporal performance degradation.
Niels presents several reasons to consider using dbt and DuckDB instead of Spark. He also highlights some limitations and challenges of using dbt and DuckDB.
The article provides a comprehensive overview of dbt and DuckDB and how they can be used in data pipelines. It encourages readers to explore these tools as alternatives to Spark.
Lauren strikes back, this time with a conspiracy theory about Modern Data Stack vendors and what Fivetran's long-awaited S3 connector has to do with it. As usual, the narration style may be provocative, but it is still good food for thought. If you start looking at your cloud spend, human capital, and products as a portfolio of investments that generate returns, you will develop habits that lead you away from these Modern Data Stack games.
In this article, you will read about the road to running Apache Flink applications on AWS KDA. Why did the Deliveroo team choose AWS KDA, and what lessons have they learned? Dive into the text to find out, along with their plans for the future.
Check out how you can optimize your Databricks deployment significantly using Delta Live Tables (DLT). It is a new feature that enables real-time change data capture (CDC) with transactional consistency, enabling analytics on data in motion. Dive into a detailed description of the process and tools used.
Robert shares a way that may save a few thousand dollars a year.
It's actually quite easy to find the datasets with the biggest potential for cost savings: you can query the information schema. If your data compresses well and is partitioned and append-only, then there is a good chance that you will save costs by switching the billing model to physical storage. For example, in our project's dataset used as a bronze layer, we could see as much as 80% cost savings potential!
The sixth edition of the AI Index Report is here, featuring more original data than any previous version. A few takeaways for you:
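The comparison described above can be sketched in a few lines of Python. This is only an illustration: the byte counts would come from your own information schema query, the `DatasetStorage` type and both function names are hypothetical, and the per-GiB monthly prices are assumed placeholders that you should replace with the current rates for your region.

```python
from dataclasses import dataclass

GIB = 1024 ** 3

# Assumed monthly prices in USD per GiB (placeholders, not authoritative).
ASSUMED_PRICES = {
    "active_logical": 0.02,
    "long_term_logical": 0.01,
    "active_physical": 0.04,
    "long_term_physical": 0.02,
}


@dataclass
class DatasetStorage:
    """Storage totals for one dataset (hypothetical container type)."""
    name: str
    active_logical_bytes: int
    long_term_logical_bytes: int
    # Assumed to already include any time-travel storage billed as physical.
    active_physical_bytes: int
    long_term_physical_bytes: int


def monthly_costs(ds, prices=ASSUMED_PRICES):
    """Return (logical_cost, physical_cost) in USD per month."""
    logical = (
        ds.active_logical_bytes * prices["active_logical"]
        + ds.long_term_logical_bytes * prices["long_term_logical"]
    ) / GIB
    physical = (
        ds.active_physical_bytes * prices["active_physical"]
        + ds.long_term_physical_bytes * prices["long_term_physical"]
    ) / GIB
    return logical, physical


def savings_fraction(ds, prices=ASSUMED_PRICES):
    """Fraction of the logical bill saved by switching to physical billing.

    A negative value means physical billing would cost more.
    """
    logical, physical = monthly_costs(ds, prices)
    if logical == 0:
        return 0.0
    return 1 - physical / logical


if __name__ == "__main__":
    bronze = DatasetStorage(
        name="bronze_example",  # made-up dataset for illustration
        active_logical_bytes=1000 * GIB,
        long_term_logical_bytes=0,
        active_physical_bytes=100 * GIB,
        long_term_physical_bytes=0,
    )
    print(f"{bronze.name}: {savings_fraction(bronze):.0%} potential savings")
```

Under these assumed prices, physical billing only pays off once the data compresses better than roughly 2:1, which is why well-compressed, append-only datasets are the natural candidates.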
- Industry races ahead of academia.
- The world’s best new scientist… AI?
- AI is both helping and harming the environment.
- The number of incidents concerning the misuse of AI is rapidly rising.
This one provides a step-by-step guide to setting up a BigQuery connection in a dbt Cloud project, enabling the BigQuery API, and creating a service account for the project. It concludes by providing a workflow to manage and execute dbt projects on multiple big projects in dbt Cloud.
Databricks announced MLflow 2.3. This open-source ML platform has been enhanced with several innovative features that expand its capabilities in managing and deploying LLMs. One of the main highlights of this update is the improvement in LLM support, which now includes three new model flavors - Hugging Face Transformers, OpenAI functions, and LangChain. Additionally, users can now enjoy faster model download and upload speeds for model files when using cloud services, thanks to the introduction of multi-part download and upload functionality.
Auto-GPT is a complex system relying on multiple components. It connects to the internet to retrieve specific information and data (something ChatGPT’s free version cannot do), features long-term and short-term memory management, and uses GPT-4, OpenAI’s most advanced model, for text generation and GPT-3.5 for file storage and summarization.
Ververica has announced the beta release of Ververica Cloud. It is a fully-managed service for deploying, operating, and monitoring Apache Flink applications, including stream processing and real-time analytics. Ververica Cloud offers several benefits, including:
- Simplified deployment and management of Apache Flink clusters
- Efficient resource utilization and automatic scaling
- Integration with popular data sources and sinks
- Powerful monitoring and alerting capabilities
Amazon has announced Bedrock, a generative AI service that can help developers create conversational agents, chatbots, and voice assistants. Rather than relying on GPT-3, Bedrock provides API access to foundation models from Amazon (Titan) and from AI21 Labs, Anthropic, and Stability AI, which can generate text and natural language responses.
Databricks announces the public preview of AI Functions. AI Functions is a built-in DB SQL function, allowing you to directly access Large Language Models (LLMs) from SQL.
Ludwig works as a Product Analytics Director at Mentimeter. Before joining Mentimeter, he worked with data & analytics for over a decade at various companies such as Kry, Spotify, and Google.
Discussed subjects:
- What is an audience engagement platform
- Analytics use-cases at Mentimeter e.g. real-time visualization, customer journey
- Autonomous teams at Mentimeter
- Analytics stack at Mentimeter e.g. AWS, Redshift, Looker
- KPIs and dashboards e.g. Pirate Metrics (AARRR), Viral loop, LTV (Customer lifetime value)
- Unique aspects of working with data at Mentimeter
Dr. Tim Scarfe interviews Minqi Jiang on the impact of deep reinforcement learning on technology, startups, and research. Minqi shares his experiences in balancing serendipity and planning, explains the role of objectives and Goodhart's Law in decision-making, and discusses the differences between RL and supervised learning.
They also explore the possibilities of open-endedness and the intelligence explosion, as well as limitations of RL and interpretability concerns with software 2.0.
Attend Snowflake Summit 2023 to learn how to access, build, and monetize data, tools, models, and applications in ways that were previously unimaginable. Enable seamless alignment and collaboration across these crucial functions in the Data Cloud to transform nearly every aspect of your organization.
At the Summit, you’ll hear all about the latest innovations coming to the Data Cloud, and learn from hundreds of technical, data, and business experts about what’s possible for you and your organization in a world of data collaboration.