How many times have you run into an error in your data pipeline, confused about where to even begin debugging it? You probably start by checking the core model, then the source, or maybe an intermediate model that sits between the source and the core model. In order to get to the root cause, you need to debug each individual step to home in on where things are going wrong. Wouldn’t the debugging process be easier if we knew the exact transformation step where a problem manifested? Luckily, if you are a dbt user, there is an easy solution.

## What is dbt-expectations?

dbt-expectations is a free, open-source testing library for dbt that allows you to assert “expectations” about your data. If your data doesn’t meet these expectations, the test raises an error.

This package is essentially the dbt version of Great Expectations, a popular Python testing library. If you aren’t familiar with Great Expectations, it is an open-source tool that focuses on validating, documenting, and profiling your data. It creates data documentation and data quality reports for your codebases, focusing on simplifying the lives of data engineers and software engineers. Because it is Python-based, you need to be actively using Python in your analytics environment.

dbt-expectations enables analytics engineers and data analysts who use SQL more than Python to take advantage of the same data quality testing as data engineers and software engineers. Rather than using Python, these tests are written in SQL and integrate directly into your dbt project. This way, you don’t have to refactor your entire codebase, which is written in a different language, just to take advantage of a powerful data quality tool. These tests allow you to assert powerful indicators of data quality without the huge lift of writing custom tests yourself.

With dbt-expectations you can test sources, models, and seeds. This way you can check data quality at every layer of your data model rather than just checking the final product, which ensures your data stays consistent across different models as well as across different runs of your data pipeline. With dbt-expectations, you can test many different aspects of your data without needing to write any custom SQL. You can also test for null and unique values with more complexity than the generic tests offered in every dbt project.

## Adding dbt-expectations to your dbt project

To add dbt-expectations to your dbt project, add the package entry to your `packages.yml` file. The package also reads a time-zone variable, set under `vars` in your `dbt_project.yml`:

```yaml
vars:
  'dbt_date:time_zone': 'America/Los_Angeles'
```

Be sure to change this time zone to the one you are working in, as it is used in a few of the date-related tests within the package.

There are a few different tests within dbt-expectations that look at the row counts of a table. One of my favorites is the expect_table_row_count_to_be_between test. This is good for checking whether the correct volume of rows has been ingested into a table. I recommend using it alongside a freshness test to ensure data is not only up-to-date, but that the amount of data being ingested also meets your expectations. When I didn’t implement tests like this one, I often ran into scenarios where my data tables were considered “fresh”, as in data had been ingested within the last day.
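As a sketch of how a row-count test might be wired up, here is a hypothetical `schema.yml` entry for the expect_table_row_count_to_be_between test; the model name and thresholds are illustrative assumptions, not values from the original article:

```yaml
version: 2

models:
  - name: stg_orders  # hypothetical model name
    tests:
      # Fail the run if the table's row count falls outside the expected range
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000    # illustrative lower bound
          max_value: 100000  # illustrative upper bound
```

Pairing this with a source freshness check covers both failure modes the article describes: data that is stale, and data that arrived on time but in the wrong volume.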