
Data Validation in Cassandra - Validate your freshly extracted data

After an extraction, we need to check that it is complete. To do so, we compare one key metric per connector against your source data; for example, for advertising connectors we check the spend amounts.

For each year present in our platform, we compute the sum of that metric and compare it with the corresponding yearly total from your source.


Important:

Before you begin, prepare your reference data by creating a report in your source's CRM. You'll need the sum for each year of the specific column named in the data validation section of the connector you're validating. For example, for an advertising connector you'll need the sum of spend amounts; for an organic connector, the sum of impressions.
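If your source lets you export the report as a CSV file, the yearly sums can be computed with a short script like the one below. This is a minimal sketch, not part of the platform: it assumes a CSV export with hypothetical columns named "date" (ISO format, e.g. 2023-04-01) and "spend". Rename the columns to match your export and the column named in the connector's data validation section.

```python
import csv
import io
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Hypothetical column names: adjust them to your export and to the column
# named in the connector's data validation section (e.g. spend, impressions).
DATE_COLUMN = "date"
METRIC_COLUMN = "spend"


def yearly_totals(csv_text: str) -> dict[int, Decimal]:
    """Sum METRIC_COLUMN per calendar year from an exported source report."""
    totals: dict[int, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumes ISO dates such as 2023-04-01.
        year = date.fromisoformat(row[DATE_COLUMN]).year
        # Decimal avoids float rounding drift on currency amounts.
        totals[year] += Decimal(row[METRIC_COLUMN])
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    export = """date,spend
2022-12-30,100.10
2022-12-31,50.00
2023-01-01,20.25
"""
    print(yearly_totals(export))
    # {2022: Decimal('150.10'), 2023: Decimal('20.25')}
```

Grouping by calendar year mirrors how the validation report presents the loaded data, so each resulting total can be compared line by line with the report.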


Once the data is loaded:



  • Go to Data Integration -> Dashboard
  • Click “Create Validation Report”.

  • Open the report, either through the link sent to your email or by pressing “Go to report List” and choosing the latest report.

  • On the report, check the loaded data, which is presented as a sum for each year, against the yearly totals you prepared from your source.

  • If all the sums match your data, press “Validate report” and you are ready to start modelling.


If the data in Cassandra doesn't match the sums you provided, there were extraction issues (for example, duplicated or missing extractions).


To fix this:

  • Provide the correct sum from your source.
  • Click Validate report to reload the mismatched years.
  • After the new extraction finishes, generate a new report and check if the data matches.
  • If it still doesn't match, request help from our data integration experts.

  • Finally, you will be redirected to the connectors page.
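To see in advance which years a validation would reload, you can compare the report's yearly sums with your source totals yourself. This is a minimal sketch under stated assumptions: the function name and the 0.01 tolerance for rounding differences are hypothetical, and the validation tool itself may require an exact match.

```python
from decimal import Decimal

# Assumed tolerance for rounding differences; the tool may require an exact match.
TOLERANCE = Decimal("0.01")


def mismatched_years(
    loaded: dict[int, Decimal],
    source: dict[int, Decimal],
    tolerance: Decimal = TOLERANCE,
) -> list[int]:
    """Return the years whose loaded sum differs from the source sum.

    A year missing on one side is treated as a total of zero, so a missing
    extraction shows up as a mismatch unless the source total is also zero.
    """
    years = sorted(set(loaded) | set(source))
    return [
        year
        for year in years
        if abs(loaded.get(year, Decimal(0)) - source.get(year, Decimal(0))) > tolerance
    ]


if __name__ == "__main__":
    loaded = {2022: Decimal("300.20"), 2023: Decimal("20.25")}  # 2022 looks duplicated
    source = {2022: Decimal("150.10"), 2023: Decimal("20.25"), 2024: Decimal("5.00")}
    print(mismatched_years(loaded, source))  # [2022, 2024]
```

In this example, 2022 is loaded at exactly twice the source total (a typical duplication), and 2024 is missing from the loaded data, so both years would need the correct source sum and a reload.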

To summarize what the tool reports, there are three main cases:

  1. The data matches the sums you provided -> The extraction went well and no further action is needed on that connector.
  2. The data shown in Cassandra doesn't match the sums you provided -> There were extraction issues (for example, duplicated or missing extractions). Provide the correct sum from your source and press Validate report; this starts a reload for the years that don't match. -> After the new extraction has finished, generate a new report and check whether the data now matches. -> If it still doesn't, request further help from our data integration experts.
  3. You see "No data available" -> There is a technical problem with your connector. Deleting and recreating the connector may fix it; if that doesn't work, contact our Data Integration team for more detailed troubleshooting.
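The three cases can be expressed as a small decision function. This is an illustrative sketch only: the Outcome labels and the classify_report name are hypothetical, an empty loaded report stands in for "No data available", and the 0.01 tolerance is an assumption rather than the tool's actual rule.

```python
from decimal import Decimal
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """Hypothetical labels for the three cases of the validation report."""
    MATCH = 1     # extraction went well; nothing more to do
    MISMATCH = 2  # provide correct sums, validate to reload, then re-check
    NO_DATA = 3   # "No data available": technical problem with the connector


def classify_report(
    loaded: Optional[dict[int, Decimal]],
    source: dict[int, Decimal],
    tolerance: Decimal = Decimal("0.01"),  # assumed rounding tolerance
) -> Outcome:
    # Assumption: an empty or missing report represents "No data available".
    if not loaded:
        return Outcome.NO_DATA
    for year in set(loaded) | set(source):
        diff = loaded.get(year, Decimal(0)) - source.get(year, Decimal(0))
        if abs(diff) > tolerance:
            return Outcome.MISMATCH
    return Outcome.MATCH


if __name__ == "__main__":
    source = {2023: Decimal("20.25")}
    print(classify_report({2023: Decimal("20.25")}, source))  # Outcome.MATCH
    print(classify_report({2023: Decimal("40.50")}, source))  # Outcome.MISMATCH
    print(classify_report({}, source))                        # Outcome.NO_DATA
```

The "No data available" check comes first because a missing report cannot be compared at all and points to the connector rather than to the extracted data.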