Data preparation, also known as pre-processing, is the process of taking raw data and transforming it into a form that is clean and accurate enough to analyze or to use as input to a machine learning model. People typically don't like to talk about this aspect of being a data professional, but data preparation is time-consuming and is one of the core functions of successfully executing data analytics and machine learning. Alteryx can take much of that complexity away by allowing you to prepare data in just a few clicks: acquiring data, cleaning data, joining data, and making transformations. Now let's talk through how to perform each of those steps.
Data gathering
The initial step before we can even start getting our data ready is to acquire it. This could be from a cloud data warehouse, a data lake, or another source. In our case we will use one of the sample data sets available in Alteryx. To start, drag the "Input Data" tool from the tool palette onto the canvas, which displays a configuration pane on the left side. Select Set Up a Connection > Files > Alteryx Database (.yxdb) > TutorialData.yxdb

After bringing the data into Alteryx, analysts and data scientists would normally begin investigating and profiling the data in order to understand what they have. One way to do this is to drag the "Browse" tool onto the canvas, connect the output anchor of the Input Data tool to the Browse tool, and then run the workflow. Now we can click on the preview window and select different columns to look at the quality of the data in each column.

Notice that one value in the "Last" column of our data has trailing whitespace.
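
For readers who want to see the same check outside the Alteryx interface, here is a minimal Python sketch of what the profiling step reports. It is not Alteryx code and does not read the .yxdb file: the sample rows, the "First" column, and the helper name count_trailing_whitespace are all hypothetical, chosen only to mirror the "Last" column described above.

```python
# Hypothetical sample rows standing in for the tutorial data;
# the real contents of TutorialData.yxdb are not reproduced here.
rows = [
    {"First": "Ann", "Last": "Lee"},
    {"First": "Bob", "Last": "Smith "},  # trailing space, like the value noticed above
    {"First": "Cara", "Last": "Jones"},
]


def count_trailing_whitespace(rows, column):
    """Count the string values in `column` that end with whitespace."""
    return sum(
        1
        for row in rows
        if isinstance(row.get(column), str)
        and row[column] != row[column].rstrip()
    )


trailing_count = count_trailing_whitespace(rows, "Last")
print(trailing_count)
```

A count above zero for a column is the kind of signal that the Browse tool surfaces when you select that column.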

We'll be able to mitigate this with some data cleaning.
Data cleansing
Data cleansing is the process of cleaning poorly structured data to improve its quality. This can include things such as:

- correcting entry errors
- accounting for missing data
- masking sensitive or confidential data
- addressing duplicates or outliers

For data cleansing in Alteryx, drag the "Data Cleansing" tool from the tool palette and attach it to the output anchor of your input dataset. In the configuration area, deselect all options except the "Last" field and the "Leading and Trailing Whitespace" option under the "Remove Unwanted Characters" category. Then run the workflow to apply the cleansing.

To check if the cleaning has been performed correctly, click the "Browse" tool and select the "Last" column from the preview window.

You will notice that the "Values with Trailing Whitespace" parameter shows "0", which indicates that the process was successful.
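
As a rough illustration of what this cleansing step does, the following Python sketch trims leading and trailing whitespace from a single column and then re-runs the trailing-whitespace count. It is an analogy under stated assumptions, not how Alteryx implements the tool: the sample rows, the "First" column, and the helpers strip_column and count_trailing_whitespace are hypothetical.

```python
# Hypothetical sample rows; only the "Last" column name comes from the tutorial.
rows = [
    {"First": "Ann", "Last": "Lee"},
    {"First": "Bob", "Last": "Smith "},   # trailing whitespace
    {"First": "Cara", "Last": " Jones"},  # leading whitespace
]


def count_trailing_whitespace(rows, column):
    """Count the string values in `column` that end with whitespace."""
    return sum(
        1
        for row in rows
        if isinstance(row.get(column), str)
        and row[column] != row[column].rstrip()
    )


def strip_column(rows, column):
    """Return new rows with leading and trailing whitespace removed from `column`."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        value = new_row.get(column)
        if isinstance(value, str):
            new_row[column] = value.strip()
        cleaned.append(new_row)
    return cleaned


cleaned_rows = strip_column(rows, "Last")
remaining = count_trailing_whitespace(cleaned_rows, "Last")
print(remaining)
```

Only the selected column is touched, which mirrors leaving just "Last" checked in the tool's configuration; the other columns pass through unchanged.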