Visualizing a Dataset
This guide teaches you how to load a CSV file from a URL and how to visualize it using Plotly.
This guide will use three Prefabs and one custom Workflow Object to load a CSV file from GitHub over HTTP and visualize it as a bar chart using Plotly.
Begin by importing the miranda.fetch Prefab: drag it from the Prefab Library on the left-hand side of the workspace into the workflow graph.
In this example, we will be using a sample dataset of population statistics with the following columns:
id - A unique identifier for each row
gender - Possible values are Male, Female, Bigender, Agender, and Genderfluid
age - A positive integer
country - A two-letter country code
The dataset is available as a CSV file on GitHub at the following URL: https://raw.githubusercontent.com/mainly-ai/the-lab/main/datasets/population_stats.csv
Enter this dataset URL into the URL field of the miranda.fetch Prefab. Then import the miranda_test.printer Prefab and connect the Body transmitter on the miranda.fetch Prefab to the input_2 receiver (which takes a String) on the miranda_test.printer Prefab.
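For readers who want to see what this step amounts to outside the workflow graph, here is a minimal sketch in plain Python. It assumes the fetch step is essentially an HTTP GET that returns the response body as text; fetch_text is a hypothetical helper name, and this is not the implementation of the miranda.fetch Prefab.

```python
import urllib.request

# URL taken from this guide; fetch_text is a hypothetical helper name,
# not part of any Prefab.
DATASET_URL = (
    "https://raw.githubusercontent.com/mainly-ai/the-lab/"
    "main/datasets/population_stats.csv"
)


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a resource and return its body decoded as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling print(fetch_text(DATASET_URL)) requires network access and should print the same CSV text that the printer Prefab writes to the logs.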
Now let’s run the project and look at the output in the logs using the Processor panel on the right of the workspace.
You should now see the text contents of the CSV file in the logs. To visualize this data, we will first need to parse it into a format that Plotly can understand, such as a Pandas DataFrame. To do this, we will use the pandas.from_csv Prefab. Import it from the Prefab Library and connect the Body transmitter on the miranda.fetch Prefab to the CSV receiver on the pandas.from_csv Prefab. Then you can connect the Dataframe transmitter on the pandas.from_csv Prefab to the input_1 receiver (which takes a Dataframe) on the miranda_test.printer Prefab.
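The parsing step can be sketched with pandas directly. This assumes the Prefab does something comparable to pandas.read_csv on the fetched text; parse_csv is a hypothetical helper, and the sample rows below are made up for illustration rather than taken from the real dataset.

```python
import io

import pandas as pd


def parse_csv(text: str) -> pd.DataFrame:
    """Parse CSV text (not a file path) into a DataFrame."""
    # read_csv expects a path or file-like object, so wrap the string.
    return pd.read_csv(io.StringIO(text))


# Made-up rows that follow the column layout described in this guide.
sample_text = "id,gender,age,country\n1,Female,34,SE\n2,Male,51,US\n"
df = parse_csv(sample_text)
print(df)
```

One thing to watch with two-letter country codes: by default pandas reads the string NA as a missing value, so a code such as NA (Namibia) would become NaN unless keep_default_na=False is passed to read_csv.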
However, if we try to plot this data directly using Plotly, we will get an error or incoherent results. This is because the data is high-dimensional: every row is an individual record with several columns, so there is no single value to plot per category. Let’s write a custom Workflow Object (Node) to aggregate the data and visualize it as a bar chart. In this example, we will group the data by country and average the age column.
Create a new Node by right-clicking on the workspace and selecting Create Node. Then right-click the Node and select Edit Code to begin implementing our own logic. By default, the new Node contains some boilerplate code to get you started.
The boilerplate contains the four main parts of a Workflow Object, which are evaluated in the following order:
init - This is the constructor for the Workflow Object. It is called when the object is created and can be used to initialize any variables.
receiver - Receives data from other Workflow Objects or from Controls.
execute - This is the main function of the Workflow Object. It is called when all the receivers have been called.
transmitter - Sends data to other Workflow Objects.
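The evaluation order listed above can be illustrated with a plain Python class. This is only a conceptual sketch: the class name, method names, and the way the methods are called by hand are all assumptions made for illustration, and none of it is the actual Workflow Object API or the boilerplate generated for a new Node.

```python
class SketchNode:
    """Hypothetical stand-in for a Workflow Object (not the real API)."""

    def __init__(self):
        # init: runs once when the object is created.
        self.calls = ["init"]
        self.value = None
        self.result = None

    def receive(self, value):
        # receiver: stores data arriving from an upstream object.
        self.calls.append("receiver")
        self.value = value

    def execute(self):
        # execute: runs after all receivers have been called.
        self.calls.append("execute")
        self.result = self.value.upper()

    def transmit(self):
        # transmitter: hands the result to downstream objects.
        self.calls.append("transmitter")
        return self.result


# Drive the lifecycle by hand, in the order described above.
node = SketchNode()
node.receive("hello")
node.execute()
output = node.transmit()
print(node.calls, output)
```

In the real workspace the runtime, not your own code, decides when each part is invoked; the manual calls here only make the ordering visible.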
Let’s initialize our Workflow Object. We will need two variables: self.df to store the DataFrame received from the pandas.from_csv Prefab, and self.transformed to store the transformed DataFrame.
The default code is configured to receive and transmit strings. We will need to modify this to use DataFrames. Let’s also change the names to better reflect the purpose of the Node and make sure we’re setting and returning the right variables.
Now let’s write the logic to transform the DataFrame. We will use the groupby method to group the data by country and then use the mean method to average the age column.
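As a minimal sketch of that transformation in plain pandas, assuming the column names country and age from the dataset description: average_age_by_country is a hypothetical function name, and the sample rows are made up for illustration. Inside the Node, the same expression would read from self.df and assign to self.transformed.

```python
import pandas as pd


def average_age_by_country(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per country with the mean of the age column."""
    # as_index=False keeps country as a regular column, which is
    # convenient when a later step needs it as the x-axis.
    return df.groupby("country", as_index=False)["age"].mean()


# Made-up rows that follow the column layout described in this guide.
sample = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "gender": ["Female", "Male", "Agender"],
        "age": [30, 40, 50],
        "country": ["SE", "SE", "US"],
    }
)
transformed = average_age_by_country(sample)
print(transformed)
```

Selecting the age column before calling mean avoids averaging the id column and sidesteps errors from the non-numeric gender column.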
Now let’s plot this data using Plotly. Import the plotly.bar Prefab and connect the Population by Country transmitter on the custom Workflow Object to the Dataset receiver on the plotly.bar Prefab. Then run the project, and you should see a bar chart of the average age by country appear on the plotly.bar Node.