Remember it this way, DISTINCT = UNIQUE
Hey Guys, hope you are good.
How’s the learning process going? Do let me know in the comments.
Learning never stops. You’re gonna get done with SQL, in a few months there’s gonna be better ways to work on SQL, there’s gonna be better ways to integrate it, to optimise it.
So, without further delay lets get going on today’s topic.
Use Of DISTINCT In SQL
DISTINCT is a great clause to add to your queries while working on data. DISTINCT as obvious by its name selects the unique entries and avoids repetition.
For example, look at the table below:
If you would do a count(ID) it would show you 3 as the answer but if you do count(distinct ID) it would show you 2 as the answer because it wont consider the 3rd entry which is again 1
So, lets get started on the query
The query without distinct returns 3916 rows of data and repeats the values continuously.
Now lets run the same query with DISTINCT
You can see that only 16 rows were returned using DISTINCT and there were no repetitions.
The processing and memory required by using DISTINCT is definitely reduced as there is a lot of difference between 3916 and 16 rows.
Its pretty simple but still if there’s any confusion you can comment below.
Exercise
Here’s what you need to do today:
Use the UK Train Rides dataset again
Count the number of entries for departure stations
Now count the unique number of departure stations
Are they the same? If yes, then apply this same concept on some other column
If no, comment below how much was the difference and what business KPI can you derive from this query
If you any doubts or queries do let me know in the comments.
I am a BI Engineer working at a Data And AI Firm. I’m trying to help newbies in Data land their first data job and impact as many fellow data professionals as I can. Here’s my contact info:
ajmal7809@gmail.com
bideveloper_ (Discord)
SELECT COUNT([Departure Station]) as Total_Number_Of_Departure_Stations
FROM railway
SELECT COUNT(DISTINCT[Departure Station]) as Total_Different_Number_Of_Departure_Stations
FROM railway
SELECT COUNT([Departure Station]) - COUNT(DISTINCT[Departure Station]) as Total_Difference_Of_Departure_Stations
FROM railway
Today's Query Solution