Filter

Filtering on aggregated data

Author: Scott A. Adams
Length: 7 MINS
Type: Guide
Published: 2022-03-21
Intro
Performing data analysis in SQL with proper tools and code.
Tools used in the tutorial
Tool
Description
DbVisualizer
Top rated database management tool and sql client

Key Takeaways

  • In SQL we can filter on values in the original, non-aggregated data as well as values generated from an aggregate function applied over values of another column.
  • To filter on values from aggregate functions grouped by values of another column, however, we need to use slightly different code.

Introduction

If you have ever written any piece of code to analyze data you probably encountered a situation where the output on the screen did not match what you wanted. If you are like me, you also may have said (or yelled) to the computer, ”Why can’t you just do what I want you to do?!” followed by a moment of realization that the code you wrote was not completely correct for completing the intended task at hand. For all the great things they can do, computers do not infer meaning and intent from people well. As such, computers require very deliberate instructions, written in code, to produce the desired result.

SQL is no different, and one area that is prone to a disconnect between intention and output involves filtering on aggregated data. In a previous post entitled ”Translating Basic Excel to SQL” I covered some fundamental operations in SQL, including WHERE statements, which are used to filter rows based on some condition (the materials used in that prior article are also used in this article and can be downloaded here). For example, the media below shows the execution of a WHERE statement in DbVisualizer.

execution of a WHERE statement

↑  Execution of a WHERE statement

In ”Translating Basic Excel to SQL” I also discussed the use of aggregate functions and GROUP BY statements. Using publicly available Airbnb data in New York City (original data source can be found here) say, for example, we wanted to see the mean price per listing by neighborhood group. We would use the code below to accomplish this task.

1
SELECT neighbourhood_group, AVG(price)
2
FROM AB_NYC_2019
3
GROUP BY neighbourhood_group;

Here is the execution of the code above in DbVisualizer.

execution of the code

Filtering Values From Aggregate Functions

Now, say that we want to identify neighbourhood groups and associated mean listing prices for those neighbourhood groups where the mean listing price is less than $100.00. A first, and reasonable, guess on obtaining the desired output would be to use WHERE to filter.

1
SELECT neighbourhood_group, AVG(price) FROM AB_NYC_2019
2
GROUP BY neighbourhood_group
3
WHERE AVG(price) < 100.00;

Let’s see what happens when we execute this code.

We get an error message and the code in the editor is underlined with wavy red lines, which are present to help draw attention to erroneous code. So why is this code erroneous? Well, WHERE applies to non-aggregated records, but we want to filter mean price values, which are aggregated across neighbourhood groups. Now you are probably wondering if there is a way to filter aggregated values and the answer is yes. Even better, the answer is simple! Just replace WHERE with HAVING.

1
SELECT neighbourhood_group, AVG(price) FROM AB_NYC_2019
2
GROUP BY neighbourhood_group
3
HAVING AVG(price) < 100.00;

Now the code is properly executed.

Conclusion

Being able to filter data is a fundamental skill for data analyses. When filtering in SQL, it is important to be mindful of where values in the filtering logic originated. That is, do the values on which we want to apply a filter represent the most granular level of detail in the particular data table of interest, or do the values represent a higher level of aggregation produced by an aggregate function and GROUP BY statement? Knowing this distinction and knowing the proper tools to use in both cases, that is, WHERE vs HAVING, can save a lot of headaches when performing data analyses in SQL.

DbVisualizer SQL Client.
About the author
Scott A. Adams.
Scott A. Adams
Scott is a data scientist, social scientist, and educator who cannot think of anything witty to put here.
SIGN UP TO RECEIVE THE TABLE'S ROUNDUP
More from the table
TITLE
AUTHOR
Gayatri Sachdeva
TAGS
DBA'S
DronaHQ
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
TITLE
AUTHOR
Bonnie
TAGS
Generation
TITLE
AUTHOR
Bonnie
TAGS
Joins
TITLE
AUTHOR
Igor Bobriakov
TAGS
MySQL
Security
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
Operators
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
NULL
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
Reserved words
TITLE
AUTHOR
Igor Bobriakov
TAGS
Oracle
TITLE
AUTHOR
Antonello Zanini
TAGS
DELETE
TITLE
AUTHOR
Antonello Zanini
TAGS
PostgreSQL
TITLE
AUTHOR
Antonello Zanini
TAGS
PostgreSQL
MySQL
TITLE
AUTHOR
Antonello Zanini
TAGS
PostgreSQL
JSON
TITLE
AUTHOR
Antonello Zanini
TAGS
PostgreSQL
Null
TITLE
AUTHOR
Antonello Zanini
TAGS
Driver
JDBC
ODBC
TITLE
AUTHOR
Antonello Zanini
TAGS
PostgreSQL
Connection
TITLE
AUTHOR
Lukas Vileikis
TAGS
Deduplication
TITLE
AUTHOR
Ochuko Onojakpor
TAGS
SQL
Functions
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
Math
TITLE
AUTHOR
Ochuko Onojakpor
TAGS
Docker
MySQL
TITLE
AUTHOR
Ochuko Onojakpor
TAGS
Views
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
Index
TITLE
AUTHOR
Bonnie
TAGS
BigQuery
TITLE
AUTHOR
Leslie S. Gyamfi
TAGS
Join
TITLE
AUTHOR
Antonello Zanini
TAGS
PostgreSQL
TITLE
AUTHOR
Leslie S. Gyamfi
TAGS
PostgreSQL
TITLE
AUTHOR
Ochuko Onojakpor
TAGS
PrestoDb
TITLE
AUTHOR
Ochuko Onojakpor
TAGS
Ansible
Automation
TITLE
AUTHOR
Antonello Zanini
TAGS
PostgreSQL
TITLE
AUTHOR
Leslie S. Gyamfi
TAGS
PostgreSQL
NoSQL
JSON
TITLE
AUTHOR
Igor Bobriakov
TAGS
Oracle
Data types
TITLE
AUTHOR
TheTable
TAGS
ElasticSearch
TITLE
AUTHOR
Ochuko Onojakpor
TAGS
Security
TITLE
AUTHOR
Lukas Vileikis
TAGS
Language
Design
TITLE
AUTHOR
Lukas Vileikis
TAGS
CRUD
DELETE
TITLE
AUTHOR
Lukas Vileikis
TAGS
CRUD
UPDATE
TITLE
AUTHOR
Lukas Vileikis
TAGS
CRUD
SELECT
TITLE
AUTHOR
Lukas Vileikis
TAGS
CRUD
INSERT
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
TITLE
AUTHOR
Leslie S. Gyamfi
TAGS
PostgreSQL
TITLE
AUTHOR
TheTable
TAGS
Bug
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
Daemon
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
Partitions
TITLE
AUTHOR
Leslie S. Gyamfi
TAGS
Migration
MySQL
PostgreSQL
TITLE
AUTHOR
Antonello Zanini
TAGS
PostgreSQL
hstore
TITLE
AUTHOR
TheTable
TAGS
SQL
TITLE
AUTHOR
Igor Bobriakov
TAGS
SQL server
Security
TITLE
AUTHOR
Ochuko Onojakpor
TAGS
Visualize
TITLE
AUTHOR
TheTable
TAGS
MySQL
TITLE
AUTHOR
Lukas Vileikis
TAGS
SQL
Security
TITLE
AUTHOR
Antonello Zanini
TAGS
PostgreSQL
TITLE
AUTHOR
TheTable
TAGS
PostgreSQL
Docker
TITLE
AUTHOR
Antonello Zanini
TAGS
PostgreSQL
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
Connection
TITLE
AUTHOR
Lukas Vileikis
TAGS
Performance
TITLE
AUTHOR
Lukas Vileikis
TAGS
Security
TITLE
AUTHOR
Antonello Zanini
TAGS
Columns
TITLE
AUTHOR
Ochuko Onojakpor
TAGS
Performance
TITLE
AUTHOR
Ochuko Onojakpor
TAGS
SQL
TITLE
AUTHOR
Lukas Vileikis
TAGS
Performance
Indexes
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
Subquery
TITLE
AUTHOR
Lukas Vileikis
TAGS
Performance
TITLE
AUTHOR
Lukas Vileikis
TAGS
ACID
TITLE
AUTHOR
Lukas Vileikis
TAGS
ALTER TABLE
TITLE
AUTHOR
TheTable
TAGS
MySQL
Ports
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
Security
TITLE
AUTHOR
Lukas Vileikis
TAGS
ACID
MySQL
Security
TITLE
AUTHOR
Antonello Zanini
TAGS
BLOB
TITLE
AUTHOR
TheTable
TAGS
Foreign Key
PostgreSQL
TITLE
AUTHOR
Leslie S. Gyamfi
TAGS
PostgreSQL
Concurrency
TITLE
AUTHOR
Lukas Vileikis
TAGS
Security
Encryption
TITLE
AUTHOR
Lukas Vileikis
TAGS
Security
TITLE
AUTHOR
Bonnie
TAGS
Security
PostgreSQL
TITLE
AUTHOR
Antonello Zanini
TAGS
Subquery
TITLE
AUTHOR
Antonello Zanini
TAGS
Transactions
TITLE
AUTHOR
Ochuko Onojakpor
TAGS
Data structures
TITLE
AUTHOR
Antonello Zanini
TAGS
MySQL
TITLE
AUTHOR
Lukas Vileikis
TAGS
SSH
TITLE
AUTHOR
Antonello Zanini
TAGS
Stored procedure
MySQL
TITLE
AUTHOR
Antonello Zanini
TAGS
Triggers
TITLE
AUTHOR
Igor Bobriakov
TAGS
Microsoft SQL Server
Optimization
TITLE
AUTHOR
Bonnie
TAGS
PostreSQL
TITLE
AUTHOR
Ochuko Onojakpor
TAGS
Reusable queries
TITLE
AUTHOR
Antonello Zanini
TAGS
BIG Data
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
Security
TITLE
AUTHOR
TheTable
TAGS
Beginner
SQL
TITLE
AUTHOR
Ochuko Onojakpor
TAGS
CRUD
SQL Transactions
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
Security
TITLE
AUTHOR
Antonello Zanini
TAGS
PostgreSQL
JSON
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
InnoDB
ibdata1
TITLE
AUTHOR
Lukas Vileikis
TAGS
MySQL
TITLE
AUTHOR
Scott A. Adams
TAGS
Filter
TITLE
AUTHOR
Scott A. Adams
TAGS
SQLite
TITLE
AUTHOR
Scott A. Adams
TAGS
Excel
Export
TITLE
AUTHOR
Scott A. Adams
TAGS
ERD
Join

The content provided on dbvis.com/thetable, including but not limited to code and examples, is intended for educational and informational purposes only. We do not make any warranties or representations of any kind. Read more here.