
Chapter_02_Practical SQL: Beginning Data Exploration with SELECT

Apr 13, 2026
7:48

Chapter 2: Beginning Data Exploration with SELECT (A Detailed Overview)

Introduction: The Concept of "Interviewing Data"

Chapter 2 opens with an analogy that sets the tone for the entire chapter. The author compares exploring data to interviewing a job applicant. You would ask a candidate questions to verify that their skills match their resume. In the same way, you use SQL queries to verify that the data matches your expectations. The framing encourages the reader to approach data with curiosity and healthy skepticism rather than blind trust.

The author emphasizes that the most exciting part of working with data isn't the preparation (gathering, loading, or cleaning). It is the moment of discovery. Interviewing data can reveal surprising truths, such as finding that half the respondents skipped a field in a questionnaire, or that a public official hasn't paid taxes in years. It can also expose problems such as inconsistent spelling, incorrect dates, or numbers that don't match expectations. All of these findings, good or bad, become part of the data's story. This mindset of curiosity and investigation runs through the whole chapter and is the philosophical foundation for learning SQL's SELECT statement.

The SELECT Statement: The Foundation of Data Querying

The chapter introduces the SELECT keyword as the primary tool for retrieving data from a database. It is the starting point for almost every interaction with a database, whether the query is simple or extremely complex. A SELECT statement can do anything from fetching every row of a single table to linking dozens of tables while performing calculations and applying precise filters.
Basic SELECT Syntax

The most fundamental form of a SQL query introduced in the chapter is:

```sql
SELECT * FROM my_table;
```

The chapter explains each component of this statement:

- SELECT: the keyword that starts the query and tells the database you want to retrieve data.
- * (the asterisk, or wildcard): a stand-in meaning "all columns." It returns every column in the table without your having to name each one.
- FROM: the keyword that specifies which table the data comes from.
- ; (the semicolon): marks the end of the statement for PostgreSQL.

The chapter applies this syntax to the teachers table created in Chapter 1. That table holds six rows of teacher data, with columns for ID, first name, last name, school, hire date, and salary. Running SELECT * FROM teachers; returns every row and column, giving the reader an immediate, complete view of the table's contents.

The output also shows that the id column, which has type bigserial, is filled with sequential integers even though no id values were inserted explicitly. PostgreSQL supplies the next integer automatically for each new row. Every row therefore gets a distinct value that can serve as a unique identifier, or key, and that later provides a way to connect the table to other tables in the database.

Querying a Subset of Columns

The wildcard is useful for getting a broad overview of a table. However, the chapter points out that it's often more practical, especially with large databases, to retrieve only the columns you need. To do that, name the desired columns after the SELECT keyword, separated by commas:

```sql
SELECT some_column, another_column, amazing_column FROM table_name;
```

Using the teachers table, the author selects only last name, first name, and salary, skipping the school and hire date columns entirely. Columns can be listed in any order; they need not follow the order in which they appear in the table.
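To make the bigserial behavior and the column-subset query concrete, here is a minimal sketch for a PostgreSQL session. The column types and the six sample rows are illustrative assumptions, because the summary names the columns but not their types or values. The rows were chosen to match the counts this summary reports: six rows, two schools, and two Myers Middle School teachers with the same salary.

```sql
-- Hypothetical setup modeled on the Chapter 1 teachers table.
-- Column types and sample rows are illustrative assumptions.
CREATE TABLE teachers (
    id bigserial,            -- PostgreSQL fills this from a sequence
    first_name varchar(25),
    last_name varchar(50),
    school varchar(50),
    hire_date date,
    salary numeric
);

-- No id values are supplied; bigserial generates them (1 through 6 in a new table).
INSERT INTO teachers (first_name, last_name, school, hire_date, salary)
VALUES ('Janet', 'Smith', 'F.D. Roosevelt HS', '2011-10-30', 36200),
       ('Lee', 'Reynolds', 'F.D. Roosevelt HS', '1993-05-22', 65000),
       ('Samuel', 'Cole', 'Myers Middle School', '2005-08-01', 43500),
       ('Samantha', 'Bush', 'Myers Middle School', '2011-10-30', 36200),
       ('Betty', 'Diaz', 'Myers Middle School', '2005-08-30', 43500),
       ('Kathleen', 'Roush', 'F.D. Roosevelt HS', '2010-10-22', 38500);

-- Wildcard: every column of every row, including the generated id.
SELECT * FROM teachers;

-- Subset: three columns, listed in a different order than the table defines them.
SELECT last_name, first_name, salary
FROM teachers;
```

Note that bigserial only supplies default values from a sequence. It does not by itself stop a duplicate id from being inserted manually; that guarantee comes from declaring the column a PRIMARY KEY.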
Retrieving columns in any order gives the analyst flexibility in how the output is structured. This technique is presented as a smart early step in any analysis: a quick check of whether the data is present, complete, and formatted as expected before going deeper.

Using DISTINCT to Find Unique Values

The chapter then introduces the DISTINCT keyword, which eliminates duplicate values so that only unique entries are returned. This is particularly useful for understanding the range of values present in a column.

```sql
SELECT DISTINCT school FROM teachers;
```

Although the teachers table has six rows, this query returns only two: the two unique school names. The author highlights this as a powerful data quality check. If a school name is spelled inconsistently across rows, each variant appears as a separate value in the DISTINCT results, so the inconsistency stands out immediately.

The chapter also shows that DISTINCT works across multiple columns at once. Applied to both school and salary, it returns each unique combination of the two values. Because two teachers at Myers Middle School share the same salary, that pair appears only once, reducing the output from six rows to five. This technique is described as a way to ask questions like: For each.
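The two DISTINCT queries can be tried with a minimal PostgreSQL sketch like the following. To keep it self-contained, it builds a hypothetical temporary teachers table trimmed to the two columns these queries read. The salary values are illustrative assumptions, chosen to match the counts described above: six rows, two schools, and one repeated school/salary pair.

```sql
-- Hypothetical, trimmed version of the teachers table: only the columns used here.
CREATE TEMPORARY TABLE teachers (
    school varchar(50),
    salary numeric
);

-- Illustrative rows: two Myers Middle School teachers share a 43500 salary.
INSERT INTO teachers (school, salary)
VALUES ('F.D. Roosevelt HS', 36200),
       ('F.D. Roosevelt HS', 65000),
       ('Myers Middle School', 43500),
       ('Myers Middle School', 36200),
       ('Myers Middle School', 43500),
       ('F.D. Roosevelt HS', 38500);

-- One row per unique school name: 2 rows.
SELECT DISTINCT school
FROM teachers;

-- One row per unique (school, salary) combination: 5 rows,
-- because the duplicated Myers Middle School / 43500 pair collapses into one.
SELECT DISTINCT school, salary
FROM teachers;
```

Because 36200 appears at both schools, it still shows up twice in the second result. DISTINCT compares the whole combination of listed columns, not each column separately.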
