Practical SQL: Chapter 11 – Working with Dates and Times
Practical SQL: Chapter 12 – Advanced Query Techniques

This chapter moves beyond basic SQL queries to cover powerful techniques for more complex data analysis. The chapter uses U.S. Census data, meat inspection records, and a new temperature readings dataset to demonstrate each concept in practice.

Key Topics Covered

1. Subqueries

A subquery is a query nested inside another query, enclosed in parentheses. The chapter covers the following uses:

WHERE clause filtering – Using a subquery to dynamically generate a comparison value (e.g., finding counties at or above the 90th percentile population) instead of hardcoding it.
DELETE with subqueries – Removing rows based on a dynamically calculated threshold.
Derived tables (FROM clause) – Treating a subquery's results as a temporary table, allowing multi-step calculations (e.g., comparing average vs. median county population) in a single query. Derived tables can also be joined together.

The chapter also covers the subquery expressions IN (subquery), EXISTS (subquery), and NOT EXISTS (subquery), which filter rows based on whether matching values appear in another table.

2. Common Table Expressions (CTEs)

CTEs, written using the WITH ... AS syntax, let you define one or more named temporary tables before the main query. Advantages over subqueries include:

Reusability – A CTE can be referenced multiple times in the main query without repeating code.
Readability – Complex multi-step logic is easier to follow than deeply nested subqueries.
Modularity – Multiple CTEs can be chained in a single WITH block, each building on the last.

The chapter redoes earlier derived-table joins and repeated subquery expressions as CTEs to demonstrate the clarity gained.

3. Cross Tabulations (Crosstabs)

Cross tabulations display data in a matrix format: rows represent one variable, columns represent another, and each cell holds an aggregate value (like a count or median). PostgreSQL's crosstab() function (from the tablefunc module) enables this.
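A minimal sketch of a crosstab() call may help make the matrix idea concrete. It assumes the tablefunc extension can be installed on the server, and it uses a hypothetical ice_cream_survey_demo table with made-up rows; the table name, column names, and data are illustrative and are not taken from the chapter.

```sql
-- crosstab() ships in the tablefunc module, which must be enabled once per database.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Hypothetical demo table and sample rows (assumptions for illustration only).
CREATE TABLE ice_cream_survey_demo (
    response_id integer PRIMARY KEY,
    office text NOT NULL,
    flavor text NOT NULL
);

INSERT INTO ice_cream_survey_demo (response_id, office, flavor)
VALUES
    (1, 'Downtown', 'Chocolate'),
    (2, 'Downtown', 'Vanilla'),
    (3, 'Midtown', 'Chocolate'),
    (4, 'Midtown', 'Chocolate'),
    (5, 'Uptown', 'Strawberry');

-- First argument: a query returning (row name, category, value), ordered by row name.
-- Second argument: a query returning the distinct categories, in output-column order.
SELECT *
FROM crosstab('SELECT office, flavor, count(*)
               FROM ice_cream_survey_demo
               GROUP BY office, flavor
               ORDER BY office',

              'SELECT flavor
               FROM ice_cream_survey_demo
               GROUP BY flavor
               ORDER BY flavor')
-- The output columns must be declared by hand: one for the row name,
-- then one per category, matching the order of the second query.
AS (office text,
    chocolate bigint,
    strawberry bigint,
    vanilla bigint);
```

Each office becomes one row and each flavor one column. Office and flavor combinations with no responses come back as NULL rather than 0.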
Two examples are worked through:

Ice cream survey – Counting flavor preferences by office location, revealing which flavors are popular where.
Temperature readings – Calculating the median monthly high temperature for three U.S. cities (Chicago, Seattle, Waikiki), transforming 1,000+ rows of daily readings into a clean 3-row, 12-column summary table.

4. The CASE Statement

CASE adds conditional logic to queries, allowing numeric values to be reclassified into descriptive categories. The syntax evaluates conditions in order and returns the result for the first match. The chapter uses temperature ranges to assign labels like Hot, Warm, Pleasant, Cold, Freezing, and Inhumane. This classification is then combined with a CTE to count how many days per year each city falls into each category, revealing, for instance, that Waikiki has 361 "Warm" days versus Chicago's 8 "Inhumane" days.

Chapter 11 – Working with Dates and Times

By working with date and time data, analysts can uncover patterns, measure durations, and explore relationships between events and when they occurred. The chapter uses two real-world datasets, New York City yellow taxi rides and Amtrak train routes, to make the concepts concrete and practical.

Key Topics Covered

1. Date and Time Data Types

The chapter opens with a review of PostgreSQL's four core temporal data types:

date – Stores only a calendar date (recommended format: YYYY-MM-DD per ISO 8601).
time – Stores only a time of day, optionally with time zone awareness.
timestamp – Stores both date and time; the with time zone variant (shorthand: timestamptz) is time zone aware.
interval – Stores a duration of time (e.g., 12 days or 8 hours) rather than a specific moment.

All four types are calendar-aware, meaning the database understands things like leap years and the number of days in each month.

2. Manipulating Dates and Times

Extracting components with date_part(): Pull individual pieces (year, month, day, hour, minute, week, quarter, or epoch) out of a timestamp.
This is essential for grouping and aggregating data by time periods.
Building datetimes from parts with make_date(), make_time(), and make_timestamptz(): Useful when source data stores year, month, and day in separate columns that need to be combined.

3. Working with Time Zones

Time zones are essential for accurate calculations whenever data spans multiple locations. Key concepts covered include:

Checking the current time zone setting (by default, the server's) with SHOW timezone.
Listing available time zone names and abbreviations using the pg_timezone_names and pg_timezone_abbrevs views.

The chapter emphasizes that PostgreSQL always stores timestamp with time zone values internally as UTC.

4. NYC Taxi Data: Finding Patterns in Time

Using a dataset of 368,774 yellow cab rides from a single day (June 1, 2016), the chapter demonstrates two practical analyses:

5. Amtrak Data: Calculating Trip Durations Across Time Zones

Using a manually created table of six legs of the cross-country "All American" train route, the chapter covers:

Columns filled with dates and times
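The trip-duration idea from the Amtrak section can be sketched as follows. This is a minimal illustration under stated assumptions: the train_rides_demo table, its column names, and the two departure and arrival times are hypothetical and are not the chapter's data or real schedules. It relies only on timestamptz input accepting a time zone name and on subtracting one timestamptz from another to get an interval.

```sql
-- Hypothetical demo table (names and values are assumptions for illustration).
CREATE TABLE train_rides_demo (
    trip_id bigserial PRIMARY KEY,
    segment text NOT NULL,
    departure timestamptz NOT NULL,  -- stored internally as UTC
    arrival timestamptz NOT NULL
);

-- Each value names its own local time zone, so legs that start and end
-- in different zones are converted to the same UTC baseline on insert.
INSERT INTO train_rides_demo (segment, departure, arrival)
VALUES
    ('Chicago to New York',
     '2020-11-13 21:30 America/Chicago',
     '2020-11-14 18:23 America/New_York'),
    ('New York to New Orleans',
     '2020-11-15 14:15 America/New_York',
     '2020-11-16 19:32 America/Chicago');

-- Subtracting two timestamptz values yields an interval: the true elapsed time.
SELECT segment,
       arrival - departure AS segment_duration
FROM train_rides_demo
ORDER BY trip_id;
```

For the first made-up leg, the wall-clock times alone suggest 20 hours 53 minutes, but because New York is one hour ahead of Chicago the query reports 19:53:00. That gap is the reason for storing these columns as timestamptz rather than plain timestamp.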