SQL_notes_Mode

Learning from https://mode.com/sql-tutorial/

Basic

1.SELECT * FROM table

  • rename column names:

    SELECT west AS "West Region" FROM tutorial.us_housing_units

2.LIMIT

3.WHERE

  • Equal to =

  • Not equal to <> or !=

  • Greater than >

  • Less than <

  • Greater than or equal to >=

  • Less than or equal to <=

  • Arithmetic in SQL : You can perform arithmetic in SQL using the same operators you would in Excel: +, -, *, /.

     SELECT west + south AS south_plus_west 
     FROM tutorial.us_housing_units

    4.SQL Logical operators

  • LIKE allows you to match similar values, instead of exact values. EX: include rows for which “group” starts with “Snoop” :

    SELECT *
    FROM tutorial.billboard_top_100_year_end
    WHERE "group" LIKE 'Snoop%'

    (In general, putting double quotes around a word or phrase will indicate that you are referring to that column name) like is case_sensitive,ILIKE is not case sensitive. You can also use _ (a single underscore) to substitute for an individual character:

SELECT *
tutorial.billboard_top_100_year_end
WHERE artist ILIKE 'dr_ke'
  • IN allows you to specify a list of values you’d like to include. EX:

    SELECT *
    tutorial.billboard_top_100_year_end
    WHERE year_rank IN (1, 2, 3)

    (non-numerical values, but they need to go inside single quotes. )

    SELECT *
    FROM tutorial.billboard_top_100_year_end
    WHERE artist IN ('Taylor Swift', 'Usher', 'Ludacris')
  • BETWEEN allows you to select only rows within a certain range.

  • IS NULL allows you to select rows that contain no data in a given column.

      SELECT *
      FROM tutorial.billboard_top_100_year_end
    WHERE artist IS NULL
(WHERE artist = NULL will not work��you can't perform arithmetic on null values.)
  • AND allows you to select only rows that satisfy two conditions. You can use SQL’s AND operator with additional AND statements or any other comparison operator, as many times as you want.

    SELECT *
    FROM tutorial.billboard_top_100_year_end
    WHERE year = 2012 AND year_rank <= 10 AND "group" ILIKE '%feat%'
  • OR allows you to select rows that satisfy either of two conditions.

  • NOT allows you to select rows that do not match a certain condition.

    NOT is commonly used with LIKE.

        SELECT *
        FROM tutorial.billboard_top_100_year_end
        WHERE year = 2013
        AND "group" NOT ILIKE '%macklemore%'

? NOT is also frequently used to identify non-null rows

 SELECT *
 FROM tutorial.billboard_top_100_year_end
 WHERE year = 2013 AND artist IS NOT NULL
  1. ORDER BY DESC

    You can also order by multiple columns. This is particularly useful if your data falls into categories and you’d like to organize rows by date, for example, but keep all of the results within a given category together.

SELECT *
FROM tutorial.billboard_top_100_year_end
WHERE year_rank <= 3
ORDER BY year DESC, year_rank

you can make your life a little easier by substituting numbers for column names in the ORDER BY clause. The numbers will correspond to the order in which you list columns in the SELECT clause.

SELECT *
tutorial.billboard_top_100_year_end
WHERE year_rank <= 3
ORDER BY 2, 1 DESC

6.Using comments You can use– (two dashes) to comment out everything to the right of them on a given line:

SELECT *  --This comment won't affect the way the code runs
FROM tutorial.billboard_top_100_year_end
WHERE year = 2013

You can also leave comments across multiple lines using / to begin the comment and / to close it:

/* Here's a comment so long and descriptive that
it could only fit on multiple lines. Fortunately,
it, too, will not affect how this code runs. */
SELECT *
FROM tutorial.billboard_top_100_year_end
WHERE year = 2013

Intermediate

1.Aggregate functions

  • COUNT counts how many rows are in a particular column.

    • Typing COUNT(1) has the same effect as COUNT(*). Which one you use is a matter of personal preference.

    • Counting individual columns Count of all of rows in which the high column is not null:

      SELECT COUNT(high)
      FROM tutorial.aapl_historical_stock_prie
  • SUM adds together all the values in a particular column. you can only use SUM on columns containing numerical values. You don’t need to worry as much about the presence of nulls with SUM as you would with COUNT, as SUM treats nulls as 0.

  • MIN and MAX return the lowest and highest values in a particular column, respectively. They’re similar to COUNT in that they can be used on non-numerical columns. Depending on the column type, MIN will return the lowest number, earliest date, or non-numerical value as close alphabetically to “A” as possible.

  • AVG calculates the average of a group of selected values. First, it can only be used on numerical columns. Second, it ignores nulls completely.

2.GROUP BY You can group by multiple columns. The order of column names in your GROUP BY clause doesn’t matter����the results will be the same regardless. If you want to control how the aggregations are grouped together, use ORDER BY.

3.HAVING Having is the “clean” way to filter a query that has been aggregated,The WHERE clause won’t work for filtering on aggregate columns

SELECT year,
       month,
       MAX(high) AS month_high
FROM tutorial.aapl_historical_stock_price
GROUP BY year, month
HAVING MAX(high) > 400
ORDER BY year, month

4.DISTINCT

Only the unique values in a particular column If you include two (or more) columns in a SELECT DISTINCT clause, your results will contain all of the unique pairs of those two columns:

SELECT DISTINCT year, month
FROM tutorial.aapl_historical_stock_price

DISTINCT goes inside the aggregate function rather than at the beginning of the SELECT clause:

SELECT COUNT(DISTINCT month) AS unique_months
FROM tutorial.aapl_historical_stock_price

5.CASE Every CASE statement must end with the END statement. The ELSE statement is optional:

CASE WHEN...THEN
END
  SELECT player_name,
         year,
         CASE WHEN year = 'SR' THEN 'yes'
         ELSE NULL END AS is_a_senior
  FROM benn.college_football_players
SELECT player_name,
         weight,
         CASE WHEN weight > 250 THEN 'over 250'
              WHEN weight > 200 THEN '201-250'
              WHEN weight > 175 THEN '176-200'
              ELSE '175 or under' END AS weight_group
  FROM benn.college_football_players

6.Joins

 SELECT teams.conference AS conference,
         AVG(players.weight) AS average_weight
  FROM benn.college_football_players players
  JOIN benn.college_football_teams teams
  ON teams.school_name = players.school_name
  GROUP BY teams.conference
  ORDER BY AVG(players.weight) DESC
  • inner join Inner joins eliminate rows from both tables that do not satisfy the join condition set forth in the ON statement. In mathematical terms, an inner join is the intersection of the two tables. rename column names:

  • outer join

    • LEFT JOIN returns only unmatched rows from the left table.
    • RIGHT JOIN returns only unmatched rows from the right table.
    • FULL OUTER JOIN returns unmatched rows from both tables.

Filtering in the ON clause: filter one or both of the tables before joining them

   SELECT companies.permalink AS companies_permalink,
       companies.name AS companies_name,
       acquisitions.company_permalink AS acquisitions_permalink,
       acquisitions.acquired_at AS acquired_date
   FROM tutorial.crunchbase_companies companies
   LEFT JOIN tutorial.crunchbase_acquisitions acquisitions
   ON companies.permalink = acquisitions.company_permalink
   AND acquisitions.company_permalink != '/company/1000memories'
   ORDER BY 1

Filtering in the WHERE clause:(Join then filter)

   SELECT companies.permalink AS companies_permalink,
       companies.name AS companies_name,
       acquisitions.company_permalink AS acquisitions_permalink,
       acquisitions.acquired_at AS acquired_date
   FROM tutorial.crunchbase_companies companies
   LEFT JOIN tutorial.crunchbase_acquisitions acquisitions
   ON companies.permalink = acquisitions.company_permalink
   WHERE acquisitions.company_permalink != '/company/1000memories'
   OR acquisitions.company_permalink IS NULL
   ORDER BY 1
  • Joining on multiple keys why? The first has to do with accuracy.

     The second reason has to do with performance. SQL uses "indexes" (essentially pre-defined joins) to speed up queries
  • Self join Sometimes it can be useful to join a table to itself. Let����s say you wanted to identify companies that received an investment from Great Britain following an investment from Japan.

SELECT DISTINCT japan_investments.company_name,
       japan_investments.company_permalink
  FROM tutorial.crunchbase_investments_part1 japan_investments
  JOIN tutorial.crunchbase_investments_part1 gb_investments
    ON japan_investments.company_name = gb_investments.company_name
   AND gb_investments.investor_country_code = 'GBR'
   AND gb_investments.funded_at > japan_investments.funded_at
 WHERE japan_investments.investor_country_code = 'JPN'
 ORDER BY 1
  1. UNION SQL joins allow you to combine two datasets side-by-side, but UNION allows you to stack one dataset on top of the other SQL has strict rules for appending data:

    • Both tables must have the same number of columns
    • The columns must have the same data types in the same order as the first table column names don’t necessarily have to be the same

tips: 1.If you must use spaces in naming, you will need to use double quotes: SELECT COUNT(date) AS “Count Of Date” FROM tutorial.aapl_historical_stock_price 2.Query clause order:

  • SELECT
  • FROM
  • WHERE
  • GROUP BY
  • HAVING
  • ORDER BY

ADVANCED

Data Type**

Imported as Stored as With these rules
String VARCHAR(1024) Any characters, with a maximum field length of 1024 characters.
Date/Time TIMESTAMP Stores year, month, day, hour, minute and second values as YYYY-MM-DD hh:mm:ss.
Number DOUBLE PRECISION Numerical, with up to 17 significant digits decimal precision.
Boolean BOOLEAN Only TRUE or FALSE values.

Changing a column’s data type��

two types: CAST(column_name AS integer) and column_name::integer

2.Data Format

SELECT companies.permalink,
       companies.founded_at_clean,
       acquisitions.acquired_at_cleaned,
       acquisitions.acquired_at_cleaned -
         companies.founded_at_clean::timestamp AS time_to_acquisition
  FROM tutorial.crunchbase_companies_clean_date companies
  JOIN tutorial.crunchbase_acquisitions_clean_date acquisitions
    ON acquisitions.company_permalink = companies.permalink
 WHERE founded_at_clean IS NOT NULL
SELECT companies.permalink,       
       companies.founded_at_clean,       
       companies.founded_at_clean::timestamp +         
       INTERVAL '1 week' AS plus_one_week  
       FROM tutorial.crunchbase_companies_clean_date companies WHERE founded_at_clean IS NOT NULL
SELECT companies.permalink,       
companies.founded_at_clean,       
NOW() - companies.founded_at_clean::timestamp AS founded_time_ago  
FROM tutorial.crunchbase_companies_clean_date companies WHERE founded_at_clean IS NOT NULL
SELECT
    DATE_FORMAT(wk,'%Y%m%d'),
    wk,
    song
 FROM totp
 WHERE singer='Madness'

3.Data Wrangling with SQL

4.Using SQL String Functions to Clean data

LEFT, RIGHT, LENGTH

LEFT(string, number of characters)

SELECT incidnt_num,
        date,       
        LEFT(date, 10) AS cleaned_date  
FROM tutorial.sf_crime_incidents_2014_01
SELECT incidnt_num,       
       date,       
       LEFT(date, 10) AS cleaned_date,       
       RIGHT(date, LENGTH(date) - 11) AS cleaned_time  
       FROM tutorial.sf_crime_incidents_2014_01

TRIM

SELECT location,       
TRIM(both '()' FROM location)  
FROM tutorial.sf_crime_incidents_2014_01

POSITION and STRPOS

Importantly, both the POSITION and STRPOS functions are case-sensitive. If you want to look for a character regardless of its case, you can make your entire string a single by using the UPPER or LOWER functions described below.

POSITION allows you to specify a substring, then returns a numerical value equal to the character number (counting from left) where that substring first appears in the target string. For example, the following query will return the position of the character ‘A’ (case-sensitive) where it first appears in the descript field:

SELECT incidnt_num,       
       descript,       
       POSITION('A' IN descript) AS a_position  
FROM tutorial.sf_crime_incidents_2014_01

You can also use the STRPOS function to achieve the same results?just replace IN with a comma and switch the order of the string and substring:

SELECT incidnt_num,       
       descript,       
       STRPOS(descript, 'A') AS a_position  
FROM tutorial.sf_crime_incidents_2014_01

SUBSTR

SELECT incidnt_num,       
       date,       
       SUBSTR(date, 4, 2) AS day  
FROM tutorial.sf_crime_incidents_2014_01

CONCAT

SELECT incidnt_num,      
       day_of_week,       
       LEFT(date, 10) AS cleaned_date,       
       CONCAT(day_of_week, ', ', LEFT(date, 10)) AS day_and_date  
FROM tutorial.sf_crime_incidents_2014_01

Alternatively, you can use two pipe characters (||) to perform the same concatenation:

SELECT incidnt_num,       
       day_of_week,       
       LEFT(date, 10) AS cleaned_date,       
       day_of_week || ', ' || LEFT(date, 10) AS day_and_date  
       UPPER(address) AS address_upper,
       LOWER(address) AS address_lower
FROM tutorial.sf_crime_incidents_2014_01

Turning strings into dates

SELECT incidnt_num,date,       
       (SUBSTR(date, 7, 4) || '-' || LEFT(date, 2) ||'-' || SUBSTR(date, 4, 2))::date AS cleaned_date  
FROM tutorial.sf_crime_incidents_2014_01
SELECT cleaned_date,       
       EXTRACT('year' FROM cleaned_date) AS year,       
       EXTRACT('month'  FROM cleaned_date) AS month,       
       EXTRACT('day'    FROM cleaned_date) AS day,       
       EXTRACT('hour'   FROM cleaned_date) AS hour,       
       EXTRACT('minute' FROM cleaned_date) AS minute,       
       EXTRACT('second' FROM cleaned_date) AS second,       
       EXTRACT('decade' FROM cleaned_date) AS decade,       
       EXTRACT('dow'    FROM cleaned_date) AS day_of_week  
FROM tutorial.sf_crime_incidents_cleandate

You can also round dates to the nearest unit of measurement. This is particularly useful if you don’t care about an individual date, but do care about the week (or month, or quarter) that it occurred in. The DATE_TRUNCfunction rounds a date to whatever precision you specify. The value displayed is the first value in that period. So when you DATE_TRUNC by year, any value in that year will be listed as January 1st of that year:

SELECT cleaned_date,       
       DATE_TRUNC('year'   , cleaned_date) AS year,       
       DATE_TRUNC('month'  , cleaned_date) AS month,       
       DATE_TRUNC('week'   , cleaned_date) AS week,       
       DATE_TRUNC('day'    , cleaned_date) AS day,      
       DATE_TRUNC('hour'   , cleaned_date) AS hour,       
       DATE_TRUNC('minute' , cleaned_date) AS minute,       
       DATE_TRUNC('second' , cleaned_date) AS second,       
       DATE_TRUNC('decade' , cleaned_date) AS decade  
FROM tutorial.sf_crime_incidents_cleandate
SELECT CURRENT_DATE AS date,       
       CURRENT_TIME AS time,       
       CURRENT_TIMESTAMP AS timestamp,       
       LOCALTIME AS localtime,       
       LOCALTIMESTAMP AS localtimestamp,       
       NOW() AS now
SELECT CURRENT_TIME AS time,       
CURRENT_TIME AT TIME ZONE 'PST' AS time_pst

COALESCE

SELECT incidnt_num,       
       descript,       
       COALESCE(descript, 'No Description')  
FROM tutorial.sf_crime_incidents_cleandate 
ORDER BY descript DESC

5.Writing subqueries in SQL

6.SQL window functions

  • Basic windowing syntax
SELECT start_terminal,
       duration_seconds,
       SUM(duration_seconds) OVER
         (PARTITION BY start_terminal) AS running_total,
       COUNT(duration_seconds) OVER
         (PARTITION BY start_terminal) AS running_count,
       AVG(duration_seconds) OVER
         (PARTITION BY start_terminal) AS running_avg
  FROM tutorial.dc_bikeshare_q1_2012
 WHERE start_time < '2012-01-08'
  • ROW_NUMBER() displays the number of a given row. It starts are 1 and numbers the rows according to the ORDER BY part of the window statement. ROW_NUMBER() does not require you to specify a variable within the parentheses:
SELECT start_terminal,
       start_time,
       duration_seconds,
       ROW_NUMBER() OVER (ORDER BY start_time)
                    AS row_number
  FROM tutorial.dc_bikeshare_q1_2012
 WHERE start_time < '2012-01-08'

Using the PARTITION BY clause will allow you to begin counting 1 again in each partition. The following query starts the count over again for each terminal:

SELECT start_terminal,
       start_time,
       duration_seconds,
       ROW_NUMBER() OVER (PARTITION BY start_terminal
                          ORDER BY start_time)
                    AS row_number
  FROM tutorial.dc_bikeshare_q1_2012
 WHERE start_time < '2012-01-08'
  • RANK() and DENSE_RANK() RANK() would give the identical rows a rank of 2, then skip ranks 3 and 4, so the next result would be 5 DENSE_RANK() would still give all the identical rows a rank of 2, but the following row would be 3?no ranks would be skipped.
SELECT start_terminal,
       duration_seconds,
       RANK() OVER (PARTITION BY start_terminal
                    ORDER BY start_time)
              AS rank
  FROM tutorial.dc_bikeshare_q1_2012
 WHERE start_time < '2012-01-08'
  • NTILE NTILE(# of buckets) identify what percentile a given row falls into. In this case, ORDER BY determines which column to use to determine the quartiles (or whatever number of ‘tiles you specify). For example:
SELECT start_terminal,
       duration_seconds,
       NTILE(4) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
          AS quartile,
       NTILE(5) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
         AS quintile,
       NTILE(100) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
         AS percentile
  FROM tutorial.dc_bikeshare_q1_2012
 WHERE start_time < '2012-01-08'
 ORDER BY start_terminal, duration_seconds
  • LAG and LEAD

It can often be useful to compare rows to preceding or following rows, especially if you’ve got the data in an order that makes sense. You can use LAG or LEAD to create columns that pull values from other rows?all you need to do is enter which column to pull from and how many rows away you’d like to do the pull. LAG pulls from previous rows and LEAD pulls from following rows:

SELECT start_terminal,
       duration_seconds,
       LAG(duration_seconds, 1) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds) AS lag,
       LEAD(duration_seconds, 1) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds) AS lead
  FROM tutorial.dc_bikeshare_q1_2012
 WHERE start_time < '2012-01-08'
 ORDER BY start_terminal, duration_seconds

This is especially useful if you want to calculate differences between rows:

SELECT start_terminal,
       duration_seconds,
       duration_seconds -LAG(duration_seconds, 1) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
         AS difference
  FROM tutorial.dc_bikeshare_q1_2012
 WHERE start_time < '2012-01-08'
 ORDER BY start_terminal, duration_seconds

The first row of the difference column is null because there is no previous row from which to pull. Similarly, using LEAD will create nulls at the end of the dataset. If you’d like to make the results a bit cleaner, you can wrap it in an outer query to remove nulls:

SELECT *
  FROM (
    SELECT start_terminal,
           duration_seconds,
           duration_seconds -LAG(duration_seconds, 1) OVER
             (PARTITION BY start_terminal ORDER BY duration_seconds)
             AS difference
      FROM tutorial.dc_bikeshare_q1_2012
     WHERE start_time < '2012-01-08'
     ORDER BY start_terminal, duration_seconds
       ) sub
 WHERE sub.difference IS NOT NULL
  • Defining a windows alias If you’re planning to write several window functions in to the same query, using the same window, you can create an alias. Take the NTILE example above:
SELECT start_terminal,
       duration_seconds,
       NTILE(4) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
         AS quartile,
       NTILE(5) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
         AS quintile,
       NTILE(100) OVER
         (PARTITION BY start_terminal ORDER BY duration_seconds)
         AS percentile
  FROM tutorial.dc_bikeshare_q1_2012
 WHERE start_time < '2012-01-08'
 ORDER BY start_terminal, duration_seconds

This can be rewritten as:

SELECT start_terminal,
       duration_seconds,
       NTILE(4) OVER ntile_window AS quartile,
       NTILE(5) OVER ntile_window AS quintile,
       NTILE(100) OVER ntile_window AS percentile
  FROM tutorial.dc_bikeshare_q1_2012
 WHERE start_time < '2012-01-08'
WINDOW ntile_window AS
         (PARTITION BY start_terminal ORDER BY duration_seconds)
 ORDER BY start_terminal, duration_seconds

7.Performance Tunning SQL Queries

8.Pivoting Data in SQL

SELECT conference,
       SUM(players) AS total_players,
       SUM(CASE WHEN year = 'FR' THEN players ELSE NULL END) AS fr,
       SUM(CASE WHEN year = 'SO' THEN players ELSE NULL END) AS so,
       SUM(CASE WHEN year = 'JR' THEN players ELSE NULL END) AS jr,
       SUM(CASE WHEN year = 'SR' THEN players ELSE NULL END) AS sr
  FROM (
        SELECT teams.conference AS conference,
               players.year,
               COUNT(1) AS players
          FROM benn.college_football_players players
          JOIN benn.college_football_teams teams
            ON teams.school_name = players.school_name
         GROUP BY 1,2
       ) sub
 GROUP BY 1
 ORDER BY 2 DESC

Table Manipulation

INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode, Country)
VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway');
UPDATE Customers
SET ContactName = 'Alfred Schmidt', City= 'Frankfurt'
WHERE CustomerID = 1;
DELETE FROM Customers WHERE CustomerName='Alfreds Futterkiste';

 Previous
https://www.1point3acres.com/bbs/interview/amazon-data-science-478865.html 亚麻BIE physical retail seatle 1)一个manuafacture
2020-09-16
Next 
2.0_leetcode_database_practice 2.0_leetcode_database_practice
EASY tag:176. SELECT IFNULL( (SELECT ... LIMIT 1 OFFSET 1,NULL ) AS SecondHighestSalary SELECT MAX(salary) AS
2020-09-10
  TOC