SQL 기초수업요약

Notice

Recent Posts

Recent Comments

Link

« 2025/04 »
일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Tags more

Archives

Today

Total

관리 메뉴

rubus0304 님의 블로그

SQL 기초수업요약 본문

Data Analyst/Weekly

SQL 기초수업요약

rubus0304 2024. 10. 4. 15:57

BDeaver

'테이블' - 엑셀파일, '컬럼' - 각 열 / ctrl + Enter 또는 ▷ (실행)

(모든 토글 열고 닫는 단축키) Windows: Ctrl +alt+t //Mac :⌘+⌥+t

SELECT, FROM 문 (데이터 조회)

select * (전체) /

select 컬럼1, 컬럼2 (필요한 항목만) /

select 컬럼 as " 한글, 특수문자 (-하이픈 포함)" 또는 select 컬럼 (한칸 띄어쓰기) 별명(영문, 언더바)

from 테이블

WHERE 절 (데이터 필터링) (* 반드시 from 다음에 기재)

: 비교연산

ex) (select from 뒤 ) where age >= 30

: BETWEEN 숫자 and 숫자

ex) (select from 뒤) where age between 20 and 30

: IN (A, B, C) 포함(숫자,'문자')

ex) (select from 뒤) where age in (30, 20) / where name in ('우향숙', '김보민')

: LIKE ('문자%', '%문자%', '문자%')

ex) (select from 뒤) where like 'B%' (B로 시작하는 문자)

: AND, OR, NOT

ex) (select from 뒤) where age >=20 and gender = 'male' /

ex) where pay_type='card' or vat <=0.2

ex) where not gender = 'female'

(응용) 상품 준비시간이 20~30분 사이인 '한국음식'의 식당명과 고객번호 조회하기

SELECT restaurant_name, customer_id

from food_orders

where food_preparation_time between 20 and 30 and cuisine_type = 'Korean'

**밑줄 부분 혹시 몰라 과제힌트보고 알아챔.

'한국'음식 이니까 restaurant_name 만 하는게 아니라 조건식에 cuisine type = 'Korean' 도 추가!

1. 계산하기: + (더하기), - (빼기), * (곱하기), / (나누기), Sum (합계), Average (평균)

ex) 음식주문 테이블에서 음식준비시간, 배달시간을 뽑고, 준비시간+배달시간 합계 (합계별명 total_time)

select food_preparation_time, delivery time, food_preparation_time + delivery_time as total_time

from food_orders

2. 갯수구하기: Count

COUNT(컬럼) * 컬럼명 대신 1 혹은 * 사용 가능 )

몇개의 값을 가지고 있는지 구할 때 : DISTINCT

select count (*)/ count (1) - 전체

select count (distinct customer_id) 주문을 한 고객 갯수

ex) 주문건수와, 주문 한 고객 수 구하기 (별명포함)

select count (1) count_of_orders,

select count (distinct customer_id) count_of_customers

from food_orders

3. 최대, 최소값: max, min

ex) 주문 가격의 최솟값, 최댓값 구하기 (별명포함)

select min(price) min_price,

max(price) max_price

from food_orders

< Where구문과 응용 >

1) 주문금액이 30,000원 이상인 주문건의 갯수 구하기 (별명포함)

select count(order_id) count_of_orders

from food_orders

where price>= 30000

** select count(*) 아님!

2) 한국음식의 주문 당 평균 음식가격 구하기 (별명포함)

select avg(price) as average_price

from food_orders

where cuisine_type = 'Korean'

4. 범주 구하기: group by 컬럼 (select 뒤 동일컬럼, 뽑고자하는 데이터)

ex) 음식 종류별 주문 금액 합계

select cusine_type,

sum(price) sum_of_price

from food_orders

group by cuisine_type

ex) 음식점별 주문 금액 최댓값 조회하기

select restaurante_name,

max(price) "최대 주문금액"

from food_orders

group by restaurant_name

ex) 결제 타입별 "가장 최근" 결제일 조회하기

select pay_type,

max(date) "최근 결제일"

from payments

grounp by pay_type

5. 정렬하기: order by (그대로는 오름차순) , order by 컬럼 desc (내림차순)

ex) 음식점별 주문 금액 최댓값 조회하기 - 최댓값 기준으로 내림차순 정렬

select restaurant_name,

max(price) "최대 주문금액"

from food_orders

group by restaurant_name

order by max(price) desc

ex) 고객을 이름 순으로 오름차순으로 정렬하기

select *

from customers

order by name

6. SQL 구조 마스터 - WHERE, GROUP BY, ORDER BY

과제:

음식 종류별 가장 높은 주문 금액과 가장 낮은 주문금액을 조회하고, 가장 낮은 주문금액 순으로 (내림차순) 정렬하기

select cuisine_type,

max(price) max_price,

min(price) min_price

from food_orders

group by cuisine_type

order by min(price) desc

1. 특정문자를 다른 문자로 바꾸기

Replace(바꿀 컬럼, 현재 값, 바꿀 값)

ex) 식당 명의 ‘Blue Ribbon’ 을 ‘Pink Ribbon’ 으로 바꾸기

select restaurant_name as "원래 상점명",

replace(restaurant_name, 'Blue', 'Pink') as "바뀐 상점명"

from food_orders

where restaurant_name like '%blue Ribbon%'

ex) 주소의 ‘문곡리’ 를 ‘문가리’ 로 바꾸기

select addr as "원래 주소",

replace(addr, '문곡리', '문가리') as "바뀐 주소"

from food_orders

where addr like '%문곡리'

2. 원하는 문자만 남기기

Substr(조회 할 컬럼, 시작 위치, 글자 수)

ex) 서울 음식점들의 주소를 전체가 아닌 ‘시도’ 만 나오도록 수정

select addr "원래주소",

substr(addr,1,2) "시도"

from food_orders

where addr like '%서울특별시%'

3. 여러 컬럼의 문자를 합치기

Concat(붙이고 싶은 값1, 붙이고 싶은 값2, 붙이고 싶은 값3, .....)

ex) 서울시에 있는 음식점은 ‘[서울] 음식점명’ 이라고 수정

select restaurant_name "원래 이름",

addr "원래 주소",

concat('[', substr(addr,1,2), ']', restaurant_name) "바뀐 이름"

from food_orders

where addr like '%서울특별시%'

**concat (문자는 ' ', 함수 붙일 땐 ' ' 제외)

4. 문자데이터 바꾸고 Group by

ex) 서울 지역의 음식 타입별 평균 음식 주문금액 구하기 (출력 : ‘서울’, ‘타입’, ‘평균 금액’)

select substr(addr,1,2) "시도",

cuisine_type "음식 종류",

avg(price) "평균금액"

from food_orders

where addr like '%서울특별시%

group by substr(addr,1,2), cuisine_type (= group by 1, 2 (1번째, 2번째 컬럼))

ex) 이메일 도메인별 고객 수와 평균 연령 구하기 (아이디부분 8로 통일된 예제)

select substr(email, 10) "도메인",

count(distinct customer_id) "고객 수",

avg(age) "평균 연령"

from customers

group by 1

** 10인 이유 @ 없는 메일주소도 있음 도메인별로 묶이려면 @ 제외한 10번째자리 부터

** count(distinct 해도 되고), count(customer_id) 컬럼 그대로 써도 됨.

ex) [지역(시도)] 음식점이름 (음식종류)’ 컬럼을 만들고, 총 주문건수 구하기

SELECT concat('[',substr(addr,1,2),']', restaurant_name, ' (',cuisine_type,')')"바뀐 이름",

count(1) "주문건수"

from food_orders

group by 1

** 총 주문''건수"는 음식점별 주문건수이므로 sum이 아닌 count(1).

5. 조건에 따라 다른 방법을 적용하고 싶을 때

if(조건, 조건을 충족할 때, 조건을 충족하지 못할 때)

ex) 음식 타입을 ‘Korean’ 일 때는 ‘한식’, ‘Korean’ 이 아닌 경우에는 ‘기타’ 라고 지정

select restaurant_name,

cuisine_type '원래음식타입',

if(cuisine_type='Korean', '한식', '기타') '음식타입'

FROM food_orders

**오타조심! cuisine

ex) Replace 예시에서 ‘문곡리’ 가 평택에만 해당될 때, 평택 ‘문곡리’ 만 ‘문가리’ 로 수정

select addr as "원래 주소",

if(addr like '%평택%',replace(addr,'문곡리','문가리'), addr)as "바뀐주소"

FROM food_orders

where addr like '%문곡리'

**정답에선 addr 대신 '문가리' (참고)

ex) ★ ★ ★ Group by 2번째 예시에서 잘못된 이메일 주소 (gmail) 만 수정을 해서 사용 ★ ★ ★

select substr(if(email like '%gmail%', replace(email, 'gmail', '@gmail'), email), 10) "도메인",

count(customer_id) "고객수",

avg(age) "평균연령"

from customers

group by 1

** 어려웠다 substr 안에 if 를 넣을 줄은...!!!

잘못된 이메일 주소만 수정할 땐, if(email like '%gmail%', replace ('gmail, '@gmail', email (그 외 그냥 email) //

그리고 substr(범위 (if 절 조건 넣을 수 있음, 그 조건으로 바꾼 범위), 10) - 10번째 이하부터 끝까지 추출★★

6. 조건을 여러가지 지정하고 싶을 때

Case (when~then, else, end)

ex) 음식 타입을 ‘Korean’ 일 때는 ‘한식’, ‘Japanese’ 혹은 ‘Chienese’ 일 때는 ‘아시아’, 그 외에는 ‘기타’ 라고 지정

select restaurant_name,

cuisine_type as "원래 음식타입",

case when cuisine_type='Korean' then '한식'

when cuisine_type='Japanese' then '일식'

when cuisine_type='Chinese' then '중식'

else '기타'

end as "음식 타입"

ex) 음식 단가를 주문 수량이 1일 때는 음식 가격, 주문 수량이 2개 이상일 때는 음식가격/주문수량 으로 지정

select order_id,

price,

quantity,

case when quantity=1 then price

when quantity>=2 then price/quantity

end '음식단가'

from food_orders

*위와 같은 식은 if 로도 쓸 수 있죠! 한 번 연습해보시는 것을 추천합니다!

select order_id,

price,

quantity,

if(quantity>=2, price/quantity, price) '음식단가'

from food_orders

ex) 주소의 시도를 ‘경기도’ 일때는 ‘경기도’, ‘특별시’ 혹은 ‘광역시’ 일 때는 붙여서, 아닐 때는 앞의 두 글자만 사용

select restaurant_name,

addr '원래주소',

case when addr like '경기도%' then '경기도'

when addr like '%특별시%' or addr like '%광역시%' then substr(addr,1,5)

else substr(addr,1,2)

end '시도'

from food_orders

** 경기도 는 단순하니까 Substr 할 필요 없이 그냥 '경기도'로 별명// '특별시' 혹은 '광역시' 는 요청결과가 동일하므로 case when 안에서 or로 묶어서 간단하게! 여기서는 시도가 다르므로 subsr 로 5섯글자까지 추출!

< 조건을 사용할 수 있는 경우 >

1) 새로운 카테고리 만들기 (ex. 음식타입, 고객분류 등)

2) 연산식 적용할 조건 지정하기 (ex. 현금일 때 수수료율, 카드일 때 수수료율 혹은 계산방식 다르게 적용시)

3) 다른 문법 안에서 적용하기 (- if, case 문 안에 다른 문법이나 연산 적용가능

(ex. concat문으로 여러 컬럼 합칠 때, rating 있는 경우 rating을 넣어주고 없는 경우 아무것도 넣지 않도록 concat문 안에 if문 넣을 수 있음)

7. User Segmentation

ex) ★ ★ ★10세 이상, 30세 미만의 고객의 나이와 성별로 그룹 나누기 (이름도 같이 출력) ★ ★ ★

select name,

age,

gender,

case when (age between 10 and 19) and gender='male' then '10대 남자'

when (age between 10 and 19) and gender='female' then '10대 여자'

when (age between 20 and 29) and gender='male' then '20대 남자'

when (age between 20 and 29) and gender='female' then '20대 여자'

end '연령대'

from customers

where age between 10 and 29

** 나이와 성별로 그룹 나누라고 하는데 when 안에 '괄호'로 조건1 (나이범위) 묶어주고 and로 조건2 추가 then !!

** 그 외 나이들은 NULL 이 나와서 요거 어떡하지 ? 했는데,, where 로 딱 그 범위만 지정해줄 수 있음!!

ex) ★ ★ ★ 음식 단가, 음식 종류 별로 음식점 그룹 나누기 ★ ★ ★

select restaurant_name,

price/quantity '단가',

cuisine_type,

order_id,

case when (price/quantity <5000) and cuisine_type ='Korean' then '한식1'

when (price/quantity between 5000 and 15000) and cuisine_type ='Korean' then '한식2'

when (price/quantity >15000) and cuisine_type = 'Korean' then '한식3'

when (price/quantity <5000) and cuisine_type in('Japanese', 'Chinese', 'Thai', 'Vietnamese', 'Indian') then '아시아1'

when (price/quantity between 5000 and 15000) and cuisine_type in('Japanese', 'Chinese', 'Thai', 'Vietnamese', 'Indian') then '아시아2'

when (price/quantity >15000) and cuisine_type in('Japanese', 'Chinese', 'Thai', 'Vietnamese', 'Indian') then '아시아3'

when (price/quantity <5000) and cuisine_type not in ('Korean','Japanese', 'Chinese', 'Thai', 'Vietnamese', 'Indian') then '기타1'

when (price/quantity between 5000 and 15000) and cuisine_type not in ('Korean','Japanese', 'Chinese', 'Thai', 'Vietnamese', 'Indian') then '기타2'

when (price/quantity >15000) and cuisine_type not in ('Korean','Japanese', 'Chinese', 'Thai', 'Vietnamese', 'Indian') then '기타3'

end 식당그룹

from food_orders

** 단가를 price/quantity 로 기재! 아시아식 할 때 in 으로 묶은 것. 그외를 역발상으로 not in 으로 앞에꺼 다 묶어서 이게 아닌 것들 '기타' 로 보내버림. segmentation 은 case로 2개 이상 조건 묶어서 한번에 나누기!

<조건문으로 서로 다른 식 적용한 수수료 구해보기>

지역과 배달시간을 기반으로 배달수수료 구하기 (식당 이름, 주문 번호 함께 출력)
(지역 : 서울, 기타 - 서울일 때는 수수료 계산 * 1.1, 기타일 때는 곱하는 값 없음 시간 : 25분, 30분 - 25분 초과하면 음식 가격의 5%, 30분 초과하면 음식 가격의 10%)★ ★ ★

select restaurant_name,

order_id,

delivery_time,

price,

addr,

case when delivery_time>25 and delivery_time<=30 then price * 0.05 * (if(addr like'%서울%', 1.1, 1))

when delivery_time>30 then price * 1.1 * (if(addr like'%서울%', 1.1, 1))

else 0 end '수수료'

from food_orders

** 식을 세울 때, 앞에 있다고 먼저 쓰는게 아니라, 효율적으로 식 세우는 연습! // 곱하는 값이 없는 것도 있는 '서울' 조건은 if 르로 추후에 조건에 따라 곱하기로 뒤에 붙히고// 25분 초과일 때, 30분 초과일 때 조건이 다르므로 이 식으로 case when은 나눔!!// 마지막으로 NULL 안 나오게 그 밖에 수수료는 else 0 적어주기!

주문 시기와 음식 수를 기반으로 배달할증료 구하기
(주문 시기 : 평일 기본료 = 3000 / 주말 기본료 = 3500 음식 수 : 3개 이하이면 할증 없음 / 3개 초과이면 기본료 * 1.2)

** CASE 문으로 혼자

SELECT order_id,

price,

quantity,

day_of_the_week,

case when day_of_the_week = 'weekday' then 3000*if(quantity>3,1.2,1)

when day_of_the_week = 'weekend' then 3500*if(quantity>3,1.2,1)

end '배달할증료'

FROM food_orders

** IF문 으로 (정답)

SELECT order_id,

price,

quantity,

day_of_the_week,

if(day_of_the_week='weekday',3000,3500)*(if(quantity>3,1.2,1)) '배달할증료'

FROM food_orders

** 조건이 주중 아니면 주말이므로 IF 문 가능!! 곱하기 뒤 IF문 또 가능// 조건이 YES of NO 일 때 IF문 으로 좀더 간단하게 게!!

8. 오류해결

문자/숫자 계산을 했더니 오류 -> avg, substring 등 함수를 썼더니 에러메세지에 ‘data type’ 단어가 뜨면서 실행되지 않아요

-> 우리가 실습하는 Mysql 과 다르게, 다른 SQL 문법에서는 data type 이 다를 때 연산이 되지 않을 수 있음.

예를 들면, rating 은 숫자가 포함되어 있지만 문자 형으로 저장이 되어있음.

(출력 결과 컬럼명 옆의 ‘123’ 는 숫자 ‘ABC’ 는 문자로 저장이 되어있다는 의미.)

따라서 문자, 숫자를 혼합하여 함수에 사용 할 때에는 데이터 타입을 변경

--숫자로 변경

cast(if(rating='Not given', '1', rating) as decimal)

--문자로 변경

concat(restaurant_name, '-', cast(order_id as char))

[과제]

다음의 조건으로 배달시간이 늦었는지 판단하는 값을 만들어주세요. - 주중 : 25분 이상 - 주말 : 30분 이상

select order_id,

restaurant_name,

day_of_the_week,

delivery_time,

case when day_of_the_week='weekday' and delivery_time>=25 then 'Late'

when day_of_the_week='weekend' and delivery_time>=30 then 'Late'

else "On_time"

end '지연여부'

from food_orders

** restaurante 이름추가 // 지연되지 않은 값은 else 'On_time'으로 지정//

1. 여러 번의 연산을 한 번의 SQL 문으로 수행하기 (Subquery)

ex) Subquery 문 안을 수정해서, 음식 주문시간이 25분보다 초과한 시간을 가져오기 (주문 테이블에서 주문 번호, 음식점명, 음식 준비시간)

select order_id,

restaurant_name,

if(over_time>=0, over_time, 0) 'over_time'

from

(

select order_id,

restaurant_name,

food_preparation_time-25 'over_time'

from food_orders

) a

1-1. User Segmentation 과 조건별 수수료를 Subquery 로 결합해보기

ex) ★ ★ ★ 음식점의 평균 단가별 segmentation 을 진행하고, 그룹에 따라 수수료 연산하기 ★ ★ ★

- 수수료 구간 -

~5000원 미만 0.05%

~20000원 미만 1%

~30000원 미만 2%

30000원 초과 3%

select restaurant_name,

price_per_plate*ratio_of_add "수수료"

from

(

SELECT restaurant_name,

case when price_per_plate<5000 then 0.005

when price_per_plate between 5000 and 19999 then 0.01

when price_per_plate between 20000 and 29999 then 0.02

else 0.03 end ratio_of_add,

price_per_plate

from

(

select restaurant_name,

avg(price/quantity) price_per_plate

from food_orders

group by 1

) a

) b

**어렵다 어려워 ㄷㄷ,, 일주일에 한 번씩 반복해서 쳐보면서 이해해보자

ex) 음식점의 지역과 평균 배달시간으로 segmentation 하기

select restaurant_name,

sido,

case when avg_time<=20 then '<=20'

when avg_time>20 and avg_time <=30 then '20<x<=30'

when avg_time>30 then '>30' end time_segment

from

(

select restaurant_name,

substring(addr, 1, 2) sido,

avg(delivery_time) avg_time

from food_orders

group by 1, 2

) a

1-2. 복잡한 연산을 Subquery로 수행하기

ex) 음식 타입별 총 주문수량과 음식점 수를 연산하고, 주문수량과 음식점수 별 수수료율을 산정하기

(음식점수 5개 이상, 주문수 30개 이상 → 수수료 0.5% 음식점수 5개 이상, 주문수 30개 미만 → 수수료 0.8% 음식점수 5개 미만, 주문수 30개 이상 → 수수료 1% 음식점수 5개 미만, 주문수 30개 미만 → 수수로 2%)

select cuisine_type,

total_quantity,

count_res,

case when count_res>=5 and total_quantity>=30 then 0.005

when count_res>=5 and total_quantity<30 then 0.008

when count_res<5 and total_quantity>=30 then 0.01

when count_res<5 and total_quantity<30 then 0.02 end rate

from

(

select cuisine_type,

sum(quantity) total_quantity,

count(distinct restaurant_name) count_res

from food_orders

group by 1

) a

ex) 음식점의 총 주문수량과 주문 금액을 연산하고, 주문 수량을 기반으로 수수료 할인율 구하기

(할인조건 수량이 5개 이하 → 10% 수량이 15개 초과, 총 주문금액이 300000 이상 → 0.5% 이 외에는 일괄 1%)

select restaurant_name,

case when sum_of_quantity<=5 then 0.1

when sum_of_quantity>15 and sum_of_price>=300000 then 0.005

else 0.01 end ratio_of_add

from

(

select restaurant_name,

sum(quantity) sum_of_quantity,

sum(price) sum_of_price

from food_orders

group by 1

) a

2. 필요한 데이터가 서로 다른 테이블에 있을 때 조회하기 JOIN

LEFT JOIN : 공동 컬럼을 기준으로 다른 테이블에 값이 없더라도 모두 조회.

INNER JOIN : 공통 컬럼을 기준으로 두 테이블에 모두 있는 값만 조회.

ex) 주문 테이블과 고객 테이블을 cusomer_id 를 기준으로 left join 으로 묶어보기

(조회 컬럼 : order_id, customer_id, restaurant_name, price, name, age, gender)

select a.order_id,

a.customer_id,

a.restaurant_name,

a.price,

b.name,

b.age,

b.gender

from food_orders a left join customers b on a.customer_id=b.customer_id

2-1. JOIN 으로 두 테이블의 데이터 조회하기

ex) 한국 음식의 주문별 결제 수단과 수수료율을 조회하기

(조회 컬럼 : 주문 번호, 식당 이름, 주문 가격, 결제 수단, 수수료율) *결제 정보가 없는 경우도 포함하여 조회

select a.order_id,

a.restaurant_name,

a.price,

b.pay_type,

b.vat

from food_orders a left join payments b on a.order_id=b.order_id

where cuisine_type='Korean'

ex) 고객의 주문 식당 조회하기 (NULL 값 제거 포함)

(조회 컬럼 : 고객 이름, 연령, 성별, 주문 식당) *고객명으로 정렬, 중복 없도록 조회

select distinct c.name,

c.age,

c.gender,

f.restaurant_name

from food_orders f left join customers c on f.customer_id=c.customer_id

where c.name is not null

order by c.name

2-2. JOIN 으로 두 테이블의 값을 연산하기

ex) 주문 가격과 수수료율을 곱하여 주문별 수수료 구하기

(조회 컬럼 : 주문 번호, 식당 이름, 주문 가격, 수수료율, 수수료) *수수료율이 있는 경우만 조회

select a.order_id,

a.restaurant_name,

a.price,

b.vat,

a.price*b.vat "수수료율"

from food_orders a inner join payments b on a.order_id=b.order_id

ex) 50세 이상 고객의 연령에 따라 경로 할인율을 적용하고, 음식 타입별로 원래 가격과 할인 적용 가격 합을 구하기

- 조회 컬럼 : 음식 타입, 원래 가격, 할인 적용 가격, 할인 가격) 할인 : 나이-50* 0.005

- 고객 정보가 없는 경우도 포함하여 조회, 할인 금액이 큰 순서대로 정렬

select cuisine_type,

sum(price) "원래 가격",

sum(price)-sum(discount_price) "할인 적용 가격",

sum(discount_price) "할인 가격"

from

(

select a.cuisine_type,

price,

price*((b.age-50)*0.005) discount_price

from food_orders a inner join customers b on a.customer_id=b.customer_id

where b.age>=50

) t

group by 1

order by 4 desc

<과제> 식당별 평균 음식 주문 금액과 주문자의 평균 연령을 기반으로 Segmentation 하기

- 평균 음식 주문 금액 기준 : 5,000 / 10,000 / 30,000 / 30,000 초과 - 평균 연령 : ~ 20대 / 30대 / 40대 / 50대 이상

select restaurant_name,

case when price <=5000 then 'price_group1'

when price >5000 and price <=10000 then 'price_group2'

when price >10000 and price <=30000 then 'price_group3'

when price >30000 then 'price_group4' end price_group,

case when age <30 then 'age_group1'

when age between 30 and 39 then 'age_group2'

when age between 40 and 49 then 'age_group3'

else 'age_group4' end age_group

from

(

select a.restaurant_name,

avg(price) price,

avg(age) age

from food_orders a inner join customers b on a.customer_id=b.customer_id

group by 1

) t

order by 1

식당별 평균 음식 주문 금액과 주문자의 평균 연령을 기반으로 Segmentation 하기 - 평균 음식 주문 금액 기준 : 5,000 / 10,000 / 30,000 / 30,000 초과 - 평균 연령 : ~ 20대 / 30대 / 40대 / 50대 이상

과제 .. 역시..난이도 중

평균 음식 주문금액 case 하고 , !! 콤마 +_+

평균 연력대 case 구문 나이는 > 초과랑 between 으로

subquery는 group by 까지 묶고

전체 정렬은 subquery 밖에서 마지막에 order by

1. 없는 값을 제외해주기 NULL

ex)

select a.order_id,

a.customer_id,

a.restaurant_name,

a.price,

b.name,

b.age,

b.gender

from food_orders a left join customers b on a.customer_id=b.customer_id

where b.customer_id is not null

2. 다른 값을 대신 사용하기

select a.order_id,

a.customer_id,

a.restaurant_name,

a.price,

b.name,

b.age,

coalesce(b.age, 20) "null 제거",

b.gender

from food_orders a left join customers b on a.customer_id=b.customer_id

where b.age is null

coalesce(값이 없는 컬럽, 20) 값이 없는 컬럼이면, 20 을 줘라

3. 조회한 데이터가 상식적이지 않은 값을 가지고 있다면 범위지정해주기

[방법] 조건문으로 값의 범위를 지정하기

조건문으로 가장 큰 값, 가장 작은 값의 범위를 지정해 줄 수 있습니다. → 상식적인 수준 안에서 범위를 지정해줍니다.
위의 나이의 경우 아래와 같은 범위를 지정해 줄 수 있습니다.

select customer_id,

name,

email,

gender,

age,

case when age<15 then 15

when age>80 then 80

else age end "범위를 지정해준 age"

from customers

범위를 지정해준 결과, 15세 미만이거나 80세 초과인 경우 15, 80으로 각각 대체된 것을 확인할 수 있습니다.

4. Pivot Table 만들기

- Pivot Table : 2개 이상의 기준으로 데이터를 집계할 때, 보기 쉽게 배열하여 보여주는 것을 의미

ex) 음식점별 시간별 주문건수 Pivot Table 뷰 만들기 (15~20시 사이, 20시 주문건수 기준 내림차순)

1) 음식점별, 시간별 주문건수 집계하기

select a.restaurant_name,

substring(b.time, 1, 2) hh,

count(1) cnt_order

from food_orders a inner join payments b on a.order_id=b.order_id

where substring(b.time, 1, 2) between 15 and 20

group by 1, 2

2) Pivot view 구조 만들기

select restaurant_name,

max(if(hh='15', cnt_order, 0)) "15",

max(if(hh='16', cnt_order, 0)) "16",

max(if(hh='17', cnt_order, 0)) "17",

max(if(hh='18', cnt_order, 0)) "18",

max(if(hh='19', cnt_order, 0)) "19",

max(if(hh='20', cnt_order, 0)) "20"

from

(

select a.restaurant_name,

substring(b.time, 1, 2) hh,

count(1) cnt_order

from food_orders a inner join payments b on a.order_id=b.order_id

where substring(b.time, 1, 2) between 15 and 20

group by 1, 2

) a

group by 1

order by 7 desc

ex) 성별, 연령별 주문건수 Pivot Table 뷰 만들기 (나이는 10~59세 사이, 연령 순으로 내림차순)

1) 성별, 연령별 주문건수 집계하기

select b.gender,

case when age between 10 and 19 then 10

when age between 20 and 29 then 20

when age between 30 and 39 then 30

when age between 40 and 49 then 40

when age between 50 and 59 then 50 end age,

count(1)

from food_orders a inner join customers b on a.customer_id=b.customer_id

where b.age between 10 and 59

group by 1, 2

2) Pivot view 구조 만들기

select age,

max(if(gender='male', order_count, 0)) male,

max(if(gender='female', order_count, 0)) female

from

(

select b.gender,

case when age between 10 and 19 then 10

when age between 20 and 29 then 20

when age between 30 and 39 then 30

when age between 40 and 49 then 40

when age between 50 and 59 then 50 end age,

count(1) order_count

from food_orders a inner join customers b on a.customer_id=b.customer_id

where b.age between 10 and 59

group by 1, 2

) t

group by 1

order by age

5. 업무 시작을 단축시켜 주는 마법의 문법 (Window Function - RANK, SUM)

window Function : 각 행의 관계를 정의하기 위한 함수로 그룹 내의 연산을 쉽게 만들어줍니다.

- RANK, SUM

1) N 번째까지의 대상을 조회하고 싶을 때, Rank

ex) 음식 타입별로 주문 건수가 가장 많은 상점 3개씩 조회하기

(1) 음식 타입별, 음식점별 주문 건수 집계하기

select cuisine_type, restaurant_name, count(1) order_count

from food_orders

group by 1, 2

(2) Rank 함수 적용하기

select cuisine_type,

restaurant_name,

rank() over (partition by cuisine_type order by order_count desc) rn,

order_count

from

(

select cuisine_type, restaurant_name, count(1) order_count

from food_orders

group by 1, 2

) a

(3) 3위까지 조회하고, 음식 타입별, 순위별로 정렬하기

select cuisine_type,

restaurant_name,

order_count,

rn "순위"

from

(

select cuisine_type,

restaurant_name,

rank() over (partition by cuisine_type order by order_count desc) rn,

order_count

from

(

select cuisine_type, restaurant_name, count(1) order_count

from food_orders

group by 1, 2

) a

) b

where rn<=3

order by 1, 4

2) 전체에서 차지하는 비율, 누적합을 구할 때, Sum

ex) 각 음식점의 주문건이 해당 음식 타입에서 차지하는 비율을 구하고, 주문건이 낮은 순으로 정렬했을 때 누적 합 구하기

(1) 음식 타입별, 음식점별 주문 건수 집계하기

select cuisine_type, restaurant_name, count(1) order_count

from food_orders

group by 1, 2

(2) 카테고리별 합, 카테고리별 누적합 구하기

select cuisine_type,

restaurant_name,

cnt_order,

sum(cnt_order) over (partition by cuisine_type) sum_cuisine,

sum(cnt_order) over (partition by cuisine_type order by cnt_order) cum_cuisine

from

(

select cuisine_type,

restaurant_name,

count(1) cnt_order

from food_orders

group by 1, 2

) a

order by cuisine_type , cnt_order

*** 누적합 고쳐보기

위 정답에서 order_count가 동일한 값을 일괄로 더해 cum_sum으로 출력되는 문제가 있었어요.

문제 상황 이해
이 상황은 SQL에서 WINDOW 함수(SUM 등)를 사용할 때, SUM으로 동일한 cnt_order 값을 가진 여러 행이 있을 경우, SQL 엔진은 이 값을 한꺼번에 더하는 현상이 발생합니다. cnt_order의 순서를 결정할 명확한 기준이 없으므로 발생한 문제입니다.
해결 방법누적합을 순서대로 표기하기 위해 order by에 cum_cuisine 을 추가해줍니다.
이 문제를 해결하려면 ORDER BY 절에 cnt_order 외에 추가적인 열에 순서를 부여할 수 있는 restaurant_name을 포함시켜야 합니다. 이렇게 하면 동일한 cnt_order 값을 가진 행들이 명확하게 순서가 정해져 누적합이 정상적으로 처리됩니다.

select cuisine_type,

restaurant_name,

cnt_order,

sum(cnt_order) over (partition by cuisine_type) sum_cuisine,

sum(cnt_order) over (partition by cuisine_type order by cnt_order, restaurant_name) cum_cuisine

from

(

select cuisine_type,

restaurant_name,

count(1) cnt_order

from food_orders

group by 1, 2

) a

order by cuisine_type , cnt_order, cum_cuisine

*** 이외 함수들은 검색해서 공부!!

6. 날짜 포맷과 조건까지 SQL 로 한 번에 끝내기 (포맷 함수)

날짜 데이터의 이해
- 문자타입, 숫자타입과 같이 날짜 데이터도 특정한 타입을 가지고 있습니다.
- 년, 월, 일, 시, 분, 초 등의 값을 모두 갖고 있으며 목적에 따라 ‘월’, ‘주’, ‘일’ 등으로 포맷을 변경할 수도 있습니다.

ex) 날짜 데이터의 여러 포맷

yyyy-mm-dd 형식의 컬럼을 date type 으로 변경하기

select date(date) date_type,

date

from payments

ex) date type 을 date_format 을 이용하여 년, 월, 일, 주 로 조회해보기

select date(date) date_type,

date_format(date(date), '%Y') "년",

date_format(date(date), '%m') "월",

date_format(date(date), '%d') "일",

date_format(date(date), '%w') "요일"

from payments

년, 월, 일, 시, 분, 초 등의 값을 모두 갖고 있으며 목적에 따라 ‘월’, ‘주’, ‘일’ 등으로 포맷을 변경할 수도 있습니다. W요일은 1 일요일, 2 월요일, 3 화요일, 4 수요일 ~

ex) 3월 조건으로 지정하고, 년도별로 정렬하기

select date_format(date(date), '%Y') "년",

date_format(date(date), '%m') "월",

date_format(date(date), '%Y%m') "년월",

count(1) "주문건수"

from food_orders a inner join payments b on a.order_id=b.order_id

where date_format(date(date), '%m')='03'

group by 1, 2, 3

order by 1

년도, 월을 포함하여 데이터 가공하기 년도, 월별 주문건수 구하기 3월 조건으로 지정하고, 년도별로 정렬하기

5주차 강의 숙제

음식 타입별, 연령별 주문건수 pivot view 만들기 (연령은 10~59세 사이)

(정답 보고 난 후) 이마를 탁..! 10~19인 경우를 묶어서 10으로 !! 가정을 두 번 줬어!! 디박.. count(1)로 전체갯수 세어주고..+_+

'Data Analyst > Weekly' 카테고리의 다른 글

[데이터분석 4주차] Pandas 기본 문법정리 (1)	2024.11.01
[데이터분석 3주차] 관심 데이터직무 넓히기 & 파이썬 문법복습 (0)	2024.10.20
[데이터분석 2주차] 데이터분석 도메인 이해, 파이썬 기초문법 (2)	2024.10.12
Python 강의요약 (4)	2024.10.08
［데이터분석 1주차］데이터 직무분석，ＳＱＬ기초문법 (4)	2024.10.04

'Data Analyst/Weekly' Related Articles

rubus0304 님의 블로그

SQL 기초수업요약 본문

SQL 기초수업요약

'Data Analyst > Weekly' 카테고리의 다른 글

티스토리툴바