Sorting & Aggregating Data - pandas

Hyo__ni 2023. 9. 29. 16:52

Let's load some data, sort it, and aggregate it under given conditions.

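1. Sorting data

df.sort_index()  # sort by index, ascending
df.sort_index(ascending=False)   # sort by index, descending

df.sort_index(axis=1)  # sort the columns by column name
df.sort_index().loc['A':'B']   # index labels from 'A' through 'B'; label slices include both ends, so an exact 'B' is kept but 'Ba...' is not

df.sort_values(by='Country')  # sort by Country, ascending
df.sort_values(by='Country', ascending=False)  # sort by Country, descending

# Country ascending, then Region descending; show the first 10 rows of those two columns
df.sort_values(['Country','Region'], ascending=[True, False])[['Country','Region']].head(10)
df.query('Age > 20').sort_values('Country')  # rows with 'Age' greater than 20, sorted by 'Country' (use >= to include 20)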
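
To make the sorting calls above concrete, here is a minimal sketch on a small made-up DataFrame. The data and the index labels are assumptions for illustration only; the column names follow the ones used in this post.

```python
import pandas as pd

# Hypothetical sample data (not from the original post).
df = pd.DataFrame(
    {
        "Country": ["Korea", "Brazil", "Korea", "Austria", "Brazil"],
        "Region": ["Seoul", "Rio", "Busan", "Vienna", "Bahia"],
        "Age": [31, 19, 24, 42, 20],
    },
    index=["B", "Ab", "C", "A", "Ba"],
)

# Label slicing on a sorted index includes both endpoints:
# the exact label 'B' is kept, while 'Ba' sorts after 'B' and is dropped.
sliced = df.sort_index().loc["A":"B"]
print(sliced.index.tolist())  # ['A', 'Ab', 'B']

# Country ascending, then Region descending within each country.
multi = df.sort_values(["Country", "Region"], ascending=[True, False])
print(multi[["Country", "Region"]])

# 'Age > 20' is a strict comparison, so the row with Age == 20 is excluded.
older = df.query("Age > 20").sort_values("Country")
print(older)
```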

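2. Aggregating data

df.mode()   # most frequent value(s) per column
df.max()    # maximum
df.min()    # minimum
df.mean(numeric_only=True)  # mean, numeric columns only
df.sum(numeric_only=True)  # sum, numeric columns only

df[['Age','Extra']].sum()  # select columns first, then aggregate
df.select_dtypes(include='object').mode()   # mode of the 'object' dtype columns only
df.mean(numeric_only=True, skipna=False)    # with skipna=False, a column containing any missing value gives NaN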
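
The effect of skipna is easiest to see with one missing value. The sketch below uses made-up data, and selects the text column with exclude="number" rather than include='object', which is just one possible variant of the call above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a single missing Age.
df = pd.DataFrame(
    {
        "Country": ["Korea", "Brazil", "Korea", "Austria"],
        "Age": [30.0, 20.0, np.nan, 40.0],
        "Extra": [1, 2, 3, 4],
    }
)

# Default skipna=True: the NaN is ignored, so Age averages over 3 values.
mean_skip = df.mean(numeric_only=True)
print(mean_skip["Age"])  # 30.0

# skipna=False: any NaN in a column makes that column's result NaN.
mean_strict = df.mean(numeric_only=True, skipna=False)
print(mean_strict)

# mode() returns a DataFrame because a column can have more than one mode.
modes = df.select_dtypes(exclude="number").mode()
print(modes)
```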

3. Aggregating data - aggregate

df['Age'].agg(['min','max'])  # min and max of the 'Age' column
df[['Age','Year']].aggregate(['mean','std']).T   # mean and standard deviation, transposed (note the double brackets for selecting several columns)

# a different aggregation per column
d = {'Age':'mean', 'Year':['min','max']}
df.agg(d)

# user-defined aggregation function
def min_max_diff(column):
    # return None for text columns; otherwise return the range (max - min)
    if column.dtype == 'object' or column.dtype == 'str':
        return None
    return column.max() - column.min()

# these two give the same result
min_max_diff(df['Age'])
df['Age'].agg(min_max_diff)

df['Age'].agg(['min','max', min_max_diff])  # 'min', 'max', and 'min_max_diff' of 'Age'
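
As a rough check of the agg calls in this section, the sketch below runs them on a small made-up DataFrame. The data is an assumption for illustration; min_max_diff is the same function defined above.

```python
import pandas as pd

# Hypothetical sample data (not from the original post).
df = pd.DataFrame(
    {
        "Country": ["Korea", "Brazil", "Korea", "Austria"],
        "Age": [30, 20, 25, 40],
        "Year": [2001, 2002, 2003, 2002],
    }
)

def min_max_diff(column):
    # None for text columns; otherwise the range (max - min).
    if column.dtype == 'object' or column.dtype == 'str':
        return None
    return column.max() - column.min()

# Rows become the columns 'Age' and 'Year' after the transpose.
summary = df[["Age", "Year"]].agg(["mean", "std"]).T
print(summary)

# Mixed dict: cells that were not requested are filled with NaN.
per_column = df.agg({"Age": "mean", "Year": ["min", "max"]})
print(per_column)

# Built-in names and a custom function can be combined in one list;
# the custom entry is labelled with the function's name.
age_stats = df["Age"].agg(["min", "max", min_max_diff])
print(age_stats)
```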

4. Aggregating data - groupby

df['Country'].value_counts()  # number of rows for each unique value in 'Country'

# mean age per country
df.groupby('Country')['Age'].mean(numeric_only=True)
df.groupby('Country')[['Age','Year']].mean()   # several columns need a list (double brackets)
df.groupby('Country').agg({'Age':'mean', 'Year':'min'})

df.groupby(['Country','Region'])['Age'].mean()   # mean age per country and region

# keep only the countries whose mean age is 25 or more
result = df.groupby('Country')['Age'].mean()
result[result >= 25]

# 'count': number of elements, excluding missing values
df.groupby('Country')['Region'].agg("count")
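
Because 'count' skips missing values, it can differ from the number of rows in a group. A minimal sketch with made-up data, comparing it with size() and value_counts():

```python
import pandas as pd

# Hypothetical sample data: some regions are missing.
df = pd.DataFrame(
    {
        "Country": ["Korea", "Korea", "Brazil", "Brazil", "Brazil"],
        "Region": ["Seoul", None, "Rio", None, None],
    }
)

# 'count' ignores missing Region values.
counts = df.groupby("Country")["Region"].agg("count")
print(counts)

# size() counts every row in the group, missing or not.
sizes = df.groupby("Country")["Region"].size()
print(sizes)

# value_counts() on the key column gives the same per-country row totals.
totals = df["Country"].value_counts()
print(totals)
```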

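# 'mode' and 'mean' of 'Age' per 'Country', sorted by 'mean'
# ('mode' is not a built-in groupby aggregation name, so take the first mode of each group with a lambda)
df.groupby('Country')['Age'].agg(mode=lambda s: s.mode().iat[0], mean='mean').sort_values('mean')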
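
To my knowledge the string 'mode' is not an aggregation name that groupby accepts, so one workaround is named aggregation with a lambda that takes the first mode of each group. The sketch below uses made-up data and assumes every group has at least one non-missing Age (otherwise .iat[0] would fail).

```python
import pandas as pd

# Hypothetical sample data (not from the original post).
df = pd.DataFrame(
    {
        "Country": ["Korea", "Korea", "Korea", "Brazil", "Brazil"],
        "Age": [20, 20, 35, 40, 40],
    }
)

stats = (
    df.groupby("Country")["Age"]
    # Series.mode() can return several values; keep the first (smallest) one.
    .agg(mode=lambda s: s.mode().iat[0], mean="mean")
    .sort_values("mean")
)
print(stats)
```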


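# aggregate per country and region, rename the result columns, then filter on them
dic = {'Age':'mean','Year':'max'}
result = df.groupby(['Country','Region']).agg(dic).rename(columns = {'Age':'Age_mean', 'Year':'Year_max'})
result[(result['Age_mean'] >= 20) & (result['Year_max'] == 2002)]   # use == to compare; the renamed column is 'Year_max'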
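
The aggregate-then-rename step can also be written with named aggregation, which sets the output column names directly. This is an alternative sketch on made-up data, not the approach used in the post.

```python
import pandas as pd

# Hypothetical sample data (not from the original post).
df = pd.DataFrame(
    {
        "Country": ["Korea", "Korea", "Korea", "Brazil", "Brazil"],
        "Region": ["Seoul", "Seoul", "Busan", "Rio", "Rio"],
        "Age": [30, 20, 18, 40, 22],
        "Year": [2002, 2001, 2002, 2003, 2001],
    }
)

# Each keyword is an output column: name=(source column, aggregation).
result = df.groupby(["Country", "Region"]).agg(
    Age_mean=("Age", "mean"),
    Year_max=("Year", "max"),
)

# Parentheses around each condition are required when combining with &.
filtered = result[(result["Age_mean"] >= 20) & (result["Year_max"] == 2002)]
print(filtered)
```

With this data only the ('Korea', 'Seoul') group passes both conditions: its mean age is 25 and its latest year is 2002.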
