Description

Pandas Schema

data = [['A', 'Math'], ['B', 'English'], ['C', 'Math'], ['D', 'Biology'], ['E', 'Math'], ['F', 'Computer'], ['G', 'Math'], ['H', 'Math'], ['I', 'Math']]
courses = pd.DataFrame(data, columns=['student', 'class']).astype({'student':'object', 'class':'object'})


Table: Courses

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| student     | varchar |
| class       | varchar |
+-------------+---------+
(student, class) is the primary key (combination of columns with unique values) for this table.
Each row of this table indicates the name of a student and the class in which they are enrolled.

Write a solution to find all the classes that have at least five students.

Return the result table in any order.

The result format is in the following example.

Example 1:

  • Input:
      Courses table:
      +---------+----------+
      | student | class    |
      +---------+----------+
      | A       | Math     |
      | B       | English  |
      | C       | Math     |
      | D       | Biology  |
      | E       | Math     |
      | F       | Computer |
      | G       | Math     |
      | H       | Math     |
      | I       | Math     |
      +---------+----------+
      
  • Output:
      +---------+
      | class   |
      +---------+
      | Math    |
      +---------+
      
  • Explanation:
    • Math has 6 students, so we include it.
    • English has 1 student, so we do not include it.
    • Biology has 1 student, so we do not include it.
    • Computer has 1 student, so we do not include it.

Submitted Code

import pandas as pd

def find_classes(courses: pd.DataFrame) -> pd.DataFrame:
    groups = courses.groupby('class', as_index=False).count()
    return groups[groups['student'] >= 5][['class']]

Runtime: 255 ms | Beats 88.53%
Memory: 68.06 MB | Beats 62.98%

groupby()와 count()를 사용했다.

Other Solutions

1st

def find_classes(courses: pd.DataFrame) -> pd.DataFrame:
    df = courses.groupby('class')['class'].size().reset_index(name='count')
    return df[df['count']>=5][['class']]

count() 대신 size()로 그룹별 행 갯수를 셀 수도 있다.

2nd

def find_classes(courses: pd.DataFrame) -> pd.DataFrame:
    return courses['class'].value_counts()[(courses['class'].value_counts()>4)].index.to_frame()

value_counts()로 class 열에서 각 클래스가 등장한 횟수를 셀 경우 클래스 이름은 인덱스로, 등장 횟수는 값으로 반환된다. 5 이상만 필터링한 후 인덱스만 가져와서 이를 DataFrame으로 변환했다. 참고로 courses['class'].value_counts()까지를 변수에 저장하면 함수 중복 호출을 줄일 수 있다.

3rd

SELECT class
FROM Courses
GROUP BY class
HAVING COUNT(student) >= 5;

MySQL에서는 GROUP BY와 HAVING을 사용한다.

Leave a comment