Description

Pandas Schema

data = [[1, 1, 0], [1, 1, 1], [1, 1, 2], [1, 2, 3], [1, 2, 4], [2, 1, 5], [2, 1, 6]]
actor_director = pd.DataFrame(data, columns=['actor_id', 'director_id', 'timestamp']).astype({'actor_id':'int64', 'director_id':'int64', 'timestamp':'int64'})


Table: ActorDirector

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| actor_id    | int     |
| director_id | int     |
| timestamp   | int     |
+-------------+---------+
timestamp is the primary key (column with unique values) for this table.

Write a solution to find all the pairs (actor_id, director_id) where the actor has cooperated with the director at least three times.

Return the result table in any order.

The result format is in the following example.

Example 1:

  • Input:
      ActorDirector table:
      +-------------+-------------+-------------+
      | actor_id    | director_id | timestamp   |
      +-------------+-------------+-------------+
      | 1           | 1           | 0           |
      | 1           | 1           | 1           |
      | 1           | 1           | 2           |
      | 1           | 2           | 3           |
      | 1           | 2           | 4           |
      | 2           | 1           | 5           |
      | 2           | 1           | 6           |
      +-------------+-------------+-------------+
      
  • Output:
      +-------------+-------------+
      | actor_id    | director_id |
      +-------------+-------------+
      | 1           | 1           |
      +-------------+-------------+
      
  • Explanation: The only pair is (1, 1) where they cooperated exactly 3 times.

Submitted Code

import pandas as pd

def actors_and_directors(actor_director: pd.DataFrame) -> pd.DataFrame:
    df = actor_director.groupby(['actor_id', 'director_id']).size().reset_index(name='count')
    filtered_pairs = df[df['count'] >= 3]
    return filtered_pairs[['actor_id', 'director_id']]

Runtime: 273 ms | Beats 77.23%
Memory: 67.50 MB | Beats 90.81%

groupby()와 size()로 개수를 센 뒤 reset_index()로 새 컬럼을 만들어서 결과를 저장했다. 그 후 3 이상인 그룹만 필터링하면 된다.

Other Solutions

1st

import pandas as pd

def actors_and_directors(actor_director: pd.DataFrame) -> pd.DataFrame:

    actor_director = actor_director.groupby(['actor_id', 'director_id'])['timestamp'
                      ].size().reset_index()
   
    return actor_director[actor_director.timestamp >= 3].iloc[:,[0,1]]

2nd

select actor_id,director_id
from ActorDirector 
group by actor_id,director_id
Having count(timestamp)>=3;

Leave a comment