1068. Product Sales Analysis I
Description
Pandas Schema
data = [[1, 100, 2008, 10, 5000], [2, 100, 2009, 12, 5000], [7, 200, 2011, 15, 9000]]
sales = pd.DataFrame(data, columns=['sale_id', 'product_id', 'year', 'quantity', 'price']).astype({'sale_id':'Int64', 'product_id':'Int64', 'year':'Int64', 'quantity':'Int64', 'price':'Int64'})
data = [[100, 'Nokia'], [200, 'Apple'], [300, 'Samsung']]
product = pd.DataFrame(data, columns=['product_id', 'product_name']).astype({'product_id':'Int64', 'product_name':'object'})
Table: Sales
+-------------+-------+ | Column Name | Type | +-------------+-------+ | sale_id | int | | product_id | int | | year | int | | quantity | int | | price | int | +-------------+-------+ (sale_id, year) is the primary key (combination of columns with unique values) of this table. product_id is a foreign key (reference column) to Product table. Each row of this table shows a sale on the product product_id in a certain year. Note that the price is per unit.
Table: Product
+--------------+---------+ | Column Name | Type | +--------------+---------+ | product_id | int | | product_name | varchar | +--------------+---------+ product_id is the primary key (column with unique values) of this table. Each row of this table indicates the product name of each product.
Write a solution to report the product_name, year, and price for each sale_id in the Sales table.
Return the resulting table in any order.
The result format is in the following example.
Example 1:
- Input:
Sales table: +---------+------------+------+----------+-------+ | sale_id | product_id | year | quantity | price | +---------+------------+------+----------+-------+ | 1 | 100 | 2008 | 10 | 5000 | | 2 | 100 | 2009 | 12 | 5000 | | 7 | 200 | 2011 | 15 | 9000 | +---------+------------+------+----------+-------+ Product table: +------------+--------------+ | product_id | product_name | +------------+--------------+ | 100 | Nokia | | 200 | Apple | | 300 | Samsung | +------------+--------------+
- Output:
+--------------+-------+-------+ | product_name | year | price | +--------------+-------+-------+ | Nokia | 2008 | 5000 | | Nokia | 2009 | 5000 | | Apple | 2011 | 9000 | +--------------+-------+-------+
- Explanation:
From sale_id = 1, we can conclude that Nokia was sold for 5000 in the year 2008.
From sale_id = 2, we can conclude that Nokia was sold for 5000 in the year 2009.
From sale_id = 7, we can conclude that Apple was sold for 9000 in the year 2011.
Submitted Code
import pandas as pd
def sales_analysis(sales: pd.DataFrame, product: pd.DataFrame) -> pd.DataFrame:
return product.merge(sales, on='product_id', how='inner')[['product_name', 'year', 'price']]
Runtime: 328 ms | Beats 85.59%
Memory: 70.60 MB | Beats 11.06%
product 데이터프레임을 기준으로 합치고, inner 방식으로 일치하는 값이 없는 행은 제거했다. 또 on 파라미터로 두 데이터프레임에 공통으로 존재하는 열을 명시해서 효율을 높였다.
Other Solutions
1st
SELECT product_name, year, price
FROM Sales
JOIN Product USING(product_id);
두 테이블에 동일한 컬럼명(product_id)이 있을 때만 USING을 사용하여 해당 컬럼명을 조건으로 join할 수 있다.
2nd
SELECT p.product_name, s.year, s.price
FROM Sales s
JOIN Product p
ON s.product_id = p.product_id;
ON은 두 테이블에 동일한 컬럼명이 없어도 사용 가능하고 join 조건을 다양하게 설정할 수 있다.