Skip to main content
Lab : Hive Queries

Objectives:

To understand and be able to write simple HQL queries.

Due date:

Tuesday, Sep 18 at 8:30 pm (in class exercise).

Tools for the assignment

For this assignment, you will use your Cloudera Live Hive query engine.

What to turn in:

You need to turn in a word document containing the Screenshot of HQL query and the query result for steps 4 – 11 below.

Lab Description

In this lab you are asked to write some HQL queries. The data in this database is already in your Cloudera QuickStart VM that we installed last week.  The database consists of six tables as shown in the db schema – namely categories, customers, departments, order_items, orders and products.

  1. Launch Cloudera QuickStart VM. Open firefox browser and Launch Hue UI.
  2. Click on button on Hue select Files go to /user/hive/warehouse you should see folders for each of the tables mentioned above.
  3. Click on button on Hue select tables select categories Explore metadata related categories table
  4. Run the following query that you ran in Impala in the previous lab. Note the amount of time it takes to give you the result first time.
-- Most popular product categories
select c.category_name, count(order_item_quantity) as count
from order_items oi
inner join products p on oi.order_item_product_id = p.product_id
inner join categories c on c.category_id = p.product_category_id
group by c.category_name
order by count desc
limit 10;
  1. Run the following query that you ran in Impala in the previous lab. Note the amount of time it takes to give you the result first time.
-- top 10 revenue generating products
select p.product_id, p.product_name, r.revenue
from products p inner join
(select oi.order_item_product_id, sum(cast(oi.order_item_subtotal as float)) as revenue
from order_items oi inner join orders o
on oi.order_item_order_id = o.order_id
where o.order_status <> 'CANCELED'
and o.order_status <> 'SUSPECTED_FRAUD'
group by order_item_product_id) r
on p.product_id = r.order_item_product_id
order by r.revenue desc
limit 10;
  1. Create a new HQL query to List all the customers who bought at least one ‘Electronics’ product
  2. List first name and last name of all the customers who purchased products that has ‘Under Armour' in product name.
  3. List top 10 customers with largest total purchase – list first name, last name and total sales for each top customer   
  4. For every customer, find the total number of orders created. Sort the answers in decreasing order of the total number of orders.
  5. Find all the customers who have made at least 10 orders
  6. Find all the customers who have purchased at least 25 distinct products
  7. Submit word document with screenshot of query and query result to Black Board.

Comments

Popular posts from this blog

CUMIPMT and CUMPRINC function

CUMIPMT Cumulative interest payment function allows you to calculate the interest paid for a loan or from an investment from period A to period B. When getting a loan, CUMIPMT function can be used to calculate the total amount of interest paid in the first five months or from period 12 to period 20. A period can be a month, a week or two week. Loan Amount : 350,000.00 APR: 4.5% Down payment: 0.00 Years: 25 Payment per year: 12 From the above data, we can calculate the following: No of Period: 25 × 12 = 300 Periodic Rate: 4.5/12 = 0.375% Here is how you will substitute these values into the function. = CUMIPMT (periodic rate, No of period, vehicle price, start period, end period,  ) = CUMIPMT (0.375, 300, 350000, 1, 5, 0) In an excel worksheet, we use cell address instead of actual values as shown below: Here is the formula view of the worksheet: CUMPRINC Another related function is CUMPRINC. CUMPRINC function is used to calculate cumul

Excel PMT Function

PMT function is very useful for calculating monthly payment required to payback a loan or mortgage at a fixed rate. This function require a minimum of three inputs, periodic rate, number of periods, present value or the loan amount. Here is a simple example. Home Loan: 350,000.00 Interest rate: 4.5% Number of years to repay the loan: 25 Note: To calculate monthly payment, we need to find the monthly rate and number of months as shown above. Then it is simply a matter of substituting the values into the payment function, as shown in the formula view below.

BCG's Brand Advocacy Index

The Boston Consulting Group's (BCG) Brand Advocacy Index (BAI) is a metric developed to help companies measure the degree of customer advocacy for their brands. BAI focuses on the likelihood of customers to recommend a brand to others, which is a powerful indicator of brand strength and customer loyalty. Unlike other customer satisfaction or loyalty metrics, BAI emphasizes the importance of customer referrals and word-of-mouth marketing. BAI is calculated based on a survey where customers are asked about their willingness to recommend a brand to their friends, family, or colleagues. The responses are then used to compute a score, which ranges from -100 to 100. A higher BAI score indicates that a brand has more advocates who are likely to recommend the brand to others, while a lower score suggests that the brand has fewer advocates or even a higher number of detractors. BCG's research has shown that companies with higher BAI scores tend to experience higher growth rates and bett