Skip to main content

Lab – More Hive Queries and Hive Built-In functions

Objectives

This lab builds on the last lab. We will analyse the data uploaded to Hive using some HQL queries

Deliverable

Screenshot of each query results. One for each query. Each query should be fully displayed. No need to show the entire query result. Partial display of the result is sufficient.
  1. Open the Hive query editor in Cloudera
  2. Select all column from the salesdata table
  3. Select orderdate, salesamount and rowid from salesdata table and show 100 records.
  4. Modify the above step to show orderdate column heading as OrderDate, salesamount as Sales, and rowed as RowNum
  5. Create a new hive query to find OrderMonth, the total number of orders, total sales, average sales, minimum sales and maximum sales for each month. Sort so that newer sales statistics are at the top.  
  6. Modify the query you created for question 5 to show breakdown for each category
  7. Use GROUPING SETS to modify the above query to show the statistics for each category and for each ordermonth.
  8. Create a new query to extract ordered, orderdate, quantity, rate, discountpct, QuoteAmt and QuoteAmtRound.
    1. QuoteAmt = quantity*rate*(1-discountpct)
    2. QuoteAmtRound = quantity*rate*(1-discountpct)
Select orderid, orderdate, quantity, rate, discountpct,
   quantity*rate*(1-discountpct) as QuoteAmt,
   round(quantity*rate*(1-discountpct)) as QuoteAmtRound
from salesdata


  1. Use rand, floor and ceiling function to simulate new data. Create a query that has a random number, salesamount, ordered, wagemargin, Random sales amount, wage margin floor and wage margin ceiling column.
select rand(), saleamount, orderid, wagemargin,
   round(saleamount*rand()) as RandSaleAmount,
   floor(wagemargin) as WageMarginFlr,
   ceiling(wagemargin) as WageMarginCl
from salesdata


  1. Using the datediff function find the difference between two dates.  Create a query that has a ordered, productcategory, productsubcategory, orderdate, projectcompletedate, and duration.
Select orderid, productcategory, productsubcategory,
   orderdate, projectcompletedate,
   datediff(projectcompletedate, orderdate) as duration
from salesData


  1. Generate some statistics by month using year, month, and avg functions. Create a query that has productcategory,  productsubcategory, yearofOrder, monthofOrder, and Average duration.
select
   productcategory, productsubcategory,
   year(orderdate) year, month(orderdate) month,
   avg(datediff(projectcompletedate, orderdate)) Avgduration
from salesdata
group by    
   productcategory,productsubcategory,
   year(orderdate), month(orderdate)
order by
   3,4
  1. Find the last day of the month using data_sub(), to_date(), cast(), and concat() functions.
select distinct
date_sub(to_date(
concat(cast(year(orderdate) as string),"-",cast(month(orderdate)+1 as string),"-01")
)
,1 ),
orderdate
from salesdata
limit 100;


  1. Identify large sales with IF function.  
select orderid, saleamount,
if(saleamount > 5000, 1, 0) as LargeSale
from salesdata
limit 1000;


  1. Create sales size categories  
select orderid, saleamount,
   case
       when saleamount > 5000 then 'large'
       when saleamount > 1000 then 'medium'
       else 'small'
   end as SalesSize
from salesdata
limit 1000;


  1. Perform analysis by reassigning regions. Find yearly total sales for regions - southwest(south and west), east,  and central.
select  
case
    when lower(region) = 'west' then 'southwest'
    when lower(region) = 'south' then 'southwest'
    else region
end as newregion,
year(orderdate) as y,
sum(saleamount) as TotalSales
from salesdata
group by
case
    when lower(region) = 'west' then 'southwest'
    when lower(region) = 'south' then 'southwest'
    else region
end,
year(orderdate)
limit 100;


Comments

Popular posts from this blog

CUMIPMT and CUMPRINC function

CUMIPMT Cumulative interest payment function allows you to calculate the interest paid for a loan or from an investment from period A to period B. When getting a loan, CUMIPMT function can be used to calculate the total amount of interest paid in the first five months or from period 12 to period 20. A period can be a month, a week or two week. Loan Amount : 350,000.00 APR: 4.5% Down payment: 0.00 Years: 25 Payment per year: 12 From the above data, we can calculate the following: No of Period: 25 × 12 = 300 Periodic Rate: 4.5/12 = 0.375% Here is how you will substitute these values into the function. = CUMIPMT (periodic rate, No of period, vehicle price, start period, end period,  ) = CUMIPMT (0.375, 300, 350000, 1, 5, 0) In an excel worksheet, we use cell address instead of actual values as shown below: Here is the formula view of the worksheet: CUMPRINC Another related function is CUMPRINC. CUMPRINC function is used to calculate cumul

Excel PMT Function

PMT function is very useful for calculating monthly payment required to payback a loan or mortgage at a fixed rate. This function require a minimum of three inputs, periodic rate, number of periods, present value or the loan amount. Here is a simple example. Home Loan: 350,000.00 Interest rate: 4.5% Number of years to repay the loan: 25 Note: To calculate monthly payment, we need to find the monthly rate and number of months as shown above. Then it is simply a matter of substituting the values into the payment function, as shown in the formula view below.

BCG's Brand Advocacy Index

The Boston Consulting Group's (BCG) Brand Advocacy Index (BAI) is a metric developed to help companies measure the degree of customer advocacy for their brands. BAI focuses on the likelihood of customers to recommend a brand to others, which is a powerful indicator of brand strength and customer loyalty. Unlike other customer satisfaction or loyalty metrics, BAI emphasizes the importance of customer referrals and word-of-mouth marketing. BAI is calculated based on a survey where customers are asked about their willingness to recommend a brand to their friends, family, or colleagues. The responses are then used to compute a score, which ranges from -100 to 100. A higher BAI score indicates that a brand has more advocates who are likely to recommend the brand to others, while a lower score suggests that the brand has fewer advocates or even a higher number of detractors. BCG's research has shown that companies with higher BAI scores tend to experience higher growth rates and bett