# ITS632 UCumberlands Week 3 Data Mining Video Analysis Help

Instructions

a. View each video. The questions are based on information in the videos.

c. You must use your own words, not the words of an online source or the words of another student, when writing an answer.

1. You are considering purchasing a local business that ships items to a variety of destinations. You begin to analyze the business’s data but become suspicious about its veracity, including the number of shipments made each week.

a. How could you estimate the number of weekly shipments if the only data you are given is the Order ID of each shipment?

b. How can the Order ID be considered both nominal and ordinal?
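One way to reason about item (a) is sketched below. It is only an illustration under stated assumptions: Order IDs are integers assigned sequentially with no gaps, one ID per shipment, and the first and last ID seen in each week are known. The ID values and the helper name `estimate_shipments` are hypothetical and do not come from the videos.

```python
# Hypothetical first and last Order IDs observed in each week (made-up values).
weekly_id_ranges = {
    "week 1": (10001, 10250),
    "week 2": (10251, 10430),
}


def estimate_shipments(first_id: int, last_id: int) -> int:
    """Count the IDs in an inclusive range.

    Assumes one sequential ID per shipment and no skipped numbers.
    """
    return last_id - first_id + 1


# Estimated shipment count for each week under the assumptions above.
estimates = {
    week: estimate_shipments(first, last)
    for week, (first, last) in weekly_id_ranges.items()
}
print(estimates)
```

If IDs are skipped, reused, or not issued in order, the estimate no longer holds, which is itself a data quality point worth raising in an answer.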

2. Analyzing Zip Codes

http://www.structnet.com/instructions/zip_min_max_by_state.html

a. Use a Google search to determine the lowest and highest Zip code numbers for the State of Maine.

b. How many Zip codes are assigned to the State of Maine?

c. How did you calculate the answer for item (b)?

d. Determine the number of Zip codes for the State of Ohio. Show your calculations.

e. Determine the population density of Maine and the population of Ohio based on the number of Zip codes.

f. What type of attribute is Zip code?

g. What type of attribute is population?
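The range arithmetic behind items (b) through (d) can be sketched as follows. This is a minimal illustration, assuming the count is taken as the size of the inclusive range between the lowest and highest Zip code. The Zip values shown are placeholders, not the real figures for Maine or Ohio, and the function name `zip_range_size` is hypothetical.

```python
def zip_range_size(lowest: str, highest: str) -> int:
    """Return how many numbers lie in the inclusive range lowest..highest.

    Zip codes are passed as strings so leading zeros are preserved on input;
    they are converted to integers only for the subtraction.
    """
    return int(highest) - int(lowest) + 1


# Placeholder values only; substitute the lowest and highest codes you look up.
example_count = zip_range_size("01000", "01999")
print(example_count)
```

Because not every number in a state's range is necessarily assigned, the result is best treated as an upper bound rather than an exact count.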

3. Why is height considered a continuous attribute?

4. The Minnesota Department of Natural Resources (DNR) sells logging permits to loggers who want to fell trees on State land and sell the logs to mills. Loggers register with the State of Minnesota by supplying their company name and address. No logger is allowed to have more than six active permits at one time.

a. If you know the total number of active permits on February 1, 2019, and you know the total number of loggers who have active permits, how could you estimate the number of active permits each logger has?

b. What factors might result in significant data quality error in your estimate?

Hint: See the video for Chapter 2B.

c. What data cleaning steps would you recommend to the Minnesota DNR regarding loggers?
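The estimate asked for in item 4(a) can be sketched as a simple average. This is only one possible approach, and the permit and logger totals below are made-up numbers used for illustration. The only figure taken from the question is the limit of six active permits per logger.

```python
MAX_PERMITS_PER_LOGGER = 6  # limit stated in the question


def average_permits(total_permits: int, total_loggers: int) -> float:
    """Estimate permits per logger as the mean of permits over loggers."""
    if total_loggers <= 0:
        raise ValueError("total_loggers must be positive")
    return total_permits / total_loggers


# Hypothetical totals, not real Minnesota DNR data.
avg = average_permits(total_permits=420, total_loggers=120)

# An average above the cap would signal a data problem, for example
# one logger registered under several spellings of its company name.
exceeds_cap = avg > MAX_PERMITS_PER_LOGGER
print(avg, exceeds_cap)
```

The mean says nothing about how permits are distributed among loggers, so it is a rough estimate whose accuracy depends on a clean count of distinct loggers.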

5. In the United States presidential election of 1936, two different polling organizations made predictions about who would win the contest based on two very different sample sizes. One organization polled 2.4 million people, while the other organization polled only 50,000 people. The organization polling fewer people was accurate in its prediction of a winner. Research this historic case of sampling and explain the error that was made.

6. The following scatter plot shows the amount of sleep needed per day based on age.

Explain the correlation between hours and age.

### WEEK 3 CH02A – TYPES OF DATA

### WEEK 3 CH02B – DATA QUALITY

### SIMILARITY AND DISSIMILARITY

