1
GB6023
Descriptive Statistics
2
Lecture 2
Exploring data with graphs and numerical summaries
Obj...
1. Use graphs to describe data.
2. Use numerical methods to summarize data.
3. Cross-tabulation
3
Types of statistics
Descriptive statistics
Describes the characteristics of a sample
Summarizes and organizes large amounts of data
Inferential statistics
A method for making decisions or estimates about a population based on results obtained from a sample
4
Parameter and Statistic
A parameter is a numerical summary of a population.
A statistic is a numerical summary of a sample taken from the population.
5
Variable
A variable is any characteristic recorded for the subjects of a study.
6
Variation in Data
The term "variable" reflects the fact that the recorded data values vary from one subject to another.
7
Example: Statistics for KP2 students
Variables: Age, CGPA (PNGK), Major, Marital status
8
Data observations
Each observation can be:
Quantitative
Qualitative / Categorical
9
Qualitative/categorical variables
Each observation belongs to one of a set of categories.
Examples:
Gender (male or female)
Religious affiliation (Muslim, Catholic, Hindu, ...)
Type of residence (apartment, condo, ...)
Belief in life after death (yes or no)
10
Quantitative variables
Each observation takes a numerical value.
Examples:
Age
Number of siblings
Annual income
Number of years of formal education
11
Graphical and numerical summaries: describing the main features of a variable
For quantitative variables: the main features to look at are the center and the spread.
For categorical variables: the main feature to look at is the percentage of observations in each category.
12
Frequency table
A method for organizing data
Lists every value of the variable together with the number of observations for each value.
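A minimal sketch of how such a table could be built, including the proportion and percentage for each category. The `observations` list and the `frequency_table` helper are hypothetical and used only for illustration; they are not data from the lecture.

```python
from collections import Counter

# Hypothetical observations of one categorical variable (marital status).
# These values are made up for illustration and do not come from the lecture.
observations = ["single", "married", "single", "single", "divorced",
                "married", "single", "married", "single", "single"]

def frequency_table(values):
    """Return (category, count, proportion, percentage) rows, most frequent first."""
    counts = Counter(values)
    n = len(values)
    # proportion = count / n, percentage = 100 * proportion
    return [(cat, c, c / n, 100 * c / n) for cat, c in counts.most_common()]

table = frequency_table(observations)
print(f"{'category':<10}{'count':>6}{'proportion':>12}{'percent':>10}")
for cat, count, prop, pct in table:
    print(f"{cat:<10}{count:>6}{prop:>12.2f}{pct:>9.1f}%")
```

The proportion for a category is its count divided by the total number of observations, and the percentage is that proportion multiplied by 100.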
13
Example: Shark Attacks
14
Example: Shark Attacks
What is the variable?
Is it categorical or quantitative?
How is the proportion for Florida calculated?
How is the % for Florida calculated?
Example: Shark Attacks
15
Insights: what the data tell us about shark attacks
Example: Shark Attacks
16
How Can We Describe Data Using Graphical Summaries?
17
Graphs for Categorical Data
Pie Chart: a circle with a slice of pie for each category
Bar Graph: a graph that displays a vertical bar for each category
18
Example: Sources of Electricity Use in the U.S. and Canada
19
Pie Chart
20
Bar Chart
21
Pie Chart vs. Bar Chart
Which graph do you prefer? Why?
22
Graphs for Quantitative Data
Dot Plot: shows a dot for each observation
Stem-and-Leaf Plot: portrays the individual observations
Histogram: uses bars to portray the data
23
Example: Sodium and Sugar Amounts in Cereals
24
Dotplot for Sodium in Cereals
Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180
25
Stem-and-Leaf Plot for Sodium in Cereal
Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180
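A stem-and-leaf plot for these 20 values could be produced roughly as follows. This sketch assumes the stem is the value with its last digit removed (hundreds and tens) and the leaf is the last digit; the slide's own plot is not recoverable, so that split and the `stem_and_leaf` helper are assumptions.

```python
from collections import defaultdict

sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

def stem_and_leaf(values):
    """Map each stem (value // 10) to the sorted list of its leaves (value % 10)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

plot = stem_and_leaf(sodium)
# Print every stem from the smallest to the largest, including empty ones,
# so that gaps in the data stay visible.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Because every leaf is printed, the individual data values can still be read off the display, for example stem 12 with leaves 5 and 5 stands for the two observations of 125.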
26
Frequency Table
Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180
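For a quantitative variable the frequency table is usually built on intervals. The sketch below assumes intervals of width 40 starting at 0 (the interval choice on the original slide is not recoverable, so `WIDTH` is an assumption) and prints a row of asterisks per interval as a rough text histogram.

```python
from collections import Counter

sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

WIDTH = 40  # assumed interval width, not taken from the lecture

def binned_frequencies(values, width):
    """Count integer observations in intervals [k*width, (k+1)*width - 1].

    Empty intervals between the smallest and largest bin are kept with count 0.
    """
    counts = Counter(v // width for v in values)
    return [(k * width, (k + 1) * width - 1, counts.get(k, 0))
            for k in range(min(counts), max(counts) + 1)]

table = binned_frequencies(sodium, WIDTH)
n = len(sodium)
for low, high, count in table:
    print(f"{low:>3} to {high:<3} {count:>2} {100 * count / n:>5.1f}%  {'*' * count}")
```

Changing `WIDTH` changes how smooth or detailed the resulting histogram looks, which is the flexibility in defining intervals that histograms offer.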
27
Histogram for Sodium in Cereals
28
Which Graph?
Dot plot and stem-and-leaf plot:
More useful for small data sets
Data values are retained
Histogram:
More useful for large data sets
Most compact display
More flexibility in defining intervals
29
Shape of a Distribution
Overall pattern
Clusters? Outliers? Symmetric? Skewed? Unimodal? Bimodal?
30
Symmetric or Skewed?
31
Example: Hours of TV Watching
32
Identify the minimum and maximum sugar values:
33
Consider a data set containing IQ scores for the general public:
What shape would you expect a histogram of this data set to have?
a. Symmetric b. Skewed to the left c. Skewed to the right d. Bimodal
34
Consider a data set of the scores of students on a very easy exam in which most score very well but a few score very poorly:
What shape would you expect a histogram of this data set to have?
a. Symmetric b. Skewed to the left c. Skewed to the right d. Bimodal
35
How Can We Describe the Center of Quantitative Data?
36
Mean
The sum of the observations divided by the number of observations
$\bar{x} = \dfrac{\sum x}{n}$
37
Median
The midpoint of the observations when they are ordered from the smallest to the largest (or from the largest to the smallest)
38
Find the mean and median
CO2 Pollution levels in 8 largest nations measured in metric tons per person:
2.3 1.1 19.7 9.8 1.8 1.2 0.7 0.2
a. Mean = 4.6 Median = 1.5
b. Mean = 4.6 Median = 5.8
c. Mean = 1.5 Median = 4.6
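The two definitions can be checked on these eight values with Python's standard `statistics` module. This is only a quick sketch; the variable names are illustrative.

```python
from statistics import mean, median

# CO2 pollution levels (metric tons per person) from the exercise above
co2 = [2.3, 1.1, 19.7, 9.8, 1.8, 1.2, 0.7, 0.2]

co2_mean = mean(co2)      # sum of the observations divided by n = 8
co2_median = median(co2)  # n is even, so the average of the two middle ordered values

print(f"mean = {co2_mean:.1f}, median = {co2_median:.1f}")
# prints: mean = 4.6, median = 1.5
```

The ordered values are 0.2, 0.7, 1.1, 1.2, 1.8, 2.3, 9.8, 19.7, so the median is the average of 1.2 and 1.8.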
39
Outlier
An observation that falls well above or below the overall set of data
The mean can be highly influenced by an outlier
The median is resistant: it is affected little, if at all, by an outlier
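One way to see the difference is to recompute both measures after dropping a single extreme value. The sketch below reuses the CO2 values from the earlier exercise and, purely for illustration, treats the largest value (19.7) as the unusual observation.

```python
from statistics import mean, median

co2 = [2.3, 1.1, 19.7, 9.8, 1.8, 1.2, 0.7, 0.2]
without_largest = sorted(co2)[:-1]  # same data with 19.7 removed

# How far each measure moves when the single largest value is dropped
shift_mean = mean(co2) - mean(without_largest)
shift_median = median(co2) - median(without_largest)

print(f"mean moves by   {shift_mean:.2f}")
print(f"median moves by {shift_median:.2f}")
```

The mean moves by a little over 2 units, while the median moves by only about 0.3, because the median depends on the order of the values rather than on their size.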
40
Mode
The value that occurs most frequently.
The mode is most often used with categorical data
41
Comparison of the mean, median and mode
Suitability
Measurement scale   Mean   Median   Mode
Nominal             no     no       yes
Ordinal             no     yes      yes
Interval            yes    yes      yes
Ratio               yes    yes      yes
42
How Can We Describe the Spread of Quantitative Data?
43
Measuring Spread: Range
Range: difference between the largest and smallest observations
44
Measuring Spread: Standard Deviation
Creates a measure of variation by summarizing the deviations of each observation from the mean and calculating an adjusted average of these deviations
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
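The formula can be followed step by step in code. This sketch applies it to the cereal sodium data used earlier in the lecture and compares the result with `statistics.stdev`, which also uses the n - 1 divisor; `sample_sd` is an illustrative helper name.

```python
import math
from statistics import stdev

sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

def sample_sd(values):
    """Sample standard deviation: sqrt of the sum of squared deviations over n - 1."""
    n = len(values)
    x_bar = sum(values) / n                        # the mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

s = sample_sd(sodium)
print(f"s = {s:.1f}")
print(f"statistics.stdev agrees: {math.isclose(s, stdev(sodium))}")
```

Dividing by n - 1 rather than n is what makes this an "adjusted" average of the squared deviations.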
45
How Can Measures of Position Describe Spread?
46
Quartiles
Split the data into four parts
The median is the second quartile, Q2
The first quartile, Q1, is the median of the lower half of the observations
The third quartile, Q3, is the median of the upper half of the observations
47
Measuring Spread: Interquartile Range
The interquartile range is the distance between the third quartile and first quartile:
IQR = Q3 - Q1
48
Detecting Potential Outliers
An observation is a potential outlier if it falls more than 1.5 x IQR below the first quartile or more than 1.5 x IQR above the third quartile
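A short sketch of the quartile, IQR and 1.5 x IQR calculations on the cereal sodium data. It uses the median-of-halves definition of Q1 and Q3 given above; leaving the middle value out of both halves when n is odd is an assumed convention, and the helper name `quartiles` is illustrative.

```python
from statistics import median

sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

def quartiles(values):
    """Return (Q1, Q2, Q3), with Q1 and Q3 as medians of the lower and upper halves.

    Assumes at least two observations. When n is odd, the middle value is
    left out of both halves (an assumed convention).
    """
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data), median(data[-half:])

q1, q2, q3 = quartiles(sodium)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr    # values below this are potential outliers
high_fence = q3 + 1.5 * iqr   # values above this are potential outliers
potential_outliers = [x for x in sodium if x < low_fence or x > high_fence]

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: {low_fence} and {high_fence}")
print(f"potential outliers: {potential_outliers}")
```

For these data the fences are 25 and 345, so the cereal with 0 mg of sodium is flagged as a potential outlier.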
49
The Five-Number Summary
The five-number summary of a dataset:
Minimum value
First quartile
Median
Third quartile
Maximum value
50
Boxplot
A box is constructed from Q1 to Q3
A line is drawn inside the box at the median
A line extends outward from the lower end of the box to the smallest observation that is not a potential outlier
A line extends outward from the upper end of the box to the largest observation that is not a potential outlier
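The construction rules above translate into a small calculation of what a boxplot would draw. This is a sketch under the same median-of-halves quartile convention; `boxplot_stats` is a hypothetical helper, and the actual drawing is left to a plotting library.

```python
from statistics import median

sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

def boxplot_stats(values):
    """Compute the box, median line, whisker ends and potential outliers."""
    data = sorted(values)
    half = len(data) // 2
    q1, q3 = median(data[:half]), median(data[-half:])
    step = 1.5 * (q3 - q1)
    # Observations that are not potential outliers
    inside = [x for x in data if q1 - step <= x <= q3 + step]
    return {
        "box": (q1, q3),                       # box runs from Q1 to Q3
        "median": median(data),                # line inside the box
        "whiskers": (inside[0], inside[-1]),   # most extreme non-outliers
        "outliers": [x for x in data if x < q1 - step or x > q3 + step],
    }

stats = boxplot_stats(sodium)
for name, value in stats.items():
    print(f"{name}: {value}")
```

For the sodium data the lower whisker stops at 70 rather than at the minimum of 0, because 0 lies more than 1.5 x IQR below Q1 and is shown separately as a potential outlier.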
51
Boxplot for Sodium Data
Sodium Data (ordered): 0 70 125 125 140 150 170 170 180 200 200 210 210 220 220 230 250 260 290 290
Five Number Summary:
Min: 0
Q1: 145
Med: 200
Q3: 225
Max: 290
52
Boxplot for Sodium in Cereals
Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180