STATISTICS

Transcript of STATISTICS

  • 1 GB6023

    Descriptive Statistics

  • 2 Lecture 2: Exploring data with graphs and numerical summaries

    Obj... 1. Use graphs to describe data.

    2. Use numerical methods to summarize data.

    3. Cross-tabulation

  • 3

    Types of statistics

    Descriptive statistics: describes the characteristics of a sample; summarizes and organizes large amounts of data.

    Inferential statistics: a method for making decisions or estimates about a population based on results obtained from a sample.

  • 4 Parameter and Statistic

    A parameter is a numerical summary of a population.

    A statistic is a numerical summary of a sample taken from the population.

  • 5 Variable

    A variable is any characteristic recorded for the subjects of a study.

  • 6 Variation in Data

    The term "variable" reflects that the values recorded for the data vary from one observation to another.

  • 7

    Example: Statistics for KP2 students

    Variables: Age, CGPA, Major, Marital status

  • 8

    Data observations

    Each observation can be:

    Quantitative

    Qualitative / Categorical

  • 9 Qualitative / categorical variables

    Each observed individual falls into one of a set of categories.

    Examples: Gender (male or female); Religious affiliation (Muslim, Catholic, Hindu, ...); Residence (Apt, Condo, ...); Belief in life after death (Yes or No)

  • 10

    Quantitative variables

    Observations take numerical values.

    Examples: Age; Number of siblings; Annual income; Number of years of formal education

  • 11

    Graphical and numerical summaries describe the main features of a variable

    For quantitative variables: the key features are the center and the spread.

    For categorical variables: the key feature is the percentage of observations in each category.

  • 12

    Frequency table

    A method of organizing data

    Lists every value of the variable together with the number of observations for each value.
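
    A frequency table can be sketched in a few lines of Python. The observations below are hypothetical (made-up marital-status values, not data from the lecture), and the function name `frequency_table` is only an illustrative choice. Each row gives the category, its count, its proportion (count / total) and its percentage (100 x count / total).

```python
from collections import Counter

# Hypothetical observations of one categorical variable (marital status).
# These values are invented for illustration; they are not lecture data.
observations = ["single", "married", "single", "single", "married",
                "divorced", "single", "married", "single", "single"]


def frequency_table(values):
    """Return rows of (category, count, proportion, percentage)."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    # most_common() lists categories from most to least frequent.
    for category, count in counts.most_common():
        rows.append((category, count, count / total, 100 * count / total))
    return rows


table = frequency_table(observations)
for category, count, proportion, percentage in table:
    print(f"{category:<10}{count:>5}{proportion:>8.2f}{percentage:>7.1f}%")
```

    The proportions always sum to 1 and the percentages to 100, which is a quick check on any hand-built table.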

  • 13

    Example: Shark Attacks

  • 14

    Example: Shark Attacks

    What is the variable?

    Is it categorical or quantitative?

    How is the proportion for Florida calculated?

    How is the % for Florida calculated?

  • 15

    Insights: what the data tell us about shark attacks

    Example: Shark Attacks

  • 16

    How Can We Describe Data Using Graphical Summaries?

  • 17

    Graphs for Categorical Data

    Pie Chart: A circle having a slice of pie for each category

    Bar Graph: A graph that displays a vertical bar for each category

  • 18

    Example: Sources of Electricity Use in the U.S. and Canada

  • 19

    Pie Chart

  • 20

    Bar Chart

  • 21

    Pie Chart vs. Bar Chart

    Which graph do you prefer? Why?

  • 22

    Graphs for Quantitative Data

    Dot Plot: shows a dot for each observation

    Stem-and-Leaf Plot: portrays the individual observations

    Histogram: uses bars to portray the data

  • 23

    Example: Sodium and Sugar Amounts in Cereals

  • 24

    Dotplot for Sodium in Cereals

    Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180
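
    The dot plot itself did not survive in this transcript. As a rough stand-in, the sketch below prints a text version of a dot plot for the sodium data: one row per distinct value and one dot per observation. A drawn dot plot would place the values along a horizontal axis; the row layout and the name `dot_plot` are assumptions of this sketch.

```python
from collections import Counter

# Sodium data from the slide.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]


def dot_plot(values):
    """Return one text line per distinct value, with one dot per observation."""
    counts = Counter(values)
    return [f"{value:>4} | {'.' * counts[value]}" for value in sorted(counts)]


lines = dot_plot(sodium)
print("\n".join(lines))
```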

  • 25

    Stem-and-Leaf Plot for Sodium in Cereal

    Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180
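
    A stem-and-leaf plot for the same data can be sketched as follows. This version assumes the last digit is the leaf and everything before it is the stem, and it leaves out stems that have no leaves; the plot on the original slide may have used a different split. The name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Sodium data from the slide.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]


def stem_and_leaf(values):
    """Return 'stem | leaves' lines; stem = value // 10, leaf = value % 10."""
    groups = defaultdict(list)
    for value in sorted(values):
        groups[value // 10].append(value % 10)
    return [f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(groups.items())]


lines = stem_and_leaf(sodium)
print("\n".join(lines))
```

    Every observation can be read back from its row (stem 12 with leaves 55 is 125 and 125), which is why the individual data values are retained in this display.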

  • 26

    Frequency Table

    Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180
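
    For a quantitative variable, the frequency table is built over intervals, and those counts are the bar heights of a histogram. The sketch below assumes an interval width of 40 starting at 0; the width used on the original slide is not recoverable from this transcript, so `BIN_WIDTH` is an assumption. The code also assumes non-negative integer data.

```python
# Sodium data from the slide.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

BIN_WIDTH = 40  # assumed interval width; not stated in the transcript


def binned_counts(values, width):
    """Count observations in the intervals [0, width), [width, 2*width), ..."""
    n_bins = max(values) // width + 1
    counts = [0] * n_bins
    for value in values:
        counts[value // width] += 1
    return counts


counts = binned_counts(sodium, BIN_WIDTH)
for i, count in enumerate(counts):
    low = i * BIN_WIDTH
    high = low + BIN_WIDTH - 1
    # The run of '#' characters is a sideways text histogram bar.
    print(f"{low:>3}-{high:<3} {count:>2} {'#' * count}")
```

    Changing `BIN_WIDTH` changes the picture, which is the flexibility in defining intervals that a histogram offers.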

  • 27

    Histogram for Sodium in Cereals

  • 28

    Which Graph?

    Dot-plot and stem-and-leaf plot: More useful for small data sets; Data values are retained

    Histogram: More useful for large data sets; Most compact display; More flexibility in defining intervals

  • 29

    Shape of a Distribution

    Overall pattern: Clusters? Outliers? Symmetric? Skewed? Unimodal? Bimodal?

  • 30

    Symmetric or Skewed ?

  • 31

    Example: Hours of TV Watching

  • 32

    Identify the minimum and maximum sugar values:

  • 33

    Consider a data set containing IQ scores for the general public:

    What shape would you expect a histogram of this data set to have?

    a. Symmetric b. Skewed to the left c. Skewed to the right d. Bimodal

  • 34

    Consider a data set of the scores of students on a very easy exam in which most score very well but a few score very poorly:

    What shape would you expect a histogram of this data set to have?

    a. Symmetric b. Skewed to the left c. Skewed to the right d. Bimodal

  • 35

    How Can We Describe the Center of Quantitative Data?

  • 36

    Mean

    The sum of the observations divided by the number of observations

    $\bar{x} = \frac{\sum x}{n}$

  • 37

    Median

    The midpoint of the observations when they are ordered from the smallest to the largest (or from the largest to the smallest)

  • 38

    Find the mean and median

    CO2 Pollution levels in 8 largest nations measured in metric tons per person:

    2.3 1.1 19.7 9.8 1.8 1.2 0.7 0.2

    a. Mean = 4.6 Median = 1.5

    b. Mean = 4.6 Median = 5.8

    c. Mean = 1.5 Median = 4.6
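
    The answer can be checked directly from the two definitions. The sketch below computes the mean as the sum divided by the number of observations, and the median as the midpoint of the ordered values (the average of the two middle values when the count is even). Variable names are illustrative.

```python
# CO2 pollution levels from the slide (metric tons per person).
co2 = [2.3, 1.1, 19.7, 9.8, 1.8, 1.2, 0.7, 0.2]

# Mean: sum of the observations divided by the number of observations.
mean = sum(co2) / len(co2)

# Median: middle of the ordered observations.
ordered = sorted(co2)
middle = len(ordered) // 2
if len(ordered) % 2 == 0:
    median = (ordered[middle - 1] + ordered[middle]) / 2
else:
    median = ordered[middle]

print(f"mean = {mean:.1f}, median = {median:.1f}")
# prints: mean = 4.6, median = 1.5
```

    This agrees with option a. The mean is about three times the median because two large values (19.7 and 9.8) pull it upward.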

  • 39

    Outlier

    An observation that falls well above or below the overall set of data

    The mean can be highly influenced by an outlier

    The median is resistant: it is affected little, if at all, by an outlier
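
    A small experiment illustrates the difference. The sketch below takes the CO2 values from the earlier slide and makes the largest one ten times larger; that change is hypothetical and serves only to exaggerate the outlier. It uses `statistics.mean` and `statistics.median` from the Python standard library.

```python
import statistics

# CO2 pollution levels from the earlier slide (metric tons per person).
co2 = [2.3, 1.1, 19.7, 9.8, 1.8, 1.2, 0.7, 0.2]

# Hypothetical change: inflate the largest value from 19.7 to 197.0.
inflated = [197.0 if x == 19.7 else x for x in co2]

mean_before = statistics.mean(co2)
mean_after = statistics.mean(inflated)
median_before = statistics.median(co2)
median_after = statistics.median(inflated)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
```

    The mean rises from about 4.6 to about 26.8, while the median stays at 1.5, because the median depends only on the order of the middle values.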

  • 40

    Mode

    The value that occurs most frequently.

    The mode is most often used with categorical data
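
    The sketch below hints at why the mode suits categorical data better. For the sodium data, six different values tie for most frequent, so the mode says little; for a categorical variable, it simply names the most common category. The `majors` list is hypothetical, and `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# Sodium data from the earlier slides.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

# Hypothetical categorical observations (not lecture data).
majors = ["math", "physics", "math", "biology", "math"]

# multimode returns every value tied for the highest frequency.
sodium_modes = sorted(statistics.multimode(sodium))
major_mode = statistics.mode(majors)

print(sodium_modes)
print(major_mode)
```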

  • 41

    Comparison of the mean, median and mode

    Suitability

    Measurement scale   Mean   Median   Mode

    Nominal             no     no       yes

    Ordinal             no     yes      yes

    Interval            yes    yes      yes

    Ratio               yes    yes      yes

  • 42

    How Can We Describe the Spread of Quantitative Data?

  • 43

    Measuring Spread: Range

    Range: difference between the largest and smallest observations

  • 44

    Measuring Spread: Standard Deviation

    Creates a measure of variation by summarizing the deviations of each observation from the mean and calculating an adjusted average of these deviations

    $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
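
    As a worked example, the sketch below computes the range and the sample standard deviation of the sodium data step by step: deviations from the mean, squared, summed, divided by n - 1, then square-rooted. It also cross-checks the result against `statistics.stdev` from the standard library, which uses the same n - 1 divisor.

```python
import math
import statistics

# Sodium data from the earlier slides.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

# Range: largest observation minus smallest observation.
data_range = max(sodium) - min(sodium)

# Standard deviation: square root of the sum of squared deviations
# from the mean, divided by n - 1.
n = len(sodium)
mean = sum(sodium) / n
squared_deviations = [(x - mean) ** 2 for x in sodium]
s = math.sqrt(sum(squared_deviations) / (n - 1))

print(f"range = {data_range}, mean = {mean}, s = {s:.2f}")
print(f"statistics.stdev gives {statistics.stdev(sodium):.2f}")
```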

  • 45

    How Can Measures of Position Describe Spread?

  • 46

    Quartiles

    Splits the data into four parts

    The median is the second quartile, Q2

    The first quartile, Q1, is the median of the lower half of the observations

    The third quartile, Q3, is the median of the upper half of the observations
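
    The median-of-halves rule can be sketched as below for the sodium data. When the number of observations is odd, this version leaves the middle value out of both halves; that is an assumption of the sketch, since conventions differ between textbooks and software. It also assumes at least two observations. The name `quartiles` is illustrative.

```python
import statistics

# Sodium data from the earlier slides.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]


def quartiles(values):
    """Return (Q1, Q2, Q3) as medians of the lower half, all data, upper half."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                   # smallest half of the data
    upper = ordered[len(ordered) - half:]    # largest half of the data
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


q1, q2, q3 = quartiles(sodium)
print(q1, q2, q3)
```

    For these 20 observations the result is Q1 = 145, Q2 = 200 and Q3 = 225.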

  • 47

    Measuring Spread: Interquartile Range

    The interquartile range is the distance between the third quartile and first quartile:

    IQR = Q3 - Q1

  • 48

    Detecting Potential Outliers

    An observation is a potential outlier if it falls more than 1.5 x IQR below the first quartile or more than 1.5 x IQR above the third quartile
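
    Applied to the sodium data, the 1.5 x IQR rule can be sketched as follows. Quartiles are computed as medians of the lower and upper halves of the ordered data; the names `lower_fence` and `upper_fence` are illustrative labels for the two cut-offs.

```python
import statistics

# Sodium data from the earlier slides.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]

ordered = sorted(sodium)
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])
q3 = statistics.median(ordered[len(ordered) - half:])

# Interquartile range and the two cut-offs of the 1.5 x IQR rule.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

potential_outliers = [x for x in ordered
                      if x < lower_fence or x > upper_fence]

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"potential outliers: {potential_outliers}")
```

    Here IQR = 80, so the cut-offs are 25 and 345, and only the observation 0 is flagged as a potential outlier.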

  • 49

    The Five-Number Summary

    The five number summary of a dataset:

    Minimum value, First Quartile, Median, Third Quartile, Maximum value

  • 50

    Boxplot

    A box is constructed from Q1 to Q3

    A line is drawn inside the box at the median

    A line extends outward from the lower end of the box to the smallest observation that is not a potential outlier

    A line extends outward from the upper end of the box to the largest observation that is not a potential outlier
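
    The construction steps above can be sketched as a function that returns the pieces of a boxplot rather than drawing it. It uses the median-of-halves quartiles and the 1.5 x IQR rule; the function name `boxplot_parts` and the dictionary keys are illustrative choices.

```python
import statistics

# Sodium data from the earlier slides.
sodium = [0, 210, 260, 125, 220, 290, 210, 140, 220, 200,
          125, 170, 250, 150, 170, 70, 230, 200, 290, 180]


def boxplot_parts(values):
    """Return the box, median line, whisker ends and potential outliers."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[len(ordered) - half:])
    reach = 1.5 * (q3 - q1)
    # Observations that are not potential outliers.
    inside = [x for x in ordered if q1 - reach <= x <= q3 + reach]
    outside = [x for x in ordered if x < q1 - reach or x > q3 + reach]
    return {
        "box": (q1, q3),
        "median": statistics.median(ordered),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outside,
    }


parts = boxplot_parts(sodium)
print(parts)
```

    For the sodium data the lower whisker stops at 70 rather than at the minimum, because 0 is a potential outlier and would be plotted as a separate point.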

  • 51

    Boxplot for Sodium Data

    Sodium Data (sorted): 0 70 125 125 140 150 170 170 180 200 200 210 210 220 220 230 250 260 290 290

    Five Number Summary: Min: 0 Q1: 145 Med: 200 Q3: 225 Max: 290

  • 52

    Boxplot for Sodium in Cereals

    Sodium Data: 0 210 260 125 220 290 210 140 220 200 125 170 250 150 170 70 230 200 290 180