Data Analysis Report

Prepared By: Zahiruddin Zahidanishah

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import seaborn as sn
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
import plotly.express as px
import plotly.graph_objects as go
import dataframe_image as dfi
import geopandas as gpd
import adjustText as aT
from mpl_toolkits.axes_grid1 import make_axes_locatable
#import plotly.offline as py
#%matplotlib notebook
#py.init_notebook_mode(connected=True)

Malaysia 14th General Election Results and Overview

Introduction

Malaysia's 14th General Election was held on 9 May 2018. This report presents the overall results for the parliamentary seats, as published at https://election.thestar.com.my/. It does not cover the state seat elections, although they were held concurrently. The objective is to show the main parliamentary results for the whole of Malaysia, with a breakdown at state level. The final section presents a prediction of the election results based on selected criteria.

In [2]:
data = {'Year':[1959,1964,1969,1974,1978,1982,1986,1990,1995,1999,2004,2008,2013,2018,2023],
        'GE':['GE1','GE2','GE3','GE4','GE5','GE6','GE7','GE8','GE9','GE10','GE11','GE12','GE13','GE14','GE15'],
        'Dissolution':['27 Jun 1959','1 Mar 1964','20 Mar 1969','31 Jul 1974','12 Jun 1978','29 Mar 1982','19 Jul 1986','4 Oct 1990','6 Apr 1995','10 Nov 1999','4 Mar 2004','13 Feb 2008','3 Apr 2013','7 Apr 2018','TBA'],
        'Nomination':['15 Jul 1959','21 Mar 1964','5 Apr 1969','8 Aug 1974','21 Jun 1978','7 Apr 1982','24 Jul 1986','11 Oct 1990','15 Apr 1995','20 Nov 1999','13 Mar 2004','24 Feb 2008','20 Apr 2013','28 Apr 2018','TBA'],
        'Polling Date':['19 Aug 1959','25 Apr 1964','10 May 1969','24 Aug 1974','8 Jul 1978','22 Apr 1982','3 Aug 1986','21 Oct 1990','25 Apr 1995','29 Nov 1999','21 Mar 2004','8 Mar 2008','5 May 2013','9 May 2018','TBA'],
        'Day':['Wednesday','Saturday','Saturday','Saturday','Saturday','Thursday','Sunday','Sunday','Tuesday','Monday','Sunday','Saturday','Sunday','Wednesday','TBA']}
df_intro = pd.DataFrame(data,columns=['Year','GE','Dissolution','Nomination','Polling Date','Day'])
df_intro
Out[2]:
Year GE Dissolution Nomination Polling Date Day
0 1959 GE1 27 Jun 1959 15 Jul 1959 19 Aug 1959 Wednesday
1 1964 GE2 1 Mar 1964 21 Mar 1964 25 Apr 1964 Saturday
2 1969 GE3 20 Mar 1969 5 Apr 1969 10 May 1969 Saturday
3 1974 GE4 31 Jul 1974 8 Aug 1974 24 Aug 1974 Saturday
4 1978 GE5 12 Jun 1978 21 Jun 1978 8 Jul 1978 Saturday
5 1982 GE6 29 Mar 1982 7 Apr 1982 22 Apr 1982 Thursday
6 1986 GE7 19 Jul 1986 24 Jul 1986 3 Aug 1986 Sunday
7 1990 GE8 4 Oct 1990 11 Oct 1990 21 Oct 1990 Sunday
8 1995 GE9 6 Apr 1995 15 Apr 1995 25 Apr 1995 Tuesday
9 1999 GE10 10 Nov 1999 20 Nov 1999 29 Nov 1999 Monday
10 2004 GE11 4 Mar 2004 13 Mar 2004 21 Mar 2004 Sunday
11 2008 GE12 13 Feb 2008 24 Feb 2008 8 Mar 2008 Saturday
12 2013 GE13 3 Apr 2013 20 Apr 2013 5 May 2013 Sunday
13 2018 GE14 7 Apr 2018 28 Apr 2018 9 May 2018 Wednesday
14 2023 GE15 TBA TBA TBA TBA
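The Day column can also be derived from the polling dates rather than typed by hand. The sketch below assumes the dates follow a day, month abbreviation, year pattern. If the source dates use the Malay abbreviation 'Mac' for March, it has to be mapped to 'Mar' first, because pandas parses English month names under the default locale. The `polling` series is a small illustrative sample, not the full table.

```python
import pandas as pd

# Illustrative polling dates; the first one uses the Malay abbreviation for March
polling = pd.Series(['21 Mac 2004', '5 May 2013', '9 May 2018'])

# Map the Malay abbreviation to English before parsing
polling = polling.str.replace('Mac', 'Mar', regex=False)

# Parse the dates and look up the weekday name for each one
days = pd.to_datetime(polling, format='%d %b %Y').dt.day_name()
print(days.tolist())
```

Deriving the weekday this way keeps the Day column consistent with the Polling Date column if either is edited later.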
In [3]:
df = pd.read_csv('ge14.csv')
df['Malay Votes']=df['Voter Turnout']*df['Malay']/100
df['Chinese Votes']=df['Voter Turnout']*df['Chinese']/100
df['Indian Votes']=df['Voter Turnout']*df['Indian']/100
df['Sabahan Votes']=df['Voter Turnout']*df['Sabahan']/100
df['Sarawakian Votes']=df['Voter Turnout']*df['Sarawakian']/100
df['Org Asli Votes']=df['Voter Turnout']*df['Org Asli']/100
df['Others Votes']=df['Voter Turnout']*df['Others']/100
#df.head(3)
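The seven assignments above apply one formula: the estimated votes for a group equal the turnout multiplied by that group's percentage share, divided by 100. A minimal sketch of the same calculation written as a loop, using a made-up one-row frame (the turnout and percentages in `toy` are hypothetical, not taken from ge14.csv):

```python
import pandas as pd

# Hypothetical constituency: 1,000 voters turned out; shares are percentages
toy = pd.DataFrame({
    'Voter Turnout': [1000],
    'Malay': [60.0],
    'Chinese': [30.0],
    'Indian': [10.0],
})

# Same formula as the notebook cell, applied once per group
for race in ['Malay', 'Chinese', 'Indian']:
    toy[f'{race} Votes'] = toy['Voter Turnout'] * toy[race] / 100

print(toy[['Malay Votes', 'Chinese Votes', 'Indian Votes']])
```

The estimate implicitly assumes that every group turns out at the same rate, so the results are approximations rather than counted votes.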

Party Performance: Seats Won Versus Seats Contested

In [4]:
# Sum the 'Win' flag per party to get the number of seats each party won.
# Selecting 'Win' before summing avoids aggregating the text columns.
df_win = df.groupby('Party')['Win'].sum().reset_index(name='Total Seats Win')
df_party = df['Party'].value_counts()
df_party = df_party.rename_axis('Party').reset_index(name='Total Seats Contested')
df_party = df_party.merge(df_win, on='Party')
df_party['Percentage Win'] = (df_party['Total Seats Win']/df_party['Total Seats Contested'])*100
df_party.round().sort_values('Percentage Win',ascending=False)
Out[4]:
Party Total Seats Contested Total Seats Win Percentage Win
32 PBRS 1 1 100.0
10 PBB 14 13 93.0
4 DAP 47 42 89.0
2 PKR 71 48 68.0
22 PDP 4 2 50.0
17 PRS 6 3 50.0
8 WARISAN 17 8 47.0
1 UMNO 120 54 45.0
6 AMANAH 34 11 32.0
21 UPKO 4 1 25.0
3 PPBM 52 12 23.0
12 MIC 9 2 22.0
18 PBS 5 1 20.0
15 SUPP 7 1 14.0
7 BEBAS 24 3 12.0
14 STAR 8 1 12.0
0 PAS 157 18 11.0
5 MCA 39 1 3.0
31 PBK 1 0 0.0
29 LDP 1 0 0.0
30 PAP 1 0 0.0
34 PCM 1 0 0.0
33 BERJASA 1 0 0.0
27 SAPP 2 0 0.0
35 MyPPP 1 0 0.0
36 IKATAN 1 0 0.0
28 PERPADUAN 1 0 0.0
19 MUP 5 0 0.0
26 PKAN 2 0 0.0
25 PEACE 2 0 0.0
24 PFP 2 0 0.0
23 PPRS 2 0 0.0
20 PSM 4 0 0.0
16 PRM 6 0 0.0
13 PCS 8 0 0.0
11 GERAKAN 11 0 0.0
9 PHRS 15 0 0.0
37 PBDSB 1 0 0.0
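The contested and won counts can also be produced in a single grouped aggregation, assuming each row is one candidate and `Win` is 1 for the winner and 0 otherwise. A sketch on a small made-up frame (the parties and rows in `toy` are hypothetical):

```python
import pandas as pd

# Hypothetical candidate rows: one row per candidate, Win is 1 for the winner
toy = pd.DataFrame({
    'Party': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Win':   [1,   0,   1,   1,   0,   0],
})

# 'size' counts candidates fielded, 'sum' counts seats won
summary = (
    toy.groupby('Party')['Win']
       .agg(**{'Total Seats Contested': 'size', 'Total Seats Win': 'sum'})
       .reset_index()
)
summary['Percentage Win'] = (
    summary['Total Seats Win'] / summary['Total Seats Contested'] * 100
)
print(summary.sort_values('Percentage Win', ascending=False))
```

This avoids the separate `value_counts` and `merge` steps, at the cost of a slightly denser expression.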
In [5]:
#dfi.export(df_party, 'df_party.png')
In [6]:
bar_plots = [
    go.Bar(x = df_party['Party'], y = df_party['Total Seats Contested'],name='Total Seats Contested'),
    go.Bar(x = df_party['Party'], y = df_party['Total Seats Win'],name='Total Seats Win'),
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Seats Contested VS Seats Win',x=0.5),
yaxis_title='Total Seats',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.show()

GE14 Total Votes by Race in the Seats Won by Each Party

In [7]:
df_race = df.groupby('Party').sum(numeric_only=True).reset_index()
bar_plots = [
    go.Bar(x = df_race['Party'], y = df_race['Malay Votes'],name='Malay Votes'),
    go.Bar(x = df_race['Party'], y = df_race['Chinese Votes'],name='Chinese Votes'),
    go.Bar(x = df_race['Party'], y = df_race['Indian Votes'],name='Indian Votes'),
    go.Bar(x = df_race['Party'], y = df_race['Sabahan Votes'],name='Sabahan Votes'),
    go.Bar(x = df_race['Party'], y = df_race['Sarawakian Votes'],name='Sarawakian Votes'),
    go.Bar(x = df_race['Party'], y = df_race['Org Asli Votes'],name='Org Asli Votes'),
    go.Bar(x = df_race['Party'], y = df_race['Others Votes'],name='Others Votes'),
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Race Composition By Party Win Seats',x=0.5),
yaxis_title='Total Votes',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(xaxis={'categoryorder':'total descending'})  # grouped bars; the next cell stacks them
fig.show()
In [8]:
df_race = df.groupby('Party').sum(numeric_only=True).reset_index()
bar_plots = [
    go.Bar(x = df_race['Party'], y = df_race['Malay Votes'],name='Malay Votes'),
    go.Bar(x = df_race['Party'], y = df_race['Chinese Votes'],name='Chinese Votes'),
    go.Bar(x = df_race['Party'], y = df_race['Indian Votes'],name='Indian Votes'),
    go.Bar(x = df_race['Party'], y = df_race['Sabahan Votes'],name='Sabahan Votes'),
    go.Bar(x = df_race['Party'], y = df_race['Sarawakian Votes'],name='Sarawakian Votes'),
    go.Bar(x = df_race['Party'], y = df_race['Org Asli Votes'],name='Org Asli Votes'),
    go.Bar(x = df_race['Party'], y = df_race['Others Votes'],name='Others Votes'),
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Race Composition By Party Win Seats',x=0.5),
yaxis_title='Total Votes',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [9]:
#df.groupby('State').sum()

Party with the Most Winning Seats

In [10]:
def seats_by_party(state, label=None):
    # Seats won per party in one state, returned as a (Party, <label>) frame
    label = label or state
    wins = df[df['State']==state].groupby('Party')['Win'].sum()
    return wins.reset_index(name=label)

df_joh = seats_by_party('Johor')
df_ked = seats_by_party('Kedah')
df_kel = seats_by_party('Kelantan')
df_mel = seats_by_party('Melaka')
df_neg = seats_by_party('Negeri Sembilan')
df_pah = seats_by_party('Pahang')
df_pen = seats_by_party('Penang')
df_prk = seats_by_party('Perak')
df_per = seats_by_party('Perlis')
df_sab = seats_by_party('Sabah')
df_sar = seats_by_party('Sarawak')
df_sel = seats_by_party('Selangor')
df_ter = seats_by_party('Terengganu')
df_wil = seats_by_party('WP', 'Wilayah Persekutuan')  # 'WP' is the value used in the State column

df_states = df_joh.merge(df_ked, on='Party',how='outer')
df_states = df_states.merge(df_kel, on='Party',how='outer')
df_states = df_states.merge(df_mel, on='Party',how='outer')
df_states = df_states.merge(df_neg, on='Party',how='outer')
df_states = df_states.merge(df_pah, on='Party',how='outer')
df_states = df_states.merge(df_pen, on='Party',how='outer')
df_states = df_states.merge(df_prk, on='Party',how='outer')
df_states = df_states.merge(df_per, on='Party',how='outer')
df_states = df_states.merge(df_sab, on='Party',how='outer')
df_states = df_states.merge(df_sar, on='Party',how='outer')
df_states = df_states.merge(df_sel, on='Party',how='outer')
df_states = df_states.merge(df_ter, on='Party',how='outer')
df_states = df_states.merge(df_wil, on='Party',how='outer')

#df_states.loc['Total (States)']= df_states.sum(numeric_only=True, axis=0)
df_states.loc[:,'Total'] = df_states.sum(numeric_only=True, axis=1)
df_states['Percentage']=((df_states['Total']/df_states['Total'].sum())*100).round(1)
df_states.fillna(0).sort_values('Total', ascending=False)
Out[10]:
Party Johor Kedah Kelantan Melaka Negeri Sembilan Pahang Penang Perak Perlis Sabah Sarawak Selangor Terengganu Wilayah Persekutuan Total Percentage
5 UMNO 7.0 2.0 5.0 2.0 3.0 8.0 2.0 10.0 2.0 7.0 0.0 2.0 2.0 2.0 54.0 24.3
3 PKR 7.0 6.0 0.0 2.0 1.0 2.0 4.0 3.0 1.0 3.0 4.0 11.0 0.0 4.0 48.0 21.6
1 DAP 5.0 0.0 0.0 1.0 2.0 2.0 7.0 7.0 0.0 3.0 6.0 4.0 0.0 5.0 42.0 18.9
6 PAS 0.0 3.0 9.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 6.0 0.0 18.0 8.1
14 PBB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 13.0 0.0 0.0 0.0 13.0 5.9
4 PPBM 5.0 3.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 12.0 5.4
0 AMANAH 1.0 1.0 0.0 0.0 1.0 1.0 0.0 2.0 0.0 0.0 0.0 5.0 0.0 0.0 11.0 5.0
12 WARISAN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8.0 0.0 0.0 0.0 0.0 8.0 3.6
16 PRS 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.0 0.0 0.0 0.0 3.0 1.4
13 BEBAS 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 1.0 3.0 1.4
7 MIC 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.9
15 PDP 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 2.0 0.9
9 PBS 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.5
11 UPKO 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.5
10 STAR 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.5
8 PBRS 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.5
2 MCA 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.5
17 SUPP 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.5
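A party-by-state table like the one above can also be built in one step with `pivot_table`, which avoids the chain of per-state frames and outer merges. This is a sketch of an alternative, not the notebook's method, and the rows in `toy` are hypothetical:

```python
import pandas as pd

# Hypothetical winner rows; in the notebook these columns come from ge14.csv
toy = pd.DataFrame({
    'State': ['Johor', 'Johor', 'Kedah', 'Kedah', 'Kedah'],
    'Party': ['UMNO',  'PKR',   'PKR',   'PAS',   'PKR'],
    'Win':   [1,       1,       1,       1,       1],
})

# One row per party, one column per state; missing combinations become 0
states = toy.pivot_table(index='Party', columns='State', values='Win',
                         aggfunc='sum', fill_value=0)

# Row totals and each party's share of all seats
states['Total'] = states.sum(axis=1)
states['Percentage'] = (states['Total'] / states['Total'].sum() * 100).round(1)
print(states.sort_values('Total', ascending=False))
```

Because `fill_value=0` is applied during the pivot, the counts stay integers instead of becoming floats through `NaN` placeholders.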
In [11]:
bar_plots = [
    go.Bar(x = df_states['Party'], y = df_states['Total'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Total Seat Win Based On Party',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [12]:
bar_plots = [
    go.Bar(x = df_states['Party'], y = df_states['Johor'],name='Johor'),
    go.Bar(x = df_states['Party'], y = df_states['Kedah'],name='Kedah'),
    go.Bar(x = df_states['Party'], y = df_states['Kelantan'],name='Kelantan'),
    go.Bar(x = df_states['Party'], y = df_states['Melaka'],name='Melaka'),
    go.Bar(x = df_states['Party'], y = df_states['Negeri Sembilan'],name='Negeri Sembilan'),
    go.Bar(x = df_states['Party'], y = df_states['Pahang'],name='Pahang'),
    go.Bar(x = df_states['Party'], y = df_states['Penang'],name='Penang'),
    go.Bar(x = df_states['Party'], y = df_states['Perak'],name='Perak'),
    go.Bar(x = df_states['Party'], y = df_states['Perlis'],name='Perlis'),
    go.Bar(x = df_states['Party'], y = df_states['Sabah'],name='Sabah'),
    go.Bar(x = df_states['Party'], y = df_states['Sarawak'],name='Sarawak'),
    go.Bar(x = df_states['Party'], y = df_states['Selangor'],name='Selangor'),
    go.Bar(x = df_states['Party'], y = df_states['Terengganu'],name='Terengganu'),
    go.Bar(x = df_states['Party'], y = df_states['Wilayah Persekutuan'],name='Wilayah Persekutuan'),
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Total Seats Win By Each Party At Respective States',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_yaxes(range=[0,14])
fig.update_layout(xaxis={'categoryorder':'total descending'})
fig.show()
In [13]:
bar_plots = [
    go.Bar(x = df_states['Party'], y = df_states['Johor'],name='Johor'),
    go.Bar(x = df_states['Party'], y = df_states['Kedah'],name='Kedah'),
    go.Bar(x = df_states['Party'], y = df_states['Kelantan'],name='Kelantan'),
    go.Bar(x = df_states['Party'], y = df_states['Melaka'],name='Melaka'),
    go.Bar(x = df_states['Party'], y = df_states['Negeri Sembilan'],name='Negeri Sembilan'),
    go.Bar(x = df_states['Party'], y = df_states['Pahang'],name='Pahang'),
    go.Bar(x = df_states['Party'], y = df_states['Penang'],name='Penang'),
    go.Bar(x = df_states['Party'], y = df_states['Perak'],name='Perak'),
    go.Bar(x = df_states['Party'], y = df_states['Perlis'],name='Perlis'),
    go.Bar(x = df_states['Party'], y = df_states['Sabah'],name='Sabah'),
    go.Bar(x = df_states['Party'], y = df_states['Sarawak'],name='Sarawak'),
    go.Bar(x = df_states['Party'], y = df_states['Selangor'],name='Selangor'),
    go.Bar(x = df_states['Party'], y = df_states['Terengganu'],name='Terengganu'),
    go.Bar(x = df_states['Party'], y = df_states['Wilayah Persekutuan'],name='Wilayah Persekutuan'),
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Total Seats Win By Each Party At Respective States',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_yaxes(range=[0,60])
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [14]:
bar_plots = [
    go.Bar(x = df_joh['Party'], y = df_joh['Johor'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Johor Parliamentary Seats Results',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [15]:
bar_plots = [
    go.Bar(x = df_ked['Party'], y = df_ked['Kedah'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Kedah Parliamentary Seats Results',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [16]:
bar_plots = [
    go.Bar(x = df_kel['Party'], y = df_kel['Kelantan'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Kelantan Parliamentary Seats Results',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [17]:
bar_plots = [
    go.Bar(x = df_mel['Party'], y = df_mel['Melaka'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Melaka Parliamentary Seats Results',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.update_layout(title_x=0.5)
fig.show()
In [18]:
bar_plots = [
    go.Bar(x = df_neg['Party'], y = df_neg['Negeri Sembilan'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Negeri Sembilan Parliamentary Seats Results',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [19]:
bar_plots = [
    go.Bar(x = df_pah['Party'], y = df_pah['Pahang'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Pahang Parliamentary Seats Results',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [20]:
bar_plots = [
    go.Bar(x = df_pen['Party'], y = df_pen['Penang'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Penang Parliamentary Seats Results',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [21]:
bar_plots = [
    go.Bar(x = df_prk['Party'], y = df_prk['Perak'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Perak Parliamentary Seats Results',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [22]:
bar_plots = [
    go.Bar(x = df_sab['Party'], y = df_sab['Sabah'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Sabah Parliamentary Seats Results',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [23]:
bar_plots = [
    go.Bar(x = df_sar['Party'], y = df_sar['Sarawak'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Sarawak Parliamentary Seats Results',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [24]:
bar_plots = [
    go.Bar(x = df_sel['Party'], y = df_sel['Selangor'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Selangor Parliamentary Seats Results',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [25]:
bar_plots = [
    go.Bar(x = df_ter['Party'], y = df_ter['Terengganu'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Terengganu Parliamentary Seats Results',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [26]:
bar_plots = [
    go.Bar(x = df_wil['Party'], y = df_wil['Wilayah Persekutuan'],name='Total Seats Win')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Wilayah Persekutuan Parliamentary Seats Results',x=1),
yaxis_title='Total Seats Win',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig.show()
In [27]:
df_joh = df_joh.iloc[df_joh['Johor'].idxmax()]
df_ked = df_ked.iloc[df_ked['Kedah'].idxmax()]
df_kel = df_kel.iloc[df_kel['Kelantan'].idxmax()]
df_mel = df_mel.iloc[df_mel['Melaka'].idxmax()]
df_neg = df_neg.iloc[df_neg['Negeri Sembilan'].idxmax()]
df_pah = df_pah.iloc[df_pah['Pahang'].idxmax()]
df_pen = df_pen.iloc[df_pen['Penang'].idxmax()]
df_prk = df_prk.iloc[df_prk['Perak'].idxmax()]
df_per = df_per.iloc[df_per['Perlis'].idxmax()]
df_sab = df_sab.iloc[df_sab['Sabah'].idxmax()]
df_sar = df_sar.iloc[df_sar['Sarawak'].idxmax()]
df_sel = df_sel.iloc[df_sel['Selangor'].idxmax()]
df_ter = df_ter.iloc[df_ter['Terengganu'].idxmax()]
df_wil=df_wil.iloc[df_wil['Wilayah Persekutuan'].idxmax()]

data = {'States':['Johor','Kedah','Kelantan','Kuala Lumpur','Malacca','Negeri Sembilan','Pahang','Penang','Perak',
                  'Perlis','Sabah','Sarawak','Selangor','Terengganu'],
       'Majority Party Win':[df_joh['Party'],df_ked['Party'],df_kel['Party'],df_wil['Party'],df_mel['Party'],df_neg['Party'],df_pah['Party'],df_pen['Party'],
                        df_prk['Party'],df_per['Party'],df_sab['Party'],df_sar['Party'],df_sel['Party'],df_ter['Party']]}
df_map = pd.DataFrame(data,columns=['States','Majority Party Win'])
df_map.rename(columns={'States':'name'},inplace=True)
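`idxmax` returns only the first row that holds the maximum, so a tie between parties is resolved silently by row order. In the state table, for example, UMNO and PKR both hold 7 seats in Johor. A small sketch that lists every party sharing the top count, using four Johor entries copied from that table:

```python
import pandas as pd

# Johor seat counts for four parties, in alphabetical order as groupby returns them
johor = pd.DataFrame({
    'Party': ['DAP', 'PKR', 'PPBM', 'UMNO'],
    'Johor': [5, 7, 5, 7],
})

# idxmax picks the first row with the maximum value
first_only = johor.loc[johor['Johor'].idxmax(), 'Party']

# Comparing against the maximum keeps every tied party
leaders = johor.loc[johor['Johor'] == johor['Johor'].max(), 'Party'].tolist()
print(first_only, leaders)
```

Where ties occur, the map colours a state by whichever tied party comes first alphabetically, which is worth stating alongside the figure.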

Map of Malaysia Showing the Party That Won the Most Seats in Each State

In [28]:
mas = gpd.read_file(r'/Users/zahiruddinzahidanishah/Google Drive/Python/Geopandas/Malaysia_Polygon.shp')
mas = mas.merge(df_map, on='name')
mas["center"] = mas["geometry"].centroid
mas_points = mas.copy()
mas_points.set_geometry("center", inplace = True)
fig, ax = plt.subplots(1,figsize=(16,9))
mas.plot(column='Majority Party Win',ax=ax,cmap='Set2',edgecolor='black',linewidth=0.5,
         legend=True,legend_kwds={'loc': 'lower right'}).set_facecolor('w')
#ax.set_axis_off()
texts = []
for x, y, label in zip(mas_points.geometry.x, mas_points.geometry.y, mas_points["name"]):
    texts.append(plt.text(x, y, label, fontsize = 9,color='black'))
aT.adjust_text(texts, force_points=0.3, force_text=0.8, expand_points=(1,1), expand_text=(1,1), 
               arrowprops=dict(arrowstyle="-", color='grey', lw=0.5))
ax.set_title("Majority Party Win", fontsize=30, color='black')
ax.set_xlabel('longitude', color='black')
ax.set_ylabel('latitude', color='black')
ax.spines['top'].set_visible(True)
ax.spines['right'].set_visible(True)

Voter Race Composition

In [29]:
data = {'Races':['Malay','Chinese','Indian','Sabahan','Sarawakian','Orang Asli','Others'],
        'Total':[df['Malay Votes'].sum().round(),df['Chinese Votes'].sum().round(),df['Indian Votes'].sum().round(),
                df['Sabahan Votes'].sum().round(),df['Sarawakian Votes'].sum().round(),
                 df['Org Asli Votes'].sum().round(),df['Others Votes'].sum().round()]}
df_voter = pd.DataFrame(data,columns=['Races','Total'])
df_voter['Percentage']=((df_voter['Total']/df_voter['Total'].sum())*100).round(1)
#df_voter.loc['Total']= df_voter.sum(numeric_only=True, axis=0)
df_voter
Out[29]:
Races Total Percentage
0 Malay 6703533.0 56.8
1 Chinese 3379869.0 28.6
2 Indian 861274.0 7.3
3 Sabahan 279432.0 2.4
4 Sarawakian 436430.0 3.7
5 Orang Asli 62041.0 0.5
6 Others 86421.0 0.7
In [30]:
bar_plots = [
    go.Bar(x = df_voter['Races'], y = df_voter['Total'],name='Total Voters')
]
layout = go.Layout(
title=go.layout.Title(text='GE14: Total Voters Compositions',x=1),
yaxis_title='Total Voters',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.show()

Election Predictions

Check the dataset for null values, duplicated rows, and overall dimensions

In [31]:
print(df.isnull().values.any())
print(df.duplicated().values.any())
print(df.shape)
True
True
(687, 25)
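`values.any()` only reports whether a problem exists somewhere in the frame. To see where, the per-column null counts and the number of duplicated rows can be printed instead. A sketch on a made-up frame, assuming constituency-level fields are filled only on the first row of each seat (all values in `toy` are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical seat with three candidates; only the first row has seat-level fields
toy = pd.DataFrame({
    'State': ['S1', np.nan, np.nan],
    'Party': ['X', 'Y', 'Z'],
    'Win': [1, 0, 0],
    'Registered Voters': [50000.0, np.nan, np.nan],
})

null_counts = toy.isnull().sum()             # missing values per column
n_duplicates = int(toy.duplicated().sum())   # number of fully duplicated rows
print(null_counts)
print(n_duplicates)
```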

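Keep only the winning candidate's row for each seat. The rows for losing candidates carry the missing constituency-level values, so this filter also removes the nulls.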

In [32]:
df.head(3)
Out[32]:
Parliament ID State District Party Coalition Win Candidate Votes Registered Voters Majority Spoilt votes ... Sarawakian Org Asli Others Malay Votes Chinese Votes Indian Votes Sabahan Votes Sarawakian Votes Org Asli Votes Others Votes
0 P001 Perlis Padang Besar UMNO BN 1 15032 46096.0 1438.0 780.0 ... 0.68 0.02 3.07 32419.8552 3163.004 329.4016 112.296 254.5376 7.4864 1149.1624
1 NaN NaN NaN PPBM PH 0 13594 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN PAS PAS 0 7874 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

3 rows × 25 columns

In [33]:
df = df[df['Win'] != 0].copy()  # winners only; copy so later column assignments do not touch a slice
df.head(3).round()
Out[33]:
Parliament ID State District Party Coalition Win Candidate Votes Registered Voters Majority Spoilt votes ... Sarawakian Org Asli Others Malay Votes Chinese Votes Indian Votes Sabahan Votes Sarawakian Votes Org Asli Votes Others Votes
0 P001 Perlis Padang Besar UMNO BN 1 15032 46096.0 1438.0 780.0 ... 1.0 0.0 3.0 32420.0 3163.0 329.0 112.0 255.0 7.0 1149.0
3 P002 Perlis Kangar PKR PH 1 20909 55938.0 5603.0 806.0 ... 0.0 0.0 1.0 37344.0 6961.0 768.0 32.0 41.0 0.0 558.0
6 P003 Perlis Arau UMNO BN 1 16547 48187.0 4856.0 690.0 ... 0.0 0.0 3.0 35444.0 3097.0 667.0 36.0 28.0 4.0 1160.0

3 rows × 25 columns

In [34]:
print(df.isnull().values.any())
print(df.duplicated().values.any())
print(df.shape)
False
False
(222, 25)
In [35]:
df.columns
Out[35]:
Index(['Parliament ID', 'State', 'District', 'Party', 'Coalition', 'Win',
       'Candidate Votes', 'Registered Voters', 'Majority', 'Spoilt votes',
       'Voter Turnout', 'Malay', 'Chinese', 'Indian', 'Sabahan', 'Sarawakian',
       'Org Asli', 'Others', 'Malay Votes', 'Chinese Votes', 'Indian Votes',
       'Sabahan Votes', 'Sarawakian Votes', 'Org Asli Votes', 'Others Votes'],
      dtype='object')

Election Prediction Using Logistic Regression

In [36]:
# Features: vote counts and estimated votes by race for each winning candidate.
# 'Win' is 1 for every remaining row, so it adds no information to the model.
X = df[['Win','Candidate Votes', 'Registered Voters', 'Majority', 'Spoilt votes',
       'Voter Turnout', 'Malay Votes', 'Chinese Votes', 'Indian Votes',
       'Sabahan Votes', 'Sarawakian Votes', 'Org Asli Votes', 'Others Votes']]
y = df['Party']  # target: the party that won the seat
# Hold out 35% of the seats for testing
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.35,random_state=0)
logistic_regression= LogisticRegression()
logistic_regression.fit(X_train,y_train)
y_pred=logistic_regression.predict(X_test)
#confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
#sn.heatmap(confusion_matrix, annot=True)
print('Prediction table shows the prediction results with', round(metrics.accuracy_score(y_test, y_pred)*100,1),'% accuracy')
X_test = X_test.assign(Prediction=y_pred)  # attach predictions without writing to a slice
#X_test.head()
Prediction table shows the prediction results with 64.1 % accuracy
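The features are vote counts of very different magnitudes, and `LogisticRegression` with default settings may stop before converging on unscaled inputs like these (any such warning is hidden by `warnings.filterwarnings('ignore')`). One common option, shown here as a sketch rather than the notebook's method, is to standardise the features inside a pipeline. The data below is synthetic, generated with `make_classification`; the sample size, feature count, and class count are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the election features (not the GE14 data)
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

# Stratify so each class appears in both splits in similar proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=0, stratify=y)

# Scale the features, then fit a multiclass logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Score only on the held-out rows
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f'Held-out accuracy: {accuracy:.3f}')
```

With the real data, stratification would fail for parties that won a single seat, so those classes would need to be grouped or excluded first.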
In [37]:
# Predict every seat, including the rows the model was trained on
df2 = df[['Win','Candidate Votes', 'Registered Voters', 'Majority', 'Spoilt votes',
       'Voter Turnout', 'Malay Votes', 'Chinese Votes', 'Indian Votes',
       'Sabahan Votes', 'Sarawakian Votes', 'Org Asli Votes', 'Others Votes']].copy()
y_pred=logistic_regression.predict(df2)
df2['Prediction']=y_pred
#df2.head()
In [38]:
bar_plots = [
    go.Bar(x = df['Party'], y = df['Win'],name='GE14 Actual Result'),
    go.Bar(x = df2['Prediction'], y = df2['Win'],name='Prediction Result'),
]
layout = go.Layout(
title=go.layout.Title(text='GE14 Election Result Vs Predictions',x=0.5),
yaxis_title='Winning Seats',xaxis_tickmode='array')
fig = go.Figure(data=bar_plots, layout=layout)
fig.show()
In [39]:
df3 = df['Party'].value_counts().rename_axis('Party').reset_index(name='Win Actual')
df4 = df2['Prediction'].value_counts().rename_axis('Party').reset_index(name='Win Prediction')
result = df3.merge(df4, on=['Party'],how='outer')
result.loc['Total']= result.sum(numeric_only=True, axis=0)
result.fillna(0)
Out[39]:
Party Win Actual Win Prediction
0 UMNO 54.0 60.0
1 PKR 48.0 52.0
2 DAP 42.0 39.0
3 PAS 18.0 22.0
4 PBB 13.0 16.0
5 PPBM 12.0 5.0
6 AMANAH 11.0 11.0
7 WARISAN 8.0 7.0
8 PRS 3.0 0.0
9 BEBAS 3.0 0.0
10 PDP 2.0 6.0
11 MIC 2.0 1.0
12 MCA 1.0 0.0
13 STAR 1.0 1.0
14 UPKO 1.0 0.0
15 PBS 1.0 2.0
16 SUPP 1.0 0.0
17 PBRS 1.0 0.0
Total 0 222.0 222.0
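Matching totals do not mean matching seats: the table compares seat counts per party, not seat-by-seat predictions. A simple way to read the gaps is to add a difference column. The sketch below copies the first three rows of the table above into a separate frame named `result_top`.

```python
import pandas as pd

# First three rows of the comparison table
result_top = pd.DataFrame({
    'Party': ['UMNO', 'PKR', 'DAP'],
    'Win Actual': [54, 48, 42],
    'Win Prediction': [60, 52, 39],
})

# Positive values mean the model assigns the party more seats than it won
result_top['Difference'] = result_top['Win Prediction'] - result_top['Win Actual']
print(result_top)
```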