[Clug-l] Logistic Regression Modelling

Murphy, Emma Emma.Murphy at edfenergy.com
Wed Aug 22 05:52:10 EDT 2007


Hi

I am running my first model in Clementine and have a few questions.

*	Do you balance the data before partitioning it into training and
testing data sets, or partition the data before balancing it?

	*	Can you please tell me why?

*	What proportion do you use for your train/test split? I have 75/25
but have no evidence as to why this is appropriate.

	*	Can you please let me know why you use the split you do?

*	Is it necessary to have a validation data set as well? 

 

The following is more of a data question:

 

*	I am aggregating data from account level to premise level. If, for
example, a premise has two accounts with the same method of payment
(DD), I create an ordinal flag, MOP_DD = 2. If the two accounts at the
same premise have different methods of payment, for example one pays by
Direct Debit and one by cheque, I set MOP_DD = 1 and MOP_CHQ = 1.
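
The aggregation described above could be sketched in Python roughly as follows. This is only an illustration outside Clementine: the record layout, the premise IDs, and the helper name aggregate_mop_counts are assumptions made for the example, not part of the actual stream.

```python
from collections import Counter, defaultdict

# Hypothetical account-level records: (premise_id, method_of_payment).
accounts = [
    ("P1", "DD"),
    ("P1", "DD"),
    ("P2", "DD"),
    ("P2", "CHQ"),
]


def aggregate_mop_counts(rows, methods=("DD", "CHQ")):
    """Roll account rows up to one row per premise, counting accounts per payment method."""
    counts = defaultdict(Counter)
    for premise_id, mop in rows:
        counts[premise_id][mop] += 1
    # A Counter returns 0 for methods that never occur at a premise.
    return {
        premise_id: {f"MOP_{m}": c[m] for m in methods}
        for premise_id, c in counts.items()
    }


premise_flags = aggregate_mop_counts(accounts)
print(premise_flags)
```

Here premise P1 (two DD accounts) gets MOP_DD = 2, while P2 (one DD, one cheque) gets MOP_DD = 1 and MOP_CHQ = 1.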

 

 

My reasoning is that if the customer pays by DD on both accounts, the
'2' will carry more predictive power in the model than a simple binary
'Yes'/'No' field.

 

Do you agree with my theory, or do you think a simple binary 1/0 field,
regardless of how many occurrences the customer has, would be just as
predictive?
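
For comparison, the binary alternative can be derived directly from a count field, so both encodings could be offered to the model and compared on the test partition. A minimal sketch, assuming hypothetical premise IDs and counts and a made-up helper name to_binary_flag:

```python
# Hypothetical premise-level counts of Direct Debit accounts.
mop_dd_counts = {"P1": 2, "P2": 1, "P3": 0}


def to_binary_flag(count):
    """Collapse an account count into a 1/0 presence flag."""
    return 1 if count > 0 else 0


mop_dd_binary = {premise: to_binary_flag(c) for premise, c in mop_dd_counts.items()}
print(mop_dd_binary)
```

The binary flag treats P1 and P2 identically, which is exactly the distinction the count version keeps.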

 

Regards

Emma
