
FTEC2101/ESTR2520 Optimization Methods Spring 2024

Project Specification – Binary Classification for Bankruptcy Detection

Last Updated: April 4, 2024, Deadline: May 10, 2024, 23:59 (HKT)

Thus far we have learnt a number of optimization methods: the simplex method for linear programming, modeling techniques for integer programming, gradient/Newton methods for unconstrained optimization, KKT conditions, SOCP formulations, etc. Besides the theory, please remember that optimization methods are practical tools for solving real-world problems. The aim of this project is to practice our skills on the latter aspect. We will make use of the Julia¹ environment.

The project’s theme is about practical solutions for binary classification. It can be considered as an extension to

what we have experimented in Computing Lab 2. We will explore additional aspects about binary classification

and more importantly, implement and compare these ideas on a real-world dataset.

Background. Let us recall that the binary classification task aims at learning a linear classifier that labels a feature vector as positive (+1) or negative (−1). Let $d \in \mathbb{N}$ be the dimension of the feature vector. The (linear) classifier is described by the tuple $(w, b)$, where $w \in \mathbb{R}^d$ is the vector of direction parameters and $b \in \mathbb{R}$ is the bias parameter², both specifying the linear model. Given $(w, b)$, a feature vector $x \in \mathbb{R}^d$ is classified into a label $y \in \{\pm 1\}$ such that

$$
y = \begin{cases}
+1, & \text{if } w^\top x + b = w_1 x_1 + \cdots + w_d x_d + b \ge 0, \\
-1, & \text{if } w^\top x + b = w_1 x_1 + \cdots + w_d x_d + b < 0.
\end{cases}
\tag{1.1}
$$

As an illustration, the following figure shows a case with d = 2 such that the classifier is described by

w = (w1, w2) and the scalar b:


Fig. 1. Example dataset.

According to the rule in (1.1), all the feature vectors that lie above the dashed line shall be classified as +1

(blue points); those that lie below the dashed line shall be classified as −1 (red points).
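The rule in (1.1) can be sketched in a few lines. The snippet below uses plain Python rather than the Julia template; the function name `classify` and the numbers in the d = 2 example are hypothetical and chosen only for illustration.

```python
def classify(w, b, x):
    """Return +1 if w^T x + b >= 0 and -1 otherwise, following rule (1.1)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical d = 2 classifier: the decision boundary is the line x1 + x2 - 1 = 0.
w, b = [1.0, 1.0], -1.0
print(classify(w, b, [2.0, 0.5]))  # a point above the line
print(classify(w, b, [0.0, 0.0]))  # a point below the line
```

Note that a point lying exactly on the boundary is labeled +1, because (1.1) uses a non-strict inequality for the positive class.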

¹ As an alternative, you are welcome to use Python with optimization modeling packages that support SOCP and MI-NLP, such as cvxpy. However, in that case you have to notify the instructor about the choice, and about the plan for how you wish to accomplish the project, on or before May 1, 2024, i.e., two weeks before the deadline. In the project, you are not allowed to use any pre-built solver for binary classification such as MLJ.jl, ScikitLearn.jl, flux.jl, or scikit-learn in your final submission (though you are encouraged to try out these packages as extra work to support your solution). Please ask the instructor if you are unsure whether a specific package can be used.

² Note that this differs slightly from Lecture 16, as we include the bias parameter in the classifier design.


Dataset & Folder Setting. We have $m \in \mathbb{N}$ samples of training data described by $\{x^{(i)}, y^{(i)}\}_{i=1}^{m}$, where $x^{(i)} \in \mathbb{R}^d$ is the ith feature vector and $y^{(i)} \in \{\pm 1\}$ is the associated label. We will use a curated version of the Bankruptcy dataset taken from https://archive.ics.uci.edu/dataset/365/polish+companies+bankruptcy+data. It includes the d = 64 performance indicators from around 10000 companies in Poland from 2000 to 2012, together with a label that tells whether the company has gone bankrupt or not. Our goal is to learn a classifier that predicts whether a company is going to go bankrupt based on its recent performance.

To prepare the environment for the project, retrieve the Jupyter notebook ftec2101-project-2024.ipynb

and the zip archive ftec-project-files.zip from Blackboard. Place the *.ipynb files in a working directory,

extract the *.zip file and move the *.csv files inside the same working directory. Your working directory

should have the following content:

Notice that:

• ftec-groupi-train.csv – the training dataset for students in group i, i = 1, 2, 3, 4, 5, 6. This is a csv

file that contains 20 samples of company data to be used in the Compulsory Tasks.

• ftec-groupi-test.csv – the testing dataset for students in group i, i = 1, 2, 3, 4, 5, 6. This is a csv file

that contains 60 samples of company data to be used in the Compulsory Tasks.

• ftec-full-train.csv – the training dataset that contains 8000 samples of company data to be used in

the Competitive Tasks.

• ftec-full-test.csv – the testing dataset that contains 2091 samples of company data to be used in the Competitive Tasks.

Lastly, the Jupyter notebook ftec2101-project-2024.ipynb provides detailed descriptions and helper code to guide you through the tasks required by this project. Please pay attention to the comments provided inside the code cells of the Jupyter notebook as well.

1.1 Compulsory Tasks (50% + 2% Bonus)

The compulsory tasks of this project are divided into two parts. You are required to

(A) answer questions related to the optimization theory and modeling related to binary classification; and

(B) implement the modeled optimization problems on computer and solve them with the Bankruptcy dataset.


Theory and Modeling. Denote the training dataset with m samples as $\{x^{(i)}, y^{(i)}\}_{i=1}^{m}$, where $x^{(i)} \in \mathbb{R}^d$ is the ith feature vector with d attributes and $y^{(i)} \in \{\pm 1\}$ is the associated label. Let $R_0 > 0$ and let $\ell_i > 0$, $i = 1, \dots, m$, be a set of positive weights. The following optimization problem designs a soft-margin classifier:

$$
\min_{w \in \mathbb{R}^d,\, b \in \mathbb{R}} \; \sum_{i=1}^{m} \ell_i \max\left\{ 0,\; 1 - y^{(i)} \left( (x^{(i)})^\top w + b \right) \right\}
\quad \text{s.t.} \quad w^\top w \le R_0.
\tag{1.2}
$$

It can be easily shown that (1.2) is a convex optimization problem.
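To make the objective and the constraint in (1.2) concrete, here is a minimal sketch in plain Python that only evaluates them for a given (w, b); it does not solve the problem. The function names and the tiny dataset are hypothetical.

```python
def soft_margin_objective(w, b, X, y, weights):
    """Evaluate sum_i l_i * max{0, 1 - y_i * (x_i^T w + b)}, the objective of (1.2)."""
    total = 0.0
    for xi, yi, li in zip(X, y, weights):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        total += li * max(0.0, 1.0 - margin)
    return total


def is_feasible(w, R0):
    """Check the constraint w^T w <= R0 of (1.2)."""
    return sum(wj * wj for wj in w) <= R0


# Hypothetical toy data with d = 2 and m = 2.
X = [[2.0, 0.0], [-2.0, 0.0]]
y = [1, -1]
weights = [1.0, 1.0]

print(soft_margin_objective([1.0, 0.0], 0.0, X, y, weights))   # both margins equal 2
print(soft_margin_objective([0.25, 0.0], 0.0, X, y, weights))  # both margins equal 0.5
print(is_feasible([1.0, 0.0], R0=5.0))
```

In the first call every sample has margin at least 1, so each hinge term vanishes; in the second call the classifier still separates the two points, but the margins fall below 1 and the objective becomes positive.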

Task 1: Formulation (10% + 2% Bonus)

Answer the following questions:

(a) Suppose that the optimal objective value of (1.2) is zero. Explain why in this case, with reference to (1.1), any optimal solution to (1.2) is a classifier $(w^\star, b^\star)$ that can correctly classify the m training samples into the +1 or −1 labels.

(b) Give an example of training dataset with d = 2 where the optimal objective value of (1.2) is not

zero. You may describe such dataset by drawing it on a 2-D plane.

(c) Rewrite (1.2) as an equivalent nonlinear program like the form given in the lecture notes, e.g.,

min f0(x) s.t. fi(x) ≤ 0, i = 1, ..., m. Make sure that each of the fi(x) is differentiable.

(d) Derive the KKT condition for the equivalent formulation in (c).

(e) Suppose the optimal objective value of (1.2) is zero, show that there may exist more than one

optimal solution to (1.2). (Hint: you may let d = 2 and consider a similar dataset to the one

illustrated in Fig. 1).

(f) (Bonus) Explain the phenomenon in (e) using the KKT conditions derived in (d).

Eq. (1.2) is called the soft-margin formulation for binary classification. In particular, we observe that the term $\max\{0, 1 - y^{(i)}((x^{(i)})^\top w + b)\}$ evaluates the amount of error for the ith sample. The term is > 0 if and only if $y^{(i)}((x^{(i)})^\top w + b) < 1$, i.e., the ith training sample is either mis-classified or classified correctly with a margin smaller than 1.

Feature/Attribute Selection. Besides the (training) accuracy of a model, for classification problems with large d, i.e., with many attributes, another interesting aspect is the set of selected attributes. Let $w^\star \in \mathbb{R}^d$ be an optimal classifier, e.g., found by solving (1.2). The selected attributes are

$$
S := \{ i \in \{1, \dots, d\} : |w^\star_i| \neq 0 \}.
$$

An attribute j is not selected if $w^\star_j = 0$, as it does not contribute to the prediction of the label in (1.1).

In practice, it is believed that a sparse classifier, i.e., one with a small |S|, is better as it is easier to interpret, easier to implement, etc. In the following tasks, we will build upon the model (1.2) and incorporate various constraints into the classifier design (via optimization) to favor a sparse classifier.
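The set of selected attributes can be read off a weight vector directly. The sketch below is plain Python with a hypothetical function name; the optional tolerance `tol` is an assumption added here because numerical solvers often return tiny nonzero values instead of exact zeros, and it is not part of the definition of S.

```python
def selected_attributes(w, tol=0.0):
    """Return the 1-based indices i with |w_i| > tol (tol = 0 gives the set S exactly)."""
    return [i + 1 for i, wi in enumerate(w) if abs(wi) > tol]


# Hypothetical weight vector with d = 4: attributes 2 and 4 are selected.
print(selected_attributes([0.0, 1.5, 0.0, -0.2]))
# With a small tolerance, numerically negligible weights are treated as zero.
print(selected_attributes([1e-9, 0.3], tol=1e-6))
```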


Task 2: Optimization Formulation (10%)

Answer the following questions:

(a) Show that (1.2) can be written as a Second-order Cone Programming (SOCP) problem.

(b) We now incorporate a shaping constraint into the soft-margin problem. Formulate an SOCP problem similar to the one in part (a) with the following requirements: for given $R_0, R_1 > 0$,

• The objective is the same as in (1.2).

• The directional parameters $w \in \mathbb{R}^d$ satisfy the following shaping constraint:

$$
w^\top \Sigma w + c^\top w \le R_0,
$$

where $\Sigma \in \mathbb{R}^{d \times d}$ is a given symmetric, positive definite matrix, and $c \in \mathbb{R}^d$ is a given vector.

• The directional parameters and the bias parameter belong to an ℓ1 ball to promote sparsity, i.e.,

$$
\sum_{i=1}^{d} |w_i| + |b| \le R_1.
$$

You may begin by formulating the problem as specified above, and then demonstrate how the problem can be converted into an SOCP.

(c) As an alternative to part (b), formulate a mixed integer program (MIP) problem which imposes a hard constraint on the sparsity of the classifier, i.e., for given $R_0 > 0$, $R_1 > 0$, $S > 0$, specified as follows:

• The objective is the same as in (1.2).

• The directional parameters $w \in \mathbb{R}^d$ satisfy the following shaping constraint:

$$
w^\top \Sigma w + c^\top w \le R_0,
$$

where $\Sigma \in \mathbb{R}^{d \times d}$ is a given symmetric, positive definite matrix, and $c \in \mathbb{R}^d$ is a given vector.

• Each element in w is bounded such that

$$
-R_1 \le w_i \le R_1, \quad i = 1, \dots, d.
$$

• The number of non-zero elements in the vector w is constrained such that

(no. of non-zero elements in the vector w) ≤ S.

Computational. We shall put the optimization designs formulated above into practice. Our tasks are structured into 3 stages: data analysis, optimization, and interpretation. The Jupyter notebook template ftec2101-project-2024.ipynb provides descriptions and helper code to guide you through most of the following tasks. Please pay attention to the comments in the Jupyter notebook.

In the compulsory tasks, we focus on a training dataset of m = 20 companies (ftec-groupi-train.csv). Each company has 64 attributes (performance indicators). The dataset also indicates whether the company has gone bankrupt or not, which is treated as the label yi ∈ {±1}.


Task 3: Warmup Exercise (5%)

(a) Inspect the dataset by making a 2-D scatter plot of the 20 samples over the features ‘Attr1’ and ‘Attr2’, which correspond to ‘net profit / total assets’ and ‘total liabilities / total assets’, respectively. Mark the ‘Bankrupt’ (resp. ‘Not Bankrupt’) companies in red (resp. blue). Comment on the pattern observed.

(b) Try 2-3 more combinations of pairs of features and make comments on the observations.

Remark: The program template has provided the relevant helper codes for this task, but you may have

to ‘tweak’ the template to examine other pairs of features in part (b).

For part (a) in the above, you may observe an output similar to:

Moreover, you may notice that the training dataset is unbalanced: only 15-20% of the companies are bankrupt, i.e., carry a +1 label. In the following tasks, you will implement classifier designs based on the MIP and SOCPs from Task 2.

As a first step, we design the classifier based on the first d = 10 features, i.e., ‘Attr1’ to ‘Attr10’.


Task 4: Optimization-based Formulation (15%)

(a) Implement and solve the SOCP problem from Task 2-(a) with the following parameters:

ℓi = weight · (yi + 1) + 1, R0 = 5.

Note that weight > 0 is a given scalar that you can modify in the Jupyter notebook. In particular,

it serves the purpose of weighing more for the samples with bankrupted companies. You may use

the solver ECOS in JuMP for the SOCP.

(b) Implement and solve the SOCP problem from Task 2-(b) with the following parameters:

ℓi = weight · (yi + 1) + 1, R0 = 5, R1 = 2.5, c = 0, Σ = I.

You may use the solver ECOS in JuMP for the SOCP.

(c) Implement and solve the MIP problem from Task 2-(c) with the following parameters:

ℓi = weight · (yi + 1) + 1, R0 = 5, R1 = 10, S = 2, c = 0, Σ = I.

You may use the solver Juniper in JuMP for the MIP.

(d) Use the default setting of weight = 1 for the above. Compare the sparsity levels of the classifier solutions found in parts (a), (b), (c) by plotting the values of the classifiers learnt. Comment on whether the classifiers found are reasonable.
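The weight rule ℓi = weight · (yi + 1) + 1 used throughout Task 4 is easy to check numerically. The following plain-Python sketch (the function name and the label vector are hypothetical) shows that samples with yi = −1 always receive weight 1, while samples with yi = +1 receive weight 2 · weight + 1.

```python
def sample_weights(y, weight=1.0):
    """Compute l_i = weight * (y_i + 1) + 1 for labels y_i in {+1, -1}."""
    return [weight * (yi + 1) + 1 for yi in y]


# Hypothetical labels: one bankrupt company (+1) and two non-bankrupt ones (-1).
y = [1, -1, -1]
print(sample_weights(y, weight=1.0))  # the +1 sample gets 2 * 1 + 1 = 3
print(sample_weights(y, weight=0.5))  # the +1 sample gets 2 * 0.5 + 1 = 2
```

Increasing `weight` therefore only raises the penalty on errors made on bankrupt companies, which is how the parameter counteracts the imbalance of the dataset.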

Notice that it may take a while to solve the MI-NLP in Task 4-(c), since an MIP problem is quite challenging to solve in general (with d = 10, in the worst case, it may have to test $2^{10}$ options).

Recall from Computing Lab 2 that the performance of a classifier can be evaluated by the error rate when it is applied on a certain set of data. The error rate can be further broken down into the false alarm rate and the missed detection rate. To describe these metrics, note that for a given classifier (w, b), the predicted label is

$$
\hat{y}^{(i)} = \begin{cases}
+1, & \text{if } w^\top x^{(i)} + b \ge 0, \\
-1, & \text{if } w^\top x^{(i)} + b < 0.
\end{cases}
$$

Now consider the training dataset $\{x^{(i)}, y^{(i)}\}_{i=1}^{m}$. Suppose that $m_-$ is the number of samples with $y^{(i)} = -1$ and $D_-$ is the corresponding set of samples, and that $m_+$ is the number of samples with $y^{(i)} = 1$ and $D_+$ is the corresponding set of samples. The error rates are

$$
\text{False Alarm (FA) Rate} = \frac{1}{m_-} \sum_{i \in D_-} \mathbb{1}\big(\hat{y}^{(i)} \neq -1\big),
\qquad
\text{Missed Detection (MD) Rate} = \frac{1}{m_+} \sum_{i \in D_+} \mathbb{1}\big(\hat{y}^{(i)} \neq 1\big).
\tag{1.3}
$$

Notice that both error rates are between 0 and 1. Sometimes they are called the Type I and Type II errors,

respectively, see https://en.wikipedia.org/wiki/False_positives_and_false_negatives.
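The two rates in (1.3) can be computed in a single pass over the data. The sketch below is plain Python with a hypothetical function name and toy data; it assumes that both classes appear in the dataset and raises an error otherwise, which is a choice made for this illustration only.

```python
def error_rates(w, b, X, y):
    """Return (FA rate, MD rate) as defined in (1.3) for the classifier (w, b)."""
    false_alarms = missed = m_neg = m_pos = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        y_hat = 1 if score >= 0 else -1
        if yi == -1:
            m_neg += 1
            false_alarms += (y_hat != -1)  # negative sample predicted as +1
        else:
            m_pos += 1
            missed += (y_hat != 1)         # positive sample predicted as -1
    if m_neg == 0 or m_pos == 0:
        raise ValueError("both classes must be present to evaluate (1.3)")
    return false_alarms / m_neg, missed / m_pos


# Hypothetical 1-D data: the second sample is a negative that gets predicted +1.
X = [[1.0], [2.0], [-1.0], [3.0]]
y = [1, -1, -1, 1]
print(error_rates([1.0], 0.0, X, y))
```

Because the function takes the dataset as an argument, the same code applies unchanged to training and testing data of any size and dimension.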

As our aim is to design a classifier that predicts whether a future company that is not found in the training dataset will go bankrupt, it is necessary to evaluate the error rate on a testing dataset that is unseen during the training. Denote the testing dataset with $m_{\rm test}$ samples as $\{x^{(i)}_{\rm test}, y^{(i)}_{\rm test}\}_{i=1}^{m_{\rm test}}$. The testing error rate for a classifier (w, b) can be estimated using formulas similar to (1.3). Consider the following task:


Task 5: Error Performance (10%)

For our project, the testing dataset is prepared in ftec-groupi-test.csv.

(a) Write a function fine error rate that evaluates the FA/MD error rates as defined in (1.3).

(b) Evaluate and compare the error rate performances of the 3 formulations you have found in Task 4. For each of the formulations, adjust the parameter weight ≥ 0 so that it balances the FA and MD rates on the training dataset, e.g., both rates are less than or equal to 0.5. (The weight parameter can be chosen individually for each classifier formulation; you may try anything from 0.5 to 2 until you get the desired performance.)

(c) Based on the fine tuned classifiers in part (b), find the top-2 most significant features selected by

the optimization from the MIP formulation. Then, make a scatter plot (similar to Task 3-(a)) of

the training dataset for the two selected features. Then, overlay the fine tuned classifiers found

in part (b) on top of this scatter plot while ignoring other features.

Remark: Please make the function for evaluating the error in (a) general, such that it accepts datasets of any size and features of any dimension. You will have to reuse the same function in Task 6. For part (c), please refer to (1.1) for how to define a line on the 2-D plane of the selected attributes, and pay attention to the comments provided in the helper code.

The scatter plot in Task 5-(c) may look like (the selected attributes may vary from student to student):


1.2 Competitive Tasks (30%)

The goal of this competitive task is to implement your own solver for the binary classification problem, without relying on JuMP and its optimizers such as ECOS, Juniper, etc., as we have done in the previous tasks. To motivate this, we observe that while optimization packages such as JuMP are convenient to use, they often scale poorly to large-scale problems where the number of training samples is large (m ≫ 1) and/or the feature vector is high dimensional (d ≫ 1). The task will require considerably more advanced coding skills.

We shall consider the full dataset and utilize all the 64 available attributes to detect bankruptcy. Our

objectives are to find a classifier with the best training/testing error and the sparsest feature selection.

Our requirement is that (i) the classifier has to be found using a custom-made iterative algorithm such as projected gradient descent for solving an optimization problem of the form:

$$
\min_{w \in \mathbb{R}^d,\, b \in \mathbb{R}} \; \hat{f}(w, b) \quad \text{s.t.} \quad (w, b) \in X,
\tag{1.4}
$$

where (ii) $\hat{f}(\cdot)$ shall be built using the provided training dataset and $X \subseteq \mathbb{R}^d \times \mathbb{R}$ is a convex set.

You are recommended to consider the logistic loss³ as we have done in Lecture 16 / Computing Lab 2:


Minimizing the above function leads to a solution (w, b) such that y


which makes a desired feature for a good classifier. Moreover, as inspired by Task 4, we may take


Our task is specified as follows.

Task 6: Customized Solver for Classifier Optimization (30%)

Use the training data with m = 8000 samples in ftec-full-train.csv to implement an iterative algorithm that tackles (1.4). You are required to initialize your algorithm with $w^{0} = 0$, $b^{0} = 0$.

Suggestion: As a first attempt, you may consider the projected gradient descent (PGD) method using a constant step size, with $\hat{f}(w, b)$ selected as the logistic function (1.5) and using the projection onto the set X in (1.6). See Appendix B for the projection operator onto this X.
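As a starting point for the gradient step, the sketch below evaluates a weighted logistic loss and its gradient in plain Python. It assumes the common form (1/m) Σ ℓi log(1 + exp(−y(i)((x(i))⊤w + b))); the exact scaling and weighting of (1.5) may differ, so treat the formula, the function name, and the toy data as assumptions to be checked against the lecture notes.

```python
import math


def logistic_loss_and_grad(w, b, X, y, weights):
    """Return (loss, grad_w, grad_b) for an assumed weighted logistic loss."""
    m = len(X)
    loss, grad_w, grad_b = 0.0, [0.0] * len(w), 0.0
    for xi, yi, li in zip(X, y, weights):
        z = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        # log(1 + exp(-z)), written so that exp never overflows.
        loss += li * (max(-z, 0.0) + math.log1p(math.exp(-abs(z))))
        # s = 1 / (1 + exp(z)), also written in an overflow-safe way.
        if z >= 0:
            e = math.exp(-z)
            s = e / (1.0 + e)
        else:
            s = 1.0 / (1.0 + math.exp(z))
        coeff = -li * yi * s  # derivative of the i-th term w.r.t. (x_i^T w + b)
        for j, xj in enumerate(xi):
            grad_w[j] += coeff * xj
        grad_b += coeff
    return loss / m, [g / m for g in grad_w], grad_b / m


# Hypothetical toy data, evaluated at the required initial point w = 0, b = 0.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1, -1]
print(logistic_loss_and_grad([0.0, 0.0], 0.0, X, y, [1.0, 1.0]))
```

At the zero initialization every term equals log 2, which gives a convenient sanity check for any implementation of the loss.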

Assessment. You will receive a maximum of 10% for correctly implementing at least one numerical algorithm (e.g., projected gradient), together with

1. plotting the trajectory of the algorithm to show that the objective value in (1.4) decreases to a certain value asymptotically, and providing comments on the algorithm(s) implemented,

2. providing derivations and justifications on why the implemented algorithm is used.

³ Notice that the logistic objective function can be interpreted alternatively as a formulation for the classifier design task with the maximum-likelihood (ML) principle from statistics. This is beyond the scope of this project specification.


We will also use the F1 score, which is a common metric for evaluating classifier performance:

$$
F_1 = \frac{2(1 - P_{\rm MD})}{2(1 - P_{\rm MD}) + P_{\rm FA} + P_{\rm MD}}.
$$

See https://en.wikipedia.org/wiki/F-score. Moreover, the number of non-zero elements in (w, b) will be

calculated according to the normalized version of latter, and

(# non-zero elements in w, b) = 1

i.e., the magnitude has to be large enough relative to the other elements. For convenience, we have provided the functions f1_score and no_of_nonzeros in the project template, which can be used directly. The remaining 20% of your marks in this task will be calculated according to the following formula:

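$$
\begin{aligned}
\text{Score} ={}& 7.5\% \times \exp\big( 10 \cdot \min\{0.75, \text{Your Training F1}\} - 10 \cdot \min\{0.75, \text{Highest Training F1}\} \big) \\
&+ 7.5\% \times \exp\big( 10 \cdot \min\{0.75, \text{Your Testing F1}\} - 10 \cdot \min\{0.75, \text{Highest Testing F1}\} \big) \\
&+ 5\% \times \frac{\max\{4, \text{Lowest number of non-zero elements in } w, b\}}{\max\{4, \text{Your number of non-zero elements in } w, b\}}.
\end{aligned}
\tag{1.7}
$$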

The highest F1 scores are the highest ones among the class of FTEC2101⁴. Some tips for improving the performance of your design can be found in Appendix B.
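To see how the F1 formula and the scoring rule (1.7) behave, the following plain-Python sketch evaluates both. It assumes that P_FA and P_MD are the false alarm and missed detection rates from (1.3); the function names are hypothetical and are not the `f1_score` helper from the project template. The score is returned in percentage points, so the maximum is 20.

```python
import math


def f1_from_rates(p_fa, p_md):
    """F1 = 2(1 - P_MD) / (2(1 - P_MD) + P_FA + P_MD), as in the formula above."""
    return 2 * (1 - p_md) / (2 * (1 - p_md) + p_fa + p_md)


def competition_score(train_f1, test_f1, nnz,
                      best_train_f1, best_test_f1, lowest_nnz):
    """Evaluate (1.7) in percentage points (at most 7.5 + 7.5 + 5 = 20)."""
    part_train = 7.5 * math.exp(10 * min(0.75, train_f1) - 10 * min(0.75, best_train_f1))
    part_test = 7.5 * math.exp(10 * min(0.75, test_f1) - 10 * min(0.75, best_test_f1))
    part_sparse = 5.0 * max(4, lowest_nnz) / max(4, nnz)
    return part_train + part_test + part_sparse


print(f1_from_rates(0.0, 0.0))  # a perfect classifier
print(f1_from_rates(0.5, 0.5))
# Hypothetical entry: F1 of 0.8 against a class best of 0.9, with 4 non-zeros.
print(competition_score(0.8, 0.8, 4, 0.9, 0.9, 3))
```

The caps inside (1.7) matter: any F1 at or above 0.75 earns the full 7.5% for that part, and any classifier with 4 or fewer non-zero elements earns the full 5% for sparsity.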

If you have tried more than one algorithm and/or more than one type of approximation, algorithm parameters,

you have to select only one set of classifier parameters (w, b) for consideration of the competition in (1.7).

Please indicate clearly which solution is selected in your report and include that in the submission of your

program files. That said, you are encouraged to try more of these different variants and include them in the

project report. Moreover, observe the following rules:

• The algorithms you design are not allowed to directly optimize on the testing set data. In other words, your iterative algorithm should not rely on any data in ftec-full-test.csv, as you are not supposed to see the ‘future company’ data while training a classifier. Your score in (1.7) will be set to zero if we detect such ‘cheating’ behavior. However, you can evaluate the test error performance of your solution as many times as you like before you find the best setting.

• Your selected algorithm for the competition must be deterministic and must terminate in less than $10^4$ iterations. In other words, you cannot use stochastic algorithms such as stochastic gradient descent (SGD) for the competition. That being said, you are encouraged to try such algorithms as an additional task, which may be counted towards the ‘innovation’ section.

If you have questions about the rules, please do not hesitate to consult the instructor at [email protected]

or the TA or ask on Piazza.

1.3 Report (20%)

You are required to compile a project report with answers to the questions posed in Task 1 to Task 6. For

your reference only, you may structure the report according to the order of the tasks:

⁴ The scores for ESTR2520 students will be calculated by taking the best error performance across both ESTR2520 and FTEC2101 students.


1. Background and Introduction — In this section, you can briefly introduce the problem, e.g., explaining the goal of classifier design, discussing the role of optimization methods in tackling the problem.

2. Model and Theory — In this section, you can discuss how the classifier design problem is modeled as

optimization problems. More specifically,

– You may begin by discussing the soft-margin formulation (1.2) and then answer Task 1.

– Next, you can describe the optimization models and then answer Task 2.

3. Experiments — In this section, you describe the experiments conducted to test your formulation, i.e.,

– You may first describe the dataset by presenting the results from Task 3. In addition, it is helpful to

describe a few properties regarding the dataset, e.g., the size of the dataset, the range of the values for

the different features.

– Then, you can describe the experiments for each of the 3 formulations with the results from Task 4.

– Finally, you can compare the formulations by answering Task 5.

4. Competitive Task — In this section, you describe the custom solver you built to solve the large-scale

classifier design problem, i.e.,

– You shall first describe your formulation as laid out in the discussion of Section 1.2.

– Then, you shall describe the iterative algorithm you have derived in Task 6.

– Apply the iterative algorithm on the complete training dataset and show the objective value vs. iteration number. Discuss whether the algorithm converges and report on the performance of the designed

classifier.

5. Conclusions: In this section, you shall summarize the findings of the project and discuss various aspects of the formulation that can be improved, etc.

Throughout the report, please feel free to write any answer that involves equations (e.g., Tasks 1-2) on paper and scan it into your Word/PDF report as a figure. On the other hand, if you wish to typeset the mathematical formulas in your report nicely, you are strongly recommended to use LaTeX, e.g., http://www.overleaf.com (P.S. This project specification and the other lecture materials in this course have all been typeset in LaTeX). For the latter, a LaTeX template has been provided on Blackboard.

For Task 3 to 6, please include all the plots and comments as requested. For Task 6, please indicate the

Training F1, Testing F1, No. of non-zero elements in w for your selected solution. We will also run your

code to verify the values reported and take the ones obtained from your code.

The program code in .ipynb has to be submitted separately. However, you are welcome to use excerpts from the program code in the report if you find it helpful for explaining your solution concepts.

Lastly, you are welcome to use online resources when preparing the project. However, you must give proper references for sources that are not your original creation.

Assessment Here is a breakdown of the assessment metric for the report writing component.

• (10%) Report Writing: A project report shall be readable to a person with knowledge in optimization

(e.g., your classmates in FTEC2101/ESTR2520). Make sure that your report is written with clarity, and

more importantly, using your own language!


• (10%) Innovation: You can get innovation marks if you include extra experiments, presentations, etc. that are relevant to the project (with sufficient explanations); see Appendix B for some recommendations.

1.4 Submission

This is an individual project. While discussions regarding how to solve the problems are encouraged, students should answer the problems on their own (just like your HWs). The deadline for submission is May 10 (Friday), 2024, 23:59 (HKT). Please submit the following content to Blackboard:

• Your Project Report in PDF format.

• Your Program Codes [either in Jupyter notebook (.ipynb), or Julia code (.jl)].

In addition, the project report shall be submitted to VeriGuide for plagiarism check.

A Dataset Description

Here is the list of all the 64 features collected in the Bankruptcy dataset:

Attr1 net profit / total assets

Attr2 total liabilities / total assets

Attr3 working capital / total assets

Attr4 current assets / short-term liabilities

Attr5 [(cash + short-term securities + receivables - short-term liabilities) / (operating expenses - depreciation)] * 365

Attr6 retained earnings / total assets

Attr7 EBIT / total assets

Attr8 book value of equity / total liabilities

Attr9 sales / total assets

Attr10 equity / total assets

Attr11 (gross profit + extraordinary items + financial expenses) / total assets

Attr12 gross profit / short-term liabilities

Attr13 (gross profit + depreciation) / sales

Attr14 (gross profit + interest) / total assets

Attr15 (total liabilities * 365) / (gross profit + depreciation)

Attr16 (gross profit + depreciation) / total liabilities

Attr17 total assets / total liabilities

Attr18 gross profit / total assets

Attr19 gross profit / sales

Attr20 (inventory * 365) / sales

Attr21 sales (n) / sales (n-1)

Attr22 profit on operating activities / total assets

Attr23 net profit / sales

Attr24 gross profit (in 3 years) / total assets

Attr25 (equity - share capital) / total assets

Attr26 (net profit + depreciation) / total liabilities

Attr27 profit on operating activities / financial expenses

Attr28 working capital / fixed assets

Attr29 logarithm of total assets


Attr30 (total liabilities - cash) / sales

Attr31 (gross profit + interest) / sales

Attr32 (current liabilities * 365) / cost of products sold

Attr33 operating expenses / short-term liabilities

Attr34 operating expenses / total liabilities

Attr35 profit on sales / total assets

Attr36 total sales / total assets

Attr37 (current assets - inventories) / long-term liabilities

Attr38 constant capital / total assets

Attr39 profit on sales / sales

Attr40 (current assets - inventory - receivables) / short-term liabilities

Attr41 total liabilities / ((profit on operating activities + depreciation) * (12/365))

Attr42 profit on operating activities / sales

Attr43 rotation receivables + inventory turnover in days

Attr44 (receivables * 365) / sales

Attr45 net profit / inventory

Attr46 (current assets - inventory) / short-term liabilities

Attr47 (inventory * 365) / cost of products sold

Attr48 EBITDA (profit on operating activities - depreciation) / total assets

Attr49 EBITDA (profit on operating activities - depreciation) / sales

Attr50 current assets / total liabilities

Attr51 short-term liabilities / total assets

Attr52 (short-term liabilities * 365) / cost of products sold

Attr53 equity / fixed assets

Attr54 constant capital / fixed assets

Attr55 working capital

Attr56 (sales - cost of products sold) / sales

Attr57 (current assets - inventory - short-term liabilities) / (sales - gross profit - depreciation)

Attr58 total costs /total sales

Attr59 long-term liabilities / equity

Attr60 sales / inventory

Attr61 sales / receivables

Attr62 (short-term liabilities *365) / sales

Attr63 sales / short-term liabilities

Attr64 sales / fixed assets

B Additional Information

Suggestions: The following are only suggestions for improving the performance of your classifier design in Task 6. You are more than welcome to propose and explore new ideas (but still, make sure that they are mathematically correct; feel free to ask the instructor/TA if in doubt)!

• Formulation Aspect – Here are some tricks to tweak the performance of your classifier design in Task 6.

1. The design of the weights $\{\ell_i\}_{i=1}^{m}$ may be crucial to the performance of your classifier. As you did in Task 5, try tuning the parameter weight to get better performance.

2. The value of R1 in (1.6) is crucial to the sparsity of the classifier found.

3. The logistic regression loss (1.5) is not the only option. Some reasonable/popular options can be found

in https://www.cs.cornell.edu/courses/cs4780/2022sp/notes/LectureNotes14.html.

• Algorithm Aspect – For Task 6, the recommended algorithm is the projected gradient descent (PGD) method, which is described as follows. Consider solving a general optimization problem:

$$
\min_{w \in \mathbb{R}^d,\, b \in \mathbb{R}} \; \hat{f}(w, b) \quad \text{s.t.} \quad (w, b) \in X.
\tag{1.8}
$$

With a slight abuse of notation, we denote x ≡ (w, b) and the PGD method can be described as

PGD Method

Input: $x^{(0)} \in X$, constant step size $\gamma > 0$, max. iteration number $K_{\max}$.

For $k = 0, \dots, K_{\max}$

    $x^{(k+1)} = \mathrm{Proj}_X\big( x^{(k)} - \gamma \nabla \hat{f}(x^{(k)}) \big)$

End For

The book [Beck, 2017] is a good reference for learning different optimization algorithms.
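The PGD iteration above can be sketched generically, with the gradient and the projection passed in as functions. The plain-Python snippet below is only an illustration: the function names, the quadratic objective, and the box constraint are hypothetical toy choices, not the objective (1.5) or the set (1.6). The loop runs exactly `max_iter` updates, which may differ by one from the index range written in the box above.

```python
def projected_gradient_descent(grad, project, x0, step, max_iter):
    """Iterate x <- Proj_X(x - step * grad(x)) with a constant step size."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        x = project([xi - step * gi for xi, gi in zip(x, g)])
    return x


# Toy problem (an assumption for illustration): minimize 0.5 * ||x - c||^2
# over the box [-1, 1]^2, whose solution is c clipped to the box.
c = [3.0, -2.0]


def grad(x):
    return [xi - ci for xi, ci in zip(x, c)]


def project_box(x):
    return [max(-1.0, min(1.0, xi)) for xi in x]


print(projected_gradient_descent(grad, project_box, [0.0, 0.0], step=0.5, max_iter=50))
```

Keeping the projection as a separate argument means the same loop can be reused with a different feasible set, such as the ℓ1 ball, without touching the update rule.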

When $X = \{x \in \mathbb{R}^d : \sum_{i=1}^{d} |x_i| \le R\}$ as in (1.6), the projection operator is (see [Duchi et al., 2008])

1. Input: $x \in \mathbb{R}^d$, $R > 0$.

2. Calculate the vector u = abs.(x), i.e., the elementwise absolute values of the input x.

3. Sort the elements of u in decreasing magnitude and denote the sorted vector as v, so that $|v_1| \ge \cdots \ge |v_d|$.

4. For j = 1, ..., d,

   If $v_j - \frac{1}{j}\left( \sum_{r=1}^{j} v_r - R \right) \le 0$, then set $j_{sv} = j - 1$ and break the for-loop.

5. Set $\theta = \frac{1}{j_{sv}} \left( \sum_{r=1}^{j_{sv}} v_r - R \right)$.

6. Return: the vector $\hat{x}$ such that $\hat{x}_i = \mathrm{sign}(x_i) \max\{0, |x_i| - \theta\}$ for i = 1, ..., d.
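The six steps above translate almost line by line into code. The plain-Python sketch below follows them, with two additions that are assumptions of this illustration and are not stated in the steps: an input that already lies in the ball is returned unchanged, and if the loop in step 4 never breaks, j_sv is taken to be d. The function name and test vectors are hypothetical.

```python
import math


def project_l1_ball(x, R):
    """Project x onto {x : sum_i |x_i| <= R} following steps 1 to 6 above."""
    # Added assumption: a point already inside the ball is its own projection.
    if sum(abs(xi) for xi in x) <= R:
        return list(x)
    v = sorted((abs(xi) for xi in x), reverse=True)  # steps 2 and 3
    j_sv = len(v)  # added assumption: used when the loop below never breaks
    running = 0.0
    for j, vj in enumerate(v, start=1):  # step 4
        running += vj
        if vj - (running - R) / j <= 0:
            j_sv = j - 1
            break
    theta = (sum(v[:j_sv]) - R) / j_sv  # step 5
    # Step 6: soft-threshold each entry while keeping its sign.
    return [math.copysign(max(0.0, abs(xi) - theta), xi) for xi in x]


print(project_l1_ball([3.0, -1.0, 0.5], 2.0))  # theta = 1, only the first entry survives
print(project_l1_ball([1.0, 1.0], 1.0))        # theta = 0.5, both entries shrink equally
```

The first example shows why this constraint promotes sparsity: entries whose magnitude does not exceed θ are set exactly to zero by the projection.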

Besides, you are more than welcome to explore the use of other iterative algorithms, e.g., conditional gradient, backtracking line search, etc., for solving the optimization problem at hand.

Lastly, tips for implementing the MI-NLP, SOCP, etc. in the compulsory tasks have been included with the template program. Be reminded that it is not necessary to follow all the tips therein.

C On the Use of Generative AI Tools

We are following Approach 3 as listed in the University’s Guideline on the matter (https://www.aqs.cuhk.edu.hk/documents/A-guide-for-students_use-of-AI-tools.pdf): use of AI tools is allowed with explicit acknowledgement and proper citation. In short, you are allowed to use generative AI tools to assist you, provided that you explicitly acknowledge the use of such tools, e.g., you may include a sentence like:

The following section has been completed with the aid of ChatGPT.

Failure to do so will constitute an act of academic dishonesty and may result in failure of the course and/or other penalties; see https://www.cuhk.edu.hk/policy/academichonesty/. Below we list some advice on the do’s and don’ts of using AI tools:

• DO’s: You may use AI tools for polishing your writeups, e.g., to correct grammatical mistakes, typos,

or summarizing long/complicated paragraphs, etc. The results are usually quite robust especially for

improving the writings from less experienced writers. Of course, you are responsible for the integrity of

the edited writing, e.g., check if the AI tools have distorted the meaning of your original writeups or not.


• DON’Ts: You should not ask AI tools to solve mathematical questions. Not only will this defeat the purpose of learning, but AI tools also do a notoriously bad job on tasks involving facts and mathematical/logical reasoning. Worse still, they tend to produce solutions that sound legitimate but are completely wrong.

• DON’Ts: You should not ask AI tools to write the entire project (report) for you. Likewise, AI tools

are notoriously bad at creating (technical and logical) content. They tend to produce writings that sound

legit but are completely illogical.

We believe that, when properly used, AI tools can be helpful in improving students’ overall learning experience. In fact, you are even encouraged to try them out in your leisure time. Nevertheless, we emphasize again that you have to provide explicit acknowledgement in your submission if you have used any generative AI tools to assist you in this course.

References

A. Beck. First-order methods in optimization. SIAM, 2017.

J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pages 272–279, 2008.

