
Date: 2024-04-01 08:09

Financial Machine Learning (CMSE11475)

Group Project Assignment

2023/2024

Contents

Project Description
Individual Project
Project Deadline and Submission
Project Topic
Project Hints
Suggested Topics
  Forecasting Limit Order Book
  Forecasting Stock Volatility
  Forecasting High Frequency Cryptocurrency Return

Project Description

The project aims to practise the use of state-of-the-art machine learning models to analyse financial data and solve financial problems.

Individual Project:

This is an individual project; no group is required. Each student selects their own topic and data and completes their own research question alone. Cooperation and discussion with each other during the learning process are encouraged, but the project must be completed as the student's own work, not as group work.

Project Deadline and Submission:

Individual projects run from 15th January 2024 (week 1) to 29th March 2024 (week 10).

The submission deadline is 14:00 on Thursday, 4th April 2024.

The submission of the project includes the project report and all implementation code (do NOT submit any data). The code must work on the originally provided datasets. The report and the code must be ZIPPED into one package for submission.

The report MUST follow the given template. All sections are required. The code MUST have complete and

detailed comments for every major logical section.

Project topic

Each student should individually choose one of the following suggested topics (with provided data) as their own project. You are encouraged to revise or improve the topic to make it more practical, challenging, and suitable for your own research question. It is fine if many students select the same suggested topic, as long as their code and project reports are significantly distinct.

The aim of this project is to apply at least THREE of the five techniques covered in the course (Deep Neural Network; XGBoost; Cross-validation; Ensemble Model; Interpretability) to solve a financial problem.

Project Hints

All suggested topics are based on the computer lab examples, with some changes and extensions. You can easily find similar methods and models in the computer lab examples. Carefully studying those examples and their code is crucial for understanding this course and completing the group coursework.

Suggested Topics

Forecasting Limit Order Book

Topic

Can we use a deep neural network to forecast the high-frequency returns of stocks at multiple horizons using their limit order book information?

Data

10-level high-frequency limit order book data for five stocks (Apple, Amazon, Intel, Microsoft, and Google) on 21st June 2012. The file sizes range from 40 MB to over 100 MB. You may choose to use only part of the data.

Method

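You may define the following features:

1) Limit Order Book (LOB) state

$$s_t^{LOB} = \left(a_t^1, v_t^{1,a}, b_t^1, v_t^{1,b}, \dots, a_t^{10}, v_t^{10,a}, b_t^{10}, v_t^{10,b}\right)^T, \quad s_t^{LOB} \in \mathbb{R}^{40}$$

where $a_t^i$ and $b_t^i$ are the ask and bid prices of the 10 levels ($i = 1, \dots, 10$), and $v_t^{i,a}$ and $v_t^{i,b}$ are the corresponding ask and bid volumes of the 10 levels ($i = 1, \dots, 10$).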

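2) Bid-Ask Order Flow (OF)

$$bOF_{t,i} = \begin{cases} v_t^{i,b}, & \text{if } b_t^i > b_{t-1}^i \\ v_t^{i,b} - v_{t-1}^{i,b}, & \text{if } b_t^i = b_{t-1}^i \\ -v_t^{i,b}, & \text{if } b_t^i < b_{t-1}^i \end{cases}$$

$$aOF_{t,i} = \begin{cases} v_t^{i,a}, & \text{if } a_t^i > a_{t-1}^i \\ v_t^{i,a} - v_{t-1}^{i,a}, & \text{if } a_t^i = a_{t-1}^i \\ -v_t^{i,a}, & \text{if } a_t^i < a_{t-1}^i \end{cases}$$

Stacking both sides over the 10 levels gives $OF_t = (bOF_{t,1}, \dots, bOF_{t,10}, aOF_{t,1}, \dots, aOF_{t,10})^T \in \mathbb{R}^{20}$.

3) Order Flow Imbalance (OFI)

$$OFI_{t,i} = bOF_{t,i} - aOF_{t,i}, \quad i = 1, \dots, 10$$

$$OFI_t = (OFI_{t,1}, \dots, OFI_{t,10})^T \in \mathbb{R}^{10}$$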

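The features can be defined as a vector

$$\mathbf{X}_t = \left(s_t^{LOB}, bOF_t, aOF_t, OFI_t\right)^T$$

where $bOF_t = (bOF_{t,1}, \dots, bOF_{t,10})$ and $aOF_t = (aOF_{t,1}, \dots, aOF_{t,10})$. The total dimension of the feature vector is $40 + 20 + 10 = 70$, so $\mathbf{X}_t \in \mathbb{R}^{70}$.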
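As a minimal NumPy sketch of this feature construction, assume the order book has already been loaded into four arrays of shape (T, 10): `ask_px`, `ask_vol`, `bid_px` and `bid_vol` (hypothetical names; how you parse the provided files is up to you). The code applies the order-flow cases exactly as defined in this section, drops the first snapshot because it has no predecessor, and assumes the level-by-level ordering (ask price, ask volume, bid price, bid volume) for $s_t^{LOB}$.

```python
import numpy as np


def order_flow(px: np.ndarray, vol: np.ndarray) -> np.ndarray:
    """Order flow for one side of the book, following the three cases in the text.

    px, vol: shape (T, 10) prices and volumes. Returns shape (T - 1, 10) for t = 1..T-1.
    """
    px_now, px_prev = px[1:], px[:-1]
    vol_now, vol_prev = vol[1:], vol[:-1]
    up = px_now > px_prev        # price level moved up
    same = px_now == px_prev     # price level unchanged
    return np.where(up, vol_now, np.where(same, vol_now - vol_prev, -vol_now))


def lob_features(ask_px, ask_vol, bid_px, bid_vol) -> np.ndarray:
    """Stack s_t^LOB (40), bOF (10), aOF (10) and OFI (10) into X_t (70) for t = 1..T-1."""
    T = len(ask_px)
    # Assumed ordering: (a^1, v^{1,a}, b^1, v^{1,b}, ..., a^10, v^{10,a}, b^10, v^{10,b})
    s_lob = np.stack([ask_px, ask_vol, bid_px, bid_vol], axis=2).reshape(T, 40)
    b_of = order_flow(bid_px, bid_vol)
    a_of = order_flow(ask_px, ask_vol)
    ofi = b_of - a_of            # OFI_{t,i} = bOF_{t,i} - aOF_{t,i}
    return np.hstack([s_lob[1:], b_of, a_of, ofi])
```

Prices and volumes live on very different scales, so standardising the 70 columns before training is usually worthwhile.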

The target is the LOB mid-point return $\mathbf{r}_t$ over $H$ future horizons ($H \geq 1$):

$$\mathbf{r}_t = (r_{t,1}, \dots, r_{t,H})^T$$

This project is to estimate the function $f(\cdot)$ that takes a sequence of historical $\mathbf{X}_t$ as input and generates the vector $\mathbf{r}_t$ as output:

$$\mathbf{r}_t = f(\mathbf{X}_t, \mathbf{X}_{t-1}, \mathbf{X}_{t-2}, \dots, \mathbf{X}_{t-W})$$

where $W$ is the look-back window and $r_{t,j}$ is the mid-point return at horizon $j = 1, \dots, H$.
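The section does not pin down how $r_{t,j}$ is computed. The sketch below assumes the mid-point $m_t = (a_t^1 + b_t^1)/2$ and a simple return $r_{t,j} = m_{t+j}/m_t - 1$, with horizons counted in order book snapshots; both choices, and the helper names `midprice_returns` and `make_windows`, are illustrative assumptions. It then pairs each sequence $(\mathbf{X}_{t-W}, \dots, \mathbf{X}_t)$ with $\mathbf{r}_t$, assuming `X` and `r` share the same time index.

```python
import numpy as np


def midprice_returns(ask1: np.ndarray, bid1: np.ndarray, H: int) -> np.ndarray:
    """Assumed target: r_{t,j} = m_{t+j} / m_t - 1 for j = 1..H.

    ask1, bid1: best ask and bid prices, shape (T,). Returns shape (T - H, H).
    """
    mid = (ask1 + bid1) / 2.0
    T = len(mid)
    return np.stack([mid[j:T - H + j] / mid[:T - H] - 1.0 for j in range(1, H + 1)], axis=1)


def make_windows(X: np.ndarray, r: np.ndarray, W: int):
    """Pair each input sequence (X_{t-W}, ..., X_t) with the target r_t.

    X: (T, d) features; r: (T', H) targets, r[t] belonging to time t.
    Returns inputs (n, W + 1, d) in chronological order and targets (n, H).
    """
    n_t = min(len(X), len(r))
    idx = range(W, n_t)
    inputs = np.stack([X[t - W:t + 1] for t in idx])
    targets = np.stack([r[t] for t in idx])
    return inputs, targets
```

The inputs come out in the (samples, timesteps, features) layout that Keras-style LSTM layers expect, so rebuilding them for several values of `W` is one way to compare look-back windows.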

This topic should use an LSTM as one of the candidate models. You may train the LSTM on the raw 70-dimensional features $\mathbf{X}_t$ with different values of $W$. You may also use an autoencoder to extract lower-dimensional features ($M < 70$) and then train the LSTM on the extracted features with different values of $W$. You can then compare the two approaches.
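The course labs may build the autoencoder in a deep learning framework. As a framework-free illustration, the sketch below swaps in scikit-learn's `MLPRegressor` trained to reproduce its own standardised input through a bottleneck of $M$ units, then reads the bottleneck activations from the fitted weights. This is a shallow, one-hidden-layer autoencoder; the `tanh` activation, `max_iter=500`, the standardisation step and the helper name `fit_autoencoder` are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def fit_autoencoder(X: np.ndarray, M: int, seed: int = 0):
    """Fit X -> M hidden units -> X and return a function mapping X to its M-dim code."""
    scaler = StandardScaler().fit(X)
    Z = scaler.transform(X)
    ae = MLPRegressor(hidden_layer_sizes=(M,), activation="tanh",
                      max_iter=500, random_state=seed)
    ae.fit(Z, Z)  # target = input: learn to reconstruct the standardised features

    def encode(X_new: np.ndarray) -> np.ndarray:
        # Bottleneck = tanh(Z @ W1 + b1), i.e. the first (hidden) layer of the network
        return np.tanh(scaler.transform(X_new) @ ae.coefs_[0] + ae.intercepts_[0])

    return encode
```

The encoded array of shape (T, M) can then replace the 70 raw features when building LSTM input sequences. Fit the encoder on training data only, so the test period does not leak into the features.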

This project should also address the question of feature importance.

Forecasting Stock Volatility

Topic

This topic comprises two subtopics, both pertaining to volatility forecasting. These subtopics are as follows:

1) Is stock volatility path-dependent?

2) Is stock volatility past-dependent?

To address these questions, you have the option to employ various machine learning models for forecasting

stock return volatility. This can be achieved either by utilising past returns (path-dependent) or past volatilities

(past-dependent).

Addressing either of the aforementioned sub-questions fulfils the coursework requirements for the

FML course. There is no need to complete work for both questions.

Data

In computer lab_3_1, we show how to download stock prices from Yahoo Finance. This topic uses the adjusted stock prices to calculate volatility. You should calculate the volatility as the standard deviation of $N$ daily arithmetic returns; it is essential that this volatility is computed from returns within distinct, non-overlapping $N$-day intervals. $N$ can be five or ten days. The following figure shows the volatility calculation, where $r_i$ is the daily return and $\sigma_i$ is the five-day volatility.
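A minimal pandas sketch of the non-overlapping volatility, assuming `prices` is a Series of adjusted closing prices indexed by date (for example, as downloaded in the lab). It uses the sample standard deviation (`ddof=1`, the pandas default), labels each block by its last trading day, and drops an incomplete final block; these are choices rather than requirements of the brief, and `block_volatility` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def block_volatility(prices: pd.Series, N: int = 5) -> pd.DataFrame:
    """Standard deviation of daily returns within distinct, non-overlapping N-day blocks."""
    r = (prices / prices.shift(1) - 1).dropna()      # daily arithmetic returns
    block = np.arange(len(r)) // N                   # 0,0,0,0,0,1,1,... for N = 5
    out = pd.DataFrame({
        "end_date": r.index.to_series().groupby(block).last(),  # last day of each block
        "sigma": r.groupby(block).std(),                         # sample std, ddof=1
        "n_days": r.groupby(block).size(),
    })
    return out[out["n_days"] == N].drop(columns="n_days")       # drop incomplete last block
```

Because each return belongs to exactly one block, consecutive $\sigma_i$ values share no returns, which is what the non-overlapping requirement asks for.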

To successfully complete the coursework, you must choose a minimum of two stocks to assess one of the

aforementioned questions. The selection of these stocks should align with your personal interests.

Method

The topic is to investigate whether volatility is path-dependent or past-dependent. However, the length $L$ of the path or past is unknown. You can select $L$ as 5, 10, 15, 20, or 40 days in the investigation and conclude with the best $L$. Please decide for yourself which lengths $L$ to test in your coursework.

For the path-dependent question, the input features contain the daily returns in the past $L$ days:

$$\mathbf{X}_t = (r_{t-1}, r_{t-2}, r_{t-3}, \dots, r_{t-L})^T$$

The output is the volatility $y_t = \sigma_t$. Please be aware that the returns in $\mathbf{X}_t$ must not be included in the calculation of the output volatility $y_t$. As illustrated in the figure below, to forecast the volatility $\sigma_t$ you can use the daily returns $r_{t-1}, r_{t-2}, \dots, r_{t-L}$ in the past $L$ days.
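For the path-dependent question, the sketch below builds one sample per non-overlapping $N$-day block, taking as features the $L$ daily returns immediately before the block (most recent first) and as target the block's own volatility, so no return appears on both sides. It assumes `r` is a NumPy array of daily returns in date order and that blocks start at the first return; `path_dataset` is a hypothetical helper name.

```python
import numpy as np


def path_dataset(r: np.ndarray, N: int = 5, L: int = 10):
    """Features: the L daily returns just before each N-day block; target: that block's volatility."""
    X, y = [], []
    for start in range(0, len(r) - N + 1, N):     # start of each complete, non-overlapping block
        if start < L:
            continue                               # not enough history before this block
        X.append(r[start - L:start][::-1])         # (r_{t-1}, r_{t-2}, ..., r_{t-L})
        y.append(r[start:start + N].std(ddof=1))   # sigma_t from returns not used in X
    return np.array(X), np.array(y)
```

When $L > N$ the features reach back across several earlier blocks, which is allowed: only the returns inside the target block are excluded.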

For the past-dependent question, the input features contain the previous $L$ volatilities:

$$\mathbf{X}_t = (\sigma_{t-1}, \sigma_{t-2}, \sigma_{t-3}, \dots, \sigma_{t-L})^T$$

The output is the volatility $y_t = \sigma_t$.
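For the past-dependent question, lagged volatilities can be built with `shift`. This sketch assumes `sigma` is a pandas Series of non-overlapping block volatilities in date order. Here $L$ counts previous volatilities, as in the formula, so with $N = 5$ a value of $L = 4$ spans 20 trading days. `past_dataset` is a hypothetical helper name.

```python
import pandas as pd


def past_dataset(sigma: pd.Series, L: int = 4):
    """Features: (sigma_{t-1}, ..., sigma_{t-L}); target: sigma_t. Rows without full history are dropped."""
    lags = {f"sigma_lag{k}": sigma.shift(k) for k in range(1, L + 1)}
    data = pd.DataFrame(lags).assign(target=sigma).dropna()
    return data.drop(columns="target"), data["target"]
```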

This topic may use any machine learning model.

This topic may also determine which length $L$ generates the best forecasting results for path and past dependence.
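One way to pick $L$ is to compare out-of-sample errors with time-ordered cross-validation, which is also one of the course techniques. The sketch below assumes the past-dependent setting with a 1-D NumPy array of block volatilities `sigma`, uses scikit-learn's `TimeSeriesSplit` (expanding training windows, test folds always later in time) and, as a placeholder model, `LinearRegression`. The candidate lengths, the MSE metric and the helper names `lagged` and `compare_lengths` are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit


def lagged(series: np.ndarray, L: int):
    """Rows (s_{t-1}, ..., s_{t-L}) with target s_t."""
    X = np.column_stack([series[L - k:len(series) - k] for k in range(1, L + 1)])
    return X, series[L:]


def compare_lengths(sigma, lengths=(1, 2, 4, 8), n_splits=5, model=None) -> dict:
    """Mean out-of-sample MSE for each look-back length L."""
    if model is None:
        model = LinearRegression()
    max_L = max(lengths)
    scores = {}
    for L in lengths:
        X, y = lagged(sigma, L)
        X, y = X[max_L - L:], y[max_L - L:]   # same target dates for every L
        errs = []
        for train, test in TimeSeriesSplit(n_splits=n_splits).split(X):
            model.fit(X[train], y[train])
            errs.append(mean_squared_error(y[test], model.predict(X[test])))
        scores[L] = float(np.mean(errs))
    return scores
```

The $L$ with the lowest error is the candidate answer. Trimming every $L$ to the same target dates keeps the comparison like-for-like, and swapping in another regressor with a scikit-learn style `fit`/`predict` interface (for example XGBoost's `XGBRegressor`) only changes the `model` argument.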

Forecasting High Frequency Cryptocurrency Return

Topic

This topic studies how well machine learning models forecast the 15-minute-ahead return of any of the 14 popular cryptocurrencies.

Data

A dataset, “cryptocurrency_prices.csv”, containing millions of rows of 1-minute market data dating back to 2018 is provided for building the model. The dataset contains 14 popular cryptocurrencies, distinguished by asset IDs. The asset IDs and names are listed in the file “asset_details.csv”. You may choose any of the cryptocurrencies to forecast. The “Weight” column in that file is used to calculate the return of the whole cryptocurrency market and is explained in the Appendix.

Asset_ID  Weight       Asset_Name
2         2.397895273  Bitcoin Cash
0         4.304065093  Binance Coin
1         6.779921907  Bitcoin
5         1.386294361  EOS.IO
7         2.079441542  Ethereum Classic
6         5.894402834  Ethereum
9         2.397895273  Litecoin
11        1.609437912  Monero
13        1.791759469  TRON
12        2.079441542  Stellar
3         4.406719247  Cardano
8         1.098612289  IOTA
10        1.098612289  Maker
4         3.555348061  Dogecoin

In the file “cryptocurrency_prices.csv”, the target has already been calculated and provided as the column “Target”. For each cryptocurrency asset $a$, the target $\text{Target}_t^a$ is the residual of the log return over the next 15 minutes. Note that in each row the “Target” has already been aligned as the future 15-minute return residual, which is the quantity to be forecast. (Target: residual log returns for the asset over a 15-minute horizon.)

The dataset contains the following columns:

timestamp: All timestamps are given as Unix timestamps in seconds (the number of seconds elapsed since 1970-01-01 00:00:00.000 UTC). Timestamps in this dataset are multiples of 60, indicating minute-by-minute data.

Asset_ID: The asset ID corresponding to one of the cryptocurrencies (e.g. Asset_ID = 1 for Bitcoin). The mapping from Asset_ID to crypto asset is contained in asset_details.csv.

Count: Total number of trades in the time interval (last minute).

Open: Opening price of the time interval (in USD).

High: Highest price reached during time interval (in USD).

Low: Lowest price reached during time interval (in USD).

Close: Closing price of the time interval (in USD).

Volume: Quantity of asset bought or sold, displayed in base currency USD.

VWAP: The average price of the asset over the time interval, weighted by volume. VWAP is an aggregated

form of trade data.
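A minimal loading sketch with pandas, assuming the column names listed above and that you forecast one asset at a time. Reindexing to a complete 1-minute grid is an assumption worth checking against the data: it only changes anything if some minutes are absent, but it guarantees that a shift of k rows means k minutes. With millions of rows, passing `usecols` to `read_csv` keeps memory down. `load_asset` is a hypothetical helper name.

```python
import pandas as pd


def load_asset(path: str, asset_id: int) -> pd.DataFrame:
    """Read cryptocurrency_prices.csv and return one asset on a complete 1-minute grid."""
    df = pd.read_csv(path)
    df = df[df["Asset_ID"] == asset_id].copy()
    df.index = pd.to_datetime(df["timestamp"], unit="s", utc=True)  # Unix seconds -> UTC time
    df = df[~df.index.duplicated(keep="first")].sort_index()
    # Missing minutes become NaN rows, so a shift of k rows is always k minutes
    grid = pd.date_range(df.index[0], df.index[-1], freq="1min")
    return df.reindex(grid)
```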

Method

You may define additional features, for example the past 5-minute log return, the past 5-minute absolute log return, and the past 5-minute highest and lowest prices.
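The example features could be computed as follows. This sketch assumes a per-asset frame on a regular 1-minute grid with `Close`, `High` and `Low` columns. It expresses the 5-minute high and low as log ratios to the current close so they are comparable across price levels, which is a design choice rather than part of the brief. Every feature uses only data up to minute $t$, so nothing leaks from the 15-minute target window. `add_features` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Backward-looking features on a 1-minute per-asset frame (only data up to t is used)."""
    out = pd.DataFrame(index=df.index)
    log_close = np.log(df["Close"])
    out["logret_5"] = log_close - log_close.shift(window)                  # past 5-minute log return
    out["abs_logret_5"] = out["logret_5"].abs()                            # its absolute value
    out["high_5"] = np.log(df["High"].rolling(window).max() / df["Close"])  # 5-minute high vs close
    out["low_5"] = np.log(df["Low"].rolling(window).min() / df["Close"])    # 5-minute low vs close
    return out
```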

You may try simple models, e.g., linear or tree-based models, and complex models, e.g., LSTM, and compare their forecasting performance.

If using an LSTM, you may also study which look-back window length gives the best forecasting performance.

In addition, feature importance should be studied to show which features contribute most to forecasting each asset's future relative performance.
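One model-agnostic way to study feature importance is permutation importance: shuffle one feature on held-out data and measure how much the score drops. The sketch uses scikit-learn's `permutation_importance` with the estimator's default score (R² for regressors). `rank_features` is a hypothetical helper name, and SHAP values would be an alternative.

```python
import numpy as np
from sklearn.inspection import permutation_importance


def rank_features(model, X_test, y_test, names, n_repeats=10, seed=0):
    """Return (name, mean drop in score, std) for each feature, most important first."""
    result = permutation_importance(model, X_test, y_test,
                                    n_repeats=n_repeats, random_state=seed)
    order = np.argsort(result.importances_mean)[::-1]
    return [(names[i], float(result.importances_mean[i]), float(result.importances_std[i]))
            for i in order]
```

Computing the importances on the test period, rather than the training data, shows which features the model actually relies on for out-of-sample forecasts.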

Appendix

This appendix introduces how the target is calculated.

The log return at time $t$ for asset $a$ is calculated as:

$$R_t^a = \log\left(\frac{P_{t+16}^a}{P_{t+1}^a}\right)$$
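On a regular 1-minute grid for a single asset, this return is a pair of shifts. The sketch assumes $P$ is the `Close` price, which the formula itself does not state; `forward_log_return` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def forward_log_return(close: pd.Series) -> pd.Series:
    """R_t = log(P_{t+16} / P_{t+1}); the last 16 rows are NaN because the future is unknown."""
    return np.log(close.shift(-16) / close.shift(-1))
```

Starting at $t+1$ rather than $t$ means the 15-minute window begins one minute after the row's timestamp.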

As crypto asset returns are highly correlated, forecasting the return of an individual asset should remove the market signal from that asset's return. Therefore, the weighted average cryptocurrency market return $M_t$ is defined as:

$$M_t = \frac{\sum_a w^a R_t^a}{\sum_a w^a}$$

where $w^a$ is the weight for each cryptocurrency, given in the column “Weight” in the file “asset_details.csv”.

Then, a beta is calculated for each asset:

$$\beta^a = \frac{\langle M \cdot R^a \rangle}{\langle M^2 \rangle}$$

where the bracket $\langle \cdot \rangle$ denotes the rolling average over the past 3750-minute window.

Then, the regression residual is defined as the target for each asset:

$$\text{Target}_t^a = R_t^a - \beta^a M_t$$

However, you do not need to do this calculation: the target values have already been calculated and provided in the “Target” column.
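Although it is not required, the market return, beta and residual can be sketched in pandas to see how the pieces fit together. This assumes `R` is a wide frame of 15-minute log returns (rows = minutes on a regular grid, columns = Asset_ID) and `w` is the “Weight” column indexed by Asset_ID. Renormalising the weights over the assets present at each minute, and leaving incomplete rolling windows as NaN, are assumptions that may differ from how the provided Target was produced. `residual_target` is a hypothetical helper name.

```python
import pandas as pd


def residual_target(R: pd.DataFrame, w: pd.Series, window: int = 3750) -> pd.DataFrame:
    """Target^a_t = R^a_t - beta^a_t * M_t, with beta from rolling means over `window` minutes."""
    w = w.reindex(R.columns)
    present = R.notna()
    # M_t: weighted average return over the assets with data at minute t
    M = (R * w).sum(axis=1) / present.mul(w, axis=1).sum(axis=1)
    # beta^a = <M * R^a> / <M^2>, <.> = rolling mean over the past `window` minutes
    num = R.mul(M, axis=0).rolling(window).mean()
    den = (M ** 2).rolling(window).mean()
    beta = num.div(den, axis=0)
    return R - beta.mul(M, axis=0)
```

If an asset's return is exactly proportional to the market return, its beta absorbs it fully and the residual is zero, which is the sense in which the target removes the market signal.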

