newsboard_data
2024-02-02
newsboard_data.Rmd
Introduction to swissexchangedata’s newsboard_data() function
This vignette offers a glimpse into the inner workings of the swissexchangedata package, which was built for educational purposes: to learn and sharpen skills in web scraping of data from online sources, data wrangling and package development.
When extracting Swiss Stock Exchange data from the SIX Group newsboard website, you must:
- Define the Date Range, Message Type, Market and Products.
- Open the results in a separate view for printing and saving to a PDF file.
The swissexchangedata package makes it easy to define the desired parameters and load the data directly into your workspace, using the currently available function newsboard_data().
More functions will be added to access data from the other tabs provided on the company website.
The available date range covers only two years of data, so the stored data must be updated over time to capture as much of the history as possible. Future versions of the stored data are expected to be refreshed over time.
The scraped data is cleaned further to add nested data.
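Because only about two years are available at a time, one way to build a longer history is to merge each fresh download into a stored copy and drop the overlap. The sketch below is illustrative only: `update_stored_data()` is a hypothetical helper that is not part of swissexchangedata, and it assumes that `messageNo` uniquely identifies a message and that both data frames have the same columns.

```r
# Hypothetical helper (not part of swissexchangedata): merge a fresh download
# into a stored data frame, keeping one row per messageNo.
update_stored_data <- function(stored, fresh, key = "messageNo") {
  # Put fresh rows first so they are the ones kept when a key is duplicated
  combined <- rbind(fresh, stored)
  combined <- combined[!duplicated(combined[[key]]), , drop = FALSE]
  # Sort newest first, assuming larger message numbers are more recent
  combined <- combined[order(combined[[key]], decreasing = TRUE), , drop = FALSE]
  rownames(combined) <- NULL
  combined
}

# Toy data standing in for two downloads with one overlapping message
stored <- data.frame(messageNo = c(101, 102), title = c("a", "b"))
fresh <- data.frame(messageNo = c(102, 103), title = c("b", "c"))

updated <- update_stored_data(stored, fresh)
updated$messageNo
```

A merge like this is probably easiest to run on the raw scrape, before any nested columns are added, since `rbind()` is simplest with plain atomic columns.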
Data Access
Get the data from 1 October 2023 up to the current date:
all_items_data_sep_2024 <- newsboard_data(firstDate = '2023-10-01', lastDate = Sys.Date())
#> [1] "Page 0"
#> [1] "Page 0 has 10 observations."
#> [1] "Page 1"
#> [1] "Page 1 has 10 observations."
#> [1] "Page 2"
#> [1] "Page 2 has 10 observations."
#> [1] "Page 3"
#> [1] "Page 3 has 10 observations."
#> [1] "Page 4"
#> [1] "Page 4 has 10 observations."
#> [1] "Page 5"
#> [1] "Page 5 has 10 observations."
#> [1] "Page 6"
#> [1] "Page 6 has 10 observations."
#> [1] "Page 7"
#> [1] "Page 7 has 10 observations."
#> [1] "Page 8"
#> [1] "Page 8 has 10 observations."
#> [1] "Page 9"
#> [1] "Page 9 has 10 observations."
#> [1] "Page 10"
#> [1] "Page 10 has 10 observations."
#> [1] "Page 11"
#> [1] "Page 11 has 10 observations."
#> [1] "Page 12"
#> [1] "Page 12 has 10 observations."
#> [1] "Page 13"
#> [1] "Page 13 has 10 observations."
#> [1] "Page 14"
#> [1] "Page 14 has 10 observations."
#> [1] "Page 15"
#> [1] "Page 15 has 10 observations."
#> [1] "Page 16"
#> [1] "Page 16 has 10 observations."
#> [1] "Page 17"
#> [1] "Page 17 has 10 observations."
#> [1] "Page 18"
#> [1] "Page 18 has 8 observations."
#> [1] "Page 19"
#> [1] "Page 19 has observations."
all_items_data_sep_2024
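The log above suggests that results are fetched page by page, ten items at a time, until a page comes back empty. A minimal sketch of that pattern follows. `fetch_page()` and `fetch_all_pages()` are hypothetical stand-ins that serve fake data; nothing here is taken from the actual implementation of newsboard_data().

```r
# Hypothetical page fetcher: serves 28 fake items in pages of 10.
# A real scraper would request the page from the website instead.
fetch_page <- function(page, page_size = 10, total = 28) {
  start <- page * page_size + 1
  if (start > total) {
    return(data.frame(messageNo = integer(0)))
  }
  data.frame(messageNo = seq(start, min(start + page_size - 1, total)))
}

# Collect pages until an empty one signals the end of the results
fetch_all_pages <- function() {
  pages <- list()
  page <- 0
  repeat {
    items <- fetch_page(page)
    message("Page ", page, " has ", nrow(items), " observations.")
    if (nrow(items) == 0) break
    pages[[length(pages) + 1]] <- items
    page <- page + 1
  }
  do.call(rbind, pages)
}

all_items <- fetch_all_pages()
nrow(all_items)
```

Stopping on the first empty page avoids having to know the total number of results in advance, at the cost of one extra request.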
Data Cleaning
The newsText column contains embedded HTML tables describing the trades, which need further cleaning before they can be accessed. The same cleaning code is used to prepare the packaged dataset swissexchangedata::newsboardmarketdata and is replicated below.
library(rvest) # provides read_html(), html_text() and html_table()

# extract the plain text of a newsText HTML fragment
itemlist_newsText_clean <- function(vec) {
  read_html(vec) |> html_text()
}

# extract the table(s) embedded in a newsText HTML fragment
itemlist_newsText_table_clean <- function(vec) {
  # html_table() returns a list of tables, which is empty if none are found
  temp_obj <- read_html(vec) |> html_table(fill = TRUE)
  # if no table was found, return NA; otherwise wrap the list in a data frame
  if (length(temp_obj) > 0) {
    temp_obj <- temp_obj |>
      as.matrix() |>
      t() |>
      as.data.frame()
  } else {
    temp_obj <- NA
  }
  temp_obj
}
# keep a copy of the original data before adding columns
all_items_data_sep_2024_original <- all_items_data_sep_2024
# locate the newsText column
newsText_column_index <- grep("newsText", colnames(all_items_data_sep_2024))
# extract the plain text of each newsText entry
all_items_data_sep_2024$data <- apply(
  X = all_items_data_sep_2024[, newsText_column_index, drop = FALSE],
  MARGIN = 1, simplify = TRUE,
  FUN = itemlist_newsText_clean)
# extract the embedded table of each newsText entry as a list-column
all_items_data_sep_2024$table_data <- apply(
  X = all_items_data_sep_2024[, newsText_column_index, drop = FALSE],
  MARGIN = 1, simplify = FALSE,
  FUN = itemlist_newsText_table_clean)
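To make the row-wise extraction easier to follow, here is a small self-contained sketch of the same pattern on toy data. The data frame `toy` and its HTML fragments are made up for illustration (they mimic the shape of newsText), and the sketch assumes the rvest package is installed and R is version 4.1 or later, where `apply()` has a `simplify` argument.

```r
library(rvest)

# Toy data frame with one HTML column standing in for newsText
make_fragment <- function(cur, size) {
  paste0(
    "<div><p>Trade cancelled:</p><table>",
    "<tr><th>Cur</th><th>Size</th></tr>",
    "<tr><td>", cur, "</td><td>", size, "</td></tr>",
    "</table></div>"
  )
}
toy <- data.frame(
  id = 1:2,
  html = c(make_fragment("CHF", 17), make_fragment("EUR", 200))
)

# drop = FALSE keeps the one-column selection a data frame so apply() can walk
# its rows; simplify = FALSE returns a list, which becomes a list-column
toy$tables <- apply(
  toy[, "html", drop = FALSE], MARGIN = 1, simplify = FALSE,
  FUN = function(row) html_table(read_html(row[[1]]))
)

# Each element is a list of tibbles, one per <table> in that row's HTML
first_trade <- toy$tables[[1]][[1]]
first_trade$Size
```

Keeping the tables in a list-column means each message carries its own trade table, at the price of an extra level of indexing when reading the values back.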
Output
all_items_data_sep_2024[1:2,1:15]
#> messageNo isin valorSymbol title
#> 1 209039 CH1139756881 ZSMIAZ Mistrade Decision in ZSMIAZ / CH1139756881
#> 2 209038 LU0950674761 EMUSRI Mistrade Decision in EMUSRI / LU0950674761
#> messageType broadcastDateTime security tradingSegment
#> 1 Mistrade 2.02402e+13 ZSMIAZ ZKB C Structured Products
#> 2 Mistrade 2.02402e+13 UBSETF MSCI EMU SRI EUR ACC ETF
#> priority markets products currency
#> 1 Normal XQMH DE CHF
#> 2 Normal XSWX FU EUR
#> newsText
#> 1 <div><p>In accordance with the rules of SIX Swiss Exchange, the following trade in <strong>'ZSMIAZ ZKB C'</strong> has been declared a mistrade and has therefore been cancelled: </p><table><thead><tr><th align="center" style="width: 75px">Trade Date</th><th align="center" style="width: 75px">Time</th><th align="center" style="width: 45px">Cur</th><th align="left" style="width: 75px">Size</th><th align="left" style="width: 75px">Price</th><th align="left" style="width: 75px">Trade Type</th><th align="left" style="width: 75px">Book Type</th><th align="left" style="width: 75px">Ref Exch</th></tr></thead><tbody><tr><td align="center">02.02.2024</td><td align="center">09:15:58</td><td align="center">CHF</td><td align="left">17</td><td align="left">875.0000</td><td align="left">OnExchange</td><td align="left">QuoteBook</td><td align="left"/></tr></tbody></table><p>Please find further information concerning mistrades in Directive 4: Market Control on our website.</p><p>Regards,<br/>Exchange Operations, SIX Swiss Exchange</p></div>
#> 2 <div><p>In accordance with the rules of SIX Swiss Exchange, the following trade in <strong>'UBSETF MSCI EMU SRI EUR ACC'</strong> has been declared a mistrade and has therefore been cancelled: </p><table><thead><tr><th align="center" style="width: 75px">Trade Date</th><th align="center" style="width: 75px">Time</th><th align="center" style="width: 45px">Cur</th><th align="left" style="width: 75px">Size</th><th align="left" style="width: 75px">Price</th><th align="left" style="width: 75px">Trade Type</th><th align="left" style="width: 75px">Book Type</th><th align="left" style="width: 75px">Ref Exch</th></tr></thead><tbody><tr><td align="center">01.02.2024</td><td align="center">16:00:29</td><td align="center">EUR</td><td align="left">200</td><td align="left">22.6600</td><td align="left">OnExchange</td><td align="left">QuoteBook</td><td align="left"/></tr></tbody></table><p>Please find further information concerning mistrades in Directive 4: Market Control on our website.</p><p>Regards,<br/>Exchange Operations, SIX Swiss Exchange</p></div>
#> newsTypeCode
#> 1 MI
#> 2 MI
#> data
#> 1 In accordance with the rules of SIX Swiss Exchange, the following trade in 'ZSMIAZ ZKB C' has been declared a mistrade and has therefore been cancelled: Trade DateTimeCurSizePriceTrade TypeBook TypeRef Exch02.02.202409:15:58CHF17875.0000OnExchangeQuoteBookPlease find further information concerning mistrades in Directive 4: Market Control on our website.Regards,Exchange Operations, SIX Swiss Exchange
#> 2 In accordance with the rules of SIX Swiss Exchange, the following trade in 'UBSETF MSCI EMU SRI EUR ACC' has been declared a mistrade and has therefore been cancelled: Trade DateTimeCurSizePriceTrade TypeBook TypeRef Exch01.02.202416:00:29EUR20022.6600OnExchangeQuoteBookPlease find further information concerning mistrades in Directive 4: Market Control on our website.Regards,Exchange Operations, SIX Swiss Exchange
The trade table has been extracted from the HTML in newsText and stored in the table_data column (column 16):
all_items_data_sep_2024[1,16][[1]][[1]]
#> [[1]]
#> # A tibble: 1 × 8
#> `Trade Date` Time Cur Size Price `Trade Type` `Book Type` `Ref Exch`
#> <chr> <chr> <chr> <int> <dbl> <chr> <chr> <lgl>
#> 1 02.02.2024 09:15:58 CHF 17 875 OnExchange QuoteBook NA
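The trade date and time come back as character columns in day.month.year format. A possible next step, sketched here on a one-row data frame built by hand from the values above, is to combine them into a single timestamp. The `timestamp` column name and the `Europe/Zurich` time zone are assumptions; the source does not state which time zone the values are in.

```r
# One trade row shaped like the tibble above (a plain data frame is used here)
trade <- data.frame(
  `Trade Date` = "02.02.2024",
  Time = "09:15:58",
  Cur = "CHF",
  check.names = FALSE
)

# Combine date and time, then parse with the day.month.year format.
# The time zone is an assumption, not something the data states.
trade$timestamp <- as.POSIXct(
  paste(trade[["Trade Date"]], trade$Time),
  format = "%d.%m.%Y %H:%M:%S",
  tz = "Europe/Zurich"
)

format(trade$timestamp, "%Y-%m-%d %H:%M:%S")
```

Parsing to POSIXct makes the trades sortable and comparable across messages, which the original character columns are not.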