utils

tf2xgb.utils.gen_random_dataset(n, n_subgrp, n_grp, beta, sigma)

Generates a random Pandas Data Frame with n observations split into n_subgrp distinct subgroups, identified by column 'subgrp_id', and n_grp distinct groups, identified by column 'grp_id'. The target in column 'y' is a linear combination of the feature vector in column 'X' with the true coefficient vector beta, plus noise with standard error sigma. The intercept is zero.

Parameters

n – number of observations
n_subgrp – number of subgroups
n_grp – number of groups
beta – true coefficients
sigma – standard error

Returns

Pandas Data Frame with the random dataset (columns 'grp_id', 'subgrp_id', 'X' and 'y').
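The following is a minimal sketch of how such a dataset could be generated. It is a hypothetical re-implementation (gen_random_dataset_sketch), not the tf2xgb source. It assumes standard-normal features stored as one NumPy vector per row in 'X', Gaussian noise with standard deviation sigma, and a balanced assignment in which every subgroup belongs to exactly one group (this requires n >= n_subgrp >= n_grp so that every id actually occurs).

```python
import numpy as np
import pandas as pd


def gen_random_dataset_sketch(n, n_subgrp, n_grp, beta, sigma, seed=0):
    """Hypothetical sketch of gen_random_dataset; not the tf2xgb implementation."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    # Assumption: i.i.d. standard-normal features, one row per observation.
    X = rng.standard_normal((n, beta.size))
    # Balanced, shuffled assignment so all n_subgrp subgroup ids occur.
    subgrp_id = rng.permutation(np.arange(n) % n_subgrp)
    # Each subgroup is nested in exactly one group (assumption).
    subgrp_to_grp = rng.permutation(np.arange(n_subgrp) % n_grp)
    grp_id = subgrp_to_grp[subgrp_id]
    # Zero intercept: y = X @ beta + Gaussian noise with std sigma.
    y = X @ beta + rng.normal(0.0, sigma, size=n)
    df = pd.DataFrame({"grp_id": grp_id, "subgrp_id": subgrp_id, "y": y})
    df["X"] = list(X)  # each cell holds one feature vector
    return df


df = gen_random_dataset_sketch(100, 10, 3, beta=[1.0, -2.0], sigma=0.5)
```

Storing 'X' as an object column of vectors mirrors the description of "feature vector in column 'X'"; a real implementation might instead spread features over several columns.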

tf2xgb.utils.get_ragged_nested_index_lists(df, id_col_list)

Returns ragged nested lists of indices (i.e. row numbers of df). The nesting hierarchy is determined by the columns of df named in id_col_list.

Parameters

df – Pandas Data Frame with the sample. It has to contain the columns listed in id_col_list.
id_col_list – list of column names of df that correspond to the levels of nesting in the resulting index list. Higher-level groups have to be listed first, e.g. ['grp_id', 'subgrp_id'].

Returns

Pandas Data Frame with two columns: a copy of df[id_col_list[0]] and the column '_row_' containing the nested lists of row numbers, which serve as input to the decorator xgb_tf_loss().
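A minimal sketch of the nesting logic follows. It assumes one output row per distinct value of the top-level id (in order of first appearance) and positional row numbers 0..len(df)-1 rather than index labels. The names get_ragged_nested_index_lists_sketch and _nest are hypothetical; this is not the tf2xgb implementation.

```python
import numpy as np
import pandas as pd


def _nest(rows, levels):
    """Recursively split positional row numbers by the remaining id levels (hypothetical helper)."""
    if not levels:
        return rows.tolist()  # innermost level: plain list of row numbers
    head, rest = levels[0], levels[1:]
    nested = []
    for value in pd.unique(head):  # keep order of first appearance
        mask = head == value
        nested.append(_nest(rows[mask], [lvl[mask] for lvl in rest]))
    return nested


def get_ragged_nested_index_lists_sketch(df, id_col_list):
    """Hypothetical sketch of get_ragged_nested_index_lists; not the tf2xgb source."""
    rows = np.arange(len(df))  # row numbers, independent of df.index
    levels = [df[col].to_numpy() for col in id_col_list]
    return pd.DataFrame({
        id_col_list[0]: pd.unique(levels[0]),  # one row per top-level id (assumption)
        "_row_": _nest(rows, levels),          # ragged nested lists per top-level id
    })


example = pd.DataFrame({"grp_id": [0, 0, 1, 0, 1], "subgrp_id": [0, 1, 2, 0, 2]})
out = get_ragged_nested_index_lists_sketch(example, ["grp_id", "subgrp_id"])
print(out["_row_"].tolist())  # [[[0, 3], [1]], [[2, 4]]]
```

Here group 0 contains two subgroups of different sizes while group 1 contains one, which is why the lists are called ragged: neither the number of subgroups per group nor the number of rows per subgroup is fixed.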