LibCapy - dataset

Dataset class.

Macros:

Enumerations:

Type of field.

datetime1: datetime dd-mm-yyyy hh:ii
datetime2: datetime hh:ii

Type of column.

Typedefs:

None.

Struct CapyDatasetRow :

Struct CapyDatasetRow's properties:

Original string of the row with commas replaced with \0.

Pointers to each field in the row.

Index of the row in the dataset, starting at 0 not counting the header lines.

Flag to memorise if the row contains null/unknown values.

Number of fields with a null/unknown value in that row.
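The first two properties above describe an in-place split: the row string is kept in one buffer, each comma is overwritten with \0, and one pointer per field points into that buffer. A minimal sketch of that idea follows. The function name SplitRowInPlace and its signature are hypothetical and not part of LibCapy; quoted fields containing commas are not handled.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not LibCapy's API): split a comma-separated line in
// place. Each ',' is overwritten with '\0' and fields[i] receives a pointer
// to the start of the i-th field inside 'line'. At most maxFields pointers
// are stored. Return the number of fields found in the line.
size_t SplitRowInPlace(char* line, char** fields, size_t maxFields) {
  assert(line != NULL && fields != NULL);

  // A line always holds at least one field (possibly empty)
  size_t nbField = 1;
  if(maxFields > 0) fields[0] = line;
  for(char* ptr = line; *ptr != '\0'; ++ptr) {
    if(*ptr == ',') {

      // Terminate the current field and register the start of the next one
      *ptr = '\0';
      if(nbField < maxFields) fields[nbField] = ptr + 1;
      ++nbField;
    }
  }
  return nbField;
}
```

With this layout an empty field (two consecutive commas) becomes an empty string, which is one of the forms treated as a null/unknown value.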

Struct CapyDatasetRow's methods:

Destructor

Struct CapyDatasetFieldDesc :

Struct CapyDatasetFieldDesc's properties:

Pointer to the field label.

Field type.

Field interface.

Index in the row.

For fields of categorical type, number of values in the category; for fields of numerical type, number of rows in the dataset.

For fields of categorical type, array of pointers to the category's values.

Range of values (converted to numerical if the field is not numerical).

Flag to memorise if the field contains null/unknown values.

Number of rows with a null/unknown value in that field.

Struct CapyDatasetFieldDesc's methods:

Destructor

Struct CapyDataset :

Struct CapyDataset's properties:

Number of rows.

Number of rows with at least one null/unknown value.

Number of fields in each row.

Number of fields with at least one null/unknown value.

Field interface row, with commas replaced with \0.

Field type row, with commas replaced with \0.

Field label row, with commas replaced with \0.

Fields description.

Array of rows.

Number of threads for multithreaded operation (default: 10)

Struct CapyDataset's methods:

Destructor

Load the dataset from a file at a given path

Input argument(s):

path: path to the dataset file

Exception(s):

May raise CapyExc_MallocFailed, CapyExc_StreamReadError, CapyExc_InvalidStream.

Print the description of the dataset on a given stream.

Input argument(s):

stream: the stream to print onto

Print the data of the dataset on a given stream.

Input argument(s):

stream: the stream to print onto
nbRow: if not 0, print only the first nbRow rows

Get the number of input fields

Output and side effect(s):

Return the number of input fields

Get the number of output fields

Output and side effect(s):

Return the number of output fields

Get the number of fields of a given type

Input argument(s):

type: the type of field to be counted

Output and side effect(s):

Return the number of fields

Get the field index of the i-th input

Input argument(s):

iInput: index of the input

Output and side effect(s):

Return the index

Exception(s):

May raise CapyExc_InvalidElemIdx.

Get the field index of the i-th output

Input argument(s):

iOutput: index of the output

Output and side effect(s):

Return the index

Exception(s):

May raise CapyExc_InvalidElemIdx.

Get a value as a numeral.

Input argument(s):

iRow: index of the row
iField: index of the field

Output and side effect(s):

For numerical fields, return the value as it is; for categorical fields, return the index of the value in the list of possible values (fieldDesc.categoryVals).

Exception(s):

May raise CapyExc_UndefinedExecution, CapyExc_InvalidParameters.
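For a categorical field, the returned numeral is the position of the value in the list of possible values. The lookup can be sketched as below. GetCategoryIdx is a hypothetical name, and returning nbVal for a value that is not in the list is an assumption of this sketch; the library itself reports errors through exceptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not LibCapy's API): return the index of 'val' in the
// array 'categoryVals' of 'nbVal' strings, or nbVal if it is not present.
size_t GetCategoryIdx(
  char const* val,
  char const* const* categoryVals,
  size_t nbVal) {
  assert(val != NULL && categoryVals != NULL);
  for(size_t iVal = 0; iVal < nbVal; ++iVal) {
    if(strcmp(val, categoryVals[iVal]) == 0) return iVal;
  }

  // Value not found in the list of possible values
  return nbVal;
}
```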

Get a value as a normalised numeral.

Input argument(s):

iRow: index of the row
iField: index of the field

Output and side effect(s):

Return the value, converted to numerical if the field is categorical, after normalisation according to the 'range' property of the field description.

Exception(s):

May raise CapyExc_UndefinedExecution.
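The page does not state which normalisation is applied. As an illustration only, the sketch below assumes a linear rescaling of the field's range onto [0, 1]; the library may use a different convention. The type Range and the function NormaliseVal are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical range type, standing in for the 'range' property of a field
// description
typedef struct Range {
  double min;
  double max;
} Range;

// Assumed normalisation (not confirmed by the library documentation): map
// 'val' linearly so that range->min becomes 0 and range->max becomes 1.
// A degenerate range (max <= min) maps every value to 0.
double NormaliseVal(double val, Range const* range) {
  assert(range != NULL);
  double const width = range->max - range->min;
  if(width <= 0.0) return 0.0;
  return (val - range->min) / width;
}
```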

Convert a dataset to a matrix to be used by a single category predictor

Input argument(s):

iOutput: index of the output

Output and side effect(s):

Return a matrix with as many rows as there are rows in the dataset, and as many columns as there are inputs in the dataset plus one. The output must be of type capyDatasetFieldType_cat. Input values are converted using getValAsNum. The output value is assigned to the last column in the matrix, and is equal to the category index.

Exception(s):

May raise CapyExc_UnsupportedFormat.

Convert a dataset to a matrix to be used by a numerical predictor

Input argument(s):

iOutput: index of the output

Output and side effect(s):

Return a matrix with as many rows as there are rows in the dataset, and as many columns as there are inputs in the dataset plus one. The output must be of type capyDatasetFieldType_num. Input and output values are converted using getValAsNum. The output value is assigned to the last column in the matrix.

Exception(s):

May raise CapyExc_UnsupportedFormat.

Get the number of different values for a given output.

Input argument(s):

iField: index of the output

Output and side effect(s):

Return the number of different values for a categorical output field, or the number of rows for a numerical output field.

Convert a dataset to a matrix to be used by a one hot predictor

Input argument(s):

iOutput: index of the output

Output and side effect(s):

Return a matrix with as many rows as there are rows in the dataset, and as many columns as there are inputs in the dataset plus the number of values the given output takes. The output must be of type capyDatasetFieldType_cat. Input values are converted using getValAsNum. The one hot encoding of the output value is assigned to the last columns in the matrix, which take the value 0 for 'is not this category' and 1 for 'is this category'.

Exception(s):

May raise CapyExc_UnsupportedFormat.
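The layout of one row of such a matrix can be sketched as follows: the input values come first, followed by one column per category holding 1 for the category of the row and 0 elsewhere. FillOneHotRow is a hypothetical helper working on a plain array of double; it does not use the library's matrix type.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not LibCapy's API): fill one row of a one hot matrix.
// 'row' must have room for nbInput + nbCategory values. The first nbInput
// values are copied from 'inputs', the last nbCategory values are the one
// hot encoding of the category index 'iCategory'.
void FillOneHotRow(
  double* row,
  double const* inputs,
  size_t nbInput,
  size_t iCategory,
  size_t nbCategory) {
  assert(row != NULL && iCategory < nbCategory);
  assert(inputs != NULL || nbInput == 0);

  // Input values (already converted to numerals)
  for(size_t i = 0; i < nbInput; ++i) row[i] = inputs[i];

  // One hot encoding of the output
  for(size_t j = 0; j < nbCategory; ++j) {
    row[nbInput + j] = (j == iCategory ? 1.0 : 0.0);
  }
}
```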

Get the index of a field from its name

Input argument(s):

name: the name

Output and side effect(s):

Return the index of the field, or raise the exception CapyExc_InvalidParameters if it couldn't be found.

Get the distribution of a field's values as an array of bins.

Input argument(s):

iField: the index of the field
nbBin: the number of bins

Output and side effect(s):

Return a CapyArrSize of size 'nbBin'.
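How values are assigned to bins is not specified on this page. The sketch below assumes equal-width bins over the interval [min, max], with the maximum value counted in the last bin; the library may bin differently. CountBins is a hypothetical name and a plain array of size_t stands in for CapyArrSize.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper (not LibCapy's API): count the values of 'vals' falling
// in each of 'nbBin' equal-width bins covering [min, max]. Values outside the
// interval are ignored; a value equal to 'max' is counted in the last bin.
void CountBins(
  double const* vals,
  size_t nbVal,
  double min,
  double max,
  size_t* bins,
  size_t nbBin) {
  assert(vals != NULL && bins != NULL && nbBin > 0 && max > min);
  for(size_t iBin = 0; iBin < nbBin; ++iBin) bins[iBin] = 0;
  for(size_t iVal = 0; iVal < nbVal; ++iVal) {
    double const val = vals[iVal];
    if(val < min || val > max) continue;

    // Relative position in [0, 1] scaled to the number of bins
    size_t iBin = (size_t)((val - min) / (max - min) * (double)nbBin);
    if(iBin >= nbBin) iBin = nbBin - 1;
    ++(bins[iBin]);
  }
}
```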

Get the distribution of a field's values as an array of bins, for a given value of a given categorical field.

Input argument(s):

iField: the index of the field
nbBin: the number of bins
iCatField: the index of the categorical field
valCatField: the value of the categorical field

Output and side effect(s):

Return a CapyArrSize of size 'nbBin'.

Get the number of rows containing a particular value for a given field.

Input argument(s):

iField: the filtering field
valField: the filtering value

Output and side effect(s):

Return the number of rows.

Get the pair of values from two fields as two vectors.

Input argument(s):

iField: the first field
jField: the second field
u: the vector receiving values from the first field
v: the vector receiving values from the second field

Output and side effect(s):

'u' and 'v' are destructed, created afresh and populated with values. Rows with a null value in either field are ignored.

Check if a value is a null/unknown value

Input argument(s):

val: the value to check

Output and side effect(s):

Return true if the value is considered to be null/unknown. A null/unknown value is the empty string or a string equal to "nan" (case insensitive).
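The rule above (empty string, or "nan" in any letter case) can be sketched in portable C as follows. IsNullValue is a hypothetical stand-alone function, not the library's method.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper (not LibCapy's API): return true if 'val' is the empty
// string or is equal to "nan", ignoring letter case.
bool IsNullValue(char const* val) {
  assert(val != NULL);
  if(val[0] == '\0') return true;
  char const* ref = "nan";
  size_t i = 0;
  for(; ref[i] != '\0'; ++i) {

    // Stops at the first mismatch, including the end of 'val', so it never
    // reads past the terminating '\0'
    if(tolower((unsigned char)val[i]) != ref[i]) return false;
  }

  // The three first characters match, 'val' must end here
  return val[i] == '\0';
}
```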

Convert the dataset into a point cloud

Output and side effect(s):

Return a CapyPointCloud of dimension equal to the number of fields and number of points equal to the number of samples. All values are converted to numerical values.

Functions:

Create a CapyDatasetRow.

Output and side effect(s):

Return a CapyDatasetRow.

Allocate memory for a new CapyDatasetRow and create it.

Output and side effect(s):

Return a CapyDatasetRow.

Exception(s):

May raise CapyExc_MallocFailed.

Free the memory used by a CapyDatasetRow* and reset '*that' to NULL.

Input argument(s):

that: a pointer to the CapyDatasetRow to free
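The 'free and reset to NULL' convention used by the free functions on this page takes a pointer to the caller's pointer, so that the caller is not left with a dangling pointer after the call. A minimal sketch of the pattern, using a hypothetical type Obj rather than the library's structures:

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical type, standing in for a structure such as CapyDatasetRow
typedef struct Obj {
  int dummy;
} Obj;

// Allocate a zero-initialised Obj. Return NULL if the allocation failed
// (the library raises CapyExc_MallocFailed instead).
Obj* ObjAlloc(void) {
  return calloc(1, sizeof(Obj));
}

// Free the memory used by '*that' and reset '*that' to NULL. Calling it
// again on the same pointer is harmless.
void ObjFree(Obj** that) {
  assert(that != NULL);
  if(*that == NULL) return;
  free(*that);
  *that = NULL;
}
```

Because the pointer is reset, a later test such as 'obj == NULL' reliably tells whether the object is still alive.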

Create a CapyDatasetFieldDesc.

Output and side effect(s):

Return a CapyDatasetFieldDesc.

Allocate memory for a new CapyDatasetFieldDesc and create it.

Output and side effect(s):

Return a CapyDatasetFieldDesc.

Exception(s):

May raise CapyExc_MallocFailed.

Free the memory used by a CapyDatasetFieldDesc* and reset '*that' to NULL.

Input argument(s):

that: a pointer to the CapyDatasetFieldDesc to free

Create a CapyDataset.

Output and side effect(s):

Return a CapyDataset.

Allocate memory for a new CapyDataset and create it.

Output and side effect(s):

Return a CapyDataset.

Exception(s):

May raise CapyExc_MallocFailed.

Free the memory used by a CapyDataset* and reset '*that' to NULL.

Input argument(s):

that: a pointer to the CapyDataset to free.

2022-07-11
Copyright 2021-2025 Baillehache Pascal