LibCapy - dataset

Dataset class.

Macros:

Enumerations:

Type of field.

Type of column.

Typedefs:

Type for the number of fields in a row.

Type for the number of rows in a dataset.

Type for the number of different values in a field.

Struct CapyDatasetRow :

Struct CapyDatasetRow's properties:

Original string of the row with commas replaced with \0.

Pointers to each field in the row.

Index of the row in the dataset, starting at 0 and not counting the header lines.
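The row layout described above, a single string in which commas become '\0' plus an array of pointers into it, can be sketched in C as below. This is a minimal illustration, not the LibCapy implementation. The function name splitRowInPlace is hypothetical, and the sketch assumes the row contains no quoted fields or escaped commas.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split 'row' in place by overwriting each ','
   with '\0', storing a pointer to the start of each field in 'fields'.
   At most 'maxFields' pointers are stored; any further fields are
   dropped. Quoted fields and escaped commas are not handled, and the
   caller is assumed to have stripped the trailing newline. Returns
   the number of fields stored. */
size_t splitRowInPlace(char* row, char** fields, size_t maxFields) {
  assert(row != NULL && fields != NULL);
  if (maxFields == 0) return 0;
  size_t nbField = 0;
  fields[nbField++] = row;
  for (char* c = row; *c != '\0'; ++c) {
    if (*c != ',') continue;
    *c = '\0';  /* terminate the current field */
    if (nbField == maxFields) break;
    fields[nbField++] = c + 1;  /* next field starts after the comma */
  }
  return nbField;
}
```

For the row "1.5,red,2", the three stored pointers point at "1.5", "red" and "2" inside the original buffer, so no field value is copied.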

Struct CapyDatasetRow's methods:

Destructor

Struct CapyDatasetFieldDesc :

Struct CapyDatasetFieldDesc's properties:

Pointer to the field label.

Field type.

Field interface.

Index in the row.

For fields of categorical type, the number of values in the category; for fields of numerical type, the number of rows in the dataset.

For fields of categorical type, array of pointers to the category's values.

Range of values (converted to numerical if the field is categorical).
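One plausible way to fill the array of category values is to keep a pointer to the first occurrence of each distinct value in the column. The sketch below assumes the column's values are already available as strings (for example, the field pointers of each row). The name collectCategoryVals is hypothetical, and ordering the values by first appearance is an assumption, since the documentation does not say how they are ordered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: gather the distinct values of a categorical
   column, in order of first appearance. 'categoryVals' must have room
   for 'nbRow' pointers; it receives pointers into the row strings, so
   no value is copied. Returns the number of distinct values. Uses a
   linear scan per row (O(nbRow * nbVal)), which is fine for small
   categories. */
size_t collectCategoryVals(
  char const* const* colVals, size_t nbRow, char const** categoryVals) {
  assert(colVals != NULL && categoryVals != NULL);
  size_t nbVal = 0;
  for (size_t iRow = 0; iRow < nbRow; ++iRow) {
    size_t iVal = 0;
    /* Look for the value among those already collected. */
    while (iVal < nbVal && strcmp(categoryVals[iVal], colVals[iRow]) != 0)
      ++iVal;
    /* Not found: it is a new category value. */
    if (iVal == nbVal) categoryVals[nbVal++] = colVals[iRow];
  }
  return nbVal;
}
```

For the column {"red", "blue", "red", "green", "blue"}, this yields three values, "red", "blue" and "green", so the number of values in the category would be 3.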

Struct CapyDatasetFieldDesc's methods:

Destructor

Struct CapyDataset :

Struct CapyDataset's properties:

Number of rows.

Number of fields in each row.

Field interface row, with commas replaced with \0.

Field type row, with commas replaced with \0.

Field label row, with commas replaced with \0.

Fields description.

Array of rows.

Number of threads for multithreaded operations (default: 10).

Struct CapyDataset's methods:

Destructor

Load the dataset from a file at a given path

Input argument(s):

path: path to the dataset file

Exception(s):

May raise CapyExc_MallocFailed, CapyExc_StreamReadError, CapyExc_InvalidStream.

Print the description of the dataset on a given stream.

Input argument(s):

stream: the stream to print onto

Get the number of input fields

Output and side effect(s):

Return the number of input fields

Get the number of output fields

Output and side effect(s):

Return the number of output fields

Get the field index of the i-th input

Input argument(s):

iInput: index of the input

Output and side effect(s):

Return the index

Exception(s):

May raise CapyExc_InvalidElemIdx.

Get the field index of the i-th output

Input argument(s):

iOutput: index of the output

Output and side effect(s):

Return the index

Exception(s):

May raise CapyExc_InvalidElemIdx.
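The input and output accessors amount to counting the fields with a given interface, and finding the field index of the i-th such field. Here is a minimal sketch under stated assumptions: FieldItf and its two values are hypothetical stand-ins for the library's field interface enumeration, and an out-of-range index returns nbField where the library raises CapyExc_InvalidElemIdx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the field interface enumeration. */
typedef enum { fieldItf_input, fieldItf_output } FieldItf;

/* Count the fields whose interface is 'itf'. */
size_t countFields(FieldItf const* itfs, size_t nbField, FieldItf itf) {
  assert(itfs != NULL || nbField == 0);
  size_t nb = 0;
  for (size_t iField = 0; iField < nbField; ++iField)
    if (itfs[iField] == itf) ++nb;
  return nb;
}

/* Return the field index of the i-th field (counting from 0) whose
   interface is 'itf', or nbField if there is no such field (the
   library raises CapyExc_InvalidElemIdx in that case instead). */
size_t getIdxField(
  FieldItf const* itfs, size_t nbField, FieldItf itf, size_t i) {
  assert(itfs != NULL || nbField == 0);
  for (size_t iField = 0; iField < nbField; ++iField) {
    if (itfs[iField] != itf) continue;
    if (i == 0) return iField;
    --i;
  }
  return nbField;
}
```

With the interfaces {input, output, input, input}, there are 3 inputs and 1 output, input 1 is field 2, and output 0 is field 1.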

Get a value as a numeral.

Input argument(s):

iRow: index of the row
iField: index of the field

Output and side effect(s):

For numerical fields, return the value as is; for categorical fields, return the index of the value in the list of possible values (fieldDesc.categoryVals).

Exception(s):

May raise CapyExc_UndefinedExecution.
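The conversion described for getValAsNum can be sketched as below. This is an illustration, not the library's code: valAsNum, FieldType and its values are hypothetical, numerical values are assumed to be stored as text and parsed with strtod, and a failed conversion returns -1 where the library raises CapyExc_UndefinedExecution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the field type enumeration. */
typedef enum { fieldType_num, fieldType_cat } FieldType;

/* Convert the text 'val' of a field to a number stored in '*out'.
   Numerical fields are parsed with strtod; categorical fields map to
   the index of 'val' in 'categoryVals'. Returns 0 on success and -1
   if the value cannot be converted. */
int valAsNum(char const* val, FieldType type,
  char const* const* categoryVals, size_t nbCategory, double* out) {
  assert(val != NULL && out != NULL);
  if (type == fieldType_num) {
    char* end = NULL;
    double v = strtod(val, &end);
    /* Reject empty input and trailing characters. */
    if (end == val || *end != '\0') return -1;
    *out = v;
    return 0;
  }
  for (size_t i = 0; i < nbCategory; ++i) {
    if (strcmp(categoryVals[i], val) == 0) {
      *out = (double)i;
      return 0;
    }
  }
  return -1;  /* value not in the category */
}
```

With categoryVals = {"red", "blue", "green"}, the value "blue" converts to 1.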

Get a value as a normalised numeral.

Input argument(s):

iRow: index of the row
iField: index of the field

Output and side effect(s):

Return the value, converted to numerical if the field is categorical, after normalisation according to the 'range' property of the field description.

Exception(s):

May raise CapyExc_UndefinedExecution.
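The documentation does not state which interval the normalisation maps onto. Assuming a linear mapping of the field's range onto [0, 1], it could look like the sketch below. Range and normaliseVal are hypothetical names, and returning 0 for a constant field is a design choice of this sketch, not documented behaviour.

```c
#include <assert.h>

/* Hypothetical range type: bounds of the values taken by a field. */
typedef struct {
  double min;
  double max;
} Range;

/* Map 'val' linearly from [range.min, range.max] onto [0, 1]. A
   constant field (min == max) has no spread, so return 0 for it
   instead of dividing by zero. */
double normaliseVal(double val, Range range) {
  assert(range.max >= range.min);
  if (range.max == range.min) return 0.0;
  return (val - range.min) / (range.max - range.min);
}
```

For a categorical field with three values, whose range would presumably be [0, 2], the category index 1 normalises to 0.5.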

Convert a dataset to a matrix to be used by a single category classifier

Input argument(s):

iOutput: index of the output

Output and side effect(s):

Return a matrix with as many rows as the dataset has rows, and as many columns as the dataset has inputs, plus one. The output field must be of type capyDatasetFieldType_cat. Input values are converted using getValAsNum. The output value is stored in the last column of the matrix and is equal to the category index.

Exception(s):

May raise CapyExc_UnsupportedFormat.
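Assuming the dataset values have already been converted with a getValAsNum-like function into a row-major array of doubles, building the single category matrix is a column selection, as in this sketch. toSingleCategoryMat and its parameters are hypothetical, and the matrix is represented as a plain row-major double array rather than a LibCapy matrix type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: 'vals' holds the converted dataset, row-major,
   nbRow x nbField. Fill 'mat' (row-major, nbRow x (nbInput + 1)) with
   the input columns listed in 'inputIdx', followed by the output
   category index in the last column. */
void toSingleCategoryMat(double const* vals, size_t nbRow, size_t nbField,
  size_t const* inputIdx, size_t nbInput, size_t outputIdx, double* mat) {
  assert(vals != NULL && inputIdx != NULL && mat != NULL);
  assert(outputIdx < nbField);
  size_t nbCol = nbInput + 1;
  for (size_t iRow = 0; iRow < nbRow; ++iRow) {
    double const* row = vals + iRow * nbField;
    for (size_t iIn = 0; iIn < nbInput; ++iIn)
      mat[iRow * nbCol + iIn] = row[inputIdx[iIn]];
    /* Last column: the category index of the output. */
    mat[iRow * nbCol + nbInput] = row[outputIdx];
  }
}
```

With fields 0 and 2 as inputs and field 1 as the output, the row {0.5, 1.0, 7.0} becomes {0.5, 7.0, 1.0}.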

Get the number of different values for a given output.

Input argument(s):

iField: index of the output

Output and side effect(s):

Return the number of different values for a categorical output field or the number of rows for a numerical output field

Convert a dataset to a matrix to be used by a one hot classifier

Input argument(s):

iOutput: index of the output

Output and side effect(s):

Return a matrix with as many rows as the dataset has rows, and as many columns as the dataset has inputs plus the number of values the given output can take. The output field must be of type capyDatasetFieldType_cat. Input values are converted using getValAsNum. The one-hot encoding of the output value is stored in the last columns of the matrix, which take the value 0 for 'is not this category' and 1 for 'is this category'.

Exception(s):

May raise CapyExc_UnsupportedFormat.
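Assuming the values have already been converted with a getValAsNum-like function into a row-major array of doubles, the one hot layout can be sketched as follows: the category index c of the output becomes nbCategory columns, all 0 except column c, which is 1. toOneHotMat is a hypothetical name, not part of the documented API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: 'vals' holds the converted dataset, row-major,
   nbRow x nbField. Fill 'mat' (row-major, nbRow x (nbInput +
   nbCategory)) with the input columns listed in 'inputIdx', followed
   by the one-hot encoding of the output category index. */
void toOneHotMat(double const* vals, size_t nbRow, size_t nbField,
  size_t const* inputIdx, size_t nbInput, size_t outputIdx,
  size_t nbCategory, double* mat) {
  assert(vals != NULL && inputIdx != NULL && mat != NULL);
  assert(outputIdx < nbField);
  size_t nbCol = nbInput + nbCategory;
  for (size_t iRow = 0; iRow < nbRow; ++iRow) {
    double const* row = vals + iRow * nbField;
    double* out = mat + iRow * nbCol;
    for (size_t iIn = 0; iIn < nbInput; ++iIn) out[iIn] = row[inputIdx[iIn]];
    /* The output holds a category index in [0, nbCategory). */
    assert(row[outputIdx] >= 0.0);
    size_t cat = (size_t)row[outputIdx];
    assert(cat < nbCategory);
    for (size_t iCat = 0; iCat < nbCategory; ++iCat)
      out[nbInput + iCat] = (iCat == cat) ? 1.0 : 0.0;
  }
}
```

With 3 categories, an output index of 2 becomes the columns 0, 0, 1, and an output index of 0 becomes 1, 0, 0.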

Functions:

Create a CapyDatasetRow.

Output and side effect(s):

Return a CapyDatasetRow.

Allocate memory for a new CapyDatasetRow and create it.

Output and side effect(s):

Return a CapyDatasetRow.

Exception(s):

May raise CapyExc_MallocFailed.

Free the memory used by a CapyDatasetRow* and reset '*that' to NULL.

Input argument(s):

that: a pointer to the CapyDatasetRow to free
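The Create / Alloc / Free trio follows a common C idiom. Create returns a value in a known empty state, Alloc places such a value on the heap, and Free takes a pointer to the pointer so it can reset the caller's variable to NULL. Below is a minimal sketch with a hypothetical Row struct loosely modelled on the properties listed for CapyDatasetRow (not its real layout). The NULL return from RowAlloc stands in for CapyExc_MallocFailed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical row-like struct owning a row string and field pointers. */
typedef struct {
  char* str;
  char** fields;
  size_t idx;
} Row;

/* Create: return a value with every member in a known empty state. */
Row RowCreate(void) {
  return (Row){.str = NULL, .fields = NULL, .idx = 0};
}

/* Alloc: heap-allocate and initialise; NULL if malloc fails. */
Row* RowAlloc(void) {
  Row* that = malloc(sizeof(Row));
  if (that == NULL) return NULL;
  *that = RowCreate();
  return that;
}

/* Destruct: release the owned members but keep the struct itself. */
void RowDestruct(Row* that) {
  assert(that != NULL);
  free(that->str);
  free(that->fields);
  that->str = NULL;
  that->fields = NULL;
}

/* Free: destruct, release the struct, and reset the caller's pointer
   to NULL so it cannot dangle. Does nothing if it is already NULL. */
void RowFree(Row** that) {
  if (that == NULL || *that == NULL) return;
  RowDestruct(*that);
  free(*that);
  *that = NULL;
}
```

Because Free checks for NULL, calling it twice on the same variable is harmless.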

Create a CapyDatasetFieldDesc.

Output and side effect(s):

Return a CapyDatasetFieldDesc.

Allocate memory for a new CapyDatasetFieldDesc and create it.

Output and side effect(s):

Return a CapyDatasetFieldDesc.

Exception(s):

May raise CapyExc_MallocFailed.

Free the memory used by a CapyDatasetFieldDesc* and reset '*that' to NULL.

Input argument(s):

that: a pointer to the CapyDatasetFieldDesc to free

Create a CapyDataset.

Output and side effect(s):

Return a CapyDataset.

Allocate memory for a new CapyDataset and create it.

Output and side effect(s):

Return a CapyDataset.

Exception(s):

May raise CapyExc_MallocFailed.

Free the memory used by a CapyDataset* and reset '*that' to NULL.

Input argument(s):

that: a pointer to the CapyDataset to free.

2022-07-11
Copyright 2021-2023 Baillehache Pascal