Dataset class.
Macros:
Enumerations:
Type of field.
datetime1: datetime in the format dd-mm-yyyy hh:ii
datetime2: datetime in the format hh:ii
Type of column.
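The following is a minimal sketch of how the two datetime formats listed above could be read. It assumes `ii` denotes minutes. `DateTime`, `parseDatetime1` and `parseDatetime2` are hypothetical names and are not part of the library. The conversion of a datetime to a numeral is not documented here, so the sketch stops at parsing.

```c
#include <stdio.h>

/* Hypothetical parsed datetime; not a library type. */
typedef struct { int day, month, year, hour, minute; } DateTime;

/* Parse a 'datetime1' value (dd-mm-yyyy hh:ii). Returns 1 on success,
   0 if the string does not match the format. No range validation of
   day/month/hour/minute is done. */
int parseDatetime1(const char *s, DateTime *dt) {
  return sscanf(s, "%d-%d-%d %d:%d", &dt->day, &dt->month, &dt->year,
                &dt->hour, &dt->minute) == 5;
}

/* Parse a 'datetime2' value (hh:ii); the date members are left untouched. */
int parseDatetime2(const char *s, DateTime *dt) {
  return sscanf(s, "%d:%d", &dt->hour, &dt->minute) == 2;
}
```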
Typedefs:
None.
Struct CapyDatasetRow :
Struct CapyDatasetRow's properties:
Original string of the row with commas replaced with \0.
Pointers to each field in the row.
Index of the row in the dataset, starting at 0 not counting the header lines.
Flag to memorise if the row contains null/unknown values
Number of fields with a null/unknown value in that row.
Struct CapyDatasetRow's methods:
Destructor
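The row representation above stores one string with commas replaced by \0 and keeps a pointer to each field. The sketch below builds that layout in place. `splitRow` is a hypothetical helper, and the caller-supplied pointer array is an assumption. Quoted fields that contain commas are not handled.

```c
#include <stddef.h>

/* Hypothetical helper: split 'line' in place by replacing each ',' with
   '\0', storing a pointer to the start of each field in 'fields'.
   Returns the number of fields found, at most 'maxFields'; once the
   array is full, the remaining commas are left untouched. */
size_t splitRow(char *line, char **fields, size_t maxFields) {
  size_t nb = 0;
  if (maxFields == 0) return 0;
  fields[nb++] = line;                 /* first field starts at the beginning */
  for (char *c = line; *c != '\0' && nb < maxFields; ++c) {
    if (*c == ',') {
      *c = '\0';                       /* terminate the current field */
      fields[nb++] = c + 1;            /* next field starts after the comma */
    }
  }
  return nb;
}
```

An empty field, such as the third one in "1.5,red,,42", becomes an empty string. That is one of the null/unknown values tracked by the row's null flags.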
Struct CapyDatasetFieldDesc :
Struct CapyDatasetFieldDesc's properties:
Pointer to the field label.
Field type.
Field interface.
Index in the row.
For fields of categorical type, the number of values in the category; for fields of numerical type, the number of rows in the dataset.
For fields of categorical types, array of pointer to the category's values
Range of values (converted to numerical if the field is not numerical).
Flag to memorise if the field contains null/unknown values
Number of rows with a null/unknown value in that field.
Struct CapyDatasetFieldDesc's methods:
Destructor
Struct CapyDataset :
Struct CapyDataset's properties:
Number of rows.
Number of rows with at least one null/unknown value.
Number of fields in each row.
Number of fields with at least one null/unknown value.
Header row of field interfaces, with commas replaced with \0.
Header row of field types, with commas replaced with \0.
Header row of field labels, with commas replaced with \0.
Fields description.
Array of rows.
Number of threads for multithreaded operation (default: 10)
Struct CapyDataset's methods:
Destructor
Load the dataset from a file at a given path
Input argument(s):
path: path to the dataset file
Exception(s):
May raise CapyExc_MallocFailed, CapyExc_StreamReadError, CapyExc_InvalidStream.
Print the description of the dataset on a given stream.
Input argument(s):
stream: the stream to print onto
Print the data of the dataset on a given stream.
Input argument(s):
stream: the stream to print onto
nbRow: if not 0, print only the first nbRow rows
Get the number of input fields
Output and side effect(s):
Return the number of input fields
Get the number of output fields
Output and side effect(s):
Return the number of output fields
Get the number of fields of a given type
Input argument(s):
type: the type of field to be counted
Output and side effect(s):
Return the number of fields
Get the field index of the i-th input
Input argument(s):
iInput: index of the input
Output and side effect(s):
Return the index
Exception(s):
May raise CapyExc_InvalidElemIdx.
Get the field index of the i-th output
Input argument(s):
iOutput: index of the output
Output and side effect(s):
Return the index
Exception(s):
May raise CapyExc_InvalidElemIdx.
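Finding the field index of the i-th input can be sketched as a scan over the field interfaces. The `FieldIface` enumeration and `fieldIdxOfInput` are hypothetical stand-ins for the library's own types. Returning -1 replaces the CapyExc_InvalidElemIdx exception. Using the same loop with `ifaceOutput` would give the i-th output, and counting the matches would give the number of input or output fields.

```c
#include <stddef.h>

/* Hypothetical interface values; the real enumeration may differ. */
typedef enum { ifaceInput, ifaceOutput, ifaceOther } FieldIface;

/* Return the index in the row of the iInput-th field whose interface is
   'input', or -1 if iInput is out of range (the documented method raises
   CapyExc_InvalidElemIdx instead). */
long fieldIdxOfInput(const FieldIface *ifaces, size_t nbField, size_t iInput) {
  size_t count = 0;
  for (size_t iField = 0; iField < nbField; ++iField) {
    if (ifaces[iField] == ifaceInput) {
      if (count == iInput) return (long)iField;
      ++count;
    }
  }
  return -1;
}
```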
Get a value as a numeral.
Input argument(s):
iRow: index of the row
iField: index of the field
Output and side effect(s):
For numerical fields, return the value as it is; for categorical fields, return the index of the value in the list of possible values (fieldDesc.categoryVals).
Exception(s):
May raise CapyExc_UndefinedExecution, CapyExc_InvalidParameters.
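For categorical fields, the conversion described above is a lookup of the value in the list of category values. The sketch below assumes the categories are stored as an array of C strings. `catValAsNum` and `numValAsNum` are hypothetical names. The value -1.0 stands in for the exception the method would raise.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: return the index of 'val' in 'categoryVals' (nbVal
   entries) as a double, or -1.0 if it is not a known category. */
double catValAsNum(const char *val, const char *const *categoryVals,
                   size_t nbVal) {
  for (size_t i = 0; i < nbVal; ++i)
    if (strcmp(val, categoryVals[i]) == 0) return (double)i;
  return -1.0;
}

/* Hypothetical: a numerical field's string is simply parsed. */
double numValAsNum(const char *val) {
  return strtod(val, NULL);
}
```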
Get a value as a normalised numeral.
Input argument(s):
iRow: index of the row
iField: index of the field
Output and side effect(s):
Return the value, converted to numerical if the field is categorical, after normalisation according to the 'range' property of the field description.
Exception(s):
May raise CapyExc_UndefinedExecution.
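The sketch below assumes the normalisation is a min-max scaling into [0, 1] over the field's 'range'. The exact formula is not stated here. `Range` and `normalise` are hypothetical names. Mapping a zero-width range to 0 is an arbitrary choice that avoids dividing by zero.

```c
/* Hypothetical range type; the library's own range type is not shown. */
typedef struct { double min, max; } Range;

/* Min-max normalisation of 'val' into [0, 1] given the field's range.
   Assumption: a degenerate range (min == max) maps every value to 0. */
double normalise(double val, Range range) {
  double width = range.max - range.min;
  if (width == 0.0) return 0.0;
  return (val - range.min) / width;
}
```

For a categorical field, the normalised value would presumably be the category index, with the range covering the converted indices.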
Convert a dataset to a matrix to be used by a single category predictor
Input argument(s):
iOutput: index of the output
Output and side effect(s):
Return a matrix with as many rows as there are rows in the dataset, and as many columns as there are inputs in the dataset plus one. The output must be of type capyDatasetFieldType_cat. Input values are converted using getValAsNum. The output value, equal to the category index, is assigned to the last column of the matrix.
Exception(s):
May raise CapyExc_UnsupportedFormat.
Convert a dataset to a matrix to be used by a numerical predictor
Input argument(s):
iOutput: index of the output
Output and side effect(s):
Return a matrix with as many rows as there are rows in the dataset, and as many columns as there are inputs in the dataset plus one. The output must be of type capyDatasetFieldType_num. Input and output values are converted using getValAsNum. The output value is assigned to the last column in the matrix.
Exception(s):
May raise CapyExc_UnsupportedFormat.
Get the number of different values for a given output.
Input argument(s):
iField: index of the output
Output and side effect(s):
Return the number of different values for a categorical output field or the number of rows for a numerical output field
Convert a dataset to a matrix to be used by a one hot predictor
Input argument(s):
iOutput: index of the output
Output and side effect(s):
Return a matrix with as many rows as there are rows in the dataset, and as many columns as there are inputs in the dataset plus the number of values the given output can take. The output must be of type capyDatasetFieldType_cat. Input values are converted using getValAsNum. The one hot encoding of the output value is assigned to the last columns of the matrix, which take the value 0 for 'is not this category' and 1 for 'is this category'.
Exception(s):
May raise CapyExc_UnsupportedFormat.
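One row of the one-hot matrix consists of the converted inputs followed by the one-hot columns for the output's category index. The sketch below builds such a row. `oneHot` and `fillOneHotRow` are hypothetical helpers that work on plain double arrays rather than the library's matrix type.

```c
#include <stddef.h>

/* Write the one-hot encoding of category index 'iCat' into 'cols'
   (nbVal doubles): 1.0 for 'is this category', 0.0 otherwise. */
void oneHot(size_t iCat, size_t nbVal, double *cols) {
  for (size_t i = 0; i < nbVal; ++i) cols[i] = (i == iCat) ? 1.0 : 0.0;
}

/* Hypothetical: fill one matrix row with the nbInput converted input
   values followed by the nbVal one-hot columns of the output.
   'row' must hold nbInput + nbVal doubles. */
void fillOneHotRow(const double *inputs, size_t nbInput,
                   size_t iCat, size_t nbVal, double *row) {
  for (size_t i = 0; i < nbInput; ++i) row[i] = inputs[i];
  oneHot(iCat, nbVal, row + nbInput);
}
```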
Get the index of a field from its name
Input argument(s):
name: the name
Output and side effect(s):
Return the index of the field, or raise the exception CapyExc_InvalidParameters if it couldn't be found.
Get the distribution of a field's values as an array of bins.
Input argument(s):
iField: the index of the field
nbBin: the number of bins
Output and side effect(s):
Return a CapyArrSize of size 'nbBin'.
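The sketch below bins values into equal-width bins over the field's range and places values equal to the maximum in the last bin. Both choices are assumptions, since the library's exact binning rule is not documented here. `histogram` is a hypothetical name. The counts go into a plain size_t array rather than a CapyArrSize.

```c
#include <stddef.h>

/* Count 'nb' values into 'nbBin' equal-width bins covering [min, max].
   Values outside the range are clamped to the first/last bin. Null
   values are assumed to have been filtered out beforehand. */
void histogram(const double *vals, size_t nb, double min, double max,
               size_t nbBin, size_t *bins) {
  if (nbBin == 0) return;
  for (size_t i = 0; i < nbBin; ++i) bins[i] = 0;
  double width = max - min;
  for (size_t i = 0; i < nb; ++i) {
    size_t iBin = 0;                       /* zero-width range: bin 0 */
    if (width > 0.0) {
      double t = (vals[i] - min) / width;  /* relative position */
      if (t < 0.0) t = 0.0;                /* clamp below the range */
      if (t > 1.0) t = 1.0;                /* clamp above the range */
      iBin = (size_t)(t * (double)nbBin);
      if (iBin >= nbBin) iBin = nbBin - 1; /* the maximum goes in the last bin */
    }
    bins[iBin]++;
  }
}
```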
Get the distribution of a field's values as an array of bins for a given value of a given categorical field.
Input argument(s):
iField: the index of the field
nbBin: the number of bins
iCatField: the index of the categorical field
valCatField: the value of the categorical field
Output and side effect(s):
Return a CapyArrSize of size 'nbBin'.
Get the number of rows containing a particular value for a given field.
Input argument(s):
iField: the filtering field
valField: the filtering value
Output and side effect(s):
Return the number of rows.
Get the pair of values from two fields as two vectors.
Input argument(s):
iField: the first field
jField: the second field
u: the vector receiving values from the first field
v: the vector receiving values from the second field
Output and side effect(s):
'u' and 'v' are destructed, created afresh and populated with values. Rows with null value in one or the other field are ignored.
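The sketch below pairs values from two fields and skips any row where either value is null. To keep the example self-contained, it assumes null/unknown values are represented as NaN in two numeric columns. The caller also provides output arrays instead of vectors. `pairedValues` is a hypothetical name.

```c
#include <math.h>
#include <stddef.h>

/* Copy the values of columns 'x' and 'y' (nbRow each) into 'u' and 'v',
   skipping every row where either value is null (NaN here, by
   assumption). Returns the number of pairs kept. */
size_t pairedValues(const double *x, const double *y, size_t nbRow,
                    double *u, double *v) {
  size_t n = 0;
  for (size_t i = 0; i < nbRow; ++i) {
    if (isnan(x[i]) || isnan(y[i])) continue;  /* ignore incomplete rows */
    u[n] = x[i];
    v[n] = y[i];
    ++n;
  }
  return n;
}
```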
Check if a value is a null/unknown value
Input argument(s):
val: the value to check
Output and side effect(s):
Return true if the value is considered to be null/unknown. A null/unknown value is the empty string or a string equal to "nan" (case insensitive).
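The null/unknown rule above maps directly to a case-insensitive comparison. `isNullVal` is a hypothetical name. The comparison is written by hand because strcasecmp is POSIX rather than standard C. Whether the library trims surrounding whitespace is not stated, and this sketch does not trim it.

```c
#include <ctype.h>
#include <stdbool.h>

/* Return true if 'val' is the empty string or equals "nan" ignoring case. */
bool isNullVal(const char *val) {
  if (val[0] == '\0') return true;         /* empty string */
  const char *ref = "nan";
  for (int i = 0; i < 3; ++i) {
    /* a shorter string fails here on its terminating '\0' */
    if (tolower((unsigned char)val[i]) != ref[i]) return false;
  }
  return val[3] == '\0';                   /* reject longer strings like "nan1" */
}
```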
Convert the dataset into a point cloud
Output and side effect(s):
Return a CapyPointCloud whose dimension equals the number of fields and whose number of points equals the number of rows. All values are converted to numerical values.
Functions:
Create a CapyDatasetRow.
Output and side effect(s):
Return a CapyDatasetRow.
Allocate memory for a new CapyDatasetRow and create it.
Output and side effect(s):
Return a CapyDatasetRow.
Exception(s):
May raise CapyExc_MallocFailed.
Free the memory used by a CapyDatasetRow* and reset '*that' to NULL.
Input argument(s):
that: a pointer to the CapyDatasetRow to free
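These functions follow an allocate/free convention in which the destructor receives a pointer to the pointer, so it can reset the caller's pointer to NULL. The sketch below applies that convention to a simplified row type. `Row`, `rowAlloc` and `rowFree` are hypothetical names. Returning NULL replaces the CapyExc_MallocFailed exception.

```c
#include <stdlib.h>

/* Hypothetical, simplified row; the real CapyDatasetRow has more members. */
typedef struct {
  char *str;      /* row string, commas replaced with '\0' */
  char **fields;  /* array of pointers into 'str' */
} Row;

/* Allocate a new empty row, or return NULL on allocation failure. */
Row *rowAlloc(void) {
  Row *that = malloc(sizeof *that);
  if (that == NULL) return NULL;
  that->str = NULL;
  that->fields = NULL;
  return that;
}

/* Free the row and its members, then reset the caller's pointer to NULL. */
void rowFree(Row **that) {
  if (that == NULL || *that == NULL) return;
  free((*that)->str);
  free((*that)->fields);
  free(*that);
  *that = NULL;
}
```

Resetting '*that' to NULL makes a second call harmless. It also turns accidental reuse into a NULL dereference rather than a use-after-free.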
Create a CapyDatasetFieldDesc.
Output and side effect(s):
Return a CapyDatasetFieldDesc.
Allocate memory for a new CapyDatasetFieldDesc and create it.
Output and side effect(s):
Return a CapyDatasetFieldDesc.
Exception(s):
May raise CapyExc_MallocFailed.
Free the memory used by a CapyDatasetFieldDesc* and reset '*that' to NULL.
Input argument(s):
that: a pointer to the CapyDatasetFieldDesc to free
Create a CapyDataset.
Output and side effect(s):
Return a CapyDataset.
Allocate memory for a new CapyDataset and create it.
Output and side effect(s):
Return a CapyDataset.
Exception(s):
May raise CapyExc_MallocFailed.
Free the memory used by a CapyDataset* and reset '*that' to NULL.
Input argument(s):
that: a pointer to the CapyDataset to free.