Problem:
What is the purpose of the entity attribute property "skipInDataAccess": false?
Solution:
Background
Sometimes data exported to S3 cannot be accessed in DataAccess because of Spark's limit on the number of attributes in a DataFrame. The excess attributes usually come from Reference attributes and Nested attributes.
To work around the attribute limit in DataAccess, skip the attributes that are not needed in RI and export only the required attributes from the MDM platform to RI. This not only avoids the attribute limit; the exported DataFrames are also more efficient, because they contain only the required attributes and occupy less storage on S3. Any attribute (Simple, Reference, or Nested) can be skipped in RI by setting the optional property "skipInDataAccess" to true in the tenant L3 configuration. An attribute whose "skipInDataAccess" is true is omitted from both the Parquet schema and the DataFrame.
An existing L3 configuration can be modified by adding "skipInDataAccess": true to the attributes that are not required in RI. After updating the L3 configuration, re-export the data (if it was exported before) so that the configured attributes are skipped in RI.
"skipInDataAccess" can be set on a Simple, Nested, or Reference attribute of any entity type. This property does not apply to analytics attributes.
Example (with the value false the attribute is still exported; set it to true to skip the attribute):
{
  "label": "Commenters",
  "name": "Commenters",
  "description": "Commenters",
  "type": "String",
  "hidden": true,
  "important": false,
  "system": false,
  "attributeOrdering": {
    "orderType": "ASC",
    "orderingStrategy": "LUD"
  },
  "uri": "configuration/entityTypes/Organization/attributes/Commenters",
  "skipInDataAccess": false
},
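The configuration change described above can be sketched in Python. This is only an illustration under stated assumptions: it assumes an entity type is a JSON object with an "attributes" list whose items look like the example, it treats a missing "skipInDataAccess" property as false, and the helper names mark_skipped and exported_attribute_names are hypothetical. It edits the configuration locally and does not call any platform API; uploading the configuration and re-exporting the data remain separate steps. Sub-attributes of Nested attributes are not traversed.

```python
import copy
import json


def mark_skipped(entity_type, names_to_skip):
    """Return a copy of entity_type with "skipInDataAccess": true
    set on every top-level attribute whose name is in names_to_skip."""
    updated = copy.deepcopy(entity_type)  # leave the caller's config untouched
    for attribute in updated.get("attributes", []):
        if attribute.get("name") in names_to_skip:
            attribute["skipInDataAccess"] = True
    return updated


def exported_attribute_names(entity_type):
    """List the attribute names that are not flagged to be skipped.
    Assumption: an absent property means the attribute is not skipped."""
    return [
        attribute["name"]
        for attribute in entity_type.get("attributes", [])
        if not attribute.get("skipInDataAccess", False)
    ]


# Hypothetical, heavily trimmed entity type used only for illustration.
organization = {
    "uri": "configuration/entityTypes/Organization",
    "attributes": [
        {"name": "Name", "type": "String"},
        {"name": "Commenters", "type": "String", "skipInDataAccess": False},
    ],
}

updated = mark_skipped(organization, {"Commenters"})
print(json.dumps(updated["attributes"][1], indent=2))
print(exported_attribute_names(updated))
```

Working on a copy keeps the original configuration available for comparison before the change is applied to the tenant.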