Dataset
Depending on the task, you can implement a Dataset tailored to that task or to a specific data format.
A Dataset has two main parts: loading data and outputting data. Every Dataset should inherit from
the parent class SATTaskAbstractDataset.
Load Data
In the data loading part, implement a method _load_dataset that reads the label information and constructs a graph of a
specific type for each instance. The function that constructs the graph is described in Raw Input.
An example can be found in the following code:
# MaxSATDataset Dataset
import os

import pandas as pd
from tqdm import tqdm

# SATTaskAbstractDataset and parse_cnf_file are provided by the library.
class MaxSATDataset(SATTaskAbstractDataset):
    def __init__(self, config, cnf_dir, label_path):
        super().__init__(config, cnf_dir, label_path, name="maxsat_dataset")
        self._load_dataset()

    def _load_dataset(self):
        # Each row of the label file names a CNF file and gives its label.
        label_df = pd.read_csv(self.label_path, sep=',')
        self.data_list = []
        for idx, row in tqdm(label_df.iterrows(), total=label_df.shape[0]):
            name = row['name']
            # The CSV stores the label as a string; eval turns it into a Python value.
            label = eval(row['maxsat'])
            cnf_path = os.path.join(self.cnf_dir, name)
            # Parse the CNF file, then build the graph and the instance info.
            num_variable, num_clause, clause_list = parse_cnf_file(cnf_path)
            cnf_graph = self._build_graph(num_variable, num_clause, clause_list, cnf_path)
            info = self._get_info(num_variable, num_clause, clause_list)
            self.data_list.append({"g": cnf_graph, "label": label, "info": info})
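To make the values flowing through _load_dataset more concrete, here is a minimal sketch of what a CNF parser might return. It assumes the input files use the standard DIMACS CNF format. The function parse_cnf_text is a hypothetical stand-in written for illustration only; it is not the library's parse_cnf_file and may differ from it in details.

```python
from typing import List, Tuple


def parse_cnf_text(text: str) -> Tuple[int, int, List[List[int]]]:
    """Hypothetical stand-in for parse_cnf_file that parses DIMACS CNF text."""
    num_variable, num_clause = 0, 0
    clause_list: List[List[int]] = []
    current: List[int] = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (those starting with "c").
        if not line or line.startswith("c"):
            continue
        if line.startswith("p"):
            # Problem line: "p cnf <num_variable> <num_clause>"
            _, _, n_vars, n_clauses = line.split()
            num_variable, num_clause = int(n_vars), int(n_clauses)
            continue
        for token in line.split():
            literal = int(token)
            if literal == 0:
                # A 0 terminates the current clause.
                clause_list.append(current)
                current = []
            else:
                current.append(literal)
    return num_variable, num_clause, clause_list


example = """c tiny example
p cnf 3 2
1 -2 0
2 3 0
"""
num_variable, num_clause, clause_list = parse_cnf_text(example)
print(num_variable, num_clause, clause_list)  # 3 2 [[1, -2], [2, 3]]
```

Each clause is a list of signed integers, where a negative literal denotes a negated variable. A triple of this shape is what the example above passes to _build_graph and _get_info.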
Output Data
Since Dataset inherits from DGLDataset, two methods, __getitem__ and __len__, must be implemented.
An example can be found in the following code:
# MaxSATDataset Dataset (methods defined inside the class)
    def __getitem__(self, idx):
        return self.data_list[idx]

    def __len__(self):
        return len(self.data_list)
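The following sketch shows how these two methods behave from the caller's side. ListBackedDataset is a hypothetical stand-in that copies only the output side of the dataset and does not depend on DGL. The records are placeholders: strings take the place of graph objects, and the label and info values are made up for illustration.

```python
class ListBackedDataset:
    """Hypothetical stand-in that mimics only the output side of a Dataset."""

    def __init__(self, data_list):
        self.data_list = data_list

    def __getitem__(self, idx):
        return self.data_list[idx]

    def __len__(self):
        return len(self.data_list)


# Placeholder records: strings stand in for the graph objects.
dataset = ListBackedDataset([
    {"g": "graph_0", "label": 1, "info": {}},
    {"g": "graph_1", "label": 0, "info": {}},
])

print(len(dataset))         # 2
print(dataset[1]["label"])  # 0

# Iteration works through __getitem__: Python indexes from 0 until IndexError.
labels = [sample["label"] for sample in dataset]
print(labels)               # [1, 0]
```

Because each item is a dictionary with the keys "g", "label", and "info", downstream code can fetch a sample by index and read its fields by name.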