[Bug?] Doc fields with "image" not saved to jsonl file when using log_samples=True #712

Open
@YongchengYAO

With log_samples=True, the doc, target, model output, and other information for each sample are saved to a jsonl file.

However, because of this line in evaluator.py:

if "image" not in key:

any doc field whose key contains the substring "image" is filtered out of the saved doc. Is there a reason for that?
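
To make the effect concrete, here is a minimal sketch of the same key check applied to a made-up doc. The field names (question, image_id, image_path) are hypothetical and not taken from any particular task. Because the test is a substring match on the key, plain string fields are dropped along with the image itself:

```python
# Hypothetical doc; the field names are illustrative only.
doc = {
    "question": "What is shown?",
    "image": object(),  # stand-in for a decoded image object
    "image_id": "case_0001",
    "image_path": "data/case_0001.png",
}

# Same substring test on the key as the line quoted from evaluator.py.
saved_doc = {key: value for key, value in doc.items() if "image" not in key}

print(saved_doc)  # only the "question" field survives
```
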

Code block:

if log_samples:
    target = task.doc_to_target(doc)
    saved_doc = {}
    for key, value in doc.items():
        # If image is not in key
        if "image" not in key:
            # If audio is also not the value
            if isinstance(value, dict) and "array" in value:
                continue
            else:
                saved_doc[key] = value
    filtered_arguments = []
    for req in requests:
        # check if req.args is a list of tuples, and each item in the list is a serializable object
        for value in req.args:
            if isinstance(value, (str, int, float, bool, list, dict, type(None))):
                filtered_arguments.append(value)
            # else:
            #     filtered_arguments.append(_handle_non_serializable(value))

    example = {
        "doc_id": doc_id,
        "doc": saved_doc,
        "target": target,
        "arguments": filtered_arguments,
        "resps": [req.resps for req in requests],
        "filtered_resps": [req.filtered_resps[filter_key] for req in requests],
        "doc_hash": hash_string(
            json.dumps(
                requests[0].doc,
                indent=2,
                default=handle_non_serializable,
                ensure_ascii=False,
            )
        ),
        "prompt_hash": hash_string(requests[0].arguments[0]),
        "target_hash": hash_string(str(target)),
    }
    example.update(metrics)
    task_output.logged_samples.append(example)
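
One possible alternative, offered only as a sketch and not as the project's intended fix, is to decide by value rather than by key name: keep a field if its value can be JSON-encoded and drop it otherwise. The helper names is_json_serializable and filter_doc_by_value, and the sample doc, are hypothetical:

```python
import json


def is_json_serializable(value):
    """Return True if json.dumps can encode the value as-is."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


def filter_doc_by_value(doc):
    """Keep every field whose value is JSON-serializable, whatever its key."""
    return {key: value for key, value in doc.items() if is_json_serializable(value)}


# Hypothetical doc; the field names are illustrative only.
doc = {
    "question": "What is shown?",
    "image": object(),  # stand-in for a decoded image object
    "image_id": "case_0001",
    "audio": {"array": b"\x00\x01", "sampling_rate": 16000},  # bytes are not JSON-encodable
}

kept = filter_doc_by_value(doc)
print(kept)  # "question" and "image_id" are kept; "image" and "audio" are dropped
```

A caveat with this approach: an audio dict whose "array" is a plain Python list would pass the check and be written in full, so the existing "array" test may still be wanted alongside it. Encoding each value once just to test it also adds some overhead for large docs.
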
