DataSink: Add possibility to replace regex match with another regex match in #3586

Open
JohannesWiesner opened this issue Jul 5, 2023 · 1 comment

Comments

@JohannesWiesner

Summary

I have a feature request: it would be nice if the regexp_substitutions argument of nipype.interfaces.io.DataSink allowed replacing one regex match with another match from the same input string. Then one wouldn't have to define a separate substitution for every possible case. E.g., when the parameterization produces something like this:

_subject_100206_task_WM
_subject_100206_task_EMOTION
_subject_100206_task_FACES

and I would like to have the folders like this:

WM
EMOTION
FACES

Then I currently have to do this:

datasink.inputs.regexp_substitutions = [('subject_[a-zA-Z0-9]+',''),
                                         ('_task_EMOTION','EMOTION'),
                                         ('_task_WM','WM'),
                                         ('_task_FACES','FACES')
                                        ]

It would be nice if one could use regex groups to do something like this (pseudo-code!):

datasink.inputs.regexp_substitutions = [('subject_[a-zA-Z0-9]+',''),
                                         ('_task_\w+','_task_(\w+)'),
                                        ]
@effigies
Member

effigies commented Jul 5, 2023

It looks like we've got existing tests (

def test_datasink_substitutions(tmpdir):
    indir = tmpdir.mkdir("-Tmp-nipype_ds_subs_in")
    outdir = tmpdir.mkdir("-Tmp-nipype_ds_subs_out")
    files = []
    for n in ["ababab.n", "xabababyz.n"]:
        f = str(indir.join(n))
        files.append(f)
        open(f, "w")
    ds = nio.DataSink(
        parameterization=False,
        base_directory=str(outdir),
        substitutions=[("ababab", "ABABAB")],
        # end archoring ($) is used to assure operation on the filename
        # instead of possible temporary directories names matches
        # Patterns should be more comprehendable in the real-world usage
        # cases since paths would be quite more sensible
        regexp_substitutions=[
            (r"xABABAB(\w*)\.n$", r"a-\1-b.n"),
            ("(.*%s)[-a]([^%s]*)$" % ((os.path.sep,) * 2), r"\1!\2"),
        ],
    )
    setattr(ds.inputs, "@outdir", files)
    ds.run()
    assert sorted(
        [os.path.basename(x) for x in glob.glob(os.path.join(str(outdir), "*"))]
    ) == [
        "!-yz-b.n",
        "ABABAB.n",
    ]  # so we got re used 2nd and both patterns
), so I would say that a patch implementing this would be fine, as long as it doesn't break those expectations (or those expectations turn out to be incidental).

It might be easier just to write your own DataSink that does what you want.
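
For reference, here is a minimal sketch of how a capture group could cover all three task folders with one pattern. It assumes that DataSink passes each (pattern, replacement) pair to something with the semantics of Python's re.sub; the `\1` backreference in the test above suggests this, but it is an assumption and not checked against the nipype source here. The helper apply_regexp_substitutions is hypothetical and uses only the standard library, so it runs without nipype.

```python
import re


def apply_regexp_substitutions(path, substitutions):
    """Hypothetical stand-in: apply (pattern, replacement) pairs in order.

    Assumption: each pair behaves like a call to re.sub, so the
    replacement may refer to capture groups via \\1, \\2, ...
    """
    for pattern, replacement in substitutions:
        path = re.sub(pattern, replacement, path)
    return path


# One pattern instead of one entry per task: the group (\w+) captures
# the task name and \1 writes it back as the whole folder name.
substitutions = [(r"_subject_[a-zA-Z0-9]+_task_(\w+)", r"\1")]

names = [
    "_subject_100206_task_WM",
    "_subject_100206_task_EMOTION",
    "_subject_100206_task_FACES",
]
results = [apply_regexp_substitutions(name, substitutions) for name in names]

for name, result in zip(names, results):
    print(name, "->", result)
```

If that assumption holds, assigning the same list to datasink.inputs.regexp_substitutions might already give the requested folder names without a patch. Matching the subject and task parts in a single pattern also avoids the leftover underscore that removing only 'subject_[a-zA-Z0-9]+' would leave behind.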
