Regular expression template using lookahead assertions in Python











I built the following regular expression pattern using grouping and named groups after Landsat's new product identifier pattern. It looks like this (see also link to pythex.org):



(?P<prefix>L)(?P<sensor>[C|O|T|E|M])(?P<satellite>0[14578])(?P<delimiter>_)(?P<processing_correction_level>(L1(?:TP|GT|GS)))(?P=delimiter)(?P<path>[012][0-9][0-9])(?P<row>[01][0-9][0-9]|2[0-4][0-3])(?P=delimiter)(?P<acquisition_year>19|20\d\d)(?P<acquisition_month>0[1-9]|1[012])(?P<acquisition_day>0[1-9]|[12][0-9]|3[01])(?P=delimiter)(?P<processing_year>19|20\d\d)(?P<processing_month>0[1-9]|1[012])(?P<processing_day>0[1-9]|[12][0-9]|3[01])(?P=delimiter)(?P<collection>0[12])(?P=delimiter)(?P<category>RT|T[1|2])(?P=delimiter)(?P<band>B[0-9Q][01A]?).TIF
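To make the named groups concrete, here is a minimal sketch that applies this pattern to one of the file names listed below and prints what each named group captures. The variable name `PRODUCT_ID` is only illustrative, and the `\d\d` in the year groups is assumed to be what was intended there.

```python
import re

# Hypothetical name; the pattern is the one shown above, split across
# several raw strings purely for readability.
PRODUCT_ID = re.compile(
    r"(?P<prefix>L)(?P<sensor>[C|O|T|E|M])(?P<satellite>0[14578])(?P<delimiter>_)"
    r"(?P<processing_correction_level>(L1(?:TP|GT|GS)))(?P=delimiter)"
    r"(?P<path>[012][0-9][0-9])(?P<row>[01][0-9][0-9]|2[0-4][0-3])(?P=delimiter)"
    r"(?P<acquisition_year>19|20\d\d)(?P<acquisition_month>0[1-9]|1[012])"
    r"(?P<acquisition_day>0[1-9]|[12][0-9]|3[01])(?P=delimiter)"
    r"(?P<processing_year>19|20\d\d)(?P<processing_month>0[1-9]|1[012])"
    r"(?P<processing_day>0[1-9]|[12][0-9]|3[01])(?P=delimiter)"
    r"(?P<collection>0[12])(?P=delimiter)(?P<category>RT|T[1|2])(?P=delimiter)"
    r"(?P<band>B[0-9Q][01A]?).TIF"
)

match = PRODUCT_ID.match("LC08_L1TP_184033_20170328_20161027_01_RT_B4.TIF")

# groupdict() maps every named group to the text it captured.
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name}: {value}")
```

A file name with an impossible month, such as `LC08_L1TP_184033_20173328_20161027_01_RT_B4.TIF`, is not matched by this pattern.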


Given a directory that contains, say, the following file(names):



LC08_L1TP_184033_20170328_20161027_01_RT_B4.TIF
LC08_L1TP_184033_20171128_20161027_01_RT_B4.TIF
LC08_L1TP_184033_20171128_20181027_01_RT_B1.TIF
LC08_L1TP_184033_20171128_20181027_01_RT_B4.TIF
LC08_L1TP_184033_20173328_20161027_01_RT_B4.TIF
LC08_L1TP_184033_20181028_20181027_01_RT_BQA.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_ANG.txt
LC08_L1TP_184033_20181028_20181028_01_RT_B10.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_B11.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_B1.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_B2.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_B3.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_B4.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_B5.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_B6.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_B7.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_B8.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_B9.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_BQA.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_MTL.txt
LC08_L1TP_184033_20181128_20181027_01_RT_B1.TIF
LC08_L1TP_184033_20181128_20181027_01_RT_BQA.TIF


and a slightly modified template, i.e. the following:



(?P<prefix>L)(?P<sensor>[C|O|T|E|M])(?P<satellite>0[14578])(?P<delimiter>_)(?P<processing_correction_level>(L1(?:TP|GT|GS)))(?P=delimiter)(?P<path>[012][0-9][0-9])(?P<row>[01][0-9][0-9]|2[0-4][0-3])(?P=delimiter)(?P<acquisition_year>19|20\d\d)(?P<acquisition_month>0[1-9]|1[012])(?P<acquisition_day>0[1-9]|[12][0-9]|3[01])(?P=delimiter)(?P<processing_year>19|20\d\d)(?P<processing_month>0[1-9]|1[012])(?P<processing_day>0[1-9]|[12][0-9]|3[01])(?P=delimiter)(?P<collection>0[12])(?P=delimiter)(?P<category>RT|T[1|2])(?P=delimiter)(?P<band>B{BAND}).TIF


the following Python function:



import glob
import os
import re


def retrieve_selected_bands(bands, scene):
    """
    Retrieve user requested bands from a Landsat scene

    Parameters
    ----------
    bands :
        User requested bands

    scene :
        Landsat scene directory

    Returns
    -------
    Returns list of filenames of user requested bands

    Example
    -------
    ...
    """
    requested_bands = []
    for band in bands:
        for filename in os.listdir(scene):
            # regular_expression_variable holds the template shown above
            template = regular_expression_variable.format(BAND=band)
            pattern = re.compile(template)
            if pattern.match(filename):
                requested_bands.append(glob.glob(filename)[0])
    print()
    print('\n'.join(map(str, requested_bands)))


will retrieve successfully what is asked for, i.e.:



retrieve_selected_bands(bands, '.')

LC08_L1TP_184033_20181028_20181028_01_RT_B4.TIF
LC08_L1TP_184033_20171128_20161027_01_RT_B4.TIF
LC08_L1TP_184033_20170328_20161027_01_RT_B4.TIF
LC08_L1TP_184033_20171128_20181027_01_RT_B4.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_B5.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_B10.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_B11.TIF
LC08_L1TP_184033_20181128_20181027_01_RT_BQA.TIF
LC08_L1TP_184033_20181028_20181027_01_RT_BQA.TIF
LC08_L1TP_184033_20181028_20181028_01_RT_BQA.TIF


I want to understand if and how the regular expression template can be improved, i.e. become shorter and thus more readable, by using for example lookahead assertions, like:



(?P<prefix>L)(?P<sensor>[C|O|T|E|M])(?P<satellite>0[14578])(?P<delimiter>_)(?P<processing_correction_level>(L1(?:TP|GT|GS)))(?P=delimiter)(?P<path>[012][0-9][0-9])(?P<row>[01][0-9][0-9]|2[0-4][0-3])(?P=delimiter)(?P<year>19|20\d\d)(?P<month>0[1-9]|1[012])(?P<day>0[1-9]|[12][0-9]|3[01])(?P=delimiter)(?P=year)(?P=month)(?P=day)(?P=delimiter)(?P<collection>0[12])(?P=delimiter)(?P<category>RT|T[1|2])(?P=delimiter)(?P<band>B[0-9Q][01A]?).TIF


However, the latter version of the template fails to capture all "valid" strings. If I am not wrong, it fails on the (?P=day) part.
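A note on the syntax used in that last template: `(?P=name)` is a backreference, whereas a lookahead assertion is written `(?=...)`. A backreference matches the exact text its group captured earlier, not the sub-pattern the group was defined with, so `(?P=year)(?P=month)(?P=day)` can only match when the processing date is identical to the acquisition date. The toy pattern below is only an illustration of that behaviour and is not part of the Landsat template:

```python
import re

# (?P=year) must repeat the exact text captured by the group named
# "year"; it does not re-apply the group's sub-pattern.
backref = re.compile(r"(?P<year>(?:19|20)\d\d)_(?P=year)")

same_year = backref.fullmatch("2018_2018")    # second half repeats the capture
other_year = backref.fullmatch("2017_2016")   # second half differs from the capture

print(same_year is not None)    # True
print(other_year is not None)   # False
```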



How can the first template be improved? I.e., become shorter and still include all date patterns of valid Landsat product identifier strings?
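One possible direction, given as a rough sketch rather than a definitive answer: Python's `re` requires a distinct name for every named group, so the repeated date sub-pattern could be generated by a small helper and the template assembled from pieces. The names `DELIMITER`, `date_groups` and `build_template` are hypothetical. The sub-patterns are copied from the first template with three assumed corrections: the year is written `(?:19|20)\d\d`, the `|` inside the character classes is dropped (it is a literal character there), and the dot before `TIF` is escaped.

```python
import re

DELIMITER = "_"  # hypothetical constant; the fields are joined with it


def date_groups(prefix):
    """Return year/month/day named groups whose names start with `prefix`."""
    return (
        rf"(?P<{prefix}_year>(?:19|20)\d\d)"
        rf"(?P<{prefix}_month>0[1-9]|1[012])"
        rf"(?P<{prefix}_day>0[1-9]|[12][0-9]|3[01])"
    )


def build_template(band=r"[0-9Q][01A]?"):
    """Assemble the pattern; `band` is a regex fragment for the band suffix."""
    parts = [
        r"(?P<prefix>L)(?P<sensor>[COTEM])(?P<satellite>0[14578])",
        r"(?P<processing_correction_level>L1(?:TP|GT|GS))",
        r"(?P<path>[012][0-9][0-9])(?P<row>[01][0-9][0-9]|2[0-4][0-3])",
        date_groups("acquisition"),
        date_groups("processing"),
        r"(?P<collection>0[12])",
        r"(?P<category>RT|T[12])",
        rf"(?P<band>B{band})\.TIF",
    ]
    return DELIMITER.join(parts)


pattern = re.compile(build_template(band="4"))
names = [
    "LC08_L1TP_184033_20170328_20161027_01_RT_B4.TIF",
    "LC08_L1TP_184033_20173328_20161027_01_RT_B4.TIF",  # month 33 is invalid
    "LC08_L1TP_184033_20181028_20181028_01_RT_B5.TIF",
]
selected = [name for name in names if pattern.fullmatch(name)]
print(selected)
```

With these assumptions the date logic is written once, and the acquisition and processing dates are still captured under separate group names.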






























  • Hey, welcome to Code Review! This question does not match what this site is about. Code Review is about improving existing, working code. Code Review is not the site to ask for help in fixing or changing what your code does. Once the code does what you want, we would love to help you do the same thing in a cleaner way! Please see our help center for more information.
    – Graipher
    16 hours ago










  • @Graipher, if I copy-paste the python code that uses the first pattern, which works fine, and ask again, the same question (which is: can I make this template shorter?--so as to save space, make it perhaps more readable, so improving), will you then accept this question as a valid one for Code Review?
    – Nikos Alexandris
    10 hours ago












  • If you only use the first one and show valid Python code that uses it (just wrap it in a function) and preferably add those test cases you mention, it would probably be on-topic.
    – Graipher
    10 hours ago










  • @Graipher I tried to improve the question. Is it valid for Code Review now?
    – Nikos Alexandris
    9 hours ago















Tags: python, regex, template

asked 17 hours ago by Nikos Alexandris, edited 9 hours ago