Document “Search Engine” in Excel











up vote
3
down vote

favorite












I've created a somewhat "Brute-force search engine" based on a document's filename, as well as an additional keyword that describes the said document. I had to make one as my company's resources are a mess and not fully structured, so it's tough to search for a document using just the regular Windows search tool.



What I first did is to simply extract the filepath of all documents using CMD and DIR, and output it to a csv file which I processed.
Then, using an excel formula, extract the filename from the path and remove the filetype as well. I've added an extra column for the keywords portion, which is split via commas.



Once a search is initiated, the search term is broken down to each individual words via spaces. This goes the same for the filename, and keywords. After that, a simple for loop just iterates each word and see if's a match. If it is, a counter is added. Once done, the data is copied onto a temporary sheet.



After checking all available filepaths and moving the matched result on the temporary sheet, I sort them based on the counter, so that the highest match goes on top. Then, I copy the results (including the path) to the main sheet with the searchbox, display the results, and add a hyperlink so that it can be clicked.



Sub Searchresult()
Dim x As Range, y As Long, count As Long, i As Integer, j As Integer, k As Integer, l As Integer
Dim names() As Variant, namesdup() As Variant
Dim search() As String, keyword() As String, namesraw() As String, searchval As String
Dim result As String
Dim tbl As ListObject, sortcol As Range, lrow As Long, lrow2 As Long


OptimizeVBA True 'Makes processing this a lot faster

searchval = Worksheets("Sheet1").Range("E8").Value 'Gets the searchbox text
With Worksheets("Sheet3") 'Prep for placing results in table.
Set tbl = .ListObjects("tblSearch")
Set sortcol = .Range("tblSearch[sort]")
tbl.DataBodyRange.ClearContents
End With

With Worksheets("Sheet2")
search = Split(Trim(searchval), " ") 'split search terms via spaces
lrow2 = .Cells(Rows.count, 1).End(xlUp).Row
For Each x In Range("A2:A" & lrow2) 'Iterate all values in Sheet2
count = 0
lrow = Worksheets("Sheet3").Cells(Rows.count, 1).End(xlUp).Row + 1 'get the last row in Sheet2
keyword() = Split(.Range("d" & x.Row), ",") ' split keywords via comma
namesraw() = Split(Replace(Replace(Replace(Replace(Replace(.Range("c" & x.Row), "-", " "), "(", ""), ")", ""), "'", ""), "_", " "), " ") 'splits names via spaces, deleting any unwanted characters

'This section converts the String array from above to a Variant array
ReDim namesdup(LBound(namesraw) To UBound(namesraw))
Dim index As Long
For index = LBound(namesraw) To UBound(namesraw)
namesdup(index) = namesraw(index)
Next index
'end section

names() = RemoveDupesColl(namesdup())
'We need to remove duplicates from the name search, as it affects accuracy.
'For example, if you search for something that has the word "loc", the filename that repeats this word multiple times will get top results.


'//SEARCH FUNCTION STARTS HERE
'This first part will compare each word typed in the searchbox form each word in the keywords column in Sheet2.
For i = LBound(keyword) To UBound(keyword) 'Iterate the number of keywords in a given row
For j = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox

If UCase(search(j)) Like "*" & UCase(keyword(i)) & "*" Or UCase(keyword(i)) Like "*" & UCase(search(j)) & "*" Then 'compare search term and keyword
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value 'Copy A & B to Sheet3.
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count 'Put a count on Sheet3
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value 'Copy D to Sheet3
End If

Next
Next

For k = LBound(names) To UBound(names) 'Iterate the number of names that were split from the document name.
For l = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox
If Len(names(k)) <= 3 And Len(names(k)) > 1 Then 'Prevents getting top results for being part of a long word, for ex: the word LOC will be found on all words that has it, like "LOCATION".
If UCase(search(l)) = UCase(names(k)) Or UCase(names(k)) = UCase(search(l)) Then 'If it's a short word, it must be the same as the search term
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
Else
If (UCase(search(l)) Like "*" & UCase(names(k)) & "*" Or UCase(names(k)) Like "*" & UCase(search(l)) & "*") And Len(names(k)) > 2 And Len(search(l)) > 2 Then 'compare search term and document name
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
End If
Next
Next
'//SEARCH FUNCTION ENDS HERE

Next x


End With

With tbl.Sort 'sort everything based on count to get best result on top
.SortFields.Clear
.SortFields.Add Key:=sortcol, SortOn:=xlSortOnValues, Order:=xlDescending
.Header = xlYes
.Apply
End With


End Sub

Sub copysearch()
Dim linkrange As Range, c As Range
Dim namerange As Range
Dim hyp As Hyperlink
Dim hyps As Hyperlinks

With Worksheets("Sheet1")
Worksheets("Sheet3").Range("A2:D21").Copy 'Copy the first 20 results
.Range("D13").PasteSpecial Paste:=xlPasteValues 'and paste them on Sheet1
Application.CutCopyMode = False

Set linkrange = .Range("D13:D32")
Set namerange = .Range("E13:E32")

For Each c In namerange 'Iterates all cells from E13 to E32
c.ClearHyperlinks 'Remove all hyperlinks if there are any
If c <> "" Then 'Make sure to not add hyperlinks on empty cells
c.Hyperlinks.Add c, .Range("D" & c.Row) 'Add a hyperlink based on the value of D.
If .Range("G" & c.Row).Value = True Then 'Check if G value is True
.Range("E" & c.Row).Font.Color = vbWhite 'Link is valid, so it's colored white
Else
.Range("E" & c.Row).Font.Color = vbRed 'Link is not valid, colored Red, needs updating.
End If
End If
Next

.Range("E13:E32").Font.Underline = False
.Range("E13:E32").Font.name = "Cambria"

End With

OptimizeVBA False

End Sub

Sub OptimizeVBA(isOn As Boolean) 'Optimize VBA by not running calculations, events, and updates up until Macro is done. Found online.
Application.Calculation = IIf(isOn, xlCalculationManual, xlCalculationAutomatic)
Application.EnableEvents = Not (isOn)
Application.ScreenUpdating = Not (isOn)
End Sub

Function RemoveDupesColl(MyArray As Variant) As Variant 'Online code to remove any duplicates in an array.
'DESCRIPTION: Removes duplicates from your array using the collection method.
'NOTES: (1) This function returns unique elements in your array, but
' it converts your array elements to strings.
'SOURCE: https://wellsr.com
'-----------------------------------------------------------------------
Dim i As Long
Dim arrColl As New Collection
Dim arrDummy() As Variant
Dim arrDummy1() As Variant
Dim item As Variant
ReDim arrDummy1(LBound(MyArray) To UBound(MyArray))

For i = LBound(MyArray) To UBound(MyArray) 'convert to string
arrDummy1(i) = CStr(MyArray(i))
Next i
On Error Resume Next
For Each item In arrDummy1
arrColl.Add item, item
Next item
Err.Clear
ReDim arrDummy(LBound(MyArray) To arrColl.count + LBound(MyArray) - 1)
i = LBound(MyArray)
For Each item In arrColl
arrDummy(i) = item
i = i + 1
Next item
RemoveDupesColl = arrDummy
End Function


Everything is working as intended, though I believe that the code can be further optimized. Please note that I only dabble at VBA and had some experience with VB.Net but am not really a programmer. However, I understand formatting so I still made sure that my coding can still be understood. Comments are added everywhere as I am only a temp in the company and would like to pass it on to someone else just in case..



My difficulties when I started this out:




  • Singular/Plural terms: That's why there's an "Or" statement that
    inverts the variables in the "Like" statement as a resolution.

  • 2-3 letter words being found inside bigger words: As we're using
    acronyms or shortcuts, there are times that there are results that
    are being found that's not really related to what's supposed to show.
    So an additional If statements were added specifically for the short
    words.

  • Repeating words in filenames: For some reason, there are filenames that repeat the same word multiple times (The acronyms), and it skews the top result as it matches multiple times. I've used an online code to remove the duplicates via collection method. Hence, there was a need to convert the array to a Variant.

  • Opening write-protected documents: Sadly, this wasn't fixed, but basically a Word popup when asking to open as read only does not show on top of the Excel program. Excel meanwhile, is unresponsive until the popup box was answered. Wait too long, and an OLE error will show up. A workaround to this one is to have another Word program open, and the popup will show there.










share|improve this question









New contributor




Basher is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.




















  • @Comintern As the codes were sourced online, I wasn't sure if I have to really put it in, but if that's required, I've added it in now.
    – Basher
    2 days ago










  • Thanks. The other question I have is what you're referring to with the "Word popup when asking to open as read only". I didn't catch any references in there that would indicate that you're automating Word. Is that what the hyperlinks target? (I think the Z-ordering is a Windows 10 "feature" - i.e. bug)
    – Comintern
    2 days ago










  • Yes, the hyperlink also targets Word (.docx) files, as well as PDF, excel files, etc. The other files are opened without issues, and Word documents will open fine too as long as they're not write-protected. Write-protected, meaning it prompts for a password upon opening the file, yet also offers Read Only mode.
    – Basher
    2 days ago










  • I've experimented with this, and as long as Word does not prompt any popup box, it will be fine. Changing the file property to Read Only will also work, but the documents are updated on a daily basis, so passwords are used as protection instead. Also we're using Windows 7.
    – Basher
    2 days ago










  • Using dictionaries would greatly simplify and speed up your code. Regex would be best but it can be complicated.
    – TinMan
    2 days ago















up vote
3
down vote

favorite












I've created a somewhat "Brute-force search engine" based on a document's filename, as well as an additional keyword that describes the said document. I had to make one as my company's resources are a mess and not fully structured, so it's tough to search for a document using just the regular Windows search tool.



What I first did is to simply extract the filepath of all documents using CMD and DIR, and output it to a csv file which I processed.
Then, using an excel formula, extract the filename from the path and remove the filetype as well. I've added an extra column for the keywords portion, which is split via commas.



Once a search is initiated, the search term is broken down to each individual words via spaces. This goes the same for the filename, and keywords. After that, a simple for loop just iterates each word and see if's a match. If it is, a counter is added. Once done, the data is copied onto a temporary sheet.



After checking all available filepaths and moving the matched result on the temporary sheet, I sort them based on the counter, so that the highest match goes on top. Then, I copy the results (including the path) to the main sheet with the searchbox, display the results, and add a hyperlink so that it can be clicked.



Sub Searchresult()
Dim x As Range, y As Long, count As Long, i As Integer, j As Integer, k As Integer, l As Integer
Dim names() As Variant, namesdup() As Variant
Dim search() As String, keyword() As String, namesraw() As String, searchval As String
Dim result As String
Dim tbl As ListObject, sortcol As Range, lrow As Long, lrow2 As Long


OptimizeVBA True 'Makes processing this a lot faster

searchval = Worksheets("Sheet1").Range("E8").Value 'Gets the searchbox text
With Worksheets("Sheet3") 'Prep for placing results in table.
Set tbl = .ListObjects("tblSearch")
Set sortcol = .Range("tblSearch[sort]")
tbl.DataBodyRange.ClearContents
End With

With Worksheets("Sheet2")
search = Split(Trim(searchval), " ") 'split search terms via spaces
lrow2 = .Cells(Rows.count, 1).End(xlUp).Row
For Each x In Range("A2:A" & lrow2) 'Iterate all values in Sheet2
count = 0
lrow = Worksheets("Sheet3").Cells(Rows.count, 1).End(xlUp).Row + 1 'get the last row in Sheet2
keyword() = Split(.Range("d" & x.Row), ",") ' split keywords via comma
namesraw() = Split(Replace(Replace(Replace(Replace(Replace(.Range("c" & x.Row), "-", " "), "(", ""), ")", ""), "'", ""), "_", " "), " ") 'splits names via spaces, deleting any unwanted characters

'This section converts the String array from above to a Variant array
ReDim namesdup(LBound(namesraw) To UBound(namesraw))
Dim index As Long
For index = LBound(namesraw) To UBound(namesraw)
namesdup(index) = namesraw(index)
Next index
'end section

names() = RemoveDupesColl(namesdup())
'We need to remove duplicates from the name search, as it affects accuracy.
'For example, if you search for something that has the word "loc", the filename that repeats this word multiple times will get top results.


'//SEARCH FUNCTION STARTS HERE
'This first part will compare each word typed in the searchbox form each word in the keywords column in Sheet2.
For i = LBound(keyword) To UBound(keyword) 'Iterate the number of keywords in a given row
For j = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox

If UCase(search(j)) Like "*" & UCase(keyword(i)) & "*" Or UCase(keyword(i)) Like "*" & UCase(search(j)) & "*" Then 'compare search term and keyword
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value 'Copy A & B to Sheet3.
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count 'Put a count on Sheet3
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value 'Copy D to Sheet3
End If

Next
Next

For k = LBound(names) To UBound(names) 'Iterate the number of names that were split from the document name.
For l = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox
If Len(names(k)) <= 3 And Len(names(k)) > 1 Then 'Prevents getting top results for being part of a long word, for ex: the word LOC will be found on all words that has it, like "LOCATION".
If UCase(search(l)) = UCase(names(k)) Or UCase(names(k)) = UCase(search(l)) Then 'If it's a short word, it must be the same as the search term
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
Else
If (UCase(search(l)) Like "*" & UCase(names(k)) & "*" Or UCase(names(k)) Like "*" & UCase(search(l)) & "*") And Len(names(k)) > 2 And Len(search(l)) > 2 Then 'compare search term and document name
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
End If
Next
Next
'//SEARCH FUNCTION ENDS HERE

Next x


End With

With tbl.Sort 'sort everything based on count to get best result on top
.SortFields.Clear
.SortFields.Add Key:=sortcol, SortOn:=xlSortOnValues, Order:=xlDescending
.Header = xlYes
.Apply
End With


End Sub

Sub copysearch()
Dim linkrange As Range, c As Range
Dim namerange As Range
Dim hyp As Hyperlink
Dim hyps As Hyperlinks

With Worksheets("Sheet1")
Worksheets("Sheet3").Range("A2:D21").Copy 'Copy the first 20 results
.Range("D13").PasteSpecial Paste:=xlPasteValues 'and paste them on Sheet1
Application.CutCopyMode = False

Set linkrange = .Range("D13:D32")
Set namerange = .Range("E13:E32")

For Each c In namerange 'Iterates all cells from E13 to E32
c.ClearHyperlinks 'Remove all hyperlinks if there are any
If c <> "" Then 'Make sure to not add hyperlinks on empty cells
c.Hyperlinks.Add c, .Range("D" & c.Row) 'Add a hyperlink based on the value of D.
If .Range("G" & c.Row).Value = True Then 'Check if G value is True
.Range("E" & c.Row).Font.Color = vbWhite 'Link is valid, so it's colored white
Else
.Range("E" & c.Row).Font.Color = vbRed 'Link is not valid, colored Red, needs updating.
End If
End If
Next

.Range("E13:E32").Font.Underline = False
.Range("E13:E32").Font.name = "Cambria"

End With

OptimizeVBA False

End Sub

Sub OptimizeVBA(isOn As Boolean) 'Optimize VBA by not running calculations, events, and updates up until Macro is done. Found online.
Application.Calculation = IIf(isOn, xlCalculationManual, xlCalculationAutomatic)
Application.EnableEvents = Not (isOn)
Application.ScreenUpdating = Not (isOn)
End Sub

Function RemoveDupesColl(MyArray As Variant) As Variant 'Online code to remove any duplicates in an array.
'DESCRIPTION: Removes duplicates from your array using the collection method.
'NOTES: (1) This function returns unique elements in your array, but
' it converts your array elements to strings.
'SOURCE: https://wellsr.com
'-----------------------------------------------------------------------
Dim i As Long
Dim arrColl As New Collection
Dim arrDummy() As Variant
Dim arrDummy1() As Variant
Dim item As Variant
ReDim arrDummy1(LBound(MyArray) To UBound(MyArray))

For i = LBound(MyArray) To UBound(MyArray) 'convert to string
arrDummy1(i) = CStr(MyArray(i))
Next i
On Error Resume Next
For Each item In arrDummy1
arrColl.Add item, item
Next item
Err.Clear
ReDim arrDummy(LBound(MyArray) To arrColl.count + LBound(MyArray) - 1)
i = LBound(MyArray)
For Each item In arrColl
arrDummy(i) = item
i = i + 1
Next item
RemoveDupesColl = arrDummy
End Function


Everything is working as intended, though I believe that the code can be further optimized. Please note that I only dabble at VBA and had some experience with VB.Net but am not really a programmer. However, I understand formatting so I still made sure that my coding can still be understood. Comments are added everywhere as I am only a temp in the company and would like to pass it on to someone else just in case..



My difficulties when I started this out:




  • Singular/Plural terms: That's why there's an "Or" statement that
    inverts the variables in the "Like" statement as a resolution.

  • 2-3 letter words being found inside bigger words: As we're using
    acronyms or shortcuts, there are times that there are results that
    are being found that's not really related to what's supposed to show.
    So an additional If statements were added specifically for the short
    words.

  • Repeating words in filenames: For some reason, there are filenames that repeat the same word multiple times (The acronyms), and it skews the top result as it matches multiple times. I've used an online code to remove the duplicates via collection method. Hence, there was a need to convert the array to a Variant.

  • Opening write-protected documents: Sadly, this wasn't fixed, but basically a Word popup when asking to open as read only does not show on top of the Excel program. Excel meanwhile, is unresponsive until the popup box was answered. Wait too long, and an OLE error will show up. A workaround to this one is to have another Word program open, and the popup will show there.










share|improve this question









New contributor




Basher is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.




















  • @Comintern As the codes were sourced online, I wasn't sure if I have to really put it in, but if that's required, I've added it in now.
    – Basher
    2 days ago










  • Thanks. The other question I have is what you're referring to with the "Word popup when asking to open as read only". I didn't catch any references in there that would indicate that you're automating Word. Is that what the hyperlinks target? (I think the Z-ordering is a Windows 10 "feature" - i.e. bug)
    – Comintern
    2 days ago










  • Yes, the hyperlink also targets Word (.docx) files, as well as PDF, excel files, etc. The other files are opened without issues, and Word documents will open fine too as long as they're not write-protected. Write-protected, meaning it prompts for a password upon opening the file, yet also offers Read Only mode.
    – Basher
    2 days ago










  • I've experimented with this, and as long as Word does not prompt any popup box, it will be fine. Changing the file property to Read Only will also work, but the documents are updated on a daily basis, so passwords are used as protection instead. Also we're using Windows 7.
    – Basher
    2 days ago










  • Using dictionaries would greatly simplify and speed up your code. Regex would be best but it can be complicated.
    – TinMan
    2 days ago













up vote
3
down vote

favorite









up vote
3
down vote

favorite











I've created a somewhat "Brute-force search engine" based on a document's filename, as well as an additional keyword that describes the said document. I had to make one as my company's resources are a mess and not fully structured, so it's tough to search for a document using just the regular Windows search tool.



What I first did is to simply extract the filepath of all documents using CMD and DIR, and output it to a csv file which I processed.
Then, using an excel formula, extract the filename from the path and remove the filetype as well. I've added an extra column for the keywords portion, which is split via commas.



Once a search is initiated, the search term is broken down to each individual words via spaces. This goes the same for the filename, and keywords. After that, a simple for loop just iterates each word and see if's a match. If it is, a counter is added. Once done, the data is copied onto a temporary sheet.



After checking all available filepaths and moving the matched result on the temporary sheet, I sort them based on the counter, so that the highest match goes on top. Then, I copy the results (including the path) to the main sheet with the searchbox, display the results, and add a hyperlink so that it can be clicked.



Sub Searchresult()
Dim x As Range, y As Long, count As Long, i As Integer, j As Integer, k As Integer, l As Integer
Dim names() As Variant, namesdup() As Variant
Dim search() As String, keyword() As String, namesraw() As String, searchval As String
Dim result As String
Dim tbl As ListObject, sortcol As Range, lrow As Long, lrow2 As Long


OptimizeVBA True 'Makes processing this a lot faster

searchval = Worksheets("Sheet1").Range("E8").Value 'Gets the searchbox text
With Worksheets("Sheet3") 'Prep for placing results in table.
Set tbl = .ListObjects("tblSearch")
Set sortcol = .Range("tblSearch[sort]")
tbl.DataBodyRange.ClearContents
End With

With Worksheets("Sheet2")
search = Split(Trim(searchval), " ") 'split search terms via spaces
lrow2 = .Cells(Rows.count, 1).End(xlUp).Row
For Each x In Range("A2:A" & lrow2) 'Iterate all values in Sheet2
count = 0
lrow = Worksheets("Sheet3").Cells(Rows.count, 1).End(xlUp).Row + 1 'get the last row in Sheet2
keyword() = Split(.Range("d" & x.Row), ",") ' split keywords via comma
namesraw() = Split(Replace(Replace(Replace(Replace(Replace(.Range("c" & x.Row), "-", " "), "(", ""), ")", ""), "'", ""), "_", " "), " ") 'splits names via spaces, deleting any unwanted characters

'This section converts the String array from above to a Variant array
ReDim namesdup(LBound(namesraw) To UBound(namesraw))
Dim index As Long
For index = LBound(namesraw) To UBound(namesraw)
namesdup(index) = namesraw(index)
Next index
'end section

names() = RemoveDupesColl(namesdup())
'We need to remove duplicates from the name search, as it affects accuracy.
'For example, if you search for something that has the word "loc", the filename that repeats this word multiple times will get top results.


'//SEARCH FUNCTION STARTS HERE
'This first part will compare each word typed in the searchbox form each word in the keywords column in Sheet2.
For i = LBound(keyword) To UBound(keyword) 'Iterate the number of keywords in a given row
For j = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox

If UCase(search(j)) Like "*" & UCase(keyword(i)) & "*" Or UCase(keyword(i)) Like "*" & UCase(search(j)) & "*" Then 'compare search term and keyword
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value 'Copy A & B to Sheet3.
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count 'Put a count on Sheet3
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value 'Copy D to Sheet3
End If

Next
Next

For k = LBound(names) To UBound(names) 'Iterate the number of names that were split from the document name.
For l = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox
If Len(names(k)) <= 3 And Len(names(k)) > 1 Then 'Prevents getting top results for being part of a long word, for ex: the word LOC will be found on all words that has it, like "LOCATION".
If UCase(search(l)) = UCase(names(k)) Or UCase(names(k)) = UCase(search(l)) Then 'If it's a short word, it must be the same as the search term
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
Else
If (UCase(search(l)) Like "*" & UCase(names(k)) & "*" Or UCase(names(k)) Like "*" & UCase(search(l)) & "*") And Len(names(k)) > 2 And Len(search(l)) > 2 Then 'compare search term and document name
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
End If
Next
Next
'//SEARCH FUNCTION ENDS HERE

Next x


End With

With tbl.Sort 'sort everything based on count to get best result on top
.SortFields.Clear
.SortFields.Add Key:=sortcol, SortOn:=xlSortOnValues, Order:=xlDescending
.Header = xlYes
.Apply
End With


End Sub

Sub copysearch()
Dim linkrange As Range, c As Range
Dim namerange As Range
Dim hyp As Hyperlink
Dim hyps As Hyperlinks

With Worksheets("Sheet1")
Worksheets("Sheet3").Range("A2:D21").Copy 'Copy the first 20 results
.Range("D13").PasteSpecial Paste:=xlPasteValues 'and paste them on Sheet1
Application.CutCopyMode = False

Set linkrange = .Range("D13:D32")
Set namerange = .Range("E13:E32")

For Each c In namerange 'Iterates all cells from E13 to E32
c.ClearHyperlinks 'Remove all hyperlinks if there are any
If c <> "" Then 'Make sure to not add hyperlinks on empty cells
c.Hyperlinks.Add c, .Range("D" & c.Row) 'Add a hyperlink based on the value of D.
If .Range("G" & c.Row).Value = True Then 'Check if G value is True
.Range("E" & c.Row).Font.Color = vbWhite 'Link is valid, so it's colored white
Else
.Range("E" & c.Row).Font.Color = vbRed 'Link is not valid, colored Red, needs updating.
End If
End If
Next

.Range("E13:E32").Font.Underline = False
.Range("E13:E32").Font.name = "Cambria"

End With

OptimizeVBA False

End Sub

Sub OptimizeVBA(isOn As Boolean) 'Optimize VBA by not running calculations, events, and updates up until Macro is done. Found online.
Application.Calculation = IIf(isOn, xlCalculationManual, xlCalculationAutomatic)
Application.EnableEvents = Not (isOn)
Application.ScreenUpdating = Not (isOn)
End Sub

Function RemoveDupesColl(MyArray As Variant) As Variant 'Online code to remove any duplicates in an array.
'DESCRIPTION: Removes duplicates from your array using the collection method.
'NOTES: (1) This function returns unique elements in your array, but
' it converts your array elements to strings.
'SOURCE: https://wellsr.com
'-----------------------------------------------------------------------
Dim i As Long
Dim arrColl As New Collection
Dim arrDummy() As Variant
Dim arrDummy1() As Variant
Dim item As Variant
ReDim arrDummy1(LBound(MyArray) To UBound(MyArray))

For i = LBound(MyArray) To UBound(MyArray) 'convert to string
arrDummy1(i) = CStr(MyArray(i))
Next i
On Error Resume Next
For Each item In arrDummy1
arrColl.Add item, item
Next item
Err.Clear
ReDim arrDummy(LBound(MyArray) To arrColl.count + LBound(MyArray) - 1)
i = LBound(MyArray)
For Each item In arrColl
arrDummy(i) = item
i = i + 1
Next item
RemoveDupesColl = arrDummy
End Function


Everything is working as intended, though I believe that the code can be further optimized. Please note that I only dabble at VBA and had some experience with VB.Net but am not really a programmer. However, I understand formatting so I still made sure that my coding can still be understood. Comments are added everywhere as I am only a temp in the company and would like to pass it on to someone else just in case..



My difficulties when I started this out:




  • Singular/Plural terms: That's why there's an "Or" statement that
    inverts the variables in the "Like" statement as a resolution.

  • 2-3 letter words being found inside bigger words: As we're using
    acronyms or shortcuts, there are times that there are results that
    are being found that's not really related to what's supposed to show.
    So an additional If statements were added specifically for the short
    words.

  • Repeating words in filenames: For some reason, there are filenames that repeat the same word multiple times (The acronyms), and it skews the top result as it matches multiple times. I've used an online code to remove the duplicates via collection method. Hence, there was a need to convert the array to a Variant.

  • Opening write-protected documents: Sadly, this wasn't fixed, but basically a Word popup when asking to open as read only does not show on top of the Excel program. Excel meanwhile, is unresponsive until the popup box was answered. Wait too long, and an OLE error will show up. A workaround to this one is to have another Word program open, and the popup will show there.










share|improve this question









New contributor




Basher is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











I've created a somewhat "Brute-force search engine" based on a document's filename, as well as an additional keyword that describes the said document. I had to make one as my company's resources are a mess and not fully structured, so it's tough to search for a document using just the regular Windows search tool.



What I first did is to simply extract the filepath of all documents using CMD and DIR, and output it to a csv file which I processed.
Then, using an excel formula, extract the filename from the path and remove the filetype as well. I've added an extra column for the keywords portion, which is split via commas.



Once a search is initiated, the search term is broken down to each individual words via spaces. This goes the same for the filename, and keywords. After that, a simple for loop just iterates each word and see if's a match. If it is, a counter is added. Once done, the data is copied onto a temporary sheet.



After checking all available filepaths and moving the matched result on the temporary sheet, I sort them based on the counter, so that the highest match goes on top. Then, I copy the results (including the path) to the main sheet with the searchbox, display the results, and add a hyperlink so that it can be clicked.



Sub Searchresult()
Dim x As Range, y As Long, count As Long, i As Integer, j As Integer, k As Integer, l As Integer
Dim names() As Variant, namesdup() As Variant
Dim search() As String, keyword() As String, namesraw() As String, searchval As String
Dim result As String
Dim tbl As ListObject, sortcol As Range, lrow As Long, lrow2 As Long


OptimizeVBA True 'Makes processing this a lot faster

searchval = Worksheets("Sheet1").Range("E8").Value 'Gets the searchbox text
With Worksheets("Sheet3") 'Prep for placing results in table.
Set tbl = .ListObjects("tblSearch")
Set sortcol = .Range("tblSearch[sort]")
tbl.DataBodyRange.ClearContents
End With

With Worksheets("Sheet2")
search = Split(Trim(searchval), " ") 'split search terms via spaces
lrow2 = .Cells(Rows.count, 1).End(xlUp).Row
For Each x In Range("A2:A" & lrow2) 'Iterate all values in Sheet2
count = 0
lrow = Worksheets("Sheet3").Cells(Rows.count, 1).End(xlUp).Row + 1 'get the last row in Sheet2
keyword() = Split(.Range("d" & x.Row), ",") ' split keywords via comma
namesraw() = Split(Replace(Replace(Replace(Replace(Replace(.Range("c" & x.Row), "-", " "), "(", ""), ")", ""), "'", ""), "_", " "), " ") 'splits names via spaces, deleting any unwanted characters

'This section converts the String array from above to a Variant array
ReDim namesdup(LBound(namesraw) To UBound(namesraw))
Dim index As Long
For index = LBound(namesraw) To UBound(namesraw)
namesdup(index) = namesraw(index)
Next index
'end section

names() = RemoveDupesColl(namesdup())
'We need to remove duplicates from the name search, as it affects accuracy.
'For example, if you search for something that has the word "loc", the filename that repeats this word multiple times will get top results.


'//SEARCH FUNCTION STARTS HERE
'This first part will compare each word typed in the searchbox form each word in the keywords column in Sheet2.
For i = LBound(keyword) To UBound(keyword) 'Iterate the number of keywords in a given row
For j = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox

If UCase(search(j)) Like "*" & UCase(keyword(i)) & "*" Or UCase(keyword(i)) Like "*" & UCase(search(j)) & "*" Then 'compare search term and keyword
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value 'Copy A & B to Sheet3.
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count 'Put a count on Sheet3
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value 'Copy D to Sheet3
End If

Next
Next

For k = LBound(names) To UBound(names) 'Iterate the number of names that were split from the document name.
For l = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox
If Len(names(k)) <= 3 And Len(names(k)) > 1 Then 'Prevents getting top results for being part of a long word, for ex: the word LOC will be found on all words that has it, like "LOCATION".
If UCase(search(l)) = UCase(names(k)) Or UCase(names(k)) = UCase(search(l)) Then 'If it's a short word, it must be the same as the search term
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
Else
If (UCase(search(l)) Like "*" & UCase(names(k)) & "*" Or UCase(names(k)) Like "*" & UCase(search(l)) & "*") And Len(names(k)) > 2 And Len(search(l)) > 2 Then 'compare search term and document name
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
End If
Next
Next
'//SEARCH FUNCTION ENDS HERE

Next x


End With

With tbl.Sort 'sort everything based on count to get best result on top
.SortFields.Clear
.SortFields.Add Key:=sortcol, SortOn:=xlSortOnValues, Order:=xlDescending
.Header = xlYes
.Apply
End With


End Sub

Sub copysearch()
Dim linkrange As Range, c As Range
Dim namerange As Range
Dim hyp As Hyperlink
Dim hyps As Hyperlinks

With Worksheets("Sheet1")
Worksheets("Sheet3").Range("A2:D21").Copy 'Copy the first 20 results
.Range("D13").PasteSpecial Paste:=xlPasteValues 'and paste them on Sheet1
Application.CutCopyMode = False

Set linkrange = .Range("D13:D32")
Set namerange = .Range("E13:E32")

For Each c In namerange 'Iterates all cells from E13 to E32
c.ClearHyperlinks 'Remove all hyperlinks if there are any
If c <> "" Then 'Make sure to not add hyperlinks on empty cells
c.Hyperlinks.Add c, .Range("D" & c.Row) 'Add a hyperlink based on the value of D.
If .Range("G" & c.Row).Value = True Then 'Check if G value is True
.Range("E" & c.Row).Font.Color = vbWhite 'Link is valid, so it's colored white
Else
.Range("E" & c.Row).Font.Color = vbRed 'Link is not valid, colored Red, needs updating.
End If
End If
Next

.Range("E13:E32").Font.Underline = False
.Range("E13:E32").Font.name = "Cambria"

End With

OptimizeVBA False

End Sub

Sub OptimizeVBA(isOn As Boolean) 'Optimize VBA by not running calculations, events, and updates up until Macro is done. Found online.
Application.Calculation = IIf(isOn, xlCalculationManual, xlCalculationAutomatic)
Application.EnableEvents = Not (isOn)
Application.ScreenUpdating = Not (isOn)
End Sub

Function RemoveDupesColl(MyArray As Variant) As Variant 'Online code to remove any duplicates in an array.
'DESCRIPTION: Removes duplicates from your array using the collection method.
'NOTES: (1) This function returns unique elements in your array, but
' it converts your array elements to strings.
'SOURCE: https://wellsr.com
'-----------------------------------------------------------------------
Dim i As Long
Dim arrColl As New Collection
Dim arrDummy() As Variant
Dim arrDummy1() As Variant
Dim item As Variant
ReDim arrDummy1(LBound(MyArray) To UBound(MyArray))

For i = LBound(MyArray) To UBound(MyArray) 'convert to string
arrDummy1(i) = CStr(MyArray(i))
Next i
On Error Resume Next
For Each item In arrDummy1
arrColl.Add item, item
Next item
Err.Clear
ReDim arrDummy(LBound(MyArray) To arrColl.count + LBound(MyArray) - 1)
i = LBound(MyArray)
For Each item In arrColl
arrDummy(i) = item
i = i + 1
Next item
RemoveDupesColl = arrDummy
End Function


Everything is working as intended, though I believe that the code can be further optimized. Please note that I only dabble at VBA and had some experience with VB.Net but am not really a programmer. However, I understand formatting so I still made sure that my coding can still be understood. Comments are added everywhere as I am only a temp in the company and would like to pass it on to someone else just in case..



My difficulties when I started this out:




  • Singular/Plural terms: That's why there's an "Or" statement that
    inverts the variables in the "Like" statement as a resolution.

  • 2-3 letter words being found inside bigger words: As we're using
    acronyms or shortcuts, there are times that there are results that
    are being found that's not really related to what's supposed to show.
    So an additional If statements were added specifically for the short
    words.

  • Repeating words in filenames: For some reason, there are filenames that repeat the same word multiple times (The acronyms), and it skews the top result as it matches multiple times. I've used an online code to remove the duplicates via collection method. Hence, there was a need to convert the array to a Variant.

  • Opening write-protected documents: Sadly, this wasn't fixed, but basically a Word popup when asking to open as read only does not show on top of the Excel program. Excel meanwhile, is unresponsive until the popup box was answered. Wait too long, and an OLE error will show up. A workaround to this one is to have another Word program open, and the popup will show there.







vba excel






share|improve this question









New contributor




Basher is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




Basher is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited 2 days ago





















New contributor




Basher is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked 2 days ago









Basher

162




162




New contributor




Basher is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





Basher is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






Basher is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.












  • @Comintern As the codes were sourced online, I wasn't sure if I have to really put it in, but if that's required, I've added it in now.
    – Basher
    2 days ago










  • Thanks. The other question I have is what you're referring to with the "Word popup when asking to open as read only". I didn't catch any references in there that would indicate that you're automating Word. Is that what the hyperlinks target? (I think the Z-ordering is a Windows 10 "feature" - i.e. bug)
    – Comintern
    2 days ago










  • Yes, the hyperlink also targets Word (.docx) files, as well as PDF, excel files, etc. The other files are opened without issues, and Word documents will open fine too as long as they're not write-protected. Write-protected, meaning it prompts for a password upon opening the file, yet also offers Read Only mode.
    – Basher
    2 days ago










  • I've experimented with this, and as long as Word does not prompt any popup box, it will be fine. Changing the file property to Read Only will also work, but the documents are updated on a daily basis, so passwords are used as protection instead. Also we're using Windows 7.
    – Basher
    2 days ago










  • Using dictionaries would greatly simplify and speed up your code. Regex would be best but it can be complicated.
    – TinMan
    2 days ago


















  • @Comintern As the codes were sourced online, I wasn't sure if I have to really put it in, but if that's required, I've added it in now.
    – Basher
    2 days ago










  • Thanks. The other question I have is what you're referring to with the "Word popup when asking to open as read only". I didn't catch any references in there that would indicate that you're automating Word. Is that what the hyperlinks target? (I think the Z-ordering is a Windows 10 "feature" - i.e. bug)
    – Comintern
    2 days ago










  • Yes, the hyperlink also targets Word (.docx) files, as well as PDF, excel files, etc. The other files are opened without issues, and Word documents will open fine too as long as they're not write-protected. Write-protected, meaning it prompts for a password upon opening the file, yet also offers Read Only mode.
    – Basher
    2 days ago










  • I've experimented with this, and as long as Word does not prompt any popup box, it will be fine. Changing the file property to Read Only will also work, but the documents are updated on a daily basis, so passwords are used as protection instead. Also we're using Windows 7.
    – Basher
    2 days ago










  • Using dictionaries would greatly simplify and speed up your code. Regex would be best but it can be complicated.
    – TinMan
    2 days ago
















@Comintern As the codes were sourced online, I wasn't sure if I have to really put it in, but if that's required, I've added it in now.
– Basher
2 days ago




@Comintern As the codes were sourced online, I wasn't sure if I have to really put it in, but if that's required, I've added it in now.
– Basher
2 days ago












Thanks. The other question I have is what you're referring to with the "Word popup when asking to open as read only". I didn't catch any references in there that would indicate that you're automating Word. Is that what the hyperlinks target? (I think the Z-ordering is a Windows 10 "feature" - i.e. bug)
– Comintern
2 days ago




Thanks. The other question I have is what you're referring to with the "Word popup when asking to open as read only". I didn't catch any references in there that would indicate that you're automating Word. Is that what the hyperlinks target? (I think the Z-ordering is a Windows 10 "feature" - i.e. bug)
– Comintern
2 days ago












Yes, the hyperlink also targets Word (.docx) files, as well as PDF, excel files, etc. The other files are opened without issues, and Word documents will open fine too as long as they're not write-protected. Write-protected, meaning it prompts for a password upon opening the file, yet also offers Read Only mode.
– Basher
2 days ago




Yes, the hyperlink also targets Word (.docx) files, as well as PDF, excel files, etc. The other files are opened without issues, and Word documents will open fine too as long as they're not write-protected. Write-protected, meaning it prompts for a password upon opening the file, yet also offers Read Only mode.
– Basher
2 days ago












I've experimented with this, and as long as Word does not prompt any popup box, it will be fine. Changing the file property to Read Only will also work, but the documents are updated on a daily basis, so passwords are used as protection instead. Also we're using Windows 7.
– Basher
2 days ago




I've experimented with this, and as long as Word does not prompt any popup box, it will be fine. Changing the file property to Read Only will also work, but the documents are updated on a daily basis, so passwords are used as protection instead. Also we're using Windows 7.
– Basher
2 days ago












Using dictionaries would greatly simplify and speed up your code. Regex would be best but it can be complicated.
– TinMan
2 days ago




Using dictionaries would greatly simplify and speed up your code. Regex would be best but it can be complicated.
– TinMan
2 days ago















active

oldest

votes











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});






Basher is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208661%2fdocument-search-engine-in-excel%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown






























active

oldest

votes













active

oldest

votes









active

oldest

votes






active

oldest

votes








Basher is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















Basher is a new contributor. Be nice, and check out our Code of Conduct.













Basher is a new contributor. Be nice, and check out our Code of Conduct.












Basher is a new contributor. Be nice, and check out our Code of Conduct.
















Thanks for contributing an answer to Code Review Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208661%2fdocument-search-engine-in-excel%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Quarter-circle Tiles

build a pushdown automaton that recognizes the reverse language of a given pushdown automaton?

Mont Emei