Document “Search Engine” in Excel
up vote
3
down vote
favorite
I've created a somewhat "Brute-force search engine" based on a document's filename, as well as an additional keyword that describes the said document. I had to make one as my company's resources are a mess and not fully structured, so it's tough to search for a document using just the regular Windows search tool.
What I first did is to simply extract the filepath of all documents using CMD and DIR, and output it to a csv file which I processed.
Then, using an excel formula, extract the filename from the path and remove the filetype as well. I've added an extra column for the keywords portion, which is split via commas.
Once a search is initiated, the search term is broken down to each individual words via spaces. This goes the same for the filename, and keywords. After that, a simple for loop just iterates each word and see if's a match. If it is, a counter is added. Once done, the data is copied onto a temporary sheet.
After checking all available filepaths and moving the matched result on the temporary sheet, I sort them based on the counter, so that the highest match goes on top. Then, I copy the results (including the path) to the main sheet with the searchbox, display the results, and add a hyperlink so that it can be clicked.
Sub Searchresult()
Dim x As Range, y As Long, count As Long, i As Integer, j As Integer, k As Integer, l As Integer
Dim names() As Variant, namesdup() As Variant
Dim search() As String, keyword() As String, namesraw() As String, searchval As String
Dim result As String
Dim tbl As ListObject, sortcol As Range, lrow As Long, lrow2 As Long
OptimizeVBA True 'Makes processing this a lot faster
searchval = Worksheets("Sheet1").Range("E8").Value 'Gets the searchbox text
With Worksheets("Sheet3") 'Prep for placing results in table.
Set tbl = .ListObjects("tblSearch")
Set sortcol = .Range("tblSearch[sort]")
tbl.DataBodyRange.ClearContents
End With
With Worksheets("Sheet2")
search = Split(Trim(searchval), " ") 'split search terms via spaces
lrow2 = .Cells(Rows.count, 1).End(xlUp).Row
For Each x In Range("A2:A" & lrow2) 'Iterate all values in Sheet2
count = 0
lrow = Worksheets("Sheet3").Cells(Rows.count, 1).End(xlUp).Row + 1 'get the last row in Sheet2
keyword() = Split(.Range("d" & x.Row), ",") ' split keywords via comma
namesraw() = Split(Replace(Replace(Replace(Replace(Replace(.Range("c" & x.Row), "-", " "), "(", ""), ")", ""), "'", ""), "_", " "), " ") 'splits names via spaces, deleting any unwanted characters
'This section converts the String array from above to a Variant array
ReDim namesdup(LBound(namesraw) To UBound(namesraw))
Dim index As Long
For index = LBound(namesraw) To UBound(namesraw)
namesdup(index) = namesraw(index)
Next index
'end section
names() = RemoveDupesColl(namesdup())
'We need to remove duplicates from the name search, as it affects accuracy.
'For example, if you search for something that has the word "loc", the filename that repeats this word multiple times will get top results.
'//SEARCH FUNCTION STARTS HERE
'This first part will compare each word typed in the searchbox form each word in the keywords column in Sheet2.
For i = LBound(keyword) To UBound(keyword) 'Iterate the number of keywords in a given row
For j = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox
If UCase(search(j)) Like "*" & UCase(keyword(i)) & "*" Or UCase(keyword(i)) Like "*" & UCase(search(j)) & "*" Then 'compare search term and keyword
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value 'Copy A & B to Sheet3.
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count 'Put a count on Sheet3
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value 'Copy D to Sheet3
End If
Next
Next
For k = LBound(names) To UBound(names) 'Iterate the number of names that were split from the document name.
For l = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox
If Len(names(k)) <= 3 And Len(names(k)) > 1 Then 'Prevents getting top results for being part of a long word, for ex: the word LOC will be found on all words that has it, like "LOCATION".
If UCase(search(l)) = UCase(names(k)) Or UCase(names(k)) = UCase(search(l)) Then 'If it's a short word, it must be the same as the search term
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
Else
If (UCase(search(l)) Like "*" & UCase(names(k)) & "*" Or UCase(names(k)) Like "*" & UCase(search(l)) & "*") And Len(names(k)) > 2 And Len(search(l)) > 2 Then 'compare search term and document name
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
End If
Next
Next
'//SEARCH FUNCTION ENDS HERE
Next x
End With
With tbl.Sort 'sort everything based on count to get best result on top
.SortFields.Clear
.SortFields.Add Key:=sortcol, SortOn:=xlSortOnValues, Order:=xlDescending
.Header = xlYes
.Apply
End With
End Sub
Sub copysearch()
Dim linkrange As Range, c As Range
Dim namerange As Range
Dim hyp As Hyperlink
Dim hyps As Hyperlinks
With Worksheets("Sheet1")
Worksheets("Sheet3").Range("A2:D21").Copy 'Copy the first 20 results
.Range("D13").PasteSpecial Paste:=xlPasteValues 'and paste them on Sheet1
Application.CutCopyMode = False
Set linkrange = .Range("D13:D32")
Set namerange = .Range("E13:E32")
For Each c In namerange 'Iterates all cells from E13 to E32
c.ClearHyperlinks 'Remove all hyperlinks if there are any
If c <> "" Then 'Make sure to not add hyperlinks on empty cells
c.Hyperlinks.Add c, .Range("D" & c.Row) 'Add a hyperlink based on the value of D.
If .Range("G" & c.Row).Value = True Then 'Check if G value is True
.Range("E" & c.Row).Font.Color = vbWhite 'Link is valid, so it's colored white
Else
.Range("E" & c.Row).Font.Color = vbRed 'Link is not valid, colored Red, needs updating.
End If
End If
Next
.Range("E13:E32").Font.Underline = False
.Range("E13:E32").Font.name = "Cambria"
End With
OptimizeVBA False
End Sub
Sub OptimizeVBA(isOn As Boolean) 'Optimize VBA by not running calculations, events, and updates up until Macro is done. Found online.
Application.Calculation = IIf(isOn, xlCalculationManual, xlCalculationAutomatic)
Application.EnableEvents = Not (isOn)
Application.ScreenUpdating = Not (isOn)
End Sub
Function RemoveDupesColl(MyArray As Variant) As Variant 'Online code to remove any duplicates in an array.
'DESCRIPTION: Removes duplicates from your array using the collection method.
'NOTES: (1) This function returns unique elements in your array, but
' it converts your array elements to strings.
'SOURCE: https://wellsr.com
'-----------------------------------------------------------------------
Dim i As Long
Dim arrColl As New Collection
Dim arrDummy() As Variant
Dim arrDummy1() As Variant
Dim item As Variant
ReDim arrDummy1(LBound(MyArray) To UBound(MyArray))
For i = LBound(MyArray) To UBound(MyArray) 'convert to string
arrDummy1(i) = CStr(MyArray(i))
Next i
On Error Resume Next
For Each item In arrDummy1
arrColl.Add item, item
Next item
Err.Clear
ReDim arrDummy(LBound(MyArray) To arrColl.count + LBound(MyArray) - 1)
i = LBound(MyArray)
For Each item In arrColl
arrDummy(i) = item
i = i + 1
Next item
RemoveDupesColl = arrDummy
End Function
Everything is working as intended, though I believe that the code can be further optimized. Please note that I only dabble at VBA and had some experience with VB.Net but am not really a programmer. However, I understand formatting so I still made sure that my coding can still be understood. Comments are added everywhere as I am only a temp in the company and would like to pass it on to someone else just in case..
My difficulties when I started this out:
- Singular/Plural terms: That's why there's an "Or" statement that
inverts the variables in the "Like" statement as a resolution. - 2-3 letter words being found inside bigger words: As we're using
acronyms or shortcuts, there are times that there are results that
are being found that's not really related to what's supposed to show.
So an additional If statements were added specifically for the short
words. - Repeating words in filenames: For some reason, there are filenames that repeat the same word multiple times (The acronyms), and it skews the top result as it matches multiple times. I've used an online code to remove the duplicates via collection method. Hence, there was a need to convert the array to a Variant.
- Opening write-protected documents: Sadly, this wasn't fixed, but basically a Word popup when asking to open as read only does not show on top of the Excel program. Excel meanwhile, is unresponsive until the popup box was answered. Wait too long, and an OLE error will show up. A workaround to this one is to have another Word program open, and the popup will show there.
vba excel
New contributor
add a comment |
up vote
3
down vote
favorite
I've created a somewhat "Brute-force search engine" based on a document's filename, as well as an additional keyword that describes the said document. I had to make one as my company's resources are a mess and not fully structured, so it's tough to search for a document using just the regular Windows search tool.
What I first did is to simply extract the filepath of all documents using CMD and DIR, and output it to a csv file which I processed.
Then, using an excel formula, extract the filename from the path and remove the filetype as well. I've added an extra column for the keywords portion, which is split via commas.
Once a search is initiated, the search term is broken down to each individual words via spaces. This goes the same for the filename, and keywords. After that, a simple for loop just iterates each word and see if's a match. If it is, a counter is added. Once done, the data is copied onto a temporary sheet.
After checking all available filepaths and moving the matched result on the temporary sheet, I sort them based on the counter, so that the highest match goes on top. Then, I copy the results (including the path) to the main sheet with the searchbox, display the results, and add a hyperlink so that it can be clicked.
Sub Searchresult()
Dim x As Range, y As Long, count As Long, i As Integer, j As Integer, k As Integer, l As Integer
Dim names() As Variant, namesdup() As Variant
Dim search() As String, keyword() As String, namesraw() As String, searchval As String
Dim result As String
Dim tbl As ListObject, sortcol As Range, lrow As Long, lrow2 As Long
OptimizeVBA True 'Makes processing this a lot faster
searchval = Worksheets("Sheet1").Range("E8").Value 'Gets the searchbox text
With Worksheets("Sheet3") 'Prep for placing results in table.
Set tbl = .ListObjects("tblSearch")
Set sortcol = .Range("tblSearch[sort]")
tbl.DataBodyRange.ClearContents
End With
With Worksheets("Sheet2")
search = Split(Trim(searchval), " ") 'split search terms via spaces
lrow2 = .Cells(Rows.count, 1).End(xlUp).Row
For Each x In Range("A2:A" & lrow2) 'Iterate all values in Sheet2
count = 0
lrow = Worksheets("Sheet3").Cells(Rows.count, 1).End(xlUp).Row + 1 'get the last row in Sheet2
keyword() = Split(.Range("d" & x.Row), ",") ' split keywords via comma
namesraw() = Split(Replace(Replace(Replace(Replace(Replace(.Range("c" & x.Row), "-", " "), "(", ""), ")", ""), "'", ""), "_", " "), " ") 'splits names via spaces, deleting any unwanted characters
'This section converts the String array from above to a Variant array
ReDim namesdup(LBound(namesraw) To UBound(namesraw))
Dim index As Long
For index = LBound(namesraw) To UBound(namesraw)
namesdup(index) = namesraw(index)
Next index
'end section
names() = RemoveDupesColl(namesdup())
'We need to remove duplicates from the name search, as it affects accuracy.
'For example, if you search for something that has the word "loc", the filename that repeats this word multiple times will get top results.
'//SEARCH FUNCTION STARTS HERE
'This first part will compare each word typed in the searchbox form each word in the keywords column in Sheet2.
For i = LBound(keyword) To UBound(keyword) 'Iterate the number of keywords in a given row
For j = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox
If UCase(search(j)) Like "*" & UCase(keyword(i)) & "*" Or UCase(keyword(i)) Like "*" & UCase(search(j)) & "*" Then 'compare search term and keyword
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value 'Copy A & B to Sheet3.
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count 'Put a count on Sheet3
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value 'Copy D to Sheet3
End If
Next
Next
For k = LBound(names) To UBound(names) 'Iterate the number of names that were split from the document name.
For l = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox
If Len(names(k)) <= 3 And Len(names(k)) > 1 Then 'Prevents getting top results for being part of a long word, for ex: the word LOC will be found on all words that has it, like "LOCATION".
If UCase(search(l)) = UCase(names(k)) Or UCase(names(k)) = UCase(search(l)) Then 'If it's a short word, it must be the same as the search term
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
Else
If (UCase(search(l)) Like "*" & UCase(names(k)) & "*" Or UCase(names(k)) Like "*" & UCase(search(l)) & "*") And Len(names(k)) > 2 And Len(search(l)) > 2 Then 'compare search term and document name
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
End If
Next
Next
'//SEARCH FUNCTION ENDS HERE
Next x
End With
With tbl.Sort 'sort everything based on count to get best result on top
.SortFields.Clear
.SortFields.Add Key:=sortcol, SortOn:=xlSortOnValues, Order:=xlDescending
.Header = xlYes
.Apply
End With
End Sub
Sub copysearch()
Dim linkrange As Range, c As Range
Dim namerange As Range
Dim hyp As Hyperlink
Dim hyps As Hyperlinks
With Worksheets("Sheet1")
Worksheets("Sheet3").Range("A2:D21").Copy 'Copy the first 20 results
.Range("D13").PasteSpecial Paste:=xlPasteValues 'and paste them on Sheet1
Application.CutCopyMode = False
Set linkrange = .Range("D13:D32")
Set namerange = .Range("E13:E32")
For Each c In namerange 'Iterates all cells from E13 to E32
c.ClearHyperlinks 'Remove all hyperlinks if there are any
If c <> "" Then 'Make sure to not add hyperlinks on empty cells
c.Hyperlinks.Add c, .Range("D" & c.Row) 'Add a hyperlink based on the value of D.
If .Range("G" & c.Row).Value = True Then 'Check if G value is True
.Range("E" & c.Row).Font.Color = vbWhite 'Link is valid, so it's colored white
Else
.Range("E" & c.Row).Font.Color = vbRed 'Link is not valid, colored Red, needs updating.
End If
End If
Next
.Range("E13:E32").Font.Underline = False
.Range("E13:E32").Font.name = "Cambria"
End With
OptimizeVBA False
End Sub
Sub OptimizeVBA(isOn As Boolean) 'Optimize VBA by not running calculations, events, and updates up until Macro is done. Found online.
Application.Calculation = IIf(isOn, xlCalculationManual, xlCalculationAutomatic)
Application.EnableEvents = Not (isOn)
Application.ScreenUpdating = Not (isOn)
End Sub
Function RemoveDupesColl(MyArray As Variant) As Variant 'Online code to remove any duplicates in an array.
'DESCRIPTION: Removes duplicates from your array using the collection method.
'NOTES: (1) This function returns unique elements in your array, but
' it converts your array elements to strings.
'SOURCE: https://wellsr.com
'-----------------------------------------------------------------------
Dim i As Long
Dim arrColl As New Collection
Dim arrDummy() As Variant
Dim arrDummy1() As Variant
Dim item As Variant
ReDim arrDummy1(LBound(MyArray) To UBound(MyArray))
For i = LBound(MyArray) To UBound(MyArray) 'convert to string
arrDummy1(i) = CStr(MyArray(i))
Next i
On Error Resume Next
For Each item In arrDummy1
arrColl.Add item, item
Next item
Err.Clear
ReDim arrDummy(LBound(MyArray) To arrColl.count + LBound(MyArray) - 1)
i = LBound(MyArray)
For Each item In arrColl
arrDummy(i) = item
i = i + 1
Next item
RemoveDupesColl = arrDummy
End Function
Everything is working as intended, though I believe that the code can be further optimized. Please note that I only dabble at VBA and had some experience with VB.Net but am not really a programmer. However, I understand formatting so I still made sure that my coding can still be understood. Comments are added everywhere as I am only a temp in the company and would like to pass it on to someone else just in case..
My difficulties when I started this out:
- Singular/Plural terms: That's why there's an "Or" statement that
inverts the variables in the "Like" statement as a resolution. - 2-3 letter words being found inside bigger words: As we're using
acronyms or shortcuts, there are times that there are results that
are being found that's not really related to what's supposed to show.
So an additional If statements were added specifically for the short
words. - Repeating words in filenames: For some reason, there are filenames that repeat the same word multiple times (The acronyms), and it skews the top result as it matches multiple times. I've used an online code to remove the duplicates via collection method. Hence, there was a need to convert the array to a Variant.
- Opening write-protected documents: Sadly, this wasn't fixed, but basically a Word popup when asking to open as read only does not show on top of the Excel program. Excel meanwhile, is unresponsive until the popup box was answered. Wait too long, and an OLE error will show up. A workaround to this one is to have another Word program open, and the popup will show there.
vba excel
New contributor
@Comintern As the codes were sourced online, I wasn't sure if I have to really put it in, but if that's required, I've added it in now.
– Basher
2 days ago
Thanks. The other question I have is what you're referring to with the "Word popup when asking to open as read only". I didn't catch any references in there that would indicate that you're automating Word. Is that what the hyperlinks target? (I think the Z-ordering is a Windows 10 "feature" - i.e. bug)
– Comintern
2 days ago
Yes, the hyperlink also targets Word (.docx) files, as well as PDF, excel files, etc. The other files are opened without issues, and Word documents will open fine too as long as they're not write-protected. Write-protected, meaning it prompts for a password upon opening the file, yet also offers Read Only mode.
– Basher
2 days ago
I've experimented with this, and as long as Word does not prompt any popup box, it will be fine. Changing the file property to Read Only will also work, but the documents are updated on a daily basis, so passwords are used as protection instead. Also we're using Windows 7.
– Basher
2 days ago
Using dictionaries would greatly simplify and speed up your code. Regex would be best but it can be complicated.
– TinMan
2 days ago
add a comment |
up vote
3
down vote
favorite
up vote
3
down vote
favorite
I've created a somewhat "Brute-force search engine" based on a document's filename, as well as an additional keyword that describes the said document. I had to make one as my company's resources are a mess and not fully structured, so it's tough to search for a document using just the regular Windows search tool.
What I first did is to simply extract the filepath of all documents using CMD and DIR, and output it to a csv file which I processed.
Then, using an excel formula, extract the filename from the path and remove the filetype as well. I've added an extra column for the keywords portion, which is split via commas.
Once a search is initiated, the search term is broken down to each individual words via spaces. This goes the same for the filename, and keywords. After that, a simple for loop just iterates each word and see if's a match. If it is, a counter is added. Once done, the data is copied onto a temporary sheet.
After checking all available filepaths and moving the matched result on the temporary sheet, I sort them based on the counter, so that the highest match goes on top. Then, I copy the results (including the path) to the main sheet with the searchbox, display the results, and add a hyperlink so that it can be clicked.
Sub Searchresult()
Dim x As Range, y As Long, count As Long, i As Integer, j As Integer, k As Integer, l As Integer
Dim names() As Variant, namesdup() As Variant
Dim search() As String, keyword() As String, namesraw() As String, searchval As String
Dim result As String
Dim tbl As ListObject, sortcol As Range, lrow As Long, lrow2 As Long
OptimizeVBA True 'Makes processing this a lot faster
searchval = Worksheets("Sheet1").Range("E8").Value 'Gets the searchbox text
With Worksheets("Sheet3") 'Prep for placing results in table.
Set tbl = .ListObjects("tblSearch")
Set sortcol = .Range("tblSearch[sort]")
tbl.DataBodyRange.ClearContents
End With
With Worksheets("Sheet2")
search = Split(Trim(searchval), " ") 'split search terms via spaces
lrow2 = .Cells(Rows.count, 1).End(xlUp).Row
For Each x In Range("A2:A" & lrow2) 'Iterate all values in Sheet2
count = 0
lrow = Worksheets("Sheet3").Cells(Rows.count, 1).End(xlUp).Row + 1 'get the last row in Sheet2
keyword() = Split(.Range("d" & x.Row), ",") ' split keywords via comma
namesraw() = Split(Replace(Replace(Replace(Replace(Replace(.Range("c" & x.Row), "-", " "), "(", ""), ")", ""), "'", ""), "_", " "), " ") 'splits names via spaces, deleting any unwanted characters
'This section converts the String array from above to a Variant array
ReDim namesdup(LBound(namesraw) To UBound(namesraw))
Dim index As Long
For index = LBound(namesraw) To UBound(namesraw)
namesdup(index) = namesraw(index)
Next index
'end section
names() = RemoveDupesColl(namesdup())
'We need to remove duplicates from the name search, as it affects accuracy.
'For example, if you search for something that has the word "loc", the filename that repeats this word multiple times will get top results.
'//SEARCH FUNCTION STARTS HERE
'This first part will compare each word typed in the searchbox form each word in the keywords column in Sheet2.
For i = LBound(keyword) To UBound(keyword) 'Iterate the number of keywords in a given row
For j = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox
If UCase(search(j)) Like "*" & UCase(keyword(i)) & "*" Or UCase(keyword(i)) Like "*" & UCase(search(j)) & "*" Then 'compare search term and keyword
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value 'Copy A & B to Sheet3.
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count 'Put a count on Sheet3
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value 'Copy D to Sheet3
End If
Next
Next
For k = LBound(names) To UBound(names) 'Iterate the number of names that were split from the document name.
For l = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox
If Len(names(k)) <= 3 And Len(names(k)) > 1 Then 'Prevents getting top results for being part of a long word, for ex: the word LOC will be found on all words that has it, like "LOCATION".
If UCase(search(l)) = UCase(names(k)) Or UCase(names(k)) = UCase(search(l)) Then 'If it's a short word, it must be the same as the search term
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
Else
If (UCase(search(l)) Like "*" & UCase(names(k)) & "*" Or UCase(names(k)) Like "*" & UCase(search(l)) & "*") And Len(names(k)) > 2 And Len(search(l)) > 2 Then 'compare search term and document name
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
End If
Next
Next
'//SEARCH FUNCTION ENDS HERE
Next x
End With
With tbl.Sort 'sort everything based on count to get best result on top
.SortFields.Clear
.SortFields.Add Key:=sortcol, SortOn:=xlSortOnValues, Order:=xlDescending
.Header = xlYes
.Apply
End With
End Sub
Sub copysearch()
Dim linkrange As Range, c As Range
Dim namerange As Range
Dim hyp As Hyperlink
Dim hyps As Hyperlinks
With Worksheets("Sheet1")
Worksheets("Sheet3").Range("A2:D21").Copy 'Copy the first 20 results
.Range("D13").PasteSpecial Paste:=xlPasteValues 'and paste them on Sheet1
Application.CutCopyMode = False
Set linkrange = .Range("D13:D32")
Set namerange = .Range("E13:E32")
For Each c In namerange 'Iterates all cells from E13 to E32
c.ClearHyperlinks 'Remove all hyperlinks if there are any
If c <> "" Then 'Make sure to not add hyperlinks on empty cells
c.Hyperlinks.Add c, .Range("D" & c.Row) 'Add a hyperlink based on the value of D.
If .Range("G" & c.Row).Value = True Then 'Check if G value is True
.Range("E" & c.Row).Font.Color = vbWhite 'Link is valid, so it's colored white
Else
.Range("E" & c.Row).Font.Color = vbRed 'Link is not valid, colored Red, needs updating.
End If
End If
Next
.Range("E13:E32").Font.Underline = False
.Range("E13:E32").Font.name = "Cambria"
End With
OptimizeVBA False
End Sub
Sub OptimizeVBA(isOn As Boolean) 'Optimize VBA by not running calculations, events, and updates up until Macro is done. Found online.
Application.Calculation = IIf(isOn, xlCalculationManual, xlCalculationAutomatic)
Application.EnableEvents = Not (isOn)
Application.ScreenUpdating = Not (isOn)
End Sub
Function RemoveDupesColl(MyArray As Variant) As Variant 'Online code to remove any duplicates in an array.
'DESCRIPTION: Removes duplicates from your array using the collection method.
'NOTES: (1) This function returns unique elements in your array, but
' it converts your array elements to strings.
'SOURCE: https://wellsr.com
'-----------------------------------------------------------------------
Dim i As Long
Dim arrColl As New Collection
Dim arrDummy() As Variant
Dim arrDummy1() As Variant
Dim item As Variant
ReDim arrDummy1(LBound(MyArray) To UBound(MyArray))
For i = LBound(MyArray) To UBound(MyArray) 'convert to string
arrDummy1(i) = CStr(MyArray(i))
Next i
On Error Resume Next
For Each item In arrDummy1
arrColl.Add item, item
Next item
Err.Clear
ReDim arrDummy(LBound(MyArray) To arrColl.count + LBound(MyArray) - 1)
i = LBound(MyArray)
For Each item In arrColl
arrDummy(i) = item
i = i + 1
Next item
RemoveDupesColl = arrDummy
End Function
Everything is working as intended, though I believe that the code can be further optimized. Please note that I only dabble at VBA and had some experience with VB.Net but am not really a programmer. However, I understand formatting so I still made sure that my coding can still be understood. Comments are added everywhere as I am only a temp in the company and would like to pass it on to someone else just in case..
My difficulties when I started this out:
- Singular/Plural terms: That's why there's an "Or" statement that
inverts the variables in the "Like" statement as a resolution. - 2-3 letter words being found inside bigger words: As we're using
acronyms or shortcuts, there are times that there are results that
are being found that's not really related to what's supposed to show.
So an additional If statements were added specifically for the short
words. - Repeating words in filenames: For some reason, there are filenames that repeat the same word multiple times (The acronyms), and it skews the top result as it matches multiple times. I've used an online code to remove the duplicates via collection method. Hence, there was a need to convert the array to a Variant.
- Opening write-protected documents: Sadly, this wasn't fixed, but basically a Word popup when asking to open as read only does not show on top of the Excel program. Excel meanwhile, is unresponsive until the popup box was answered. Wait too long, and an OLE error will show up. A workaround to this one is to have another Word program open, and the popup will show there.
vba excel
New contributor
I've created a somewhat "Brute-force search engine" based on a document's filename, as well as an additional keyword that describes the said document. I had to make one as my company's resources are a mess and not fully structured, so it's tough to search for a document using just the regular Windows search tool.
What I first did is to simply extract the filepath of all documents using CMD and DIR, and output it to a csv file which I processed.
Then, using an excel formula, extract the filename from the path and remove the filetype as well. I've added an extra column for the keywords portion, which is split via commas.
Once a search is initiated, the search term is broken down to each individual words via spaces. This goes the same for the filename, and keywords. After that, a simple for loop just iterates each word and see if's a match. If it is, a counter is added. Once done, the data is copied onto a temporary sheet.
After checking all available filepaths and moving the matched result on the temporary sheet, I sort them based on the counter, so that the highest match goes on top. Then, I copy the results (including the path) to the main sheet with the searchbox, display the results, and add a hyperlink so that it can be clicked.
Sub Searchresult()
Dim x As Range, y As Long, count As Long, i As Integer, j As Integer, k As Integer, l As Integer
Dim names() As Variant, namesdup() As Variant
Dim search() As String, keyword() As String, namesraw() As String, searchval As String
Dim result As String
Dim tbl As ListObject, sortcol As Range, lrow As Long, lrow2 As Long
OptimizeVBA True 'Makes processing this a lot faster
searchval = Worksheets("Sheet1").Range("E8").Value 'Gets the searchbox text
With Worksheets("Sheet3") 'Prep for placing results in table.
Set tbl = .ListObjects("tblSearch")
Set sortcol = .Range("tblSearch[sort]")
tbl.DataBodyRange.ClearContents
End With
With Worksheets("Sheet2")
search = Split(Trim(searchval), " ") 'split search terms via spaces
lrow2 = .Cells(Rows.count, 1).End(xlUp).Row
For Each x In Range("A2:A" & lrow2) 'Iterate all values in Sheet2
count = 0
lrow = Worksheets("Sheet3").Cells(Rows.count, 1).End(xlUp).Row + 1 'get the last row in Sheet2
keyword() = Split(.Range("d" & x.Row), ",") ' split keywords via comma
namesraw() = Split(Replace(Replace(Replace(Replace(Replace(.Range("c" & x.Row), "-", " "), "(", ""), ")", ""), "'", ""), "_", " "), " ") 'splits names via spaces, deleting any unwanted characters
'This section converts the String array from above to a Variant array
ReDim namesdup(LBound(namesraw) To UBound(namesraw))
Dim index As Long
For index = LBound(namesraw) To UBound(namesraw)
namesdup(index) = namesraw(index)
Next index
'end section
names() = RemoveDupesColl(namesdup())
'We need to remove duplicates from the name search, as it affects accuracy.
'For example, if you search for something that has the word "loc", the filename that repeats this word multiple times will get top results.
'//SEARCH FUNCTION STARTS HERE
'This first part will compare each word typed in the searchbox form each word in the keywords column in Sheet2.
For i = LBound(keyword) To UBound(keyword) 'Iterate the number of keywords in a given row
For j = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox
If UCase(search(j)) Like "*" & UCase(keyword(i)) & "*" Or UCase(keyword(i)) Like "*" & UCase(search(j)) & "*" Then 'compare search term and keyword
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value 'Copy A & B to Sheet3.
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count 'Put a count on Sheet3
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value 'Copy D to Sheet3
End If
Next
Next
For k = LBound(names) To UBound(names) 'Iterate the number of names that were split from the document name.
For l = LBound(search) To UBound(search) 'Iterate the number of words in the searchbox
If Len(names(k)) <= 3 And Len(names(k)) > 1 Then 'Prevents getting top results for being part of a long word, for ex: the word LOC will be found on all words that has it, like "LOCATION".
If UCase(search(l)) = UCase(names(k)) Or UCase(names(k)) = UCase(search(l)) Then 'If it's a short word, it must be the same as the search term
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
Else
If (UCase(search(l)) Like "*" & UCase(names(k)) & "*" Or UCase(names(k)) Like "*" & UCase(search(l)) & "*") And Len(names(k)) > 2 And Len(search(l)) > 2 Then 'compare search term and document name
Worksheets("Sheet3").Range("A" & lrow, "B" & lrow).Value = .Range("A" & x.Row, "B" & x.Row).Value
count = count + 1
Worksheets("Sheet3").Range("C" & lrow).Value = count
Worksheets("Sheet3").Range("D" & lrow).Value = .Range("E" & x.Row).Value
End If
End If
Next
Next
'//SEARCH FUNCTION ENDS HERE
Next x
End With
With tbl.Sort 'sort everything based on count to get best result on top
.SortFields.Clear
.SortFields.Add Key:=sortcol, SortOn:=xlSortOnValues, Order:=xlDescending
.Header = xlYes
.Apply
End With
End Sub
Sub copysearch()
Dim linkrange As Range, c As Range
Dim namerange As Range
Dim hyp As Hyperlink
Dim hyps As Hyperlinks
With Worksheets("Sheet1")
Worksheets("Sheet3").Range("A2:D21").Copy 'Copy the first 20 results
.Range("D13").PasteSpecial Paste:=xlPasteValues 'and paste them on Sheet1
Application.CutCopyMode = False
Set linkrange = .Range("D13:D32")
Set namerange = .Range("E13:E32")
For Each c In namerange 'Iterates all cells from E13 to E32
c.ClearHyperlinks 'Remove all hyperlinks if there are any
If c <> "" Then 'Make sure to not add hyperlinks on empty cells
c.Hyperlinks.Add c, .Range("D" & c.Row) 'Add a hyperlink based on the value of D.
If .Range("G" & c.Row).Value = True Then 'Check if G value is True
.Range("E" & c.Row).Font.Color = vbWhite 'Link is valid, so it's colored white
Else
.Range("E" & c.Row).Font.Color = vbRed 'Link is not valid, colored Red, needs updating.
End If
End If
Next
.Range("E13:E32").Font.Underline = False
.Range("E13:E32").Font.name = "Cambria"
End With
OptimizeVBA False
End Sub
Sub OptimizeVBA(isOn As Boolean) 'Optimize VBA by not running calculations, events, and updates up until Macro is done. Found online.
Application.Calculation = IIf(isOn, xlCalculationManual, xlCalculationAutomatic)
Application.EnableEvents = Not (isOn)
Application.ScreenUpdating = Not (isOn)
End Sub
Function RemoveDupesColl(MyArray As Variant) As Variant 'Online code to remove any duplicates in an array.
'DESCRIPTION: Removes duplicates from your array using the collection method.
'NOTES: (1) This function returns unique elements in your array, but
' it converts your array elements to strings.
'SOURCE: https://wellsr.com
'-----------------------------------------------------------------------
Dim i As Long
Dim arrColl As New Collection
Dim arrDummy() As Variant
Dim arrDummy1() As Variant
Dim item As Variant
ReDim arrDummy1(LBound(MyArray) To UBound(MyArray))
For i = LBound(MyArray) To UBound(MyArray) 'convert to string
arrDummy1(i) = CStr(MyArray(i))
Next i
On Error Resume Next
For Each item In arrDummy1
arrColl.Add item, item
Next item
Err.Clear
ReDim arrDummy(LBound(MyArray) To arrColl.count + LBound(MyArray) - 1)
i = LBound(MyArray)
For Each item In arrColl
arrDummy(i) = item
i = i + 1
Next item
RemoveDupesColl = arrDummy
End Function
Everything is working as intended, though I believe that the code can be further optimized. Please note that I only dabble at VBA and had some experience with VB.Net but am not really a programmer. However, I understand formatting so I still made sure that my coding can still be understood. Comments are added everywhere as I am only a temp in the company and would like to pass it on to someone else just in case..
My difficulties when I started this out:
- Singular/Plural terms: That's why there's an "Or" statement that
inverts the variables in the "Like" statement as a resolution. - 2-3 letter words being found inside bigger words: As we're using
acronyms or shortcuts, there are times that there are results that
are being found that's not really related to what's supposed to show.
So an additional If statements were added specifically for the short
words. - Repeating words in filenames: For some reason, there are filenames that repeat the same word multiple times (The acronyms), and it skews the top result as it matches multiple times. I've used an online code to remove the duplicates via collection method. Hence, there was a need to convert the array to a Variant.
- Opening write-protected documents: Sadly, this wasn't fixed, but basically a Word popup when asking to open as read only does not show on top of the Excel program. Excel meanwhile, is unresponsive until the popup box was answered. Wait too long, and an OLE error will show up. A workaround to this one is to have another Word program open, and the popup will show there.
vba excel
vba excel
New contributor
New contributor
edited 2 days ago
New contributor
asked 2 days ago
Basher
162
162
New contributor
New contributor
@Comintern As the codes were sourced online, I wasn't sure if I have to really put it in, but if that's required, I've added it in now.
– Basher
2 days ago
Thanks. The other question I have is what you're referring to with the "Word popup when asking to open as read only". I didn't catch any references in there that would indicate that you're automating Word. Is that what the hyperlinks target? (I think the Z-ordering is a Windows 10 "feature" - i.e. bug)
– Comintern
2 days ago
Yes, the hyperlink also targets Word (.docx) files, as well as PDF, excel files, etc. The other files are opened without issues, and Word documents will open fine too as long as they're not write-protected. Write-protected, meaning it prompts for a password upon opening the file, yet also offers Read Only mode.
– Basher
2 days ago
I've experimented with this, and as long as Word does not prompt any popup box, it will be fine. Changing the file property to Read Only will also work, but the documents are updated on a daily basis, so passwords are used as protection instead. Also we're using Windows 7.
– Basher
2 days ago
Using dictionaries would greatly simplify and speed up your code. Regex would be best but it can be complicated.
– TinMan
2 days ago
add a comment |
@Comintern As the codes were sourced online, I wasn't sure if I have to really put it in, but if that's required, I've added it in now.
– Basher
2 days ago
Thanks. The other question I have is what you're referring to with the "Word popup when asking to open as read only". I didn't catch any references in there that would indicate that you're automating Word. Is that what the hyperlinks target? (I think the Z-ordering is a Windows 10 "feature" - i.e. bug)
– Comintern
2 days ago
Yes, the hyperlink also targets Word (.docx) files, as well as PDF, excel files, etc. The other files are opened without issues, and Word documents will open fine too as long as they're not write-protected. Write-protected, meaning it prompts for a password upon opening the file, yet also offers Read Only mode.
– Basher
2 days ago
I've experimented with this, and as long as Word does not prompt any popup box, it will be fine. Changing the file property to Read Only will also work, but the documents are updated on a daily basis, so passwords are used as protection instead. Also we're using Windows 7.
– Basher
2 days ago
Using dictionaries would greatly simplify and speed up your code. Regex would be best but it can be complicated.
– TinMan
2 days ago
@Comintern As the codes were sourced online, I wasn't sure if I have to really put it in, but if that's required, I've added it in now.
– Basher
2 days ago
@Comintern As the codes were sourced online, I wasn't sure if I have to really put it in, but if that's required, I've added it in now.
– Basher
2 days ago
Thanks. The other question I have is what you're referring to with the "Word popup when asking to open as read only". I didn't catch any references in there that would indicate that you're automating Word. Is that what the hyperlinks target? (I think the Z-ordering is a Windows 10 "feature" - i.e. bug)
– Comintern
2 days ago
Thanks. The other question I have is what you're referring to with the "Word popup when asking to open as read only". I didn't catch any references in there that would indicate that you're automating Word. Is that what the hyperlinks target? (I think the Z-ordering is a Windows 10 "feature" - i.e. bug)
– Comintern
2 days ago
Yes, the hyperlink also targets Word (.docx) files, as well as PDF, excel files, etc. The other files are opened without issues, and Word documents will open fine too as long as they're not write-protected. Write-protected, meaning it prompts for a password upon opening the file, yet also offers Read Only mode.
– Basher
2 days ago
Yes, the hyperlink also targets Word (.docx) files, as well as PDF, excel files, etc. The other files are opened without issues, and Word documents will open fine too as long as they're not write-protected. Write-protected, meaning it prompts for a password upon opening the file, yet also offers Read Only mode.
– Basher
2 days ago
I've experimented with this, and as long as Word does not prompt any popup box, it will be fine. Changing the file property to Read Only will also work, but the documents are updated on a daily basis, so passwords are used as protection instead. Also we're using Windows 7.
– Basher
2 days ago
I've experimented with this, and as long as Word does not prompt any popup box, it will be fine. Changing the file property to Read Only will also work, but the documents are updated on a daily basis, so passwords are used as protection instead. Also we're using Windows 7.
– Basher
2 days ago
Using dictionaries would greatly simplify and speed up your code. Regex would be best but it can be complicated.
– TinMan
2 days ago
Using dictionaries would greatly simplify and speed up your code. Regex would be best but it can be complicated.
– TinMan
2 days ago
add a comment |
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Basher is a new contributor. Be nice, and check out our Code of Conduct.
Basher is a new contributor. Be nice, and check out our Code of Conduct.
Basher is a new contributor. Be nice, and check out our Code of Conduct.
Basher is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208661%2fdocument-search-engine-in-excel%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
@Comintern As the codes were sourced online, I wasn't sure if I have to really put it in, but if that's required, I've added it in now.
– Basher
2 days ago
Thanks. The other question I have is what you're referring to with the "Word popup when asking to open as read only". I didn't catch any references in there that would indicate that you're automating Word. Is that what the hyperlinks target? (I think the Z-ordering is a Windows 10 "feature" - i.e. bug)
– Comintern
2 days ago
Yes, the hyperlink also targets Word (.docx) files, as well as PDF, excel files, etc. The other files are opened without issues, and Word documents will open fine too as long as they're not write-protected. Write-protected, meaning it prompts for a password upon opening the file, yet also offers Read Only mode.
– Basher
2 days ago
I've experimented with this, and as long as Word does not prompt any popup box, it will be fine. Changing the file property to Read Only will also work, but the documents are updated on a daily basis, so passwords are used as protection instead. Also we're using Windows 7.
– Basher
2 days ago
Using dictionaries would greatly simplify and speed up your code. Regex would be best but it can be complicated.
– TinMan
2 days ago